Do the FAIR Data principles cover all data ethics values?
A hot topic of discussion is how data can be stored securely and openly at the same time. For research data in particular, transparency in data usage is important to safeguard the scientific quality of research. Data maintenance is just as important in government, business, and other private sectors. What could happen when data or artificial intelligence is misused because of irresponsible regulation?
In 2016, Dr. Wilkinson and colleagues introduced the FAIR Data Principles, which are meant to help researchers and other data scientists manage their data properly. ‘FAIR’ data should not be confused with ‘open’ data, which is shared for anyone to access without permission and to use for any purpose. The FAIR principles, by contrast, are guidelines for sharing data while keeping ethical and legal restrictions in mind.
The FAIR Data Principles were constructed mainly to keep up with the world’s fast digitalization, and they are essential for understanding the societal impacts and challenges that come with it.
FAIR content & purpose
FAIR data stands for Findable, Accessible, Interoperable, and Reusable data. These principles are believed to enhance scientific accuracy, reliability, and integrity.
Findable: ‘Easy to find by both humans and computer systems and based on the mandatory description of the metadata that allows the discovery of interesting datasets.’
Accessible: ‘Stored for the long term such that they can be easily accessed and/or downloaded with well-defined license and access conditions.’
Interoperable: ‘Ready to be combined with other datasets by humans as well as computer systems.’
Reusable: ‘Ready to be used for future research and to be processed further using computational methods.’
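To make the four principles a bit more concrete, here is a minimal sketch of how a dataset's metadata could be checked against them. The record fields and the mapping from principle to field are my own assumptions for illustration; they are not part of the FAIR specification or of any particular metadata standard.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    """Hypothetical metadata record; the field names are illustrative only."""
    identifier: str = ""    # persistent identifier, e.g. a DOI (Findable)
    description: str = ""   # human-readable summary (Findable)
    access_url: str = ""    # where the data can be retrieved (Accessible)
    license: str = ""       # access and reuse conditions (Accessible, Reusable)
    file_format: str = ""   # documented, common format (Interoperable)
    provenance: str = ""    # how the data was produced (Reusable)


# Assumed mapping from each FAIR principle to the fields that support it.
FAIR_FIELDS = {
    "Findable": ["identifier", "description"],
    "Accessible": ["access_url", "license"],
    "Interoperable": ["file_format"],
    "Reusable": ["license", "provenance"],
}


def missing_fair_fields(record):
    """Return, per principle, the supporting fields that are still empty."""
    gaps = {}
    for principle, names in FAIR_FIELDS.items():
        empty = [name for name in names if not getattr(record, name).strip()]
        if empty:
            gaps[principle] = empty
    return gaps


record = DatasetRecord(
    identifier="doi:10.0000/example",  # placeholder, not a real DOI
    description="Survey responses, 2019",
    access_url="https://example.org/data/survey.csv",
    file_format="text/csv",
)
print(missing_fair_fields(record))
# {'Accessible': ['license'], 'Reusable': ['license', 'provenance']}
```

A check like this only tells you that a field is filled in, not that its content is any good, so it is a starting point rather than a measure of FAIRness.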
FACT content & purpose
On top of the FAIR principles, the FACT principles have been formulated by the Responsible Data Science organization. This organization’s mission is to tackle four main challenges in the world of Big Data. They want to safeguard fairness, increase accuracy, ensure confidentiality and provide transparency.
Fairness: ‘Unfair conclusions are avoided even if they are correctly computed from available data and models.’
Accuracy: ‘Computed answers are given with a guaranteed level of accuracy so as to avoid misleading conclusions.’
Confidentiality: ‘Results are achieved in a safe and controlled manner without revealing secret (private, company) information.’
Transparency: ‘Computed answers can be understood and clarified such that they become indisputable and hence trustworthy.’
Implementations and examples
Besides the examples given by Dr. Wilkinson and colleagues themselves, I wondered how widely the FAIR principles are actually used in the data science community. To get an unbiased perspective on their added value, I did some research myself.
According to Jacobsen et al. (2020), the FAIR principles leave room for misinterpretation, and inconsistent interpretations could lead to incompatible implementations. The authors argue that the principles should be accompanied by a set of considerations, which they provide. One of their examples is the ‘machine actionability’ of biomedical data. With AI in biomedicine, several legal and ethical challenges may arise, such as misuse of data, incomplete or selective data, or unclear rules on consent for data usage. Against this background, the FAIR principles require that machines can make optimal use of data resources. In other words, the data should be ‘AI-ready’. Jacobsen et al. question whether this requirement can be read explicitly from the four FAIR principles, and they conclude that additional considerations are crucial for preventing misinterpretation.
“FAIR requires that the machine knows what we mean.” — Jacobsen et al. (2020)
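A small sketch may help show what ‘the machine knows what we mean’ can look like in practice. It contrasts a free-text usage note with the same conditions expressed as structured metadata. The key names (`allowed_purposes`, `consent_obtained`) and the decision rule are hypothetical and only serve the illustration; they do not come from Jacobsen et al. or from a metadata standard.

```python
import json

# Free-text note: clear to a human, opaque to a program.
free_text = "Data may be reused for non-commercial research if you cite us."

# The same conditions as structured metadata (hypothetical key names).
structured = json.loads("""
{
  "license": "CC-BY-NC-4.0",
  "allowed_purposes": ["research"],
  "consent_obtained": true
}
""")


def may_use(metadata, purpose):
    """Decide from structured metadata alone whether a purpose is permitted."""
    has_consent = bool(metadata.get("consent_obtained"))
    return has_consent and purpose in metadata.get("allowed_purposes", [])


print(may_use(structured, "research"))   # True
print(may_use(structured, "marketing"))  # False
```

A program cannot reliably draw the same conclusion from `free_text`, which is the kind of gap that machine actionability is meant to close.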
In line with data ethics?
Now that we know the FAIR principles are not fully complete, the question arises: to what extent are these principles in line with data ethics?
“Data ethics is a social movement, a cultural shift and a technological and legal development that increasingly places the human at the centre”.
— Hasselbalch and Tranberg (2016)
Data ethics has four core values: social, legal, technological, and human-centric. Here we are mostly interested in the legal and technological perspectives, which aim for stricter data protection frameworks and new types of data-driven businesses. To be more precise, we want to look at the tech values of data ethics and compare them with the FAIR principles to see whether they are covered.
Multiple ethical values seem to be missing from the FAIR and FACT data principles. First of all, privacy and security are not explicitly mentioned. One could argue that confidentiality covers these aspects, but it is not the same as privacy. The difference is subtle: confidentiality refers to any information, while privacy refers to an individual’s data. As with the inconsistent interpretations discussed earlier, this ambiguity could lead to misunderstandings.
Secondly, inclusiveness is missing. Ensuring that data is inclusive helps to avoid bias and discrimination. I understand that this is not the main goal of the FAIR principles, but it is still an important value to consider.
The third gap I found is accountability. Who will be held responsible when data is misused: the owner of the data, or the user? This is not addressed at all, even though it is a very important issue in data sharing.
So…
As shown above, the FAIR Data Principles do not cover all important data ethics values. Instead of blindly adopting the FAIR principles, data scientists and companies should consider additional data ethics tech values to fill the gaps. Still, I believe that the intentions for better data ethics implementations are definitely there and growing, so let’s keep that up!