The FAIR (Findable, Accessible, Interoperable, and Reusable) principles are a fundamental guiding framework for managing scientific data in the digital age. Formulated in 2016 through the collaborative effort of a multidisciplinary community of scientists and experts, they have become a pillar for improving the effectiveness and usability of research data.

 

FINDABILITY – The first step is finding the data. This principle focuses on the ability to easily locate data through metadata and unique identifiers, both for humans and computers. Machine-readable metadata is essential for automatically discovering datasets and services and is a pivotal component of the FAIRification process.

ACCESSIBILITY – Once the data needed for a particular purpose have been identified, it is essential to know how they can actually be accessed, including any authentication and authorization requirements. This principle focuses on making data effectively available to anyone who needs them, through clear and documented access methods.

INTEROPERABILITY – Oftentimes, data must be integrated and able to interact with applications or workflows for analysis, storage and processing. Interoperability requires that data are structured in a consistent way, following shared standards, and that relationships between data are clearly defined, facilitating integration and analysis.

REUSABILITY – The ultimate goal of the FAIR principles is to optimize the reuse of data. To achieve this, both the metadata and the data themselves should be well described, so that they can be replicated and/or combined across different contexts.

 

The adaptation of a data source to the FAIR principles is called FAIRification. This process is codified in three fundamental steps:

  1. A semantic model is defined for the source dataset in a machine-readable format, reusing, where possible, existing models that already cover the use cases of interest.
  2. Both data and metadata are made linkable to increase interoperability, leveraging the previously built semantic model.
  3. Human- and machine-readable interfaces are created to deploy FAIR data resources.
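
As a rough illustration of what "machine-readable" and "linkable" metadata can look like in practice, the sketch below builds a small dataset description as JSON-LD. It is only a minimal example under stated assumptions: the choice of the schema.org `Dataset` vocabulary, the helper name `build_dataset_metadata`, and all the example values (identifier, name, license) are hypothetical and are not prescribed by the FAIRification process described above.

```python
import json


def build_dataset_metadata(identifier, name, description, license_url, keywords):
    """Return a dataset description as a JSON-LD-compatible dictionary.

    Assumption: schema.org "Dataset" is used as the semantic model; any
    other shared, machine-readable model could take its place.
    """
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        # A resolvable URI makes the record linkable from other resources.
        "@id": identifier,
        "identifier": identifier,
        "name": name,
        "description": description,
        "license": license_url,
        "keywords": list(keywords),
    }


# Hypothetical example values, for illustration only.
metadata = build_dataset_metadata(
    identifier="https://example.org/datasets/demo-cohort",
    name="Demo cohort",
    description="Illustrative record showing machine-readable metadata.",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    keywords=["FAIR", "OMOP"],
)

# Serialize so that both humans and machines can read the same record.
jsonld_text = json.dumps(metadata, indent=2, sort_keys=True)
print(jsonld_text)
```

The same dictionary could be embedded in a landing page for human readers and exposed through an API for machines, which is one possible way to approach the third step.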

 

The process described above is illustrated in Fig. 1.

Fig. 1: Workflow of the FAIRification process

 

An objective and repeatable assessment of FAIRness, useful both as an initial baseline and for estimating the result of FAIRification, can be obtained by scoring the examined source on its compliance with uniquely codified requirements, the so-called maturity indicators. These criteria are drawn up by a dedicated working group of the Research Data Alliance (RDA) community [https://www.rd-alliance.org/]. The RDA is an international initiative launched in 2013 through the collaboration of the European Commission, US government bodies such as the National Science Foundation and the National Institute of Standards and Technology, and the Australian Department of Innovation, with the aim of promoting knowledge sharing and data-driven research.

A schematic overview of the maturity indicators is shown in Fig. 2.

Fig. 2: FAIR maturity indicators
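
The idea of scoring a source against maturity indicators can be sketched in a few lines. The text does not specify how individual indicators are weighted or aggregated, so the example below simply assumes an equal-weight pass fraction per principle and overall. The class `IndicatorResult`, the function `fairness_scores`, the indicator identifiers, and the pass/fail outcomes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndicatorResult:
    indicator_id: str  # hypothetical identifier, e.g. "F4"
    principle: str     # one of "F", "A", "I", "R"
    passed: bool       # outcome of the check for the examined source


def fairness_scores(results):
    """Return the fraction of passed indicators per principle and overall.

    Assumption: every indicator carries the same weight. A real assessment
    may distinguish, for example, mandatory from optional indicators.
    """
    totals = {}
    passed = {}
    for result in results:
        totals[result.principle] = totals.get(result.principle, 0) + 1
        passed[result.principle] = passed.get(result.principle, 0) + int(result.passed)
    scores = {principle: passed[principle] / totals[principle] for principle in totals}
    total_count = sum(totals.values())
    scores["overall"] = sum(passed.values()) / total_count if total_count else 0.0
    return scores


# Hypothetical outcomes for an imaginary data source.
example_results = [
    IndicatorResult("F1", "F", False),
    IndicatorResult("F4", "F", True),
    IndicatorResult("A1", "A", True),
    IndicatorResult("I2A", "I", True),
    IndicatorResult("R1", "R", False),
]
scores = fairness_scores(example_results)
print(scores)
```

Running the same function before and after FAIRification gives two comparable sets of numbers, which is the kind of repeatable comparison the maturity indicators are meant to support.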

 

Technologies like OMOP/OHDSI contribute to data FAIRness. For example:

FINDABILITY (F4) – Data and metadata can be found using web-based search engines.
The tools made available by OHDSI include Athena, an online search engine that lets users search and navigate the standard vocabularies of the CDM; Usagi, an application that generates a tentative automated mapping of source codes to those vocabularies; and Atlas, a software tool that facilitates analyses on OMOP-harmonized data sources by supporting the definition of meaningful concept sets and the creation of cohorts of interest.

ACCESSIBILITY (A1) – Data and metadata are retrievable by their identifier using a standardized resolution protocol.
OHDSI WebAPI provides a robust, standardized interface for interacting with the CDM and navigating the database, mainly, but not exclusively, for analytical purposes. It can also be used through the R packages made available for this purpose.
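
To make "retrievable by their identifier using a standardized protocol" more concrete, the sketch below shows how a concept might be requested over HTTP from a WebAPI deployment. Several things here are assumptions rather than facts taken from the text: the base URL, the source key `MY_CDM`, the concept identifier, and the route pattern `vocabulary/{sourceKey}/concept/{id}`, which should be checked against the documentation of the WebAPI version actually deployed. Authentication is omitted, and the helper names are hypothetical.

```python
import json
from urllib.parse import quote, urljoin
from urllib.request import urlopen


def concept_url(base_url, source_key, concept_id):
    """Build the URL for a single concept lookup.

    Assumption: the deployment exposes the route
    <base>/vocabulary/<sourceKey>/concept/<id>.
    """
    base = base_url.rstrip("/") + "/"
    path = f"vocabulary/{quote(source_key, safe='')}/concept/{int(concept_id)}"
    return urljoin(base, path)


def fetch_concept(base_url, source_key, concept_id, timeout=10):
    """Retrieve a concept record as a dictionary via a plain HTTP GET.

    Not called in this sketch, because it needs a reachable WebAPI server.
    """
    with urlopen(concept_url(base_url, source_key, concept_id), timeout=timeout) as response:
        return json.load(response)


# Hypothetical deployment and identifier, for illustration only.
example_url = concept_url("https://example.org/WebAPI", "MY_CDM", 12345)
print(example_url)
```

The point of the example is that the identifier alone, combined with a documented HTTP route, is enough to locate the record; no tool-specific client is strictly required.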

INTEROPERABILITY (I2A) – Data and metadata use vocabularies/ontologies that are FAIR.
OMOP brings together and connects dozens of validated, standardized and widely used international terminology systems for representing data.

REUSABILITY (R1) – Data and metadata are richly described with a plurality of accurate and relevant attributes.

 

The terminology system of the OMOP CDM, a true “super-ontology” built, as already mentioned, on standard vocabularies, provides a very detailed metadata layer (Fig. 3) that is also optimized for querying with tools such as Atlas and Athena.

 

Fig. 3: The concept “Superficial biopsy of muscle” as it appears in Athena