The use of real-world data (RWD) represents one of the most powerful tools for assessing the transferability of scientific evidence into clinical practice today. In recent years, several tools and networks have been developed with the aim of harmonizing collected clinical data for research purposes.

The development and spread of such tools, for example i2b2 (Informatics for Integrating Biology & the Bedside), which acquire and harmonize information from electronic health records (diagnoses, therapies, procedures, etc.), opens the prospect of building automated systems and extensive networks capable of gathering large amounts of information from multiple sources.

The Italian Society of Rheumatology (SIR), with Project REWIND, aims to provide preliminary data on the feasibility of creating a network for monitoring the impact of rheumatic diseases on healthcare and the transferability of new healthcare interventions into clinical practice, whether pharmacological or organizational.

In the centers involved in the project, using the i2b2 system, a “vertical” Rheumatology project has been created to support research questions on Rheumatic and Musculoskeletal Diseases (RMDs).

The “vertical” project contains data already integrated in each center’s horizontal project, restricted to patients diagnosed with RMDs, specifically:

  • Hospital discharge records (SDO);
  • Administered and dispensed pharmacological therapies;
  • Laboratory tests;
  • Outpatient visits with respective reports.
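
The restriction of the horizontal project to RMD patients can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the `Record` structure, the idea that RMD patients are identified from diagnosis codes on SDO records, and the code prefixes themselves are hypothetical placeholders, not the project's actual inclusion criteria or the i2b2 data model.

```python
from dataclasses import dataclass

# Hypothetical diagnosis-code prefixes, used only as placeholders here;
# the project's real inclusion criteria are not given in the text.
RMD_CODE_PREFIXES = ("714", "710", "720")


@dataclass(frozen=True)
class Record:
    """One row of the (hypothetical) horizontal project."""
    patient_id: str
    source: str  # e.g. "SDO", "therapy", "lab", "visit"
    code: str    # diagnosis code for SDO rows, a local item code otherwise


def rmd_patient_ids(records):
    """Collect patients with at least one SDO diagnosis matching an RMD prefix."""
    return {
        r.patient_id
        for r in records
        if r.source == "SDO" and r.code.startswith(RMD_CODE_PREFIXES)
    }


def vertical_subset(records):
    """Keep every record (any source) that belongs to an RMD patient."""
    ids = rmd_patient_ids(records)
    return [r for r in records if r.patient_id in ids]


if __name__ == "__main__":
    horizontal = [
        Record("p1", "SDO", "714.0"),
        Record("p1", "lab", "CRP"),
        Record("p2", "SDO", "250.00"),
        Record("p2", "lab", "CRP"),
    ]
    for record in vertical_subset(horizontal):
        print(record)
```

The point of the sketch is that the vertical project is a patient-level filter: once a patient qualifies, all of that patient's records from every source are carried over.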

For the project, a pipeline has also been created to extract information from rheumatology outpatient reports with an algorithm based on Natural Language Processing (NLP) techniques. The algorithm relies on an ontology of rheumatic diseases, created in collaboration with SIR researchers, that defines all the specific information to be extracted. The following diagram shows the implementation phases of the NLP pipeline.

[Diagram: implementation phases of the NLP pipeline]

The initial phase involves training the model, using the ontology and regular expressions to extract information from reports. A clinician then reviews the extracted information and corrects any errors, and the corrections are used to update the model. The second phase involves validating the model on additional reports, again followed by clinical review. If accuracy exceeds 90%, the model is ready for production; otherwise, further iterations of training and validation are carried out.
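
The extraction and validation steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's pipeline: the two-concept `ONTOLOGY`, its regular expressions, and the function names are hypothetical, and accuracy is assumed here to mean the share of reports whose extracted concepts exactly match the clinician-reviewed ones, since the text does not define the metric.

```python
import re

# Hypothetical mini-ontology: concept name -> pattern that signals it.
# The real ontology was built with SIR researchers and is not shown here.
ONTOLOGY = {
    "rheumatoid_arthritis": re.compile(r"\brheumatoid\s+arthritis\b", re.IGNORECASE),
    "methotrexate": re.compile(r"\b(?:methotrexate|mtx)\b", re.IGNORECASE),
}


def extract(report):
    """Return the set of ontology concepts whose pattern occurs in the report."""
    return {name for name, pattern in ONTOLOGY.items() if pattern.search(report)}


def accuracy(reports, reviewed):
    """Share of reports whose extraction equals the clinician-reviewed set.

    Exact-match scoring per report is an assumption of this sketch.
    """
    if not reports:
        raise ValueError("at least one report is required")
    hits = sum(extract(text) == gold for text, gold in zip(reports, reviewed))
    return hits / len(reports)


def ready_for_production(score, threshold=0.90):
    """Apply the gate from the text: accuracy must exceed the threshold."""
    return score > threshold


if __name__ == "__main__":
    validation_reports = [
        "Patient with rheumatoid arthritis on MTX 15 mg weekly.",
        "No evidence of inflammatory disease.",
    ]
    clinician_reviewed = [{"rheumatoid_arthritis", "methotrexate"}, set()]
    score = accuracy(validation_reports, clinician_reviewed)
    print(score, ready_for_production(score))
```

When the gate fails, the clinician's corrections would feed back into the patterns or the ontology before the next validation round; that feedback step is left out of the sketch because the text does not describe how the model is updated.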

Several indicators were developed during the three phases of the project, allowing SIR researchers to monitor patient activity in the three participating centers:

  • Phase 1: indicators on disease characteristics, quality of care, and treatment models.
  • Phase 2: indicators on care, care performance, and pharmacovigilance.
  • Phase 3: indicators on complex procedures/interventions, pathway mining, and comparative drug effectiveness.