Posts

i2b2-onco-pg23

 

Project status: Current

Coordinator: ASST Papa Giovanni XXIII, Bergamo, IT

Field: Oncology

Start date: September 1st, 2016

Platform: i2b2

 

Since 2017, the “vertical” project specific to the Oncology Department has been active within the i2b2 system of ASST Papa Giovanni XXIII. The project contains the data integrated in the horizontal corporate project but limited to patients with an oncological diagnosis, in particular:

  • the oncology medical record
  • hospitalisation flow of the discharged patients
  • pharmacological therapies administered and delivered
  • outpatient procedures
  • outcomes of clinical chemistry and microbiology laboratory data
  • anatomic pathology reports.

In addition, the project includes information extracted by a text-mining algorithm from the anatomic pathology reports of patients with breast cancer. The algorithm uses NLP (Natural Language Processing) techniques that exploit a breast cancer ontology, which was defined starting from a reference ontology (PATHLEX) and the specific information to be extracted. During validation, the algorithm exceeded 90% accuracy. The procedure was then applied to all breast cancer reports (about 20 thousand reports).
The project also integrates specific clinical studies performed with data from patients of the institute. In particular, the tool is very efficient at collecting data for RWE (Real-World Evidence) studies.

References:

  • Viani N, Chiudinelli L, Tasca C, Zambelli A, Bucalo M, Ghirardi A, Barbarini N, Sfreddo E, Sacchi L, Tondini C, Bellazzi R. Automatic Processing of Anatomic Pathology Reports in the Italian Language to Enhance the Reuse of Clinical Data. Stud Health Technol Inform. 2018;247:715-719. PMID: 29678054 (link, pdf).
  • Zambelli A, Ghirardi A, Masciulli A, Sfreddo E, Porcino R, Bucalo M, Barbarini N, Chiudinelli L, Chirco A, Labianca A, Barbui T, Tondini C. Ten-years electronic phenotyping archive and automated reconstruction of her2+ breast cancer patients careflow, through the exportable, open-source i2b2 data ware-housing platform. XX Congresso Nazionale AIOM 2018. (pdf)
  • Chiudinelli L, Viani N, Zambelli A, Gabetta M, Bucalo M, Ghirardi A, Sfreddo E, Sacchi L, Tondini C, Bellazzi R. i2b2 Ontology Curation leveraging clinical notes. NETTAB 2018. (pdf)

ONCO-i2b2

Project status: Terminated

Coordinator: IRCCS ICS Maugeri, Pavia (IT)

Field: Oncology

Funded by: Regione Lombardia

Start date: 01 January 2010

Platform: i2b2

 

ONCO-i2b2 is a research project of the University of Pavia and IRCCS ICS Maugeri of Pavia to support clinical research in oncology. ONCO-i2b2, funded by Regione Lombardia, adopts the i2b2 software. Using i2b2 and new software modules specifically designed during the project, data from multiple sources have been integrated to allow cross-querying. The core of the integration process lies in the retrieval and fusion of data from the biobank management software and the ICS hospital information system. The integration process is based on an oncology domain ontology and open-source software integration modules. A Text-Mining/NLP (Natural Language Processing) module has also been implemented. This module automatically extracts clinical information about oncology patients from unstructured Anatomic Pathology reports. The system handles more than two thousand patients.
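
As a rough illustration of what "integrated to allow cross-querying" can look like, the sketch below maps records from two sources onto a single list of facts keyed by patient and concept code, loosely inspired by the fact-table layout i2b2 uses. It is not the project's integration code: the record fields, the `BIOBANK:` and `ICD9:` code prefixes, and all function names are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Fact:
    """One observation about one patient (hypothetical, simplified layout)."""
    patient_num: int
    concept_cd: str
    start_date: date


def facts_from_biobank(samples):
    # Each biobank sample becomes a fact coded by its sample type.
    return [
        Fact(s["patient_id"], f"BIOBANK:{s['sample_type']}", s["collected_on"])
        for s in samples
    ]


def facts_from_his(diagnoses):
    # Each hospital diagnosis becomes a fact coded by its diagnosis code.
    return [
        Fact(d["patient_id"], f"ICD9:{d['code']}", d["diagnosed_on"])
        for d in diagnoses
    ]


def patients_with_all(facts, concept_cds):
    """Return the sorted patient numbers that have every requested concept."""
    by_patient = {}
    for fact in facts:
        by_patient.setdefault(fact.patient_num, set()).add(fact.concept_cd)
    wanted = set(concept_cds)
    return sorted(p for p, cds in by_patient.items() if wanted <= cds)


# Invented example data for the two sources.
samples = [
    {"patient_id": 1, "sample_type": "plasma", "collected_on": date(2011, 3, 1)},
    {"patient_id": 2, "sample_type": "tissue", "collected_on": date(2011, 4, 9)},
]
diagnoses = [
    {"patient_id": 1, "code": "174.9", "diagnosed_on": date(2011, 2, 20)},
    {"patient_id": 2, "code": "174.9", "diagnosed_on": date(2011, 4, 2)},
    {"patient_id": 3, "code": "174.9", "diagnosed_on": date(2011, 5, 15)},
]

facts = facts_from_biobank(samples) + facts_from_his(diagnoses)

# Cross-query: patients with a breast cancer diagnosis AND a stored tissue sample.
print(patients_with_all(facts, ["ICD9:174.9", "BIOBANK:tissue"]))
```

Once both sources share the same shape, a query spanning the biobank and the hospital system reduces to a set operation over concept codes.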

References:

Segagni D, Tibollo V, Dagliati A, Zambelli A, Priori SG, Bellazzi R (2012) An ICT infrastructure to integrate clinical and molecular data in oncology research. BMC Bioinformatics 13(Suppl 4): S5. (link, pdf)

Segagni D, Tibollo V, Dagliati A, Malovini A, Zambelli A, Napolitano C, Priori SG, Bellazzi R. Clinical and research data integration: the i2b2-FSM experience. AMIA Jt Summits Transl Sci Proc. 2013 Mar 18;2013:239-40. eCollection 2013. (link)

Segagni D, Tibollo V, Dagliati A, Perinati L, Zambelli A, Priori S, Bellazzi R. The ONCO-I2b2 project: integrating biobank information and clinical data to support translational research in oncology. Stud Health Technol Inform. 2011;169:887-91. (link, pdf)

Segagni D, Gabetta M, Tibollo V, Zambelli A, Priori SG, Bellazzi R. ONCO-i2b2: Improve patients selection through case-based information retrieval techniques. 8th International Conference on Data Integration in the Life Sciences, DILS 2012 (link)

CARDIO-i2b2

 

Project status: Terminated

Coordinator: IRCCS ICS Maugeri, Pavia (IT)

Field: Cardiology

Start date: 01 January 2011

Platform: i2b2

 

 

The CARDIO-i2b2 project aims to customize the i2b2 bioinformatics platform to integrate clinical and research data in order to support translational research in cardiology at FSM (Fondazione Salvatore Maugeri). CARDIO-i2b2 collects data from the Molecular Cardiology Laboratory databases and combines them with clinical data from the TRIAD system, an information system to collect data related to arrhythmogenic diseases. Genetic information related to affected patients is also collected.

The data contained in the TRIAD relational database were exported to the i2b2 data warehouse. A dedicated extension of i2b2 was developed to include the R statistical software within the architecture and exploit the statistical capabilities of R via the i2b2 web interface. A dedicated plugin was developed to allow researchers to dynamically perform Kaplan-Meier survival analysis on selected patients.
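
For readers unfamiliar with the Kaplan-Meier estimator, here is a minimal sketch of the computation. The CARDIO-i2b2 plugin relies on R; this Python version is only an illustration of the technique, and the function name and input layout are assumptions. At each distinct event time, the survival probability is multiplied by (1 - events / patients at risk), while censored patients simply leave the risk set.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival)] for each distinct time with at least one event.

    durations: follow-up time of each patient
    events:    1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(durations, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set after t.
        n_at_risk -= n_leaving
    return curve


# Invented follow-up data: five patients, two of them censored.
for time, s in kaplan_meier([5, 8, 8, 12, 15], [1, 1, 0, 1, 0]):
    print(f"t={time}: S(t)={s:.2f}")
```

With these invented data, the estimate drops at times 5, 8 and 12; the patient censored at time 8 still counts as at risk at that time, and the patient censored at time 15 never causes a drop.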

References:

Segagni D, Tibollo V, Dagliati A, Napolitano C, Priori SG, Bellazzi R. CARDIO-i2b2: integrating arrhythmogenic disease data in i2b2. Stud Health Technol Inform. 2012;180:1126-8. (link, pdf)

Segagni D, Tibollo V, Dagliati A, Malovini A, Zambelli A, Napolitano C, Priori SG, Bellazzi R. Clinical and research data integration: the i2b2-FSM experience. AMIA Jt Summits Transl Sci Proc. 2013 Mar 18;2013:239-40. eCollection 2013. (link)

Natural Language Processing (NLP) is a branch of knowledge shared between Artificial Intelligence and Computational Linguistics. The primary objective of NLP is to extract meaningful information from chunks of text. These techniques leverage Machine Learning to detect and isolate patterns and regularities among the words of a sentence, implicitly learning grammar rules and semantic relationships. Such relationships can then be used to “understand” human (i.e., natural) language. More precisely, this kind of AI provides solutions to analyse the syntactic structure of text, assigning single words to morphological categories (e.g. noun, verb, adjective), and to identify entities and classify them into predefined groups (e.g. people, places, dates), also on the basis of their semantics.

Typical Natural Language Processing pipeline

The rise of the Internet, together with the growing availability of computational resources and a gradual digitalization process, has made Natural Language Processing applicable in almost any professional field, sometimes reaching human-comparable performance. Thanks to recent advances, for instance, we can now easily converse with virtual assistants and get coherent answers even to the most complex questions. And that’s not all: even this very piece of text could have been crafted by an advanced generative model!

In the healthcare domain, the digitalization of clinical services and processes has pushed medical institutions to produce and store an ever-growing amount of medical data, most of it in free-text format (i.e., unstructured data). Medical reports, nursing notes, discharge letters, emergency department reports, administrative documents, and many other kinds of digital documents are generated in hospitals on a daily basis. This information is crucial in the new big-data healthcare framework because, thanks to Artificial Intelligence and NLP, we can leverage it to improve patients’ care and management while reducing costs and speeding up procedures.

Example of information extraction from free text

A classic NLP application in medicine is the automatic extraction of concepts and relations, which maps free-text documents to structured sets of clinical entities through ontologies. Clinical information extracted this way can then be used to classify patients, create cohorts, and populate registries.
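
To make the idea concrete, the sketch below maps a short free-text pathology sentence to a structured record using a tiny lexicon and two regular expressions. It is a deliberately simplified, rule-based illustration, not the algorithm described in the referenced publications: the lexicon entries, the concept codes, the report text, and the function name are all assumptions made for the example, and real systems must also handle negation, abbreviations, and spelling variants.

```python
import re

# Hypothetical mini-lexicon: surface term -> concept code (codes are invented).
LEXICON = {
    "invasive ductal carcinoma": "HISTOLOGY:IDC",
    "invasive lobular carcinoma": "HISTOLOGY:ILC",
}

# Receptor status such as "ER positive" or "HER2 negative".
RECEPTOR = re.compile(r"\b(ER|PR|HER2)\s+(positive|negative)\b", re.IGNORECASE)
# Histological grade such as "grade 2".
GRADE = re.compile(r"\bgrade\s+([1-3])\b", re.IGNORECASE)


def extract_concepts(report):
    """Map a free-text report to a dictionary of structured clinical entities."""
    found = {}
    lowered = report.lower()
    for term, code in LEXICON.items():
        if term in lowered:
            found["histology"] = code
    grade = GRADE.search(report)
    if grade:
        found["grade"] = int(grade.group(1))
    for receptor, status in RECEPTOR.findall(report):
        found[receptor.upper()] = status.lower()
    return found


report = "Invasive ductal carcinoma, grade 2. ER positive, PR positive, HER2 negative."
print(extract_concepts(report))
```

The resulting dictionary can be stored alongside structured data, so that a query such as "HER2-negative patients with grade 2 tumours" no longer requires reading each report.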

To sum up, Natural Language Processing in healthcare makes it possible to analyse an underused source of information that is nevertheless quantitatively and qualitatively important: free text. Extracting information from unstructured clinical text so that it can be integrated with structured data is a huge opportunity to cut expenses, speed up processes, and guarantee better care for patients.

Insights on the present (and future) of NLP for information extraction in medicine:

 

AUTHOR: Tommaso Buonocore