Natural Language Processing (NLP) is a branch of knowledge shared between Artificial Intelligence and Computational Linguistics. The primary objective of NLP is to extract meaningful information from text. Its techniques leverage Machine Learning to detect and isolate patterns and regularities among the words of a sentence, implicitly learning grammar rules and semantic relationships. Such relationships can then be used to “understand” human (i.e., natural) language. More precisely, this kind of AI provides solutions to analyse the syntactic structure of text, assigning individual words to different morphological categories (e.g., noun, verb, adjective), and to identify entities and classify them into predefined groups (e.g., people, places, dates) based also on their semantics.
The rise of the Internet, together with the growing availability of computational resources and a gradual digitalization process, has made Natural Language Processing applicable in almost any professional field, on some tasks reaching levels comparable to human performance. Thanks to recent advancements, for instance, today we can easily converse with virtual assistants and get coherent answers even to complex questions. And that’s not all: even this very piece of text could have been crafted by an advanced generative model!
In the healthcare domain, the digitalization of clinical services and processes has pushed medical institutions to produce and store an ever-growing amount of medical data, most of it in free-text format (i.e., unstructured data). Medical reports, nursing notes, discharge letters, first aid reports, administrative documents, and many other kinds of digital documents are generated in hospitals on a daily basis. This information is crucial in the new big-data healthcare framework because, thanks to Artificial Intelligence and NLP, we can leverage it to improve patient care and management while reducing costs and speeding up procedures.
A classic NLP application in medicine is the automatic extraction of concepts and relations: documents are mapped from free text to structured sets of clinical entities, each linked to a concept in an ontology. Clinical information extracted this way can then be used to classify patients, create cohorts, and populate registries.
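As a rough illustration of the mapping from free text to structured entities, here is a minimal Python sketch. It is not a real clinical pipeline: it swaps the machine-learning models discussed above for plain dictionary matching, and the ontology `TOY_ONTOLOGY`, its `DEMO:` identifiers, the `ClinicalEntity` type, and the sample report are all hypothetical, invented only for this example.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical mini-ontology: surface forms mapped to made-up concept IDs.
# A real system would rely on an established clinical terminology instead.
TOY_ONTOLOGY = {
    "hypertension": "DEMO:0001",
    "high blood pressure": "DEMO:0001",  # synonym of the same concept
    "type 2 diabetes": "DEMO:0002",
    "chest pain": "DEMO:0003",
}


@dataclass(frozen=True)
class ClinicalEntity:
    concept_id: str  # identifier in the (toy) ontology
    text: str        # the span exactly as written in the report
    start: int       # character offset where the span begins
    end: int         # character offset just past the span


def extract_concepts(report: str) -> List[ClinicalEntity]:
    """Return every known surface form found in the report, in text order."""
    entities = []
    for surface, concept_id in TOY_ONTOLOGY.items():
        # Whole-word, case-insensitive match of the surface form.
        pattern = r"\b" + re.escape(surface) + r"\b"
        for match in re.finditer(pattern, report, flags=re.IGNORECASE):
            entities.append(
                ClinicalEntity(concept_id, match.group(0), match.start(), match.end())
            )
    return sorted(entities, key=lambda entity: entity.start)


# Invented example report, not real patient data.
report = "Patient reports chest pain. History of high blood pressure and Type 2 diabetes."
entities = extract_concepts(report)
for entity in entities:
    print(entity.concept_id, repr(entity.text), entity.start, entity.end)

# The set of concept IDs is the structured view of the document; it could be
# used, for instance, to decide whether a patient belongs to a cohort.
concept_ids = {entity.concept_id for entity in entities}
```

Because synonyms such as "hypertension" and "high blood pressure" share one identifier, documents that word the same condition differently end up in the same structured group. Note that this sketch ignores context: a sentence such as "no chest pain" would still produce a match, which is one reason real systems use learned models rather than plain lookups.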
To sum up, Natural Language Processing in healthcare makes it possible to analyse an underused source of information that is nevertheless quantitatively and qualitatively important: free text. Extracting information from unstructured clinical text and integrating it with structured data is a huge opportunity to cut expenses, speed up processes, and guarantee better care for patients.
Insights on the present (and future) of NLP for information extraction in medicine:
- Primer on Neural NLP: Goldberg, 2015 – A Primer on Neural Network Models for Natural Language Processing [https://arxiv.org/pdf/1510.00726.pdf]
- Example of a Biomedical Neural NLP Model: Lee et al., 2019 – BioBERT: a pre-trained biomedical language representation model for biomedical text mining [https://arxiv.org/ftp/arxiv/papers/1901/1901.08746.pdf]
- Medical Information Extraction: Hahn & Oleynik, 2020 – Medical Information Extraction in the Age of Deep Learning [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7442512/]
- Applications of Information Extraction in Medicine: Wang et al., 2018 – Clinical Information Extraction Applications: a Literature Review [https://www.sciencedirect.com/science/article/pii/S1532046417302563]
AUTHOR: Tommaso Buonocore