Poojita Gangwani, Lokesh Srivastava, Vibhu Agarwal

Large Language Models: Wrecking Balls for Impregnable Clinical Data Silos (Part I)

Background

Sharing health information is becoming increasingly important. While treating patients, care providers need swift access to information that often resides inside IT systems that were not designed to communicate with each other. The healthcare ecosystem, comprising care providers, pharmacies, laboratories, insurers and patients, also expects that relevant information will be available when needed and in a format that native IT systems can understand. However, transforming healthcare data from one proprietary format into another can introduce unacceptable delays, degrade data quality and be prohibitively expensive. Furthermore, a significant amount of health information resides in documents as free text. Converting such free text into system-specific structured representations poses an even greater challenge and is the key reason why impregnable data silos exist in healthcare.


Recent advances in clinical text processing, an interdisciplinary joining of forces between computer scientists and clinical informaticians, have demonstrated the feasibility of inferring the meaning encoded in clinical narratives through computational approaches alone. In this series we discuss the importance of data interoperability from the standpoint of Indian healthcare entities, the hurdles to achieving interoperability at scale, and the goals that now appear to be within striking distance due to these advances.


Health Information Standards in India

Towards its goal of achieving universal health care as part of the Sustainable Development Goals, the Indian government introduced the Ayushman Bharat Pradhan Mantri Jan Arogya Yojana (PM-JAY) in 2018 with the aim of providing healthcare insurance to 100 million families, which translates into nearly 540 million beneficiaries obtaining healthcare cover of INR 0.5 million per year. PM-JAY is the largest government-sponsored healthcare program in the world. Between 2018 and 2022, a total of 3,67,45,368 (about 36.7 million) hospitalizations worth INR 453,761.5 million were recorded under this program. PM-JAY therefore not only represents a large and geographically diverse population, but also one that may be engaging with the healthcare ecosystem for the very first time and has so far remained understudied and underserved.


Digital interoperability is a foundational requirement for the timely review, adjudication and payment of claims generated at such a scale. The Ayushman Bharat Digital Mission (ABDM) under the Ministry of Health and Family Welfare (MoHFW), Government of India, envisions the creation of a national digital health ecosystem that provides data, information and infrastructure services based on interoperable digital systems. To enable the adoption of interoperable systems, ABDM has published a guidance document that defines the minimum viable features of an “ABDM compliant” Health Management Information System (HMIS). This guidance document requires an HMIS to support clinical information exchange via HL7 FHIR version 4.


Clinical Data Interoperability with HL7 FHIR

FHIR, an established healthcare data standard developed by HL7 International, addresses the pressing need for interoperability within the healthcare data landscape. Since its publication in 2014, FHIR has seen steadily rising adoption across healthcare institutions worldwide, with over 2 million FHIR-compliant applications currently in use. Below are some of the key features of FHIR that make it suitable as a standardized health information exchange format.

1. Resource-driven Structure: FHIR structures healthcare data into resources, representing specific components such as patients, medications, diagnoses, and procedures. These resources follow defined formats like JSON or XML.


2. RESTful APIs: FHIR relies on RESTful APIs (Application Programming Interfaces) to facilitate data exchange. Comparable to how web browsers communicate with servers, these APIs enable systems to interact and share FHIR resources online.


3. Data standards: FHIR emphasizes interoperability by setting standard formats, elements, and protocols. This standardization allows diverse healthcare systems, applications, and devices to exchange information consistently and comprehensively.


4. Customizable Entity Definitions: FHIR allows customization of clinical concepts via profiles, which specify rules and limitations for resource use in different contexts. This flexibility enables organizations to adapt standard resources to their unique requirements.

  • Resources: Resources are the building blocks of FHIR. They form the basic units of data.

  • Profiles: FHIR profiles serve as structured definitions that specify how FHIR resources or elements should be used or constrained within a particular context or implementation, ensuring consistency and interoperability across different healthcare systems and settings. They customize the standard FHIR resources to suit specific use cases, enabling a more tailored and precise representation of healthcare data for diverse purposes.


5. Simple yet Versatile Data Formats: FHIR accommodates various data formats and handles both narrative and coded data types. This versatility covers documents, messages, services, and RESTful interfaces, meeting diverse healthcare information needs.


6. Adaptability and Incremental Scalability: FHIR's modular design permits gradual adoption. Healthcare entities can implement and expand FHIR capabilities as needed, facilitating adaptability and scalability based on specific requirements.
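
To make the first two features concrete, the sketch below builds a minimal Patient resource as JSON and constructs the URLs for the standard FHIR "read" and "search" interactions. It is only an illustration: the server base URL and all patient values are hypothetical, and no network call is made.

```python
import json
from urllib.parse import urlencode

# Hypothetical server base URL; any FHIR R4 endpoint would take its place.
BASE_URL = "https://fhir.example.org/baseR4"

# A minimal Patient resource. All values are made up for illustration.
patient = {
    "resourceType": "Patient",
    "id": "example-patient-1",
    "name": [{"family": "Sharma", "given": ["Asha"]}],
    "gender": "female",
    "birthDate": "1980-04-12",
}


def read_url(resource_type: str, resource_id: str) -> str:
    """URL for a FHIR 'read' interaction: GET [base]/[type]/[id]."""
    return f"{BASE_URL}/{resource_type}/{resource_id}"


def search_url(resource_type: str, **params: str) -> str:
    """URL for a FHIR 'search' interaction: GET [base]/[type]?name=value."""
    return f"{BASE_URL}/{resource_type}?{urlencode(params)}"


# Serialize the resource as JSON, one of the formats FHIR defines.
print(json.dumps(patient, indent=2))
print(read_url("Patient", patient["id"]))
print(search_url("Patient", family="Sharma", gender="female"))
```

A client would send an HTTP GET to either URL and receive the matching resource, or a bundle of search results, in the same JSON shape.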


Interoperability standards play a crucial role in ensuring that data across systems align seamlessly both in structure (syntactic interoperability) and in meaning (semantic interoperability). In healthcare information systems, this need is paramount. These systems aim to support continuous, lifelong clinical care, so that individuals can sustain the best possible health.


By exposing data through a standardized application programming interface (API), FHIR allows developers to create apps that go beyond document-based exchange. Applications can be plugged into a basic EHR operating system and feed information directly into the provider workflow, avoiding the pitfalls of document-based exchange, which often requires providers to access data separately.


ABDM Compliant FHIR Implementation

The minimum health record artifacts, including the mandatory elements and terminologies that a FHIR implementation must support in order to be ABDM compliant, are specified through FHIR profile definitions. The following example illustrates FHIR version 4 resources mapped to clinical artifacts embedded within a patient’s discharge summary.




Figure 1: Key clinical concepts in a discharge summary represented as FHIR resources
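
The idea of a profile mandating certain elements can be sketched in a few lines of Python. The example below is a simplification under stated assumptions: the `REQUIRED_ELEMENTS` table is a hypothetical stand-in for a profile (real ABDM profiles are FHIR StructureDefinitions with much richer rules), and the resources are made-up fragments of a discharge summary, not the contents of Figure 1.

```python
from typing import Dict, List

# Hypothetical, simplified "profile": required top-level elements per
# resource type. This is NOT the ABDM profile, only an illustration.
REQUIRED_ELEMENTS: Dict[str, List[str]] = {
    "Condition": ["code", "subject"],
    "MedicationRequest": ["status", "intent", "medicationCodeableConcept", "subject"],
    "Procedure": ["status", "code", "subject"],
}


def missing_elements(resource: dict) -> List[str]:
    """Return the required elements that are absent from a resource."""
    required = REQUIRED_ELEMENTS.get(resource.get("resourceType", ""), [])
    return [name for name in required if name not in resource]


# Made-up fragments of a discharge summary expressed as FHIR-style resources.
patient_ref = {"reference": "Patient/example-patient-1"}
resources = [
    {
        "resourceType": "Condition",
        "code": {"text": "Type 2 diabetes mellitus"},
        "subject": patient_ref,
    },
    {
        "resourceType": "MedicationRequest",
        "status": "active",
        "intent": "order",
        "medicationCodeableConcept": {"text": "Metformin 500 mg tablet"},
        "subject": patient_ref,
    },
    {
        # 'status' is deliberately left out to show a failed check.
        "resourceType": "Procedure",
        "code": {"text": "Appendectomy"},
        "subject": patient_ref,
    },
]

report = {r["resourceType"]: missing_elements(r) for r in resources}
print(report)
```

In practice this kind of check would be delegated to a FHIR validator running against the published profile definitions rather than a hand-written table.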



The Role of Clinical Language Processing

It is well known that the unstructured information captured within a patient’s medical record (e.g. progress notes, imaging study reports and discharge summaries) provides the most complete picture of a patient’s health and the care received [1,2]. Herein lies a key challenge. While mapping the structured fields in a patient record to the corresponding FHIR resources is not difficult, doing so with unstructured information can be challenging. Clinical notes are frequently written as terse phrases that make idiosyncratic use of abbreviations and medical terminology, making it difficult to extract, normalize and map medical concepts to FHIR resources automatically.


Clinical Language Processing (CLP) refers to a class of techniques developed initially within the field of Natural Language Processing and adapted for various clinical text processing tasks such as tokenization, word-sense disambiguation, named entity recognition (NER) and relationship extraction. For example, an algorithm that recognizes entities of clinical interest in clinical notes may be used to automatically extract such entities and subsequently map them to the corresponding FHIR resource elements.




Figure 2: NER algorithms can extract entities of clinical interest from free text
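
The extract-then-map flow can be sketched with a toy example. The code below swaps in a hand-written dictionary lookup for a real NER model, purely to show the shape of the pipeline; the lexicon entries, the sample note and the skeletal text-only resources are all illustrative assumptions.

```python
import re
from typing import Dict, List, Tuple

# Toy lexicon: surface form -> (normalized text, FHIR resource type).
# A real system would use a trained NER model and a standard terminology
# rather than a hand-written dictionary.
LEXICON: Dict[str, Tuple[str, str]] = {
    "t2dm": ("Type 2 diabetes mellitus", "Condition"),
    "htn": ("Hypertension", "Condition"),
    "metformin": ("Metformin", "MedicationStatement"),
}


def extract_entities(note: str) -> List[dict]:
    """Find lexicon terms in a note and return them in order of appearance."""
    entities = []
    for term, (normalized, resource_type) in LEXICON.items():
        pattern = rf"\b{re.escape(term)}\b"
        for match in re.finditer(pattern, note, flags=re.IGNORECASE):
            entities.append({
                "text": match.group(0),
                "start": match.start(),
                "end": match.end(),
                "normalized": normalized,
                "resourceType": resource_type,
            })
    return sorted(entities, key=lambda e: e["start"])


def to_fhir(entity: dict) -> dict:
    """Wrap an entity in a skeletal FHIR-style resource (text-only coding)."""
    if entity["resourceType"] == "Condition":
        key = "code"
    else:
        key = "medicationCodeableConcept"
    return {"resourceType": entity["resourceType"], key: {"text": entity["normalized"]}}


note = "K/c/o T2DM and HTN, on metformin 500 mg BD."
for entity in extract_entities(note):
    print(entity["text"], "->", to_fhir(entity))
```

The abbreviations in the sample note hint at why a dictionary alone does not scale: every institution's shorthand would need its own entries.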


The development of automatic information extraction tools based on CLP has faced numerous challenges. The unavailability of clinical text data for research, the lack of data annotation methodology and tools, low rates of EMR adoption, and inter-institution variability in note-taking are some of the well-known rate limiters to CLP advancement internationally [3].


Indian healthcare institutions have begun adopting electronic medical record systems. In addition to the aforementioned challenges, the development of traditional CLP methods for Indian patient records must deal with the issue of low compliance levels and a near absence of technical, ethical and legal frameworks for patient data-sharing.


Breaking Clinical Data Silos with Large Language Models

Recent advances in artificial intelligence (AI), and particularly in the area of Natural Language Understanding, have led to the development of a number of “Foundation Models” [4] that since their inception have shown the ability to adapt to a variety of tasks with relatively modest effort. For example, Med-BERT, based on the BERT architecture and pre-trained on a very large dataset of patient records, delivers strong performance on disease prediction tasks after very little fine-tuning [5]. This remarkable adaptability, combined with the vast repertoire of patterns retained from pre-training, may allow us to circumvent many of the hurdles faced by conventional CLP methods.


Generative Pre-Trained Transformers (GPT) are a type of Large Language Model (LLM) trained on large unlabelled text corpora; as a consequence, they acquire the ability to generate highly context-aware responses to input text prompts. Moreover, adaptation to a domain-specific task requires only a few examples. For example, models that can accept a large amount of input text, such as OpenAI’s GPT-4, could be prompted with a combination of notes-to-FHIR examples and carefully designed instructions to extract and normalize medical entities from clinical notes. The need to provide at most a few notes-to-FHIR examples is particularly attractive given the challenges in accessing and annotating clinical notes.
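
A minimal sketch of how such a few-shot prompt might be assembled is shown below. The instruction wording, the single worked example and the function names are assumptions made for illustration, not the prompts used by the authors, and the code stops short of calling any model.

```python
import json
from typing import List, Tuple

# Instruction text and examples are illustrative assumptions only.
INSTRUCTIONS = (
    "Extract medical entities from the clinical note and return them "
    "as a JSON list of FHIR R4 resources. Return JSON only."
)

# Each few-shot example pairs a note with skeletal, text-only resources.
FEW_SHOT: List[Tuple[str, list]] = [
    (
        "K/c/o HTN, on amlodipine 5 mg OD.",
        [
            {"resourceType": "Condition", "code": {"text": "Hypertension"}},
            {
                "resourceType": "MedicationStatement",
                "medicationCodeableConcept": {"text": "Amlodipine 5 mg"},
            },
        ],
    ),
]


def build_prompt(note: str) -> str:
    """Assemble instructions, worked examples and the new note into one prompt."""
    parts = [INSTRUCTIONS, ""]
    for example_note, example_fhir in FEW_SHOT:
        parts += [f"Note: {example_note}", f"FHIR: {json.dumps(example_fhir)}", ""]
    parts += [f"Note: {note}", "FHIR:"]
    return "\n".join(parts)


def parse_response(response_text: str) -> list:
    """Parse a model reply, rejecting anything that is not a list of resources."""
    resources = json.loads(response_text)
    if not isinstance(resources, list) or not all(
        isinstance(r, dict) and "resourceType" in r for r in resources
    ):
        raise ValueError("response is not a list of FHIR resources")
    return resources


print(build_prompt("Pt with T2DM, started on metformin 500 mg BD."))
```

The string returned by `build_prompt` would be sent to the model, and `parse_response` gives a first line of defence against replies that are not well-formed; profile-level validation would still be needed afterwards.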


In Part II of this series we will share examples from our own experiments with LLMs and FHIR resources aimed at building high-performing extraction and standardization systems.


Questions?

Feel free to reach us at info@miimansa.com to discuss your projects on data standardization, FHIR adoption or ABDM compliance with our clinical data management experts.


References


  1. Classen DC, Resar R, Griffin F, Federico F, Frankel T, Kimmel N, Whittington JC, Frankel A, Seger A, James BC. 'Global trigger tool' shows that adverse events in hospitals may be ten times greater than previously measured. Health Aff (Millwood). 2011;30(4):581-9.

  2. Poissant L, Taylor L, Huang A, Tamblyn R. Assessing the accuracy of an inter-institutional automated patient-specific health problem list. BMC Med Inform Decis Mak. 2010;10:10.

  3. Chapman WW, Nadkarni PM, Hirschman L, D'Avolio LW, Savova GK, Uzuner O. Overcoming barriers to NLP for clinical text: the role of shared tasks and the need for additional creative solutions. J Am Med Inform Assoc. 2011;18(5):540-3. doi: 10.1136/amiajnl-2011-000465. PMID: 21846785; PMCID: PMC3168329.

  4. Bommasani R, Hudson DA, Adeli E, Altman R, Arora S, von Arx S, ... Liang P. On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258. 2021.

  5. Rasmy L, Xiang Y, Xie Z, et al. Med-BERT: pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction. npj Digit Med. 2021;4:86.
