
Large language models (LLMs) stand ready to revolutionize healthcare. Promising use cases have already emerged: clinical decision support, hospital workflow optimization, accelerated drug discovery, and automated administrative tasks. Yet one major roadblock remains: data. This article examines structured health data, which has become a significant obstacle to the wider adoption of LLMs in healthcare.
While LLMs are designed to process any text, some formats are harder for them than others. Structured data formats, including JSON, XML, and tabular datasets, are particularly difficult. These formats organize data logically for repeatable use in software and analytics. LLMs can handle structured data in small amounts, but it becomes problematic at scale. The issue has several causes, one being that LLMs were trained primarily on unstructured text. Structured data does appear in training corpora, but in a much smaller proportion. And because medical data is so sensitive, even less health data was available for LLM training.
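To make the overhead concrete, here is a minimal sketch comparing a structured and an unstructured rendering of the same fact. It uses character counts as a rough stand-in for tokens, and the record below is a hypothetical fragment of our own, not a complete FHIR resource:

```python
import json

# Hypothetical structured record (illustrative only, not a valid FHIR resource)
structured = json.dumps({
    "resourceType": "Observation",
    "code": {"text": "Body Height"},
    "valueQuantity": {"value": 177.2, "unit": "cm"},
})

# The same fact stated as unstructured clinical text
unstructured = "Body height: 177.2 cm."

# Character count as a crude proxy for token count: the keys, braces,
# and quotes are structural overhead the model must read past.
print(len(structured), len(unstructured))
```

Even in this tiny example, the structured form is several times longer than the plain-text sentence carrying the same clinical fact.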
The limitations of LLMs when working with structured data become clear when we look at HL7's FHIR (Fast Healthcare Interoperability Resources). FHIR is a JSON-based standard that supports essential healthcare data capabilities such as interoperability and application development. FHIR has contributed enormously to healthcare technology and remains essential to it. It organizes healthcare data into modular resources, each representing a different aspect of care.
If you are unfamiliar with FHIR, here is an example of a synthetic patient’s Observation resource recording a body height of 177.2 cm.
{
  "fullUrl": "urn:uuid:7ad56bab-4b61-64e5-dfb6-52165876222e",
  "resource": {
    "resourceType": "Observation",
    "id": "7ad56bab-4b61-64e5-dfb6-52165876222e",
    "meta": {
      "profile": [ "http://hl7.org/fhir/us/core/StructureDefinition/us-core-body-height" ]
    },
    "status": "final",
    "category": [ {
      "coding": [ {
        "system": "http://terminology.hl7.org/CodeSystem/observation-category",
        "code": "vital-signs",
        "display": "Vital signs"
      } ]
    } ],
    "code": {
      "coding": [ {
        "system": "http://loinc.org",
        "code": "8302-2",
        "display": "Body Height"
      } ],
      "text": "Body Height"
    },
    "subject": {
      "reference": "urn:uuid:41d62af4-6e01-9ee7-b346-436e34a49b6d"
    },
    "encounter": {
      "reference": "urn:uuid:a701e6b6-1274-abb3-f86c-2a44552085ba"
    },
    "effectiveDateTime": "2018-06-14T04:42:50-04:00",
    "issued": "2018-06-14T04:42:50.541-04:00",
    "valueQuantity": {
      "value": 177.2,
      "unit": "cm",
      "system": "http://unitsofmeasure.org",
      "code": "cm"
    }
  }
}
Patients with multiple chronic conditions generate hundreds of observations, which alone can surpass an LLM's context limit. And Observation is only one resource type; Patient, Condition, Medication, Procedure, and others are equally important. The strict schemas and hierarchical coding that healthcare requires create extensive noise that LLMs must navigate. The result? As FHIR data grows, LLM analysis slows down, costs increase, and accuracy drops. FHIR aims to improve healthcare by providing better-structured data, but that structure is poorly suited to LLM applications.
For context, the synthetic Observation shown above consumes 393 tokens. And while FHIR's token counts are high, other data types, such as tabular data, are even more complex and further hinder the application of LLMs in healthcare.
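One way to see how much of that token budget is structural is to flatten each resource into a short natural-language line before it reaches the model. The sketch below is a simplified approach of our own, not an official FHIR feature; the field names follow the Observation shown above, abbreviated to its clinically relevant fields:

```python
# The Observation from the article, reduced to clinically relevant fields
observation = {
    "resourceType": "Observation",
    "code": {"text": "Body Height"},
    "effectiveDateTime": "2018-06-14T04:42:50-04:00",
    "valueQuantity": {"value": 177.2, "unit": "cm"},
}

def summarize(obs: dict) -> str:
    """Collapse a FHIR-style Observation into one plain-text line."""
    name = obs.get("code", {}).get("text", "unknown")
    qty = obs.get("valueQuantity", {})
    date = obs.get("effectiveDateTime", "")[:10]  # keep the date only
    return f"{date} {name}: {qty.get('value')} {qty.get('unit')}"

print(summarize(observation))  # 2018-06-14 Body Height: 177.2 cm
```

A line like this carries the same clinical fact as the 393-token resource in roughly a tenth of the tokens, at the cost of discarding provenance fields an application may still need elsewhere.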
While we are highlighting FHIR in this article, many other structured healthcare standards encounter the same issue.
While it is often assumed that more data leads to better outcomes, LLM performance tends to decline when the model is overloaded with details irrelevant to the task. Clinically relevant details are often obscured by structural noise, limiting the potential to improve patient care and other processes. Beyond the sheer volume of structural data, clinical notes often duplicate the same information, further bloating the context. When early LLM pilots produce false or suboptimal results, innovation slows, because teams may lack the data background to recognize these issues. Alternatively, given the size of healthcare datasets, the cost of processing them frequently prevents companies and healthcare institutions from even starting projects. One prevailing approach to the data-size problem is to wait for context windows to grow and costs to fall. But while those improvements reduce expense and latency, they do nothing to resolve the underlying data issue.
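A complementary mitigation is to prune structural noise before a resource enters the context window. The sketch below drops protocol-level fields while keeping clinical codes and values; which keys count as "noise" is our own illustrative assumption and depends on the task:

```python
import json

# Hypothetical noise keys: protocol plumbing that carries little
# clinical signal for, say, a summarization prompt (task-dependent!)
NOISE_KEYS = {"meta", "fullUrl", "system", "encounter", "issued"}

def prune(node):
    """Recursively drop noise keys from a FHIR-style JSON tree."""
    if isinstance(node, dict):
        return {k: prune(v) for k, v in node.items() if k not in NOISE_KEYS}
    if isinstance(node, list):
        return [prune(v) for v in node]
    return node

resource = {
    "resourceType": "Observation",
    "meta": {"profile": ["http://hl7.org/fhir/us/core/StructureDefinition/us-core-body-height"]},
    "code": {"coding": [{"system": "http://loinc.org", "code": "8302-2"}]},
    "valueQuantity": {"value": 177.2, "unit": "cm", "system": "http://unitsofmeasure.org"},
}

pruned = prune(resource)
print(json.dumps(pruned))
```

The pruned tree keeps the LOINC code and the measured value but sheds the profile metadata and system URLs, shrinking the payload an LLM has to wade through.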
The world is slowly working to improve health data compatibility with LLMs. Open-source tools are emerging, scientists are publishing, academic teams are building benchmarks, companies are starting to ship solutions, and LLM-friendly FHIR encodings are being developed. These advancements mark important first steps, but the gap between structured health data and LLM performance remains significant. LLMs still struggle to process large-scale structured data efficiently, limiting their real-world impact. Without continued progress, the healthcare sector risks missing out on AI's full potential to transform patient care and clinical workflows.
James McCormack - CEO & Founder of XPawn, LinkedIn (james@xpawn.ai)
Thrushna Matharasi - Director of Engineering, Data & AI at Solera, AI Consultant for XPAWN (thrushna.matharasi@ieee.org)