GNS Healthcare Blog

Unstructured Data: Joining the Healthcare Party


The U.S. healthcare industry now produces an estimated 1.2 billion clinical care documents a year. Within those documents is an amazing amount of detail on an individual patient’s care plan, medical history and overall health. So why is this rarely touted type of data important? When analyzed, this detailed patient data has the potential to improve healthcare outcomes. The only problem is that about eighty percent of that data is unstructured¹.

Longitudinal data – data that includes all healthcare encounters of a patient or member over a continuum of care and time – is critically important to discovering insights about disease, understanding the patient health trajectory and optimizing treatments. Some of this data is considered structured, meaning that it has been put into a standardized format and can be easily accessed and shared through a database. Unstructured data, which makes up the overwhelming majority of longitudinal data sets, is much harder to standardize and therefore harder to access, share and analyze.

Unstructured data includes everything from email communications, physician clinical notes, patient phone call transcriptions, comment cards, wellness diary entries, lab reports, socioeconomic data, patient preferences and key lifestyle factors to X-ray images and faxed lab reports. It also includes other interactions members have with the system, such as prescriptions and lab values, as well as all the information that is captured through call center and care manager interactions. Combining this unstructured data with available structured data provides a more complete picture of a patient’s overall health profile.

A final piece of the patient data puzzle comes from the five key areas of social determinants of health (SDoH), which the Office of Disease Prevention and Health Promotion defines as economic stability, education, social and community context, health and health care, and neighborhood and built environment². The National Academy of Medicine says SDoH account for 80-90 percent of the modifiable contributors to health outcomes for a population³, so it’s important to include these factors when creating a holistic view of the patient.


Opportunities from unstructured data

Merging diverse structured and unstructured data provides enormous opportunities for the healthcare industry to glean the highest level of novel insights possible and impact patient care.

Early attempts to collect information from unstructured data in healthcare focused on simple keyword analysis of clinical notes, which might offer insights not available in the structured data of EMRs. These notes helped determine a level of severity that might not be readily apparent from clinical test results.
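
As a rough illustration of what that kind of keyword analysis can look like, here is a minimal Python sketch. The keyword list, the weights and the `severity_score` function are all hypothetical and invented for this example; they are not taken from any clinical vocabulary or from a specific system.

```python
import re

# Hypothetical keyword weights, made up for illustration only.
# A real project would rely on a curated clinical vocabulary.
SEVERITY_KEYWORDS = {
    "severe": 3,
    "worsening": 2,
    "moderate": 2,
    "mild": 1,
}


def severity_score(note: str) -> int:
    """Return the highest weight among the keywords found in the note, or 0."""
    # Lowercase the note and split it into alphabetic words.
    words = set(re.findall(r"[a-z]+", note.lower()))
    return max(
        (weight for keyword, weight in SEVERITY_KEYWORDS.items() if keyword in words),
        default=0,
    )


note = "Patient reports worsening shortness of breath; labs within normal limits."
print(severity_score(note))
```

A sketch like this ignores negation and context (it would also flag "no severe pain"), which is one reason plain keyword matching only goes so far.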

But there is much more that can be done. The healthcare industry should be modeling its efforts involving unstructured data after other industries that are already taking advantage of this rich source of information. More than half of the data used by retail, travel/hospitality, energy, insurance, consumer goods, and financial services to derive insight is unstructured – more than any other type of data⁴. The retail industry uses this data to understand consumer behavior and engage consumers in targeted ways. The same can be done for patients.

The challenge is providing a level of precision and accuracy that will allow researchers, care managers, and physicians to access this information in real time in every setting so they can quickly act on it. Artificial intelligence (AI) and machine learning are key to tapping into the wealth of unstructured data to reveal insights that may have otherwise been overlooked.


Using AI to leverage unstructured data

Processing the terabytes of available unstructured data and turning them into something actionable is no doubt a challenge. The task requires an enormous amount of computing power and AI powerful enough to unravel the complexity of this data to discern the signal from the noise and provide insights that are novel and actionable.

To cut through the hype and determine the realistic potential of AI in healthcare, including unleashing the insights of unstructured data, the Office of the National Coordinator for Health IT (ONC) and the Agency for Healthcare Research and Quality (AHRQ) turned to JASON – an independent group of scientists and academics that regularly advises the federal government⁵.

Among its recommendations, JASON supports capturing and leveraging unstructured data from smartphones as well as social and environmental data. The effort reinforces the commitment of the ONC and AHRQ to effectively use this type of data to improve the quality and safety of patient care.

Once the data is captured, one option is to work with Natural Language Processing (NLP) algorithms that are specifically designed to take unstructured text and convert it into a structured format. There are several efforts underway in converting EHR notes, clinician notes, texts, chat conversations, and other text fields into usable information.
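
To make the idea of turning free text into a structured format concrete, here is a toy Python sketch. It uses a simple rule-based pattern as a stand-in for a real NLP pipeline, and the `MedicationMention` record, the pattern and the sample note are all assumptions made up for this example.

```python
import re
from dataclasses import dataclass


@dataclass
class MedicationMention:
    """One structured row pulled out of a free-text note (hypothetical schema)."""
    drug: str
    dose: float
    unit: str


# Hypothetical pattern: a word followed by a numeric dose and a unit.
# Production NLP systems use trained models and clinical vocabularies instead.
DOSE_PATTERN = re.compile(
    r"\b([A-Za-z]+)\s+(\d+(?:\.\d+)?)\s*(mg|mcg|g|ml)\b",
    re.IGNORECASE,
)


def extract_medications(text: str) -> list[MedicationMention]:
    """Convert each 'drug dose unit' mention in the text into a record."""
    return [
        MedicationMention(m.group(1).lower(), float(m.group(2)), m.group(3).lower())
        for m in DOSE_PATTERN.finditer(text)
    ]


note = "Started metformin 500 mg twice daily. Continue lisinopril 10mg."
for mention in extract_medications(note):
    print(mention)
```

Each record could then be loaded into a database table next to existing structured data. A pattern this simple will miss many real-world phrasings, so it should be read as an illustration of the input and output shapes rather than a workable extractor.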

The question then becomes, once this new influx of data is formatted and available, how do you harness it? In a word, okay two, artificial intelligence (AI). AI is beginning to transform healthcare, from how drugs are developed to how care is being provided to identifying the best treatment for an individual patient based on his or her biology. To effectively leverage this growing healthcare data set, you need an AI technology like causal machine learning that is data agnostic and can transform data from the very granular genomic to the more systemic EHR data to find the underlying causes and effects within the data. This is a very different technology than scanning available information: it is discovering insights we didn’t know we didn’t know.

So what does this mean for us as patients? It means that the data is bigger, richer and more accessible. It means the technology continues to advance and grow smarter over time. And it means that the idea of precision medicine, in which patients are treated based on their individual biology, is no longer a pipe dream.


[1] More than meets the eye: Unstructured data’s untapped potential, by John Schneider, Healthcare Dive, February 2, 2017.

[2] Social Determinants of Health, Healthy People 2020, Office of Disease Prevention and Health Promotion.

[3] Social Determinants of Health 101 for Health Care: Five Plus Five, by Sanne Magnan, National Academy of Medicine, October 9, 2017.

[4] Which Industries Use More Unstructured Data? Big Data, Global Trend Study, Tata Consultancy Services, 2013.

[5] Hype to Reality: How Artificial Intelligence (AI) Can Transform Health and Healthcare, by Teresa Zayas Caban, Kevin Chaney, MGS, Michael Painter and Chris Dymek, HealthITBuzz, January 17, 2018.
