Health | May 30, 2017

NLP: Unlocking the potential of unstructured text in healthcare

Hospitals and health systems are sitting on a wealth of patient information that has potential to transform care delivery. Yet analytics infrastructures designed to fuel performance improvement have traditionally overlooked much of that data because it resides in health IT systems as unstructured free text.

The industry has made notable inroads with structured patient data through the introduction of standards. Now, healthcare organizations must adopt strategies that enable access to and retrieval of critical patient data housed in unstructured information, which accounts for as much as 80 percent of clinical documentation, according to industry estimates.

Advances with natural language processing (NLP) technology hold great promise as the answer to this challenge. Increasingly recognized as a powerful tool for unlocking vital clinical data, NLP turns free text into shareable data that can be analyzed and acted upon.

The reality is that unstructured data is important to care delivery. A 2009 survey found that 96 percent of physicians were concerned about “losing the unique patient story with the transition to point-and-click (template-driven) EHRs.” Additionally, 94 percent said that “including the physician narrative as part of patients’ medical records is ‘important’ or ‘very important’ to realizing and measuring improved patient outcomes.”

The bigger question is: How much of this unstructured data is useful for complete and accurate quality measure reporting and analytics? Healthcare organizations often miss critical information when conducting analytics initiatives due to limitations with free text. In fact, one study found that EHR-derived quality measures can undercount practice performance when compared to a manual review of electronic charts.

Without the right infrastructure in place to address both structured and unstructured patient data, quality measure calculations often leave out lab information, qualitative clinical information, and even entire patients. Here are some examples of how information is missed:

Information that qualifies a patient for exclusion criteria in a quality measure

Quality measure PQRS 116 (NQF 58) penalizes the performance score for every patient who receives antibiotics for acute bronchitis, because antibiotics don’t help acute bronchitis and may cause harm. The measure provides exclusion criteria: if the patient has a secondary condition, such as cystic fibrosis or HIV, prescribing antibiotics is acceptable and the patient is excluded from the measure. Often, the information related to these excluding conditions resides in free text, requiring NLP for accurate identification and quality measure calculation. Thus, the right infrastructure can ultimately help a healthcare organization accurately report quality measures to gain higher scores, avoid negative payment adjustments, and generate positive payment adjustments.
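As a rough illustration of the exclusion logic described above, the following Python sketch scans a free-text note for excluding conditions using simple keyword matching. The term list, the negation cues, and the function names are hypothetical and chosen only for this example; the measure specification defines the authoritative exclusions, and a production clinical NLP system would handle negation, abbreviations, and coded terminologies far more robustly.

```python
import re

# Hypothetical list of excluding conditions (assumption for illustration);
# the actual measure specification defines the authoritative set.
EXCLUSION_TERMS = ["cystic fibrosis", "hiv"]

# Very naive negation cues, assumed for illustration only.
NEGATION_CUES = re.compile(r"\b(no|denies|negative for|without)\b", re.IGNORECASE)


def find_exclusions(note: str) -> list[str]:
    """Return excluding conditions mentioned, and not obviously negated, in a note."""
    found = []
    # Split into rough sentences so a negation only affects its own sentence.
    for sentence in re.split(r"(?<=[.!?])\s+", note):
        for term in EXCLUSION_TERMS:
            match = re.search(rf"\b{re.escape(term)}\b", sentence, re.IGNORECASE)
            if not match:
                continue
            # Skip the mention if a negation cue appears earlier in the sentence.
            if NEGATION_CUES.search(sentence[: match.start()]):
                continue
            if term not in found:
                found.append(term)
    return found


def counts_against_measure(note: str, antibiotic_prescribed: bool) -> bool:
    """A patient counts against the score only if antibiotics were
    prescribed and no excluding condition was found in the note."""
    return antibiotic_prescribed and not find_exclusions(note)
```

For example, `counts_against_measure("Acute bronchitis. HIV positive.", True)` evaluates to `False` under these assumptions, because the note documents an excluding condition that structured fields alone might not capture.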

Qualitative unstructured clinical data useful for quality measure reporting

Many quality measures rely on patient clinical data that often resides in free text, such as smoking status and dietary orders. For instance, a physician note for dietary counseling may read “referred the patient to see a nutritional counselor” or “enrolled in Weight Watchers from previous conversation.” Or, in the case of smoking status, a physician might document “smoked in the past” or “smoking 2 packs a day but is now down to 1 cigarette a day.” NLP can identify the relevant qualitative statements within a patient record and derive their meaning, categorizing phrases like those above into structured observations such as “is a smoker,” “former smoker,” and “not a smoker.” When NLP is leveraged to extract this information from free text, healthcare organizations are equipped to more accurately calculate quality measures and avoid negative payment adjustments associated with low scores. This information can also be used for clinical decision support that guides physicians to the best medication choices and therapies.
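To make the categorization step concrete, here is a minimal rule-based sketch in Python that maps free-text smoking statements to the three structured observations named above. The patterns and the `smoking_status` function are assumptions made for this example, not a validated clinical vocabulary; real NLP engines typically rely on much richer linguistic and terminology resources.

```python
import re

# Ordered rules: the first pattern that matches wins, so more specific
# statuses ("quit smoking") are checked before the generic "smoking".
# The patterns are illustrative assumptions only.
SMOKING_RULES = [
    ("not a smoker",
     re.compile(r"\b(never smoked|non-?smoker|denies smoking)\b", re.IGNORECASE)),
    ("former smoker",
     re.compile(r"\b(smoked in the past|quit smoking|former smoker|ex-smoker)\b",
                re.IGNORECASE)),
    ("is a smoker",
     re.compile(r"\b(smoking|smokes|current smoker|cigarettes? a day)\b",
                re.IGNORECASE)),
]


def smoking_status(note: str) -> str:
    """Map a free-text statement to a structured smoking observation."""
    for label, pattern in SMOKING_RULES:
        if pattern.search(note):
            return label
    # No rule matched: leave the status undetermined rather than guess.
    return "unknown"
```

With these rules, “smoked in the past” maps to `"former smoker"` and “smoking 2 packs a day but is now down to 1 cigarette a day” maps to `"is a smoker"`. The rule order matters: checking the generic "smoking" pattern first would misclassify "quit smoking" as a current smoker.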

Extracting data that qualifies a patient for a population cohort

Even with well-documented diagnoses such as diabetes, you can miss vital information on your specific population cohorts. Eye and foot exams, for example, are valuable data for determining the severity of a diabetes patient’s condition, yet this information can easily be lost in unstructured text. Although practices have a good read on diabetes populations, some information still falls between the cracks. More complicated diagnoses get tracked even less. In fact, one survey found that “the majority of practices correctly documented diagnoses of hypertension and diabetes over 80% of the time, but rates of appropriate documentation for dyslipidemia and ischemic cardiovascular disease were substantially lower.” NLP can help practices identify diabetes patients who might otherwise have been missed by searching unstructured text, and it can also help identify patients with diagnoses that are infrequently documented, such as dyslipidemia. NLP enables aggregation of this critical patient data to reflect performance within diabetes and other critical chronic conditions and give you a more accurate representation of your population.
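The cohort idea above can be sketched as combining two sources: structured diagnosis codes and mentions found in free-text notes. In the Python example below, the `PatientRecord` class, the code prefix, and the term list are hypothetical choices for illustration; the text search is a plain substring match with no negation handling, so it shows only the shape of the approach.

```python
from dataclasses import dataclass, field


@dataclass
class PatientRecord:
    patient_id: str
    diagnosis_codes: set[str] = field(default_factory=set)  # structured data
    notes: list[str] = field(default_factory=list)          # free text


# Hypothetical cohort definition (assumption for illustration):
# a diagnosis-code prefix plus a few text terms.
DYSLIPIDEMIA_CODE_PREFIXES = ("E78",)
DYSLIPIDEMIA_TERMS = ("dyslipidemia", "hyperlipidemia", "high cholesterol")


def in_cohort(record: PatientRecord) -> bool:
    """A patient qualifies if either the structured codes or the notes say so."""
    coded = any(
        code.startswith(DYSLIPIDEMIA_CODE_PREFIXES)
        for code in record.diagnosis_codes
    )
    mentioned = any(
        term in note.lower()
        for note in record.notes
        for term in DYSLIPIDEMIA_TERMS
    )
    return coded or mentioned


def build_cohort(records: list[PatientRecord]) -> list[str]:
    """Return the IDs of all patients who qualify for the cohort."""
    return [r.patient_id for r in records if in_cohort(r)]
```

A patient whose only evidence is a note such as "Long-standing high cholesterol, on statin." would be added to the cohort here even though no diagnosis code was recorded, which is the kind of gap the paragraph above describes.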

Leveraging structured data is an important first step to elevating analytics strategies. In tandem with these strategies, forward-looking healthcare organizations need to incorporate infrastructures that support NLP to more accurately represent patient populations and positively affect quality measure reporting.

Chris Funk, Ph.D.
Senior Medical Informaticist of Health Language, Wolters Kluwer, Health

As a Senior Medical Informaticist, Christopher supports the company’s Health Language solutions by providing physician documentation within the electronic medical record, along with integrating advanced technology, such as clinical natural language processing.