Health

Updated April 24, 2020

NLP: Unlocking the potential of unstructured text in healthcare

Hospitals and health systems are sitting on a wealth of patient information that has the potential to transform care delivery. Yet analytics infrastructures designed to fuel performance improvement have traditionally overlooked much of that data because it resides in health IT systems as unstructured free text.

The industry has made notable inroads with structured patient data through the introduction of standards. Now, healthcare organizations must adopt strategies that enable access to and retrieval of critical patient data housed in unstructured information, which accounts for as much as 80 percent of clinical documentation, according to industry estimates.

Advances with natural language processing (NLP) technology hold great promise as the answer to this challenge. Increasingly recognized as a powerful tool for unlocking vital clinical data, NLP turns free text into shareable data that can be analyzed and acted upon.

The reality is that unstructured data is important to care delivery. A 2009 survey found that 96 percent of physicians were concerned about “losing the unique patient story with the transition to point-and-click (template-driven) EHRs.” Additionally, 94 percent said that “including the physician narrative as part of patients’ medical records is ‘important’ or ‘very important’ to realizing and measuring improved patient outcomes.”

The bigger question is: How much of this unstructured data is useful for complete and accurate quality measure reporting and analytics? Healthcare organizations often miss critical information when conducting analytics initiatives due to limitations with free text. In fact, one study found that EHR-derived quality measures can undercount practice performance when compared to a manual review of electronic charts.

Without the right infrastructure in place to address both structured and unstructured patient data, lab information, qualitative clinical information, and even entire patients are often left out of quality measure calculations. Here are some examples of how information is missed:

Information that qualifies a patient for exclusion criteria in a quality measure

Quality measure PQRS 116 (NQF 58) penalizes the performance score for every patient who receives antibiotics for acute bronchitis, because antibiotics don’t help acute bronchitis and may cause harm. The measure provides exclusion criteria: if the patient has a secondary condition, such as cystic fibrosis or HIV, prescribing antibiotics is acceptable and the patient is excluded from the measure. Often, the information related to these exclusion conditions resides in free text, requiring NLP for accurate identification and quality measure calculation. Thus, the right infrastructure can help a healthcare organization report quality measures accurately, which in turn helps it earn higher scores, avoid negative payment adjustments, and generate positive payment adjustments.
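To make the exclusion logic concrete, here is a minimal sketch in Python. It is not a production NLP pipeline: the term list, the crude negation check, the sample notes, and the simplified score (the share of eligible patients who were not given antibiotics) are all assumptions made for illustration.

```python
import re

# Hypothetical exclusion terms; the article names only these two as examples.
EXCLUSION_TERMS = ["cystic fibrosis", "hiv"]

# Crude negation cue shortly before a term (an assumption, not a clinical standard).
NEGATION = re.compile(r"\b(no|denies|negative for|without)\b[^.;]{0,30}$")

def has_exclusion(note):
    """Return True if the note mentions an exclusion condition that is not negated."""
    text = note.lower()
    for term in EXCLUSION_TERMS:
        for match in re.finditer(r"\b" + re.escape(term) + r"\b", text):
            if not NEGATION.search(text[:match.start()]):
                return True
    return False

def measure_rate(patients, use_free_text=True):
    """Share of eligible acute bronchitis patients who were not given antibiotics."""
    eligible = [
        p for p in patients
        if not (use_free_text and has_exclusion(p["note"]))
    ]
    if not eligible:
        return None
    avoided = sum(1 for p in eligible if not p["antibiotic"])
    return avoided / len(eligible)

# Invented sample records for illustration only.
patients = [
    {"note": "Acute bronchitis. History of cystic fibrosis.", "antibiotic": True},
    {"note": "Acute bronchitis, otherwise healthy. Negative for HIV.", "antibiotic": True},
    {"note": "Acute bronchitis. Advised rest and fluids.", "antibiotic": False},
]

print(measure_rate(patients, use_free_text=False))  # exclusions in free text ignored
print(measure_rate(patients, use_free_text=True))   # exclusions in free text applied
```

In this toy data set, the first patient is removed from the denominator only when the free-text note is read, which raises the score. A real system would need far more robust handling of negation, abbreviations, and phrasing such as "HIV-negative," which this sketch would misread.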

Qualitative unstructured clinical data is useful for quality measure reporting

Many quality measures rely on patient clinical data that often resides in free text, such as smoking status and dietary orders. For instance, a physician’s note for dietary counseling may read “referred the patient to see a nutritional counselor” or “enrolled in Weight Watchers from previous conversation.” Or, in the case of smoking status, a physician might document, “smoked in the past” or “smoking 2 packs a day but is now down to 1 cigarette a day.” NLP can identify the relevant qualitative statements within a patient record and derive their meaning, categorizing phrases like those above into structured observations such as “is a smoker,” “former smoker,” and “not a smoker.” When NLP is used to extract this information from free text, healthcare organizations are equipped to calculate quality measures more accurately and avoid negative payment adjustments associated with low scores. This information can also be used for clinical decision support that guides physicians to the best medication choices and therapies.
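The categorization step can be sketched with a few ordered rules. This is a deliberately simple, rule-based illustration rather than how any particular NLP product works; the phrase patterns are assumptions, and only the three category labels and the two smoking examples come from the text above.

```python
import re

# Ordered rules: the first matching pattern wins.
# Patterns are illustrative assumptions, not a validated clinical lexicon.
RULES = [
    ("not a smoker",
     re.compile(r"\b(never smoked|non-?smoker|denies smoking|does not smoke)\b")),
    ("former smoker",
     re.compile(r"\b(smoked in the past|former smoker|ex-smoker|quit smoking|stopped smoking)\b")),
    ("is a smoker",
     re.compile(r"\b(smok(es|ing|er)|cigarettes?|packs? (a|per) day)\b")),
]

def smoking_status(note):
    """Map a free-text phrase to a structured smoking status, or None if no rule fires."""
    text = note.lower()
    for label, pattern in RULES:
        if pattern.search(text):
            return label
    return None

for phrase in [
    "smoked in the past",
    "smoking 2 packs a day but is now down to 1 cigarette a day",
    "patient has never smoked",
    "referred the patient to see a nutritional counselor",
]:
    print(phrase, "->", smoking_status(phrase))
```

Rule order matters here: the negative and former-smoker patterns are checked before the general smoking pattern so that a phrase like “former smoker” is not labeled as a current smoker.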

Extracting data that qualifies a patient for a population cohort

Even with well-documented diagnoses such as diabetes, you can miss vital information about specific population cohorts. Eye and foot exams, for example, are valuable indicators of how severe a patient’s diabetes is, yet this information is easily lost in unstructured text. Although practices have a good read on diabetes populations, some information still falls through the cracks. More complicated diagnoses are tracked even less reliably. In fact, a survey found that “the majority of practices correctly documented diagnoses of hypertension and diabetes over 80% of the time, but rates of appropriate documentation for dyslipidemia and ischemic cardiovascular disease were substantially lower.” NLP can help practices find diabetes patients who might otherwise have been missed by examining unstructured text, and can also help identify patients with diagnoses that are infrequently documented, such as dyslipidemia. NLP enables aggregation of this critical patient data to reflect performance for diabetes and other chronic conditions, giving you a more accurate representation of your population.
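As a rough illustration of combining both data sources, the sketch below builds a cohort from structured diagnosis codes first and then adds patients whose condition appears only in the note text. The record layout, the ICD-10 code prefixes, and the mention patterns are assumptions chosen for the example, and the text matching does not handle negation.

```python
import re

# Illustrative assumptions: one ICD-10 prefix and a few text mentions per condition.
CONDITIONS = {
    "diabetes": {
        "code_prefixes": ("E11",),
        "mentions": re.compile(r"\b(diabetes|diabetic|t2dm)\b"),
    },
    "dyslipidemia": {
        "code_prefixes": ("E78",),
        "mentions": re.compile(r"\b(dyslipidemia|hyperlipidemia|high cholesterol)\b"),
    },
}

def build_cohort(patients, condition):
    """Return {patient id: how the patient was found} for one condition."""
    spec = CONDITIONS[condition]
    cohort = {}
    for p in patients:
        coded = any(code.startswith(spec["code_prefixes"]) for code in p["codes"])
        mentioned = bool(spec["mentions"].search(p["note"].lower()))
        if coded:
            cohort[p["id"]] = "structured"
        elif mentioned:
            cohort[p["id"]] = "free text only"
    return cohort

# Invented sample records for illustration only.
patients = [
    {"id": "p1", "codes": ["E11.9"], "note": "Type 2 diabetes, foot exam normal."},
    {"id": "p2", "codes": ["E11.9"],
     "note": "Diabetic; lipid panel shows dyslipidemia, started statin."},
    {"id": "p3", "codes": [], "note": "Annual physical, no concerns."},
]

print(build_cohort(patients, "diabetes"))
print(build_cohort(patients, "dyslipidemia"))
```

In this sample, both diabetes patients are found through structured codes, while the dyslipidemia patient surfaces only because the note is searched, which mirrors the documentation gap described above.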

Leveraging structured data is an important first step to elevating analytics strategies. In tandem with these strategies, forward-looking healthcare organizations need to incorporate infrastructures that support NLP to more accurately represent patient populations and positively affect quality measure reporting.


Chris Funk, PhD

Senior Medical Informaticist of Health Language, Wolters Kluwer, Health

As a Senior Medical Informaticist, Christopher supports the company’s Health Language solutions by providing physician documentation within the electronic medical record, along with integrating advanced technology, such as clinical natural language processing.
