
Researchers develop approach to accurately predict pneumonia outcomes

Machine-learning approach using health records identifies five pneumonia ‘states’ with predictable prognoses

  • Pneumonia is a leading cause of death worldwide and represents 20% of U.S. hospital admissions
  • Disease historically classified by how or where it was acquired, which is not a good predictor of individual outcomes
  • Three of the identified clinical states are strongly associated with outcomes, with one disease state predicting a 7.5% death rate within 24 hours
  • Pneumonia caused by COVID-19 is distinct from other forms

EVANSTON, Ill. --- Two patients being treated for pneumonia, an infection that fills the air sacs in the lungs with fluid and makes breathing difficult, can look vastly different and have opposing outcomes. Yet doctors struggle to accurately predict patients’ prognoses and determine the most effective treatments.

Now, by applying a sophisticated machine-learning approach to electronic health records (EHRs) of patients with pneumonia, researchers at Northwestern University have uncovered five distinct clinical states in pneumonia, three of which are strongly associated with disease outcomes and two of which can help physicians determine the disease’s cause. One of the states was associated with a 7.5% chance of dying within 24 hours.

The paper describing the approach and the data used to develop it was published this week in the journal Proceedings of the National Academy of Sciences (PNAS). The researchers say the approach could help clinicians make better-informed treatment decisions for critically ill patients and could be applied much more broadly.

Pneumonia, a leading cause of death globally, is inherently difficult to treat because of the diverse ways it can present and be acquired, and because treating it carries a risk of antibiotic overuse. Physicians have historically differentiated pneumonia patients in intensive care units by how the infection was acquired, grouping them into three categories: community-acquired (which could mean a previous bacterial or viral infection), hospital-acquired and ventilator-acquired (developed after a patient requires mechanical ventilation).

But Northwestern’s Luís Amaral, the study’s senior author, said this classification actually tells physicians surprisingly little about a patient’s chance of recovery.

“Other approaches to classifying the state of pneumonia patients are not as discriminatory,” Amaral said. “They do a worse job of predicting disease progression and prognosis, which is particularly relevant for end-of-life decisions. Our study is the first to demonstrate the existence of robustly identifiable, distinct, clinical states.”

Amaral, an expert in complex systems and data science, is the Erastus Otis Haven Professor of Engineering Sciences and Applied Mathematics in Northwestern’s McCormick School of Engineering.

Amaral said understanding individuals’ chances of survival can help prepare family members for loss and help physicians avoid over-treatment.

The five states are defined by integrating many types of data (body temperature, breathing rate, glucose levels, oxygenation levels, etc.) and capturing the relationships between different measures. The researchers found that linear combinations of the variables characterizing motor response, renal function, heart rate, systolic blood pressure, respiratory rate and high blood pressure provided the most information about a patient’s state.
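The release does not say which method the team used to find these linear combinations. As an illustration only, principal component analysis (PCA) is a common way to find the single weighted sum of standardized variables that carries the most variance. The sketch below assumes PCA, computes the top component by power iteration in plain Python, and uses hypothetical variable names and synthetic data; it is not the team's method.

```python
import math
import random


def standardize(rows):
    """Z-score each column so variables in different units (bpm, mmHg, ...) are comparable."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0 for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]


def covariance(z):
    """Covariance matrix of column-centered data (population normalization)."""
    n, d = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / n for j in range(d)] for i in range(d)]


def leading_component(cov, iters=500):
    """Power iteration: unit-length weights of the top principal component,
    plus the share of total variance that component explains."""
    d = len(cov)
    w = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * w[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
    # Rayleigh quotient w^T C w gives the eigenvalue; the trace is the total variance.
    eigenvalue = sum(w[i] * cov[i][j] * w[j] for i in range(d) for j in range(d))
    explained = eigenvalue / sum(cov[i][i] for i in range(d))
    return w, explained


# Hypothetical toy data: heart rate and respiratory rate share a latent
# "severity" signal s; systolic blood pressure is independent noise.
rng = random.Random(0)
rows = []
for _ in range(500):
    s = rng.gauss(0, 1)
    rows.append([80 + 10 * s + rng.gauss(0, 2),   # heart_rate (bpm)
                 18 + 3 * s + rng.gauss(0, 0.6),  # resp_rate (breaths/min)
                 120 + rng.gauss(0, 15)])         # systolic_bp (mmHg)

z = standardize(rows)
weights, explained = leading_component(covariance(z))
# Each patient's compressed score is a linear combination of the standardized variables.
scores = [sum(w * x for w, x in zip(weights, row)) for row in z]
print([round(w, 2) for w in weights], round(explained, 2))
```

The correlated pair (heart rate and respiratory rate) receives large weights while the unrelated variable receives a small one. Power iteration keeps the sketch dependency-free; with real data a library routine such as NumPy's `numpy.linalg.eigh` would be the usual choice, and several components rather than one would typically be kept.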

The team overcame several challenges as they developed a suite of machine-learning tools to cluster patient conditions using two EHR data sources: SCRIPT, a Northwestern project, and a standard clinical dataset. First, many types of data had to be integrated despite being collected at different frequencies. Second, they needed to develop a new test to assess the reliability of the approach. Third, they had to determine whether the information contained in these physiological variables could be “compressed” into a much smaller number of combinations of those variables.
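The release does not describe how the team reconciled variables recorded at different rates. One common approach, shown here purely as an assumption, is to place every variable on a shared time grid and carry the most recent observation forward, with a cap on how stale a reading may be before it counts as missing. The function name `align_to_grid`, the `max_age` cutoff and the sample values are all hypothetical.

```python
from bisect import bisect_right


def align_to_grid(series, grid, max_age=None):
    """Carry the most recent observation forward onto a shared time grid.

    series: (time_in_hours, value) pairs sorted by time.
    grid: time points (hours) at which every variable should have a value.
    max_age: if set, readings older than this many hours are treated as missing.
    Returns one value per grid point; None where no usable reading exists.
    """
    times = [t for t, _ in series]
    aligned = []
    for g in grid:
        i = bisect_right(times, g)  # number of observations taken at or before g
        if i == 0:
            aligned.append(None)    # nothing measured yet
            continue
        t, value = series[i - 1]
        if max_age is not None and g - t > max_age:
            aligned.append(None)    # last reading is too stale to trust
        else:
            aligned.append(value)
    return aligned


# Hypothetical example: heart rate charted hourly, creatinine drawn once.
heart_rate = [(0, 88), (1, 92), (2, 95), (3, 90)]
creatinine = [(0.5, 1.1)]
grid = [0, 1, 2, 3]

rows = list(zip(align_to_grid(heart_rate, grid),
                align_to_grid(creatinine, grid, max_age=2)))
print(rows)  # [(88, None), (92, 1.1), (95, 1.1), (90, None)]
```

Each grid point then yields one row per patient with every variable filled in or explicitly marked missing, which is the form most clustering and dimensionality-reduction tools expect.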

The resulting tools enabled the researchers to identify five distinct clusters, which they equated with distinct clinical states, that predicted patient mortality considerably better than current approaches. Surprisingly, one of the clusters contained most of the patients whose pneumonia was associated with a COVID-19 infection.
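The release does not name the clustering algorithm. As a stand-in only, k-means (Lloyd's algorithm) is a familiar way to group patients' compressed scores into a fixed number of clusters and then compare outcomes across them; with real data one would set `k=5` to match the five reported states. The helper names and the nine-patient toy dataset below are hypothetical, and this is not the team's method.

```python
def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Deterministic seeding: start from the first point, then repeatedly add
    the point farthest from every centroid chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, iters=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid updates."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: dist2(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties out
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def mortality_by_state(labels, died, k):
    """Fraction of patients in each cluster who died (NaN for an empty cluster)."""
    rates = []
    for j in range(k):
        outcomes = [d for lab, d in zip(labels, died) if lab == j]
        rates.append(sum(outcomes) / len(outcomes) if outcomes else float("nan"))
    return rates


# Hypothetical toy data: 2-D compressed scores for nine patients and whether each died.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (21, 0), (20, 1)]
died = [0, 0, 0, 1, 1, 0, 0, 0, 1]

labels, _ = kmeans(points, k=3)
rates = mortality_by_state(labels, died, k=3)
print(labels, [round(r, 2) for r in rates])  # [0, 0, 0, 2, 2, 2, 1, 1, 1] [0.0, 0.33, 0.67]
```

k-means needs the number of clusters fixed in advance and can settle on different solutions depending on how it is seeded, which is one reason checking whether clusters are robustly reproducible matters before treating them as clinical states.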

The technical advances developed during this research may be useful in other contexts. In fact, according to Feihong Xu, the study’s lead author and a graduate student in the Amaral lab, the team is “now applying these techniques to experimental data from a mouse model of sepsis.”

So far, the analysis does not explain why some patients move from one state to another, a question the researchers are now studying. Future research on pneumonia and other diseases could ultimately be the basis for more effective and predictable treatment options.

The study, “Robust extraction of pneumonia-associated clinical states from electronic health records,” was supported by the National Heart, Lung, and Blood Institute of the National Institutes of Health (R01HL140362), the National Institute of Allergy and Infectious Diseases (U19AI135964) and an NIH training grant (T32GM153505).