When lab-trained AI meets the real world, ‘mistakes can happen’

Tissue contamination distracts AI models from making accurate real-world diagnoses

  • First study to examine the impact of tissue contamination on AI models
  • ‘If it’s paying attention to the tissue contaminants, it’s paying less attention to the patient’s tissue that is being examined’
  • ‘Pathologists fear — and AI companies hope — that the computers are coming for our jobs. Not yet.’

CHICAGO --- Human pathologists are extensively trained to detect when tissue samples from one patient mistakenly end up on another patient’s microscope slides (a problem known as tissue contamination). But such contamination can easily confuse artificial intelligence (AI) models, which are often trained in pristine, simulated environments, reports a new Northwestern Medicine study.

“We train AIs to tell ‘A’ versus ‘B’ in a very clean, artificial environment, but, in real life, the AI will see a variety of materials that it hasn’t trained on. When it does, mistakes can happen,” said corresponding author Dr. Jeffery Goldstein, director of perinatal pathology and an assistant professor of perinatal pathology and autopsy at Northwestern University Feinberg School of Medicine.

“Our findings serve as a reminder that AI that works incredibly well in the lab may fall on its face in the real world. Patients should continue to expect that a human expert is the final decider on diagnoses made on biopsies and other tissue samples. Pathologists fear — and AI companies hope — that the computers are coming for our jobs. Not yet.”

In the new study, scientists trained three AI models to scan microscope slides of placenta tissue to (1) detect blood vessel damage; (2) estimate gestational age; and (3) classify macroscopic lesions. They trained a fourth AI model to detect prostate cancer in tissues collected from needle biopsies. When the models were ready, the scientists exposed each one to small portions of contaminant tissue (e.g., bladder or blood) that were randomly sampled from other slides. Finally, they tested the AIs’ reactions.

Each of the four AI models paid too much attention to the tissue contamination, which resulted in errors when diagnosing or detecting vessel damage, gestational age, lesions and prostate cancer, the study found.

The findings were published earlier this month in the journal Modern Pathology. It marks the first study to examine how tissue contamination affects machine-learning models.

‘For a human, we’d call it a distraction, like a bright, shiny object’

Tissue contamination is a well-known problem for pathologists, but it often comes as a surprise to non-pathologist researchers or doctors, the study points out. A pathologist examining 80 to 100 slides per day can expect to see two to three with contaminants, but they’ve been trained to ignore them.

When humans examine tissue on slides, they can only look at a limited field within the microscope, then move to a new field and so on. After examining the entire sample, they combine all the information they’ve gathered to make a diagnosis. An AI model performs in the same way, but the study found AI was easily misled by contaminants.

“The AI model has to decide which pieces to pay attention to and which ones not to, and that’s zero sum,” Goldstein said. “If it’s paying attention to tissue contaminants, then it’s paying less attention to the tissue from the patient that is being examined. For a human, we’d call it a distraction, like a bright, shiny object.”

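The “zero sum” point can be illustrated with a minimal sketch. It assumes the model pools information from slide patches using softmax-normalized attention weights, which is a common design but is not spelled out in this article. The scores below are hypothetical, and the function names are illustrative, not taken from the study.

```python
import math


def softmax(scores):
    """Turn raw attention scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def patient_attention_share(patient_scores, contaminant_scores):
    """Fraction of total attention that lands on the patient's own patches."""
    weights = softmax(list(patient_scores) + list(contaminant_scores))
    return sum(weights[:len(patient_scores)])


# Hypothetical raw attention scores (assumed values, not study data).
patient = [1.0, 1.2, 0.8, 1.1]   # four patches of the patient's tissue
contaminant = [3.0]              # one "bright, shiny" contaminant patch

clean_share = patient_attention_share(patient, [])
contaminated_share = patient_attention_share(patient, contaminant)

print(f"patient share without contaminant: {clean_share:.2f}")
print(f"patient share with contaminant:    {contaminated_share:.2f}")
```

Because the weights must sum to 1, any attention the contaminant patch receives is taken away from the patient's tissue. In this toy setup, a single high-scoring contaminant patch absorbs most of the attention.
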
The AI models gave a high level of attention to contaminants, indicating an inability to encode biological impurities. Practitioners should work to quantify and improve upon this problem, the study authors said.

AI researchers in pathology have previously studied other kinds of image artifacts, such as blurriness, debris on the slide, folds or bubbles, but this is the first study to examine tissue contamination.

‘Confident that AI for placenta is doable’

Perinatal pathologists, such as Goldstein, are incredibly rare. In fact, there are only 50 to 100 in the entire U.S., mostly located in big academic centers, Goldstein said. This means only 5% of placentas in the U.S. are examined by human experts. Worldwide, that number is even lower. Embedding this type of expertise into AI models can help pathologists across the country do their jobs better and faster, Goldstein said.

“I’m actually very excited about how well we were able to build the models and how well they performed before we deliberately broke them for the study,” Goldstein said. “Our results make me confident that AI evaluations of placenta are doable. We ran into a real-world problem, but hitting that speedbump means we’re on the road to better integrating the use of machine learning in pathology.”

The study is titled “Tissue contamination challenges the credibility of machine learning models in real world digital pathology.” Funding for the study was provided by the National Institute of Biomedical Imaging and Bioengineering (grant K08EB030120) and the National Center for Advancing Translational Sciences (grant UL1TR001422), both of the National Institutes of Health; the Walder Foundation Fund to Retain Clinician Scientists; and the Department of Health and Human Services (grants R01LM013523 and U01CA220401).