New Method Evaluates Reliability of Radiologists' Diagnostic Reports

Introduction to Medical Imaging and Uncertainty

Due to the inherent ambiguity in medical images like X-rays, radiologists often use words like “may” or “likely” when describing the presence of a certain pathology, such as pneumonia. But do the words radiologists use to express their confidence level accurately reflect how often a particular pathology occurs in patients? A new study shows that when radiologists express confidence about a certain pathology using a phrase like “very likely,” they tend to be overconfident, and when they express less confidence using a word like “possibly,” they tend to be underconfident.

The Challenge of Quantifying Uncertainty

Using clinical data, a multidisciplinary team of MIT researchers, in collaboration with researchers and clinicians at hospitals affiliated with Harvard Medical School, created a framework to quantify how reliable radiologists are when they express certainty using natural language terms. They used this approach to provide clear suggestions that help radiologists choose certainty phrases that would improve the reliability of their clinical reporting.

Decoding Uncertainty in Words

A radiologist writing a report about a chest X-ray might say the image shows a “possible” pneumonia, which is an infection that inflames the air sacs in the lungs. In that case, a doctor could order a follow-up CT scan to confirm the diagnosis. However, if the radiologist writes that the X-ray shows a “likely” pneumonia, the doctor might begin treatment immediately, such as by prescribing antibiotics, while still ordering additional tests to assess severity. Trying to measure the calibration, or reliability, of ambiguous natural language terms like “possibly” and “likely” presents many challenges.

Assessing and Improving Calibration

The researchers leveraged prior work that surveyed radiologists to obtain probability distributions that correspond to each diagnostic certainty phrase, ranging from “very likely” to “consistent with.” For instance, since more radiologists believe the phrase “consistent with” means a pathology is present in a medical image, its probability distribution climbs sharply to a high peak, with most values clustered around the 90 to 100 percent range. In contrast, the phrase “may represent” conveys greater uncertainty, leading to a broader, bell-shaped distribution centered around 50 percent.
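To make the idea of a phrase-level probability distribution concrete, here is a minimal Python sketch. It assumes each phrase can be approximated by a Beta distribution; the parameter values, the `PHRASE_BETA` table, and the helper names are hypothetical and chosen only to mimic the shapes described above. They are not the survey results used in the study.

```python
import math

# Hypothetical Beta(a, b) parameters for two certainty phrases.
# Illustrative values only, not taken from the radiologist survey.
PHRASE_BETA = {
    "consistent with": (19.0, 1.0),  # mass piled up near 90 to 100 percent
    "may represent": (5.0, 5.0),     # symmetric bell centered on 50 percent
}

def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x, for 0 < x < 1."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * x ** (a - 1) * (1 - x) ** (b - 1)

def beta_mean(a, b):
    """Mean of Beta(a, b): the average probability the phrase implies."""
    return a / (a + b)

def beta_std(a, b):
    """Standard deviation of Beta(a, b): how spread out the phrase is."""
    return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

for phrase, (a, b) in PHRASE_BETA.items():
    print(f"{phrase}: mean={beta_mean(a, b):.2f}, std={beta_std(a, b):.2f}")
```

Under these assumed parameters, “consistent with” has a high mean and a small spread, while “may represent” sits at 50 percent with a much wider spread, which is the contrast the paragraph above describes.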

Improving Radiologists’ Reporting

To improve calibration, the researchers formulated and solved an optimization problem that adjusts how often certain phrases are used, to better align confidence with reality. They derived a calibration map that suggests certainty terms a radiologist should use to make the reports more accurate for a specific pathology. “Perhaps, for this dataset, if every time the radiologist said pneumonia was ‘present,’ they changed the phrase to ‘likely present’ instead, then they would become better calibrated,” says Peiqi Wang, lead author of the paper.
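The sketch below illustrates the general shape of a calibration map in Python. It does not reproduce the researchers' optimization problem; it substitutes a much simpler rule that maps each phrase to whichever phrase's implied probability is closest to the observed rate. The `PHRASE_MEAN` values, the toy records, and the function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical mean probability implied by each phrase (illustrative only).
PHRASE_MEAN = {"present": 0.95, "likely present": 0.75, "possibly": 0.35}

def observed_rates(records):
    """Return, per phrase, the fraction of reports where the pathology was confirmed.

    records: iterable of (phrase, confirmed) pairs.
    """
    counts = defaultdict(lambda: [0, 0])  # phrase -> [confirmed, total]
    for phrase, confirmed in records:
        counts[phrase][0] += int(confirmed)
        counts[phrase][1] += 1
    return {p: c / n for p, (c, n) in counts.items()}

def nearest_phrase_map(records):
    """Map each used phrase to the phrase whose implied mean is closest
    to its observed rate. A simple stand-in, not the study's optimization."""
    rates = observed_rates(records)
    return {
        p: min(PHRASE_MEAN, key=lambda q: abs(PHRASE_MEAN[q] - r))
        for p, r in rates.items()
    }

# Toy data: "present" was confirmed in 7 of 10 reports, "possibly" in 4 of 10.
records = (
    [("present", True)] * 7 + [("present", False)] * 3
    + [("possibly", True)] * 4 + [("possibly", False)] * 6
)
print(nearest_phrase_map(records))
```

On this toy data, “present” is confirmed only 70 percent of the time, so the nearest-mean rule suggests “likely present” instead, while “possibly” maps to itself. This mirrors the kind of suggestion Wang describes, though the actual method chooses the map by solving an optimization problem rather than by this shortcut.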

Applications Beyond Medical Imaging

In addition, the researchers evaluated the reliability of language models using their method, providing a more nuanced representation of confidence than classical methods that rely on confidence scores. This approach has the potential to improve how accurately both radiologists and AI models in various fields communicate their confidence.

Conclusion

By helping radiologists more accurately describe the likelihood of certain pathologies in medical images, this new framework could improve the reliability of critical clinical information. The researchers plan to continue collaborating with clinicians in the hopes of improving diagnoses and treatment. They are working to expand their study to include data from abdominal CT scans and are interested in studying how receptive radiologists are to calibration-improving suggestions.

FAQs

Q: What is the main challenge in medical imaging?
A: The main challenge is the inherent ambiguity in medical images, which leads to uncertainty in diagnoses.
Q: How do radiologists express uncertainty?
A: Radiologists use words like “may” or “likely” to express their confidence level when describing the presence of a certain pathology.
Q: What is the goal of the new framework developed by MIT researchers?
A: The goal is to quantify how reliable radiologists are when they express certainty using natural language terms and provide suggestions to improve the reliability of their clinical reporting.
Q: Can this framework be applied beyond medical imaging?
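A: Yes. The researchers also used the method to evaluate the reliability of language models, where it provides a more nuanced representation of confidence than classical confidence scores, and it has the potential to improve how AI models in various fields communicate confidence.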