Leveraging Large Language Models for Improved Medical Diagnostics through Structured Data Extraction

Abstract

Tabular data, often hidden in medical diagnostic reports, poses challenges for traditional machine learning (ML) models. While text-based models excel at handling unstructured text, they struggle with tabular data’s inherent structure. In this study, we propose a novel methodology, TEMED-LLM, which combines large language models (LLMs) for extracting structured tabular data from textual medical reports and interpretable ML models for end-to-end predictions. Our approach employs the Reason and Extract (RExtract) module to generate prompts for LLMs, guiding them to extract relevant features from medical texts accurately. The extracted data is then processed using interpretable ML models, such as decision trees and logistic regression, to deliver highly accurate and interpretable predictions. Evaluations on multiple datasets demonstrate that our methodology outperforms state-of-the-art text classification models in medical diagnostics. The TEMED-LLM approach not only enhances predictive performance but also maintains interpretability, crucial for clinical applications. The significant improvements in accuracy and trustworthiness underscore the potential of leveraging LLMs to bridge the gap between unstructured text and structured data, facilitating more effective and interpretable medical diagnostics.
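The extraction step described above can be illustrated with a minimal sketch. It is not the RExtract module itself: the feature schema, the prompt wording, the `JSON:` reply convention, and the `fake_llm` stand-in are all assumptions made for illustration, since the abstract does not specify them.

```python
import json

# Hypothetical feature schema; the features used in the study are not listed here.
FEATURES = {
    "age": "patient age in years (number)",
    "systolic_bp": "systolic blood pressure in mmHg (number)",
    "smoker": "whether the patient smokes (true/false)",
}


def build_prompt(report: str) -> str:
    """Ask the model to reason first, then emit one JSON object."""
    lines = [f'- "{name}": {desc}' for name, desc in FEATURES.items()]
    return (
        "Read the medical report below. First explain your reasoning, "
        "then output a single line starting with 'JSON:' followed by an "
        "object with exactly these keys (use null if a value is not stated):\n"
        + "\n".join(lines)
        + f"\n\nReport:\n{report}\n"
    )


def parse_row(reply: str) -> dict:
    """Take the JSON object after the 'JSON:' marker and keep only schema keys."""
    _, _, payload = reply.partition("JSON:")
    data = json.loads(payload.strip())
    # Missing keys become None so every row has the same columns.
    return {name: data.get(name) for name in FEATURES}


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call (assumption) so the sketch runs offline."""
    return (
        "The report states age 63, BP 150/95, and a smoking history.\n"
        'JSON: {"age": 63, "systolic_bp": 150, "smoker": true}'
    )


report = "63-year-old male, BP 150/95, long-term smoker, chest pain on exertion."
row = parse_row(fake_llm(build_prompt(report)))
print(row)
```

Rows produced this way would be collected into a table and passed to an interpretable model such as a decision tree or logistic regression, which is the second stage the abstract describes.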

Publication
In Proceedings of the 51st Symposium on Operational Research (SYM-OP-IS 2024)
Sandro Radovanović
Assistant Professor at the University of Belgrade

My research interests include machine learning, the design and development of decision support systems, decision theory, and concepts of fairness and justice in algorithmic decision-making.