Leveraging Large Language Models for Improved Medical Diagnostics through Structured Data Extraction

Andrija Petrović, Milija Suknović, Sandro Radovanović, Boris Delibašić

September 2024

Abstract

Tabular data, often hidden in medical diagnostic reports, poses challenges for traditional machine learning (ML) models. While text-based models excel at handling unstructured text, they struggle with tabular data’s inherent structure. In this study, we propose a novel methodology, TEMED-LLM, which combines large language models (LLMs) for extracting structured tabular data from textual medical reports and interpretable ML models for end-to-end predictions. Our approach employs the Reason and Extract (RExtract) module to generate prompts for LLMs, guiding them to extract relevant features from medical texts accurately. The extracted data is then processed using interpretable ML models, such as decision trees and logistic regression, to deliver highly accurate and interpretable predictions. Evaluations on multiple datasets demonstrate that our methodology outperforms state-of-the-art text classification models in medical diagnostics. The TEMED-LLM approach not only enhances predictive performance but also maintains interpretability, crucial for clinical applications. The significant improvements in accuracy and trustworthiness underscore the potential of leveraging LLMs to bridge the gap between unstructured text and structured data, facilitating more effective and interpretable medical diagnostics.
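As an illustration of the pipeline the abstract describes (prompt an LLM to extract features, collect them into a table, then fit an interpretable model), the following is a minimal sketch under stated assumptions. It is not the TEMED-LLM implementation or the actual RExtract prompt. The feature schema, the prompt wording, the `fake_llm` stand-in (a regex function used in place of a real LLM call so the example runs offline), the four toy reports, and their labels are all hypothetical. A one-split decision stump stands in for the decision trees and logistic regression named in the abstract.

```python
import json
import re

# Hypothetical feature schema; the paper's actual features are not given here.
FEATURES = ["glucose", "systolic_bp", "age"]

def build_prompt(report, features):
    """Build an extraction prompt: reason first, then emit one JSON object."""
    return (
        "Read the medical report, reason briefly about each feature, then "
        "output one JSON object with exactly these keys: "
        + ", ".join(features)
        + ". Use null for values not stated.\n\nReport:\n"
        + report
    )

def fake_llm(prompt):
    """Assumption: regex stand-in for a real LLM call, returns reasoning plus JSON."""
    report = prompt.split("Report:\n", 1)[1]
    patterns = {
        "glucose": r"glucose (\d+)",
        "systolic_bp": r"blood pressure (\d+)/",
        "age": r"(\d+)-year-old",
    }
    row = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, report)
        row[name] = int(match.group(1)) if match else None
    return "Reasoning: values read from the report.\n" + json.dumps(row)

def parse_row(llm_output, features):
    """Pull the JSON object out of the model output and keep only schema keys."""
    start = llm_output.index("{")
    end = llm_output.rindex("}") + 1
    data = json.loads(llm_output[start:end])
    return {f: data.get(f) for f in features}

def fit_stump(rows, labels, features):
    """Fit a one-split decision tree: choose the (feature, threshold) with fewest errors."""
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows if r[f] is not None}):
            preds = [int(r[f] is not None and r[f] >= t) for r in rows]
            errors = sum(p != y for p, y in zip(preds, labels))
            if best is None or errors < best[0]:
                best = (errors, f, t)
    return best[1], best[2]

def predict(stump, row):
    """Predict class 1 when the chosen feature is at or above the threshold."""
    f, t = stump
    return int(row[f] is not None and row[f] >= t)

# Toy, made-up reports and labels for illustration only (not from the paper).
reports = [
    "A 45-year-old patient, blood pressure 120/80, fasting glucose 90 mg/dL.",
    "A 62-year-old patient, blood pressure 150/95, fasting glucose 160 mg/dL.",
    "A 38-year-old patient, blood pressure 118/76, fasting glucose 85 mg/dL.",
    "A 70-year-old patient, blood pressure 160/100, fasting glucose 180 mg/dL.",
]
labels = [0, 1, 0, 1]

rows = [parse_row(fake_llm(build_prompt(r, FEATURES)), FEATURES) for r in reports]
stump = fit_stump(rows, labels, FEATURES)
print(rows[0])
print("rule: predict 1 if %s >= %s" % stump)
```

The learned rule is a single readable threshold on one extracted feature, which is the kind of interpretability the abstract emphasizes. In practice `fake_llm` would be replaced by a call to an actual LLM, and the stump by a library model such as a decision tree or logistic regression.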

Type

Conference paper

Publication

In Proceedings of the 51st Symposium on Operational Research, SYM-OP-IS 2024

Sandro Radovanović

Assistant Professor at University of Belgrade