A Framework for Integrating Domain Knowledge in Logistic Regression with Application to Hospital Readmission Prediction

Abstract

It is commonly understood that machine learning algorithms discover and extract knowledge from the data at hand. However, a large amount of knowledge is already available in machine-readable form and ready to be included in machine learning algorithms and models. In this paper, we propose a framework that integrates domain knowledge in the form of ontologies/hierarchies into logistic regression using stacked generalization. Namely, relations from the ontology/hierarchy are used in a stacking manner to obtain higher-level, more abstract concepts, which are then used for prediction. The problem we address is unplanned 30-day hospital readmission, which is considered one of the major problems in healthcare. The proposed framework yields better results than Ridge, Lasso, and Tree Lasso logistic regression. Results suggest that, on average, the proposed framework improves AUC by up to 9.5% on pediatric datasets and up to 4% on morbidly obese patients’ datasets, and improves AUPRC by up to 5.7% on pediatric datasets and up to 2.6% on morbidly obese patients’ datasets. This indicates that the inclusion of domain knowledge improves the predictive performance of logistic regression.
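To make the stacking idea concrete, here is a minimal pure-Python sketch of the general technique, not the authors' implementation. It assumes a two-level hierarchy in which each parent concept groups a few child features (e.g. diagnosis indicators). For every parent, a logistic regression is fit on its children, and its out-of-fold probability becomes a new "concept" feature; a second-level logistic regression is then fit on those concept scores. The hierarchy, the names `concept_A`/`concept_B`, the L2 (ridge) base learner, the fold count, and the toy data are all hypothetical choices made for illustration. A deeper ontology would simply repeat the same step level by level.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logreg(X, y, l2=0.01, lr=0.5, epochs=300):
    # L2-penalized (ridge) logistic regression via batch gradient descent.
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def predict_proba(model, X):
    w, b = model
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]

def concept_features(X, y, hierarchy, k=3):
    # Level 1 of the stack (hypothetical helper): for every parent concept, fit
    # a logistic regression on its child columns and record OUT-OF-FOLD
    # probabilities, so the meta-model never sees predictions on its own rows.
    n = len(X)
    Z = [[0.0] * len(hierarchy) for _ in range(n)]
    folds = [list(range(f, n, k)) for f in range(k)]
    for c, cols in enumerate(hierarchy.values()):
        sub = [[row[j] for j in cols] for row in X]
        for held_out in folds:
            held = set(held_out)
            train = [i for i in range(n) if i not in held]
            model = fit_logreg([sub[i] for i in train], [y[i] for i in train])
            for i, p in zip(held_out, predict_proba(model, [sub[i] for i in held_out])):
                Z[i][c] = p
    return Z

def auc(y, scores):
    # Probability that a random positive outranks a random negative (ties count 1/2).
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Toy data: 4 hypothetical binary indicators grouped under 2 parent concepts;
# only concept_A's children carry signal about the (synthetic) readmission label.
rng = random.Random(0)
hierarchy = {"concept_A": [0, 1], "concept_B": [2, 3]}
X = [[float(rng.random() < 0.5) for _ in range(4)] for _ in range(200)]
y = [1 if rng.random() < sigmoid(2.0 * (r[0] + r[1]) - 2.0) else 0 for r in X]

Z = concept_features(X, y, hierarchy)
meta = fit_logreg(Z, y)  # level-2 logistic regression on concept scores
print(round(auc(y, predict_proba(meta, Z)), 3))
```

The printed AUC is computed in-sample on toy data and says nothing about the paper's reported results; a real evaluation would hold out patients and would also report AUPRC. Out-of-fold predictions are used at level 1 because fitting the meta-model on in-sample concept scores would leak the labels and overstate how informative each concept is.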

Publication
In International Conference on INnovations in Intelligent SysTems and Applications
Sandro Radovanović
Assistant Professor at the University of Belgrade

My research interests include machine learning, development and design of decision support systems, decision theory, and fairness and justice concepts in algorithmic decision making.