Heart Disease Prediction Using Machine Learning: A Stacked Ensemble Approach with Imbalanced Data Handling

    DOI: https://doie.org/10.10399/JBSE.2025446080

    Aadarsh Chaudhary, Aditi Sharma


    Keywords:

    Heart Disease Prediction, Machine Learning, Ensemble Learning, Stacking Classifier, Class Imbalance, SMOTE, Feature Selection, Clinical Risk Prediction


    Abstract:

    Heart disease is a major cause of death globally, which creates demand for precise and timely prediction models. This research introduces a machine learning approach to heart disease classification built on a real-world dataset of approximately 320,000 records with 16 features spanning demographic, behavioral, and clinical attributes. The dataset was characterized by substantial class imbalance. To address this imbalance, we used a resampling pipeline that incorporates SMOTE, Borderline SMOTE, and Tomek Links. Several models were trained, including Random Forest, XGBoost, Logistic Regression, and a deep learning model, all with correlation-based and mutual-information-based feature selection. The best performance came from a stacking classifier with Random Forest and XGBoost as base learners and Logistic Regression as the meta-learner, which reached an accuracy of 80.97% and an AUC of 0.8158. However, recall for the heart disease class remained moderate at 54.68%, confirming the continuing challenge of detecting minority-class instances. These findings highlight the strength of ensemble learning and suggest that advanced resampling techniques can enhance clinical risk prediction on imbalanced datasets.
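
The abstract names SMOTE as part of the resampling pipeline. As a rough illustration of the idea behind it, the sketch below creates synthetic minority-class samples by interpolating between an existing minority sample and one of its nearest minority neighbours. It is a simplified, standard-library-only sketch and not the authors' implementation: the function name `smote_oversample`, the two toy features, and all values are hypothetical, and the Borderline SMOTE and Tomek Links steps are not shown.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority
    samples and their k nearest minority neighbours (the core SMOTE idea)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        lam = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(
            tuple(b + lam * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy data: (age, resting blood pressure) for positive cases.
minority = [(63.0, 145.0), (67.0, 160.0), (58.0, 150.0), (71.0, 155.0)]
majority_count = 12  # assumed size of the majority class in this toy example

new_points = smote_oversample(minority, n_new=majority_count - len(minority))
balanced_minority = minority + new_points
```

Because every synthetic point lies on a line segment between two real minority samples, the new points stay inside the region the minority class already occupies. Resampling of this kind would normally be applied to the training split only, so that evaluation metrics such as recall are measured on the original class distribution.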

