HYBRID EVENT: You can participate in person in London, UK, or virtually from your home or workplace.

6th Edition of Cardiology World Conference

September 15-17, 2025 | London, UK

Cardio 2025

AI-driven heart disease risk prediction using transformers: Insights from Framingham and Cleveland datasets

Sai Koundinya Upadhyayula, Speaker at Heart Conferences
Kempegowda Institute of Medical Sciences, India

Abstract:

Background: Accurate prediction of cardiovascular risk is essential for timely prevention and intervention. Classical machine learning (ML) models such as XGBoost and Random Forest have shown promise, but they often struggle with class imbalance and have limited capacity to capture complex nonlinear interactions. In this study, we evaluate the utility of transformer-based deep learning models for predicting cardiovascular disease (CVD) risk using the Framingham Heart Study dataset (n = 4,240) and the Cleveland Heart Disease dataset (n = 303), incorporating interaction terms and explainability via SHAP (SHapley Additive exPlanations).

Methods: We compared FT-Transformer, SAINT, TabNet, XGBoost, LightGBM, Random Forest, and logistic regression-based stacking. SMOTETomek was used to address class imbalance (15% positive class in Framingham), and missing values were imputed using multivariate imputation by chained equations (MICE). Five-fold cross-validation was performed on both datasets.
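
The cross-validation and resampling steps described above can be sketched as follows. This is a minimal illustration, not the study's pipeline: it assumes the folds are stratified (the abstract does not say so), and it uses plain random oversampling as a stand-in for SMOTETomek, which is a different method. The toy labels and the helper names `stratified_folds` and `oversample_minority` are hypothetical. The point it shows is that resampling is applied to the training folds only, so each validation fold keeps the original 15% positive rate.

```python
import random
from collections import Counter


def stratified_folds(labels, k=5, seed=0):
    """Assign sample indices to k folds so each fold has a similar class ratio."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)  # deal each class round-robin across folds
    return folds


def oversample_minority(train_idx, labels, seed=0):
    """Random oversampling of the minority class.

    Stand-in for SMOTETomek (assumption): it duplicates existing minority
    samples instead of synthesising new ones and removing Tomek links.
    """
    rng = random.Random(seed)
    pos = [i for i in train_idx if labels[i] == 1]
    neg = [i for i in train_idx if labels[i] == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return train_idx + extra


# Toy labels with a 15% positive class, mirroring the imbalance reported for Framingham.
labels = [1] * 150 + [0] * 850
folds = stratified_folds(labels, k=5)

for fold_no, val_idx in enumerate(folds):
    # Training data = every fold except the current validation fold.
    train_idx = [i for f in folds if f is not val_idx for i in f]
    # Resample the training portion only; the validation fold is left untouched.
    balanced_train = oversample_minority(train_idx, labels, seed=fold_no)
    train_counts = Counter(labels[i] for i in balanced_train)
    val_counts = Counter(labels[i] for i in val_idx)
    print(fold_no, dict(train_counts), dict(val_counts))
```

Resampling before splitting would let copies of the same minority sample appear on both sides of a split and inflate the validation scores, which is why the sketch resamples inside the loop.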

The feature set included demographics (age, sex), vital signs (blood pressure, heart rate, mean arterial pressure, pulse pressure), laboratory markers (cholesterol, glucose, renal and electrolyte panels), clinical history (diabetes, hypertension, medications, smoking, alcohol, thalassemia), symptoms (chest pain, angina), and ECG findings and cardiac parameters (ST changes, max heart rate, vessel visualization). Mean arterial pressure and pulse pressure were engineered features, added to the existing variables in the Framingham dataset.
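
The two engineered features can be derived from systolic and diastolic blood pressure. The abstract does not state the formulas used, so the sketch below assumes the common clinical approximations: pulse pressure as systolic minus diastolic, and mean arterial pressure as diastolic plus one third of the pulse pressure. The field names `sysBP` and `diaBP` and the sample values are hypothetical.

```python
def pulse_pressure(sys_bp, dia_bp):
    """Pulse pressure in mmHg: systolic minus diastolic."""
    return sys_bp - dia_bp


def mean_arterial_pressure(sys_bp, dia_bp):
    """Approximate mean arterial pressure in mmHg.

    Assumed formula (not stated in the abstract): MAP = DBP + (SBP - DBP) / 3.
    """
    return dia_bp + (sys_bp - dia_bp) / 3.0


# Hypothetical patient record; the keys are illustrative column names.
record = {"sysBP": 120.0, "diaBP": 80.0}
record["pulse_pressure"] = pulse_pressure(record["sysBP"], record["diaBP"])
record["map"] = mean_arterial_pressure(record["sysBP"], record["diaBP"])
print(record)
```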

Results: The FT-Transformer achieved the best performance on both datasets. Framingham: F1 score 0.82, accuracy 0.87, AUC 0.91. Cleveland: F1 score 0.89, accuracy 0.93, AUC 0.95. Transformer-based models consistently outperformed classical ML models and ensemble methods. Stacking underperformed relative to its base learners. SHAP analysis consistently identified mean arterial pressure, age, cholesterol levels, prior cardiovascular events, and smoking intensity as the most predictive features across high-performing models.
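
To make the three reported metrics concrete, the sketch below computes accuracy, F1 score, and AUC from scratch on toy labels with a 15% positive class. The data are invented for illustration and are not study results. It shows why accuracy alone is a weak yardstick under this imbalance: a model that always predicts "no disease" reaches 0.85 accuracy while its F1 score is 0 and its AUC is 0.5.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def f1_score(y_true, y_pred):
    """F1 for the positive class: 2*TP / (2*TP + FP + FN); 0.0 if undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half (pairwise definition)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 15 positives, 85 negatives (illustrative only).
y_true = [1] * 15 + [0] * 85

# A trivial model that always predicts the majority class with a constant score.
always_negative = [0] * 100
constant_scores = [0.1] * 100

print(accuracy(y_true, always_negative))   # 0.85
print(f1_score(y_true, always_negative))   # 0.0
print(roc_auc(y_true, constant_scores))    # 0.5
```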

Conclusion: Transformer-based models offer substantial advantages in modelling imbalanced clinical datasets by effectively capturing complex variable interactions. FT-Transformer outperformed established ML baselines while maintaining interpretability through SHAP. These findings support the use of transformer-based architectures in CVD risk prediction workflows, with the potential to enhance clinical trust and utility through transparent, data-driven insights.
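
The interpretability claim rests on Shapley values, which SHAP estimates for each feature of a prediction. The sketch below computes exact Shapley values by brute force for a toy three-feature risk score. It is not the SHAP library and not the authors' model: `toy_risk`, its coefficients, the patient vector, and the baseline are all hypothetical, and enumerating every feature ordering is only feasible for a handful of features.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline).

    For every ordering of the features, switch them one at a time from the
    baseline value to the patient's value and record the change in output;
    each feature's Shapley value is its average change over all orderings.
    """
    n = len(x)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        prev = predict(current)
        for j in order:
            current[j] = x[j]
            new = predict(current)
            totals[j] += new - prev
            prev = new
    return [t / len(orderings) for t in totals]


def toy_risk(features):
    """Hypothetical risk score with an age-by-smoking interaction term."""
    age, mean_arterial_pressure, smoker = features
    return (0.02 * age + 0.01 * mean_arterial_pressure
            + 0.5 * smoker + 0.005 * age * smoker)


patient = [60.0, 100.0, 1.0]    # age, MAP in mmHg, smoker flag (illustrative)
baseline = [50.0, 90.0, 0.0]    # reference patient (illustrative)

phi = shapley_values(toy_risk, patient, baseline)
for name, value in zip(["age", "mean_arterial_pressure", "smoker"], phi):
    print(f"{name}: {value:.3f}")

# The attributions sum to the gap between the two predictions.
print(sum(phi), toy_risk(patient) - toy_risk(baseline))
```

The interaction term is split between age and smoking rather than assigned to either one alone, which is how Shapley-based attributions credit interacting variables.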

Biography:

Sai Koundinya Upadhyayula is a junior resident in the Department of General Medicine with a strong interest in internal medicine, cardiology, and translational research. His work spans clinical medicine and AI-based research, including publications in neurology, pulmonology, and diabetes. Sai previously worked at the Indian Institute of Science (IISc), where he developed deep learning tools for cardiac and neuro imaging. He is also passionate about medical education and has conducted seminars on AI in healthcare. With experience in both high-volume clinical settings and research environments, Sai is committed to advancing patient care through evidence-based practice, innovation, and continuous learning.
