Department

Computer Science and Cybersecurity

Document Type

Poster

Abstract

Machine learning is increasingly used in healthcare for disease prediction, but small datasets pose challenges. With limited samples, the choice of sampling strategy (how data is divided for training and validation) can significantly affect reported model performance. Two common approaches, K-Fold Cross-Validation and Repeated Random Sub-sampling, may yield different results for the same model. Understanding which algorithms are most sensitive to this variability is critical for clinical applications where consistent, reliable predictions matter. This study evaluates how sensitive three classification models (K-Nearest Neighbors, Gaussian Naive Bayes, and Neural Networks) are to sampling strategy, using the Heart Disease dataset.
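
To make the difference between the two sampling strategies concrete, here is a minimal sketch in plain Python. It is not the poster's code: the function names, the fold count, the 20% validation fraction, and the sample size of 50 are all illustrative assumptions. It only generates index splits and counts how often each sample lands in a validation set.

```python
import random


def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


def repeated_random_splits(n_samples, n_repeats, val_fraction=0.2, seed=0):
    """Yield (train_idx, val_idx) pairs drawn independently on every repeat."""
    rng = random.Random(seed)
    n_val = max(1, round(n_samples * val_fraction))
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        yield indices[n_val:], indices[:n_val]


def validation_counts(splits, n_samples):
    """Count how many times each sample index appears in a validation set."""
    counts = [0] * n_samples
    for _, val_idx in splits:
        for i in val_idx:
            counts[i] += 1
    return counts


if __name__ == "__main__":
    n = 50  # hypothetical small dataset size, chosen for illustration only
    kf = validation_counts(k_fold_splits(n, k=5), n)
    rr = validation_counts(repeated_random_splits(n, n_repeats=5), n)
    print("k-fold: distinct validation counts per sample:", sorted(set(kf)))
    print("random sub-sampling: distinct validation counts per sample:", sorted(set(rr)))
```

With K-Fold, every sample is used for validation exactly once. With repeated random sub-sampling, the splits are drawn independently, so some samples may be validated several times and others never, which is one plausible source of the variability in reported performance on small datasets.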

Publication Date

Spring 4-9-2026

Comments

Spring 2026: Student Research Conference

Distinguished Presenter Award: Amina Mohamud

Excellence in Knowledge Sharing Award: Amina Mohamud
