Department
Computer Science and Cybersecurity
Document Type
Poster
Abstract
Machine learning is increasingly used in healthcare for disease prediction, but small datasets pose challenges. With limited samples, the choice of sampling strategy (how data is divided for training and validation) can significantly affect reported model performance. Two common approaches, K-Fold Cross-Validation and Repeated Random Sub-sampling, may yield different results for the same model. Understanding which algorithms are most sensitive to this variability is critical for clinical applications where consistent, reliable predictions matter. This study evaluates the sensitivity of three classification models (K-Nearest Neighbors, Gaussian Naive Bayes, and Neural Networks) to sampling strategy using the Heart Disease dataset.
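The structural difference between the two sampling strategies named in the abstract can be sketched in pure Python. This is an illustrative sketch only, not the study's actual code: the dataset size, fold count, repeat count, and test fraction below are arbitrary stand-ins, not values from the poster. K-Fold partitions the data into disjoint test folds, so every sample is tested exactly once; repeated random sub-sampling draws independent test sets that may overlap, which is one source of the run-to-run variability the study examines.

```python
import random


def kfold_test_sets(n, k, seed=0):
    """Shuffle indices 0..n-1 and partition them into k disjoint test folds,
    as in K-Fold Cross-Validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def subsample_test_sets(n, repeats, test_frac, seed=0):
    """Draw `repeats` independent random test sets of size n * test_frac,
    as in Repeated Random Sub-sampling. Test sets may overlap, and some
    samples may never appear in any test set."""
    rng = random.Random(seed)
    size = int(n * test_frac)
    return [rng.sample(range(n), size) for _ in range(repeats)]


# With K-Fold, the union of test folds is exactly the full index set:
folds = kfold_test_sets(100, 5)
assert sorted(i for fold in folds for i in fold) == list(range(100))

# With sub-sampling, the union of test sets generally is not:
subsets = subsample_test_sets(100, 5, 0.2)
tested = {i for s in subsets for i in s}
print(f"samples tested at least once: {len(tested)} of 100")
```

Because sub-sampling can test some samples repeatedly and skip others entirely, performance estimates from the two strategies can diverge on small datasets, which is the sensitivity the poster measures.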
Publication Date
Spring 4-9-2026
Recommended Citation
Haxton, H., Mohamud, A., Reitz, S. & Perez-Villa, I. (2026, April 9). Sensitivity of machine learning models to sampling variability [Poster presentation]. Student Research Conference Spring 2026, Saint Paul, MN, United States. https://metroworks.metrostate.edu/student-scholarship/37
Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Comments
Spring 2026: Student Research Conference
Distinguished Presenter Award: Amina Mohamud
Excellence in Knowledge Sharing Award: Amina Mohamud