Department
Computer Science and Cybersecurity
Document Type
Poster
Abstract
The prevalence of imbalanced data in network intrusion detection, where benign instances vastly outnumber malicious ones, poses a significant challenge to machine learning classifiers. Boosting algorithms, while powerful, often perform poorly on the minority classes in such data. This study provides an empirical analysis of how stratified sampling may alleviate some of the problems caused by this imbalance. We examine the performance and robustness of LightGBM, XGBoost, and CatBoost when trained on data processed with stratified sampling. The models are trained on the CSE-CIC-IDS2018 dataset from the University of New Brunswick and evaluated using F1-score, AUC-ROC, and per-class accuracy. The results indicate that stratified sampling has a somewhat positive effect on each model's performance on an unseen test set, although there is likely considerable room for improvement.
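For readers unfamiliar with the technique, the following is a minimal Python sketch of a stratified train/test split. It assumes that "stratified sampling" here means splitting the data so that each class keeps roughly the same proportion in the training and test sets. The abstract does not describe the authors' exact procedure. The helper name `stratified_split`, the 20% test fraction, and the toy label counts are hypothetical illustrations, not values taken from the poster or from CSE-CIC-IDS2018.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Hypothetical helper (not the authors' code): return (train_idx, test_idx)
    so that each class appears in the test set in roughly the same
    proportion as in the full dataset."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        # Split each class separately; force at least one test sample so a
        # rare class is never missing from evaluation (a class with a single
        # sample therefore ends up only in the test set).
        n_test = max(1, round(len(idx) * test_fraction))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy, made-up label distribution: heavily imbalanced like intrusion data.
labels = ["Benign"] * 950 + ["DoS"] * 40 + ["Infiltration"] * 10
train, test = stratified_split(labels, test_fraction=0.2)
print(Counter(labels[i] for i in test))
```

Because each class is split separately, the 10-sample minority class contributes exactly 2 samples to the test set instead of a random number that could be zero, which is what makes per-class metrics such as F1-score meaningful for rare attack types. In practice, scikit-learn's `train_test_split(..., stratify=y)` performs the same kind of split.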
Publication Date
Fall 12-4-2025
Recommended Citation
McDonald, Lance; Olson, Jake; and Campoverde-Lema, Dennis, "An Empirical Analysis of the Impact of Stratified Sampling on the Performance and Robustness of Boosting Algorithms for Imbalanced Network Intrusion Detection" (2025). Student Scholarship. 15.
https://metroworks.metrostate.edu/student-scholarship/15
Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Comments
Fall 2025: Student Research Conference