Department

Computer Science and Cybersecurity

Document Type

Poster

Abstract

The prevalence of imbalanced data in network intrusion detection, where benign instances vastly outnumber malicious ones, poses a significant challenge to machine learning classifiers. Boosting algorithms, while powerful in their own right, often perform poorly on the minority classes in such data. This study provides an empirical analysis of how stratified sampling may alleviate some of the issues that arise from this imbalance. We examine the performance and robustness of boosting algorithms such as LightGBM, XGBoost, and CatBoost when trained on data processed with stratified sampling. The models are trained on the CSE-CIC-IDS2018 dataset from the University of New Brunswick and evaluated on metrics such as F1-score, AUC-ROC, and per-class accuracy. The results indicate that stratified sampling has a positive effect on each model's ability to perform on a previously unseen test set, although there is likely much room for improvement.
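
The abstract does not describe the exact sampling procedure, so the following is only a minimal sketch of one common form of stratified sampling: a train/test split that preserves each class's share of the data. The function name `stratified_split`, the 95/5 class ratio, and the 20% test fraction are illustrative assumptions, not details taken from the poster.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices into train/test sets while keeping class proportions.

    Hypothetical helper for illustration; not the poster's actual code.
    """
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        # Shuffle within each class, then carve off the same fraction
        # from every class so the minority class is not lost by chance.
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])

    return sorted(train_idx), sorted(test_idx)


# Toy imbalanced label set (assumed ratio): 950 benign, 50 attack.
labels = ["benign"] * 950 + ["attack"] * 50
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=42)

attack_in_test = sum(1 for i in test_idx if labels[i] == "attack")
attack_in_train = sum(1 for i in train_idx if labels[i] == "attack")
```

With a purely random split, a rare attack class can end up badly under-represented in either partition; splitting per class keeps the minority share the same in the training and test sets.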

Publication Date

Fall 12-4-2025

Comments

Fall 2025: Student Research Conference
