Preprocessing and Feature Selection Techniques for Enhancing AI Model Performance on Intrusion Detection System Datasets

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Effective preprocessing and feature selection are pivotal for optimizing AI model performance in cybersecurity. This work focuses on the application of advanced preprocessing techniques to the CSECICIDS2017 and CICIDS2018 datasets, emphasizing the use of Naive Bayes and Random Forest algorithms for feature selection alongside Correlation-based Feature Selection (CFS). These methods identify the most relevant features, ensuring the refinement of data for subsequent analysis. Additionally, t-SNE (t-distributed Stochastic Neighbor Embedding) is employed for visualizing high-dimensional data, providing insights into feature distribution and model performance. These methodologies aim to streamline the preprocessing pipeline, improve feature relevance, and facilitate better understanding of data patterns, ultimately advancing the utility of machine learning models in cybersecurity.
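
The abstract names Correlation-based Feature Selection (CFS) but does not describe its implementation. As a rough illustration only, the sketch below computes the standard CFS merit, k·r_cf / sqrt(k + k(k−1)·r_ff), and selects features by greedy forward search. It assumes absolute Pearson correlation as the association measure and uses a tiny invented table; the feature names (`flow_duration`, `pkt_count`, `noise`) and values are hypothetical and are not taken from the CIC-IDS datasets or from the chapter.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def cfs_merit(subset, columns, label):
    """CFS merit: rewards feature-label correlation, penalizes feature-feature redundancy."""
    k = len(subset)
    r_cf = mean(abs(pearson(columns[f], label)) for f in subset)
    if k == 1:
        return r_cf
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = mean(abs(pearson(columns[a], columns[b])) for a, b in pairs)
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


def cfs_forward_select(columns, label):
    """Greedy forward search: add the best feature until the merit stops improving."""
    selected, best = [], 0.0
    remaining = list(columns)
    while remaining:
        scored = [(cfs_merit(selected + [f], columns, label), f) for f in remaining]
        merit, feat = max(scored)
        if merit <= best:
            break
        selected.append(feat)
        remaining.remove(feat)
        best = merit
    return selected, best


# Hypothetical toy data: 0 = benign, 1 = attack (not real CIC-IDS records).
label = [0, 0, 0, 1, 1, 1]
columns = {
    "flow_duration": [1, 2, 1, 8, 9, 8],    # informative
    "pkt_count": [2, 4, 3, 16, 18, 17],     # informative, nearly redundant with flow_duration
    "noise": [5, 1, 4, 2, 5, 3],            # uncorrelated with the label
}

selected, merit = cfs_forward_select(columns, label)
print(selected, round(merit, 3))
```

On this toy table the search keeps a single feature: the two informative columns are almost perfectly correlated with each other, so adding the second lowers the merit, which is the redundancy penalty CFS is designed to apply.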

Original language: English
Title of host publication: Digital Technologies and Applications - Proceedings of ICDTA 2025, Volume 1
Editors: Saad Motahhir, Badre Bossoufi, Josep M. Guerrero
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 113-125
Number of pages: 13
ISBN (Print): 9783032077172
DOIs
Publication status: Published - 2026
Event: 5th International Conference on Digital Technologies and Applications, ICDTA 2025 - Ifrane, Morocco
Duration: 17 Apr 2025 → 18 Apr 2025

Publication series

Name: Lecture Notes in Networks and Systems
Volume: 1639 LNNS
ISSN (Print): 2367-3370
ISSN (Electronic): 2367-3389

Conference

Conference: 5th International Conference on Digital Technologies and Applications, ICDTA 2025
Country/Territory: Morocco
City: Ifrane
Period: 17/04/25 → 18/04/25

Keywords

  • CFS
  • CIC-IDS datasets
  • Feature selection
  • Naive Bayes
  • Preprocessing
  • Random forest
  • t-SNE
