CSJM v.33, n.3 (99), 2025

Improving Anomaly Detection in the HDFS Dataset with Novel Machine Learning Models and Techniques

Authors: Mohammed Bekkouche, Sidi Mohammed Benslimane
Keywords: Anomaly Detection, Log Data, HDFS dataset, Machine Learning, Feature Extraction.

Abstract

With the growing scale and complexity of log data, manual anomaly detection has become increasingly time-consuming and error-prone, necessitating the development of robust machine learning-based solutions. The HDFS (Hadoop Distributed File System) dataset, a large-scale real-world log collection, serves as a standard benchmark for evaluating both supervised and unsupervised anomaly detection methods. Available in two variants—a reduced and a complete version—this dataset facilitates comprehensive performance comparisons. Our empirical analysis reveals significant limitations in existing approaches: supervised methods exhibit poor performance on the reduced dataset, while most unsupervised techniques underperform across both versions. To address these shortcomings, we introduce several novel machine learning approaches for log-based anomaly detection. Additionally, we investigate the effects of alternative feature extraction techniques. We also examine the application of Synthetic Minority Over-sampling Technique (SMOTE) to mitigate class imbalance in supervised learning, as well as the incorporation of temporal features encoding inter-log time intervals. Our experimental results demonstrate that the proposed methods achieve statistically significant improvements in detection accuracy over existing approaches on both HDFS dataset variants, establishing new benchmarks for log-based anomaly detection.

Mohammed Bekkouche
ORCID: https://orcid.org/0000-0002-8305-0542
LabRI-SBA Laboratory, Ecole Superieure en Informatique,
Sidi Bel Abbes, Algeria
BP. 73, Bureau de poste EL WIAM, Sidi Bel Abbes, 22016, Algeria

Sidi Mohammed Benslimane
ORCID: https://orcid.org/0000-0002-7008-7434
LabRI-SBA Laboratory, Ecole Superieure en Informatique,
Sidi Bel Abbes, Algeria
BP. 73, Bureau de poste EL WIAM, Sidi Bel Abbes, 22016, Algeria
351

DOI

https://doi.org/10.56415/csjm.v33.16
