TY - JOUR
T1 - Datasets for distributed denial-of-service detection in healthcare internet of things environments
AU - Akhi, Mirza
AU - Eising, Ciarán
AU - Dhirani, Lubna Luxmi
N1 - Publisher Copyright: © 2025 The Author(s).
PY - 2025/12
Y1 - 2025/12
N2 - The growing number of Internet of Things (IoT) devices in healthcare settings raises critical concerns about security, particularly in defending against Distributed Denial-of-Service (DDoS) attacks. These attacks can cause operational downtime in IoT environments. To mitigate DDoS-based attacks, advanced defense-in-depth strategies and well-labeled datasets are required for Healthcare-IoT (H-IoT), IoT, and other distributed computing contexts. This article presents two labeled datasets, UL-ECE-MQTT-DDoS-H-IoT2025 and UL-ECE-UDP-DDoS-H-IoT2025, generated by simulating realistic traffic patterns under both normal and DDoS conditions using the Cooja and ns-3 simulators. In Cooja, the raw dataset records healthcare-specific Message Queuing Telemetry Transport (MQTT) traffic (e.g., simulated oxygen level of 100%) randomly generated by emulated H-IoT sensors. It also includes message counts and network metadata that enable detailed analysis across both application and network layers. In ns-3, the raw data comprises 5G-enabled H-IoT network traces from all nodes, capturing timestamps, payload size, and the header details of the User Datagram Protocol (UDP). Existing benchmark datasets mainly consist of generic network traffic attributes, including packet IDs, protocol types, and timestamps. In contrast, the proposed datasets address this gap by incorporating H-IoT-specific communication parameters that closely resemble real-world conditions, such as node-level message counts and monitoring frequencies. This inclusion provides a realistic representation of communication patterns for security and performance research in H-IoT. The datasets enable detailed analysis of key features for detecting DDoS threats, including UDP flood variants extending beyond the H-IoT domain. This characteristic makes them directly usable for developing, testing, and comparing machine learning (ML) and deep learning (DL) models across diverse IoT security contexts.
The MQTT-based dataset is derived from a 5-hour simulation run using the Cooja simulator, which emulates wearable sensors such as body temperature, heart rate, and oxygen saturation. In this setup, normal H-IoT nodes transmit data to the server at 60-second intervals, while DDoS-affected nodes publish data at 20-second intervals to simulate a higher transmission frequency. The UDP-based dataset is derived from a 120-second simulation conducted using the ns-3 simulator, which simulates a 5G-enabled H-IoT environment. In this scenario, normal and malicious nodes transmit data at 124 kbps and 248 kbps, respectively. Both datasets are processed from raw simulation logs converted into structured CSV files using Python scripts. The CSV files contain features such as timestamp, payload size, message frequency, and node-level communication statistics. The UL-ECE-MQTT-DDoS-H-IoT2025 and UL-ECE-UDP-DDoS-H-IoT2025 datasets contain approximately 20,080 and 99,887 records, respectively. The primary objective of creating these datasets is to enhance security in healthcare IoT ecosystems by enabling robust detection of advanced cyber threats. In line with this objective, the datasets support the development of ML/DL-based cybersecurity mechanisms. In addition, this resource forms a foundation for future research, motivating the creation of new datasets for emerging attack scenarios.
AB - The growing number of Internet of Things (IoT) devices in healthcare settings raises critical concerns about security, particularly in defending against Distributed Denial-of-Service (DDoS) attacks. These attacks can cause operational downtime in IoT environments. To mitigate DDoS-based attacks, advanced defense-in-depth strategies and well-labeled datasets are required for Healthcare-IoT (H-IoT), IoT, and other distributed computing contexts. This article presents two labeled datasets, UL-ECE-MQTT-DDoS-H-IoT2025 and UL-ECE-UDP-DDoS-H-IoT2025, generated by simulating realistic traffic patterns under both normal and DDoS conditions using the Cooja and ns-3 simulators. In Cooja, the raw dataset records healthcare-specific Message Queuing Telemetry Transport (MQTT) traffic (e.g., simulated oxygen level of 100%) randomly generated by emulated H-IoT sensors. It also includes message counts and network metadata that enable detailed analysis across both application and network layers. In ns-3, the raw data comprises 5G-enabled H-IoT network traces from all nodes, capturing timestamps, payload size, and the header details of the User Datagram Protocol (UDP). Existing benchmark datasets mainly consist of generic network traffic attributes, including packet IDs, protocol types, and timestamps. In contrast, the proposed datasets address this gap by incorporating H-IoT-specific communication parameters that closely resemble real-world conditions, such as node-level message counts and monitoring frequencies. This inclusion provides a realistic representation of communication patterns for security and performance research in H-IoT. The datasets enable detailed analysis of key features for detecting DDoS threats, including UDP flood variants extending beyond the H-IoT domain. This characteristic makes them directly usable for developing, testing, and comparing machine learning (ML) and deep learning (DL) models across diverse IoT security contexts.
The MQTT-based dataset is derived from a 5-hour simulation run using the Cooja simulator, which emulates wearable sensors such as body temperature, heart rate, and oxygen saturation. In this setup, normal H-IoT nodes transmit data to the server at 60-second intervals, while DDoS-affected nodes publish data at 20-second intervals to simulate a higher transmission frequency. The UDP-based dataset is derived from a 120-second simulation conducted using the ns-3 simulator, which simulates a 5G-enabled H-IoT environment. In this scenario, normal and malicious nodes transmit data at 124 kbps and 248 kbps, respectively. Both datasets are processed from raw simulation logs converted into structured CSV files using Python scripts. The CSV files contain features such as timestamp, payload size, message frequency, and node-level communication statistics. The UL-ECE-MQTT-DDoS-H-IoT2025 and UL-ECE-UDP-DDoS-H-IoT2025 datasets contain approximately 20,080 and 99,887 records, respectively. The primary objective of creating these datasets is to enhance security in healthcare IoT ecosystems by enabling robust detection of advanced cyber threats. In line with this objective, the datasets support the development of ML/DL-based cybersecurity mechanisms. In addition, this resource forms a foundation for future research, motivating the creation of new datasets for emerging attack scenarios.
KW - Anomaly detection
KW - Cooja simulator
KW - Cybersecurity
KW - IoT traffic features
KW - Machine Learning for cyber defense
KW - Network simulation
KW - ns-3 simulator
KW - Traffic analysis
UR - https://www.scopus.com/pages/publications/105024443093
U2 - 10.1016/j.dib.2025.112222
DO - 10.1016/j.dib.2025.112222
M3 - Article
AN - SCOPUS:105024443093
SN - 2352-3409
VL - 63
JO - Data in Brief
JF - Data in Brief
M1 - 112222
ER -