Hadoop Dataset for Job Estimation in the Cloud with Limited Bandwidth

Mohammed Bergui, Nikola S. Nikolov, Said Najah

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Hadoop MapReduce is a well-known open-source framework for processing large amounts of data on a cluster of machines; it has been adopted by many organizations and deployed both on premises and in the cloud. Estimating and predicting MapReduce job execution time is crucial for efficient scheduling, resource management, reduced energy consumption, and cost savings. In this paper, we present a new dataset of MapReduce job traces collected in a cloud environment with limited network bandwidth, and we describe how the dataset was generated and collected. We believe that this dataset will help researchers develop new scheduling approaches and improve Hadoop MapReduce job performance.

Original language: English
Title of host publication: Advances in Information and Communication - Proceedings of the 2023 Future of Information and Communication Conference FICC
Editors: Kohei Arai
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 341-348
Number of pages: 8
ISBN (Print): 9783031280726
Publication status: Published - 2023
Event: 8th Future of Information and Communication Conference, FICC 2023 - Virtual, Online
Duration: 2 Mar 2023 to 3 Mar 2023

Publication series

Name: Lecture Notes in Networks and Systems
Volume: 652 LNNS
ISSN (Print): 2367-3370
ISSN (Electronic): 2367-3389

Conference

Conference: 8th Future of Information and Communication Conference, FICC 2023
City: Virtual, Online
Period: 2/03/23 to 3/03/23

Keywords

  • Bandwidth
  • Cloud computing
  • Estimating the runtime
  • Hadoop
  • MapReduce
