DAG-CPM Scheduler for Parallel Execution of Critical Jobs
D C Vinutha1, G T Raju2

1D C V*, Research Scholar, Dept. of CSE, RNS Institute of Technology, Bengaluru; Associate Professor, Dept. of ISE, Vidyavardhaka College of Engineering, Mysuru. Visvesvaraya Technological University, Belagavi, Karnataka, India.
2G T R, Professor, Dept. of CSE, RNS Institute of Technology, Bengaluru. Visvesvaraya Technological University, Belagavi, Karnataka, India.
Manuscript received on July 20, 2019. | Revised Manuscript received on August 10, 2019. | Manuscript published on August 30, 2019. | PP: 467-474 | Volume-8 Issue-6, August 2019. | Retrieval Number: E7862068519/2019©BEIESP | DOI: 10.35940/ijeat.E7862.088619
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: MapReduce applications with multiple jobs may have inter-job dependencies; for example, the iterative Page View application [2] performs the required operation over several iterations before generating the result, and each iteration is treated as a single job. Conventional Hadoop MapReduce schedules jobs sequentially and is not customized to handle multi-job applications; in particular, it does not execute dependent jobs in parallel, which prolongs the time to complete all jobs. Therefore, a new scheduler, the DAG–CPM Scheduler, uses the critical-path job-scheduling model to identify the jobs on the critical path. Critical-path job scheduling is optimized to support multi-job applications: the critical path is a series of jobs such that, if the execution of any job on it is delayed, the time required to execute all jobs is prolonged. The DAG–CPM Scheduler schedules multiple jobs by dynamically constructing the job-dependency DAG for the currently running jobs, based on each job's input and output. The DAG represents the dependencies among the jobs; this dependency graph is used to insert a pipeline that feeds the output of one job as input to the map tasks of another, so dependent jobs execute in parallel, resulting in a substantial reduction in application execution time. Experimental analysis of the proposed approach was carried out on the Page View application over academic and research web-server log files (NASA and rnsit.ac.in, 10 GB data set); PigMix2 was executed on an 8 GB data set. Experimental results reveal that the average execution time decreases by 41% compared to Hadoop for the Page View application, execution speed is 37.7% faster compared to Pig, and the DAG–CPM Scheduler runs 24.3% faster than the DAG–CPM Scheduler without pipelining.
Keywords: Critical Path, DAG, MapReduce, Pipeline.
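The critical-path idea described in the abstract amounts to finding the longest-duration chain of dependent jobs in the job DAG. The following Python snippet is a minimal illustrative sketch of that computation, not the authors' implementation; the job names, durations, and `critical_path` helper are hypothetical, introduced only to show the technique.

```python
def critical_path(durations, deps):
    """Return the longest-duration chain of jobs in a job DAG.

    durations: dict mapping job -> estimated running time
    deps: dict mapping job -> list of jobs it depends on (its inputs)
    """
    # Topologically order the jobs via depth-first search.
    order, seen = [], set()

    def visit(job):
        if job in seen:
            return
        seen.add(job)
        for dep in deps.get(job, []):
            visit(dep)
        order.append(job)

    for job in durations:
        visit(job)

    # Compute the latest finish time for each job and remember which
    # predecessor determined it (the critical predecessor).
    finish, pred = {}, {}
    for job in order:
        start, pred[job] = 0, None
        for dep in deps.get(job, []):
            if finish[dep] > start:
                start, pred[job] = finish[dep], dep
        finish[job] = start + durations[job]

    # Walk back from the job with the latest finish time.
    job, path = max(finish, key=finish.get), []
    while job is not None:
        path.append(job)
        job = pred[job]
    return list(reversed(path)), max(finish.values())


# Hypothetical four-job application: C depends on A and B, D depends on C.
path, total = critical_path(
    {"A": 3, "B": 2, "C": 4, "D": 1},
    {"C": ["A", "B"], "D": ["C"]},
)
print(path, total)  # ['A', 'C', 'D'] 8
```

Delaying any job on the returned path (A, C, or D) delays the whole application, whereas job B has slack; a critical-path scheduler therefore prioritizes resources for the jobs on this chain.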