Research On MapReduce And Related Sorting Problems In Big Dat | | Posted on:2024-11-12 | Degree:Doctor | Type:Dissertation | | Country:China | Candidate:Q C Zheng | Full Text:PDF | | GTID:1520307211996829 | Subject:Operational Research and Cybernetics | | Abstract/Summary: | PDF Full Text Request | | Scheduling theory is an important branch of combinatorial optimization,which has wide applications in computer systems,cloud manufacturing,transportation scheduling,and other fields.This thesis mainly considers two kinds of scheduling problems in big data:MapReduce scheduling problems and scheduling problems with rejection.MapReduce is a programming model designed for data analysis and computation in big data environments.MapReduce scheduling mainly studies the scheduling problem of maximizing the minimum machine completion time and the semi-online scheduling problem of minimizing the maximum completion time in the MapReduce system.In the MapReduce scheduling problems,each job has a set of map tasks and a set of reduce tasks.Each map task can be arbitrarily split into multiple subtasks,which can then be processed simultaneously on different machines.Reduce tasks,on the other hand,do not allow for parallel processing but can be interrupted and resumed at a later time.Additionally,the reduce tasks of a job cannot be processed until all related map tasks have been completed.In big data processing,utilizing scheduling with rejection allows for more efficient handling of large datasets and improves performance and efficiency.Scheduling with rejection mainly studies the scheduling problem with the goal of minimizing the maximum completion time under the constraint that the total rejection cost does not exceed a given threshold.The entire paper is divided into six chapters,with the following specific arrangements:In Chapter 1,we first introduce the basic concepts related to scheduling problems and the necessary preliminary knowledge.We then summarize the research findings related to MapReduce scheduling problems,machine covering problems,semi-online scheduling problems,and scheduling problems with rejection pertinent to this article.Finally,a brief summary of the main results of this thesis is provided.In Chapter 2,we primarily discuss the uniform MapReduce off-line scheduling problem,where the reduce tasks are preemptive.The goal is to maximize the minimum machine completion time.We first present an offline optimal algorithm for the case of two uniform machines and then design an offline optimal algorithm for three uniform machines based on this.In Chapter 3,we primarily investigate the MapReduce online problem for two parallel machines.The jobs arrive over-list.The objective function is to maximize the minimum machine completion time.We suppose that reduce tasks are preemptive,and consider both the same and different machine speeds,respectively.For the case of two identical machines,we prove the lower bound of the problem and design a best possible algorithm with a competitive ratio of(2+(?))/2.We extend the case of two identical machines to the case of two uniform machines and prove that the lower bound of the problem is((?)+s2+s+2)/(2s+2),where s is the ratio of the speeds of the two machines.We then provide a best possible algorithm with an equal competition ratio and lower bound.In Chapter 4,we mainly focus on the MapReduce semi-online scheduling problem for two identical machines,with the objective of minimizing the makespan.We consider a special case where each job consists of only one map task and one reduce task,and the processing time of the map task is not greater than that of the reduce task.Under this assumption,we study two cases:first,the largest processing time of the reduce task among all jobs is known;second,the largest processing time of the reduce task and the sum of the processing times of all jobs are known.For the first case,a best possible semi-online algorithm with a competitive ratio of 4/3 is designed.For the second case,an approximation algorithm with a competitive ratio of 4/3 is developed,which is best possible under certain conditions.In Chapter 5,we primarily investigate the scheduling problem with rejection,where the total cost of rejected jobs does not exceed a given threshold.The aim is to minimize the maximum completion time of the accepted jobs.We show that this problem is NP-hard for a single machine,which also implies that the other scheduling problems with rejection studied in this chapter are NP-hard.For the case of m identical machines,a pseudo-polynomial time algorithm and two fully polynomial-time approximation schemes are devised.For the case with m identical machines where jobs have arrival times,a pseudo-polynomial time dynamic programming algorithm and a fully polynomial-time approximation scheme are designed for fixed m,and an approximation algorithm with an approximation ratio of 2 is developed when m is arbitrary.Finally,the online scheduling problem with rejection for a single machine is considered,demonstrating that no online algorithm can have a constant competitive ratio,even if there are only two distinct job arrival times.In Chapter 6,we summarize the main achievements of this study and propose a series of questions and research directions that need to be further explored in response to the limitations of current research and potential future development trends. | | Keywords/Search Tags: | Scheduling problem, Best possible algorithm, Fully polynomial-time approximation scheme, Dynamic programming algorithm, Competitive ratio, Machine covering, Semi-online, Scheduling with rejection, MapReduce | PDF Full Text Request | Related items |
| |
|