
Research on Scalable DEA Model Computation in the Big Data Context

Posted on: 2021-02-25
Degree: Master
Type: Thesis
Country: China
Candidate: R F Wang
Full Text: PDF
GTID: 2428330614959902
Subject: Management Science and Engineering
Abstract/Summary:
Data envelopment analysis (DEA), a method for evaluating the relative efficiency of decision making units (DMUs), has been widely applied in practice because of its strength as a management and decision-making tool. The traditional approach to DEA efficiency computation solves the standard DEA model for each DMU in turn. As the number of DMUs grows, the solution time increases rapidly, making this approach ill-suited to applications with large numbers of DMUs. After reviewing the literature on DEA computation for big data, this thesis groups the existing methods into two categories, corresponding to solutions for low-density and high-density data environments, and proposes a strategy for each to improve computational efficiency and reduce computation time.

In the low-density big data environment, this thesis proposes a two-stage computation method that exploits the structure of the DEA model: the first stage identifies the set of efficient DMUs, and the second stage uses this set to compute the efficiency scores of the remaining inefficient DMUs. In the first stage, a scalable hierarchical decomposition algorithm based on the Message Passing Interface (MPI) decomposes the large-scale dataset into many small datasets for parallel computation. Within each small dataset, a single machine uses MPI to launch multiple processes in parallel, combined with a preprocessing step that quickly removes inefficient DMUs to cut computation time. In the second stage, when the number of efficient DMUs is still too large to serve directly as the reference set, an improved trial-and-error method is proposed: it uses information gathered in the first stage to select a better initial set of reference points, then repeatedly performs an optimality test and re-selects sample points until the optimum is reached and the efficiency scores of all remaining inefficient DMUs are obtained. Experiments show that, when the dimension is low, the improved method reduces the number of iterations by more than half.

In the high-density big data context, this thesis takes Dantzig-Wolfe decomposition as the starting point for exploiting the structure of the DEA model. The model is first transformed into a block-diagonal structure and then divided into a master problem and multiple subproblems. By combining MPI master-slave programming with a column generation algorithm, the efficiency score of each DMU is obtained one by one.

Finally, on datasets from multiple sources, the proposed strategies are compared with two methods from the recent literature. In the low-density big data environment, the combined strategy reduces computation time severalfold. The thesis also compares the proposed high-density algorithm with the trial-and-error method theoretically, and the experiments show that in most cases its computation time is lower than that of the trial-and-error method.
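To make the computational bottleneck concrete, the following minimal sketch (in Python with illustrative data; the thesis does not prescribe an implementation) solves the standard input-oriented CCR envelopment model once per DMU, which is the traditional per-DMU approach described above:

```python
import numpy as np
from scipy.optimize import linprog

def ccr_efficiency(X, Y, k):
    """CCR efficiency of DMU k. X: (m, n) inputs, Y: (s, n) outputs."""
    m, n = X.shape
    s = Y.shape[0]
    # Variables z = [theta, lambda_1, ..., lambda_n]; minimize theta.
    c = np.concatenate(([1.0], np.zeros(n)))
    # Input rows:  sum_j lambda_j x_ij <= theta * x_ik
    A_in = np.hstack((-X[:, [k]], X))
    # Output rows: sum_j lambda_j y_rj >= y_rk, written as <= rows.
    A_out = np.hstack((np.zeros((s, 1)), -Y))
    res = linprog(c,
                  A_ub=np.vstack((A_in, A_out)),
                  b_ub=np.concatenate((np.zeros(m), -Y[:, k])),
                  bounds=[(0, None)] * (n + 1),
                  method="highs")
    return res.fun  # theta* = 1 marks a (weakly) efficient DMU

# Illustrative data: the traditional approach solves one LP per DMU in turn.
rng = np.random.default_rng(0)
X = rng.random((2, 500)) + 0.1   # 2 inputs,  500 DMUs
Y = rng.random((1, 500)) + 0.1   # 1 output
scores = [ccr_efficiency(X, Y, k) for k in range(X.shape[1])]
```

Each evaluation is a linear program over all n DMUs, so scoring everything costs n LP solves whose size itself grows with n, which is why the total time rises rapidly with the number of DMUs.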
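The first-stage hierarchical decomposition can be illustrated with a hedged mpi4py sketch, reusing ccr_efficiency from the sketch above. It relies on a standard DEA property: a DMU efficient among all DMUs is also efficient within any subset containing it, so per-block screening never discards a globally efficient DMU. The thesis's multi-level hierarchy is simplified here to a single split-and-merge; run with, e.g., `mpiexec -n 4 python stage1.py` (filename illustrative).

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Identical synthetic data on every rank (fixed seed); 2 inputs, 1 output.
rng = np.random.default_rng(0)
X = rng.random((2, 2000)) + 0.1
Y = rng.random((1, 2000)) + 0.1

def efficient_subset(idx):
    """Indices in idx that are CCR-efficient relative to the block idx."""
    Xb, Yb = X[:, idx], Y[:, idx]
    return [idx[k] for k in range(len(idx))
            if ccr_efficiency(Xb, Yb, k) >= 1 - 1e-6]

my_block = list(range(rank, X.shape[1], size))   # round-robin partition
local_eff = comm.allgather(efficient_subset(my_block))

if rank == 0:
    merged = sorted(j for block in local_eff for j in block)
    E = efficient_subset(merged)   # joint re-screening gives the global set
```

Because each block is small, the per-block LPs are cheap, and only the (usually much smaller) merged candidate set needs a joint screening.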
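The abstract mentions a preprocessing step that removes inefficient DMUs without solving LPs but does not specify it; a common screen of this kind, shown below as an assumption rather than the thesis's exact method, is pairwise dominance: if some DMU uses no more of every input and produces no less of every output than DMU j (strictly better somewhere), then j cannot be strongly efficient and needs no LP solve.

```python
import numpy as np

def remove_dominated(X, Y):
    """Return indices of DMUs not dominated by any other DMU."""
    n = X.shape[1]
    keep = []
    for j in range(n):
        dominated = any(
            i != j
            and np.all(X[:, i] <= X[:, j]) and np.all(Y[:, i] >= Y[:, j])
            and (np.any(X[:, i] < X[:, j]) or np.any(Y[:, i] > Y[:, j]))
            for i in range(n)
        )
        if not dominated:
            keep.append(j)
    return keep
```

The check is quadratic in the block size but involves only comparisons, so within a small block it is far cheaper than the LP solves it avoids.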
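The improved trial-and-error procedure of the second stage is only outlined in the abstract. The sketch below shows the generic loop such a method refines, under stated assumptions: each remaining DMU is scored against a small sample of the efficient set E, optimality is tested by pricing the excluded efficient DMUs with the LP duals (the same test used in column generation), and the sample is enlarged and the LP re-solved while some excluded DMU could still improve the score. The sample size and selection rule are arbitrary placeholders, not the thesis's improved rule.

```python
import numpy as np
from scipy.optimize import linprog

def score_against(X, Y, k, S):
    """CCR LP for DMU k restricted to reference set S; returns theta, duals."""
    m, s = X.shape[0], Y.shape[0]
    c = np.concatenate(([1.0], np.zeros(len(S))))
    A = np.vstack((np.hstack((-X[:, [k]], X[:, S])),
                   np.hstack((np.zeros((s, 1)), -Y[:, S]))))
    b = np.concatenate((np.zeros(m), -Y[:, k]))
    res = linprog(c, A_ub=A, b_ub=b,
                  bounds=[(0, None)] * (len(S) + 1), method="highs")
    return res.fun, res.ineqlin.marginals  # duals u <= 0 of the <= rows

def trial_and_error_score(X, Y, k, E, step=50):
    """Efficiency of DMU k against the efficient set E, via growing samples."""
    S = list(E[:step])                       # initial sample (placeholder rule)
    while True:
        theta, u = score_against(X, Y, k, S)
        u_in, u_out = u[:X.shape[0]], u[X.shape[0]:]
        # Pricing/optimality test: the lambda-column of DMU j (cost 0) has a
        # negative reduced cost iff u_in . x_j - u_out . y_j > 0.
        entering = [j for j in E
                    if j not in S and u_in @ X[:, j] - u_out @ Y[:, j] > 1e-9]
        if not entering:
            return theta                     # optimal relative to all of E
        S += entering[:step]                 # re-select sample points, re-solve
```

Scoring against the efficient set alone is valid because only efficient DMUs span the frontier, which is what makes the first-stage output reusable here.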
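For the high-density case, the abstract combines Dantzig-Wolfe decomposition, column generation, and MPI master-slave programming, but the decomposition details are not given. The hedged sketch below therefore shows only the coordination pattern: a master rank hands out work units (here, DMU indices) and workers return results; solve_dmu is a hypothetical stand-in for the column-generation solve of one DMU's model.

```python
from mpi4py import MPI

TAG_WORK, TAG_DONE = 1, 2

def solve_dmu(k):
    """Hypothetical stand-in for the column-generation solve of DMU k."""
    return 1.0  # would return DMU k's efficiency score

def master(comm, n_dmus):
    scores, status = {}, MPI.Status()
    next_job, active = 0, 0
    for w in range(1, comm.Get_size()):       # seed every worker with a job
        if next_job < n_dmus:
            comm.send(next_job, dest=w, tag=TAG_WORK)
            next_job += 1
            active += 1
        else:
            comm.send(None, dest=w, tag=TAG_WORK)
    while active:                             # hand out jobs until all are done
        k, theta = comm.recv(source=MPI.ANY_SOURCE, tag=TAG_DONE, status=status)
        scores[k] = theta
        if next_job < n_dmus:
            comm.send(next_job, dest=status.Get_source(), tag=TAG_WORK)
            next_job += 1
        else:
            comm.send(None, dest=status.Get_source(), tag=TAG_WORK)
            active -= 1
    return scores

def worker(comm):
    while True:
        k = comm.recv(source=0, tag=TAG_WORK)
        if k is None:
            break
        comm.send((k, solve_dmu(k)), dest=0, tag=TAG_DONE)

comm = MPI.COMM_WORLD
if comm.Get_rank() == 0:
    scores = master(comm, n_dmus=10_000)
else:
    worker(comm)
```

The dynamic job queue keeps workers busy even when individual DMU solves take uneven time, which matters once column generation makes per-DMU cost variable.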
Keywords/Search Tags: Data envelopment analysis (DEA), Big data, Message Passing Interface (MPI), Parallel computing, Dantzig-Wolfe decomposition