Study On The MapReduce Framework For Genetic Algorithm Based Distributed Data Mining

Posted on: 2017-07-02, Degree: Master, Type: Thesis
Country: China, Candidate: L M Han, Full Text: PDF
GTID: 2348330515464179, Subject: Computer technology engineering
Abstract/Summary:
With the rapid development of information technology in recent years, massive amounts of data have been generated, both directly and indirectly. This poses a new challenge to traditional data mining algorithms, and how to improve their versatility and performance in a massive-data environment has become a research hotspot. To address this problem, researchers have integrated traditional data mining algorithms with emerging technologies such as cloud computing platforms, using distributed computing to improve algorithm performance, and have achieved good results. However, data mining algorithms are highly varied and each one requires its own specific implementation, so there is no common framework that accommodates this diversity while also improving performance.

Building on previous work, we propose a MapReduce framework for Genetic Algorithm based distributed data mining, designed to help users handle a more general range of data mining algorithms and improve their performance. MapReduce is one key element of the framework and provides the distributed computing power. The other key element is the Genetic Algorithm, which has good global search and optimization capabilities and searches for the optimal solution by simulating the evolution of a population. Users therefore only need to implement their own Genetic Algorithm and do not have to worry about parallelizing it.

The main contribution of this paper is a MapReduce framework for Genetic Algorithm based distributed data mining in a big data environment. The framework is divided into a Core layer and a User layer. The Core layer encapsulates the MapReduce operations, and the User layer provides the interfaces that users extend to implement the Genetic Algorithm for a specific problem, so that data mining algorithms can handle massive data effectively.

The framework consists of six components. The main component is the Driver, which handles user interaction and is responsible for starting MapReduce jobs on the cluster. The Generator calls the Genetic Algorithm in the User layer and, together with the Driver, starts the job that evolves the population. The Terminator determines whether the conditions for terminating the Generator are met. The Initialiser is responsible for initializing the population and is optional. The Migrator is responsible for the population migration strategy, which is implemented by a User layer function. The last component is the SolutionFilter, which selects the qualified individuals. The components cooperate to carry out the framework's functions.

We verify the framework's performance with three algorithms. The first is the K-Medoids clustering algorithm: we designed and implemented a Genetic Algorithm for K-Medoids that takes clustering accuracy as the individual fitness value and uses MapReduce to strengthen the clustering computation. Experiments show good results in both clustering quality and performance. The second is the Traveling Salesman Problem (TSP): we designed and implemented a Genetic Algorithm for TSP that takes the reciprocal of the tour distance as the fitness value, so that a higher fitness value corresponds to a shorter tour. The experimental results show that TSP runs more efficiently under the framework than comparable algorithms and can quickly find the optimal solution on big data. The third is the Feature Subset Selection (FSS) problem: we designed and implemented a Genetic Algorithm for FSS that takes classification accuracy as the fitness value. The experimental results show that FSS running in the framework converges more rapidly and improves classification accuracy.

In summary, the MapReduce framework for Genetic Algorithm based distributed data mining performs well in a big data environment when mining algorithms are implemented as problem-specific Genetic Algorithms. It uses distributed computing to improve algorithm performance while exploiting the Genetic Algorithm's global search and optimization ability to find the optimal solution quickly. The study shows that the framework helps data mining algorithms handle massive data, improving both their results and their performance.
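
The cooperation of the six components can be illustrated with a small single-process sketch. It is not the thesis implementation: the MapReduce job is replaced by a plain loop over sub-populations, and every name, the city coordinates, the swap mutation, the ring migration and the fixed generation budget are assumptions chosen for illustration. Only the TSP fitness (the reciprocal of the tour distance) and the component roles are taken from the abstract.

```python
import math
import random

# Hypothetical city coordinates; the abstract does not describe the data set.
CITIES = [(0, 0), (0, 3), (4, 3), (4, 0), (2, 5)]

def tour_length(tour):
    """Length of the closed tour visiting the cities in the given order."""
    return sum(math.dist(CITIES[a], CITIES[b])
               for a, b in zip(tour, tour[1:] + tour[:1]))

def fitness(tour):
    """Reciprocal of the tour distance: a shorter tour scores higher."""
    return 1.0 / tour_length(tour)

def initialise(rng, islands, size):
    """Initialiser: random permutations split into sub-populations."""
    n = len(CITIES)
    return [[rng.sample(range(n), n) for _ in range(size)]
            for _ in range(islands)]

def mutate(rng, tour):
    """Swap mutation (an assumed operator); keeps a valid permutation."""
    child = tour[:]
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child

def generate(rng, island):
    """Generator: one generation on one sub-population.
    In the real framework this work would run inside a MapReduce job."""
    children = [mutate(rng, t) for t in island]
    # Keep the fittest individuals among parents and children.
    return sorted(island + children, key=fitness, reverse=True)[:len(island)]

def migrate(islands):
    """Migrator: ring migration (assumed strategy); each island's best
    individual replaces the worst individual of the next island."""
    bests = [max(isl, key=fitness) for isl in islands]
    for k, isl in enumerate(islands):
        isl.sort(key=fitness, reverse=True)
        isl[-1] = bests[k - 1][:]
    return islands

def terminated(generation, max_generations):
    """Terminator: stop after a fixed number of generations (assumed rule)."""
    return generation >= max_generations

def solution_filter(islands, keep):
    """SolutionFilter: the `keep` fittest individuals over all islands."""
    everyone = [t for isl in islands for t in isl]
    return sorted(everyone, key=fitness, reverse=True)[:keep]

def driver(seed=0, islands=3, size=8, max_generations=30):
    """Driver: wires the components together and runs the evolution loop."""
    rng = random.Random(seed)
    population = initialise(rng, islands, size)
    generation = 0
    while not terminated(generation, max_generations):
        population = [generate(rng, isl) for isl in population]
        population = migrate(population)
        generation += 1
    return solution_filter(population, keep=1)[0]

best = driver()
print(best, round(tour_length(best), 3))
```

In this sketch only `fitness` and `mutate` are problem-specific. Replacing them with a clustering-accuracy or classification-accuracy fitness would leave the other components unchanged, which mirrors the split between the Core layer and the User layer described above.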
Keywords/Search Tags:Big data, MapReduce, Genetic Algorithm, Data Mining, Framework