Research And Application Of Data Mining Algorithms Based On Hadoop

Posted on: 2020-04-09
Degree: Master
Type: Thesis
Country: China
Candidate: L Zhou
Full Text: PDF
GTID: 2428330578465311
Subject: Computer application technology
Abstract/Summary:
In the information age, more and more people recognize the importance of data. Because data come in many types and huge volumes, how to process and integrate large-scale data has become a research hotspot. Data mining algorithms can process data, but their inherent limitations, together with the sheer data volume, make it difficult for them to handle large-scale data sets. Combining data mining algorithms with the Hadoop platform is therefore a future research direction for data mining. There are too many data mining algorithms to study one by one, so the K-means clustering algorithm is taken as a representative example. The main contents of this thesis are as follows.

First, parallelization of an improved clustering algorithm based on attribute weight, namely WK-means. In the objective function of the K-means clustering algorithm, every attribute has a weight of 1, which means that all attributes are treated as equally important. In practice, each attribute influences sample classification to a different degree. Based on this observation, an improved K-means clustering algorithm based on attribute weight is proposed. To verify its effectiveness, the algorithm is migrated to Hadoop, and data sets are used to test the improved clustering algorithm running on Hadoop.

Second, parallelization of a clustering algorithm based on the genetic algorithm (CAGAK). Classical K-means has several shortcomings: the value of K must be determined in advance, the algorithm easily falls into a local optimum, and the result is affected by the initial centers. Because the genetic algorithm searches globally and is parallel by nature, it can be used to keep K-means from falling into a local optimum. To address the shortcomings of the genetic algorithm and of its existing improved variants, an improved genetic algorithm is proposed, and its rationality is verified. Data sets are used to verify the clustering effect of the improved genetic clustering algorithm and to test the algorithm running on Hadoop.

Third, design and implementation of a data mining and analysis system based on a cloud platform. The improved clustering algorithm based on attribute weight (WK-means) and the genetic clustering algorithm (CAGAK) are migrated to the algorithm library of the data mining and analysis system. Users of the system can select an appropriate data mining algorithm according to the nature of the problem, configure suitable parameters, and process the selected data. The results are displayed visually. The system is developed in Eclipse, uses the SSH framework (Spring+Struts+Hibernate), and exposes a REST API as its external interface.

WK-means and CAGAK address different shortcomings of K-means and are independent of each other. They are presented in Chapter 3 and Chapter 4 respectively, and the two chapters stand side by side rather than building on one another.
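As an illustration of the attribute-weight idea behind WK-means, the following is a minimal sketch in plain Python, not the thesis's implementation. It assumes the attribute weights are supplied by the caller (the abstract does not say how they are derived) and uses a weighted squared Euclidean distance. The assignment and averaging steps are written as hypothetical `map_assign` and `reduce_mean` functions to mirror how one K-means iteration is commonly split into a map phase and a reduce phase on Hadoop; no Hadoop API is used here, and all function names are illustrative.

```python
from collections import defaultdict


def weighted_sq_dist(x, c, w):
    """Squared Euclidean distance with one weight per attribute (assumed form)."""
    return sum(wj * (xj - cj) ** 2 for xj, cj, wj in zip(x, c, w))


def map_assign(point, centers, weights):
    """Map step: emit (index of the nearest center, (point, 1))."""
    nearest = min(
        range(len(centers)),
        key=lambda k: weighted_sq_dist(point, centers[k], weights),
    )
    return nearest, (point, 1)


def reduce_mean(values):
    """Reduce step: average all points emitted under the same center index."""
    dim = len(values[0][0])
    total = [0.0] * dim
    count = 0
    for point, n in values:
        for j in range(dim):
            total[j] += point[j]
        count += n
    return [t / count for t in total]


def wkmeans(points, centers, weights, max_iter=20):
    """Repeat map/reduce rounds until the centers stop changing."""
    for _ in range(max_iter):
        groups = defaultdict(list)
        for p in points:
            key, value = map_assign(p, centers, weights)
            groups[key].append(value)
        # A center that attracted no points keeps its previous position.
        new_centers = [
            reduce_mean(groups[k]) if k in groups else list(centers[k])
            for k in range(len(centers))
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers
```

For the four points `(0, 0)`, `(0, 10)`, `(1, 0)`, `(1, 10)` with initial centers `[[0, 0], [1, 10]]`, equal weights `(1, 1)` group the points by the second attribute and return `[[0.5, 0.0], [0.5, 10.0]]`, while weights `(1, 0)` ignore the second attribute and return `[[0.0, 5.0], [1.0, 5.0]]`. This shows how unequal attribute weights can change which samples are clustered together.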
Keywords/Search Tags: attribute weight, genetic algorithm, K-means clustering algorithm, Hadoop