
Research On Optimization And Application Of Association Rules Algorithm Based On Cloud Platform

Posted on: 2018-01-25
Degree: Master
Type: Thesis
Country: China
Candidate: W H Tao
GTID: 2348330518968593
Subject: Computer technology
Abstract/Summary:
With the rapid development of the Internet, networks have penetrated all aspects of life. The Internet enriches and simplifies everyday life and has, to some extent, changed the way people work. As Internet technology is applied ever more widely, back-end systems generate data on a massive scale, and extracting valuable information from this big data has become a focus of the industry. Association rule mining, which discovers relationships between items in large, noisy data sets, is a widely used data mining technique. Traditional stand-alone data mining, however, cannot analyze massive data comprehensively. Cloud computing offers the data mining field a new approach, and Hadoop, the cloud platform developed by the Apache Foundation, lowers the barrier to cloud development. Combining parallel association rule algorithms on a cloud platform with algorithmic improvements makes it possible to mine massive data more effectively and to extract the rules hidden in a data set, thereby supporting better decisions in commercial applications.

This thesis takes the traditional Apriori algorithm as its theoretical basis. By analyzing the algorithm's execution procedure, it identifies the key points that can be optimized and improves the algorithm accordingly. The improved Apriori algorithm is then parallelized and deployed on the Hadoop platform in order to process massive data. The thesis first discusses in detail the current research status and development of cloud computing and data mining, with emphasis on Hadoop's two core technologies, HDFS and MapReduce. The third chapter analyzes the traditional Apriori algorithm, illustrates its shortcomings with examples, introduces existing optimization methods, and compares their performance.

The fourth and fifth chapters are the core of the research. The fourth chapter proposes an improvement to the traditional Apriori algorithm that reduces its time complexity and improves its execution efficiency. It then introduces an interest threshold to further screen the mined rules and improve the usability of the strong association rules. The experimental results are presented as line graphs and analyzed comparatively. The fifth chapter describes the process of building the Hadoop platform and its conventional configuration, explains how the algorithm is parallelized, and discusses the retail industry's demand for cloud-based association analysis. It deploys the optimized Apriori algorithm on the Hadoop platform, compares its efficiency with that of the ordinary serial algorithm, and uses the experimental results to discuss the feasibility and advantages of parallelization.
Keywords/Search Tags: cloud computing, data mining, Hadoop, Apriori
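
For readers unfamiliar with the algorithm named in the abstract, the following is a minimal sketch of the textbook serial Apriori procedure followed by a rule-screening step. It is not the thesis's improved algorithm. The abstract does not define its interest threshold, so lift (confidence divided by the support of the consequent) is used here purely as an assumed stand-in, and the function names and the sample baskets are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori search.

    Returns {itemset: count} for every itemset whose support
    (count / number of transactions) is at least min_support.
    """
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)
    # Level 1 candidates: every distinct single item.
    candidates = {frozenset([item]) for b in baskets for item in b}
    frequent = {}
    k = 1
    while candidates:
        # Support counting: one full pass over the data per level.
        counts = {c: sum(1 for b in baskets if c <= b) for c in candidates}
        level = {c: cnt for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def association_rules(frequent, n, min_confidence, min_interest):
    """Derive rules lhs -> rhs and screen them by confidence and interest.

    Interest is computed as lift, an assumption made for this sketch;
    the thesis may define its interest measure differently.
    """
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                rhs = itemset - lhs
                confidence = count / frequent[lhs]
                interest = confidence / (frequent[rhs] / n)  # lift
                if confidence >= min_confidence and interest >= min_interest:
                    rules.append((lhs, rhs, confidence, interest))
    return rules


# Hypothetical retail baskets, used only to exercise the sketch.
BASKETS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

freq = frequent_itemsets(BASKETS, min_support=0.6)
strong = association_rules(freq, len(BASKETS), min_confidence=0.7, min_interest=1.0)
```

On these sample baskets the rule bread -> milk reaches a confidence of 0.75 but a lift below 1, so the second threshold screens it out even though it passes the confidence test. The support-counting pass inside the loop is the step that rescans the whole data set at every level, which is why it is the usual target when Apriori is distributed with MapReduce.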