Research And Application Of Improved Apriori Algorithm On Hadoop

Posted on: 2017-03-19    Degree: Master    Type: Thesis
Country: China    Candidate: S S Chen    Full Text: PDF
GTID: 2308330485491221    Subject: Computer technology
Abstract/Summary:
Today we are surrounded by data. Surveys indicate that there are more than four billion mobile phones and two billion Internet users, and these users generate data continuously. People send text messages from their phones, upload videos they have made, update their personal information on social networking sites, and forward microblog posts to others. Such rapid data growth poses a great challenge to large Internet companies such as Baidu, Taobao, Tencent, Facebook, Amazon, and Microsoft. They must analyze and process huge amounts of data every day in order to discover which sites people like to visit and read, which goods consumers prefer to buy, and which advertisements attract clicks. Traditional algorithms and tools, however, are increasingly inefficient at processing data on this scale, and they are also constrained by memory size.

To meet the requirements of the subject, this thesis first reviews the research progress and results on Hadoop and parallel Apriori algorithms in China and abroad. On this basis, it introduces the concepts and background of Hadoop and data mining in detail; for Hadoop, the focus is on its two core components, HDFS and MapReduce. It then describes the idea and implementation of the traditional Apriori algorithm. Next, a parallel Apriori algorithm is proposed to improve the analysis and processing of large data sets. The main idea of the improved algorithm is to use Hadoop's MapReduce to split the original database for parallel processing while reversing the data. Finally, the improved algorithm is described in detail, and its feasibility is verified through a case analysis. Comparative analysis shows that the improved algorithm achieves considerably better performance.
Keywords/Search Tags: distributed, Apriori algorithm, data mining, Hadoop
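
The abstract does not give the details of the improved algorithm, so the following is only an illustrative sketch of the general idea it names: split the transaction database, then "reverse" it in a map/reduce step. It assumes that reversing the data means building an inverted (vertical) layout that maps each item to the set of transaction IDs containing it, which is one common reading and may differ from the thesis. The map and reduce phases are simulated in plain Python rather than with the Hadoop API, and all function names and the sample data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def split_database(transactions, n_splits):
    """Partition (tid, items) records into splits, standing in for HDFS input splits."""
    return [transactions[i::n_splits] for i in range(n_splits)]


def map_invert(split):
    """Map phase (simulated): emit one (item, tid) pair per item occurrence."""
    for tid, items in split:
        for item in items:
            yield item, tid


def reduce_invert(pairs):
    """Reduce phase (simulated): group transaction IDs by item."""
    tidsets = defaultdict(set)
    for item, tid in pairs:
        tidsets[item].add(tid)
    return tidsets


def frequent_itemsets(transactions, min_support, n_splits=2):
    """Return {itemset: support count} for itemsets with count >= min_support."""
    splits = split_database(transactions, n_splits)
    pairs = (pair for split in splits for pair in map_invert(split))
    tidsets = reduce_invert(pairs)

    # Level 1: frequent single items, each kept with its transaction-ID set.
    current = {
        frozenset([item]): tids
        for item, tids in tidsets.items()
        if len(tids) >= min_support
    }
    result = {}
    k = 1
    while current:
        result.update({itemset: len(tids) for itemset, tids in current.items()})
        k += 1
        candidates = {}
        for (a, tids_a), (b, tids_b) in combinations(current.items(), 2):
            union = a | b
            if len(union) != k or union in candidates:
                continue
            # Apriori pruning: every (k-1)-subset of a candidate must be frequent.
            if any(frozenset(s) not in current for s in combinations(union, k - 1)):
                continue
            # With the inverted layout, support is a set intersection,
            # so the original database is not rescanned.
            tids = tids_a & tids_b
            if len(tids) >= min_support:
                candidates[union] = tids
        current = candidates
    return result


# Hypothetical sample data: (transaction ID, items bought).
sample = [
    (1, ["bread", "milk"]),
    (2, ["bread", "diaper", "beer"]),
    (3, ["milk", "diaper", "beer"]),
    (4, ["bread", "milk", "diaper"]),
    (5, ["bread", "milk", "beer"]),
]
frequent = frequent_itemsets(sample, min_support=3)
```

With a minimum support count of 3, the sample yields the four single items plus the pair {bread, milk}. In a real Hadoop job the map and reduce functions would run on separate nodes over HDFS blocks; the sketch only shows why an inverted layout can reduce repeated database scans.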