Research On The Technology Of Mass Data Parallel Mining

Posted on:2015-03-15

Degree:Master

Type:Thesis

Country:China

Candidate:F F Sun

Full Text:PDF

GTID:2268330425488968

Subject:Communication and Information System

Abstract/Summary:

Data mining is the process of extracting previously unknown, interesting knowledge from large amounts of data using specialized algorithms. In the age of networked information, data is growing explosively, and traditional serial algorithms are inefficient when processing mass data. Improving the efficiency of mass data mining has therefore become an urgent problem, and parallel data mining is an effective way to address it. Incremental mining is another approach to improving mining efficiency: it uses existing knowledge to mine a data set that has been updated.

MapReduce is a simple programming model proposed by Google that can process mass data in a distributed, parallel fashion. Compared with other parallel programming models, the programmer does not need to handle data partitioning, data allocation, or scheduling, and the framework can handle node failures in the cluster.

Association rules have been widely used in electronic commerce, medical diagnosis, weather forecasting, banking, telecommunications, and other industries, and have long been a hot topic in data mining research. This thesis takes the discovery of frequent itemsets in association rule mining as its starting point. To improve the efficiency of finding frequent itemsets in mass data, it studies parallel and incremental association rule mining algorithms based on MapReduce.

First, the thesis analyzes association rule algorithms and identifies the shortcomings of the Apriori algorithm. Using logical operations on vectors, the algorithm is improved in three areas: the number of database scans, the method of generating candidate itemsets, and transaction compression. The result is an improved association rule algorithm, Apriori_M.

Second, the MapReduce parallel programming model is analyzed in depth. To increase the capacity of the Apriori_M algorithm to process mass data, a parallel version of the algorithm is proposed based on the Partition idea and implemented with the MapReduce programming model.

Third, the thesis studies incremental association rule mining. Two parallel incremental association rule mining algorithms are proposed on the basis of the FUP algorithm, which handles data sets to which new data is dynamically added. Each algorithm consists of two stages: generating candidate itemsets and verifying them. The MFUP1 algorithm generates candidate itemsets sequentially and then selects the frequent itemsets from them in parallel, which suits a small new data set. The MFUP2 algorithm generates candidate itemsets in parallel and then verifies in parallel which of them are frequent, which suits a larger new data set (though one that is still small compared with the original data set).

Finally, the thesis tests the performance of the proposed MapReduce-based parallel association rule algorithm and parallel incremental mining algorithms on a simulation platform built on the open-source Hadoop cloud platform. The experimental results show that the parallel Apriori_M, MFUP1, and MFUP2 algorithms can efficiently find frequent itemsets in mass data, so the improved algorithms are feasible and effective.
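The abstract does not give the details of Apriori_M, so the following is only a minimal sketch of the general idea of counting support with logical operations on vectors. It assumes each item is represented by a bit vector (a Python integer) in which bit i is set when transaction i contains the item. The function names `build_item_vectors` and `frequent_itemsets` are hypothetical, and the transaction compression mentioned in the abstract is not implemented here.

```python
from itertools import combinations


def build_item_vectors(transactions):
    """Map each item to an int bitmask; bit i is set when transaction i holds the item."""
    vectors = {}
    for i, transaction in enumerate(transactions):
        for item in transaction:
            vectors[item] = vectors.get(item, 0) | (1 << i)
    return vectors


def frequent_itemsets(transactions, min_support):
    """Level-wise search. Support of an itemset is the number of set bits
    in the bitwise AND of its item vectors (an assumption of this sketch)."""
    vectors = build_item_vectors(transactions)  # the only pass over the raw data
    # Frequent 1-itemsets, each stored together with its bit vector.
    current = {
        frozenset([item]): vec
        for item, vec in vectors.items()
        if bin(vec).count("1") >= min_support
    }
    result = {}
    while current:
        result.update({s: bin(v).count("1") for s, v in current.items()})
        next_level = {}
        for (a, va), (b, vb) in combinations(current.items(), 2):
            candidate = a | b
            # Join only pairs that differ in exactly one item.
            if len(candidate) != len(a) + 1 or candidate in next_level:
                continue
            # Apriori pruning: every k-subset of the candidate must be frequent.
            if any(candidate - {x} not in current for x in candidate):
                continue
            vec = va & vb  # vector of the union = AND of the two vectors
            if bin(vec).count("1") >= min_support:
                next_level[candidate] = vec
        current = next_level
    return result


if __name__ == "__main__":
    data = [{"bread", "milk"}, {"bread", "butter"},
            {"bread", "milk", "butter"}, {"milk"}]
    for itemset, support in frequent_itemsets(data, min_support=2).items():
        print(sorted(itemset), support)
```

Because supports come from AND operations on vectors built in a single pass, the raw transactions are not rescanned for each candidate level, which is the kind of scan reduction the abstract refers to.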
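The Partition idea that the parallel algorithm builds on can be illustrated with a two-phase sketch in plain Python, with ordinary functions standing in for the map and reduce steps. This is not the thesis's MapReduce implementation. The names `local_frequent`, `count_candidates`, and `partition_mine` are hypothetical, as is the `max_size` cap that keeps the brute-force local miner small. The sketch relies on the property that an itemset frequent in the whole data set must be frequent in at least one partition.

```python
from collections import Counter
from itertools import combinations
from math import ceil


def local_frequent(partition, min_ratio, max_size=3):
    """Phase 1 'map': itemsets frequent within one partition (brute force)."""
    threshold = ceil(min_ratio * len(partition))
    counts = Counter()
    for transaction in partition:
        items = sorted(transaction)
        for k in range(1, min(max_size, len(items)) + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    return {s for s, c in counts.items() if c >= threshold}


def count_candidates(partition, candidates):
    """Phase 2 'map': count each global candidate within one partition."""
    return Counter({c: sum(1 for t in partition if c <= t) for c in candidates})


def partition_mine(partitions, min_ratio):
    # Phase 1 'reduce': the union of locally frequent itemsets
    # forms the set of global candidates.
    candidates = set().union(*(local_frequent(p, min_ratio) for p in partitions))
    # Phase 2 'reduce': sum per-partition counts, then apply the global threshold.
    totals = Counter()
    for p in partitions:
        totals.update(count_candidates(p, candidates))
    threshold = ceil(min_ratio * sum(len(p) for p in partitions))
    return {s: n for s, n in totals.items() if n >= threshold}


if __name__ == "__main__":
    parts = [[{"a", "b"}, {"a", "c"}], [{"a", "b"}, {"b"}]]
    for itemset, count in partition_mine(parts, min_ratio=0.5).items():
        print(sorted(itemset), count)
```

Each partition is processed independently in both phases, so on a cluster the two map functions could run on separate nodes, with only the candidate set and the per-partition counts exchanged between them.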

Keywords/Search Tags:

Mass data, Parallel mining, Association rules, Incremental mining

Related items

1	Research And Application Of Web Log Mining System Based On Association Rule
2	The Research And Application Of Data Mining In Mining Rules Of Medical Diagnosis
3	Research And Application Of Incremental Association Rules Algorithm Based On An Improved FP-tree
4	Research Of Paralleled Frequent Subgraph Mining Algorithm PG-Miner Based On Cluster Environment
5	Incremental Mining Method Research Of Association Rules Of Web Log Based On Clustering Partition
6	Research On Association Rules Algorithm In Data Mining
7	Association Rule Mining Algorithms, Data Mining Technology
8	Design Of Frequent Pattern Mining Algorithm LPS-Miner And Research On Parallel Formulations
9	Research On Incremental Mining Algorithm And Application For Dynamic Databases
10	Research On Incremental Updating Association Rules Mining Based On Apriori Algorithm