Research On Pattern Mining Based On Sampling In Big Data

Posted on:2015-12-12

Degree:Master

Type:Thesis

Country:China

Candidate:J Ai

Full Text:PDF

GTID:2298330422491924

Subject:Computer Science and Technology

Abstract/Summary:

As cloud computing and the mobile Internet reach ever deeper into the lives of ordinary people, big data is becoming increasingly popular. In today’s competitive business environment, whoever holds the key to unlocking big data will be able to stay ahead. However, current exploration and research on big data algorithms cannot meet people’s need to extract valuable knowledge from massive amounts of information. Therefore, the study of data mining algorithms for big data is extremely important.

Frequent pattern mining is an extensively researched and very valuable subject. In the past 20 years, a variety of frequent pattern mining algorithms have been proposed. Briefly, they fall into three classes. The first is the "candidate-test" Apriori algorithm and its extensions. The second is the FP-Growth algorithm and its extensions. The third is vertical mining algorithms. However, all three classes share common shortcomings. With today’s sharp increase in the amount of data, these algorithms can no longer meet the needs of mining large data sets. On the one hand, the data is too large to be stored in memory. On the other hand, the rapid growth in data volume increases the running time of the algorithms so much that they cannot meet practical requirements. The efficiency of mining algorithms still needs to be improved, and research on mining algorithms for big data remains insufficient. It is therefore meaningful to propose new, efficient and effective pattern mining algorithms. Boley et al. proposed a direct sampling method in the pattern space that greatly improves the time complexity, but the effectiveness of the patterns it mines cannot be guaranteed.

This paper improves the direct sampling algorithm by verifying and updating the sampling results. In addition, the paper improves the two-step random procedure: we adjust the length of the mined patterns by controlling a probability threshold, so as to increase the effectiveness of the mined patterns at the cost of only a modest increase in time complexity. Experiments show that the enhanced direct sampling method is a good way to improve the quality of the mining results.

Meanwhile, we propose a distributed enhanced two-stage random sampling algorithm based on Map-Reduce. The algorithm handles the problem of sampling with weights (WAS) using the A-RES/A-ExpJ algorithms, which addresses the sampling problem in the Map-Reduce framework. We also use the lossy counting algorithm to obtain low-frequency itemsets, which facilitates the pattern validation process. Thus, the algorithm migrates well to the Map-Reduce framework.
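
The weighted sampling step named in the abstract can be illustrated with A-RES, a one-pass weighted reservoir sampling scheme. The sketch below is a minimal single-machine version, not the thesis's Map-Reduce implementation: the function name `a_res`, the `(item, weight)` stream layout, and the example weights are all assumptions made for illustration. Each item receives a random key `u ** (1 / weight)`, and the `k` items with the largest keys form the sample.

```python
import heapq
import random


def a_res(stream, k, rng=None):
    """Draw up to k items without replacement from (item, weight) pairs.

    Illustrative sketch of A-RES; names and data layout are hypothetical.
    """
    rng = rng or random.Random()
    heap = []  # min-heap of (key, index, item); the smallest key sits at heap[0]
    for idx, (item, weight) in enumerate(stream):
        if weight <= 0:
            continue  # items with non-positive weight can never be sampled
        key = rng.random() ** (1.0 / weight)  # larger weights push keys toward 1
        entry = (key, idx, item)  # idx breaks ties so items need not be comparable
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif key > heap[0][0]:
            heapq.heapreplace(heap, entry)  # evict the current smallest key
    return [item for _, _, item in heap]


if __name__ == "__main__":
    # Hypothetical transactions with made-up weights.
    transactions = [("t1", 8.0), ("t2", 2.0), ("t3", 4.0), ("t4", 1.0)]
    print(a_res(transactions, k=2, rng=random.Random(42)))
```

Because every key depends only on the item's own weight and an independent random draw, partial reservoirs built on separate data splits can be merged by keeping the `k` largest keys overall, which is one plausible reason the scheme suits a Map-Reduce setting.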

Keywords/Search Tags:

pattern mining, sampling, big data, Map-Reduce

Related items

1	Research On Algorithm Of Large Data Set Sequential Pattern Mining
2	The Research On Sampling For Data Mining
3	Research On Optimal Reduce Placement Algorithm Based On Data Skew
4	Research On Mining Adjoint Pattern Of Spatial-Temporal Trajectory Data In Cloud Computing Environment
5	Based On Data Mining Techniques To Reduce The False Alarm Rate Of Intrusion Detection Systems
6	Pattern Mining Algorithms Over Data Streams
7	The Development And Research Of Audit Sampling Systems On The Basis Of Data Mining Technology
8	Research And Application Of Parallel Data Mining Algorithms Based On MapReduce
9	Analysis On Sampling Complexity Of Association Rule Mining
10	Research And Application Of Mining Access Sequential Pattern In Weblog