
A Distributed Vertical Frequent Pattern Mining Algorithm Based on Metadata Integration

Posted on: 2015-10-12
Degree: Master
Type: Thesis
Country: China
Candidate: J N Yin
GTID: 2298330431986352
Subject: Computer software and theory
Abstract/Summary:
The arrival of the massive-data era has caused data volumes to grow sharply and search spaces to expand. This poses a new challenge for data mining, and the demand for mining massive data keeps growing. Traditional data mining techniques are ineffective and inefficient on massive, high-dimensional data sets. To address this, existing data mining algorithms need to be improved so that they adapt to the actual conditions of massive, high-dimensional data, with better execution efficiency, better targeting, and higher-quality mining results.

Based on a thorough understanding of the principle of vertical frequent pattern mining and of the problems it faces under massive data, this paper proposes a distributed vertical frequent pattern mining algorithm based on metadata integration, together with a load balancing strategy for distributed vertical frequent pattern mining. The algorithm has three parts. First, a small number of data samples is drawn, and attribute correlation is calculated from the rules generated on the samples. Second, the data is partitioned into several independent data blocks according to the attribute correlation. Finally, a frequent pattern tree is built for every data block, and the vertical frequent pattern tree is mined to generate rules. Because the vertical frequent storage structure ensures that the mining results are global results, there is no need to combine local mining results.

The load balancing strategy first determines the status of every site, calculated from the site's local processing capacity and network capacity, and then assigns tasks according to each site's status.
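To illustrate why a vertical layout lets each data block report global support counts, here is a minimal Python sketch. It is not the thesis's algorithm: it swaps in plain tidset intersection (in the style of Eclat) for the frequent pattern tree, and the names `to_vertical`, `mine_block`, the toy transactions, and the attribute blocks are all hypothetical. The attribute blocks are assumed to come from the correlation-based partition step.

```python
from collections import defaultdict


def to_vertical(transactions):
    """Map each item to the set of transaction ids (tidset) that contain it."""
    vertical = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            vertical[item].add(tid)
    return dict(vertical)


def mine_block(vertical, block_items, min_support):
    """Depth-first tidset-intersection mining restricted to one attribute block.

    Each tidset covers every transaction, so the supports found here are
    global counts and need no merge step across blocks.
    """
    frequent = {}

    def extend(prefix, prefix_tids, candidates):
        for i, item in enumerate(candidates):
            if prefix_tids is None:
                tids = vertical[item]
            else:
                tids = prefix_tids & vertical[item]
            if len(tids) >= min_support:
                itemset = prefix + (item,)
                frequent[itemset] = len(tids)
                # Only extend with later items to avoid duplicate itemsets.
                extend(itemset, tids, candidates[i + 1:])

    extend((), None, sorted(i for i in block_items if i in vertical))
    return frequent


# Toy data (hypothetical): five transactions over attributes a, b, x, y.
transactions = [
    {"a", "b", "x"},
    {"a", "b", "y"},
    {"a", "x", "y"},
    {"b", "x", "y"},
    {"a", "b", "x", "y"},
]
vertical = to_vertical(transactions)

# Assumed output of the partition step: two independent attribute blocks.
blocks = [{"a", "b"}, {"x", "y"}]
results = [mine_block(vertical, block, min_support=3) for block in blocks]
```

In this sketch each block could be mined on a different site. Itemsets that span two blocks are never generated, which is why the quality of the result depends on the partition grouping correlated attributes together.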
Finally, the proposed algorithm is verified through experimental analysis. The experiments show that, on large-scale data sets, the proposed distributed vertical frequent pattern mining algorithm based on metadata integration is more time-efficient than the traditional association rule mining algorithm. The load balancing strategy, used to improve the efficiency of the algorithm, outperforms the traditional load balancing strategy in both time efficiency and degree of load balance.

In conclusion, this paper consists of three parts. First, it analyzes the research background and significance and reviews the pertinent literature. Second, it puts forward a distributed vertical frequent pattern mining method based on metadata integration and presents a load balancing strategy for distributed vertical frequent pattern mining to further improve the algorithm's performance. Finally, it provides an experimental analysis of the proposed method, followed by a summary and an outlook on future work.
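The load balancing idea summarized in the abstract (score each site from its processing and network capacity, then assign tasks by that score) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the status formula, so the weighted sum, the weights `w_cpu` and `w_net`, the greedy assignment rule, and all names and numbers below are hypothetical.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    processing_capacity: float  # hypothetical relative units
    network_capacity: float     # hypothetical relative units


def site_status(site, w_cpu=0.7, w_net=0.3):
    """Assumed status score: a weighted sum of the two capacities."""
    return w_cpu * site.processing_capacity + w_net * site.network_capacity


def assign_blocks(block_sizes, sites):
    """Greedily give each block (largest first) to the site that is
    currently estimated to become free earliest."""
    status = {s.name: site_status(s) for s in sites}
    assignment = {s.name: [] for s in sites}
    # Heap of (estimated finish time, site name); all sites start idle.
    heap = [(0.0, s.name) for s in sites]
    heapq.heapify(heap)
    for block_id, size in sorted(block_sizes.items(), key=lambda kv: -kv[1]):
        finish, name = heapq.heappop(heap)
        assignment[name].append(block_id)
        # A higher status score means the block is processed faster.
        heapq.heappush(heap, (finish + size / status[name], name))
    return assignment


sites = [Site("fast", 4.0, 4.0), Site("slow", 1.0, 1.0)]
block_sizes = {"b0": 8, "b1": 4, "b2": 4, "b3": 4}
assignment = assign_blocks(block_sizes, sites)
```

With these toy numbers the higher-status site receives three of the four blocks, while a status-blind round-robin would give each site two.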
Keywords/Search Tags: distributed association rules, metadata integration, load balancing, vertical frequent pattern, data partition