
Research On Frequent Itemsets Mining Algorithm Based On Vertical Format

Posted on: 2016-07-01 | Degree: Master | Type: Thesis
Country: China | Candidate: T Y Li | Full Text: PDF
GTID: 2308330464967969 | Subject: Computer software and theory
Abstract/Summary:
Frequent itemset mining plays an increasingly important role in the modern era of data "explosion" and is a very important tool for solving practical problems. Over the last 20 years, many scholars have proposed effective and efficient frequent itemset mining algorithms. However, as data volumes grow exponentially, the efficiency of these algorithms is receiving increasing emphasis.
In this thesis, we first summarize the research results on frequent itemsets at home and abroad and analyze them in depth. We analyze the Apriori and FP-growth algorithms, which are based on the horizontal data format, and the Eclat and Eclat_opt algorithms, which are based on the vertical data format, covering their theory, implementation process, and time and space performance. The goal is to build a sufficient theoretical foundation for proposing a more efficient algorithm.
Combining the advantages of these studies, this thesis proposes a new frequent itemset mining algorithm, FDSL, which is based on the vertical data format. The algorithm converts the horizontal data structure into an ordered vertical storage structure and constructs the corresponding bitmap from that structure. It scans this bitmap to construct an ordered search list and then searches the list with a depth-first strategy, so that candidate sets and their supports are generated simultaneously and the corresponding itemsets can be generated in O(n) time. To evaluate the performance of FDSL, we implemented the Apriori, Eclat, and FDSL algorithms in C++ and compared the three algorithms thoroughly on different data sets, obtaining a large amount of experimental data.
Experimental results show that, in a thorough comparison with the traditional horizontal-format Apriori algorithm and the vertical-format Eclat algorithm under different support thresholds, the proposed FDSL algorithm achieves higher time efficiency.
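The general idea of vertical-format mining described in the abstract can be illustrated with a minimal C++ sketch. This is a generic Eclat-style bitmap search, not the FDSL algorithm itself, whose ordered search list is not specified in the abstract. All names (`kMaxTransactions`, `toVertical`, `dfs`, `mineFrequentItemsets`) are hypothetical, items are assumed to be integer ids, the database is assumed to hold at most 64 transactions, and the ordered list here is simply the frequent items in ascending id order.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical names throughout; none are taken from the thesis.
constexpr std::size_t kMaxTransactions = 64;  // assumed upper bound for this sketch
using Bitmap = std::bitset<kMaxTransactions>;
using Itemset = std::vector<int>;

// Convert a horizontal database (one item list per transaction) into a
// vertical one: item -> bitmap whose bit t is set if transaction t
// contains the item.
std::map<int, Bitmap> toVertical(const std::vector<Itemset>& transactions) {
    assert(transactions.size() <= kMaxTransactions);
    std::map<int, Bitmap> vertical;
    for (std::size_t t = 0; t < transactions.size(); ++t) {
        for (int item : transactions[t]) {
            vertical[item].set(t);
        }
    }
    return vertical;
}

// Depth-first extension of `prefix`. Every candidate carries the bitmap of
// the transactions that contain it, so its support (the number of set bits)
// is known as soon as the candidate is formed.
void dfs(const Itemset& prefix,
         const std::vector<std::pair<int, Bitmap>>& candidates,
         std::size_t minSupport,
         std::map<Itemset, std::size_t>& frequent) {
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        Itemset itemset = prefix;
        itemset.push_back(candidates[i].first);
        frequent[itemset] = candidates[i].second.count();

        // Extend only with later items in the list, keeping those whose
        // intersected bitmap still meets the support threshold.
        std::vector<std::pair<int, Bitmap>> next;
        for (std::size_t j = i + 1; j < candidates.size(); ++j) {
            Bitmap joined = candidates[i].second & candidates[j].second;
            if (joined.count() >= minSupport) {
                next.emplace_back(candidates[j].first, joined);
            }
        }
        if (!next.empty()) {
            dfs(itemset, next, minSupport, frequent);
        }
    }
}

// Returns every frequent itemset together with its support count.
std::map<Itemset, std::size_t> mineFrequentItemsets(
        const std::vector<Itemset>& transactions, std::size_t minSupport) {
    assert(minSupport >= 1);
    // Frequent single items in ascending id order (std::map iteration order);
    // a simple stand-in for an ordered list of search starting points.
    std::vector<std::pair<int, Bitmap>> ordered;
    for (const auto& entry : toVertical(transactions)) {
        if (entry.second.count() >= minSupport) {
            ordered.emplace_back(entry.first, entry.second);
        }
    }
    std::map<Itemset, std::size_t> frequent;
    dfs({}, ordered, minSupport, frequent);
    return frequent;
}
```

For example, calling `mineFrequentItemsets` on the five transactions {1,2,3}, {1,2}, {1,3}, {2,3}, {1,2,3} with a minimum support of 3 returns the three single items (support 4 each) and the three pairs (support 3 each); {1,2,3} occurs in only two transactions and is excluded. Because each support comes from a bitwise AND and a bit count, the original database is not rescanned during the search.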
Keywords/Search Tags: vertical format, ordered search list, Apriori, Eclat, FDSL