Font Size: a A A

Research On Frequent Closed Itemsets Mining Algorithms

Posted on:2017-03-30Degree:MasterType:Thesis
Country:ChinaCandidate:S X ShenFull Text:PDF
GTID:2308330485964108Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Under the background of the era of big data, people are drowned in information, but don’t get more useful knowledge. Therefore, data mining technologies arises at the historic moment. In recent years, association rules mining research has become a hot issue in data mining, and is widely used in financial, marketing, business analysis, etc. The main task of conventional algorithms of association rules mining is mining frequent itemsets, however, mining all frequent itemsets may produce too many redundant. Because the order of magnitude of all frequent closed itemsets is far less than the number of all frequent itemsets. Moreover, frequent closed itemsets don’t lose any information. So instead of mining all frequent itemsets, mining frequent closed itemsets is a good choice.In recent years, the uncertain data has get more and more extensive attention. Uncertain data often plays a key role in many areas, such as the economic, financial, telecommunications, logistics and other areas of the application. Uncertain data mining has also become a very important research topic in data mining. Therefore, this paper mainly studies the problems of mining frequent closed itemsets over the exact and uncertain data.The main works of this paper include:(1) states the details of the concepts and related theories about mining frequent closed itemsets on the deterministic and uncertainty data; (2) summarizes the two kinds of mainstream frequent itemsets mining frameworks:the breadth-first mining based on Apriori and the depth-first mining based on FP-tree; (3) detailedly introduces the frequent closed itemsets mining algorithms over exact data, generalizes the advantages and disadvantages of related algorithms, and compared the performance of each algorithm through experiments; (4) deeply analyzes the latest frequent closed itemsets mining algorithm in uncertain data: A-PFCIM algorithm; (5) proposes a novel algorithm called NA-PFCIM (Normal Approximation-based Probabilistic Frequent Closed Itemsets Mining). The new method regards the itemset mining process as a probability distribution function, and it mines frequent itemsets by using the normal distribution model at first, because this model supports large databases and extracts frequent itemsets with a high degree of accuracy. Then, the algorithm adopts the depth-first search strategy to obtain all probabilistic frequent closed itemsets, in order to reduce the search space and avoid redundant computation. Two probabilistic pruning techniques, superset prune and subset prune are also used in this method. Finally, we verify the effectiveness and efficiency of the proposed methods by comparing with the Possion distribution based algorithm called A-PFCIM. The experimental results show that NA-PFCIM can decrease the number of extending itemsets, and reduce the complexity of calculation, it has better performance than the compared algorithm.
Keywords/Search Tags:association rules, frequent itemsets, frequent closed itemsets, depth first strategy, uncertain data
PDF Full Text Request
Related items