
Research On Algorithms Of Mining Frequent Closed Itemsets From Uncertain Data

Posted on: 2017-06-30 | Degree: Master | Type: Thesis
Country: China | Candidate: Y M Miao | Full Text: PDF
GTID: 2348330482991341 | Subject: Computer software and theory
Abstract/Summary:
With the development of the Internet and advances in science and technology, industries such as economics, finance, and telecommunications are producing vast amounts of data that traditional frequent itemset mining techniques cannot handle. In this context, data mining technology has become increasingly important, because it addresses the problem of having abundant data but little knowledge. Frequent pattern mining has always been a central problem in data mining, and its results can guide people toward more effective decisions, as in the classic "beer and diapers" example.

However, the results of data mining are often unsatisfactory. When the data volume is large and the minimum support threshold is low, the number of frequent itemsets and of mined association rules is often huge, which is not the result we want. The usual remedy is to report a more representative subset of the results, such as the maximal frequent itemsets or the frequent closed itemsets. Maximal frequent itemsets lose information, so we use frequent closed itemsets in place of the full set of frequent itemsets.

In recent years, uncertain data has become widespread in sensor networks, satellite imagery, Web applications, radio frequency technology, and applications in economics, logistics, and telecommunications. Mining frequent itemsets from uncertain data has therefore become urgent. However, the classical algorithms are designed for certain data, so a new data model is needed to handle the complexity of uncertain data.

This thesis studies algorithms for mining frequent closed itemsets in depth, proposes improved algorithms, and investigates data models for uncertain data. The main results are as follows:

1. Two strategies for mining frequent itemsets. A study of the classical algorithms shows two main families in frequent itemset mining. The first is Apriori and the algorithms based on it, which use a bottom-up, breadth-first strategy. Their main drawback is that they scan the transaction database many times and generate a large number of candidate sets, which costs considerable time and space. The second is FP-Growth and the algorithms based on the FP-Tree structure, which traverse the FP-Tree depth-first. They need only two scans of the database (one to count items and one to build the FP-Tree) and generate no candidate sets, so efficiency is greatly improved.

2. There are two kinds of algorithms for mining frequent closed itemsets: Apriori-based algorithms and FP-Growth-based algorithms. One of the most efficient algorithms for mining frequent closed itemsets is DCI_Closed. It introduces the concept of order-preserving generators and proves that every closed itemset has a unique sequence of order-preserving generators, which it uses for pruning to improve efficiency.

3. An improved DCI_Closed algorithm. After analyzing the disadvantages of DCI_Closed, we propose an improved version that uses the properties of co-occurrence itemsets and twin itemsets to prune effectively and so improve the efficiency of DCI_Closed.

4. A new algorithm, U_DCI_Closed. Mining frequent itemsets from uncertain data has become a hot topic in data mining, but algorithms for mining maximal frequent itemsets and frequent closed itemsets from uncertain data are rare. After studying data models for uncertain data and the classical algorithms, this thesis combines the possible world model with the DCI_Closed algorithm and proposes a new algorithm, U_DCI_Closed, for mining frequent closed itemsets from uncertain data.
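The abstract names the possible world model and lists expected support as a keyword without defining either. The sketch below illustrates expected support only, and it is not the thesis's U_DCI_Closed algorithm. It assumes a common item-level uncertainty model in which every item in a transaction has an independent existence probability. The database `UNCERTAIN_DB` and both function names are hypothetical and chosen for illustration.

```python
from itertools import product

# Hypothetical uncertain database: each transaction maps an item to its
# existence probability. Items are ASSUMED to be mutually independent.
UNCERTAIN_DB = [
    {"a": 0.9, "b": 0.8, "c": 0.5},
    {"a": 0.6, "b": 0.5},
    {"b": 1.0, "c": 0.4},
]


def expected_support(itemset, db):
    """Sum, over transactions, of the probability that the whole itemset occurs."""
    total = 0.0
    for transaction in db:
        prob = 1.0
        for item in itemset:
            # An item missing from the transaction has probability 0.
            prob *= transaction.get(item, 0.0)
        total += prob
    return total


def expected_support_by_worlds(itemset, db):
    """Brute-force check: enumerate every possible world, weight its support."""
    # One slot per (transaction, item) pair; each slot is present or absent.
    slots = [(tid, item, p) for tid, t in enumerate(db) for item, p in t.items()]
    target = set(itemset)
    total = 0.0
    for present in product([False, True], repeat=len(slots)):
        world_prob = 1.0
        world = [set() for _ in db]
        for (tid, item, p), is_present in zip(slots, present):
            world_prob *= p if is_present else 1.0 - p
            if is_present:
                world[tid].add(item)
        # Ordinary (deterministic) support of the itemset in this world.
        support = sum(1 for t in world if target <= t)
        total += world_prob * support
    return total
```

For the sample data, `expected_support({"a", "b"}, UNCERTAIN_DB)` is 0.9 × 0.8 + 0.6 × 0.5 = 1.02, and the exhaustive enumeration over all 128 possible worlds gives the same value up to floating-point rounding. The closed-form sum is what makes expected support practical, since the number of possible worlds grows exponentially with the number of uncertain items.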
Keywords/Search Tags: data mining, frequent closed itemsets, uncertain data, expected support