
Multi-level Multi-dimensional Frequent Itemset Mining

Posted on: 2005-07-30; Degree: Master; Type: Thesis
Country: China; Candidate: G H He
GTID: 2168360125963898; Subject: Computer application technology
Abstract/Summary:
What problem do people face amid the "information explosion"? It is difficult to extract useful information quickly from a sea of data. Knowledge discovery in databases (KDD), which emerged in response to this need, has become one of the most powerful tools for resolving this paradox. Data mining is the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.

The algorithm is the key component of KDD, because it determines how efficient KDD is. On one hand, data mining is applied to large databases, so algorithmic efficiency matters most; on the other hand, the computers in use are not well equipped to process large databases. Consequently, existing algorithms must be adapted to meet these needs. This paper studies sequence mining algorithms in depth. Apriori-based algorithms must scan the database many times, which reduces their efficiency, and they also generate a large number of candidate sets. The FP-tree algorithm is a major advance over Apriori-based algorithms, because it needs to scan the database only twice. However, the FP-tree algorithm applies a uniform minimum support, which undermines this advantage. Real-life transaction databases usually contain both item information and dimension information, and knowledge about multi-level and multi-dimensional frequent itemsets is interesting and useful. Classic frequent itemset mining algorithms based on a uniform minimum support either miss interesting patterns with low support or suffer from a bottleneck in itemset generation.

In this paper, we extend FP-growth to address the problem of multi-level multi-dimensional frequent itemset mining; we call the resulting algorithm E-FP. To increase efficiency, we push various support constraints into the mining process. E-FP can discover both inter-level and intra-level frequent itemsets. Moreover, E-FP takes dimension information into account.
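The two-scan property of the FP-tree can be illustrated with a minimal sketch of standard FP-growth in Python. This is the textbook technique, not the thesis's E-FP algorithm; the function names (`build_fp_tree`, `fp_growth`) and the input format, a list of `(itemset, count)` pairs, are assumptions made for illustration. The two loops over `transactions` in `build_fp_tree` are the two database scans; the recursive calls work on conditional pattern bases read off the tree, not on the original database. The sketch deliberately applies a single uniform `min_support`, which is exactly the restriction the abstract identifies.

```python
from collections import defaultdict


class FPNode:
    """One FP-tree node: an item, its accumulated count, and tree links."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build an FP-tree with two passes over `transactions`.

    Hypothetical input format: a list of (itemset, count) pairs.
    Returns the frequent single items and a header table mapping
    each frequent item to every tree node that holds it.
    """
    # Scan 1: count the support of every single item.
    counts = defaultdict(int)
    for items, n in transactions:
        for item in items:
            counts[item] += n
    frequent = {i: c for i, c in counts.items() if c >= min_support}

    # Scan 2: insert each transaction's frequent items in descending
    # support order (ties broken by name), so shared prefixes share paths.
    root = FPNode(None, None)
    header = defaultdict(list)
    for items, n in transactions:
        path = sorted((i for i in items if i in frequent),
                      key=lambda i: (-frequent[i], str(i)))
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += n
            node = child
    return frequent, header


def fp_growth(transactions, min_support, suffix=()):
    """Yield (frozenset itemset, support) for every itemset whose
    support is >= min_support (one uniform threshold for all items)."""
    frequent, header = build_fp_tree(transactions, min_support)
    for item, support in frequent.items():
        itemset = suffix + (item,)
        yield frozenset(itemset), support
        # Conditional pattern base: the prefix path above each node of `item`,
        # weighted by that node's count.
        conditional = []
        for node in header[item]:
            path = []
            parent = node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        yield from fp_growth(conditional, min_support, itemset)
```

For example, `dict(fp_growth([(t, 1) for t in db], 3))` maps every itemset that occurs in at least 3 transactions of `db` to its support count. Ordering items by descending support is what lets transactions with common frequent items share tree prefixes and keeps the tree compact.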
We show that E-FP is more flexible at capturing the desired knowledge than previous approaches.

Clustering analysis has been a very active area of research. It has been applied in data mining, web mining, e-commerce, and other fields. However, most algorithms ignore the fact that physical obstacles exist in the real world and can affect clustering results dramatically. In this paper, we explore the problem of clustering in the presence of obstacles and propose an algorithm called ADP-Chameleon to solve it.
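The effect of obstacles can be illustrated on the first phase of Chameleon, which builds a k-nearest-neighbour graph before partitioning and merging it. The sketch below is an assumption about one way to make that graph obstacle-aware, not a description of ADP-Chameleon: obstacles are modelled as straight line segments (a hypothetical representation), and two points may be linked only if the segment between them crosses no obstacle. The helper names are hypothetical, and Chameleon's later partitioning and merging phases are omitted.

```python
import math


def _orientation(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1 left turn, -1 right turn, 0 collinear."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)


def segments_cross(p1, p2, q1, q2):
    """True if segment p1-p2 properly crosses segment q1-q2.

    Touching endpoints and collinear overlaps are not counted in this sketch.
    """
    d1 = _orientation(q1, q2, p1)
    d2 = _orientation(q1, q2, p2)
    d3 = _orientation(p1, p2, q1)
    d4 = _orientation(p1, p2, q2)
    return d1 * d2 < 0 and d3 * d4 < 0


def visible(a, b, obstacles):
    """True if the straight segment a-b crosses no obstacle edge.

    Hypothetical obstacle format: a list of ((x1, y1), (x2, y2)) segments.
    """
    return not any(segments_cross(a, b, s, e) for s, e in obstacles)


def obstacle_knn_graph(points, k, obstacles):
    """Undirected k-nearest-neighbour graph that links only mutually visible points.

    Edges are returned as (i, j) index pairs with i < j.
    """
    edges = set()
    for i, p in enumerate(points):
        # Rank only the points that can be reached without crossing an obstacle.
        candidates = [
            (math.dist(p, q), j)
            for j, q in enumerate(points)
            if j != i and visible(p, q, obstacles)
        ]
        for _, j in sorted(candidates)[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges
```

With a wall along x = 0, points on opposite sides are never linked even when they are each other's nearest neighbours, so any clustering derived from the graph respects the obstacle. Plain Euclidean distance is still used to rank visible neighbours; a fuller treatment would use the obstructed distance (the shortest path around obstacles), which this sketch does not compute.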
Keywords/Search Tags:data mining, FP-tree Algorithm, multi-level multi-dimensional itemset, E-FP Algorithm, clustering