
The Extended Research On Association Rules Mining

Posted on: 2004-04-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H F Zhou
Full Text: PDF
GTID: 1118360092975020
Subject: Computer software and theory
Abstract/Summary:
Over the past 20 years, the human ability to collect and use data and information for production has improved dramatically, and the volume of data has grown explosively. People therefore want a new generation of techniques and tools that can intelligently and automatically analyze these data, which are too large and have already cost too much human effort and money, in order to find useful knowledge that supports decision-making. Facing the challenge of "drowning in data but starving for knowledge", data mining emerged in response to this need and has developed rapidly.

Data mining is the nontrivial extraction of implicit, previously unknown, and potentially useful information and knowledge from large amounts of incomplete, noisy, fuzzy, and random data. Association rule mining is one of the main topics in this field; it determines the relations among attributes or objects and finds credible and valuable dependencies among fields. The work in this dissertation stays strictly within this field and follows its two phases, frequent pattern acquisition and rule generation, to carry out the extended research step by step.

First, rule generation is discussed. The concept of interestingness is redefined in probabilistic terms, which provides the basis for introducing negative items. With a bound on the negative items, an algorithm, IAR, is proposed that can generate rules containing negative items. This work completes the semantics of the rules and makes them more meaningful, especially when concept hierarchies are considered. As a result of this work, ARMiner, a data mining tool based on association rules, is introduced.

Next, after nearly 10 years of research and development, frequent pattern acquisition, the most essential phase of association rule mining, and its techniques have improved dramatically. Most of this work processes single items.
However, objects in the real world are linked by many relations, which together form a network, so treating items in isolation is not suitable. These relations can be represented as graphs, and more and more applications of them have appeared. All of this urges people to pay more attention to frequent pattern mining in graphs.

Taking the uniquely labeled graph as the object, two new algorithms, Matricon and SFP, are proposed. They are based on the fact that a uniquely labeled graph can be transformed into itemset form, to which the last 10 years of research on frequent pattern mining can be applied. Connectivity is the only difference that must be handled. The former algorithm is based on the Apriori idea, and the latter uses features of FP-Growth. The adjacency matrix representation of graphs in Matricon and the vertex-overlapping connectivity determination are essential tools in the later work on mining non-uniquely labeled graphs. As for applications, since Web nodes and pages can be labeled by URL, these algorithms are used to analyze authoritative resources on the Web.

When the constraint of unique labeling is removed, the ordered labeled tree becomes the object. Two algorithms, Chopper and XSpanner, are proposed. Both outperform other methods of the same class. The major contributions are the sequential expression of trees and the idea of "isomeromorphism". They greatly improve the performance of the algorithms by delaying the hard problem of isomorphism, which is the bottleneck of most current methods. Some XML documents and Web logs are analyzed with these two algorithms, yielding some interesting results.

Finally, the problem of frequent subgraph extraction, whose core is graph isomorphism, is solved. Drawing on the successful AcGM and FSG, and after comprehensive comparison, a new algorithm, Topology, is proposed.
The framework of Topology is based on Apriori with the idea of "isomeromorphism", using the techniques of graph sequential expression and label-connectivity determination. Topology can analyze the complex relations among the obj...
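
The observation that a uniquely labeled graph can be treated as an itemset, with connectivity as the only extra check, can be illustrated with a small sketch. This is not Matricon or SFP: it is a brute-force enumeration written only for illustration, and the function names, the edge encoding as sorted label pairs, and the `max_edges` limit are all assumptions that do not come from the dissertation.

```python
from itertools import combinations


def edge(u, v):
    """Canonical undirected edge between two uniquely labeled vertices."""
    return tuple(sorted((u, v)))


def is_connected(edges):
    """Return True if the given edges form a single connected subgraph."""
    edges = list(edges)
    if not edges:
        return False
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    start = edges[0][0]
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in adjacency[node] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return len(seen) == len(adjacency)


def frequent_connected_patterns(graphs, min_support, max_edges=3):
    """Brute-force search (not Apriori or FP-Growth) for connected edge sets
    that occur in at least `min_support` of the input graphs."""
    # Unique labels let each graph be stored as a plain set of edges (an itemset).
    transactions = [frozenset(edge(u, v) for u, v in g) for g in graphs]
    all_edges = sorted(set().union(*transactions))
    result = {}
    for k in range(1, max_edges + 1):
        for candidate in combinations(all_edges, k):
            # Connectivity is the only test beyond ordinary itemset counting.
            if not is_connected(candidate):
                continue
            support = sum(1 for t in transactions if t.issuperset(candidate))
            if support >= min_support:
                result[frozenset(candidate)] = support
    return result


# Hypothetical toy data: three small graphs over the labels a, b, c, d.
graphs = [
    [("a", "b"), ("b", "c")],
    [("a", "b"), ("b", "c"), ("c", "d")],
    [("a", "b"), ("c", "d")],
]
patterns = frequent_connected_patterns(graphs, min_support=2)
```

In this toy data the edge pair a-b and c-d occurs in two graphs, yet it is not reported because the two edges share no vertex. That rejection is the connectivity constraint that separates graph patterns from ordinary itemsets.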
Keywords/Search Tags:Association