
Association Rule Mining: Expansion of Research into the Area of Taxonomy Data

Posted on: 2011-07-10  Degree: Doctor  Type: Dissertation
Country: China  Candidate: Y X Mao
GTID: 1118330335492029  Subject: Computer software and theory
Abstract/Summary:
With the wide application of computer technology in society, people have become increasingly dependent on information systems. In response to the situation of being "rich in data but poor in information", a new interdisciplinary subject, data mining, emerged from developments in statistics, databases, machine learning, artificial intelligence, pattern recognition, and visualization technology.

Association rule mining is one of the most important research areas in data mining. Because of its wide applicability and particular value in applications such as retail data analysis, customer relationship management, Web mining, network intrusion detection, equipment fault diagnosis, spectral analysis, protein structure analysis, and software bug detection, association rule mining has attracted much attention from both business and academia. Although it already has a history of about twenty years, it continues to be extended to new areas.

In this thesis, we extend the study from the traditional setting to taxonomy data, a new trend in association rule mining. Taxonomy data not only exist widely in many applications but also provide richer, more flexible, and more useful information for decision making, so expanding research in this area is important in both practice and theory.

The main contributions of this thesis are as follows.

Firstly, we study the problem of mining multilevel association rules in a taxonomy environment and propose a top-down algorithm, TD-CBP-MLARM, and a bottom-up algorithm, BU-CBP-MLARM. Both algorithms modify the traditional similarity function with domain knowledge, making it more suitable for measuring the similarity between items. They then use hierarchical clustering to partition the items of the taxonomy according to the modified similarity function. Finally, the transaction database is partitioned by pruning, from each transaction, the items that do not belong to the same cluster.
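As a rough illustration of the clustering-based partitioning step, the sketch below blends a co-occurrence (Jaccard) similarity with a taxonomy-path similarity, groups items by average-linkage agglomerative clustering, and projects each transaction onto each cluster. The abstract does not give the modified similarity function, so the taxonomy `PARENT`, the helpers `taxonomy_sim` and `similarity`, the weight `alpha`, the linkage rule, and the toy database `DB` are all hypothetical choices rather than the thesis's definitions. The sketch also clusters leaf items only, whereas TD-CBP-MLARM and BU-CBP-MLARM work across taxonomy levels.

```python
from itertools import combinations

# Hypothetical taxonomy: child -> parent (the root maps to None).
PARENT = {
    "milk": "dairy", "cheese": "dairy",
    "bread": "bakery", "bagel": "bakery",
    "dairy": "food", "bakery": "food", "food": None,
}

# Hypothetical toy transaction database over leaf items.
DB = [
    {"milk", "cheese"},
    {"milk", "cheese", "bread"},
    {"bread", "bagel"},
    {"bagel", "bread"},
    {"milk"},
]


def path_from_root(item):
    """Return the taxonomy path from the root down to item."""
    path = []
    while item is not None:
        path.append(item)
        item = PARENT[item]
    return path[::-1]


def taxonomy_sim(a, b):
    """Assumed domain-knowledge term: shared path prefix / longer path."""
    pa, pb = path_from_root(a), path_from_root(b)
    shared = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        shared += 1
    return shared / max(len(pa), len(pb))


def jaccard(a, b, db):
    """Co-occurrence similarity: |T(a) & T(b)| / |T(a) | T(b)|."""
    ta = {k for k, t in enumerate(db) if a in t}
    tb = {k for k, t in enumerate(db) if b in t}
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def similarity(a, b, db, alpha=0.5):
    """Hypothetical modified similarity: blend of data and taxonomy terms."""
    return alpha * jaccard(a, b, db) + (1 - alpha) * taxonomy_sim(a, b)


def cluster_items(items, db, k):
    """Average-linkage agglomerative clustering down to k clusters."""
    clusters = [{i} for i in items]

    def linkage(pair):
        c1, c2 = clusters[pair[0]], clusters[pair[1]]
        total = sum(similarity(a, b, db) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        i, j = max(combinations(range(len(clusters)), 2), key=linkage)
        clusters[i] |= clusters.pop(j)  # i < j, so index i stays valid
    return clusters


def partition_db(db, clusters):
    """One projected database per cluster; items outside it are pruned."""
    return [[t & c for t in db if t & c] for c in clusters]


clusters = cluster_items(["milk", "cheese", "bread", "bagel"], DB, k=2)
for c, part in zip(clusters, partition_db(DB, clusters)):
    print(sorted(c), [sorted(t) for t in part])
```

On the toy data the items fall into a dairy cluster and a bakery cluster, and each projected database holds only that cluster's items. In this sketch, itemsets spanning two clusters are never counted; that is the trade-off for scanning smaller databases.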
Scanning these partitioned databases instead of the original database reduces I/O time, which makes both algorithms more efficient.

Secondly, we study the problem of mining generalized association rules in a taxonomy environment and propose two algorithms, one based on candidate generation and one without candidate generation. The first, SET-BFS, is a new efficient algorithm based on the generate-and-test strategy. It generates the frequent itemsets in a set enumeration tree by breadth-first traversal, which improves efficiency by pruning many infrequent items. The second, GEAOT-tax, is based on the divide-and-conquer strategy. It replaces the FP-tree with a GEAOT-tree, which is built by projecting the transaction database with items sorted in ascending order of their frequency as 1-itemsets. It traverses the tree top-down and depth-first, combined with merging and pruning operations, which saves considerable time.

Thirdly, we extend the study from static data to dynamic data in a taxonomy environment and propose an incremental updating algorithm, GECT-IM. It projects all transactions into a compressed prefix tree, called GECT, with a single scan of the transaction database. By using a special dual header table structure to obtain the counts of changed items, and by computing most item counts rather than rescanning the database, it saves considerable time. Because the GECT is still large, we further propose an improved structure, PGECT, which projects only the pre-frequent items of the transaction database and is therefore smaller than the GECT. The PGECT-based algorithm, PGECT-IM, is considerably more efficient than GECT-IM.
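The generate-and-test idea behind SET-BFS can be illustrated with a minimal sketch. It assumes the common approach for generalized rules: extend each transaction with the ancestors of its items, then walk a set enumeration tree breadth-first, expanding only frequent nodes. The alphabetical item order, the Apriori-style subset check, the pruning of itemsets that contain an item together with its ancestor (a standard optimization for generalized rules), and the names `PARENT`, `extend`, and `set_bfs` are assumptions. The abstract does not describe SET-BFS at this level of detail, and the sketch leaves out GEAOT-tax and the incremental GECT/PGECT structures entirely.

```python
from collections import deque

# Hypothetical taxonomy: child -> parent (the root maps to None).
PARENT = {
    "milk": "dairy", "cheese": "dairy", "bread": "bakery",
    "dairy": "food", "bakery": "food", "food": None,
}

# Hypothetical toy transaction database over leaf items.
DB = [{"milk", "bread"}, {"cheese", "bread"}, {"milk"}, {"cheese"}]


def ancestors(item):
    """Proper ancestors of item, nearest first."""
    out = []
    item = PARENT[item]
    while item is not None:
        out.append(item)
        item = PARENT[item]
    return out


def extend(db):
    """Generalized view: add the ancestors of every item to its transaction."""
    return [frozenset(t).union(*(ancestors(i) for i in t)) for t in db]


def has_ancestor_pair(itemset):
    """True if the itemset holds an item together with one of its ancestors."""
    return any(a in itemset for i in itemset for a in ancestors(i))


def set_bfs(db, minsup):
    """Breadth-first walk of a set enumeration tree, keeping frequent nodes."""
    tdb = extend(db)

    def support(s):
        return sum(1 for t in tdb if s <= t)

    items = sorted({i for t in tdb for i in t})  # fixed item order
    rank = {i: r for r, i in enumerate(items)}
    frequent = {}
    queue = deque()
    for i in items:  # level 1 of the tree
        node = frozenset([i])
        count = support(node)
        if count >= minsup:
            frequent[node] = count
            queue.append((node, i))
    f1 = [i for i in items if frozenset([i]) in frequent]

    while queue:  # FIFO order = breadth-first traversal
        node, last = queue.popleft()
        for i in f1:
            if rank[i] <= rank[last]:
                continue  # children only extend with later items
            child = node | {i}
            if has_ancestor_pair(child):
                continue  # redundant: same support as child minus ancestor
            if any(child - {x} not in frequent for x in child):
                continue  # generate-and-test: an infrequent subset prunes it
            count = support(child)
            if count >= minsup:
                frequent[child] = count
                queue.append((child, i))
    return frequent


for itemset, count in sorted(set_bfs(DB, 2).items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With a minimum support count of 2 on the toy database, {bread, dairy} is frequent even though neither {bread, milk} nor {bread, cheese} is. This is the kind of cross-level pattern that generalized association rules are meant to expose.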
Keywords: Association Rule, Multilevel Association Rule, Generalized Association Rule, Taxonomy, Set Enumeration Tree, Prefix Tree, Incremental Mining