
Research And Application On Association Rule Mining

Posted on: 2006-07-21
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y B Liu
Full Text: PDF
GTID: 1118360155453718
Subject: Computer application technology
Abstract/Summary:
Data mining is an emerging field whose goal is to extract implicit, previously unknown, and potentially useful information from large amounts of collected data. Within data mining, association rules are a widely studied topic. The prototypical application is the analysis of supermarket sales data, also called basket data. The association rule discovery task identifies groups of items that are most often purchased together with other groups of items and predicts latent relations among items that are frequently bought together, so that the supermarket can make better-informed decisions. Association rule discovery has emerged as an important problem in knowledge discovery and data mining, and it provides a useful mechanism for discovering correlations in the underlying data. The problem of mining association rules is to generate all association rules whose support and confidence are not less than the user-specified minimum support and minimum confidence, respectively. It can be broken into two steps. The first step finds all frequent itemsets, that is, all itemsets that occur in the database with at least the user-specified minimum support. The second step forms the rules among the frequent itemsets that satisfy the user-specified minimum confidence. In its general form, an association rule can be viewed as the expression A ⇒ B, where A and B are both conjunctions of conditions. Against this background of data mining and association rule mining, the thesis conducts research on association rule mining methods and their application. It presents a new method for maximal frequent itemset discovery; the concept of ordinal patterns, which can be applied in data cleaning; a new approach to user-association mining for recommender systems; and a new form of association rules, dominance association rules, which can be applied to predict the unknown values of criteria.
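The two-step decomposition described above (find the frequent itemsets, then form rules from them) can be illustrated with a minimal Python sketch. It is a plain level-wise, Apriori-style search, not the thesis's own method. The toy basket data, the function names, and the thresholds are all assumptions introduced here for illustration.

```python
from itertools import combinations

# Hypothetical toy basket data; not taken from the thesis.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def frequent_itemsets(transactions, min_support):
    """Step 1: level-wise search for all itemsets with support >= min_support."""
    items = sorted(set().union(*transactions))
    level = [frozenset([i]) for i in items
             if support([i], transactions) >= min_support]
    result = {}
    k = 1
    while level:
        for s in level:
            result[s] = support(s, transactions)
        k += 1
        # Join frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Keep a candidate only if all its (k-1)-subsets are frequent
        # and the candidate itself reaches the minimum support.
        level = [c for c in candidates
                 if all(frozenset(sub) in result for sub in combinations(c, k - 1))
                 and support(c, transactions) >= min_support]
    return result

def generate_rules(freq, min_confidence):
    """Step 2: form rules A => B from each frequent itemset.

    confidence(A => B) = support(A union B) / support(A).
    """
    rules = []
    for itemset, sup in freq.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                a = frozenset(antecedent)
                conf = sup / freq[a]  # every subset of a frequent itemset is in freq
                if conf >= min_confidence:
                    rules.append((a, itemset - a, sup, conf))
    return rules

freq = frequent_itemsets(transactions, min_support=0.6)
found = generate_rules(freq, min_confidence=0.8)
for a, b, sup, conf in found:
    print(sorted(a), "=>", sorted(b), f"support={sup:.2f} confidence={conf:.2f}")
```

On this toy data with a minimum support of 0.6 and a minimum confidence of 0.8, the only rule that survives both thresholds is beer ⇒ diapers. Lowering either threshold admits more rules.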
The main contributions and results of the thesis are as follows.

First, it surveys research on data mining, especially association rule mining. The thesis analyzes and discusses the background, basic process, and main tasks of data mining; introduces the concept of association rule mining, its current research state and achievements, and its key technologies; and describes the typical association rule mining algorithm, Apriori, and its variants. This material forms the basis for the further research.

Second, it studies the problem of maximal frequent itemset discovery. Identifying the frequent itemsets is the key technique and the computationally intensive step of the association mining task. Every frequent itemset is a subset of some maximal frequent itemset, so the maximal frequent itemsets determine all the others. The thesis presents an efficient algorithm, P&M, that finds all maximal frequent itemsets and uses an improved set-enumeration tree to represent the itemsets. P&M has the following properties: first, it shows the generation process of maximal frequent itemsets by means of the improved set-enumeration tree; second, it combines bottom-up and top-down searches; third, it uses the infrequent itemsets to reduce the number of candidate maximal frequent itemsets, which increases efficiency. P&M thus provides an efficient and fast method for maximal frequent itemset discovery.

Third, it studies the problem of mining and applying ordinal patterns. The thesis extends frequent patterns to ordinal patterns, gives a method for mining ordinal patterns, and then proposes and implements an approach for applying ordinal patterns in data cleaning.
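To make the notion of a maximal frequent itemset from the second contribution concrete, here is a minimal Python sketch. It is a brute-force illustration only and is not the P&M algorithm: the set-enumeration tree, the combined bottom-up and top-down search, and the pruning by infrequent itemsets are not reproduced. The toy data, function names, and the absolute threshold `min_count` are assumptions made for this example.

```python
from itertools import combinations

# Hypothetical toy basket data; not taken from the thesis.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def enumerate_frequent(baskets, min_count):
    """Brute force: all itemsets contained in at least `min_count` baskets."""
    items = sorted(set().union(*baskets))
    frequent = []
    for k in range(1, len(items) + 1):
        found_k = [frozenset(c) for c in combinations(items, k)
                   if sum(set(c) <= b for b in baskets) >= min_count]
        if not found_k:
            break  # no frequent k-itemset means no larger one can be frequent
        frequent.extend(found_k)
    return frequent

def maximal_only(frequent):
    """Keep the itemsets that have no frequent proper superset."""
    return [s for s in frequent if not any(s < other for other in frequent)]

all_frequent = enumerate_frequent(baskets, min_count=3)
maximal = maximal_only(all_frequent)
for s in maximal:
    print(sorted(s))
```

The list `maximal` is a compact summary of `all_frequent`: each frequent itemset is contained in at least one maximal one, which is the property the abstract relies on.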
An ordinal pattern is an ordered list of items (attributes), based on the order among the different items of a frequent itemset (frequent pattern). Ordinal patterns have the following properties: first, the values of the different items in an ordinal pattern occur in increasing order in the data; second, an item in an ordinal pattern appears in only one itemset, and ordinal patterns can be mined by algorithms designed for sequential pattern mining; third, an ordinal pattern reflects an ordinal relationship between the values of attributes, so records that break most of the ordinal patterns may be identified as possible error records in data cleaning. Analyzing, verifying, and optimizing this data cleaning method requires further study.
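The data cleaning idea above can be sketched as follows, under several simplifying assumptions that are not from the thesis. Only length-2 patterns are mined, and they are found by direct counting rather than by a sequential pattern mining algorithm. "Increasing order" is read as strictly increasing, and "most of the ordinal patterns" is read as more than half. The record fields and function names are hypothetical.

```python
from itertools import permutations

# Hypothetical order records (day numbers); the last one looks like an entry error.
records = [
    {"ordered": 1, "shipped": 3, "delivered": 7},
    {"ordered": 2, "shipped": 4, "delivered": 9},
    {"ordered": 5, "shipped": 6, "delivered": 8},
    {"ordered": 4, "shipped": 6, "delivered": 10},
    {"ordered": 30, "shipped": 12, "delivered": 9},
]

def mine_pair_patterns(records, min_support):
    """Attribute pairs (a, b) with record[a] < record[b] in at least
    `min_support` (a fraction) of the records."""
    attrs = sorted(records[0])
    n = len(records)
    return [(a, b) for a, b in permutations(attrs, 2)
            if sum(r[a] < r[b] for r in records) / n >= min_support]

def suspicious_records(records, patterns, max_violation_ratio=0.5):
    """Indices of records that break more than `max_violation_ratio`
    of the given ordinal patterns."""
    flagged = []
    for i, r in enumerate(records):
        broken = sum(not (r[a] < r[b]) for a, b in patterns)
        if patterns and broken / len(patterns) > max_violation_ratio:
            flagged.append(i)
    return flagged

patterns = mine_pair_patterns(records, min_support=0.8)
flagged = suspicious_records(records, patterns)
print(patterns)
print(flagged)
```

A flagged record is only a candidate error. As the abstract notes, the method still needs further analysis and verification, so a flagged record would normally go to manual review rather than be corrected automatically.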
Keywords/Search Tags: Data Mining, Association Rule, Association Rule Mining, Support, Confidence, Minimal Support, Minimal Confidence, Frequent ItemSet, Maximal Frequent ItemSet, Sequence Pattern, Ordinal Sequence, Ordinal Pattern, Data Cleaning, Recommender System