Research on Several Key Technologies in Descriptive Pattern Rule Mining

Posted on: 2007-01-06 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: J P Lu | Full Text: PDF
GTID: 1118360212965590 | Subject: Computer application technology
Abstract/Summary:
Descriptive pattern mining is an important topic in data mining. The task of data mining is to discover patterns in massive data, and according to the function of those patterns it can be divided into two categories: predictive pattern mining and descriptive pattern mining. Descriptive pattern mining describes the latent regularities in a dataset and covers several important research problems in data mining, such as association rule mining and sequential pattern mining.
Motivated by the potential requirements of building the Jiangsu intellectual property public service center, and drawing on the use of patent data and existing research on descriptive pattern mining, this work addresses sequential pattern mining, association rule mining in distributed environments, and outlier detection based on association analysis.
Sequential pattern mining is an iterative, interactive process. An efficient interactive algorithm based on PrefixSpan is proposed, which makes full use of previous results and generates new patterns efficiently when the minimum support threshold changes. For the problem of incrementally updating sequential patterns, the algorithm ISPMP (Incremental Sequential Patterns Mining Based Projected Database) is proposed, which updates previously found frequent items and patterns by implicit merging and discovers new patterns through projected databases. Furthermore, to improve the efficiency of sequential pattern mining, binary coding is applied to generate candidate sequential patterns and compute their counts, which requires only simple operations such as "or", "and", and "xor".
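For readers unfamiliar with PrefixSpan, the following is a minimal sketch of the baseline technique that the interactive algorithm builds on. It is not the dissertation's interactive algorithm or ISPMP. It assumes each sequence is a list of single items (no itemset elements), and the function name `prefixspan` and the toy data are illustrative only.

```python
from collections import defaultdict

def prefixspan(sequences, min_support):
    """Return {pattern_tuple: support} for sequences of single items.

    Simplified baseline PrefixSpan: grows patterns one item at a time and
    recurses on pseudo-projected databases (sequence index + start offset).
    """
    results = {}

    def mine(prefix, projected):
        # Count, for each item, the sequences whose suffix contains it.
        supporters = defaultdict(set)
        for sid, start in projected:
            for item in sequences[sid][start:]:
                supporters[item].add(sid)
        for item, sids in sorted(supporters.items()):
            if len(sids) < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = len(sids)
            # Project on the first occurrence of `item` in each suffix.
            new_projected = []
            for sid, start in projected:
                seq = sequences[sid]
                for pos in range(start, len(seq)):
                    if seq[pos] == item:
                        new_projected.append((sid, pos + 1))
                        break
            mine(pattern, new_projected)

    mine((), [(sid, 0) for sid in range(len(sequences))])
    return results

# Toy data (hypothetical): four short item sequences.
toy_sequences = [list("abc"), list("acb"), list("ab"), list("bc")]
patterns = prefixspan(toy_sequences, min_support=2)
```

Rerunning this sketch from scratch whenever `min_support` changes repeats all of the projection work, which is the cost that reusing previous results is meant to avoid.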
For association rule mining in massive datasets, an algorithm for fast mining of global maximum frequent itemsets (FMGMFI) is proposed. Using frequent pattern trees (FP-trees), it can conveniently obtain the global frequency of any itemset from the corresponding paths of each local FP-tree, and its combined bottom-up and top-down search strategy requires far less communication overhead. Finally, because most current outlier detection algorithms do not handle high-dimensional datasets well and lack semantic interpretation, an outlier detection algorithm based on association analysis and rough set theory is proposed; it efficiently discovers abnormal data points that deviate from normal association patterns. Extensive experiments were carried out during the research. The processing strategies and experimental results demonstrate the soundness and effectiveness of the proposed methods.
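The idea of a global maximum frequent itemset in a distributed setting can be illustrated with a naive sketch: the global support of an itemset is the sum of its local supports across sites, and an itemset is maximal if no frequent proper superset exists. This is not FMGMFI. It replaces the FP-trees and the bottom-up and top-down search with brute-force enumeration, and every name here (`local_counts`, `global_maximal_frequent`, `max_size`) is a hypothetical helper introduced for illustration.

```python
from collections import Counter
from itertools import combinations

def local_counts(transactions, max_size):
    """Count every itemset of up to max_size items at one site (brute force)."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for k in range(1, max_size + 1):
            for itemset in combinations(items, k):
                counts[itemset] += 1
    return counts

def global_maximal_frequent(sites, min_support, max_size=3):
    """Sum local counts into global supports, then keep maximal frequent itemsets.

    Maximality is only checked among itemsets of at most max_size items.
    """
    total = Counter()
    for site in sites:
        # In a real deployment this sum is where communication cost arises.
        total.update(local_counts(site, max_size))
    frequent = {s: c for s, c in total.items() if c >= min_support}
    return {
        s: c for s, c in frequent.items()
        if not any(set(s) < set(other) for other in frequent)
    }

# Toy data (hypothetical): two sites, each holding local transactions.
toy_sites = [
    [{"a", "b", "c"}, {"a", "b"}],
    [{"a", "b"}, {"b", "c"}, {"a", "c"}],
]
maximal = global_maximal_frequent(toy_sites, min_support=3)
```

In this naive version each site ships the count of every itemset it has seen, so the communication volume grows with the number of itemsets. Querying only selected itemsets from local FP-trees is the kind of saving the abstract refers to.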
Keywords/Search Tags:data mining, descriptive pattern, sequential pattern, association analysis, outlier detection