Research On Association Rules Mining And Associative Classification Based On Bit Table

Posted on: 2010-05-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Dong
Full Text: PDF
GTID: 1118360275957910
Subject: Control theory and control engineering
Abstract/Summary:
As the ability to produce and collect data with information technology grows, the scale of data is inflating rapidly, and discovering the hidden, previously unknown knowledge in databases becomes very important. Data mining is a powerful tool for this task. Association rules mining is an important field of data mining and, in a sense, its essence: research on it and applications of it account for a large share of data mining work and have developed rapidly. Research on how to mine association rules efficiently from massive databases and how to use them reasonably is therefore of great theoretical and practical significance. Based on an analysis of current mining algorithms, an inter-transaction frequent itemsets mining algorithm and an intra-transaction frequent itemsets mining algorithm are proposed to solve the problem of remote sensing image classification. The research work in this dissertation can be summarized in the following three aspects:

1. Research on a fast mining algorithm for complete frequent itemsets. Most modern algorithms for mining complete frequent itemsets are based on the Apriori algorithm and are called Apriori-like algorithms. When generating candidate itemsets, they need to check whether any two itemsets share the same n-1 items, and when counting support, all or part of the database needs to be scanned sequentially, which wastes a lot of CPU time and I/O operations.
These two problems are the main bottlenecks of Apriori-like algorithms. To address them, the dissertation proposes a special data structure named BitTable together with its bitwise operations. BitTable is used to compress the database and to generate candidate itemsets quickly with bitwise And/Or operations, which avoids scanning the database. It also compresses the candidate itemsets and frequent itemsets horizontally and generates candidate itemsets directly, avoiding item-by-item comparison. This data structure can be applied directly in Apriori-like algorithms and improves their performance effectively. Moreover, an association rules mining algorithm named BitTableFI is proposed based on BitTable. The experimental results demonstrate the effectiveness of the BitTableFI algorithm.

2. Research on inter-transaction frequent closed itemsets and a fast algorithm for mining them. Compared with intra-transaction frequent itemsets, inter-transaction frequent itemsets can effectively reveal the relevance of various attributes at different moments and are an extension of intra-transaction frequent itemsets. However, the number of inter-transaction frequent itemsets increases rapidly as the sliding time window grows, which reduces the efficiency of the mining algorithm. Using closed itemsets to represent inter-transaction frequent itemsets effectively reduces the number of itemsets without loss of information. By analyzing the internal relation between inter-transaction and intra-transaction frequent itemsets, this dissertation proposes an inter-transaction frequent closed itemsets mining algorithm. The proposed algorithm adopts division and conditional database techniques to avoid generating a huge extended database, and uses an extended BitTable to compress the transactions and improve the efficiency of support counting.
Dynamic ordering and a hash table reduce the number of tests on candidate closed inter-transaction itemsets. Simulations show that the algorithm mines inter-transaction frequent closed itemsets quickly and efficiently.

3. Research on fuzzy associative classification and its application to remote sensing image classification. Associative classification uses association rules to solve the classification problem. Introducing fuzzy concepts into associative classification avoids the "sharp boundary" problem. However, most fuzzy associative classification algorithms use a fixed membership function to generate fuzzy sets, without considering the intrinsic characteristics of the data. To address this issue, the dissertation proposes FARC, a fuzzy associative classification algorithm based on adaptive interval partition. According to the intrinsic characteristics of the data, FARC employs fuzzy c-means to partition continuous attributes, adopts new joining and pruning techniques to avoid generating useless candidate itemsets, and introduces a weighting parameter to score the fuzzy association rules. Experiments on UCI datasets show that the proposed method not only achieves higher classification accuracy but is also insensitive to variation in the size of the training data set. The dissertation also introduces fuzzy associative classification into research on remote sensing image classification. In actual remote sensing applications, training data is hard to obtain, which greatly affects the classification accuracy of traditional classifiers. FARC can effectively overcome the lack of training data in actual remote sensing classification and achieve high classification accuracy.
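
The BitTable idea from the first aspect above can be illustrated with a small Python sketch. It is not the BitTableFI algorithm itself, whose details the abstract does not give; it is a minimal Apriori-style miner written under two assumptions: each item's transaction list is stored as one integer bit vector, so support is counted with bitwise AND instead of a database scan, and each itemset is stored as a bitmask over item positions, so candidates are formed with bitwise OR instead of item-by-item comparison. The names `mine_frequent_itemsets`, `tid_bits`, and `popcount` are hypothetical.

```python
from itertools import combinations


def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def mine_frequent_itemsets(transactions, min_support):
    """Apriori-style mining with bit vectors (illustrative sketch only)."""
    items = sorted({item for trans in transactions for item in trans})
    pos = {item: k for k, item in enumerate(items)}

    # Compressed database: tid_bits[k] has bit t set when
    # transaction t contains items[k].
    tid_bits = [0] * len(items)
    for t, trans in enumerate(transactions):
        for item in trans:
            tid_bits[pos[item]] |= 1 << t

    def support(mask):
        # AND together the transaction vectors of every item in the itemset.
        bits = (1 << len(transactions)) - 1
        k = 0
        while mask:
            if mask & 1:
                bits &= tid_bits[k]
            mask >>= 1
            k += 1
        return popcount(bits)

    result = {}
    # Level 1: each frequent single item, encoded as a one-bit itemset mask.
    level = {1 << k for k in range(len(items))
             if popcount(tid_bits[k]) >= min_support}
    size = 1
    while level:
        for mask in level:
            itemset = frozenset(items[k] for k in range(len(items))
                                if mask >> k & 1)
            result[itemset] = support(mask)
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b  # join two itemsets with a single OR
            if popcount(union) != size + 1:
                continue  # the pair does not share size-1 items
            # Apriori pruning: every subset with one item removed
            # must itself be frequent.
            if all((union & ~(1 << k)) in level
                   for k in range(len(items)) if union >> k & 1):
                candidates.add(union)
        level = {c for c in candidates if support(c) >= min_support}
        size += 1
    return result


if __name__ == "__main__":
    db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, count in sorted(mine_frequent_itemsets(db, 3).items(),
                                 key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), count)
```

In this sketch the database is read once to build `tid_bits`; every later support count is a chain of AND operations followed by a bit count, which is the kind of saving the abstract attributes to BitTable.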
Keywords/Search Tags: Data mining, Association rules, Associative classification, Frequent itemsets, Remote sensing classification