Research On Association Rule Algorithm For Massive Data Set

Posted on: 2008-02-28 | Degree: Master | Type: Thesis
Country: China | Candidate: X X Liu
GTID: 2178360215482639 | Subject: Computer Science and Technology
Abstract/Summary:
Data mining is the discovery of interesting, non-trivial, implicit, previously unknown, and potentially useful information or patterns in large databases. Association rule mining, introduced by Agrawal to find relationships among different items in transaction databases, is one of its most important research methods. With the rapid development of Internet and database technology, however, the volume of data that mining must process keeps growing. On massive data sets the classic association rule algorithms consume a great deal of time and space, and their results are unsatisfactory. Many improved data reduction strategies have therefore been proposed, including distributed parallel processing, batch processing, and incremental processing. This thesis studies association rule mining algorithms with respect to the characteristics of massive data sets.

First, targeting the skewed distribution of large data sets, the thesis proposes a weighted association rule mining algorithm based on density biased sampling. Compared with random sampling, density biased sampling produces a more representative sample when the data are skewed. Support is then counted using weights derived from the local density computed for each sample during sampling, so there is no need to lower the minimum support threshold (minsupport). Frequent itemsets are generated by the F(k-1) × F(1) join combined with the Apriori property. The data set is scanned only once, and experiments show that on skewed massive data sets the algorithm is both efficient and more accurate, making it an effective approach to association rule mining on massive data sets.
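The Python sketch below illustrates the three ideas of the density biased sampling algorithm under stated assumptions; it is not the thesis's implementation. The density measure (the number of identical transactions), the `exponent` parameter, and the names `density_biased_sample` and `mine_frequent` are hypothetical. Each transaction is kept with probability proportional to density ** exponent (a negative exponent undersamples dense regions), and its weight is the inverse of that probability, so weighted counts estimate full-data counts and `min_support` can stay unchanged. Candidate k-itemsets come from extending each frequent (k-1)-itemset with a larger frequent item (the F(k-1) × F(1) join) and are pruned with the Apriori property. For simplicity the densities are computed in a separate counting pass, so this sketch does not reproduce the single-scan property claimed in the thesis.

```python
import random
from collections import Counter
from itertools import combinations


def density_biased_sample(transactions, sample_size, exponent=-0.5, seed=0):
    """Draw a density biased sample as (transaction, weight) pairs.

    Hypothetical density measure: the number of identical transactions.
    Inclusion probability is proportional to density ** exponent, scaled so
    the expected sample size is roughly sample_size, and capped at 1.
    Weight = 1 / probability, so weighted counts are unbiased estimates
    of counts on the full data (Horvitz-Thompson weighting).
    """
    rng = random.Random(seed)
    keys = [frozenset(t) for t in transactions]
    density = Counter(keys)  # separate counting pass, for simplicity
    raw = [density[k] ** exponent for k in keys]
    scale = sample_size / sum(raw) if raw else 0.0
    sample = []
    for key, r in zip(keys, raw):
        p = min(1.0, r * scale)
        if p > 0 and rng.random() < p:
            sample.append((key, 1.0 / p))
    return sample


def mine_frequent(sample, min_support):
    """Level-wise (Apriori-style) mining on a weighted sample.

    support(X) = total weight of sampled transactions containing X
                 / total sample weight (an estimate of the data set size).
    Items must be mutually comparable, e.g. strings.
    """
    total = sum(w for _, w in sample)
    if total == 0:
        return {}

    def support(itemset):
        return sum(w for t, w in sample if itemset <= t) / total

    items = sorted({i for t, _ in sample for i in t})
    f1 = [i for i in items if support(frozenset([i])) >= min_support]
    frequent = {frozenset([i]): support(frozenset([i])) for i in f1}
    level = [(i,) for i in f1]  # frequent (k-1)-itemsets as sorted tuples
    while level:
        level_set = {frozenset(x) for x in level}
        next_level = []
        for prefix in level:
            # F(k-1) x F(1) join: append a frequent item larger than the last one
            for item in f1:
                if item <= prefix[-1]:
                    continue
                cand = prefix + (item,)
                # Apriori pruning: every (k-1)-subset must itself be frequent
                if not all(frozenset(sub) in level_set
                           for sub in combinations(cand, len(cand) - 1)):
                    continue
                sup = support(frozenset(cand))
                if sup >= min_support:
                    frequent[frozenset(cand)] = sup
                    next_level.append(cand)
        level = next_level
    return frequent
```

With `exponent=0.0` and `sample_size` equal to the number of transactions, every transaction is kept with weight 1, so the result matches exact Apriori mining on the full data, which makes a convenient sanity check before switching to a negative exponent.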
The density biased sampling algorithm is then applied to an intrusion detection system. Second, exploiting the density characteristics of massive data sets, the thesis combines granular computing and rough set theory with association rule mining and presents a granular-computing-based method for mining association rules. The properties of granules reduce the number of candidate itemsets, and frequent itemsets are mined with a depth-first search strategy. Experiments demonstrate the effectiveness of the method.
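One common way to combine granules with depth-first search is sketched below in Python. It is an Eclat-style illustration under stated assumptions, not the thesis's algorithm, and it does not model the rough set component. Each item's granule is assumed to be the set of transaction indices containing it, stored as an integer bitmask, so the support count of an itemset is the size of the intersection of its items' granules. The search extends itemsets depth-first and abandons a branch as soon as an extension falls below `min_count`, because no superset of an infrequent itemset can be frequent. The names `item_granules` and `dfs_mine` are hypothetical.

```python
def item_granules(transactions):
    """Map each item to its granule: a bitmask of the transaction indices containing it."""
    granules = {}
    for tid, t in enumerate(transactions):
        for item in set(t):
            granules[item] = granules.get(item, 0) | (1 << tid)
    return granules


def dfs_mine(transactions, min_count):
    """Depth-first frequent itemset mining over granule intersections."""
    granules = item_granules(transactions)
    # Keep only frequent single items, in a canonical (sorted) order
    items = sorted(i for i, g in granules.items() if bin(g).count("1") >= min_count)
    result = {}

    def extend(prefix, prefix_granule, start):
        for idx in range(start, len(items)):
            item = items[idx]
            g = prefix_granule & granules[item]  # granule of prefix + item
            count = bin(g).count("1")
            if count >= min_count:
                itemset = prefix + (item,)
                result[frozenset(itemset)] = count
                extend(itemset, g, idx + 1)  # go deeper before trying siblings
            # An infrequent extension is not expanded: its whole subtree is pruned

    all_tids = (1 << len(transactions)) - 1  # granule of the empty itemset
    extend((), all_tids, 0)
    return result
```

Integer bitmasks keep each granule intersection a single `&` operation; for very large data sets a compressed bitmap structure would be the natural replacement.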
Keywords/Search Tags: massive data set mining, association rules, granular computing, density biased sampling