
Research On Association Rules Mining Precision Over Data Stream

Posted on: 2012-02-20
Degree: Master
Type: Thesis
Country: China
Candidate: G H Hao
Full Text: PDF
GTID: 2218330338465397
Subject: E-commerce and information technology
Abstract/Summary:
In the digital age, the growth of telecommunications and the World Wide Web has driven a rapid increase in data volume and given rise to the data stream. Discovering useful information and knowledge in this data, like searching for treasure in a vast ocean, is a challenge we must face. Mining frequent patterns in data streams is a task that has emerged in recent years. It matters to industry and to daily life, and it can be widely applied in telecommunications, facilities maintenance, securities exchange, and other fields.

Data mining practitioners and researchers have put great effort into data stream mining and have proposed many new designs for mining procedures and algorithms. However, most of this research attends only to the mining process and neglects the mining result. The aim of data mining is precise, credible, and useful information, yet, contrary to that expectation, the result of association rule mining over a data stream can only be approximate. The precision of the mining result should therefore be a key parameter of association rule mining.

This thesis presents a new method for mining frequent patterns in data streams, together with algorithms that help ensure a precise result. It modifies the details of the mining method in data acquisition, data storage, and information discovery. Our research consists of three parts: data acquisition, data storage, and knowledge discovery. Each part is designed to preserve the precision of the mining result.

First, sliding time windows divide the data stream into itemsets. Items that appear more than twice are dropped, and each itemset is then sorted to follow the left-to-right order of the first-level child nodes in the FP-Atree. After that, the itemsets can be treated as transactions.

Second, data storage comprises the storage structure, the data update algorithm, and the computation of the maximum error.
We propose a new data storage structure named FP-Atree. Unlike the FP-tree, it consists only of a prefix tree, without the head-node list (header table) or the head-node pointers. The data update algorithm divides the overall time span into time frames; after each frame, every node whose support is less than the maximum error is pruned.

Finally, the thesis proposes a polynomial strategy to estimate the value of the maximum error, and Chapter 4 modifies the minimum support threshold. A proper value for the maximum error and the modified minimum support threshold are the two key parameters for increasing the precision of the mining result.
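
The pipeline summarized in the abstract (window to transaction, insertion into a header-free prefix tree, pruning at the end of a time frame) can be illustrated with a minimal Python sketch. This is not the thesis's implementation. The names `Node`, `PrefixTree`, `to_transaction`, `insert`, `prune`, and `paths` are hypothetical. The sketch assumes that the item-dropping step means keeping each item once per itemset, that the sort order is the order in which items are first seen, and that support is a node's count divided by the number of transactions so far. The maximum error is passed in as a plain parameter, because the abstract does not describe how the polynomial strategy computes it.

```python
class Node:
    """One node of the prefix tree: an item, a count, and child nodes."""
    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}  # item -> Node


class PrefixTree:
    """A plain prefix tree with no header table or node links."""
    def __init__(self):
        self.root = Node()
        self.n_transactions = 0
        self.rank = {}  # item -> position in a fixed order (assumed rule)

    def to_transaction(self, window_items):
        # Assumption: keep each item once, then sort by first-seen order.
        unique = list(dict.fromkeys(window_items))
        for item in unique:
            self.rank.setdefault(item, len(self.rank))
        return sorted(unique, key=self.rank.__getitem__)

    def insert(self, transaction):
        # Walk down from the root, creating nodes and incrementing counts.
        self.n_transactions += 1
        node = self.root
        for item in transaction:
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = Node(item)
            child.count += 1
            node = child

    def prune(self, max_error):
        # Drop every node (and its subtree) whose support is below max_error.
        min_count = max_error * self.n_transactions
        stack = [self.root]
        while stack:
            node = stack.pop()
            node.children = {
                item: child
                for item, child in node.children.items()
                if child.count >= min_count
            }
            stack.extend(node.children.values())

    def paths(self):
        # Yield (prefix_path, count) for every remaining node.
        # These are prefix-path counts, not a full frequent-itemset listing.
        stack = [(self.root, ())]
        while stack:
            node, prefix = stack.pop()
            for item, child in node.children.items():
                path = prefix + (item,)
                yield path, child.count
                stack.append((child, path))


# Four toy windows forming one time frame, pruned at the frame boundary.
tree = PrefixTree()
windows = [["a", "b", "a"], ["a", "b", "c"], ["a", "c"], ["b", "d"]]
for window in windows:
    tree.insert(tree.to_transaction(window))
tree.prune(max_error=0.5)
result = dict(tree.paths())
```

With these four windows and an illustrative `max_error` of 0.5, only the paths `("a",)` and `("a", "b")` survive pruning. A traversal from the root is enough here because no header table is kept, which matches the structural difference from the FP-tree stated above.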
Keywords/Search Tags: Frequent Patterns, Maximum Error, Minimum Support Threshold, Mining Precision