
Research On Frequent Itemset Mining Algorithm Of Uncertain Data Stream

Posted on: 2019-02-08 | Degree: Master | Type: Thesis
Country: China | Candidate: C Yang | Full Text: PDF
GTID: 2428330566491421 | Subject: Computer application technology
Abstract/Summary:
Frequent itemset mining plays an indispensable role in data mining. As collection tools and processing methods have improved, uncertain data has spread into all aspects of life. With the development of the Internet, e-commerce, and data analysis, data increasingly arrives in the form of streams. An uncertain data stream combines uncertain data with stream data, and mining frequent itemsets from such streams has become a hot research topic. Frequent itemset mining algorithms for traditional data streams and for static uncertain data use relatively simple summary structures, which are difficult to apply to uncertain data streams. Designing a summary structure for uncertain data streams, together with an efficient mining algorithm, has therefore become an important research problem. This thesis proposes two improved algorithms that build on existing ones. The main work is as follows.

First, because existing uncertain data stream mining algorithms mostly rely on a single window technique, this thesis designs the DSUFIM-mine algorithm based on a hybrid model. The algorithm combines a sliding window with a decay (attenuation) window so that transactions arriving at different times carry different importance, which better reflects reality. The algorithm also introduces a false-positive strategy to account for the possibility that the stored expected support is inaccurate, which improves accuracy. In addition, whereas the summary structure of the SRUF-mine algorithm updates the sliding window with a queue, this thesis updates the sliding window with a flag field in a table. Experimental results show that the algorithm improves memory consumption and accuracy.

Second, most existing algorithms store the summary structure as a prefix tree. With a tree as the storage method, transactions can share a prefix path only if they contain the same items with the same probabilities, which leads to redundant nodes and wasted memory. This thesis therefore uses a matrix to store the node probabilities and a tree to store the nodes, and proposes MQT-mine, a matrix-based mining algorithm for uncertain data streams. During mining, frequent itemsets are found from the information in the summary-structure tree that is linked from the items in the queue, without traversing the entire tree, which reduces running time. Experimental results show that the algorithm achieves better time and space efficiency.
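To make the idea of expected support over a hybrid window concrete, here is a minimal Python sketch. It is not the DSUFIM-mine algorithm from the thesis. It assumes the common definition of expected support (the sum, over transactions, of the product of the item probabilities, with items treated as independent), a fixed-size sliding window, and an exponential decay weight for older transactions. The function name `expected_support`, the `decay` parameter, and the sample stream are all hypothetical.

```python
from collections import deque
from math import prod


def expected_support(window, itemset, decay=1.0):
    """Decayed expected support of `itemset` over the transactions in `window`.

    Each transaction maps item -> existence probability. Items are assumed
    independent, so a transaction contributes the product of the
    probabilities of the items in `itemset` (nothing if any item is absent).
    The newest transaction has weight 1; one that is `age` steps older has
    weight decay ** age. This weighting is an illustrative assumption.
    """
    newest = len(window) - 1
    total = 0.0
    for position, transaction in enumerate(window):
        if all(item in transaction for item in itemset):
            weight = decay ** (newest - position)
            total += weight * prod(transaction[item] for item in itemset)
    return total


# Sliding window holding at most the 3 most recent transactions.
window = deque(maxlen=3)
stream = [
    {"a": 0.9, "b": 0.5},
    {"a": 0.8, "c": 0.4},
    {"a": 0.5, "b": 1.0},
    {"b": 0.5, "c": 0.5},
]
for transaction in stream:
    window.append(transaction)  # the oldest transaction drops out automatically

support_a = expected_support(window, ("a",), decay=0.5)        # about 0.45
support_ab = expected_support(window, ("a", "b"), decay=0.5)   # about 0.25
```

An itemset would then be reported as frequent when its decayed expected support reaches a chosen minimum threshold. The false-positive strategy and the flag-field table update mentioned in the abstract are not modeled here.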
Keywords/Search Tags: data mining, uncertain data stream, mixed model, matrix