Mining Maximum Frequent Itemsets Over Uncertain Data Streams

Posted on: 2017-05-27
Degree: Master
Type: Thesis
Country: China
Candidate: M L Hou
Full Text: PDF
GTID: 2308330485964137
Subject: Computer application technology
Abstract/Summary:
With the rapid development of information technology, many fields continuously produce large amounts of data every day, for example the data transferred in sensor networks. Most of this data takes the form of uncertain data streams. Data mining provides methods for analyzing and interpreting such data. Frequent itemset mining explores valuable relationships between items in massive data sets. Mining only the maximum frequent itemsets reduces the number of itemsets in the output. In some applications, users care only about the maximum frequent itemsets, so studying how to mine them is particularly important.

Based on a review of a large body of related literature from China and abroad, this thesis summarizes the features, causes, manifestations, and processing models of data streams and uncertain data. Saving storage space and reducing the search space have become the two main avenues for designing and optimizing data mining algorithms. Following these two points, the thesis proposes the SUFMax algorithm, which mines maximum frequent itemsets over uncertain data streams based on an attenuation window. Because the mining efficiency of SUFMax is not high, a second algorithm, TUFSMax, is also presented.

The main work of this thesis is as follows:

1. It summarizes the concepts related to maximum frequent itemset mining and introduces algorithms for mining frequent itemsets and maximum frequent itemsets over certain and uncertain data streams.
2. The main difficulty of mining maximum frequent itemsets lies in two aspects: first, how to design suitable data structures to store the profile information of an uncertain data stream; second, how to use an efficient superset detection method to mine maximum frequent itemsets quickly. To save memory, and taking into account that "old" data may affect new data, this thesis proposes the SUFMax algorithm, which is based on an attenuation window, to mine maximum frequent itemsets over uncertain data streams. SUFMax saves memory because it first mines local maximum frequent itemsets and then mines the global maximum frequent itemsets from the results of that first step.
3. Because SUFMax may omit some results, this thesis puts forward another algorithm, TUFSMax. It uses a UF-stream tree structure to store the profile of the data stream and tags tree nodes to avoid superset tests, so TUFSMax saves storage space and reduces search time. Experimental results show that TUFSMax has higher mining efficiency and a shorter running time than SUFMax.

Finally, the thesis summarizes the work as a whole, points out the deficiencies of the current research, and outlines future work.
Keywords/Search Tags:Data stream, Uncertainty, Maximum frequent itemsets, Superset test
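
The notions the abstract relies on (expected support over uncertain transactions, an attenuation window, and the superset test that defines a maximum frequent itemset) can be illustrated with a small brute-force sketch. This is not SUFMax or TUFSMax: the abstract does not give their details, so everything here is an assumption made for illustration. In particular, the function names, the representation of a transaction as a dict from item to existence probability, the independence of items within a transaction, and the geometric weighting `decay ** age` are all hypothetical choices. "Maximum frequent itemset" is taken in the usual sense of a frequent itemset with no frequent proper superset (often called a maximal frequent itemset).

```python
from itertools import combinations


def decayed_expected_support(stream, itemset, decay):
    """Expected support of `itemset`, with older transactions down-weighted.

    Assumption: each transaction maps item -> existence probability, and
    items in a transaction are independent, so an itemset's probability in
    one transaction is the product of its items' probabilities.
    """
    total = 0.0
    # age 0 is the newest transaction; weight shrinks as decay ** age
    for age, txn in enumerate(reversed(stream)):
        p = 1.0
        for item in itemset:
            p *= txn.get(item, 0.0)  # absent item -> probability 0
        total += (decay ** age) * p
    return total


def frequent_itemsets(stream, min_support, decay):
    """Brute-force enumeration; exponential, for tiny examples only."""
    items = sorted({item for txn in stream for item in txn})
    found = []
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            if decayed_expected_support(stream, combo, decay) >= min_support:
                found.append(frozenset(combo))
    return found


def maximum_only(frequent):
    """Superset test: keep itemsets with no frequent proper superset."""
    return [x for x in frequent if not any(x < y for y in frequent)]


# Hypothetical three-transaction stream, oldest first.
stream = [
    {"a": 0.9, "b": 0.8},
    {"a": 0.7, "b": 0.6, "c": 0.5},
    {"a": 1.0, "c": 0.4},
]
result = maximum_only(frequent_itemsets(stream, min_support=0.8, decay=0.9))
```

With these hypothetical inputs, the frequent itemsets are {a}, {b}, {c}, and {a, b}; the superset test discards {a} and {b} because {a, b} is also frequent, leaving {a, b} and {c}. The pairwise superset test in `maximum_only` is the quadratic step that, according to the abstract, TUFSMax avoids by tagging tree nodes.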