
Research On Frequent Pattern Mining Of Uncertain Data

Posted on: 2017-05-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Wei
GTID: 2348330509963606
Subject: Software engineering
Abstract/Summary:
Data mining has received wide attention in academia since it emerged. Association rule mining is one of the main topics in data mining research; it aims to find hidden relationships in massive data, and its key step is frequent pattern mining. Real-world data can be precise or uncertain. Research on mining precise data is relatively mature in both breadth and depth, whereas research on mining uncertain data is comparatively scarce. Because the two kinds of data have different structures, traditional frequent pattern mining algorithms apply only to precise data and not to uncertain data, so how to mine frequent patterns over uncertain data is an urgent problem.

To address the basic problems of mining frequent patterns over uncertain data, we studied and analyzed the existing algorithms by surveying a large body of domestic and international literature, and then put forward two effective improved algorithms.

First, for uncertain data in a static database, building on an analysis of existing results, we propose UPro-Eclat, an effective algorithm that mines frequent patterns in vertical mode. The algorithm mines probabilistic frequent patterns based on confidence, and it comprises format conversion of the data set, appropriate pre-pruning strategies, construction of an extended subset search tree, and the mining process. It traverses the tree top-down and depth-first, without traversing the whole data set. In addition, it uses estimation to mine probabilistic frequent patterns instead of building a recursive dynamic computation model for exact calculation. These two improvements effectively reduce the execution time of the algorithm and improve its efficiency.
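The vertical-mode, estimation-based idea described above can be illustrated with a minimal sketch. It is not the thesis's UPro-Eclat implementation: the abstract does not say which estimator is used, so this sketch assumes a normal approximation to the support distribution, assumes items occur independently within a transaction, and uses hypothetical names (`to_vertical`, `intersect`, `frequent_probability`, `mine`).

```python
import math
from typing import Dict, List, Optional, Tuple

Transaction = Dict[str, float]   # item -> existence probability
TidList = Dict[int, float]       # transaction id -> probability of the pattern


def to_vertical(db: List[Transaction]) -> Dict[str, TidList]:
    """Convert a horizontal uncertain database into per-item tid-lists."""
    vertical: Dict[str, TidList] = {}
    for tid, transaction in enumerate(db):
        for item, prob in transaction.items():
            vertical.setdefault(item, {})[tid] = prob
    return vertical


def intersect(a: TidList, b: TidList) -> TidList:
    """Join two tid-lists; probabilities multiply (independence assumption)."""
    return {tid: p * b[tid] for tid, p in a.items() if tid in b}


def frequent_probability(tids: TidList, min_count: int) -> float:
    """Estimate P(support >= min_count) with a normal approximation
    (an assumed estimator, with continuity correction)."""
    mean = sum(tids.values())
    var = sum(p * (1.0 - p) for p in tids.values())
    if var == 0.0:  # every occurrence is certain
        return 1.0 if mean >= min_count else 0.0
    z = (min_count - 0.5 - mean) / math.sqrt(var)
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # equals 1 - Phi(z)


def mine(db: List[Transaction], min_count: int,
         min_conf: float) -> Dict[Tuple[str, ...], float]:
    """Depth-first search over the subset tree using tid-list joins."""
    results: Dict[Tuple[str, ...], float] = {}
    items = sorted(to_vertical(db).items(), key=lambda kv: kv[0])

    def extend(prefix: Tuple[str, ...], prefix_tids: Optional[TidList],
               candidates: List[Tuple[str, TidList]]) -> None:
        for i, (item, tids) in enumerate(candidates):
            joined = tids if prefix_tids is None else intersect(prefix_tids, tids)
            if len(joined) < min_count:  # pre-pruning: too few possible tids
                continue
            prob = frequent_probability(joined, min_count)
            if prob < min_conf:          # prune this branch of the tree
                continue
            results[prefix + (item,)] = prob
            extend(prefix + (item,), joined, candidates[i + 1:])

    extend((), None, items)
    return results
```

Calling `mine(db, min_count, min_conf)` returns each surviving pattern with its estimated frequentness probability. The estimate replaces an exact computation of the support distribution, which is the trade-off the paragraph above describes.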
Experimental results show that the UPro-Eclat algorithm achieves better performance.

Second, for uncertain data in a data stream, building on an analysis of existing results, we propose DSUF-mine, an algorithm for mining frequent patterns in the stream. DSUF-mine is based on sliding-window probability decay and on expected support. It introduces the concept of sliding-window probability decay into the SUFmine algorithm by applying a decay factor to the expected support of each item in each sliding window, in order to distinguish new transactions from old ones. Information about the data stream is then stored in per-window temporary tables and a global tree. Finally, frequent patterns in the stream are mined by traversing the global tree. Tests show that this algorithm also performs well.
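The decayed expected-support idea can be sketched as follows. This is an illustration under stated assumptions, not DSUF-mine itself: it keeps one plain dictionary per batch instead of the temporary tables and global tree, weights a batch of age k by `decay ** k` (the exact decay rule is an assumption), enumerates itemsets only up to a small length, and assumes item independence. The names `batch_expected_support` and `DecayedWindow` are hypothetical.

```python
from collections import deque
from itertools import combinations
from math import prod
from typing import Deque, Dict, FrozenSet, List

Transaction = Dict[str, float]        # item -> existence probability
SupportTable = Dict[FrozenSet[str], float]


def batch_expected_support(batch: List[Transaction],
                           max_len: int = 2) -> SupportTable:
    """Expected support of every itemset up to max_len items in one batch.
    An itemset's probability in a transaction is the product of its item
    probabilities (independence assumption)."""
    support: SupportTable = {}
    for transaction in batch:
        items = sorted(transaction)
        for size in range(1, max_len + 1):
            for combo in combinations(items, size):
                key = frozenset(combo)
                value = prod(transaction[i] for i in combo)
                support[key] = support.get(key, 0.0) + value
    return support


class DecayedWindow:
    """Sliding window of batches; older batches count for less."""

    def __init__(self, window_size: int, decay: float) -> None:
        self.decay = decay
        # deque(maxlen=...) drops the oldest batch automatically
        self.batches: Deque[SupportTable] = deque(maxlen=window_size)

    def push(self, batch: List[Transaction]) -> None:
        self.batches.append(batch_expected_support(batch))

    def frequent(self, min_expected_support: float) -> SupportTable:
        totals: SupportTable = {}
        newest = len(self.batches) - 1
        for index, table in enumerate(self.batches):
            weight = self.decay ** (newest - index)  # newest batch: weight 1
            for itemset, value in table.items():
                totals[itemset] = totals.get(itemset, 0.0) + weight * value
        return {k: v for k, v in totals.items() if v >= min_expected_support}
```

With a decay factor below 1, a pattern that was common only in old batches gradually falls under the threshold, which is how the weighting distinguishes new transactions from old ones.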
Keywords/Search Tags: Data mining, uncertain data, frequent patterns, data stream, sliding window