
Research On Top-k Pattern Mining With Decreased Candidate

Posted on: 2016-05-15  Degree: Master  Type: Thesis
Country: China  Candidate: M F Chen
GTID: 2308330479484829  Subject: Computer software and theory
Abstract/Summary:
Data mining finds useful information hidden in data and plays an important role in data analysis. This matters especially in retail, where the large volume of daily sales data produced by geographically distributed branches must be analyzed. That analysis supports decisions that improve sales, such as inventory preparation, product placement, and promotion. To keep a business running according to plan, the sales data continuously generated by each branch must be analyzed efficiently.
Frequent pattern mining is an important data mining technique that discovers useful patterns formed by items. Typical algorithms such as Apriori and FP-Tree require the user to set a threshold in order to obtain useful patterns from the database. In real-world settings, however, it is difficult for the user to choose an appropriate threshold.
Top-k frequent pattern mining algorithms were proposed to solve this problem. Instead of setting a threshold, the user only specifies a number k and obtains the k top-ranked patterns. These algorithms use the downward closure property to reduce the search space, which greatly improves mining efficiency.
In real-world applications such as supermarket retail data analysis, profit is a significant factor. In addition, an item may be purchased in a quantity greater than one within the same transaction. Top-k frequent pattern mining algorithms consider neither characteristic. Researchers can handle both with the concept of utility mining, but utility does not satisfy the downward closure property. As a result, that property cannot be exploited to improve the efficiency of utility pattern mining. In recent years, algorithms based on an overestimation strategy have been proposed, but they produce a large set of candidate patterns.
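The following minimal Python sketch makes the downward closure problem concrete. It assumes the common utility model, in which an item's utility in a transaction is its purchase quantity times its unit profit. The data, the item names `a` to `d`, and all helper functions are hypothetical. The sketch also computes the transaction-weighted utility (TWU), a standard overestimate from the utility mining literature. The abstract does not say which overestimate its baselines use, so TWU serves here only as an illustration of the overestimation strategy.

```python
from itertools import combinations

# Hypothetical data: unit profit of each item, and purchase quantities per transaction.
PROFIT = {"a": 5, "b": 2, "c": 1, "d": 3}
DB = [
    {"a": 1, "b": 4},
    {"a": 1, "c": 2, "d": 1},
    {"b": 3, "c": 5},
    {"a": 2, "b": 1, "d": 2},
]


def contains(tx, itemset):
    """True if every item of the itemset occurs in the transaction."""
    return all(i in tx for i in itemset)


def utility(itemset):
    """Exact utility: sum of quantity * profit of the itemset's items over containing transactions."""
    return sum(
        sum(tx[i] * PROFIT[i] for i in itemset) for tx in DB if contains(tx, itemset)
    )


def transaction_utility(tx):
    """Utility of a whole transaction (all of its items)."""
    return sum(q * PROFIT[i] for i, q in tx.items())


def twu(itemset):
    """Transaction-weighted utility: an anti-monotone overestimate of utility."""
    return sum(transaction_utility(tx) for tx in DB if contains(tx, itemset))


if __name__ == "__main__":
    items = sorted(PROFIT)
    for k in range(1, len(items) + 1):
        for x in combinations(items, k):
            assert utility(x) <= twu(x)  # TWU never underestimates
    print(utility(("a",)), utility(("a", "b")))  # 20 25: the superset has higher utility
    print(twu(("a",)), twu(("a", "b")))          # 41 31: TWU shrinks for the superset
```

In this data, {a, b} has a higher utility (25) than its subset {a} (20), so a low-utility itemset cannot be used to prune its supersets. TWU never underestimates utility and never grows when items are added. Any itemset whose TWU falls below the threshold can therefore be discarded together with all of its supersets. However, the gap between TWU and the exact utility lets many unpromising candidates through.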
Reducing the number of candidate patterns is therefore an important issue in utility pattern mining. This thesis draws on the idea of raising the threshold using both the exact and the estimated utilities of patterns. On that basis, it proposes an algorithm for mining top-k high utility patterns with fewer candidates. The main contributions of this thesis are:
① Three strategies that raise the threshold used while constructing the UP-Tree, based on the exact and estimated utilities of patterns. Raising the threshold reduces the size of the tree and thereby saves time.
② A new strategy that raises the threshold again after the UP-Tree has been constructed, so that fewer candidate patterns are generated.
③ Compared with UP-Growth and TKU, the proposed algorithm produces fewer candidates and spends less time scanning the dataset.
④ Comprehensive experiments on real and synthetic datasets show that the algorithm performs well in both running time and memory usage.
The first part introduces the background, the state of research, and the content of the thesis. The second part introduces the basics of data mining. The fourth part explains the algorithm in detail. The fifth part analyzes and presents the experimental results. Finally, a summary is given.
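The threshold-raising idea behind the main contributions can be illustrated with a generic sketch. It is not the thesis's UP-Tree strategies, whose details the abstract does not give. It shows only the shared principle, assuming the exact utilities of k distinct patterns are already known. The smallest of the k largest exact utilities seen so far is a safe lower bound on the k-th best utility, so any candidate whose overestimated utility falls below it cannot be in the top k. The class `TopKThreshold` and all numbers are hypothetical.

```python
import heapq


class TopKThreshold:
    """Hypothetical helper: tracks the k largest exact utilities seen so far.

    Once k distinct patterns have been seen, the smallest tracked value is a
    safe lower bound on the utility of the k-th best pattern.
    """

    def __init__(self, k):
        self.k = k
        self._heap = []  # min-heap holding at most k exact utilities

    def offer(self, exact_utility):
        """Record the exact utility of a distinct pattern, raising the threshold if possible."""
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, exact_utility)
        elif exact_utility > self._heap[0]:
            heapq.heapreplace(self._heap, exact_utility)  # drop the smallest, keep size k

    @property
    def threshold(self):
        """Current border threshold; 0 until k patterns have been recorded."""
        return self._heap[0] if len(self._heap) == self.k else 0

    def can_prune(self, overestimate):
        """A candidate whose utility overestimate is below the threshold cannot be in the top k."""
        return overestimate < self.threshold


if __name__ == "__main__":
    # Hypothetical exact utilities of single items, e.g. gathered during a first scan.
    item_utilities = {"a": 20, "b": 16, "c": 7, "d": 9}
    tracker = TopKThreshold(k=2)
    for u in item_utilities.values():
        tracker.offer(u)
    print(tracker.threshold)  # 16
    # Hypothetical overestimated utilities of candidate 2-itemsets.
    candidate_estimates = {("a", "b"): 31, ("b", "c"): 11, ("a", "d"): 28}
    kept = [x for x, est in candidate_estimates.items() if not tracker.can_prune(est)]
    print(kept)  # [('a', 'b'), ('a', 'd')]
```

The min-heap keeps each update at O(log k). The earlier and higher the threshold is raised, the more candidates are pruned. This is the motivation the abstract gives for raising it both during and after UP-Tree construction.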
Keywords/Search Tags: Threshold, High utility pattern, Candidate pattern