The Study Of Mining Algorithm For High Utility And Frequent Itemsets Based On Multiple Minimum Support

Posted on: 2016-04-30  Degree: Master  Type: Thesis
Country: China  Candidate: L J Wang  Full Text: PDF
GTID: 2308330464968509  Subject: Computer application technology
Abstract/Summary:
Frequent itemset mining is an important research topic in data mining, but it considers only the support of an itemset, so some interesting itemsets are lost during mining. In addition, because data streams are real-time, unbounded and continuous, mining algorithms for data streams must be more efficient in both space and time.

This thesis describes the problems of mining with multiple minimum supports and of mining high utility itemsets. The advantages and disadvantages of existing algorithms are analyzed and summarized in terms of their data structures and processing methods. On this basis, the thesis makes the following contributions:

1. Existing multiple-minimum-support mining algorithms for static datasets generate a large number of intermediate candidate sets, which increases time and memory cost, and they do not take utility values into account. To address these problems, the thesis constructs a data structure, MHU-Tree, that incorporates multiple minimum supports and utility values. Two pruning strategies are proposed: PG prunes the global MHU-Tree during construction, and PL prunes local MHU-Trees during mining. Building on these, an algorithm for mining high utility and frequent itemsets, MHU-Growth, is proposed; it reduces the number of intermediate candidate sets and discovers high utility and frequent itemsets quickly. Experimental comparison with CFP-Growth++ shows that MHU-Growth with these strategies performs better in run time, number of candidate sets and memory overhead.

2. Existing algorithms for mining high utility itemsets in data streams need to scan the database many times. Moreover, it is difficult for users to set the utility threshold, and a threshold that is too high or too low degrades the mining results. The thesis constructs a new data structure, TKHUF-Tree, suited to data stream mining, by combining multiple minimum supports with utility; builds two matrices, PMD and RMD, to store utility information; and uses threshold-raising strategies such as PEU and RTS to raise the utility threshold automatically, reducing the number of intermediate candidate sets.

3. On this basis, the thesis proposes a mining algorithm, TKHFDS, which uses a sliding window model to handle data streams and applies the proposed minTKUtil strategy to raise the utility threshold of the next window, so that high utility and frequent itemsets in the next window are mined quickly and efficiently. Experiments show that the proposed algorithm outperforms TKU and T-HUDS in time and memory overhead.
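
To make the task in the abstract concrete, the following is a minimal brute-force sketch in Python of what "high utility and frequent itemsets under multiple minimum supports" can mean. It is not MHU-Growth or TKHFDS, and it uses none of the tree structures or pruning strategies named above. It assumes the commonly used definitions: an itemset's minimum support is the lowest per-item minimum support among its items, and its utility is the sum of quantity times unit profit over the transactions that contain it. The toy data, the function names and the heap-based top-k threshold raising are all hypothetical illustrations; the abstract does not specify the thesis's own definitions or strategies.

```python
import heapq
from itertools import combinations

# Hypothetical toy data: each transaction maps item -> purchased quantity.
transactions = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 1},
    {"a": 1, "b": 1, "c": 3},
    {"b": 4},
]
unit_profit = {"a": 5, "b": 1, "c": 2}   # assumed per-item unit profit
mis = {"a": 0.75, "b": 0.5, "c": 0.5}    # assumed per-item minimum supports


def support(itemset, db):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(1 for t in db if all(i in t for i in itemset)) / len(db)


def utility(itemset, db, profit):
    """Sum of quantity * unit profit over transactions containing the itemset."""
    return sum(
        sum(t[i] * profit[i] for i in itemset)
        for t in db
        if all(i in t for i in itemset)
    )


def high_utility_frequent(db, profit, mis, min_util):
    """Brute-force enumeration (exponential in the number of items)."""
    items = sorted({i for t in db for i in t})
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            # Assumed rule: the itemset's threshold is its lowest item MIS.
            if support(itemset, db) < min(mis[i] for i in itemset):
                continue
            u = utility(itemset, db, profit)
            if u >= min_util:
                result[itemset] = u
    return result


def top_k(db, profit, mis, k):
    """Top-k variant: start at threshold 0 and raise it to the k-th best
    utility seen so far, so the user never has to choose a threshold."""
    heap, min_util = [], 0   # min-heap of (utility, itemset)
    for itemset, u in high_utility_frequent(db, profit, mis, 0).items():
        if u < min_util:
            continue
        heapq.heappush(heap, (u, itemset))
        if len(heap) > k:
            heapq.heappop(heap)
        if len(heap) == k:
            min_util = heap[0][0]   # raised threshold
    return sorted(heap, reverse=True), min_util
```

On this toy data, `high_utility_frequent(transactions, unit_profit, mis, min_util=12)` returns `{("a",): 20, ("a", "b"): 13, ("a", "c"): 23}`. The itemset `("a", "b", "c")` has utility 12 but is dropped because its support (0.25) is below its threshold (0.5). A real miner would avoid this enumeration entirely, which is what candidate-reducing structures such as those described above are for.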
Keywords/Search Tags:data streams, multiple minimum supports, high utility and frequent itemsets, top-k