Research On Top-k High Utility Item Set Mining Based On The Utility Matrix And Index

Posted on: 2014-02-22  Degree: Master  Type: Thesis
Country: China  Candidate: R Wang  Full Text: PDF
GTID: 2308330473951222  Subject: Computer application technology
Abstract/Summary:
With the development of the information era, vast quantities of data are generated, in which much of the information and knowledge people need is hidden, and people are eager to convert these data into useful information. Traditional frequent item set mining methods may discover a large number of item sets that are frequent but have low utility. Hence, frequent item set mining cannot satisfy users who want to discover item sets with high utility, such as high profit. To address this issue, high utility item set mining has emerged as an important topic in data mining. Although many studies have been carried out on this topic, setting an appropriate minimum utility threshold remains difficult for users. If the threshold is set too low, too many high utility item sets are generated, which may make the mining algorithm inefficient or even cause the system to run out of memory. If the threshold is set too high, no high utility item sets are found. Finding an appropriate threshold by repeated adjustment is tedious for users.

To address this problem, this thesis proposes a novel top-k high utility item set mining algorithm based on a utility matrix and an index. Users do not need to set a threshold; they only need to specify the number of item sets to be mined. The main contributions of this thesis are summarized as follows:

First, the proposed algorithm is the first to use the true utility values of item sets when mining top-k high utility item sets, which raises the boundary threshold effectively.

Second, this thesis proposes a utility matrix structure, which avoids scanning the database multiple times when calculating the utility values of large numbers of item sets.

Third, this thesis proposes a reduction strategy based on an index structure. It solves the problem that the search cannot otherwise be pruned during top-k high utility item set mining with a top-down mining process.

Finally, the thesis abandons the idea of generating long item sets by joining short ones and instead mines top-down, in keeping with the characteristics of top-k high utility item set mining. This not only raises the boundary threshold effectively but also reduces the number of item sets generated during mining.

Theoretical analysis and experimental evaluation show that the top-k high utility item set mining algorithm based on the utility matrix and index is accurate and efficient...
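
To make the terms in the abstract concrete, the following is a minimal sketch of top-k high utility item set mining on toy data. It is not the thesis's algorithm: the data, the function names (`build_utility_matrix`, `utility`, `top_k_high_utility`), and the matrix layout (one row per transaction, one entry per item holding quantity times unit profit) are all assumptions made for illustration. The sketch enumerates every item set exhaustively, longest first, and omits the index-based reduction strategy entirely. It only shows how true utility values can be read from a precomputed matrix without rescanning the raw transactions, and how the k-th best utility found so far acts as a rising boundary threshold.

```python
import heapq
from itertools import combinations

# Hypothetical toy data (not from the thesis): purchased quantity of each item
# per transaction, and a unit profit per item.
TRANSACTIONS = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 1},
    {"b": 1, "c": 3},
]
UNIT_PROFIT = {"a": 5, "b": 2, "c": 1}


def build_utility_matrix(transactions, unit_profit):
    """Assumed layout: one row per transaction, mapping item -> quantity * unit profit."""
    return [
        {item: qty * unit_profit[item] for item, qty in t.items()}
        for t in transactions
    ]


def utility(itemset, matrix):
    """Sum the utility of `itemset` over every row that contains all of its items."""
    return sum(
        sum(row[item] for item in itemset)
        for row in matrix
        if all(item in row for item in itemset)
    )


def top_k_high_utility(transactions, unit_profit, k):
    """Exhaustive top-k search; the raw transactions are read once to build the matrix."""
    matrix = build_utility_matrix(transactions, unit_profit)
    items = sorted(unit_profit)
    heap = []  # min-heap of (utility, itemset); once full, heap[0][0] is the boundary threshold
    for size in range(len(items), 0, -1):  # longest item sets first
        for itemset in combinations(items, size):
            u = utility(itemset, matrix)
            if len(heap) < k:
                heapq.heappush(heap, (u, itemset))
            elif u > heap[0][0]:
                # A better item set evicts the current k-th best and raises the threshold.
                heapq.heapreplace(heap, (u, itemset))
    return sorted(heap, key=lambda entry: (-entry[0], entry[1]))


if __name__ == "__main__":
    for u, itemset in top_k_high_utility(TRANSACTIONS, UNIT_PROFIT, k=3):
        print(itemset, u)
```

With this toy data and k = 3, the call returns the item sets {a}, {a, c}, and {a, b} with utilities 15, 11, and 9. The user supplies only k, which is the point the abstract makes about not having to choose a minimum utility threshold.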
Keywords/Search Tags: utility matrix, index, high utility item sets, top-k mining, reduction strategy