
Improvement And Application Of High Utility Itemset Mining Algorithm

Posted on: 2022-12-22
Degree: Master
Type: Thesis
Country: China
Candidate: G Y Xie
Full Text: PDF
GTID: 2518306788456804
Subject: Automation Technology
Abstract/Summary:
High utility itemset mining is an active research topic in data mining. It aims to find itemsets of high importance in a transaction database. In recent years, researchers have proposed many algorithms for this task and achieved notable results. However, most traditional high utility itemset mining algorithms consider only the utility measure. As a result, many of the mined itemsets are weakly correlated and have little practical significance.

First, to address this issue, this thesis proposes an algorithm called ULB-CHMiner to find more strongly correlated itemsets. The algorithm adds an all-confidence constraint to the ULB-Miner algorithm and introduces a new concept, correlation utility. To improve mining performance, the buffered utility-lists structure is improved, and pruning is based on the proposed correlation utility upper bound and an estimated correlation utility co-occurrence structure. Experimental results on different datasets show that ULB-CHMiner prunes numerous weakly correlated high utility itemsets and outperforms the selected comparison algorithm in running time, memory consumption, and scalability.

Second, because the minimum threshold in ULB-CHMiner is difficult to set, this thesis proposes an algorithm called ULB-TKCH, which mines the K itemsets with the largest correlation utility. Users only need to specify K. ULB-TKCH uses five strategies to improve mining efficiency, including a pre-evaluation strategy, a threshold raising strategy, and a pruning strategy based on the correlation utility upper bound. In addition, the algorithm adopts the improved buffered utility-lists to store and retrieve utility information efficiently. The algorithm is evaluated on multiple datasets, and the experimental results show that ULB-TKCH is faster and uses less memory than TKO, and scales well.

Finally, this thesis designs and implements a visualization platform for correlated high utility MOOC patterns. Two different transaction databases are obtained by preprocessing MOOC data, and course patterns and course category patterns are mined from them respectively. A visualization platform is then built to display the mining results intuitively, so that users' course selection behavior can be analyzed more effectively.
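To make the two mining modes described above more concrete, the following is a minimal brute-force sketch in Python. It is not the thesis's ULB-CHMiner or ULB-TKCH: it uses no utility-lists, upper bounds, or co-occurrence structure, and it enumerates every itemset. It relies only on the standard definitions of itemset utility (quantity times unit profit, summed over the transactions containing the itemset) and all-confidence (support of the itemset divided by the largest support of any single item in it). Because the abstract does not define correlation utility, the top-K function ranks by plain utility as a stand-in. The toy database, the profit table, and all function names are hypothetical.

```python
import heapq
from itertools import combinations

# Hypothetical toy database: each transaction maps item -> purchased quantity.
TRANSACTIONS = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3},
    {"b": 1, "d": 4},
    {"a": 1, "b": 1, "c": 2, "d": 1},
]
# Hypothetical unit profits (external utilities).
PROFIT = {"a": 5, "b": 2, "c": 1, "d": 3}


def support(itemset):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in TRANSACTIONS if all(i in t for i in itemset))


def utility(itemset):
    """Quantity * unit profit, summed over transactions containing the itemset."""
    return sum(
        sum(t[i] * PROFIT[i] for i in itemset)
        for t in TRANSACTIONS
        if all(i in t for i in itemset)
    )


def all_confidence(itemset):
    """support(X) divided by the largest support of any single item in X."""
    largest = max(support((i,)) for i in itemset)
    return support(itemset) / largest if largest else 0.0


def candidates():
    """Every non-empty itemset over the known items (exponential; toy data only)."""
    items = sorted(PROFIT)
    for size in range(1, len(items) + 1):
        yield from combinations(items, size)


def mine(min_util, min_corr):
    """Threshold mode: keep itemsets passing both the utility and correlation tests."""
    return {
        x: utility(x)
        for x in candidates()
        if all_confidence(x) >= min_corr and utility(x) >= min_util
    }


def mine_top_k(k, min_corr):
    """Top-K mode: the user sets only K; an internal threshold rises as results arrive."""
    heap = []       # min-heap of (utility, itemset); the root is the weakest kept result
    threshold = 0   # raised once k results have been collected
    for x in candidates():
        if all_confidence(x) < min_corr:
            continue
        u = utility(x)
        if u < threshold:
            continue  # cannot enter the current top K
        heapq.heappush(heap, (u, x))
        if len(heap) > k:
            heapq.heappop(heap)
        if len(heap) == k:
            threshold = heap[0][0]
    return sorted(heap, reverse=True)


if __name__ == "__main__":
    print(len(mine(min_util=10, min_corr=0.0)))  # utility constraint only
    print(len(mine(min_util=10, min_corr=0.6)))  # with the all-confidence constraint
    print(mine_top_k(k=2, min_corr=0.6))
```

On this toy data, adding the all-confidence constraint removes itemsets such as `('a', 'b', 'c', 'd')`, which reach the utility threshold but occur together in only one transaction. The top-K function illustrates why a user-supplied minimum utility becomes unnecessary: the threshold starts at zero and is raised to the K-th best utility seen so far.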
Keywords/Search Tags: high utility itemset mining, correlation, buffered utility-lists, top-k algorithm