Font Size: a A A

Research On Algorithm For Mining High Utility Itemset With Negative Item Values

Posted on:2019-08-08Degree:MasterType:Thesis
Country:ChinaCandidate:L J ChenFull Text:PDF
GTID:2428330575450308Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
High utility itemset mining is a popular research direction in data mining.It measures the importance of itemset through utility value,reflects the user's interest and solves user's problems well.The research on high utility itemset mining is relatively mature.The utility value is positive in high utility itemset mining,but in real-world transactional databases,it founds that items often have negative item values with the in-depth study.For example,a common sale strategy in stores is to give free products together with other products,the free products have negative item values.For another example,some products may be put on a shelf and take off it multiple times due to season reasons.Theresfore,it is more practical to consider negative item values and on-shelf periods in high utility itemset mining.However,the traditional high utility itemset mining can not solve it.To enhancing the performance of high utility itemset mining with negative item values,the main research of the paper includes high utility itemset mining with negative item values and on-shelf utility itemset mining with negative item values two aspects.The main works of the paper is:(1)This paper studies how to improve the efficiency of high utility itemset mining with negative item values.The algorithms for mining high utility itemset with negative item values only consider the items with positive item values but ignore items with negative item values in the pruning process,which affects the pruning effect.To address this issue,this paper presents an efficient algorithm EHINM.EHINM algorithm proposes an improved utility list structure to save the utility information of transactional database.The improved utility list structure is a three-tuple structure,which is different from the utility list with four tuple used in the past The improved utility list structure compresses the information stored in the utility list with four tuple by considering items with negative item values,it can not only reduce the memory usage,but also reduce search space based on the information stored in the structure.The algorithm proposes three pruning strategies according to the redefined pruning upper bound and improved utility list structure.At the same time,the extension property of itemsets with negative item values is proposed to reduce the traversal of search space.Experimental results show that EHINM algorithm can effectively improve the mining efficiency.(2)This paper studies the parallel on-shelf utility itemset mining with negative item values.The same to high utility itemset mining,on-shelf utility itemset mining is also an exponential problem.Therefore,how to effectively improve the mining efficiency is need to be solved.Parallelization is an effective way to improve the mining efficiency.The paper proposes a parallel algorithm for mining on-shelf utility itemset with negative item values named DTP-Houn.The algorithm is based on MapReduce and takes full account on-shelf time periods,it divides the database according to the on-shelf time periods.The algorithm transforms the mining work into MapReduce job,the Map phase to mining candidates in database fragments,and the Reduce phase to calculate the on-shelf utility values of the candidates in parallel.The experimental results show that the DTP-Houn algorithm has a good performance.
Keywords/Search Tags:high utility itemset mining, negative item value, pruning strategies, on-shelf time periods, MapReduce
PDF Full Text Request
Related items