Research On Frequent And High-utility Itemset Mining Algorithms Over Data Stream

Posted on: 2021-03-30
Degree: Master
Type: Thesis
Country: China
Candidate: B L Zhang
GTID: 2428330605460734
Subject: Management Science and Engineering
Abstract/Summary:
The massive data generated in the information age has made data mining an important basis for formulating sales strategies and identifying target customer groups across all industries. FIM (Frequent Itemset Mining) and HUIM (High-Utility Itemset Mining) are two important branches of data mining. FIM algorithms use support as the measure of interest, whereas HUIM algorithms use utility. Utility refers to user preference, importance, profit, and other factors that affect usefulness, so HUIM is better able to meet real-world demands. However, as the related research has matured, scholars have found that neither FIM nor HUIM algorithms alone can meet practical needs in some fields. Frequent and high-utility itemsets have therefore become a research focus in data mining. Data streams are increasingly becoming the main form of big data, but their characteristics bring new requirements and challenges to data mining. Data mining over data streams is therefore of great theoretical significance and practical value. This paper focuses on HUIM and extends the research in the following respects:

(1) To address the shortcomings of the EUCS (Estimated Utility Co-occurrence Structure) used in the FHM algorithm, an improved HUIM algorithm, iFHM (improved FHM), is proposed. The improved EUCS stores the sub-tree utility, which is a tighter utility upper bound. In addition, it does not store 2-itemsets whose sub-tree utility does not exceed the threshold.

(2) To overcome the limitation of using FIM or HUIM independently, a frequent and high-utility itemset mining algorithm, iFHMS-SW (iFHM with Support Based on Sliding Window), is proposed. It constructs a new structure, the ESCS (Estimated Support Co-occurrence Structure), within the iFHM algorithm. Using the EUCS and the ESCS, all frequent and high-utility itemsets that satisfy both the utility constraint and the support constraint can be found. iFHMS-SW is implemented on Storm with a sequenced sliding window, and the experimental results show it to be feasible and effective.
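To make the combined support and utility constraint concrete, the following is a minimal brute-force sketch over a count-based sliding window. It is not the iFHM or iFHMS-SW algorithm described above: it uses no EUCS, ESCS, or sub-tree utility pruning and does not run on Storm. The transaction format (a dict mapping each item to its utility in that transaction), the function names `mine_window` and `mine_stream`, and the sample thresholds are all assumptions made for illustration.

```python
from collections import deque
from itertools import combinations


def mine_window(window, min_support, min_utility):
    """Return {itemset: (support, utility)} for itemsets that satisfy BOTH thresholds.

    Each transaction is assumed to be a dict {item: utility of that item in
    the transaction}. Every subset of every transaction is enumerated, so this
    is only practical for tiny examples.
    """
    stats = {}
    for tx in window:
        items = sorted(tx)
        for size in range(1, len(items) + 1):
            for itemset in combinations(items, size):
                support, utility = stats.get(itemset, (0, 0))
                stats[itemset] = (support + 1,
                                  utility + sum(tx[i] for i in itemset))
    return {
        itemset: (support, utility)
        for itemset, (support, utility) in stats.items()
        if support >= min_support and utility >= min_utility
    }


def mine_stream(stream, window_size, min_support, min_utility):
    """Slide a fixed-size window over the stream and mine each full window."""
    window = deque(maxlen=window_size)  # the oldest transaction drops out automatically
    for tx in stream:
        window.append(tx)
        if len(window) == window_size:
            yield mine_window(window, min_support, min_utility)


# Hypothetical four-transaction stream, window of 3 transactions.
stream = [
    {"a": 5, "b": 2},
    {"a": 5, "c": 1},
    {"a": 10, "b": 4, "c": 1},
    {"b": 2, "c": 3},
]
results = list(mine_stream(stream, window_size=3, min_support=2, min_utility=12))
for result in results:
    print(result)
```

In the first window of this toy stream, item `b` is frequent (support 2) but has low utility (6), while `{a, b, c}` has high utility (15) but appears only once. Both are rejected, which is the case that motivates requiring the two constraints together.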
Keywords/Search Tags: Data mining, Data stream, High-utility itemset, Frequent and high-utility itemset, Storm