Research On Improvement Of High Utility Pattern Mining Algorithm Over Data Streams

Posted on: 2020-01-22
Degree: Master
Type: Thesis
Country: China
Candidate: F Guo
Full Text: PDF
GTID: 2428330623466989
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of information technology, the scale of data has grown exponentially. Finding potentially valuable information in these data has become the main challenge in the data mining field. With the advent of data streams, data stream mining has become a research hotspot, and high utility pattern mining is one of its major directions. Most current high utility mining algorithms over data streams are built on two data structures: a global header table and a utility tree. To make high utility pattern mining over data streams both fast and applicable to more scenarios, this thesis studies how to improve these algorithms. The main work can be summarized as follows:

(1) Most current high utility mining algorithms over data streams keep irrelevant, redundant data items in the global header table, and low-utility data items are processed needlessly during mining. To improve mining efficiency, this thesis proposes HUMGRT, a high utility pattern mining algorithm over data streams based on a revised global header table and a low-utility pattern pre-prune strategy. By revising the global header table, the algorithm deletes redundant data items, and the pre-prune strategy skips low-utility data items, which improves mining efficiency. Experiments on different data sets show that HUMGRT performs better.

(2) High utility mining algorithms are inefficient in long transaction scenarios and can easily run out of memory. This thesis defines the long path transaction and the maximum number of recursive mining passes for a data item, and proposes a strategy called ESMI. By changing the number of recursive mining passes for data items in long path transactions, without changing the structure of the utility tree, the strategy addresses the poor fit of the algorithm to long transaction scenarios. Experiments show that the ESMI strategy effectively improves time and space efficiency and expands the application scope of the algorithm.

(3) Current high utility pattern mining algorithms over data streams do not consider the arrival of new data items. A new data item has no external utility, so the algorithm cannot continue mining. To solve this problem, this thesis proposes a model called RPC-Model, which uses the utility information of already processed data items to complement the external utility of new data items so that the algorithm can run normally. The feasibility and accuracy of the RPC-Model are verified by comparison experiments.
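The abstract does not spell out how HUMGRT revises the header table, so the following is only a minimal sketch of the general idea in point (1), not the thesis's algorithm. It assumes the pruning criterion is transaction-weighted utility (TWU), a standard upper bound in high utility mining: an item whose TWU is below the threshold cannot appear in any high utility pattern. The item names, utilities, window contents, and function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical external utilities (e.g. unit profit) per item.
EXTERNAL_UTILITY = {"a": 5, "b": 2, "c": 1, "d": 10}

# One hypothetical window of the stream; each transaction maps
# item -> internal utility (e.g. purchased quantity).
WINDOW = [
    {"a": 1, "b": 2},
    {"b": 1, "c": 4},
    {"a": 2, "c": 1, "d": 1},
    {"c": 3},
]


def transaction_utility(tx, ext):
    """Utility of a whole transaction: sum of quantity * external utility."""
    return sum(qty * ext[item] for item, qty in tx.items())


def build_header_table(window, ext):
    """Map each item to its TWU: the summed utility of transactions containing it."""
    twu = defaultdict(int)
    for tx in window:
        tu = transaction_utility(tx, ext)
        for item in tx:
            twu[item] += tu
    return dict(twu)


def revise_header_table(header, min_util):
    """Assumed revision step: drop items whose TWU is below the threshold."""
    return {item: twu for item, twu in header.items() if twu >= min_util}


def itemset_utility(itemset, window, ext):
    """Exact utility of an itemset over the transactions that contain all of it."""
    total = 0
    for tx in window:
        if all(item in tx for item in itemset):
            total += sum(tx[item] * ext[item] for item in itemset)
    return total


header = build_header_table(WINDOW, EXTERNAL_UTILITY)
revised = revise_header_table(header, min_util=20)
```

With a threshold of 20, item `b` is dropped before any tree is built, while the pattern `("a", "d")` still reaches the threshold when evaluated with `itemset_utility`.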
Keywords/Search Tags: Data streams, High utility pattern, Global revision header table, Long transaction, New data item