Research On High Utility Sequential Pattern Mining Based On MapReduce

Posted on: 2020-02-20
Degree: Master
Type: Thesis
Country: China
Candidate: K K Rong
GTID: 2428330575967958
Subject: Computer technology
Abstract/Summary:
With the rapid growth of data volume, the efficiency of high utility sequential pattern mining algorithms degrades severely, yet the application of such algorithms to large-scale data is still in its infancy. When the amount of data is large, a high utility sequential pattern mining algorithm generates more candidate sequences, occupies a large amount of memory, and runs much more slowly. Rewriting the algorithm as a MapReduce-based algorithm and adopting a stricter pruning strategy can effectively reduce the number of candidates generated and avoid the memory bottleneck of a single machine.

Mining high utility sequential patterns is an interesting problem in data mining. In this paper, we propose a new algorithm called high utility sequential pattern mining based on maximal remaining utility (HUSP-MRU). In HUSP-MRU, the maximal remaining utility (MRU) is defined as a tighter upper bound on the utility of candidates. The search space is represented by a lexicographic sequential pattern tree, matrix structures are used to store the MRU, and both branch pruning and node pruning based on the MRU are used to improve mining efficiency. Extensive tests conducted on publicly available datasets show that the proposed algorithm outperforms the USpan algorithm in terms of mining efficiency.

Based on the high utility sequential pattern mining algorithm proposed above, a MapReduce-based algorithm for mining high utility sequential patterns is designed. The algorithm uses the MapReduce framework to overcome the single-machine bottleneck that arises when the data volume is too large. It first calculates the weighted utility of the items that can form high utility sequences, uses these utilities to construct the utility matrices, and then mines the high utility sequential patterns in each utility matrix. During mining, the maximal remaining utility is used to calculate sequence utilities quickly, which improves the execution speed of the algorithm.

Finally, the paper designs a traffic flow prediction and visualization system that uses high utility sequential pattern mining. The system first preprocesses the traffic data into the sequence database format and then uploads the data to the Hadoop Distributed File System (HDFS). The MapReduce-based sequential pattern mining algorithm is then used to obtain the high utility sequential patterns, which the system uses for prediction and visualization.
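
The first step of the MapReduce-based algorithm, computing a weighted utility per item and discarding items that cannot form a high utility sequence, might look roughly like the sketch below. It is only an illustration under stated assumptions, not the thesis's implementation: "weighted utility" is assumed to mean the sequence-weighted utility (SWU) commonly used in high utility sequential pattern mining, the map and reduce phases are simulated in a single Python process rather than run on Hadoop, and the toy database, the `PROFIT` table, and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical unit profit (external utility) of each item.
PROFIT = {"a": 2, "b": 5, "c": 1}

# Hypothetical quantitative sequence database: each sequence is a list of
# itemsets, and each itemset maps an item to its purchased quantity.
DATABASE = [
    [{"a": 1, "b": 2}, {"c": 3}],
    [{"b": 1}, {"a": 2, "c": 1}],
    [{"c": 4}],
]


def sequence_utility(seq, profit):
    """Utility of a whole sequence: sum of quantity * unit profit."""
    return sum(qty * profit[item]
               for itemset in seq
               for item, qty in itemset.items())


def map_phase(seq, profit):
    """Emit (item, sequence utility) once per distinct item in the sequence."""
    utility = sequence_utility(seq, profit)
    items = {item for itemset in seq for item in itemset}
    return [(item, utility) for item in sorted(items)]


def reduce_phase(pairs):
    """Sum the emitted utilities per item, giving each item's SWU."""
    swu = defaultdict(int)
    for item, utility in pairs:
        swu[item] += utility
    return dict(swu)


def promising_items(database, profit, min_utility):
    """Keep only the items whose SWU reaches the minimum utility threshold."""
    pairs = chain.from_iterable(map_phase(seq, profit) for seq in database)
    swu = reduce_phase(pairs)
    return {item: value for item, value in swu.items() if value >= min_utility}


if __name__ == "__main__":
    print(promising_items(DATABASE, PROFIT, min_utility=26))
```

Because the SWU of an item is the total utility of every sequence that contains it, it is an upper bound on the utility of any pattern containing that item. Items below the threshold can therefore be dropped before the utility matrices are built, without losing any high utility sequential pattern.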
Keywords/Search Tags: High Utility Sequential Pattern, Utility Matrix, MapReduce, Pruning Strategy, Data Visualization