
Research Of High Utility Sequential Pattern Mining

Posted on: 2019-05-07
Degree: Master
Type: Thesis
Country: China
Candidate: J X Zhang
GTID: 2428330566498553
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid growth of information technology, the amount of data available for collection grows exponentially. How to extract the potential value of the collected data to satisfy users' needs in different industries has become an emerging problem. High-utility sequential pattern mining (HUSPM), an important topic in recent years, considers both internal quantity and external profit to reveal high-utility knowledge in databases, and is widely used in decision-making and enterprise management. Existing research on HUSPM mainly concentrates on improving the efficiency of the designed algorithms. However, applications with different data types and thresholds should also be addressed within the HUSPM framework, and many existing algorithms cannot handle them. Therefore, this dissertation studies HUSPM from three aspects: algorithm efficiency, constraint conditions, and data types. The main content and contributions of this dissertation are as follows.

To improve the efficiency of HUSPM, the HUSP-Miner algorithm is proposed. The algorithm employs a compact data structure, named the utility-linked list (UL-List), in place of the original database. The UL-List speeds up the generation of new sequences and the calculation of the actual utility and upper-bound values of sequences, thus avoiding multiple database scans and reducing the computational complexity of the algorithm. To reduce the search space, this dissertation compares the downward closure properties based on different upper bounds. Based on the downward closure property, new pruning strategies are designed to reduce the number of candidate sequences. Experiments show that the proposed algorithm outperforms previous HUSPM algorithms in terms of runtime, number of candidates, and scalability.

For HUSPM under different constraints, this dissertation proposes a framework named high-utility sequential pattern mining with multiple minimum utility thresholds. Traditional approaches discover high-utility sequential patterns with a uniform minimum utility threshold. Based on the designed framework, this dissertation proposes the HUSP-MMU algorithm, which allows a different threshold to be set for each item and thus finds high-utility sequential patterns under multiple minimum utility thresholds. To improve efficiency, the HUSP-MMU algorithm adopts a downward closure property designed for multiple minimum utility thresholds and combines it with the developed pruning strategies. Experiments show that the HUSP-MMU algorithm can mine high-utility sequential patterns with multiple minimum utility thresholds efficiently, and verify the correctness and completeness of the proposed algorithm.

To handle multi-dimensional databases, a novel framework named multidimensional sequential pattern mining is proposed in this dissertation. Two algorithms, MDHUSPEM and MDHUSPSD, are proposed based on the designed framework. The MDHUSPEM algorithm transforms the original problem into an HUSPM problem by database transformation, and then makes effective use of HUSPM techniques. The MDHUSPSD algorithm mines the sequence part and the dimension part of the database with an HUSPM algorithm and the proposed DHUI-Miner algorithm respectively, and then obtains multi-dimensional high-utility sequences by joining the resulting patterns. The DHUI-Miner algorithm uses a designed data structure named the utility list, together with pruning strategies based on a new downward closure property, to improve efficiency. Experiments were conducted to compare the performance of the MDHUSPEM and MDHUSPSD algorithms on different datasets. The results show that the latter performs better than the former, especially on datasets with more dimensions.

Overall, this dissertation combines basic theory with real-world applications, and proposes novel models, data structures, and pruning strategies to expand the applications of HUSPM at three levels: algorithm efficiency, constraint conditions, and data types.
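
To make the utility notions in the abstract concrete, the following is a minimal Python sketch, not the dissertation's HUSP-Miner or HUSP-MMU implementation. It computes the utility of a sequential pattern as quantity times external profit, and checks it against a uniform threshold and against per-item thresholds. Two points are assumptions not stated in the abstract: the utility of a pattern in one sequence is taken as the maximum over all of its occurrences (a common definition in the HUSPM literature), and the threshold of a pattern under multiple minimum utility thresholds is taken as the smallest threshold among its items. All names and data (`PROFIT`, `DB`, `utility_in_sequence`, and so on) are hypothetical.

```python
from typing import Dict, List, Optional, Set

QItemset = Dict[str, int]      # item -> purchased quantity (internal utility)
QSequence = List[QItemset]     # ordered list of quantitative itemsets
Pattern = List[Set[str]]       # ordered list of itemsets without quantities

# Hypothetical toy data: external profit per item and a tiny database.
PROFIT: Dict[str, int] = {"a": 2, "b": 5, "c": 1}
DB: List[QSequence] = [
    [{"a": 1, "b": 2}, {"c": 3}, {"a": 4}],
    [{"a": 2}, {"b": 1, "c": 2}],
    [{"b": 3}, {"a": 1}],
]


def utility_in_sequence(pattern: Pattern, qseq: QSequence,
                        profit: Dict[str, int]) -> Optional[int]:
    """Maximum utility over all occurrences of `pattern` in `qseq`
    (assumed definition), or None if the pattern does not occur."""
    # best[j]: best utility of matching the first j pattern itemsets so far.
    best: List[Optional[int]] = [0] + [None] * len(pattern)
    for qitemset in qseq:
        # Descending j so one q-itemset matches at most one pattern itemset.
        for j in range(len(pattern), 0, -1):
            prev = best[j - 1]
            itemset = pattern[j - 1]
            if prev is None or not itemset.issubset(qitemset):
                continue
            gain = sum(qitemset[i] * profit[i] for i in itemset)
            if best[j] is None or prev + gain > best[j]:
                best[j] = prev + gain
    return best[len(pattern)]


def database_utility(pattern: Pattern, db: List[QSequence],
                     profit: Dict[str, int]) -> int:
    """Sum of the pattern's utility over all sequences that contain it."""
    utils = (utility_in_sequence(pattern, s, profit) for s in db)
    return sum(u for u in utils if u is not None)


def is_high_utility_uniform(pattern: Pattern, min_util: int) -> bool:
    """Traditional setting: one threshold for every pattern."""
    return database_utility(pattern, DB, PROFIT) >= min_util


def is_high_utility_mmu(pattern: Pattern, item_min_util: Dict[str, int]) -> bool:
    """Per-item thresholds; pattern threshold = min over its items (assumption)."""
    items = set().union(*pattern)
    threshold = min(item_min_util[i] for i in items)
    return database_utility(pattern, DB, PROFIT) >= threshold
```

With the toy data above, the pattern `[{"a"}, {"c"}]` has utility 11 and the pattern `[{"a"}]` has utility 14. Under a uniform threshold of 12 only the second is high-utility; under the hypothetical per-item thresholds `{"a": 20, "b": 30, "c": 10}` the outcome is reversed, which illustrates why per-item thresholds change the result set. This brute-force evaluation rescans the database for every pattern, which is exactly the cost that compact structures and upper-bound pruning are meant to avoid.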
Keywords/Search Tags:data mining, high-utility, sequential pattern, multiple minimum utility thresholds, multi-dimensional