Font Size: a A A

A Study On The Extension Of Utility Pattern Mining On Big Data

Posted on:2019-04-23Degree:MasterType:Thesis
Country:ChinaCandidate:X X ZhangFull Text:PDF
GTID:2348330542481601Subject:Information and Communication Engineering
Abstract/Summary:PDF Full Text Request
The rapid development of database technology and the exponential growth of data volume from all fields of life lead people into the big data era.How to obtain useful information from various databases has received considerable attention.As an important research topic in the data mining,high utility pattern mining emerged recently to address the shortcomings of frequent pattern mining,that is,frequent pattern mining only considers the occurrence frequencies of patterns,while ignores other important information,such as quantity and profit.Many algorithms have been proposed for the basic form of high utility pattern mining problem.However,a lot of new topics on the extended form of utility pattern mining problem that take into consideration the various practical factors are not well studied yet,including non-deterministic high utility pattern mining,closed high utility pattern mining,incremental high utility pattern mining and top-n high utility pattern mining and so on.The state-of-the-art incremental high utility pattern mining algorithms and top-n high utility pattern mining algorithms are either based on a two-phase paradigm,generating a large number of candidates,which will cause scalability issues,or based on a vertical data structure,resulting in a large number of join operations,which will lead to efficiency.To address the challenges,this paper proposes two new algorithms for incremental utility mining and top-n utility mining,respectively,which are based on the existing d HUP algorithm,and facilitated by a new data structure and new pruning strategies.In this paper,we first make a comprehensive analysis and comparison for the existing high utlity pattern mining algorithms,including the running time,memory usage,the number of candidates and the scalability.The result shows that the d2HUP algorithm is the best algorithm and is 1 to 2 orders of magnitude faster than other algorithms in the running time.Thus,d2HUP is used as a basis for incremental utility pattern mining and top-n utility pattern mining.HUPTID algorithm is based on the d2HUP algorithm with transaction insertion and deletion.First,a new data structure is proposed for incremental utility pattern mining.Second,three strategies are proposed for taking every situation to reduce unnecessary mining.In the experiment,three cases are analyzed,that is,the transactions are both inserted and deleted,the transactions are only inserted,and the transactions are only deleted.The results show that the HUPTID algorithm is 1 to 2 orders of magniture more efficient than the state-of-the-art incremental high utility pattern mining algorithms.The TONUP algorithm is the first algorithm that extends the d2HUP algorithm with the top-n high utility pattern mining.The baseline TONUP directly grows top-n high utility patterns by enumerating patterns as prefix extensions,and the full-strength TONUP employs five strategies for taking every opportunity to improve the efficiency and scalability.In the experiment,TONUP is compared with the state-of-the-art algorithms.The results show that the TONUP algorithm is 1 to 3 orders of magniture more efficient than the state-of-the-art top-n high utility pattern mining algorithms,and is even up to 2 orders of magnitude faster than high utility pattern mining algorithms that are tuned with an optimal threshold.
Keywords/Search Tags:utility mining, incremental, dynamic database, top-n, algorithms
PDF Full Text Request
Related items