
Research On Utility-based Sequential Pattern Mining And Hiding Method

Posted on: 2021-03-31
Degree: Master
Type: Thesis
Country: China
Candidate: Y W Zu
Full Text: PDF
GTID: 2428330611498841
Subject: Computer Science and Technology
Abstract/Summary:
Sequential pattern mining is a classic research topic in data mining. However, support, the measure it uses to rank the importance of patterns, does not always reflect what users are actually interested in. For example, support carries no information about the time spent browsing a web page or the profit of a product. To solve this problem, researchers incorporated utility into sequential pattern mining and proposed utility-based sequential pattern mining, which aims to find all high utility sequences in a database with respect to a specified threshold. Although utility-based sequential pattern mining can discover high-utility knowledge, it also creates a risk of information leakage. To address this, researchers proposed utility-based sequential pattern hiding to avoid or reduce the resulting harm. In contrast to mining, hiding modifies the given database so that all high utility sequences under the specified threshold are hidden. This thesis makes a series of improvements to existing algorithms from both the mining and the hiding perspectives.

In many scenarios, decision makers want to find patterns with high utility. For this reason, this thesis exploits the hierarchical relationships between items to find patterns with higher utility, that is, hierarchical high utility sequential pattern mining. To find these patterns, it proposes the MHUH (Mining high utility hierarchical sequential patterns) algorithm. Although introducing hierarchical relationships increases utility, it also makes the search space excessively large. To solve this problem, the thesis proposes a TSWU-based pruning strategy and the PBS pruning strategy to reduce the search space. Experimental results show that, compared with mining algorithms that ignore hierarchical relationships, MHUH yields an increase in utility.

To hide all high utility sequences under a specified threshold, a hiding algorithm usually first mines these sequences and then calls a modification module to modify the database. However, mining these sequences is often extremely time-consuming, and the quality of the modification module depends on the modification strategy it uses. To improve the performance of the hiding algorithm, this thesis improves both mining efficiency and the modification strategy. On the one hand, to speed up mining, it proposes the high utility sequential pattern mining algorithm HUS-UT (High utility sequential pattern-utility table), whose main improvement is to exploit the Utility-Table data structure and a TRSU-based pruning strategy to accelerate utility calculation and reduce the search space. On the other hand, it proposes a fast hiding strategy. Based on these improvements, it proposes the FH-HUSP (Fast hiding high utility sequential patterns) algorithm. Experimental results show that FH-HUSP completes the hiding task quickly while making relatively small modifications to the database.
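
To make the notion of utility in the abstract concrete, here is a minimal Python sketch of how the utility of a sequential pattern might be computed and compared with a threshold. It is not the MHUH, HUS-UT, or FH-HUSP algorithm. It assumes a common formulation from the high utility sequential pattern literature: each item has a unit profit, each itemset records purchased quantities, a pattern's utility in one sequence is the maximum over its occurrences, and a pattern counts as high utility when its total utility is at least the threshold. All names (`pattern_utility`, `database_utility`, `high_utility_patterns`) and the toy data are hypothetical, and patterns are restricted to one item per element for brevity.

```python
from typing import Dict, List, Tuple

# A quantitative sequence: ordered itemsets, each mapping item -> quantity.
QSequence = List[Dict[str, int]]


def pattern_utility(pattern: List[str], qseq: QSequence,
                    profit: Dict[str, int]) -> int:
    """Maximum utility of `pattern` over all its occurrences in `qseq`.

    Assumption: consecutive pattern items must match strictly later
    itemsets. Returns 0 when the pattern does not occur.
    """
    neg = float("-inf")
    # best[k] = best utility found so far for the first k pattern items.
    best = [0] + [neg] * len(pattern)
    for itemset in qseq:
        # Iterate k downwards so one itemset matches at most one position.
        for k in range(len(pattern), 0, -1):
            item = pattern[k - 1]
            if item in itemset and best[k - 1] != neg:
                gain = itemset[item] * profit[item]
                best[k] = max(best[k], best[k - 1] + gain)
    return 0 if best[-1] == neg else int(best[-1])


def database_utility(pattern: List[str], db: List[QSequence],
                     profit: Dict[str, int]) -> int:
    """Total utility of `pattern`: sum of its utility in every sequence."""
    return sum(pattern_utility(pattern, qseq, profit) for qseq in db)


def high_utility_patterns(candidates: List[List[str]], db: List[QSequence],
                          profit: Dict[str, int],
                          threshold: int) -> List[Tuple[Tuple[str, ...], int]]:
    """Keep candidates whose total utility is >= threshold (assumed rule)."""
    result = []
    for pattern in candidates:
        utility = database_utility(pattern, db, profit)
        if utility >= threshold:
            result.append((tuple(pattern), utility))
    return result


# Hypothetical toy data: unit profits and two customer sequences.
profit = {"a": 2, "b": 5, "c": 1}
db = [
    [{"a": 1, "b": 2}, {"c": 3}, {"a": 4}],
    [{"b": 1}, {"a": 2, "c": 1}],
]
```

With this toy data, the pattern `["b", "a"]` has total utility 27 and `["a", "c"]` has total utility 5, so only the former passes a threshold of 10. A hiding method in the sense of the abstract would then modify `db` until no pattern reaches the threshold; the sketch deliberately leaves that step out because the abstract does not specify the modification strategy.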
Keywords/Search Tags: sequential pattern, hierarchical pattern mining, utility mining, utility hiding