Utility-based sequential pattern mining can extract actionable high utility sequential patterns (HUSPs) and is used in many fields, such as e-commerce recommendation, scenic route planning, and clickstream analysis. However, a quantitative sequence database may contain users' private information, such as personal identities, telephone numbers, credit card numbers, and hobbies, and this information is constantly at risk of disclosure. Several privacy-preserving HUSP hiding algorithms have been proposed to protect HUSPs from disclosure, but they still suffer from the following problems: (1) Most existing algorithms ignore the consistency between the original database and the sanitized database when hiding HUSPs, and they consume considerable execution time and memory. (2) Existing algorithms cause many non-sensitive HUSPs to be lost, and may fail to hide sensitive HUSPs, when hiding sensitive high utility sequential patterns. (3) Existing HUSP mining methods ignore the relationship between items and sequences, which leads directly to the generation of redundant sequential patterns. Studying these problems has important theoretical value and practical significance for effectively protecting HUSPs, reducing the influence on non-sensitive patterns, and mining actionable combined sequential patterns.

To solve the first problem, this thesis proposes HHUSP-SW, a high utility sequential pattern hiding algorithm based on sequence weight, which better preserves the consistency between the original database and the sanitized database after hiding. In addition, a new structure is designed to maintain the utility and location information of all HUSPs, so that the sensitive items that need to be modified can be identified more quickly. Extensive experiments demonstrate that HHUSP-SW outperforms state-of-the-art algorithms in terms of execution time, memory usage, and missing cost.

To solve the second problem, this thesis presents HSHUSP-ILP, a novel algorithm for hiding sensitive HUSPs based on integer linear programming. First, a new HUS-table structure is designed to store the mapping between all HUSPs and the original database. Second, to reduce the number of constraints, the sensitive and non-sensitive HUSPs are divided into several sub-groups according to the relationships between the patterns. For each sub-group, appropriate sensitive items are selected as unknown integer variables, which converts the hiding of sensitive HUSPs into an integer linear programming model whose constraints must be satisfied. The original database is then perturbed according to the optimal solution of this model. Experimental results show that the proposed algorithm outperforms existing techniques in reducing the influence on non-sensitive patterns.

To solve the third problem, this thesis presents a new method named Combined Utility-Association Sequential Pattern Mining (CUASPM), which incorporates item and sequence relations to remove redundant patterns effectively and to extract actionable combined sequential patterns with high utility and strong association. Specifically, the thesis introduces the concept of actionable combined mining into HUSP mining and develops a novel tree structure to search for HUSPs. Furthermore, two efficient pruning strategies (i.e., global and local strategies) are presented to facilitate mining combined sequential patterns while guaranteeing utility growth and high levels of association. Finally, contribution and weight parameters are introduced to evaluate the interestingness of patterns, so that the most useful actionable combined high utility sequential patterns (ACHUSPs) can be selected. Extensive experimental results demonstrate that the proposed algorithm outperforms traditional methods in terms of execution time, memory usage, and the mining of high utility, strongly associated sequential patterns.
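The notion of pattern utility that all three contributions rely on, and the side effects that make hiding difficult, can be illustrated with a minimal Python sketch. It assumes the definition commonly used in the HUSP literature (the utility of a pattern in one q-sequence is the maximum over its occurrences, summed over all sequences) and a made-up toy database. The names `sequence_utility`, `pattern_utility`, and `naive_hide` are hypothetical. `naive_hide` is a deliberately naive baseline, not HHUSP-SW or HSHUSP-ILP: it repeatedly lowers the quantity of the sensitive pattern's highest-unit-profit item in the sequence that contributes the most utility.

```python
from typing import Dict, FrozenSet, List

# Assumed data model (illustrative only): a q-sequence is a list of itemsets,
# each mapping item -> quantity; eu maps item -> unit profit (external utility).
QSequence = List[Dict[str, int]]
Pattern = List[FrozenSet[str]]
NEG = float("-inf")


def sequence_utility(pattern: Pattern, seq: QSequence, eu: Dict[str, int]) -> int:
    """Max utility over all occurrences of a non-empty pattern in seq (0 if absent)."""
    prev: List[float] = []
    for j, elem in enumerate(pattern):
        cur = [NEG] * len(seq)
        # Best utility of pattern[:j] matched in itemsets strictly before index i.
        best_before = 0.0 if j == 0 else NEG
        for i, itemset in enumerate(seq):
            if best_before != NEG and elem.issubset(itemset):
                cur[i] = best_before + sum(itemset[x] * eu[x] for x in elem)
            if j > 0:
                best_before = max(best_before, prev[i])
        prev = cur
    best = max(prev, default=NEG)
    return 0 if best == NEG else int(best)


def pattern_utility(pattern: Pattern, db: List[QSequence], eu: Dict[str, int]) -> int:
    """Utility of a pattern in the whole database: sum over all q-sequences."""
    return sum(sequence_utility(pattern, s, eu) for s in db)


def naive_hide(pattern: Pattern, db: List[QSequence], eu: Dict[str, int], min_util: int) -> None:
    """Hypothetical baseline: modify db in place until the pattern's utility < min_util."""
    victim = max((x for elem in pattern for x in elem), key=lambda x: eu[x])
    while pattern_utility(pattern, db, eu) >= min_util:
        seq = max(db, key=lambda s: sequence_utility(pattern, s, eu))
        if sequence_utility(pattern, seq, eu) == 0:
            break  # nothing left to reduce (e.g. min_util <= 0)
        for itemset in seq:
            if victim in itemset:
                itemset[victim] -= 1  # lower the victim's quantity by one
                if itemset[victim] == 0:
                    del itemset[victim]
        seq[:] = [itemset for itemset in seq if itemset]  # drop emptied itemsets


eu = {"a": 2, "b": 5, "c": 1, "d": 3}
db = [
    [{"a": 2, "c": 1}, {"b": 1}, {"a": 1, "b": 2}],
    [{"b": 1}, {"a": 3}, {"b": 1, "d": 1}],
    [{"c": 4}, {"d": 2}],
]
sensitive = [frozenset({"a"}), frozenset({"b"})]  # <a, b>
b_a = [frozenset({"b"}), frozenset({"a"})]        # <b, a>, non-sensitive
c_d = [frozenset({"c"}), frozenset({"d"})]        # <c, d>, non-sensitive

print(pattern_utility(sensitive, db, eu), pattern_utility(b_a, db, eu), pattern_utility(c_d, db, eu))
naive_hide(sensitive, db, eu, min_util=20)
print(pattern_utility(sensitive, db, eu), pattern_utility(b_a, db, eu), pattern_utility(c_d, db, eu))
```

In this toy run the first line prints `25 18 10` and the second prints `9 0 10`: the sensitive pattern <a, b> is hidden at min_util = 20, but the non-sensitive pattern <b, a> is also wiped out as a side effect, while <c, d> is untouched. Losses of this kind are what the consistency-preserving and ILP-based approaches described above aim to reduce.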