Top-K Sequence Pattern Mining With Non-Overlapping Condition

Posted on: 2019-11-18
Degree: Master
Type: Thesis
Country: China
Candidate: D Yang
Full Text: PDF
GTID: 2428330623469006
Subject: Computer Science and Technology
Abstract/Summary:
Pattern mining studies often produce a large number of patterns that users are not interested in and that are difficult to apply in practice. Top-k pattern mining was proposed on the premise that the more frequent a pattern is, the more it concerns users. Although it helps users find the k most frequent patterns, Top-k mining under the Apriori property tends to return only short patterns, because a pattern's support never exceeds that of its sub-patterns. Short patterns, however, carry less information than long ones. Therefore, for the problem of non-overlapping sequence pattern mining, we study the k patterns with the highest support at each pattern length, that is, Top-k sequence pattern mining under the non-overlapping condition. The main research content and related work of this thesis are as follows:

(1) We analyze the shortcomings of traditional Top-k sequence pattern mining. In addition, we describe and analyze MAPBOK, a Top-k sequence pattern mining algorithm with periodic gap constraints, and NOSEP-k, an improved Top-k sequence pattern mining algorithm. To address the shortcomings of existing Top-k algorithms, we propose NOSTOPK (Non-overlapping Sequence Pattern Mining for Top-k), an algorithm for mining top-k sequential patterns under the non-overlapping condition.

(2) NOSTOPK calculates pattern support with a Nettree and does not require a minimum support threshold, which avoids the problem that this threshold is difficult to set in frequent pattern mining. In each round, it selects the top k patterns and finds the corresponding patterns derived from them, which reduces the number of candidate sets and effectively compresses the patterns.

(3) Traditional Top-k pattern mining algorithms obtain the top k patterns among all frequent patterns. The advantage of NOSTOPK is that the top k patterns of each length can be mined, which meets users' specific requirements for patterns. Extensive experimental results verify the feasibility and validity of the proposed NOSTOPK algorithm.
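
The per-length Top-k idea described above can be illustrated with a small Python sketch. It is not the NOSTOPK algorithm from the thesis: the function names (`greedy_support`, `top_k_per_length`), the parameters `min_gap`, `max_gap`, and `max_len`, and the single-sequence setting are all assumptions made for illustration. In particular, support is counted here by a greedy leftmost search with backtracking instead of a Nettree, so the value is only a lower bound on the non-overlapping support. Two occurrences are treated as non-overlapping when they never use the same sequence position for the same pattern index.

```python
from typing import Dict, List, Optional, Set, Tuple


def greedy_support(seq: str, pattern: str, min_gap: int, max_gap: int) -> int:
    """Greedily count occurrences of `pattern` in `seq` under a gap constraint.

    Assumption: between two consecutive matched characters there must be
    between `min_gap` and `max_gap` skipped characters. A sequence position
    may be reused by different pattern indices, but not by the same index.
    The result is a lower bound on the non-overlapping support.
    """
    n, m = len(seq), len(pattern)
    # used[j] holds the sequence positions already matched to pattern[j].
    used: List[Set[int]] = [set() for _ in range(m)]

    def extend(j: int, prev: int) -> Optional[List[int]]:
        # Match pattern[j:], given that pattern[j - 1] sits at position prev.
        if j == m:
            return []
        lo = prev + min_gap + 1
        hi = min(n - 1, prev + max_gap + 1)
        for pos in range(lo, hi + 1):
            if seq[pos] == pattern[j] and pos not in used[j]:
                rest = extend(j + 1, pos)
                if rest is not None:
                    return [pos] + rest
        return None  # no valid continuation from prev

    count = 0
    for start in range(n):  # each start is visited once, so index 0 never repeats
        if seq[start] != pattern[0]:
            continue
        rest = extend(1, start)
        if rest is not None:
            for j, pos in enumerate([start] + rest):
                used[j].add(pos)
            count += 1
    return count


def top_k_per_length(
    seq: str, k: int, max_len: int, min_gap: int, max_gap: int
) -> Dict[int, List[Tuple[str, int]]]:
    """Keep the k best-supported patterns at every length from 1 to max_len.

    Candidates of length L + 1 are built only from the k patterns kept at
    length L, so no minimum support threshold is needed.
    """
    alphabet = sorted(set(seq))
    result: Dict[int, List[Tuple[str, int]]] = {}
    current = [""]  # the empty pattern seeds the length-1 candidates
    for length in range(1, max_len + 1):
        candidates = {p + c for p in current for c in alphabet}
        scored = [(c, greedy_support(seq, c, min_gap, max_gap)) for c in candidates]
        scored = [(c, s) for c, s in scored if s > 0]
        # Highest support first; ties are broken alphabetically.
        scored.sort(key=lambda item: (-item[1], item[0]))
        top = scored[:k]
        if not top:
            break
        result[length] = top
        current = [c for c, _ in top]
    return result
```

For example, `top_k_per_length("abab", k=1, max_len=2, min_gap=0, max_gap=1)` keeps `("a", 2)` at length 1 and `("ab", 2)` at length 2. Extending only the retained patterns keeps each level to at most k times the alphabet size candidates in this sketch; whether that pruning preserves the exact per-length top k depends on the support measure and is not established by the sketch itself.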
Keywords/Search Tags:Sequence pattern mining, Gap constraint, Top-k, Non-overlapping condition, Nettree