Maximal Sequence Patterns Mining With Non-overlapping Condition

Posted on: 2021-01-17
Degree: Master
Type: Thesis
Country: China
Candidate: S Zhang
Full Text: PDF
GTID: 2518306560453484
Subject: Computer Science and Technology
Abstract/Summary:
Sequential pattern mining with gap constraints, an extension of repetitive sequential pattern mining, has become an active research topic in data mining because its patterns are flexible to express and can be targeted to specific needs. Nonoverlapping pattern mining is one form of sequential pattern mining with gap constraints and, compared with similar methods, can find more valuable patterns. State-of-the-art algorithms focus on finding all frequent patterns, a set that contains many short patterns. This not only reduces mining efficiency but also makes it harder to obtain the required information. To reduce the size of the pattern set without changing the threshold while retaining its expressive power, we focus on maximal sequence pattern mining, which finds the frequent patterns whose super-patterns are all infrequent. The maximal patterns also provide boundary information between frequent and infrequent patterns. To address these problems, this thesis analyzes the problem of mining maximal nonoverlapping sequence patterns. The main research contents and related work are as follows:

1. The definition and properties of maximal nonoverlapping sequence patterns are analyzed and explained in detail.
2. A pattern matching algorithm, Netback, is proposed. It transforms the problem of calculating the support of a pattern in a sequence into a problem on a Nettree. Starting from a leaf node, it uses a backtracking strategy to iteratively search for the leftmost parent node and obtains a set of nonoverlapping occurrences. Compared with the NETGAP algorithm, the time complexity of matching is reduced from O(m*m*n*w/r/r) to O(m*n*w/r/r).
3. An effective mining algorithm, Net MNP, is proposed. Based on the Apriori property of nonoverlapping sequential pattern mining, the PGrowth algorithm is proposed to generate candidate patterns; it uses pattern splicing and effectively prunes the candidate pattern set. Net MNP works together with this pattern generation algorithm to determine the maximal patterns.
4. Experiments on real biological sequence datasets verify that Net MNP performs better than other competitive algorithms and effectively compresses the frequent pattern set.
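The notion of a maximal pattern used above (a frequent pattern with no frequent super-pattern) can be illustrated with a small sketch. This is not the Net MNP algorithm from the thesis: it is a naive filter over an already-mined set of frequent patterns. It assumes that patterns are plain strings and that a super-pattern of p is any longer pattern containing p as a contiguous substring. The function names and the example set are hypothetical.

```python
def is_super_pattern(q: str, p: str) -> bool:
    """Return True if q is a proper super-pattern of p.

    Assumption for this sketch: q is a super-pattern of p when q is
    strictly longer and contains p as a contiguous substring.
    """
    return len(q) > len(p) and p in q


def maximal_patterns(frequent):
    """Keep only the frequent patterns that have no frequent super-pattern.

    Naive quadratic filter; it does not mine patterns or compute support.
    """
    frequent = set(frequent)
    return sorted(
        p for p in frequent
        if not any(is_super_pattern(q, p) for q in frequent)
    )


# Hypothetical set of frequent patterns over a DNA-like alphabet.
frequent = {"A", "C", "AC", "CG", "ACG", "T"}
print(maximal_patterns(frequent))  # ['ACG', 'T']
```

In this example six frequent patterns are compressed to two maximal ones, and every other frequent pattern is a sub-pattern of one of them, which is the sense in which the maximal set marks the boundary between frequent and infrequent patterns.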
Keywords/Search Tags: Sequential pattern mining, Maximal sequence patterns, Nonoverlapping, Nettree, Backtracking, Pattern growth