Non-overlapping Sequence Pattern Mining With Gap Constraints

Posted on: 2017-12-24
Degree: Master
Type: Thesis
Country: China
Candidate: Y Tong
GTID: 2428330596957438
Subject: Computer Science and Technology
Abstract/Summary:
Sequence pattern mining is an important part of data mining. Its main task is to find frequent subsequences in a sequence database, and it is widely used in many fields. Traditional sequence pattern mining is unfocused, which makes its results redundant and the mining inefficient. Therefore, various conditions and constraints have been introduced so that mining can be more targeted. Sequence pattern mining with gap constraints is a current research hotspot with wide application value. Depending on how occurrences are counted, sequence pattern mining with gap constraints can be divided into three types: with no special condition, with the one-off condition, and with the non-overlapping condition. Under the non-overlapping condition, two occurrences of a pattern cannot use the same sequence character at the same pattern position; under the one-off condition, each character can be used at most once in total. The non-overlapping condition therefore neither produces many redundant patterns, as the no-special-condition case does, nor ignores interesting patterns, as the one-off condition does. The research in this thesis shows that sequence pattern mining under the non-overlapping condition has greater research and application value. The main research contents of this thesis are as follows:
1. We proposed a strict definition of the non-overlapping sequence pattern mining problem, analyzed why the existing non-overlapping pattern matching algorithm INSgrow does not find a complete solution, and proved theoretically that pattern matching under the non-overlapping condition can be solved completely;
2. We proposed the pattern matching algorithm NETGAP, which uses a Nettree (tree network) structure to compute the occurrences completely. Based on it, we proposed three mining algorithms: a breadth-first mining algorithm (NetMining-B), a depth-first mining algorithm (NetMining-D), and an improved algorithm (NOSEP) that uses a pattern growth strategy to reduce the candidate set;
3. The complete pattern matching algorithm NETGAP was proposed using the Nettree structure, and the three mining algorithms NetMining-B, NetMining-D and NOSEP were also proposed;
4. We compared the mining results under no special condition, the one-off condition and the non-overlapping condition on DNA sequences and time series. The results show that the non-overlapping condition is better at mining the frequent patterns that interest users. A large number of comparative experiments also verified the completeness and efficiency of NOSEP.
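To make the difference between counting conditions concrete, the following Python sketch counts the occurrences of a pattern in one sequence under a gap constraint, once with no special condition and once under the non-overlapping condition. It assumes that a gap [min_gap, max_gap] bounds the number of characters skipped between consecutive pattern characters. The non-overlapping support is computed as a maximum flow over a layered graph with unit node capacities (one node per pattern position and sequence position). This max-flow formulation is a swapped-in technique chosen because it is easy to check; it is not the thesis's NETGAP algorithm, and the function names are hypothetical.

```python
from collections import deque


def count_all_occurrences(seq, pattern, min_gap, max_gap):
    """Count every occurrence (no special condition) by dynamic programming.

    Assumption: an occurrence is a tuple of positions l_1 < ... < l_m with
    seq[l_j] == pattern[j] and min_gap <= l_{j+1} - l_j - 1 <= max_gap.
    """
    n = len(seq)
    # ways[i]: number of partial occurrences of the current prefix ending at i
    ways = [1 if c == pattern[0] else 0 for c in seq]
    for ch in pattern[1:]:
        nxt = [0] * n
        for i in range(n):
            if seq[i] == ch:
                # the previous match must leave min_gap..max_gap characters in between
                for prev in range(max(0, i - max_gap - 1), i - min_gap):
                    nxt[i] += ways[prev]
        ways = nxt
    return sum(ways)


def nonoverlapping_support(seq, pattern, min_gap, max_gap):
    """Maximum number of non-overlapping occurrences via unit-capacity max flow.

    Node (j, i) means "sequence position i matched at pattern position j".
    Splitting it into ("in", j, i) -> ("out", j, i) with capacity 1 ensures no
    two chosen occurrences use the same character at the same pattern position,
    while one character may still be reused at different pattern positions.
    """
    m, n = len(pattern), len(seq)
    cap = {}  # residual capacities: cap[u][v]

    def add_edge(u, v):
        cap.setdefault(u, {})[v] = 1
        cap.setdefault(v, {}).setdefault(u, 0)  # reverse residual edge

    for j in range(m):
        for i in range(n):
            if seq[i] != pattern[j]:
                continue
            add_edge(("in", j, i), ("out", j, i))
            if j == 0:
                add_edge("S", ("in", j, i))
            if j == m - 1:
                add_edge(("out", j, i), "T")
            else:
                for k in range(i + min_gap + 1, min(n, i + max_gap + 2)):
                    if seq[k] == pattern[j + 1]:
                        add_edge(("out", j, i), ("in", j + 1, k))

    flow = 0
    while True:
        # Breadth-first search for an augmenting path (Edmonds-Karp)
        parent = {"S": None}
        queue = deque(["S"])
        while queue and "T" not in parent:
            u = queue.popleft()
            for v, c in cap.get(u, {}).items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if "T" not in parent:
            return flow
        # Push one unit of flow back along the path found
        v = "T"
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= 1
            cap[v][u] += 1
            v = u
        flow += 1


print(count_all_occurrences("aaa", "aa", 0, 1))   # 3
print(nonoverlapping_support("aaa", "aa", 0, 1))  # 2
```

With the pattern aa, gap [0, 1] and the sequence aaa, there are three occurrences with no special condition, (0, 1), (0, 2) and (1, 2). The non-overlapping support is 2, because (0, 1) and (1, 2) may both be kept: position 1 is used at different pattern positions. Under the one-off condition only one of them could be kept.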
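The keyword "Apriori property" refers to the fact that non-overlapping support is anti-monotone: truncating a set of non-overlapping occurrences of a pattern to its prefix (or suffix) yields an equally large set of non-overlapping occurrences of the shorter pattern, so a pattern cannot be frequent if its prefix or suffix is infrequent. The Python sketch below shows one common way to exploit this in a level-wise (breadth-first) miner: a length-(k+1) candidate is generated only by joining two frequent length-k patterns whose overlap matches. This join is an illustrative assumption, not necessarily the exact candidate generation used by NOSEP; `join_candidates` and `mine` are hypothetical names, and `support` stands for any anti-monotone support function, such as a non-overlapping support counter.

```python
def join_candidates(frequent_k):
    """Generate length-(k+1) candidates from frequent length-k patterns.

    p and q are joined when p[1:] == q[:-1]; the candidate is p + q[-1].
    Every candidate thus has a frequent prefix (p) and a frequent suffix (q),
    so candidates with an infrequent prefix or suffix are never generated.
    """
    by_prefix = {}
    for q in frequent_k:
        by_prefix.setdefault(q[:-1], []).append(q)
    candidates = [p + q[-1] for p in frequent_k for q in by_prefix.get(p[1:], [])]
    return sorted(candidates)


def mine(alphabet, support, min_sup):
    """Level-wise mining of all patterns with support(pattern) >= min_sup.

    Assumes `support` is anti-monotone and min_sup is positive, so that
    sufficiently long patterns fall below the threshold and the loop ends.
    """
    level = [c for c in sorted(set(alphabet)) if support(c) >= min_sup]
    result = []
    while level:
        result.extend(level)
        level = [c for c in join_candidates(level) if support(c) >= min_sup]
    return result
```

For instance, with a toy stand-in support that counts the substrings of abab equal to the pattern, `mine("ab", support, 2)` returns `['a', 'b', 'ab']`: of the four length-2 candidates aa, ab, ba and bb, only ab reaches the threshold, and ab cannot be joined with itself.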
Keywords/Search Tags:Sequence pattern mining, Gap constraints, Non-overlapping condition, Nettree, Apriori property