
Using Regression Methods To Estimate The Length Of The Longest Frequent Patterns With Periodic General Gap Constraints

Posted on: 2016-04-16  Degree: Master  Type: Thesis
Country: China  Candidate: H H Sun  Full Text: PDF
GTID: 2310330536486836  Subject: Engineering
Abstract/Summary:
A frequent pattern with periodic general gap constraints has the form p1[M,N]p2[M,N]p3...pm-1[M,N]pm, where M and N are the minimum and maximum gap sizes, respectively, and M<0. When mining sequential patterns with periodic general gap constraints in DNA sequences, an important task is to estimate the length of the longest frequent patterns reasonably. If the estimated length is too long, mining consumes too much memory; if it is too short, it leads to memory overflow. This thesis estimates the length of the longest frequent patterns with periodic general gap constraints using regression methods. The work has three parts. The first is feature extraction: G-FSA, a feature extraction algorithm for periodic general gap constraints, computes the frequencies of the length-2 patterns in a DNA sequence, which give the first 16 dimensions of the experimental data sets. The second is obtaining the regression target: MAPD-PRO, a sequential pattern mining algorithm with periodic general gap constraints, finds the length of the longest frequent patterns in the DNA sequences, and varying the gap and the threshold yields this length under different settings. That length is the 18th dimension of the data sets, and the 17th dimension is the mining threshold. These two steps together produce the data sets. The third is building the learning machine with regression methods: a BP neural network, Least Squares Support Vector Machines (LS-SVM), and Extreme Learning Machine (ELM) are adopted. The experimental data sets are divided into training sets and test sets; each of the three regression methods is trained on the training sets and then evaluated on the test sets. Finally, two experiments are designed to predict the length of the longest frequent patterns: one in which the threshold and the gap vary, and one in which the threshold and the sequence vary. The results verify that using regression methods to predict the length of the longest frequent patterns is effective.
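The 18-dimensional data row described in the abstract can be illustrated with a minimal sketch. This is not G-FSA or MAPD-PRO, whose details the abstract does not give; it is a naive counter written only to show the layout. The function names, the use of raw counts rather than normalized frequencies, and the occurrence rule (gap = j - i - 1, with j different from i) are all assumptions.

```python
from itertools import product

ALPHABET = "ACGT"  # DNA alphabet, so there are 4 * 4 = 16 length-2 patterns


def length2_counts(seq, min_gap, max_gap):
    """Naively count occurrences of every length-2 pattern a[min_gap,max_gap]b.

    Assumption (not taken from the thesis): an occurrence is a position
    pair (i, j) with min_gap <= j - i - 1 <= max_gap and j != i, so a
    negative gap lets the second character precede the first.
    """
    counts = {a + b: 0 for a, b in product(ALPHABET, repeat=2)}
    n = len(seq)
    for i in range(n):
        for gap in range(min_gap, max_gap + 1):
            j = i + gap + 1
            if j == i or not 0 <= j < n:
                continue  # skip the same position and out-of-range positions
            key = seq[i] + seq[j]
            if key in counts:  # ignore characters outside ACGT
                counts[key] += 1
    return counts


def build_row(seq, min_gap, max_gap, threshold, longest_length):
    """Assemble one row: 16 pattern counts, the threshold, then the target.

    `longest_length` stands in for the value a mining algorithm would
    report; here it is simply passed in by the caller.
    """
    counts = length2_counts(seq, min_gap, max_gap)
    features = [counts[a + b] for a, b in product(ALPHABET, repeat=2)]
    return features + [threshold, longest_length]
```

Calling `build_row(seq, M, N, threshold, longest_length)` once per combination of sequence, gap, and threshold would give the rows that are later split into training and test sets for the three regression methods.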
Keywords/Search Tags:sequential pattern mining, periodic general gap, the length of the longest frequent patterns, BP neural network, LS-SVM, ELM