
Negative Sequence Pattern Mining Technology And Application Of Biological Information

Posted on: 2022-04-13
Degree: Master
Type: Thesis
Country: China
Candidate: Y Lu
Full Text: PDF
GTID: 2480306323960419
Subject: Software engineering
Abstract/Summary:
DNA sequences store a large amount of complex and valuable biological information. Similarity analysis of DNA sequences can reveal the evolutionary relationships between organisms, which supports better processing of biological information; pattern matching of DNA sequences can count the positions and number of gene fragments, which benefits state tracking and targeted therapy of pathogenic genes. However, most sequence similarity analysis and pattern matching methods operate on entire sequences rather than on frequent sequence patterns, which increases computational complexity. Current methods also ignore gene fragments with missing bases: there is no unified analysis method for negative sequence patterns, which greatly affects the accuracy and efficiency of biological information analysis. In view of the low efficiency of existing analytical methods and the small number of evolutionary relationships and matching paths they obtain from DNA sequences, this paper studies these key issues and proposes two solutions, as follows:

Aiming at the problems in similarity analysis of biological sequences, a method of "similarity analysis based on positive and negative DNA sequence patterns" is proposed. First, the f-NSP algorithm is used to mine patterns over the entire DNA data, yielding the largest positive and negative frequent sequence patterns of the species. Second, a graphical representation method is proposed that represents positive and negative sequence patterns on a two-dimensional plane and converts them into time series through formulas. Finally, a distance measure based on DTW (dynamic time warping) is used to analyze the similarity of the DNA sequences, and the results are drawn as a phylogenetic tree. Experiments on real biological data show that the proposed method obtains a wealth of species evolutionary relationships, that the analysis results are more in line with the actual evolutionary relationships of species, and that accuracy is improved.

Aiming at the problems in pattern matching of biological sequences, a method of "positive and negative DNA sequence pattern matching with general gaps and One-off constraints" is proposed. First, the mined frequent sequence patterns are used as the pattern P in the matching process, which makes the pattern matching more practical. Second, two repeated-element detection mechanisms are proposed, which effectively prune matching paths that do not satisfy the One-off constraint, so that all information is matched accurately and redundancy is eliminated. Experiments on the DNA sequences of 10 real species show that the algorithm can effectively handle negative sequence patterns and obtain more matching paths, while also improving running efficiency.
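
The DTW-based distance measurement used in the similarity-analysis method can be illustrated with a minimal sketch of the standard dynamic time warping recurrence. The abstract does not give the thesis's formulas for converting sequence patterns into time series, so the `TOY_ENCODING` table and the `encode` helper below are hypothetical stand-ins chosen only to produce numeric series; the function name `dtw_distance` and the absolute-difference local cost are likewise assumptions, not the author's implementation.

```python
from typing import Dict, List, Sequence

# Hypothetical base-to-number encoding, for illustration only.
# The thesis derives its time series from a 2-D graphical representation
# whose formulas are not given in the abstract.
TOY_ENCODING: Dict[str, float] = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}


def encode(pattern: str) -> List[float]:
    """Map a DNA string to a numeric series using the toy encoding."""
    return [TOY_ENCODING[base] for base in pattern.upper()]


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic O(len(a) * len(b)) dynamic time warping distance.

    Local cost is |a[i] - b[j]|; each cell extends the cheapest of the
    three neighbouring alignments (insertion, deletion, match).
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both series must be non-empty")

    inf = float("inf")
    # prev[j] holds the best cost of aligning a[:i-1] with b[:j].
    prev = [inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr = [inf] * (m + 1)  # curr[0] stays inf: a[:i] cannot align with empty b
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]


if __name__ == "__main__":
    x = encode("ACGT")
    y = encode("ACGGT")   # same shape as x with one repeated base
    z = encode("TTTT")
    print(dtw_distance(x, y))
    print(dtw_distance(x, z))
```

Because DTW lets one point align with several points in the other series, two patterns that differ only by a repeated base receive a small distance, which is one reason it suits series of unequal length. Pairwise distances computed this way could then be passed to any distance-based tree-building procedure to draw a phylogenetic tree.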
Keywords/Search Tags: pattern matching, similarity analysis, negative sequence, pattern mining, frequent patterns