Font Size: a A A

Implementation Method For A Family Of Biological LCS Algorithms Based On PAR Platform

Posted on:2022-07-29Degree:MasterType:Thesis
Country:ChinaCandidate:J WangFull Text:PDF
GTID:2480306497952119Subject:Master of Engineering
Abstract/Summary:PDF Full Text Request
In computer science and other research fields,sequences(strings)are one of the most basic and important data types.Ordinary texts,mathematical formulas,source codes of program,gene sequences(DNA,RNA)can all be regarded as a collection of sequences.The longest common subsequence is the longest common subsequence obtained by deleting zero or more elements from the two or more sequences.The algorithm for solving the longest common subsequence has a wide range of applications in text recognition,file compression,data mining,and bioinformatics.In bioinformatics,it is often necessary to judge the similarity and homology between sequences.Finding the longest common subsequence between biological sequences is one of the basic methods to judge the similarity and homology between sequences.The solving algorithms of the longest common subsequence problem are numerous and complex.Different longest common subsequence algorithms are suitable for different types of sequence data,which makes it difficult for users to understand this type of algorithms and cannot choose a suitable algorithm in practical applications.Existing research mainly focuses on the optimization of the specific steps of the longest common subsequence algorithm,and cannot solve the problem that this type of algorithm is too abstract and difficult to understand.Based on the PAR platform,this article comprehensively uses formal methods,domain engineering,generic programming,abstraction and other technologies and methods to study the realization of the longest common biological subsequence algorithm family from a high level of abstraction.Thereby reducing the complexity of the algorithm in this field,improving the intelligibility,reliability and development efficiency of the algorithm.The main work of this paper includes the following Three aspects:(1)The current typical implementation method of Longest Common Subsequence(LCS)and the multiple longest common subsequence(MLCS)algorithm based on the dominating point are investigated and analyzed in depth.Based on this,an implementation method of LCS algorithm family that comprehensively uses PAR method,domain engineering,generic programming,abstraction and other theories,methods and technologies is proposed.(2)Analyze the dynamic programming-based longest common subsequence algorithm and the constrained longest common subsequence algorithm(CLCS)as a unified domain.Designed the algorithm function components in this field,and applied the algorithm family realization method in this paper to realize it,thereby generating the LCS algorithm family component library.The component library is further used to assemble and generate the basic LCS algorithm,and with the support of the PAR platform Apla?C++ program generation system,it is converted into an executable C++ program for experimentation.(3)On the basis of the research to LCS algorithm based on dynamic programming,the domain of Multiple Longest Common Subsequence(MLCS)algorithm based on dominating point is studied.Extract five main functional components from this domain: sequence legitimacy check component,sequence preprocessing component,dominant?mode component,directed acyclic graph component,result output component.And with the support of the PAR platform,the above components are abstractly implemented to form a highly abstract component library of MLCS algorithms.Further assembly generates the widely used Fast?LCS algorithm based on this component library and with the support of Apla?C++program generation system,it is converted into an executable algorithm program.The comparative experiment with TOP?MLCS algorithm shows that the implementation method of algorithm family proposed in this paper has high practicability.
Keywords/Search Tags:biological sequence, feature model, dynamic programming, dominant point, component assembly
PDF Full Text Request
Related items