Research On Component Assembly Of DBG Strategy-based Sequence Assembly Algorithms

Posted on: 2022-09-24
Degree: Master
Type: Thesis
Country: China
Candidate: G Wu
Full Text: PDF
GTID: 2480306497952089
Subject: Software engineering
Abstract/Summary:
The rapid development of second-generation high-throughput sequencing and third-generation single-molecule sequencing has accelerated scientists' analysis of the genome and the information it carries. The cost of gene sequencing has continued to fall as these technologies have matured, and the accumulation of biological data from disciplines including genomics and transcriptomics has provided massive data resources for bioinformatics. Processing these data efficiently and accurately has become a bottleneck restricting the development of both sequencing technology and bioinformatics. The sequence assembly algorithm based on the DBG (De Bruijn Graph) strategy is a key algorithm in bioinformatics and is widely used in gene sequence assembly. However, existing work on sequence assembly algorithms has mostly been carried out to meet specific needs, and very little research has studied these algorithms at the level of the sequence assembly domain as a whole. As a result, the field lacks an algorithm framework that can be applied across scenarios. To a certain extent, this leads to redundancy among sequence assembly algorithms and to calculation errors that may arise when an algorithm is selected manually.

This paper analyzes the domain of DBG Strategy-based Assembly algorithms (DBGSA) in depth and finds that a sequence assembly algorithm based on the DBG strategy can be divided into five major steps: error correction, graph construction, removal, contig generation, and scaffold generation. Following the Generative Programming method, the paper carries out domain feature modeling and algorithm component interaction design for DBGSA. With the support of the PAR platform, the DBGSA algorithm component library is formally implemented, and the library is then used to assemble a specific algorithm. The assembled gene sequence assembly algorithm is compared with two current mainstream gene sequence assembly algorithms, Velvet and SOAPdenovo. The results show that the assembled algorithm performs no worse than either of them and is practical to use. This research adds domain-level research to the domain of sequence assembly and implements the DBGSA component library, from which specific sequence assembly algorithms can be assembled, supporting both the efficiency of algorithm development and the reliability of the assembled algorithms. It also provides a reference for solving other problems in the domain of sequence assembly.

This paper also applies software refactoring to the assembly platform for biological sequence algorithm components. The software quality of the refactored platform is greatly improved, and its code structure is clearer, which makes future maintenance easier. A gene sequence assembly algorithm assembly function is added to the platform, producing a new biological sequence algorithm component assembly platform. The user does not need to know much about the internal implementation of the algorithms. Selecting components that meet the constraints through the visual interface is enough to assemble the required biological sequence algorithm, which greatly improves the user experience.
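To make two of the five steps named above more concrete, the following is a minimal Python sketch of graph construction and contig generation on a De Bruijn graph. It is an illustration only and is not the thesis's PAR-platform component library. The function names `build_dbg` and `extract_contigs`, the parameter `k`, and the toy reads are all hypothetical. The sketch assumes error-free reads on a single strand, so error correction, removal, and scaffold generation are left out.

```python
from collections import defaultdict


def build_dbg(reads, k):
    """Build a De Bruijn graph: each k-mer becomes an edge from its
    (k-1)-mer prefix to its (k-1)-mer suffix. (Hypothetical helper.)"""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extract_contigs(graph):
    """Return the maximal non-branching paths of the graph as contigs.
    Isolated cycles are ignored in this simplified sketch."""
    indegree = defaultdict(int)
    for successors in graph.values():
        for node in successors:
            indegree[node] += 1

    def is_inner(node):
        # An inner node has exactly one incoming and one outgoing edge.
        return indegree[node] == 1 and len(graph.get(node, ())) == 1

    contigs = []
    for start in list(graph):
        if is_inner(start):
            continue  # paths only start at branching or boundary nodes
        for successor in sorted(graph[start]):
            contig = start + successor[-1]
            node = successor
            while is_inner(node):
                node = next(iter(graph[node]))
                contig += node[-1]
            contigs.append(contig)
    return sorted(contigs)


# Toy input: three overlapping reads from the sequence ACGTCAG.
reads = ["ACGTC", "CGTCA", "GTCAG"]
print(extract_contigs(build_dbg(reads, k=3)))  # ['ACGTCAG']
```

In a component-based design of the kind the abstract describes, each of these functions would correspond to one interchangeable component, so that a different graph construction or contig strategy could be swapped in without changing the rest of the pipeline.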
Keywords/Search Tags: sequence assembly, generative programming, domain feature modeling, PAR method, software refactoring