
Research On Indel Recognition Method Of High Throughput Sequencing Data

Posted on: 2021-04-20
Degree: Master
Type: Thesis
Country: China
Candidate: T Zhang
GTID: 2370330611456087
Subject: Computer technology
Abstract/Summary:
In its early stages, the Human Genome Project did not spend most of its funding on sequencing itself. Instead, it actively developed sequencing tools, and completed the full sequencing program by greatly increasing sequencing speed and reducing sequencing cost. Its data output was still relatively limited, however, and the 1000 Genomes Project once again placed the field in a situation where data processing tools were insufficient. In response, platforms and tools adapted to the large volumes of data produced by high-throughput sequencing (HTS) have developed rapidly. Indel (insertion/deletion) recognition is, in the narrow sense, a branch of HTS data processing. Indels are a type of genomic structural variation, larger in scale than SNPs (single nucleotide polymorphisms) and second only to them in frequency, and they are the most common structural variation, widely distributed across different structures. The main research content of this thesis is as follows.

First, using human chromosome 1 as reference data, this thesis applies several common structural variation recognition algorithms to identify Indels and compares their advantages and disadvantages experimentally. The results show high false positive and false negative rates, that is, low recall and precision, so the recognition results are not accurate. This motivates a new algorithm to improve the accuracy of Indel recognition.

Second, this thesis proposes a single-ended abnormal sequence generation algorithm based on SR (the SESR algorithm), which filters abnormal data and obtains single-ended abnormal sequences. The algorithm shows higher recall and precision, and fewer false positives and false negatives. Its main innovation is to first set a 200 bp detection window and use the SR (split read, that is, split-read matching) approach to screen out read break regions, then select the single-ended abnormal sequencing fragments in each region, analyze the size, position, and orientation of those fragments within the break region, and finally output the abnormal sequence recognition result.

Finally, this thesis designs the experimental data construction method and the penalty index used to evaluate Indel recognition algorithms, with the epidermal growth factor receptor gene as the data source. Using this penalty index, the SESR algorithm and the pattern growth algorithm used by Pindel are scored and compared. The comparison indicates that the algorithm designed in this thesis provides better Indel recognition capability.

Research on human genome variation is significant for genome evolution, medical progress, disease treatment, and human health. Many small Indels occur at key locations in the human genome, so a good method for detecting and studying Indels is crucial.
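
The recall and precision figures used above to compare recognition algorithms can be made concrete with a small sketch. It is not the thesis's penalty index, which the abstract does not specify. The `Indel` record, the matching rule (same chromosome, same type, start positions within a tolerance of 10 bp), and the sample coordinates are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Indel:
    """Hypothetical record for one Indel; the field names are illustrative."""
    chrom: str
    pos: int      # start position on the reference
    kind: str     # "INS" or "DEL"
    length: int


def matches(call, truth, tol=10):
    # Assumed matching rule: same chromosome and type, start within tol bp.
    return (call.chrom == truth.chrom
            and call.kind == truth.kind
            and abs(call.pos - truth.pos) <= tol)


def evaluate(calls, truths, tol=10):
    """Count TP/FP/FN, letting each true Indel be matched at most once."""
    unmatched = list(truths)
    tp = 0
    for call in calls:
        hit = next((t for t in unmatched if matches(call, t, tol)), None)
        if hit is not None:
            unmatched.remove(hit)
            tp += 1
    fp = len(calls) - tp   # calls with no corresponding true Indel
    fn = len(unmatched)    # true Indels that no call recovered
    precision = tp / (tp + fp) if calls else 0.0
    recall = tp / (tp + fn) if truths else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall}


# Made-up coordinates, for illustration only.
truths = [
    Indel("chr1", 1000, "DEL", 5),
    Indel("chr1", 5000, "INS", 3),
    Indel("chr1", 9000, "DEL", 12),
]
calls = [
    Indel("chr1", 1003, "DEL", 5),   # within tolerance of the first truth
    Indel("chr1", 5000, "DEL", 3),   # right position, wrong type
    Indel("chr1", 9008, "DEL", 12),  # within tolerance of the third truth
]

result = evaluate(calls, truths)
print(result)
```

In this toy example two of the three calls match, so one call is a false positive and one true Indel is missed. A stricter tolerance, or an additional check on Indel length, would lower both figures, which is why the matching rule should be stated alongside any reported recall and precision.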
Keywords: high-throughput sequencing data processing, genomic structural variation, Indel recognition, SESR algorithm