Learning-Based Consensus Construction From Long Error-Prone Reads

Posted on:2021-03-16

Degree:Master

Type:Thesis

Country:China

Candidate:S J Wang

Full Text:PDF

GTID:2370330611499998

Subject:Computer Science and Technology

Abstract/Summary:

Since the launch of the Human Genome Project, genome sequencing has reshaped research methods across the life sciences, and laboratories worldwide have continued to sequence and analyze the genomes of model species. In recent years, as sequencing throughput has risen and costs have fallen, genome sequencing has become a routine method in biomedicine. Third-generation sequencing, represented by the long-read platforms of Pacific Biosciences and Oxford Nanopore Technologies, now produces reads long enough to greatly advance genome assembly, mutation detection, and other analyses. However, third-generation reads have a very high error rate (~15%), which reduces the accuracy of analysis results and limits their use in medical research and clinical diagnosis. Researchers are therefore working to develop more efficient analytical methods to overcome this limitation.

Genome assembly is the process of reconstructing genome sequences of several megabases, or even hundreds of megabases, from a large number of short fragments obtained by random sequencing. The ultimate goal is to generate complete and accurate consensus sequences. Although third-generation sequencing has greatly improved the completeness of genome consensus sequences, its high error rate has limited their accuracy. Obtaining high-quality, accurate consensus sequences remains challenging, especially when assembling repetitive sequences and haplotypes. The key to generating consensus sequences is an accurate multiple sequence alignment. Because third-generation sequencing data combine long reads, a high error rate, and high throughput, resource-intensive error correction and consensus construction are required to obtain high-quality assemblies.

This research proposes a consensus generation model that combines deep learning and reinforcement learning, which both improves multiple sequence alignment results and yields consensus sequences of higher accuracy. The work comprises three main contributions:

(1) A reinforcement-learning method for adjusting the alignment of genetic data, which uses the asynchronous advantage actor-critic (A3C) algorithm to learn alignment strategies. Because current mainstream multiple sequence alignment methods still have many shortcomings, the aim is to improve alignment results through effective learned strategies.

(2) A mechanism called curiosity reward, which further adjusts the multiple sequence alignment so that it not only scores better on evaluation metrics but is also closer to biological reality and more consistent with the structural features of gene sequences.

(3) Deep learning methods that extract structural features from multiple sequence alignment results and help generate more accurate consensus sequences by combining the characteristics of sequence data at different throughput levels. With this approach, the consensus retains excellent accuracy with less data, without requiring the quality values produced at sequencing time and without reading ultra-long sequences in a single pass, so small data blocks can be processed more flexibly.
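The abstract states that consensus quality depends on the multiple sequence alignment. As a point of reference only, the sketch below shows the simplest way a consensus can be read off an alignment: a column-wise majority vote. This is a textbook baseline, not the learning-based model the thesis proposes. The function name `majority_consensus`, the `-` gap symbol, and the tie-breaking rule are all assumptions made for illustration.

```python
from collections import Counter

GAP = "-"  # assumed gap symbol; the thesis does not specify one


def majority_consensus(alignment):
    """Return a consensus string by column-wise majority vote.

    `alignment` is a list of equal-length strings (the rows of a
    multiple sequence alignment). Columns whose winning symbol is a
    gap contribute nothing to the consensus.
    """
    if not alignment:
        return ""
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned rows must have the same length")

    consensus = []
    for column in zip(*alignment):
        counts = Counter(column)
        # Highest count wins; on a tie prefer a base over a gap,
        # then fall back to alphabetical order for determinism.
        symbol = min(counts, key=lambda s: (-counts[s], s == GAP, s))
        if symbol != GAP:
            consensus.append(symbol)
    return "".join(consensus)


# Hypothetical toy alignment of four noisy reads.
reads = ["ACG-T", "ACGAT", "AC--T", "TCG-T"]
consensus = majority_consensus(reads)
```

A vote like this treats every column independently and weights every read equally, so a misplaced gap in the alignment propagates directly into the consensus. That sensitivity is one reason the alignment itself is described above as the key to consensus accuracy.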

Keywords/Search Tags:

Gene Sequencing, Multiple Sequence Alignment, Consensus, Deep Learning, Reinforcement Learning

Related items

1	SgRNA Activity Prediction Method Based On Reinforcement Learning
2	An Enhancer Identification Algorithm Based On Deep Learning
3	Research On Intelligent Decision Model Based On Deep Reinforcement Learning
4	Adjustment Of Atmospheric Profiles Based On Deep Reinforcement Learning
5	Research On Geophysical Inversion Based On Reinforcement Learning
6	Research On Sequence Ambiguity Function Based On Deep Reinforcement Learning
7	Research On Monitoring Method Of Multi-scale Cyclone Based On Deep Reinforcement Learning Algorithm
8	Research And Realization Of Game Strategy Based On Deep Reinforcement Learning
9	Deep Reinforcement Learning With Exploratory Noise
10	Research On Underwater Robot Navigation Algorithm Based On Deep Reinforcement Learning