Detection Algorithm Of The Large Deletions In Maize Genome

Posted on:2017-09-12

Degree:Master

Type:Thesis

Country:China

Candidate:X R Huang

GTID:2323330509461669

Subject:Computer application technology

Abstract/Summary:

The maize genome is rich in structural variations. Among them, large deletions contribute to phenotypic variation in maize and are an important source of its genetic diversity. This thesis analyzes next-generation sequencing (NGS) data from 79 temperate and tropical maize lines provided by the Chinese Academy of Agricultural Sciences. Using k-mer depth together with a Hidden Markov Model (HMM), we study the types, polymorphism and distribution of large deletions across lines, and we detect the number and positions of large deletions in the maize genome. The work proceeds as follows.

First, the sequencing data are preprocessed and analyzed. We build a k-mer database for each line from its sequencing reads and extract k-mer features. From these we make a rough estimate of the overall extent and distribution of large deletions in maize. We also compute the kinship between the lines.

Second, we apply an HMM to detect large deletions in the maize genome. Mapping each line's k-mers to the reference genome gives the k-mer depth at every base pair, genome-wide. A scanning window then takes the median k-mer depth within each window. This produces a k-mer depth matrix that serves as the model input. The HMM parameters are estimated with the Expectation-Maximization (EM) algorithm, and the fitted model identifies the deleted regions of each genome.

Third, we use the kinship between lines to improve the algorithm with a Mixture Hidden Markov Model. The lines are related to the reference genome and to one another to varying degrees through their genetic history, so the genomes of different lines are highly correlated. We therefore build an HMM for each line and combine them into a Mixture HMM weighted by kinship, which improves detection accuracy.

Finally, two experiments are designed to evaluate the algorithm.
In the first experiment, random large structural variations and SNPs are simulated on the known B73 reference genome. NGS simulation tools are then used to generate raw reads from the simulated genomes, so that the accuracy and efficiency of the algorithm can be evaluated. The results show that the HMM detects deletions in the maize genome accurately and in a short time. Using the kinship matrix improves the accuracy further, and about 90% of the deletions are detected precisely. In the second experiment, we compare our detection results with the thousands of large deletions in Mo17 previously identified by comparative genomic hybridization. This comparison shows that our algorithms also perform well at detecting large deletions in maize from real NGS data.
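The k-mer depth step can be illustrated with a minimal Python sketch. It rests on several assumptions that do not come from the thesis. Reads are plain strings. The depth at reference position i is the count of the reference k-mer starting at i in the line's k-mer table. Windows do not overlap. All function names are hypothetical. Real data would also call for a dedicated k-mer counter and canonical (strand-independent) k-mers, which this sketch omits.

```python
from collections import Counter
from statistics import median


def count_kmers(reads, k):
    """Build one line's k-mer table: k-mer -> occurrences in its reads.
    Simplification: forward strand only, no canonical k-mers, no error filtering."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def kmer_depth(reference, counts, k):
    """Depth at reference position i = count of the reference k-mer starting at i.
    A segment deleted in the line leaves its reference k-mers at depth ~0."""
    return [counts.get(reference[i:i + k], 0) for i in range(len(reference) - k + 1)]


def window_medians(depth, window):
    """Median depth in consecutive non-overlapping windows (the last may be shorter)."""
    return [median(depth[s:s + window]) for s in range(0, len(depth), window)]


def depth_matrix(reference, reads_by_line, k, window):
    """One row per line, one column per window: the input matrix for the HMM."""
    return [window_medians(kmer_depth(reference, count_kmers(reads, k), k), window)
            for reads in reads_by_line]
```

As a usage example, take the reference `"AAAACCCCGGGGTTTT"` and a line whose reads skip the `CCCG`/`CCGG`/`CGGG` junction k-mers. That line gets a depth of 0 in the second window, which is the kind of dip the HMM is meant to pick up.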
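The HMM step can be sketched as a two-state model fitted by EM, that is, the Baum-Welch algorithm, on one line's window depths. State 0 means the region is present and state 1 means it is deleted. The thesis does not state its emission model or decoding rule, so the following choices are assumptions. Emissions are Poisson on window depths rounded to integers. Decoding is posterior-based with a 0.5 cutoff rather than Viterbi. The function names and initial parameters are illustrative only.

```python
import math


def _emissions(obs, lams):
    """Poisson emission probabilities per window, rescaled so the best state is 1.0.
    The summed log shift is returned so the log-likelihood stays exact."""
    B, shift = [], 0.0
    for x in obs:
        logs = [x * math.log(lam) - lam - math.lgamma(x + 1) for lam in lams]
        m = max(logs)
        B.append([math.exp(v - m) for v in logs])
        shift += m
    return B, shift


def baum_welch(obs, lams, A, pi, n_iter=30, eps=1e-6):
    """EM for a 2-state HMM (0 = present, 1 = deleted) with Poisson emissions.
    Returns (lams, A, pi, gamma, loglik); gamma and loglik come from the final E-step,
    with gamma[t][s] = P(state s at window t | data)."""
    n, S = len(obs), len(pi)
    for _ in range(n_iter):
        B, shift = _emissions(obs, lams)
        # E-step, scaled forward pass: each alpha[t] sums to 1, c[t] is its scale factor.
        alpha, c = [], []
        for t in range(n):
            if t == 0:
                a = [pi[s] * B[0][s] for s in range(S)]
            else:
                a = [sum(alpha[t - 1][r] * A[r][s] for r in range(S)) * B[t][s]
                     for s in range(S)]
            c.append(sum(a))
            alpha.append([v / c[t] for v in a])
        # Scaled backward pass reusing the same factors, so alpha*beta is the posterior.
        beta = [[1.0] * S for _ in range(n)]
        for t in range(n - 2, -1, -1):
            beta[t] = [sum(A[s][r] * B[t + 1][r] * beta[t + 1][r] for r in range(S)) / c[t + 1]
                       for s in range(S)]
        gamma = [[alpha[t][s] * beta[t][s] for s in range(S)] for t in range(n)]
        loglik = sum(math.log(v) for v in c) + shift
        # M-step: expected transition counts (xi summed over t) with a tiny pseudo-count.
        new_A = []
        for s in range(S):
            occ = sum(gamma[t][s] for t in range(n - 1))
            row = [sum(alpha[t][s] * A[s][r] * B[t + 1][r] * beta[t + 1][r] / c[t + 1]
                       for t in range(n - 1)) for r in range(S)]
            new_A.append([(x + eps) / (occ + S * eps) for x in row])
        A = new_A
        pi = [(g + eps) / (1 + S * eps) for g in gamma[0]]
        # Poisson means are posterior-weighted average depths, floored to stay positive.
        lams = [max(sum(gamma[t][s] * obs[t] for t in range(n)) /
                    (sum(gamma[t][s] for t in range(n)) + eps), 1e-3) for s in range(S)]
    return lams, A, pi, gamma, loglik


def call_deletions(gamma, window, threshold=0.5):
    """Merge consecutive windows with P(deleted) > threshold into half-open bp intervals."""
    intervals, start = [], None
    for t, g in enumerate(gamma):
        if g[1] > threshold and start is None:
            start = t
        elif g[1] <= threshold and start is not None:
            intervals.append((start * window, t * window))
            start = None
    if start is not None:
        intervals.append((start * window, len(gamma) * window))
    return intervals
```

The forward and backward passes are scaled, and emissions are normalized per window, so long chromosomes do not underflow. Viterbi decoding is a common alternative to the posterior cutoff used here.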
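The abstract does not spell out how kinship enters the Mixture HMM. The following is therefore only a simplified, hypothetical stand-in, not the thesis's model. Kinship is approximated by the Jaccard similarity of two lines' k-mer sets. Each line's per-window deletion posterior, for example `gamma[t][1]` from a per-line HMM fit, is then blended with a kinship-weighted average of the other lines' posteriors. This amounts to kinship-weighted posterior smoothing rather than a full mixture-HMM fit. The names and the `self_weight` parameter are illustrative assumptions.

```python
def kmer_kinship(kmer_sets):
    """Hypothetical kinship proxy: Jaccard similarity between lines' k-mer sets."""
    n = len(kmer_sets)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            union = len(kmer_sets[i] | kmer_sets[j])
            K[i][j] = len(kmer_sets[i] & kmer_sets[j]) / union if union else 0.0
    return K


def kinship_weighted_posteriors(post, K, self_weight=0.5):
    """Blend each line's per-window deletion posterior with the kinship-weighted mean
    of the other lines' posteriors. post[i][t] = P(window t deleted in line i).
    A line with no related lines (all kinship 0) keeps its own posterior."""
    n, m = len(post), len(post[0])
    out = []
    for i in range(n):
        wsum = sum(K[i][j] for j in range(n) if j != i)
        row = []
        for t in range(m):
            rel = (sum(K[i][j] * post[j][t] for j in range(n) if j != i) / wsum
                   if wsum else post[i][t])
            row.append(self_weight * post[i][t] + (1 - self_weight) * rel)
        out.append(row)
    return out
```

The design intent is that a weak deletion signal in one line is reinforced when closely related lines show the same deletion. Unrelated lines contribute nothing.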
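The simulation experiment can be sketched in two parts. Known deletions and SNPs are planted into a reference sequence, and the calls are then scored against that truth. Several assumptions are made here. Each equal-sized segment of the reference receives one fixed-length deletion, so deletions never overlap. SNPs are uniform. A true deletion counts as found when some call has at least 50% reciprocal overlap with it. The 50% criterion is a common convention and not necessarily the thesis's definition of "precisely detected". Read simulation is left to a dedicated NGS simulator, as in the thesis, and all function names are hypothetical.

```python
import random


def simulate_line(reference, n_del, del_len, snp_rate, seed=0):
    """Remove n_del non-overlapping segments of length del_len (one per equal-sized
    segment of the reference), then add SNPs at rate snp_rate.
    Returns (genome, truth); truth holds half-open reference intervals."""
    rng = random.Random(seed)
    seg = len(reference) // n_del
    if seg < del_len:
        raise ValueError("reference too short for the requested deletions")
    truth = []
    for k in range(n_del):
        start = k * seg + rng.randrange(seg - del_len + 1)
        truth.append((start, start + del_len))
    pieces, prev = [], 0
    for s, e in truth:
        pieces.append(reference[prev:s])
        prev = e
    pieces.append(reference[prev:])
    genome = list("".join(pieces))
    for i, base in enumerate(genome):
        if rng.random() < snp_rate:
            genome[i] = rng.choice([b for b in "ACGT" if b != base])
    return "".join(genome), truth


def reciprocal_overlap(a, b):
    """Overlap length divided by the longer of the two intervals (0.0 if disjoint)."""
    ov = min(a[1], b[1]) - max(a[0], b[0])
    if ov <= 0:
        return 0.0
    return min(ov / (a[1] - a[0]), ov / (b[1] - b[0]))


def recall(truth, calls, min_ro=0.5):
    """Fraction of true deletions matched by some call with reciprocal overlap >= min_ro."""
    if not truth:
        return 0.0
    hits = sum(1 for t in truth if any(reciprocal_overlap(t, c) >= min_ro for c in calls))
    return hits / len(truth)
```

In use, the simulated `genome` would be passed to a read simulator. The resulting reads would go through the k-mer depth and HMM steps, and `recall(truth, calls)` would summarize how many planted deletions were recovered.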

Keywords/Search Tags:

Maize genome, Large deletion, Hidden Markov Model, Expectation Maximization Algorithm

Related items

1	Multi-algorithm Collaboration For Panax Notoginseng PnbZIP And PnWRKY Gene Families Analysis
2	Pig Anomaly Detection Based On Audio Analysis Technology
3	Research On Monitoring And Early Warning Model Of River Crab Aquaculture Environment
4	Identifying Drought-Responsive Crucial Gene In Sorghum By Integrating Gene Differential Co-Expression Networks And Hidden Markov Random Field Model
5	Research On The Quantitative Identification Of Green Plant Water And Fertilizer Coupling Based On The HMM Algorithm
6	Research Of Pig Breeding Environment Monitoring And Early Warning Model Of Ammonia Concentration
7	Development And Application Of Cotton Planting And Production Cloud Management Platform
8	Research On Pig Audio Recognition Based On Audio Compression Transmission And Endpoint Detection
9	Spatiotemporal Analysis Of The Soil Erosion Based On CSLE And CA-Markov Model
10	Mapping And Cloning Of Two Important Loci In Maize