| The maize genome is rich in structural variations, among which large deletions contribute to phenotypic variation and are an important source of genetic diversity in maize. This thesis analyzes next-generation sequencing data from 79 temperate and tropical maize lines provided by the Chinese Academy of Agricultural Sciences. Using a k-mer depth approach, we apply a Hidden Markov Model (HMM) to study the type, polymorphism, and distribution of large deletions among lines, and to detect the number and positions of large deletions in maize. The details are as follows. First, the sequencing data are preliminarily analyzed and prepared. We build a k-mer database for each line from its sequencing data and extract k-mer features, then roughly estimate the overall prevalence and distribution of large deletions in maize. We also compute the kinship between lines. Second, we apply the HMM to detect large deletions in the maize genome. By mapping the k-mers of each line to the reference genome, we obtain the k-mer depth at each base pair across the whole genome. A sliding window is used to take the median k-mer depth of each window, which yields the k-mer depth matrix used as input. We then fit the HMM with the Expectation-Maximization algorithm and obtain the deletions in the genome from the fitted model. Third, we use the kinship between lines to improve the algorithm with a Mixture Hidden Markov Model. Because the lines are related to the reference genome and to one another to differing degrees through inheritance, the genomes of different lines are highly correlated. We therefore establish an HMM for each line and combine them into a Mixture Hidden Markov Model according to their kinship to improve detection accuracy. Finally, two experiments are designed to evaluate the algorithm. 
First, random large structural variations and SNPs are simulated on the known B73 reference genome, and a read simulator is then used to generate next-generation sequencing raw reads from the simulated genome. This experiment evaluates the accuracy and efficiency of the algorithm. The results show that the HMM detects deletions in the maize genome precisely and in a short time, and that using the kinship matrix further improves accuracy. About 90% of the deletions are precisely detected. Second, previous research found thousands of large deletions in Mo17 by comparative genomic hybridization. We compare our detection results with these deletions, and the comparison shows that our algorithms also perform well at detecting large deletions in maize from real next-generation sequencing data. |
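
The window-median and HMM steps described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the two-state layout (present vs. deleted), the Poisson emission model, and all parameter values (`normal_mean`, `deleted_mean`, `stay`) are assumptions chosen for illustration. The thesis fits the model with Expectation-Maximization; this sketch instead fixes the parameters by hand and only decodes the most likely state path with the Viterbi algorithm.

```python
import math
from statistics import median


def window_medians(depth, window):
    """Median k-mer depth of consecutive, non-overlapping windows."""
    return [median(depth[i:i + window])
            for i in range(0, len(depth) - window + 1, window)]


def poisson_logpmf(k, lam):
    """Log-probability of count k under a Poisson distribution with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def viterbi_deletions(obs, normal_mean=20.0, deleted_mean=0.5, stay=0.99):
    """Label each window 0 (present) or 1 (deleted).

    Assumed model: two hidden states with Poisson emissions and a
    symmetric transition matrix; all parameters are hand-picked here,
    not estimated by EM.
    """
    if not obs:
        return []
    means = (normal_mean, deleted_mean)
    log_stay, log_switch = math.log(stay), math.log(1.0 - stay)

    # Initialisation with a uniform prior over the two states.
    score = [math.log(0.5) + poisson_logpmf(round(obs[0]), m) for m in means]
    back = []

    # Forward pass: keep the best predecessor for each state.
    for x in obs[1:]:
        k = round(x)
        new_score, pointers = [], []
        for s in (0, 1):
            cands = [score[p] + (log_stay if p == s else log_switch)
                     for p in (0, 1)]
            best = 0 if cands[0] >= cands[1] else 1
            pointers.append(best)
            new_score.append(cands[best] + poisson_logpmf(k, means[s]))
        score = new_score
        back.append(pointers)

    # Backtrace from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


if __name__ == "__main__":
    # Toy per-base k-mer depth: a 20 bp stretch with no k-mer support.
    depth = [20] * 30 + [0] * 20 + [20] * 30
    medians = window_medians(depth, window=10)
    print(viterbi_deletions(medians))
```

On the toy depth profile, the two windows with zero median depth are labelled as deleted and the rest as present. Taking the median per window, rather than the mean, keeps isolated bases with unusual depth from shifting the window value.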