
Research and Implementation of SNP Discovery Based on Large-Scale EST Sequences

Posted on: 2011-07-18  Degree: Master  Type: Thesis
Country: China  Candidate: S J Dan
GTID: 2230330374995173  Subject: Computer application technology
Abstract/Summary:
As the number of available EST sequences grows, the many redundant ESTs derived from different individuals become a valuable resource for SNP discovery. SNPs, the third generation of genetic markers, are the most frequent type of DNA variation in human populations, accounting for more than 90% of known polymorphisms. ESTs, in turn, derive from the expressed (coding) portions of genes and therefore directly reflect gene expression. SNPs discovered from ESTs provide the most direct markers for functional genes, give more reliable marker information for map-based gene cloning, and can directly identify the alleles that determine important traits. For these two reasons, although several methods for SNP discovery exist, discovering SNPs from EST libraries has greater practical value.
This research builds on the State 863 Program project "Varieties of cotton bio-molecular design bioinformatics database" and constructs an SNP discovery system based on large-scale EST sequences. Experimental results show that the system is valid. The main work is as follows:
Firstly, this thesis analyzes the common data formats in bioinformatics, including the GenBank format, the FASTA format, and the formats of BLAST alignment results, protein coding regions, sequence assembly results, Align files and SNP files. The input and output files of each module in the system use these common formats, so they can be used with other bioinformatics software.
Secondly, this thesis studies several typical sequence alignment algorithms and implements the dot-matrix method and the Smith-Waterman algorithm. After a comprehensive comparison with the BLAST algorithm, it recommends BLAST for aligning large-scale EST sequences. The system still retains the other two algorithms, which can be applied to small-scale sequence alignment.
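To illustrate the kind of small-scale alignment mentioned above, here is a minimal sketch of Smith-Waterman local alignment. It is not the thesis's implementation: the function name `smith_waterman`, the linear gap penalty and the scoring values (match 2, mismatch -1, gap -2) are assumptions chosen only for illustration.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment.

    Scoring values are illustrative assumptions, not taken from the thesis.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            score[i][j] = max(0, diag, up, left)  # 0 restarts the alignment
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

The table has one cell per pair of positions, so time and memory grow with the product of the two sequence lengths. That cost is one reason a heuristic such as BLAST is usually preferred for large EST collections, while the exact method remains practical for short sequences.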
In addition, this thesis focuses on sequence assembly algorithms based on the Hamiltonian path. Building on the Phrap algorithm, it applies several rounds of screening to the overlaps. In the layout stage, it replaces the greedy algorithm with topological sorting on a directed acyclic graph, which addresses the problem that the greedy algorithm does not guarantee the best layout. This improves the accuracy of assembly and, in turn, the validity of the candidate SNPs.
Finally, this research builds the SNP discovery system based on large-scale EST sequences, with modules for sequence preprocessing, sequence alignment, CDS discovery, sequence assembly, SNP discovery and SNP visualization. The system was applied to 63,577 ESTs of Gossypium raimondii and identified 4,133 candidate SNPs related to the flower tissue of Gossypium raimondii. These results provide a basis for analyzing the population genetic structure of cotton, conserving its genetic resources, building a cotton linkage map and carrying out molecular marker-assisted breeding.
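The layout idea described above can be sketched with Kahn's topological sort. This is only an illustration under stated assumptions, not the thesis's algorithm: the function name `layout_order`, the read identifiers and the representation of an overlap as a directed pair `(left, right)`, meaning the end of `left` overlaps the start of `right`, are all hypothetical.

```python
from collections import defaultdict, deque


def layout_order(reads, overlaps):
    """Order reads so that every overlap (left, right) places left before right.

    `reads` is a list of read identifiers and `overlaps` a list of directed
    pairs; both are hypothetical inputs chosen for this sketch.
    """
    successors = defaultdict(list)
    indegree = {read: 0 for read in reads}
    for left, right in overlaps:
        successors[left].append(right)
        indegree[right] += 1

    # Start from reads that no other read precedes.
    ready = deque(read for read in reads if indegree[read] == 0)
    order = []
    while ready:
        read = ready.popleft()
        order.append(read)
        for nxt in successors[read]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    # Reads left unplaced mean the overlap graph is not acyclic.
    if len(order) != len(reads):
        raise ValueError("overlap graph contains a cycle")
    return order
```

For example, `layout_order(["r3", "r1", "r2"], [("r1", "r2"), ("r2", "r3")])` returns `["r1", "r2", "r3"]`. Unlike a greedy merge of the highest-scoring overlap first, the ordering depends on the whole graph, which is why the overlaps need to be screened into an acyclic graph beforehand.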
Keywords/Search Tags: SNP Discovery, Sequence Alignment, Sequence Assembly, EST, Cotton