Construction Of Aegilops Genome Data Platform

Posted on: 2014-01-17
Degree: Master
Type: Thesis
Country: China
Candidate: B Q Dong
GTID: 2230330395997427
Subject: Computer application technology
Abstract/Summary:
Archaeological evidence indicates that common wheat was first domesticated around 10,000 years ago. Common wheat has three sub-genomes, derived from spontaneous hybridization between tetraploid wheat and the diploid wild grass species Aegilops tauschii. The Aegilops genome contributes substantially to environmental adaptation and disease resistance in common wheat. The foundation of Aegilops tauschii research is genome sequencing. Next-generation sequencing can produce large numbers of sequence fragments in a very short time and can help answer many biological questions. Sequencing throughput is increasing exponentially while the cost of sequencing continues to decline, which results in massive volumes of genome sequence data. It is therefore necessary to build a secondary bioinformatics database to integrate these data. An Aegilops tauschii genome database based on next-generation sequencing can be further applied in molecular biology, and can provide strong technical support for improving the efficiency of wheat breeding and for improving complex traits in wheat.

Because a complete genome cannot be sequenced directly, next-generation sequencing technology first breaks the whole genome into short fragments, removes the very short fragments, and then sequences the remaining fragments on a sequencing instrument to generate reads. Finally, these reads are used to build contigs. The next-generation sequencing data are stored in FASTA format. The reads alone cannot be used in molecular biology research because they are too short (~40 bp) to study, so they must be assembled into contigs. Contigs are assembled with a suitable algorithm, and several contigs are then joined into larger fragments, called scaffolds, for further processing and analysis. Finally, the analyzed data are aggregated into the Aegilops genomic database (AGDB).
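The step from short reads to contigs described above can be illustrated with a toy greedy overlap merge. This is only a minimal sketch, not the assembly algorithm used in the thesis: the function names `overlap` and `greedy_assemble` and the `min_overlap` parameter are hypothetical, and the sketch ignores sequencing errors, repeats, and reverse-complement reads.

```python
def overlap(a: str, b: str, min_overlap: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when the best overlap is shorter than `min_overlap`.
    """
    max_len = min(len(a), len(b))
    for k in range(max_len, min_overlap - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Repeatedly merge the two sequences with the largest suffix/prefix overlap.

    Stops when no pair overlaps by at least `min_overlap` bases and
    returns the remaining sequences as (toy) contigs.
    """
    seqs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(seqs):
            for j, b in enumerate(seqs):
                if i != j:
                    k = overlap(a, b, min_overlap)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return seqs
        # Join the pair, keeping the shared bases only once.
        merged = seqs[best_i] + seqs[best_j][best_len:]
        seqs = [s for idx, s in enumerate(seqs) if idx not in (best_i, best_j)]
        seqs.append(merged)


if __name__ == "__main__":
    print(greedy_assemble(["ATGGCG", "GCGTGC", "TGCAAT"]))
```

The pairwise comparison is quadratic in the number of reads, which is one reason practical assemblers rely on indexed data structures instead of this brute-force search.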
To cope with the large volume of short reads produced at high throughput, a contig construction algorithm was implemented.

The Aegilops tauschii genome contains many repeated fragments, whose complex structure seriously affects the results of genome assembly. In this work, paired-end sequences are used as the input strings. Based on the double-helix structure of DNA, the overlap between repeated sequence fragments is evaluated, and information from the sense and antisense strands is then used to correct the result and reduce assembly errors. The Burrows-Wheeler transform was used in assembly to improve the efficiency of contig generation, shorten the running time of the algorithm, and lower the hardware requirements. The Aegilops tauschii genome does not have a suitable reference sequence. To address this, the BCA assembly algorithm, which selects the reads with higher quality values from the FASTQ file, was used to reduce contamination in the sequence data. The assembly can also be optimized with the CAP3 algorithm.

In this work, after comparing and analyzing the development of and demand for genome databases, we completed the Aegilops tauschii genome data platform, making the best use of existing technology. The platform provides the main functions for searching and displaying genome information, including searching scaffolds and other genomic information and displaying microRNAs, R genes, and transcription factors. In addition, the platform provides the genome browser GBrowse, BLAST search, and FTP download.

The AGDB client is a web browser, and the system was designed for compatibility across mainstream browsers; AGDB supports IE, Firefox, and other browsers. Biologists can search the basic information of the Aegilops tauschii genome through the platform's web interface.
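To make the Burrows-Wheeler transform mentioned above more concrete, the following sketch builds the transform by sorting rotations and then counts exact pattern matches with an FM-index style backward search. It is an illustration under stated assumptions, not the thesis implementation: the names `bwt` and `count_occurrences` are hypothetical, `$` is assumed to be an end marker absent from the text, and the naive construction is suitable only for short strings.

```python
from collections import Counter


def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations; '$' marks the text end."""
    assert "$" not in text, "'$' is reserved as the end marker"
    s = text + "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def count_occurrences(bwt_str: str, pattern: str) -> int:
    """Count exact matches of `pattern` using backward search on the BWT."""
    counts = Counter(bwt_str)
    # first[c]: number of characters in the text that sort before c
    first, total = {}, 0
    for c in sorted(counts):
        first[c] = total
        total += counts[c]
    # occ[c][i]: number of occurrences of c in bwt_str[:i]
    occ = {c: [0] * (len(bwt_str) + 1) for c in counts}
    for i, ch in enumerate(bwt_str):
        for c in counts:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    lo, hi = 0, len(bwt_str)  # half-open range of matching sorted rotations
    for c in reversed(pattern):
        if c not in first:
            return 0
        lo = first[c] + occ[c][lo]
        hi = first[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    transformed = bwt("ACAACG")
    print(transformed, count_occurrences(transformed, "AC"))
```

Each step of the backward search narrows the range of matching rotations by one pattern character, so the search time depends on the pattern length and not on the text length. The full occurrence table used here trades memory for simplicity; practical indexes sample it.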
The genome data are stored and managed in the MySQL database management system. The display module of AGDB shows most of the genomic data, so biologists can explore the relationships among Aegilops tauschii proteins, CDSs, and homologous genes. The platform also provides information on R genes, which offers steady support for disease-resistance research in wheat breeding.

AGDB also embeds a third-party genome browser component, GBrowse, which gives biologists a clear graphical display. Users can select any genomic fragment of interest to find the relevant information about it. For most biologists, the Aegilops tauschii genome data platform will provide a friendly and efficient communication and service platform.
Keywords/Search Tags: Aegilops tauschii genome, FM-index, BWT algorithm, CAP3 algorithm, contig, data platform