
Establishment And Application Of Bioinformatics Analysis Pipeline For Digital Gene Expression Profiling Based On Next-generation Sequencing

Posted on: 2013-02-18
Degree: Master
Type: Thesis
Country: China
Candidate: Y Lin
Full Text: PDF
GTID: 2230330374478812
Subject: Crop Genetics and Breeding
Abstract/Summary:
Next-generation sequencing offers high throughput and low cost, and it has been widely used in animal and plant genome sequencing and in gene expression studies. Cotton is an important economic crop, and its fiber cell is an ideal model for studying plant cell growth and development. Because the cotton genome is large and polyploid, whole-genome sequencing and sequence assembly are difficult to complete. For this reason, studying cotton gene expression with next-generation sequencing, in order to understand gene function and the roles genes play in physiological and metabolic activities, has become a research hotspot in recent years. However, next-generation sequencing generates vast amounts of data, and mining useful information from these data with bioinformatics tools has become an urgent problem. We therefore established a digital gene expression analysis pipeline based on next-generation sequencing and used it to analyze cotton RNA sequences.

The pipeline includes preprocessing of the raw data, mapping of reads to reference sequences, calculation of RPKM values and gene coverage, differential expression analysis, and subsequent functional annotation. We then used this pipeline to analyze four samples (DF1-DF4), which are ovule RNA sequencing data from normal upland cotton plants (YZ1) and PDF1-silenced plants at 0 DPA (days post anthesis). The main results are as follows:

1. Establishment of the analysis platform: we used FASTX-Toolkit for quality control of the raw data, obtaining clean reads after removing low-quality sequences. We then mapped the clean reads to reference sequences with Maq and Bowtie, wrote Python scripts to calculate RPKM values and gene coverage, and analyzed differential expression with DESeq after combining the statistics of multiple samples. Finally, Blast2GO was used for functional annotation of the differentially expressed genes, the annotation results were classified with WEGO, and KEGG was used for pathway analysis.

2. Successful application to the sequencing data of four samples (DF1-DF4): after low-quality sequences were removed, each of DF1-DF4 retained 99.6% of the raw data. Genes with RPKM values below 100 accounted for 92.2%, 91.7%, 91.7% and 91.0% in the four samples, respectively. After excluding the genes differentially expressed between DF1 and DF2, 27 differentially expressed genes were found in the comparison with DF3, of which 23 were up-regulated and the other 4 were down-regulated. In the comparison with DF4, 345 differentially expressed genes were found, of which 51 were up-regulated and 294 were down-regulated. The former set received 52 GO annotations and 9 KO annotations; the latter received 142 GO annotations and 112 KO annotations.

The application to cotton RNA sequencing data demonstrated the feasibility of the gene expression analysis platform we established, and it lays a good foundation for large-scale gene expression analysis in the future.
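The abstract mentions Python scripts for RPKM values and gene coverage but does not show them. The following is only a minimal sketch of what such a calculation might look like, assuming the standard RPKM definition (reads per kilobase of gene length per million mapped reads) and assuming that gene coverage means the fraction of a gene's bases covered by at least one read. The function names `rpkm` and `gene_coverage` and the input layout are hypothetical and are not taken from the thesis.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads Per Kilobase of gene per Million mapped reads.

    Assumes the standard definition:
        RPKM = 10^9 * C / (N * L)
    with C = reads mapped to the gene, N = total mapped reads in the
    sample, and L = gene length in base pairs.
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def gene_coverage(gene_length_bp, alignments):
    """Fraction of gene positions covered by at least one read.

    `alignments` is an iterable of (start, end) pairs in gene coordinates,
    0-based and half-open. This coverage definition is an assumption.
    """
    if gene_length_bp <= 0:
        raise ValueError("gene length must be positive")
    covered = 0
    current_end = 0  # right edge of the region already counted
    for start, end in sorted(alignments):
        start = max(start, current_end, 0)  # skip bases already counted
        end = min(end, gene_length_bp)      # clip to the gene boundary
        if end > start:
            covered += end - start
            current_end = end
    return covered / gene_length_bp


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(rpkm(read_count=500, gene_length_bp=2000, total_mapped_reads=10_000_000))
    print(gene_coverage(1000, [(0, 50), (25, 75), (500, 550)]))
```

In a real pipeline the read counts and alignment intervals would come from the Maq or Bowtie output; parsing those files is left out here because the abstract does not describe the formats used.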
Keywords/Search Tags: next-generation sequencing, cotton, expression profiling, gene mapping, differential analysis, functional annotation