Study Of Gene Families And Non-coding Sequences Based On Bioinformatics Methods

Posted on: 2007-05-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X S Wang
Full Text: PDF
GTID: 1100360212495162
Subject: Crop Genetics and Breeding
Abstract/Summary:
Rice (Oryza sativa L.) is one of the most important cereal crops in the world and feeds more than half of the world's population. It can be divided into two main subspecies, japonica and indica. Rice is a model plant for cereal species because of its relatively small genome. Draft genome sequences of two rice cultivars, 93-11 and Nipponbare, representing the indica and japonica subspecies, respectively, have been released, and complete sequences of chromosomes 1, 4 and 10 of Nipponbare have been published. Arabidopsis is of no major agronomic significance, but it offers important advantages for research in genetics and molecular biology: a small genome, a short generation time, abundant seed production and ease of transformation by simple techniques. Arabidopsis was the first plant whose complete genome was sequenced; the sequence was published in late 2000. A gene family is a set of genes defined by presumed homology, i.e. evidence that the genes evolved from a common ancestral gene.

The purpose of this study is to improve crop breeding and genetics. Drought stress and disease resistance are the main research areas. We analyzed the drought-stress-related LEA gene family and the R gene family in rice. We also improved resistance gene analog (RGA) polymorphism markers and developed intron length polymorphism markers. In addition, long and short non-coding sequences were studied. Using bioinformatics methods and a combination of rice and Arabidopsis gene sequences, we conducted the following studies:

(1) LEA gene family
A total of 34 rice LEA (OsLEA) genes were identified, of which 25 were new. We also identified four OsLEA genes with alternative splicing by aligning full-length cDNAs. The OsLEA genes are distributed on all rice chromosomes except chromosomes 10 and 12. Two independent conversion events were observed. Microarray analysis indicated that most OsLEAs are regulated by different stress treatments.
Expression analysis of 15 OsLEA genes by semiquantitative reverse transcription PCR (sqRT-PCR) revealed that the expression of OsLEA genes is very diverse: some are constitutive, some are regulated and some appear to be related to stress tolerance. When we searched for conserved DNA elements in the 1,000 bp upstream regions of the ABA-induced and drought-induced LEA genes, the motifs CACGTA and CACGCACG were clearly overrepresented.

(2) R gene family and RGA markers
By scanning the whole genomic sequence of japonica rice with 45 known plant disease resistance (R) genes, we identified 2,119 resistance gene homologs or analogs (RGAs) and verified that RGAs are not randomly distributed but tend to cluster in the rice genome. The RGAs were classified into 21 families according to their functional domains, using a hidden Markov model (HMM). By comparing the RGAs of japonica rice with the whole genomic sequence of indica rice, we found 702 RGAs that are allelic between the two subspecies; 671 (95.6%) of them show length differences (InDels) in their genomic sequences (including coding and non-coding regions), suggesting that RGAs are highly polymorphic loci between the two rice subspecies. We also developed 402 PCR-based, co-dominant candidate RGA markers by designing primer pairs in the regions flanking the InDels and validating them via e-PCR. The length differences of the candidate RGA markers between the two subspecies range from 1 to 742 bp, with an average of 10.26 bp.

(3) Intron length polymorphism markers
We performed a genome-wide search for intron length polymorphisms (ILPs) between the two rice subspecies (indica and japonica) using the draft genomic sequences of cultivars 93-11 (indica) and Nipponbare (japonica) and 32,127 full-length cDNA sequences of Nipponbare obtained from public databases. We identified 13,308 putative ILPs.
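As a rough illustration of the ILP search described above, the following Python sketch compares the lengths of corresponding introns in two genomes, given the exon coordinates obtained by aligning one cDNA to each genome. The function names, the coordinate convention and the example values are hypothetical and chosen only for illustration; the abstract does not describe the actual pipeline at this level of detail.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based inclusive genomic coordinates


def intron_lengths(exons: List[Exon]) -> List[int]:
    """Return the length of each intron between consecutive exons."""
    ordered = sorted(exons)
    return [nxt[0] - cur[1] - 1 for cur, nxt in zip(ordered, ordered[1:])]


def find_ilps(exons_a: List[Exon], exons_b: List[Exon]) -> List[Tuple[int, int, int]]:
    """List (intron_index, length_a, length_b) for introns whose lengths differ.

    Assumes the same cDNA was aligned to both genomes and produced the same
    number of exons, so that introns correspond by position.
    """
    len_a = intron_lengths(exons_a)
    len_b = intron_lengths(exons_b)
    if len(len_a) != len(len_b):
        raise ValueError("exon structures differ; introns cannot be paired")
    return [(i, a, b)
            for i, (a, b) in enumerate(zip(len_a, len_b), start=1)
            if a != b]


# Hypothetical exon coordinates of one cDNA aligned to each genome.
japonica_exons = [(100, 250), (400, 520), (700, 900)]
indica_exons = [(100, 250), (412, 532), (712, 912)]

ilps = find_ilps(japonica_exons, indica_exons)
print(ilps)
```

In this toy case only the first intron differs in length between the two alignments, so it is the only one reported as a putative ILP; primers for a candidate marker would then be placed in the two exons flanking that intron.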
Based on these putative ILPs, we developed 5,811 candidate ILP markers via e-PCR with primers designed in the flanking exons. We further conducted experiments to verify the candidate ILP markers.

(4) Conserved Noncoding Elements (CNEs)
We identified 436 CNEs in plants by aligning sequences from three species (Arabidopsis, rice and Populus). By searching all CNEs against each other to identify paralogous CNEs (pCNEs) in Arabidopsis, we found 7,972 pCNEs. We assume that the functional specificity of proteins associated with CNEs is conserved among orthologs and paralogs. The results indicate that most of the enriched genes flanking CNEs are associated with transcription factors. The enriched transcription factors mainly comprise MYB family transcription factors and zinc finger proteins.

(5) Transcription factor binding sites (TFBSs)
A total of 787 co-expressed genes were identified by Pearson correlation and calibrated against the GO database. For each target gene, we identified TFBSs in the proximal promoter using a stepwise regression method. We systematically identified the individual TFBSs and combinations of TFBSs that control gene transcription and expression. Using a Pearson correlation of r > 0.95, we identified 279 genes in the root, 172 in the flower, 129 in pollen and 207 in the seed. The number of TFBS types differs among the tissues, and we also observed a large difference in the total number of TFBSs among the tissues.

(6) DNA methylation
Besides TFBSs, gene expression is also regulated by DNA methylation. DNA methylation is involved in various biological processes, including tissue-specific gene expression and genomic imprinting. We present a computational prediction of DNA methylation in Arabidopsis. We used several different discriminant methods to classify methylated and non-methylated regions. The results showed that the LMT classifier achieved a prediction accuracy of 71.03% on experimentally verified methylation data from Arabidopsis.
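The co-expression screening in part (5) rests on the Pearson correlation coefficient with a cut-off of r > 0.95. A minimal Python sketch of such a filter is given below. Comparing every gene against a single reference profile is an assumption of this sketch, and the gene names and expression values are hypothetical; the abstract states only that Pearson correlation and the 0.95 threshold were used.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / sqrt(var_x * var_y)


def coexpressed_with(reference: Sequence[float],
                     profiles: Dict[str, Sequence[float]],
                     threshold: float = 0.95) -> List[str]:
    """Return genes whose profile correlates with the reference above the threshold."""
    return [gene for gene, values in profiles.items()
            if pearson_r(reference, values) > threshold]


# Hypothetical expression values across five samples.
root_reference = [1.0, 2.0, 3.0, 4.0, 5.0]
expression = {
    "geneA": [2.1, 4.0, 6.2, 8.1, 9.9],  # rises with the reference
    "geneB": [5.0, 4.0, 3.0, 2.0, 1.0],  # falls as the reference rises
    "geneC": [1.0, 3.0, 2.0, 5.0, 4.0],  # only loosely related (r = 0.8)
}

selected = coexpressed_with(root_reference, expression)
print(selected)
```

Only the profile that tracks the reference almost linearly passes the 0.95 cut-off; the anti-correlated and loosely correlated profiles are excluded.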
Keywords/Search Tags: Rice genome sequence, Resistance gene, Resistance gene analog (RGA), Polymorphism, Molecular marker, Intron length polymorphism (ILP), Conserved Noncoding Elements (CNEs), Transcription factor binding sites (TFBSs), Cis-regulatory modules (CRMs)