| Chapter I: Copy number variation analysis among Chinese and European pig breeds during domestication using next-generation sequencing data

During domestication and breed formation, heredity, variation and selection have shaped pig breeds in different ways. Comparing the genomes of domestic pigs and wild boar can reveal these genomic variations. Copy number variation (CNV) refers to a gain or loss of copies of a genomic segment, ranging from one kilobase to several megabases, relative to the reference sequence, together with the related complex small-scale structural variation of chromosomes. In this study, we detected CNVs and analyzed the functions of genes in copy number variation regions (CNVRs) in forty-nine individuals from thirteen pig breeds using next-generation sequencing data. Comparing the CNVs that arose in Chinese and European domestic pigs during domestication also made it possible to study the selection of CNVs and their associated genes in the two groups, which could reveal the genetic basis of phenotypic differences between Chinese and European domestic pigs, explain variation in the pig genome and provide a foundation for pig breeding. The main results of this study are as follows:

1.1 CNV discovery in domestic pigs during domestication using bioinformatics methods. CNVseq and CNVnator were used to scan for CNVs in our own next-generation sequencing data from Tongcheng pigs and in data from different pig breeds and wild boar downloaded from public databases, forty-nine individuals in total. We found 3,131 CNVRs that arose during domestication. Among them, 745 CNVRs were copy number gains, 2,364 were copy number losses and 22 showed both gain and loss. We drew a CNV map according to the genomic positions of the CNVRs.

1.2 CNVR validation using real-time quantitative PCR.
Twenty-eight CNVRs were randomly selected from our results for qPCR validation. Twenty-four of them agreed with the predicted CNVRs, a validation rate of 86%.

1.3 Characterization of the CNV distribution. We counted repeat sequences (SINE, LINE, LTR, etc.) and calculated their density in the CNVRs plus 10 kb upstream and downstream, according to the genomic locations of the CNVRs. Both the number and the density of repeat sequences were higher in CNVRs than in the pig genome overall. This indicates that CNVs tend to be located near repeat sequences and that repeat sequences might contribute to CNV formation.

1.4 Gene ontology analysis of CNVR-associated genes. We identified 1,266 protein-coding genes in the 3,131 domestication-derived CNVRs using the BioMart tool. Gene ontology analysis with the DAVID tool showed that most of these genes were involved in cell adhesion, GTPase activity, cell junction, immunity, olfaction and the MAPK signaling pathway.

1.5 CNVRs and associated genes that differ between Chinese and European domestic pigs. We found 2,278 and 1,706 CNVRs in Chinese and European pigs, respectively. Among them, 129 were unique to Chinese breeds and 147 to European breeds. Gene ontology analysis of the genes in the CNVRs showed that most genes were involved in immune processes and production in Chinese pigs, and in muscle development in European pigs.

Chapter II: Genome-wide analysis of polyadenylation sites in the pig genome using transcriptome data

Polyadenylation is an important step in post-transcriptional RNA processing and plays a key role in mRNA transport and translation. A gene can contain more than one polyadenylation site (PAS), which may lead to alternative polyadenylation, produce additional transcripts for the gene and thereby change gene expression. Therefore, we identified PASs at the genome-wide level in the pig using transcriptome data.
We then analyzed the relationship between PASs and gene expression, and subsequently pig traits. The main results of this study are as follows:

2.1 Discovery of pig polyadenylation sites from large-scale transcriptome data. The data came from our own Tongcheng and Large White pigs before and after PRRSV infection and from public databases, covering twelve tissues, cells and sperm, with 12 billion reads in total. Among them, 1.94 million reads containing poly(A) or poly(T) were uniquely mapped to the genome and used for PAS discovery. In total, we obtained 28,363 PASs in the pig genome.

2.2 Annotation of PAS locations in the genome. Of the 28,363 PASs, 13,033 (46%) were located in 7,403 genes according to the pig genome annotation file. To annotate the remaining 15,330 PASs, we predicted novel pig genes from all the transcriptome data, and 6,806 PASs (24%) fell within these gene regions. In total, this study found 19,839 PASs (70%) in gene regions and 8,524 PASs (30%) in intergenic regions.

2.3 Distribution of PASs in the genome and in different tissues. The PAS distribution in genes and 3’UTRs was analyzed according to the gene information in the pig genome annotation file. Most genes and 3’UTRs had more than one PAS, which might produce additional transcripts for the corresponding genes. The distances between adjacent PASs in genes and 3’UTRs showed that 45% of PASs were within 1 kb of each other. The distance between the stop codon and the first downstream PAS varied widely across 3’UTRs, with a median of 307 nt. We also identified PASs in liver and testis separately, finding 12,777 and 14,375 PASs, respectively. Only 4,752 PASs (21%) were shared by the two tissues, and usage of most shared PASs differed between them. Both results indicate that PASs are tissue-specific.

2.4 Pearson correlation analysis of the relationship between PASs and gene expression.
Liver and testis data from pigs with high and low androstenone levels were scanned for PASs. We then calculated, for each gene, its expression, the number of PASs and the number of reads covering those PASs. There was a moderate positive correlation between the number of PASs per gene and gene expression (0.4 < r < 0.6, p < 0.01), and a strong positive correlation between the number of PAS-covering reads per gene and gene expression (0.6 < r < 0.8, p < 0.01).

2.5 Role of PAS usage in androstenone level and in Salmonella infection. PASs were identified in liver and testis groups with high and low androstenone levels. In liver, 272 PASs, associated with 109 protein-coding genes, showed high usage in pigs with low androstenone (p < 0.05, |log2FC| ≥ 1). Gene ontology analysis indicated that these genes were mainly involved in lipid binding, steroid and fatty acid metabolic processes, steroid hormone biosynthesis and metabolism of xenobiotics by cytochrome P450 (p < 0.05). In testis, 260 PASs differed significantly under the same criteria (p < 0.05, |log2FC| ≥ 1), and 163 associated genes harbored these PASs. Gene ontology analysis showed that most of these genes were involved in spermatogenesis and the cell cycle (p < 0.05). In the Salmonella infection data, 38 PASs showed higher usage after Salmonella inoculation and 41 PASs showed higher usage before infection (p < 0.05, |log2FC| ≥ 1). The high-usage PASs before and after Salmonella infection were harbored by 28 and 26 pig genes, respectively. Gene ontology analysis of the genes associated with differentially used PASs showed that most were involved in immune response and cytokine regulation after infection, and in translation before infection (p < 0.05). |
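
The two quantitative criteria used in sections 2.4 and 2.5 can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names, the pseudocount added before taking the log ratio, and the assumption that a p-value has already been computed elsewhere are all hypothetical choices made for illustration. The sketch only shows how a Pearson correlation coefficient r is obtained and how the thresholds p < 0.05 and |log2FC| ≥ 1 combine to flag a PAS as differentially used.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def log2_fold_change(usage_a, usage_b, pseudocount=1.0):
    """log2 ratio of PAS usage in group A versus group B.

    The pseudocount is an assumption of this sketch; it avoids division
    by zero when a PAS has no supporting reads in one group.
    """
    return math.log2((usage_a + pseudocount) / (usage_b + pseudocount))


def is_differentially_used(usage_a, usage_b, p_value,
                           p_cutoff=0.05, lfc_cutoff=1.0):
    """Apply the thresholds quoted in the text: p < 0.05 and |log2FC| >= 1.

    The p-value is taken as given; how it is computed is outside this sketch.
    """
    lfc = log2_fold_change(usage_a, usage_b)
    return p_value < p_cutoff and abs(lfc) >= lfc_cutoff


# Hypothetical per-gene values: PAS-covering reads and expression level.
pas_reads = [5, 12, 20, 33, 41]
expression = [1.1, 2.9, 4.2, 6.8, 8.5]
r = pearson_r(pas_reads, expression)

# Hypothetical PAS with 3 reads in one group and 1 in the other, p = 0.01.
flagged = is_differentially_used(3, 1, p_value=0.01)
```

In practice the read counts would be normalized for sequencing depth before computing the ratio, and the threshold of 1 on the log2 scale corresponds to at least a two-fold difference in usage between groups.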