APA Gene Clustering Study Based on Canonical Correlation Analysis

Posted on: 2019-06-03
Degree: Master
Type: Thesis
Country: China
Candidate: Q M Lin
Full Text: PDF
GTID: 2370330545483727
Subject: Systems Engineering
Abstract/Summary:
Polyadenylation (poly(A)) is a necessary step in mRNA maturation and plays an important role in regulating gene expression in eukaryotes. The poly(A) site is where cleavage occurs during polyadenylation, and the use of different poly(A) sites within a gene is called alternative polyadenylation (APA). Through APA, genes produce different transcript isoforms, increasing the complexity and diversity of the transcriptome and proteome. With the rapid development and application of sequencing technology, more and more poly(A) site data have been generated, and clustering has become one of the common and important computational methods for analyzing high-throughput poly(A) site data. Clustering of APA genes has become a powerful approach to explore gene expression under APA regulation, identify groups of co-expressed genes, analyze differences in gene expression, and predict the function of unknown genes. In existing APA gene clustering studies, the expression level of a gene is determined by summing the reads of all poly(A) sites in that gene. In this paper, we study the clustering of APA genes while taking APA specificity into account, and propose an analysis method based on canonical correlation analysis (CCA) and hierarchical clustering. The method consists of three main steps: first, consider the distribution and abundance of APA sites in each gene and quantify the correlation between APA genes using CCA; second, use hierarchical clustering to divide the genes into sets with significant correlation and identify the key genes in each set; third, evaluate the homogeneity of the gene sets and validate the clustering results. In addition, a parallel framework was adopted, and an R package, PAcluster, was developed and made available online for biologists and other researchers.
The proposed method is applied mainly to a poly(A) site dataset from rice (Oryza sativa japonica, MSU7), and its clustering results are compared comprehensively with those based on Pearson correlation coefficients and Minkowski distance. The results show that the proposed method yields more homogeneous gene sets, significantly improves clustering performance, and is more robust. The R package PAcluster is easy to use and has a short computation time. It can be downloaded for free from http://bmi.xmu.edu.cn/software/. The proposed APA gene clustering method can help biologists study APA-regulated gene expression. The method and the R package have been published in a JCR-indexed international journal (J Bioinform Comput Biol).
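The first two steps of the method (CCA-based gene-to-gene correlation, then hierarchical clustering) can be illustrated with a minimal Python sketch. This is not the PAcluster implementation or its API. It assumes that each gene is represented as a samples-by-sites abundance matrix, that the first canonical correlation is used as the gene-to-gene similarity, that one minus that value serves as the distance, and that average linkage is used; the abstract specifies none of these details. All function names are hypothetical and the data are synthetic.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def orthonormal_basis(mat, tol=1e-10):
    """Center the columns and return an orthonormal basis of their span."""
    centered = mat - mat.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, s > tol * s[0]]  # drop numerically degenerate directions


def first_canonical_correlation(x, y):
    """Largest canonical correlation between two samples-by-sites matrices.

    The canonical correlations are the singular values of Qx^T Qy, where
    Qx and Qy are orthonormal bases of the centered column spaces.
    """
    qx, qy = orthonormal_basis(x), orthonormal_basis(y)
    if qx.shape[1] == 0 or qy.shape[1] == 0:
        return 0.0  # a gene with constant abundance carries no signal
    rho = np.linalg.svd(qx.T @ qy, compute_uv=False)[0]
    return float(np.clip(rho, 0.0, 1.0))


def cca_distance_matrix(genes):
    """Pairwise distance 1 - rho (an assumed choice, not from the source)."""
    n = len(genes)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            rho = first_canonical_correlation(genes[i], genes[j])
            dist[i, j] = dist[j, i] = 1.0 - rho
    return dist


# Synthetic data: six genes with 2 or 3 poly(A) sites each, 60 samples.
# Genes 0-2 follow one latent expression pattern, genes 3-5 another.
# The sample count must be well above the number of sites per gene pair,
# otherwise the canonical correlation is trivially close to 1.
rng = np.random.default_rng(0)
n_samples = 60
latent = rng.normal(size=(n_samples, 2))
genes = []
for n_sites, group in zip([2, 3, 2, 3, 2, 3], [0, 0, 0, 1, 1, 1]):
    weights = rng.uniform(0.5, 1.5, size=n_sites)
    noise = 0.3 * rng.normal(size=(n_samples, n_sites))
    genes.append(np.outer(latent[:, group], weights) + noise)

dist = cca_distance_matrix(genes)
tree = linkage(squareform(dist, checks=False), method="average")
labels = fcluster(tree, t=2, criterion="maxclust")
print(labels)
```

Because CCA compares whole site-by-sample matrices, genes with different numbers of poly(A) sites can be compared directly, which is not possible when each gene is reduced to a single summed read count.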
Keywords/Search Tags: Clustering, Canonical Correlation Analysis, Alternative Polyadenylation