
Clustering of mRNA-Seq Data for Detection of Alternative Splicing Pattern

Posted on: 2018-03-30
Degree: Ph.D.
Type: Dissertation
University: University of California, Berkeley
Candidate: Johnson, Marla Kay
GTID: 1478390020457460
Subject: Biostatistics
Abstract/Summary:
Whereas earlier methods for studying expression in a cell returned only gene-level expression estimates, sequencing of mRNA (mRNA-Seq) can also estimate the abundance of individual isoforms within the cell. As a result, many standard statistical methods for analyzing gene expression levels need to be modified to take advantage of this additional information. Many methods have been developed to study differential isoform expression between known groups, but little research has applied unsupervised learning methods such as clustering. One novel question is whether we can find clusters of samples that are distinguishable not by their gene expression but by their isoform usage. That is, instead of using clustering to find groups with shared changes in gene expression, we want to use clustering to find groups with shared changes in isoform usage. Here, we propose a novel approach to clustering mRNA-Seq data that identifies such clusters. To use both gene and isoform information when clustering, we represent each gene's sequencing data as a vector of relative isoform usage, giving the proportion of the gene's expression attributable to each of its isoforms. In simulated data, we show that clustering on relative isoform usage rather than on isoform counts is more sensitive to clusters defined by changes in isoform usage. In a real data set, we show that the approach detects a technical artifact that caused different batches to have different isoform usage patterns. We also illustrate its use on several TCGA data sets, examining whether groups found by clustering on relative isoform usage were associated with tumor stage or splicing mutations.
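The abstract does not name the distance measure or clustering algorithm used. The following is a minimal sketch, assuming isoform-level counts have already been quantified, that converts each sample's counts into per-gene relative isoform usage. It uses plain k-means with Euclidean distance as a stand-in, which is not necessarily the dissertation's method, to contrast clustering on usage with clustering on raw counts. The function names `relative_usage`, `raw_counts`, and `kmeans` and the toy data are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def relative_usage(counts):
    """Flatten one sample's isoform counts into relative isoform usage.

    `counts` maps gene -> list of isoform counts (same isoform order in every
    sample). Within each gene the returned entries sum to 1: each is the
    fraction of that gene's reads assigned to one isoform.
    """
    features = []
    for gene in sorted(counts):
        iso = counts[gene]
        total = sum(iso)
        if total == 0:
            # Usage is undefined for an unexpressed gene; a uniform split is an assumption.
            features.extend([1.0 / len(iso)] * len(iso))
        else:
            features.extend(c / total for c in iso)
    return features


def raw_counts(counts):
    """Flatten isoform counts without normalization, in the same gene order."""
    return [c for gene in sorted(counts) for c in counts[gene]]


def kmeans(points, k, n_iter=100):
    """Plain k-means (Lloyd's algorithm) with deterministic farthest-point init."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from all seeds chosen so far.
        farthest = max(points, key=lambda p: min(dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # converged: assignments did not change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy data: gene A has two isoforms whose usage flips between
# samples; gene B is split 50/50 everywhere. s3 and s4 are sequenced ~100x deeper.
samples = [
    {"A": [90, 10], "B": [50, 50]},          # s1
    {"A": [10, 90], "B": [50, 50]},          # s2
    {"A": [9000, 1000], "B": [5000, 5000]},  # s3
    {"A": [1000, 9000], "B": [5000, 5000]},  # s4
]

print(kmeans([raw_counts(s) for s in samples], 2))      # [0, 0, 1, 1]: split by depth
print(kmeans([relative_usage(s) for s in samples], 2))  # [0, 1, 0, 1]: split by isoform usage
```

In this toy example, s1 and s3 share one isoform pattern for gene A, and s2 and s4 share the other. Clustering on raw counts separates the samples by overall expression level. Clustering on relative usage groups them by isoform pattern because normalizing within each gene removes differences in total gene expression.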
Keywords/Search Tags: Clustering, Isoform usage, Gene expression, Data, Methods