
Automated analysis of the subcellular location of proteins in NIH3T3 and CaCo2 cells using fluorescence microscope images

Posted on: 2008-01-19
Degree: Ph.D.
Type: Thesis
University: Carnegie Mellon University
Candidate: Garcia Osuna, Elvira
Full Text: PDF
GTID: 2440390005466381
Subject: Engineering
Abstract/Summary:
Location proteomics is concerned with the systematic analysis of the subcellular location of proteins. In this thesis, I describe the use of automated methods on a large collection of images obtained by automated microscopy of endogenous proteins randomly tagged with a fluorescent protein in NIH 3T3 cells. Cluster analysis was performed to identify the statistically distinguishable location patterns in these images. This allowed a location pattern to be assigned to each randomly tagged protein without specifying in advance what patterns are possible. To choose the best feature set for this clustering, I used a novel method that determines which features do not artificially discriminate between control wells on different plates, and then uses Stepwise Discriminant Analysis (SDA) to determine which features discriminate as much as possible among the randomly tagged wells. Combining this feature set with consensus clustering methods resulted in 35 clusters among the first 188 clones obtained. A new collection of 982 clones imaged in parallel with four control patterns was then obtained. Cluster analysis of this collection using cell-level features resulted in 40 clusters, suggesting that the number of distinguishable protein subcellular location families in 3T3 cells is approximately 40. This approach represents a powerful automated solution to the problem of identifying subcellular locations on a proteome-wide basis for many different cell types.

With the goal of extending automated subcellular location pattern analysis methods to polarized epithelia and tissues, a collection of 3D confocal microscope images of confluent monolayers of CaCo2 cells immunostained for various proteins was acquired. CaCo2 is a human colon cancer cell line that is frequently used as a model system for epithelia. Preliminary analysis of this collection was carried out using field-level features that do not require cell segmentation. An accuracy of 77.4% was achieved with an SVM classifier and 10-fold cross-validation.

This thesis describes approaches to creating large-scale image collections for the purpose of analyzing subcellular location in both unpolarized and polarized cells. Supervised and unsupervised machine learning methods were used to analyze these collections. The approaches described achieve the accuracy and speed necessary to process high-throughput datasets.
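The 10-fold cross-validation used to report classification accuracy can be illustrated with a minimal sketch. This is not the thesis's pipeline: the classifier below is a simple nearest-centroid rule standing in for the SVM, the two-dimensional data are synthetic rather than field-level image features, and every name (`k_fold_indices`, `fit_centroids`, `cross_validated_accuracy`, `make_toy_data`, the class labels) is hypothetical. It only shows how each sample is held out exactly once and scored by a model trained on the other nine folds.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_centroids(features, labels):
    """Return the mean feature vector of each class (stand-in for SVM training)."""
    grouped = {}
    for vector, label in zip(features, labels):
        grouped.setdefault(label, []).append(vector)
    return {label: [mean(column) for column in zip(*vectors)]
            for label, vectors in grouped.items()}


def predict(centroids, vector):
    """Assign the class whose centroid is nearest in squared Euclidean distance."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(vector, centroid))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


def cross_validated_accuracy(features, labels, k=10, seed=0):
    """Hold out each fold once, train on the rest, and pool the correct predictions."""
    correct = 0
    for held_out in k_fold_indices(len(features), k, seed):
        held = set(held_out)
        train_x = [features[i] for i in range(len(features)) if i not in held]
        train_y = [labels[i] for i in range(len(labels)) if i not in held]
        centroids = fit_centroids(train_x, train_y)
        correct += sum(predict(centroids, features[i]) == labels[i]
                       for i in held_out)
    return correct / len(features)


def make_toy_data(n_per_class=50, seed=1):
    """Synthetic two-class data; a placeholder for real image features."""
    rng = random.Random(seed)
    centers = {"pattern_a": (0.0, 0.0), "pattern_b": (5.0, 5.0)}
    features, labels = [], []
    for label, (cx, cy) in centers.items():
        for _ in range(n_per_class):
            features.append([rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)])
            labels.append(label)
    return features, labels


if __name__ == "__main__":
    X, y = make_toy_data()
    print(f"10-fold accuracy: {cross_validated_accuracy(X, y):.3f}")
```

Pooling correct predictions over all folds, as done here, gives the overall fraction of held-out samples classified correctly; averaging per-fold accuracies is an equally common convention and gives the same value when the folds are the same size.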
Keywords/Search Tags: Subcellular location, Proteins, CaCo2 cells, Automated, Collection, Using, Images