Data Analysis For High Content RNA Interference Screening: Pattern Recognition Approaches For Certain Systems Biology Application

Posted on: 2010-03-01
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z Yin
Full Text: PDF
GTID: 1118360302483075
Subject: Control Science and Engineering
Abstract/Summary:
Cybernetics, systems theory, and pattern recognition theory and methods are broadly applied in interdisciplinary research. Across applications, cybernetics and systems theory help dissect research topics, while pattern recognition techniques form the workflow for solving specific problems. In this thesis, these theories and methods were applied to systems biology research. Specifically, in the context of large-scale high-content RNAi screening (RNAi HCS) aimed at constructing a local regulatory network for Drosophila cell shape change, a series of challenges in RNAi HCS data analysis were analyzed and solved. We proposed original solutions for online phenotype discovery, online modeling and validation of novel phenotypes, feature selection, cell classification, and modeling of gene function based on single-cell morphology profiles. The proposed methods were combined into a complete data analysis workflow and applied to a dataset of more than 2 million single cells. Based on the analysis of this real dataset, we helped biologists propose a biological hypothesis regarding the canalization of cell morphology.

At present, most RNAi HCS data analysis workflows use typical phenotypes and cells identified through expert ground-truth labeling as the basis for gene function research. However, as datasets grow, a manually picked training set can no longer cover the properties of the whole dataset. We improved the gap statistic, a method for estimating the number of clusters and validating clusters; designed iterative phenotype merging, a strategy that compares newly generated data with existing phenotypes; used a Gaussian mixture model to describe each phenotype; and applied the minimum classification error method to update the models online. We combined these components into an original online phenotype discovery workflow that discovers, models, and validates novel morphological phenotypes as the dataset grows.

To compare cell morphology with typical phenotypes, we combined support vector machine recursive feature elimination (SVM-RFE) with an SVM-based genetic algorithm to form a feature selection scheme. Using the informative feature subsets and an SVM with a Gaussian radial basis function kernel, we quantified the morphological similarity between single cells and typical phenotypes. Based on the cell classification results, we carried out a series of quality control, statistical analysis, data filtering, and consolidation steps, and selected a group of significantly repeatable cell populations to represent the result of the RNAi treatment targeting each gene. Quantitative morphology signatures for each gene were generated from those cell populations, and we applied cluster analysis to those signatures to identify gene families with different functions in regulating cell shape change.

Guided by cybernetics and systems theory, the data analysis workflow implemented various state-of-the-art pattern recognition and statistical analysis techniques and demonstrated the capability for automatic data analysis in large-scale RNAi HCS. We combined dynamic and static analysis to realize online phenotype discovery, modeling, and validation. We examined the relationships between information from micro- and macro-level phenomena and used single-cell morphology profiles to model gene function. The results on this specific project contributed to the understanding of the general laws underlying cell morphology change, and we proposed and validated a hypothesis regarding the canalization property of cell morphology based on our analysis of a real RNAi HCS dataset.
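
As an illustration of the cluster number estimation step mentioned in the abstract, the following is a minimal sketch of the standard gap statistic (Tibshirani, Walther and Hastie), not the improved variant developed in the thesis, whose details the abstract does not give. The k-means routine with k-means++ seeding, the uniform bounding-box reference distribution, and all function and parameter names (kmeans_wss, gap_statistic, k_max, n_refs) are assumptions made for this example.

```python
import math
import random


def _init_centers(points, k, rng):
    """k-means++ seeding: later centers are drawn far from earlier ones.

    Assumes the data contain at least k distinct points.
    """
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(math.dist(p, c) for c in centers) ** 2 for p in points]
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


def kmeans_wss(points, k, rng, iters=50, restarts=5):
    """Best within-cluster sum of squares (W_k) found by Lloyd's algorithm."""
    best = float("inf")
    for _ in range(restarts):
        centers = _init_centers(points, k, rng)
        for _ in range(iters):
            clusters = [[] for _ in range(k)]
            for p in points:
                j = min(range(k), key=lambda c: math.dist(p, centers[c]))
                clusters[j].append(p)
            # Move each center to its cluster mean; keep it if the cluster is empty.
            new_centers = [
                tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[i]
                for i, cl in enumerate(clusters)
            ]
            if new_centers == centers:
                break
            centers = new_centers
        wss = sum(min(math.dist(p, c) for c in centers) ** 2 for p in points)
        best = min(best, wss)
    return max(best, 1e-12)  # guard against log(0)


def gap_statistic(points, k_max, n_refs=10, seed=0):
    """Return (chosen k, list of Gap(1..k_max)) for a list of numeric tuples."""
    points = [tuple(p) for p in points]
    rng = random.Random(seed)
    dims = list(zip(*points))
    lo = [min(d) for d in dims]
    hi = [max(d) for d in dims]
    gaps, errs = [], []
    for k in range(1, k_max + 1):
        log_w = math.log(kmeans_wss(points, k, rng))
        ref_logs = []
        for _ in range(n_refs):
            # Reference data: uniform over the bounding box of the real data.
            ref = [tuple(rng.uniform(a, b) for a, b in zip(lo, hi)) for _ in points]
            ref_logs.append(math.log(kmeans_wss(ref, k, rng)))
        mean_ref = sum(ref_logs) / n_refs
        sd = math.sqrt(sum((x - mean_ref) ** 2 for x in ref_logs) / n_refs)
        gaps.append(mean_ref - log_w)               # Gap(k)
        errs.append(sd * math.sqrt(1 + 1 / n_refs))  # s_k
    # Smallest k with Gap(k) >= Gap(k+1) - s_{k+1}.
    for k in range(1, k_max):
        if gaps[k - 1] >= gaps[k] - errs[k]:
            return k, gaps
    return k_max, gaps
```

On two well-separated point clouds, a call such as gap_statistic(points, k_max=4) would be expected to select k = 2. The pure-Python k-means keeps the sketch free of dependencies, but it would be far too slow for a dataset of the size described above.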
Keywords/Search Tags: systems biology, online phenotype discovery, cluster analysis, gap statistics, support vector machine, feature selection, cell cycle, canalization