
Studies On Methods Of Consensus Data Modeling

Posted on: 2007-09-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z Q Su
Full Text: PDF
GTID: 1101360212499141
Subject: Analytical Chemistry
Abstract/Summary:
Modeling of analytical data is a common task in chemometrics. There are two types of problems in the modeling of analytical data, namely regression (or calibration) and pattern recognition. Because a single model is inherently susceptible to difficulties associated with data quality and sample number, a consensus strategy was used in this dissertation for the modeling of near-infrared (NIR) spectroscopy and microarray data, and the theory and application of consensus modeling were investigated. The work includes the following:

1. The basic theories and frequently used methods for the modeling of analytical data were reviewed, with emphasis on the basic theories, modeling methods and applications of consensus modeling.

2. Based on random resampling, a partial least squares (PLS)-based consensus regression method, cPLS, was proposed. In cPLS, rather than selecting a single PLS model on the basis of the best fit, several PLS models satisfying a predefined criterion were selected and combined into one cPLS model. The effectiveness of cPLS was demonstrated by comparing its prediction results with those of regular PLS in the calibration of the NIR spectra of corn samples. The results suggested that combining multiple individual PLS models by cPLS could improve not only the accuracy of prediction but also the robustness of the model.

3. By combining local modeling with consensus modeling, a consensus dynamic local partial least squares method, CDL-PLS, was proposed. Regular PLS builds a single model, and many consensus methods reported in the literature use bagging or boosting to generate constituent predictors. CDL-PLS instead generates constituent models using a dynamic local modeling technique: the samples used to develop each constituent predictor are not randomly selected from the original training data set but are chosen according to their Euclidean distances to the unknown sample being predicted. The effectiveness of CDL-PLS was demonstrated by comparing its prediction results with those of general PLS in the calibration of the NIR spectral data of tobacco lamina samples. It was found that dynamic local modeling could increase the prediction accuracy and stability of a predictor, while combining multiple dynamic local PLS models could further improve its prediction accuracy and robustness.

4. A new classification method, CAMCUN (consensus analysis of multiple classifiers using non-repetitive variables), was developed. The central idea of CAMCUN is to combine multiple heterogeneous classifiers, each derived from distinct features selected according to discriminatory power. CAMCUN was applied to the analysis of microarray gene expression data. The analysis included classifying cancer on the basis of gene expression profiles, assessing the chance correlation and prediction confidence of the classifiers, and identifying biomarkers. It was found that CAMCUN gives much better prediction accuracy, with higher prediction confidence and lower chance correlation, than any of the constituent classifiers.

5. By integrating disjoint principal component analysis (PCA) with a genetic algorithm (GA), a new feature selection method for pattern recognition was developed and applied to the identification of differentially expressed genes from microarray gene expression profiles. In this method, the discriminatory power of a combination of genes was obtained from disjoint PCA, and the GA was used to search for the best combination of genes. The significance of the differential expression of each individual gene was assessed by a statistical method. It was found that the differentially expressed genes identified using this method showed stronger discriminatory power than those obtained from the t-test and SAM (significance analysis of microarrays).
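The cPLS idea in item 2 (fit many PLS models on random resamples, keep those meeting a predefined criterion, combine them) can be illustrated with a minimal sketch. It is not the dissertation's implementation. The abstract does not state the selection criterion or the combination rule, so the sketch assumes "keep the `n_keep` candidates with the lowest held-out RMSE" and "average the kept models' predictions". All function names, parameter values and the synthetic data are hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model with the NIPALS algorithm."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                  # weight vector: covariance direction
        w /= np.linalg.norm(w)
        t = Xk @ w                     # scores
        tt = t @ t
        p = Xk.T @ t / tt              # X loadings
        qk = yk @ t / tt               # y loading
        Xk = Xk - np.outer(t, p)       # deflate X
        yk = yk - qk * t               # deflate y
        W.append(w); P.append(p); q.append(qk)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)   # regression vector B = W (P'W)^-1 q
    return coef, x_mean, y_mean

def pls1_predict(model, X):
    coef, x_mean, y_mean = model
    return (X - x_mean) @ coef + y_mean

def cpls_fit(X, y, n_components=3, n_candidates=50, train_frac=0.7,
             n_keep=15, seed=0):
    """Build candidate PLS models on random resamples and keep the best ones.

    ASSUMPTION: the selection criterion (lowest held-out RMSE, top n_keep)
    is chosen for illustration; the source only says "a predefined criterion".
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    n_train = round(train_frac * n)
    candidates = []
    for _ in range(n_candidates):
        idx = rng.permutation(n)
        tr, va = idx[:n_train], idx[n_train:]
        model = pls1_fit(X[tr], y[tr], n_components)
        err = pls1_predict(model, X[va]) - y[va]
        candidates.append((np.sqrt(np.mean(err ** 2)), model))
    candidates.sort(key=lambda c: c[0])        # lowest validation RMSE first
    return [model for _, model in candidates[:n_keep]]

def cpls_predict(models, X):
    """Consensus prediction: plain average over the kept models (assumed rule)."""
    return np.mean([pls1_predict(m, X) for m in models], axis=0)

# Synthetic stand-in for spectra; the corn NIR data is not reproduced here.
rng = np.random.default_rng(1)
X = rng.normal(size=(120, 10))
true_coef = np.zeros(10)
true_coef[:3] = [1.0, -2.0, 0.5]
y = X @ true_coef + 0.1 * rng.normal(size=120)

X_train, y_train, X_test, y_test = X[:90], y[:90], X[90:], y[90:]
models = cpls_fit(X_train, y_train)
pred = cpls_predict(models, X_test)
rmse = float(np.sqrt(np.mean((pred - y_test) ** 2)))
```

Averaging is the simplest way to combine the members. A weighted combination, or a threshold on validation error instead of a fixed `n_keep`, would fit the same outline.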
Keywords/Search Tags: Consensus