The Application Of Machine Learning Algorithms To Data Analysis From Different Areas

Posted on: 2015-09-01
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Zhang
GTID: 2308330452969778
Subject: Chemical Process Equipment
Abstract/Summary:
This thesis studies the application of machine learning algorithms to data analysis in two different areas: cancer treatment research and quality control of rubber mixing. The applications in cancer treatment research include prediction of tumor cell radiation sensitivity and cancer classification. The data in both areas are noisy and contain many nonlinear, unknown relationships.

To promote the implementation of personalized cancer treatment, we proposed a new nonlinear model that predicts radiation sensitivity from the gene expression data of the NCI-60 cancer cell lines using machine learning algorithms. In this model, Significance Analysis of Microarrays (SAM) was first used to select the genes whose expression was highly related to cellular radiation sensitivity, measured as the survival fraction after 2 Gy of γ-ray radiation (SF2). These selected genes were called radiation sensitivity signature genes. SAM gene selection clearly reduced the dimensionality of the original gene expression data. Then, the Partial Least Squares (PLS) algorithm was applied to the expression data of the selected genes to extract orthogonal Latent Variables (LVs). Finally, with the resulting LVs as input, a Support Vector Machine (SVM) regression model was developed to predict the SF2 values of the cell lines. In addition, to test the potential clinical predictive value of the radiation sensitivity signature genes, we performed survival analysis on patients with three different types of cancer. Functional enrichment analysis of the selected genes also revealed the main biological processes in which they are involved and the functions in which they are enriched.

In cancer classification based on gene expression data, the high dimensionality, small sample size, and multi-collinearity of the data make it difficult for conventional analysis methods to achieve satisfactory performance. To further improve the accuracy of cancer classification, a new SPDF (Subspace PLS based Decision Forest) model was proposed. The SPDF model combines PLS feature extraction with decision forest classification. Because the LVs are mutually orthogonal, the multi-collinearity inherent in gene expression data is effectively overcome. Finally, the derived LVs are combined and used as the input to the decision forest algorithm.

In the rubber mixing process, the hardness parameter can only be measured after a large time delay. To enable online hardness prediction and quality control in rubber mixing, methods that predict hardness from rheological parameters were proposed for the first time. The proposed methods are based on the PLS algorithm and its variants. Moreover, to handle the nonlinear, time-varying, and noisy nature of the rubber mixing process, the Q statistic was introduced to select new samples and update the model. The Q statistic selects the samples that carry as much information about model variation as possible, and it also significantly reduces the time and data storage consumed by model updating. The applications demonstrated that the models based on rheological parameters achieve excellent prediction performance and track changes under different conditions.
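
The role of the Q statistic in model updating can be illustrated with a minimal sketch. It assumes a PLS model with a single latent variable and takes the Q statistic to be the squared prediction error of a sample in the input space. The abstract does not give the thesis's actual implementation, so all function names, the toy data, and the control limit `q_limit` are hypothetical.

```python
# Sketch only: single-component PLS (PLS1) plus the Q statistic (squared
# prediction error). Names, data, and the control limit are hypothetical.

def center(rows):
    """Column-center a list of rows; return the centered rows and column means."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows], means

def fit_pls1(X, y):
    """Fit one PLS latent variable from inputs X (rows) and targets y."""
    Xc, x_mean = center(X)
    y_mean = sum(y) / len(y)
    yc = [v - y_mean for v in y]
    cols = range(len(x_mean))
    # Weight vector: the direction in X space with maximal covariance with y.
    w = [sum(row[j] * yi for row, yi in zip(Xc, yc)) for j in cols]
    norm = sum(v * v for v in w) ** 0.5
    w = [v / norm for v in w]
    # Scores t = Xc w, loading p = Xc' t / (t' t), inner coefficient b.
    t = [sum(a * c for a, c in zip(row, w)) for row in Xc]
    tt = sum(v * v for v in t)
    p = [sum(row[j] * ti for row, ti in zip(Xc, t)) / tt for j in cols]
    b = sum(ti * yi for ti, yi in zip(t, yc)) / tt
    return {"x_mean": x_mean, "y_mean": y_mean, "w": w, "p": p, "b": b}

def score(model, x):
    """Project a raw sample onto the latent variable."""
    xc = [v - m for v, m in zip(x, model["x_mean"])]
    return xc, sum(a * c for a, c in zip(xc, model["w"]))

def predict(model, x):
    """Predict the target (e.g. hardness) for one sample."""
    _, t = score(model, x)
    return model["y_mean"] + model["b"] * t

def q_statistic(model, x):
    """Squared norm of the part of x that the latent variable cannot explain."""
    xc, t = score(model, x)
    residual = [v - t * pj for v, pj in zip(xc, model["p"])]
    return sum(r * r for r in residual)

def needs_update(model, x, q_limit):
    """Flag a sample for model updating when its Q exceeds the control limit."""
    return q_statistic(model, x) > q_limit

# Toy data: the second input is always twice the first, the target equals the first.
X_train = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
y_train = [1.0, 2.0, 3.0, 4.0]
model = fit_pls1(X_train, y_train)
```

A sample that follows the training structure, such as `[5.0, 10.0]`, leaves almost no residual, while a sample that breaks it, such as `[5.0, 4.0]`, yields a large Q value. Only the second kind would be stored and used for updating, which is how this selection rule can save time and storage. In practice the control limit would be derived from the residuals of the training data rather than chosen by hand.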
Keywords/Search Tags: machine learning algorithm, radiation sensitivity, cancer classification, rubber mixing