Variable Selection Methods Of Machine Learning And Their Application In Pattern Recognition Problems

Posted on: 2017-10-13
Degree: Master
Type: Thesis
Country: China
Candidate: G Q Zheng
Full Text: PDF
GTID: 2348330515467023
Subject: Chemical Process Equipment
Abstract/Summary:
The rapid development of technology and the growing availability of data have made machine learning algorithms increasingly important for pattern recognition in many engineering and scientific fields. Examples include text recognition in archaeology, face and fingerprint recognition in the social and security sciences, and subtype classification in histology. The data-rich but information-poor situation, however, leads to another problem: variable selection, that is, how to select highly informative variables to improve the efficiency and accuracy of machine learning algorithms. Various variable selection methods have been proposed to delete irrelevant and redundant variables and thereby improve the performance of pattern identification. In this thesis, we used the histological subtyping of non-small cell lung cancer (NSCLC) and the recognition of snoRNAs in human cells as our background, and developed different variable selection methods to improve classification accuracy.

With the rapid development of biological technology, abundant biological data have been obtained by high-throughput techniques. Studying these data and solving biological pattern recognition problems with learning algorithms has become a top priority. Lung adenocarcinoma (ADC) and squamous cell carcinoma (SCC) are the two major histological types of NSCLC, constituting 58.8% and 31.2% of NSCLC cases, respectively (http://seer.cancer.gov/). For a number of clinical and biological reasons, the accurate classification of NSCLC into ADC and SCC is essential. We developed a molecular classifier of NSCLC based on genome-wide mRNA expression, copy number variations (CNVs) and methylation levels by integrating the elastic net, partial least squares, and a Naïve Bayes classifier.

The recognition of snoRNAs is important for understanding the life activities of snoRNAs and other RNAs. First, a comprehensive set of features (e.g. local contiguous structure features, the Z-curve features and so on) was employed to achieve high performance; then, important features were selected from them by the elastic net (EN) algorithm; finally, the sparse partial least squares discriminant analysis (SPLS-DA) algorithm was used to recognize the snoRNA sequences. Compared with published results, the methods in this study have a clear advantage in speed or accuracy.
Keywords/Search Tags: variable selection, pattern identification, machine learning, NSCLC, snoRNAs