Dimensionality Reduction On LC-MS Dataset

Posted on: 2008-02-29
Degree: Master
Type: Thesis
Country: China
Candidate: L Liu
GTID: 2178360242960207
Subject: Computer application technology
Abstract/Summary:
This thesis focuses on biomarker discovery in a cervical cancer dataset, an LC-MS dataset of extremely high dimensionality. The aim of the project is to reduce the dimensionality of the dataset in order to identify the proteins or peptides that correspond to a biomarker, i.e. a marker whose detection indicates a particular disease state (cancerous or noncancerous).

Two main branches of dimensionality reduction are adopted: feature selection and feature extraction. We integrated a number of recent feature selection algorithms, e.g. CLaNC, One_by_One, T-test and Gram-Schmidt, together with a broad set of feature extraction techniques, including PCA and autoencoder networks. A nearest-neighbor classifier is used for classification, and cross-validation is used to split the data into training and test sets. Most of the algorithms are implemented in MATLAB, with WEKA as a complementary tool. The experiments follow two methodologies: performing feature selection before cross-validation, and performing it after cross-validation.

After analyzing and comparing the results produced by the different algorithms, we summarize our findings as follows. The 391st feature in Group_I is not a perfect feature, in the sense that it does not separate the cancerous instances from the noncancerous ones with zero classification error. Nevertheless, we may reasonably conclude that it is the candidate feature corresponding to the biomarker. We therefore suggest that the 391st feature be mapped to its corresponding protein and be tested on new clinical cases to establish whether it is a valid biomarker that can assist doctors in diagnosing cervical cancer.
Keywords/Search Tags: biomarker discovery, biomarker detection, dimensionality reduction, feature selection, feature extraction, Liquid Chromatography-Mass Spectrometry, the curse of dimensionality, cervical cancer classification
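
The two experimental methodologies named in the abstract (feature selection before versus after cross-validation) can be illustrated with a minimal Python sketch. It is not the thesis's MATLAB/WEKA code. It assumes that "before" means ranking features once on the full dataset, and "after" means re-ranking inside each training fold. It uses a Welch t-statistic as a stand-in for the T-test selector and a 1-nearest-neighbor classifier. All function names and the toy data are hypothetical.

```python
import math
import random

def t_scores(X, y):
    """Absolute Welch t-statistic of every feature for binary labels 0/1."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        ma, mb = sum(a) / len(a), sum(b) / len(b)
        va = sum((v - ma) ** 2 for v in a) / (len(a) - 1)
        vb = sum((v - mb) ** 2 for v in b) / (len(b) - 1)
        # Guard against a zero denominator for constant features.
        denom = math.sqrt(va / len(a) + vb / len(b)) or 1e-12
        scores.append(abs(ma - mb) / denom)
    return scores

def top_k(X, y, k):
    """Indices of the k features with the largest |t|."""
    s = t_scores(X, y)
    return sorted(range(len(s)), key=lambda j: -s[j])[:k]

def predict_1nn(X_train, y_train, x, feats):
    """Label of the nearest training row, using only the selected features."""
    best = min(range(len(X_train)),
               key=lambda i: sum((X_train[i][j] - x[j]) ** 2 for j in feats))
    return y_train[best]

def cv_error(X, y, k, n_folds, select_inside):
    """Cross-validated 1-NN error rate.

    select_inside=False: features are chosen once on ALL data, so the
    test folds influence the selection (optimistically biased estimate).
    select_inside=True: features are re-chosen on each training fold only.
    """
    n = len(X)
    global_feats = None if select_inside else top_k(X, y, k)
    errors = 0
    for f in range(n_folds):
        test_idx = list(range(f, n, n_folds))  # interleaved folds
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        feats = top_k(X_tr, y_tr, k) if select_inside else global_feats
        for i in test_idx:
            errors += predict_1nn(X_tr, y_tr, X[i], feats) != y[i]
    return errors / n

def make_toy_data(n, d, informative, seed):
    """Hypothetical data: Gaussian noise plus one strongly informative feature."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    for row, lab in zip(X, y):
        row[informative] = 10.0 * lab + rng.gauss(0.0, 0.1)
    return X, y

if __name__ == "__main__":
    X, y = make_toy_data(n=40, d=200, informative=3, seed=0)
    print("selection on all data   :", cv_error(X, y, 5, 5, select_inside=False))
    print("selection inside folds  :", cv_error(X, y, 5, 5, select_inside=True))
```

On data with many more features than samples, selecting features on the full dataset lets the held-out samples influence the ranking, so the reported error tends to be too low. Selecting inside each fold avoids this, at the cost of possibly choosing different features per fold.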