
A Comparison Study Of Classification Methods For High Dimensional Proteomics Data

Posted on: 2016-04-29
Degree: Master
Type: Thesis
Country: China
Candidate: X Z Yang
Full Text: PDF
GTID: 2308330461972462
Subject: Probability theory and mathematical statistics
Abstract/Summary:
Cancer has become a serious threat to human health. Most cancers are preventable, but cancer is difficult to detect and diagnose in its early stages. Effective early detection has therefore become an important way to improve cure rates. Proteomics offers a new approach: with the rapid development of high-throughput technology, mass spectrometry data can be analyzed to detect whether a sample is cancerous.

In this article, we analyzed mass spectrometry data from melanoma using classification methods commonly used in data mining, together with the ROAD method proposed by J. Q. Fan et al. (2012). Our goal is to compare ROAD with standard classification methods on high-dimensional data.

We considered six classification methods: k-nearest neighbors (KNN), support vector machines (SVM), random forests, naive Bayes, Fisher discriminant classification, and the ROAD approach. The data set has sample size n = 205 and p = 18856 variables. The data are noisy and contain many highly redundant features, which makes the analysis very difficult, so we preprocessed the data before the formal analysis. The methods were compared by their misclassification rates. The results showed that, for small-n, large-p data, the ROAD classification method may be a better choice.
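To make the ROAD comparison more concrete, here is a minimal sketch, not the thesis's implementation. As we understand Fan et al. (2012), ROAD seeks a sparse direction w that minimizes w'Σw subject to w'μ_d = 1 and ||w||_1 ≤ c, where Σ is the within-class covariance and μ_d = μ_1 − μ_0. A new sample x is assigned to class 1 when (x − μ_a)'w > 0, with μ_a = (μ_0 + μ_1)/2. The sketch replaces the hard constraints with a penalized surrogate, 0.5·w'Σw + λ||w||_1 + 0.5·γ(w'μ_d − 1)², and solves it by coordinate descent.

Several pieces are illustrative assumptions rather than part of the original work: the function names (`road_fit`, `road_predict`, `misclassification_rate`), the values λ = 0.1 and γ = 5, 0/1 labels, and the synthetic Gaussian data. The melanoma data and its preprocessing are not reproduced here. The sketch also forms the full p × p covariance, which is fine for small p but would need a matrix-free product for p = 18856.

```python
import numpy as np


def soft_threshold(z, lam):
    """Soft-thresholding operator S(z, lam) = sign(z) * max(|z| - lam, 0)."""
    return float(np.sign(z) * max(abs(z) - lam, 0.0))


def road_fit(X, y, lam=0.1, gamma=5.0, n_sweeps=300, tol=1e-8):
    """Hypothetical ROAD-style fit by coordinate descent (labels y in {0, 1}).

    Minimizes 0.5*w'Sw + lam*||w||_1 + 0.5*gamma*(w'mu_d - 1)^2, a penalized
    surrogate for the constraint w'mu_d = 1, where S is the pooled
    within-class covariance and mu_d = mu1 - mu0.
    """
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    mu_d = mu1 - mu0
    mu_a = (mu0 + mu1) / 2.0
    n, p = X.shape
    # Pooled within-class covariance (dense p x p: acceptable only for small p).
    S = ((X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)) / (n - 2)
    # Expanding the penalty gives the quadratic form A = S + gamma*mu_d*mu_d'
    # plus the linear term -gamma*mu_d'w (constants dropped).
    A = S + gamma * np.outer(mu_d, mu_d)
    w = np.zeros(p)
    Aw = np.zeros(p)  # running value of A @ w, kept in sync with w
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            if A[j, j] <= 0.0:
                continue  # constant, uninformative feature: weight stays zero
            # Partial residual for coordinate j with all other weights fixed.
            partial = gamma * mu_d[j] - (Aw[j] - A[j, j] * w[j])
            new = soft_threshold(partial, lam) / A[j, j]
            delta = new - w[j]
            if delta != 0.0:
                Aw += A[:, j] * delta
                w[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return w, mu_a


def road_predict(X, w, mu_a):
    """Assign class 1 when (x - mu_a)'w > 0, otherwise class 0."""
    return ((X - mu_a) @ w > 0).astype(int)


def misclassification_rate(y_true, y_pred):
    """Fraction of samples whose predicted label differs from the true label."""
    return float(np.mean(np.asarray(y_true) != np.asarray(y_pred)))


# Synthetic small-n, large-p example (illustrative only): 5 of 200 features
# are shifted by 2.0 in class 1; all other features are pure noise.
rng = np.random.default_rng(0)


def simulate(n_per_class, p=200, k=5, shift=2.0):
    X0 = rng.normal(size=(n_per_class, p))
    X1 = rng.normal(size=(n_per_class, p))
    X1[:, :k] += shift
    return np.vstack([X0, X1]), np.repeat([0, 1], n_per_class)


X_train, y_train = simulate(30)   # n = 60 training samples, p = 200
X_test, y_test = simulate(200)    # 400 held-out samples
w, mu_a = road_fit(X_train, y_train)
err = misclassification_rate(y_test, road_predict(X_test, w, mu_a))
print(f"test misclassification rate: {err:.3f}, "
      f"nonzero weights: {np.count_nonzero(w)} of {w.size}")
```

The L1 term drives most noise coordinates to exactly zero, so the fitted direction can be read as a feature selection. The other five methods could be scored with the same `misclassification_rate` helper on the same held-out split.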
Keywords/Search Tags: proteomic data, data mining, ROAD method, misclassification rate