Classification Of Cancer Gene Expression Data Based On Compressed Sensing

Posted on: 2014-12-22 | Degree: Master | Type: Thesis
Country: China | Candidate: J J Lu
GTID: 2254330401456239 | Subject: Computer application technology
Abstract/Summary:
DNA microarray technology has been applied to the study of neoplastic disease, producing large amounts of cancer gene expression data that are high-dimensional but have small sample sizes. Analyzing these data is important for advancing the clinical diagnosis and treatment of malignant tumors. A critical step is finding feature genes that support cancer classification or recognition with minimal redundancy from massive gene expression data. Such research can uncover useful knowledge and information, give a more comprehensive understanding of cancer genes, and reflect more faithfully the relationships between genes and cancer. In this thesis, compressed sensing theory is applied to the classification of cancer gene expression data. The classification problem is cast as representing each test sample as a sparse linear combination of the training samples. The category of a test sample is determined by reconstructing it from the training data and computing the reconstruction residual for each class. This classifier does not require repeated training to build a model; it only requires that the representation of the test sample over the training samples be sufficiently sparse. As a result, it can obtain better classification results while consuming less time. The main contents are as follows:

1. Dimensionality reduction of high-dimensional cancer gene data. The dimension of the cancer gene data is reduced using the signal-to-noise ratio (SNR), principal component analysis (PCA), Relief filtering, and the Fisher criterion. The compressed sensing method is then used to classify each reduced data set in order to assess the effect of the reduction. The results show that features extracted by PCA are more conducive to obtaining a sparse solution, and the resulting classification accuracy is relatively high.

2. Cancer gene expression data reconstruction algorithm.
A complete dictionary is formed from the training samples, and a signal reconstruction algorithm is used to find the sparse solution of each test sample over this dictionary. The residual for each class is then computed, and the class with the smallest residual is taken as the category of the test sample. In the signal reconstruction step, the problem is solved by L1-norm minimization. The new method is compared with Bagging neural networks, SVM, and ELM classifiers; the experiments show that even on the Brain data set, which is relatively hard to classify, it achieves an average classification accuracy of 80%.

3. Reconstruction algorithm optimized for speed. In the signal reconstruction step, the orthogonal matching pursuit (OMP) algorithm is used to solve the problem. Its classification accuracy is similar to that of L1-norm minimization, and it saves about 50% of the computing time. This significantly speeds up the reconstruction of cancer gene expression data and makes the method suitable for devices with low computing power or for applications that require high computational speed.
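Item 1 compares several dimensionality-reduction methods. As an illustration only, the NumPy sketch below shows two of them: a two-class SNR gene score and a PCA projection fitted on the training data. The SNR formula (mu0 - mu1) / (sigma0 + sigma1), the function names `snr_scores`, `select_top_genes` and `pca_project`, and the synthetic demo data are assumptions rather than details taken from the thesis; multi-class data such as the Brain set would need an extension, for example one-vs-rest scoring.

```python
import numpy as np


def snr_scores(X, y):
    """Two-class signal-to-noise ratio of every gene.

    X: (n_samples, n_genes) expression matrix; y: labels in {0, 1}.
    Assumed score: (mu0 - mu1) / (sigma0 + sigma1).
    """
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    sd0, sd1 = X0.std(axis=0), X1.std(axis=0)
    eps = 1e-12  # avoids division by zero for genes constant in both classes
    return (mu0 - mu1) / (sd0 + sd1 + eps)


def select_top_genes(X, y, k):
    """Indices of the k genes with the largest |SNR|, best first."""
    scores = np.abs(snr_scores(X, y))
    return np.argsort(scores)[::-1][:k]


def pca_project(X_train, X_test, k):
    """Project both sets onto the top-k principal components of X_train."""
    mean = X_train.mean(axis=0)
    # Rows of Vt are principal directions, sorted by decreasing singular value.
    _, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    W = Vt[:k].T
    return (X_train - mean) @ W, (X_test - mean) @ W


# Hypothetical demo: 20 samples x 50 genes, gene 7 shifted upward in class 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50))
y = np.array([0] * 10 + [1] * 10)
X[y == 1, 7] += 5.0
print(select_top_genes(X, y, 5))
Z_train, Z_test = pca_project(X[:15], X[15:], k=3)
print(Z_train.shape, Z_test.shape)  # (15, 3) (5, 3)
```

Fitting the PCA basis on the training samples only, and reusing the same mean and directions for the test samples, keeps test data out of the reduction step.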
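The residual-based classification described in items 2 and 3 can be sketched as follows. This is a minimal NumPy sketch, assuming the training samples are stored as dictionary columns normalized to unit length and using OMP (item 3) as the sparse solver; the L1-norm minimization variant (item 2) would replace `omp` with a basis-pursuit solver, which is not shown. The names `omp` and `src_classify`, the sparsity level `n_nonzero`, the tolerance, and the demo data are hypothetical choices, not the thesis's settings.

```python
import numpy as np


def omp(A, y, n_nonzero, tol=1e-10):
    """Orthogonal matching pursuit: greedy sparse x with A @ x close to y.

    A: (m, n) dictionary with unit-norm columns; y: (m,) signal.
    """
    support = []
    coef = np.zeros(0)
    residual = np.array(y, dtype=float)
    for _ in range(n_nonzero):
        # Select the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j in support:
            break
        support.append(j)
        # Orthogonal step: re-fit all selected atoms jointly by least squares.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
        if np.linalg.norm(residual) < tol:
            break
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x


def src_classify(X_train, labels, x_test, n_nonzero=10):
    """Assign x_test to the class whose atoms reconstruct it best.

    X_train: (n_genes, n_train) dictionary, one training sample per column.
    labels: (n_train,) class labels; x_test: (n_genes,) test sample.
    """
    # Unit-norm atoms so selection is not biased by sample scale (assumption).
    A = X_train / np.linalg.norm(X_train, axis=0, keepdims=True)
    y = x_test / np.linalg.norm(x_test)
    x = omp(A, y, n_nonzero)
    residuals = {}
    for c in np.unique(labels):
        # Keep only the coefficients belonging to class c, zero the rest.
        x_c = np.where(labels == c, x, 0.0)
        residuals[c.item()] = float(np.linalg.norm(y - A @ x_c))
    return min(residuals, key=residuals.get), residuals


# Hypothetical demo: class 0 varies only in genes 0-14, class 1 only in 15-29.
rng = np.random.default_rng(0)
X_train = np.zeros((30, 16))
X_train[:15, :8] = rng.normal(size=(15, 8))
X_train[15:, 8:] = rng.normal(size=(15, 8))
labels = np.array([0] * 8 + [1] * 8)
x_test = X_train[:, 8:] @ rng.normal(size=8)  # a mix of class-1 samples
pred, res = src_classify(X_train, labels, x_test, n_nonzero=8)
print(pred, res)
```

In this demo the two classes occupy disjoint gene blocks, so OMP never selects a class-0 atom; the class-0 residual is therefore the full norm of the normalized test sample (1.0), and the sample is assigned to class 1.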
Keywords/Search Tags: Gene expression data, Compressed Sensing, Sparse Vector, Residual, L1-minimization, Orthogonal matching pursuit