Data Mining And Classification Of Tumor Based On Gene Expression Profile

Posted on: 2016-09-18
Degree: Master
Type: Thesis
Country: China
Candidate: Z P Guo
GTID: 2308330476454908
Subject: Biomedical engineering
Abstract/Summary:
Tumors are of many kinds, their pathogenic mechanisms are complex, and they are among the major malignant diseases that threaten human health. Early diagnosis is essential for treating patients. Traditional diagnostic methods have their limitations, whereas gene chip technology is a breakthrough in functional genomics, and gene expression profile data have greatly promoted the development of that field. From the massive gene expression data of tumors, we can not only find relevant feature information but also learn the nature of tumor genes and gain a deeper understanding of the relationship between tumors and genes, which benefits the diagnosis and treatment of tumors.

Because gene expression profile data are high-dimensional, small-sample, highly redundant, and noisy, this thesis uses time-frequency analysis tools to extract features from tumor gene expression profiles and carries out the following research.

First, each sample of the gene expression profile is decomposed by a three-level wavelet packet, and the energy associated with the frequency bands and time is computed as a feature vector.

Second, an EMD-based method for cancer classification of gene expression profiles is proposed. Using empirical mode decomposition (EMD), each sample of the gene expression profile is decomposed into a number of intrinsic mode functions (IMFs) that can represent the underlying information of the original time series. The correlation coefficient between each IMF and the original data is then computed, the IMFs with the larger correlation coefficients are selected, and the data are reconstructed from them. Finally, a t-test is used to select the feature vector with strong discriminative power.

Third, an FRFT-based method for cancer classification of gene expression profiles is proposed. Using the fractional Fourier transform (FRFT), each sample of the gene expression profile is transformed into the FRFT domain. The fractional order is tuned to its optimal value, and the global features of the gene expression profile data are analyzed at that order. The entropy of the FRFT coefficients is then calculated, and the 300 top-ranked FRFT coefficients by entropy weight are chosen as the feature vector.

Finally, the leukemia dataset from the MIT database and the colon dataset from the Princeton University database were each classified with the proposed methods, reaching an accuracy above 90%, which indicates the value of the proposed methods for clinical tumor classification.
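The wavelet packet energy feature described in the abstract can be illustrated with a minimal sketch. The abstract does not name the wavelet basis or the exact energy definition, so the Haar filter, the relative (normalized) band energy, and the function names `haar_step` and `wavelet_packet_energy` are all assumptions made for illustration, not the thesis's implementation.

```python
import math


def haar_step(signal):
    """One Haar analysis step: return (approximation, detail) half-length bands."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    return approx, detail


def wavelet_packet_energy(signal, levels=3):
    """Relative energy of the 2**levels terminal bands of a full packet tree.

    Assumption: Haar basis. Bands are returned in natural (tree) order,
    which is not the same as increasing-frequency order.
    """
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    bands = [list(signal)]
    for _ in range(levels):
        next_bands = []
        for band in bands:
            # Unlike a plain wavelet transform, a wavelet packet splits
            # both the approximation and the detail band at every level.
            approx, detail = haar_step(band)
            next_bands.extend([approx, detail])
        bands = next_bands
    energies = [sum(x * x for x in band) for band in bands]
    total = sum(energies)
    return [e / total for e in energies] if total > 0 else energies


# Hypothetical expression values for one sample (16 genes, made up for the demo).
sample = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0,
          5.0, 3.0, 5.0, 8.0, 9.0, 7.0, 9.0, 3.0]
features = wavelet_packet_energy(sample, levels=3)  # 8-element feature vector
```

With three levels the sketch yields eight band energies per sample; because the Haar step is orthonormal, the relative energies sum to one. A real profile with thousands of genes would need padding or truncation to a multiple of eight before this particular sketch applies.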
Keywords/Search Tags: Gene Expression, Data Mining, Wavelet Packet Transform, EMD, FRFT