Study Of Tumor Classification Based On Gene Expression Level

Posted on: 2013-01-28  Degree: Master  Type: Thesis
Country: China  Candidate: X M Chen  Full Text: PDF
GTID: 2218330374962877  Subject: Biological Information Science and Technology
Abstract/Summary:
Early detection is the key to curing cancer, and tumor classification is one of its major challenges. Traditional pathological diagnosis is limited in efficiency, whereas gene chips greatly improve the efficiency of data acquisition and have therefore become a hot research topic. Because gene chip data are mostly high-dimensional, high-throughput and noisy, the key to detecting cancer genes is extracting useful biological information from a large volume of data.

This thesis mainly studies methods for extracting multiple features of cancer genes and the related computation. It also presents an improved data analysis method for studying the interaction mechanism of genes with inter-class differences. The main work of the thesis is as follows:

1. Data preprocessing: Because gene chips generate a large amount of unreliable data, the data are preprocessed by filtering, KNN-based interpolation of missing values, and normalization among samples.

2. For cancer classification with multiple classes, multiclass feature extraction methods built on two-class comparisons are studied. The work includes: (1) A two-class feature extraction process is designed based on the wavelet transform modulus maxima method, and an experimental study is carried out to select parameters such as the wavelet basis. (2) Based on the feature description of every pair of classes of different genes, a feature extraction method based on the MSVM-RFE algorithm is studied. An experiment on GSE2109 under the GPL570 platform in the GEO database (with 560 training samples and 1201 testing samples) shows that the accuracy of multiclass cancer classification reaches up to 94.4%.

3. The algorithm for mining association rules based on an adjacency list and the minimal generators of frequent closed itemsets is improved for studying the interaction mechanism among genes. The relevant work includes: (1) The efficiency of the mining algorithm based on frequent closed itemsets is improved by reducing insert and replace operations. (2) The loss of minimal generators of frequent closed itemsets is avoided by increasing the combinations of itemsets. (3) The concept and process of the rule generation algorithm are simplified, which makes the algorithm more readable and easier to implement. The improvement of the algorithm is then verified by experimental results.
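As an illustration of the KNN-based interpolation of missing values mentioned in the preprocessing step, the following is a minimal sketch and not the thesis's actual implementation. It assumes a genes-by-samples matrix stored as nested lists with `None` marking a missing measurement, a root-mean-square distance over the columns both genes have observed, and a plain average of the `k` nearest genes. The function name `knn_impute` and the default `k=2` are hypothetical choices made for this example.

```python
import math


def knn_impute(matrix, k=2):
    """Fill None entries of a genes x samples matrix from the k nearest gene rows.

    Assumption for this sketch: distance is the root-mean-square difference
    over the columns that both rows have observed.
    """
    filled = [row[:] for row in matrix]  # never modify the caller's data
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(matrix):
                # A neighbour must be a different gene with column j observed.
                if m == i or other[j] is None:
                    continue
                shared = [c for c in range(len(row))
                          if row[c] is not None and other[c] is not None]
                if not shared:
                    continue
                dist = math.sqrt(
                    sum((row[c] - other[c]) ** 2 for c in shared) / len(shared)
                )
                candidates.append((dist, other[j]))
            if candidates:
                candidates.sort(key=lambda pair: pair[0])
                nearest = candidates[:k]
                filled[i][j] = sum(v for _, v in nearest) / len(nearest)
    return filled


expression = [
    [1.0, 2.0, None],   # gene with a missing value in the third sample
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 5.0],
    [9.0, 9.0, 100.0],  # distant gene, ignored when k=2
]
imputed = knn_impute(expression, k=2)
```

Here the missing entry is filled from the two closest expression profiles only, so the distant fourth gene does not influence the estimate. Entries with no usable neighbour are left as `None`.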
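The notions of frequent closed itemset and minimal generator used in the association rule work can be made concrete with a small brute-force sketch. This is not the adjacency-list algorithm improved in the thesis; it simply enumerates every non-empty itemset, which is only feasible for toy data. The function names, the toy transactions and the `min_support` value are all assumptions made for illustration, and the empty itemset is deliberately left out of the enumeration.

```python
from itertools import combinations


def support(itemset, transactions):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)


def closure(itemset, transactions):
    """Largest itemset shared by all transactions that contain itemset."""
    covering = [set(t) for t in transactions if itemset <= t]
    return frozenset(set.intersection(*covering))


def closed_itemsets_with_generators(transactions, min_support):
    """Map each frequent closed itemset to (support, minimal generators).

    Brute force: enumerates all non-empty itemsets, so min_support must be >= 1.
    """
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            count = support(candidate, transactions)
            if count >= min_support:
                frequent[candidate] = count

    # Group frequent itemsets by their closure (their equivalence class).
    classes = {}
    for itemset in frequent:
        classes.setdefault(closure(itemset, transactions), []).append(itemset)

    result = {}
    for closed, members in classes.items():
        # A minimal generator has no proper subset in the same class.
        generators = [m for m in members if not any(o < m for o in members)]
        result[closed] = (frequent[closed], generators)
    return result


# Toy "transactions": sets of genes flagged together in a sample.
samples = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "b"}),
    frozenset({"a", "b", "c"}),
    frozenset({"c"}),
]
closed = closed_itemsets_with_generators(samples, min_support=2)
```

In this toy data, `{a}` and `{b}` are the minimal generators of the closed itemset `{a, b}`, so the rule `{a} -> {b}` holds in every sample that contains `a`. Pairing each minimal generator with the rest of its closed itemset is the usual way such exact rules are read off.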
Keywords/Search Tags: Gene chip, Wavelet transform, Frequent closed itemsets, Minimal generator, Adjacent graph