
The Analysis Of Tumor Gene Expression Data Based On Neighborhood Rough Set

Posted on: 2017-04-16
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Wu
GTID: 2334330485965083
Subject: Statistics
Abstract/Summary:
Accurate identification of tumor subtypes is important for preventing benign tumors and treating cancer, so tumor subtype classification has received considerable attention and is currently an active topic in biology. Tumor gene expression data are an important resource for this research, and many methods for selecting informative tumor genes have been proposed to handle the particular characteristics of such data. Neighborhood rough set theory, which has matured through applications in many fields, provides a useful tool for analyzing this kind of data. In informative-gene selection algorithms based on mutual information and extended neighborhood mutual information, computing the information that relates each gene attribute to the decision attribute is complex. These algorithms rank genes by their neighborhood mutual information and then need a discrimination function to compute a relative reduct of the top K genes. This step is computationally heavy, and an appropriate value of K is hard to choose. To address these problems, this thesis proposes a neighborhood rough set attribute classification efficiency algorithm (NRSACE). The number of informative genes needed for tumor classification is known to be small, so any gene whose classification efficiency falls below a given minimum classification efficiency threshold can be deleted directly. Building on these ideas, NRSACE directly computes the classification efficiency of each gene attribute, then ranks and selects the genes to obtain a relative reduct, which serves as the informative gene set.
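The abstract does not define how NRSACE measures classification efficiency, so the following is only a minimal sketch under stated assumptions. It assumes efficiency is the neighborhood dependency degree from neighborhood rough set theory: on min-max-normalized expression values, a sample lies in the positive region of a gene subset if every sample within Chebyshev distance `delta` of it has the same class, and efficiency is the fraction of samples in that region. The function names, the parameters `delta` and `min_eff`, and the greedy rule used for the reduct step are hypothetical choices for illustration, not the thesis's implementation.

```python
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def min_max_normalize(X: Sequence[Sequence[float]]) -> Matrix:
    """Scale each gene (column) to [0, 1]; a constant gene becomes all zeros."""
    scaled_cols = []
    for col in zip(*X):
        lo, hi = min(col), max(col)
        span = hi - lo
        scaled_cols.append([(v - lo) / span if span > 0 else 0.0 for v in col])
    return [list(row) for row in zip(*scaled_cols)]


def dependency(X: Matrix, y: Sequence[int], genes: List[int], delta: float) -> float:
    """Neighborhood dependency |POS_B(D)| / |U| of the gene subset B = genes.

    Sample i is in the positive region when every sample j whose Chebyshev
    distance to i over `genes` is <= delta carries the same label as i.
    """
    if not genes:
        return 0.0  # convention used in this sketch for the empty subset
    n = len(X)
    pos = 0
    for i in range(n):
        pure = all(
            y[j] == y[i]
            for j in range(n)
            if max(abs(X[i][g] - X[j][g]) for g in genes) <= delta
        )
        if pure:
            pos += 1
    return pos / n


def nrsace_sketch(
    X: Sequence[Sequence[float]], y: Sequence[int],
    delta: float = 0.1, min_eff: float = 0.2,
) -> Tuple[List[int], Dict[int, float]]:
    """Hypothetical NRSACE-style selection: score, filter, sort, greedy reduct."""
    Xn = min_max_normalize(X)
    n_genes = len(Xn[0])
    # 1) classification efficiency of each gene alone (assumed = dependency)
    eff = {g: dependency(Xn, y, [g], delta) for g in range(n_genes)}
    # 2) delete genes below the minimum classification efficiency threshold
    candidates = [g for g in range(n_genes) if eff[g] >= min_eff]
    # 3) sort the remaining genes by efficiency, highest first (ties by index)
    candidates.sort(key=lambda g: (-eff[g], g))
    # 4) assumed reduct rule: keep a gene only if it raises the subset's
    #    dependency, and stop once every sample is in the positive region
    selected: List[int] = []
    best = 0.0
    for g in candidates:
        score = dependency(Xn, y, selected + [g], delta)
        if score > best:
            selected, best = selected + [g], score
        if best == 1.0:
            break
    return selected, eff


# Toy example: genes 0 and 1 are each only partly informative, gene 2 is constant.
X = [[0, 0, 5], [0, 1, 5], [0.5, 0, 5], [0.5, 1, 5], [1, 1, 5], [1, 1, 5]]
y = [0, 0, 0, 1, 1, 1]
selected, eff = nrsace_sketch(X, y, delta=0.1, min_eff=0.2)
print(selected)  # -> [0, 1]
```

With the Chebyshev distance, adding a gene can only shrink neighborhoods, so the dependency of the selected set never decreases; that is why a single forward pass is enough for this sketch. A greedy pass like this can miss genes that are only informative jointly (each below `min_eff` on its own), and each `dependency` call costs on the order of n² times |B| comparisons for n samples.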
Finally, a support vector machine (SVM) classifier and a K-nearest-neighbor (KNN) classifier are used to verify the classification ability of the selected informative gene set, which demonstrates the validity of the NRSACE algorithm. To validate NRSACE, four widely used data sets were selected: DLBCL, Leukemia1, Leukemia2, and SRBCT. The results show that, after tuning the model parameters, the average classification accuracy on each of the four tumor data sets exceeds 98%, and the variance of the classification accuracy is very small. On the SRBCT data set, the average classification accuracy improves markedly, by 14%. The robustness of the model was also tested: after 5% of the original samples were deleted at random, the remaining samples were analyzed with NRSACE. Both the number of selected informative genes and the specific genes selected differed little from the full-data result, and the average classification accuracy remained relatively stable. Overall, these analyses indicate that the study is of some practical significance.
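The verification and robustness steps can be sketched in the same spirit. The thesis uses both SVM and KNN; only a k-NN check is shown here, because an SVM would need an external library. The leave-one-out protocol, Euclidean distance, `k=3`, and the Jaccard index used to compare the gene sets selected before and after deleting 5% of the samples are all assumptions, not details taken from the thesis. `select` stands for any gene-selection function (for example an NRSACE-style procedure), and `mean_gap_selector` is a hypothetical stand-in used only for the demonstration.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Set, Tuple

Selector = Callable[[List[List[float]], List[int]], List[int]]


def loo_knn_accuracy(X: Sequence[Sequence[float]], y: Sequence[int],
                     genes: Sequence[int], k: int = 3) -> float:
    """Leave-one-out accuracy of a k-NN classifier that only sees `genes`."""
    n = len(X)
    correct = 0
    for i in range(n):
        # Euclidean distance from the held-out sample i to every other sample
        dists = sorted(
            (sum((X[i][g] - X[j][g]) ** 2 for g in genes) ** 0.5, j)
            for j in range(n) if j != i
        )
        # majority vote among the k nearest neighbors; on a tie,
        # Counter.most_common keeps the label encountered first (the nearer one)
        votes = Counter(y[j] for _, j in dists[:k])
        if votes.most_common(1)[0][0] == y[i]:
            correct += 1
    return correct / n


def robustness_check(X: Sequence[Sequence[float]], y: Sequence[int],
                     select: Selector, drop_frac: float = 0.05,
                     seed: int = 0) -> Tuple[Set[int], Set[int], float]:
    """Randomly delete drop_frac of the samples, rerun `select`, compare gene sets."""
    rng = random.Random(seed)
    n = len(X)
    n_drop = max(1, round(drop_frac * n))
    keep = sorted(rng.sample(range(n), n - n_drop))
    full = set(select([list(r) for r in X], list(y)))
    sub = set(select([list(X[i]) for i in keep], [y[i] for i in keep]))
    union = full | sub
    # Jaccard index: 1.0 means the same genes were selected both times
    jaccard = len(full & sub) / len(union) if union else 1.0
    return full, sub, jaccard


def mean_gap_selector(X: List[List[float]], y: List[int], thr: float = 0.5) -> List[int]:
    """Hypothetical selector for labels 0/1: keep genes whose class means differ by > thr."""
    genes = []
    for g in range(len(X[0])):
        a = [row[g] for row, lab in zip(X, y) if lab == 0]
        b = [row[g] for row, lab in zip(X, y) if lab == 1]
        if abs(sum(a) / len(a) - sum(b) / len(b)) > thr:
            genes.append(g)
    return genes


# Toy data: gene 0 separates the two classes, gene 1 is constant.
X = [[0.01 * i, 0.5] for i in range(10)] + [[1.0 + 0.01 * i, 0.5] for i in range(10)]
y = [0] * 10 + [1] * 10
print(loo_knn_accuracy(X, y, [0]))  # -> 1.0
print(robustness_check(X, y, mean_gap_selector, seed=1))  # -> ({0}, {0}, 1.0)
```

Comparing selected sets with a Jaccard index gives one number for "little difference in the specific genes selected"; repeating `robustness_check` over several seeds and averaging the k-NN accuracy on each reduced data set would mirror the stability check described above.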
Keywords/Search Tags: neighborhood rough set, attribute classification efficiency, gene expression profile, feature gene selection, classification accuracy