
Cancer Microarray Data Classification Based On Rough Sets Methods

Posted on: 2013-01-08
Degree: Master
Type: Thesis
Country: China
Candidate: L Yan
Full Text: PDF
GTID: 2214330374956473
Subject: Systems Engineering
Abstract/Summary:
Cancer is one of the major diseases endangering human health and life in the world today. Early detection has been shown to help greatly in treating cancer and in slowing the proliferation of cancer cells. The development of microarray technology has shifted cancer diagnosis from the morphological level to the molecular level, providing a fast and accurate detection method for clinical diagnosis. However, microarray data have small sample sizes, high dimensionality, and a high degree of correlation between genes, which easily leads to classifiers with low accuracy and poor generalization ability. How to design a better classification mechanism that overcomes these problems has therefore become a focus of current research on gene microarray data classification.

Rough set theory is a tool for extracting classification information from data. It requires no prior knowledge or additional information from the user. It can reduce feature dimensionality while retaining as much of a data set's classification information as possible, and it gives a good measure of the uncertainty in that information. Classifiers built with rough set methods are also more interpretable than general classifiers. Rough classification therefore has good prospects and distinct advantages for analyzing gene expression data.

Building on a summary of previous studies, this thesis investigates the classification of gene microarray data with rough classifiers, mainly in the following areas:

1. Classification information is described. Within the framework of rough set theory, for a given data set, the thesis characterizes the classification information that an attribute carries when objects are used to describe it, and the classification information that an object carries when attributes are used to describe it.

2. The two basic reduction strategies of the rough set method, the strategy based on the discernibility matrix and the heuristic-based strategy, are briefly analyzed for their strengths and weaknesses. Their advantages are then combined in an algorithm, designed for sparse data, that extracts rules from a data set. The aim is to overcome the long training times and low generalization ability of rough set approaches on sparse, high-dimensional gene microarray data. Experiments on UCI data sets show that the proposed algorithm generalizes better than the simple division-based strategy and improves on the discernibility-matrix-based strategy in running time. With these modifications, the algorithm is applied to microarray data classification.

3. A dimensionality reduction method for microarray data and a rough classifier for gene microarray data are designed. Principal component analysis is used to extract the main expression directions of the gene microarray data, the data are divided around the center of their projections onto those directions, and the rule extraction method is then used to extract rules that form a rough classifier. The rough classifier is tested on two commonly used gene expression data sets, and the experimental results show that the algorithm is effective.

This thesis focuses on the characteristics of microarray data, on rough set methods for classifying sparse high-dimensional data, and on dimensionality reduction for gene microarray data. The results provide a model for using rough sets to classify microarray data.
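As an illustration of what a heuristic rough set reduction strategy looks like in general, the following is a minimal Python sketch of a textbook-style greedy attribute reduction driven by the dependency degree (the share of samples in the positive region). It is not the algorithm proposed in the thesis, whose details the abstract does not give. The function names, the toy data, and the assumption that gene expression values have already been discretized are all hypothetical.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into indiscernibility classes over the given attributes."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, labels, attrs):
    """Fraction of rows lying in blocks whose members all share one class label."""
    if not rows:
        return 0.0
    consistent = sum(
        len(block)
        for block in partition(rows, attrs)
        if len({labels[i] for i in block}) == 1
    )
    return consistent / len(rows)


def greedy_reduct(rows, labels):
    """Repeatedly add the attribute that raises the dependency degree the most,
    stopping once the chosen subset matches the dependency of the full set."""
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, labels, all_attrs)
    chosen = []
    while dependency(rows, labels, chosen) < target:
        remaining = [a for a in all_attrs if a not in chosen]
        best = max(remaining, key=lambda a: dependency(rows, labels, chosen + [a]))
        chosen.append(best)
    return sorted(chosen)


# Hypothetical toy data: 6 samples, 3 genes, expression already discretized to 0/1.
rows = [
    (0, 1, 0),
    (0, 0, 1),
    (1, 1, 0),
    (1, 0, 1),
    (0, 1, 1),
    (1, 0, 0),
]
labels = ["normal", "normal", "tumor", "tumor", "normal", "tumor"]

reduct = greedy_reduct(rows, labels)
print("selected gene indices:", reduct)
```

In this toy table the class label is fully determined by the first gene, so the greedy search stops after selecting that single attribute. A greedy search of this kind avoids building the full discernibility matrix, but it is not guaranteed to return a minimal reduct.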
Keywords/Search Tags: bioinformatics, microarray, gene expression data, rough set, classifier