Research On Gene Selection And Classification Of DNA Microarray Data

Posted on: 2013-04-14 | Degree: Master | Type: Thesis
Country: China | Candidate: D F Huang | Full Text: PDF
GTID: 2230330362972028 | Subject: Computer application technology
Abstract/Summary:
DNA microarrays, also called gene chips, are one of the most significant milestone technologies developed since the 1990s. They make it possible to analyze gene expression patterns across different samples (time, environment, etc.) and to extract important information from them. To date, the technology has been widely applied in gene discovery, disease diagnosis, drug discovery, toxicology research, and other areas, and it is believed to have good prospects. Scientists are currently trying to combine knowledge from many disciplines to extract biological meaning from gene expression data.

Because of the limits of real laboratory conditions and budgets, DNA microarray data are characterized by high dimensionality, small sample sizes, heavy noise, and a large number of redundant genes. How to find useful information in such data effectively and accurately is one of the most important problems to be solved in machine learning and data mining.

Successful feature selection and classification are key to DNA microarray data analysis. This thesis studies feature selection and sample classification for gene expression data. Its main contributions are as follows:

(1) A new feature selection method is proposed, based on the discernibility matrix of rough set theory and a class separability criterion. The main idea is that the best features distinguish more samples of different classes and group together more samples of the same class.

(2) A gene selection method based on the fuzzy similarity coefficient is proposed, which allows the class separability criterion to be applied to DNA microarray data analysis. By analyzing the fuzzy similarity matrix and a new information system derived from the original information system, it avoids the data discretization step and the information loss that comes with it.

(3) One-versus-rest and one-versus-one support vector machines, along with other current classification methods, are studied, and their advantages and disadvantages for multiclass classification problems are analyzed.

(4) Combining the shortest distance method and the minimum enclosing hypersphere method, a classification method based on a binary tree of support vector machines is proposed. Parameters are used to adjust the sample distribution area and the minimum distance between different samples so as to regulate the binary tree structure.

The thesis applies these results in gene selection and classification experiments on human brain tumor and leukemia datasets. The experiments show that the proposed methods offer a clear advantage for gene selection and classification of DNA microarray data.
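
The idea behind the first feature selection method (a good feature tells apart more samples of different classes and keeps together more samples of the same class) can be illustrated with a small pair-counting sketch. This is not the thesis's algorithm: the function names `separability_score` and `rank_features`, the simple sum used as the score, and the assumption that expression values have already been discretized are all hypothetical choices made for illustration only.

```python
from itertools import combinations


def separability_score(column, labels):
    """Score one (assumed discretized) feature column by pair counting.

    between: pairs of different-class samples with different feature values
    within:  pairs of same-class samples with the same feature value
    The score is their sum (an illustrative choice, not the thesis's criterion).
    """
    between = 0
    within = 0
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] != labels[j] and column[i] != column[j]:
            between += 1
        elif labels[i] == labels[j] and column[i] == column[j]:
            within += 1
    return between + within


def rank_features(rows, labels):
    """Return (feature indices sorted best first, per-feature scores)."""
    n_features = len(rows[0])
    scores = [
        separability_score([row[f] for row in rows], labels)
        for f in range(n_features)
    ]
    order = sorted(range(n_features), key=lambda f: scores[f], reverse=True)
    return order, scores


# Toy data: 4 samples, 3 discretized "genes", 2 classes.
rows = [
    [0, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
    [1, 0, 1],
]
labels = ["A", "A", "B", "B"]

order, scores = rank_features(rows, labels)
print(order, scores)
```

In this toy data the first gene separates the two classes perfectly, so it receives the highest score, while the other two genes vary independently of the class label and score lower.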
Keywords/Search Tags: DNA microarray, feature selection, rough sets, discernibility matrix, SVM, multiclass classification