
The Research On Manifold-based Semi-supervised Feature Selection Algorithms For Gene Selection

Posted on: 2016-09-21
Degree: Master
Type: Thesis
Country: China
Candidate: G Q Yuan
GTID: 2428330473964962
Subject: Computer Science and Technology
Abstract/Summary:
As a data pre-processing step that avoids the "curse of dimensionality", feature selection seeks the features most related to the class labels, also known as the "good" features. It is in effect a dimension reduction technique. Feature selection has been widely applied in many fields, such as computer vision, image processing, text mining, machine learning and gene expression profile analysis. Gene selection is the application of feature selection to gene expression profiles, aiming to discover disease-causing genes to aid diagnosis and treatment. A gene expression dataset contains a large number of unlabeled samples but only a small number of labeled samples, because labeling a sample is costly. This leads to the intrinsic character of gene expression data, namely "small sample size with high dimensionality". Based on these characteristics, we study gene selection methods in a semi-supervised setting. By discovering the intrinsic information in both labeled and unlabeled data, the accuracy of classification or clustering is expected to improve. This study aims to aid medical diagnosis. The main research work is as follows.

First, we thoroughly review both the manifold-based semi-supervised dimension reduction framework and various feature selection methods, and then summarize the common approach to semi-supervised feature selection, in particular the exact procedure for depicting the underlying manifold with graph embedding in a semi-supervised way.

Second, we propose a semi-supervised gene selection method based on the maximum discriminative local margin, abbreviated semi MM. Traditional feature selection algorithms based on the maximum local margin criterion ignore the global geometric structure of the data distribution and the relationships between features and class labels, while local structure is more useful for dimensionality reduction than global structure. semi MM is inspired by semi-supervised manifold learning, graph theory and information theory. In addition, a semi-supervised procedure for selecting features and running classification experiments is designed. Comparative classification experiments were carried out on five publicly available gene expression datasets to verify the effectiveness of semi MM. The experimental results show that semi MM is robust across different kinds of datasets and achieves good classification accuracy.

Third, we propose a manifold distance based semi-supervised feature selection method, called MDFS for short. MDFS aims to find features with both good locality-preserving power and global consistency, since Euclidean distance fails to maintain global consistency on globally nonlinear datasets. Clustering results on three publicly available gene expression datasets show that MDFS outperforms LSDF in most cases. This suggests that the distributions of gene expression datasets differ from one another, and MDFS is an attempt to discover the intrinsic distribution of a given gene expression dataset.
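To illustrate why a manifold distance can behave differently from Euclidean distance on globally nonlinear data, the sketch below approximates geodesic distances by shortest paths on a k-nearest-neighbour graph, in the style of Isomap. This is not the MDFS definition from the thesis, which the abstract does not give; the function name `manifold_distances`, the parameter `k`, and the toy arc data are assumptions made purely for illustration.

```python
import numpy as np

def manifold_distances(X, k=2):
    """Return (Euclidean, approximate geodesic) distance matrices.

    Hypothetical helper: geodesic distances are approximated by shortest
    paths on a symmetric k-nearest-neighbour graph (Isomap-style).
    """
    n = X.shape[0]
    # Pairwise Euclidean distances.
    diff = X[:, None, :] - X[None, :, :]
    euclid = np.sqrt((diff ** 2).sum(axis=-1))

    # Start with no edges (infinite cost), zero cost on the diagonal.
    graph = np.full((n, n), np.inf)
    np.fill_diagonal(graph, 0.0)

    # Connect each sample to its k nearest neighbours; column 0 of the
    # argsort is the sample itself (assuming no duplicate points).
    neighbours = np.argsort(euclid, axis=1)[:, 1:k + 1]
    for i in range(n):
        graph[i, neighbours[i]] = euclid[i, neighbours[i]]
    graph = np.minimum(graph, graph.T)  # make the graph undirected

    # Floyd-Warshall all-pairs shortest paths, vectorised over rows/columns.
    for m in range(n):
        graph = np.minimum(graph, graph[:, [m]] + graph[[m], :])
    return euclid, graph

# Toy data: 30 samples along three quarters of a unit circle.
theta = np.linspace(0.0, 1.5 * np.pi, 30)
X = np.column_stack([np.cos(theta), np.sin(theta)])
euclid, geo = manifold_distances(X, k=2)

# The two ends of the arc are close in a straight line but far along the curve.
print("Euclidean:", euclid[0, -1])
print("Graph geodesic:", geo[0, -1])
```

On this toy arc the straight-line distance between the two endpoints cuts across the gap, while the graph distance follows the curve, which is the kind of global inconsistency the abstract attributes to Euclidean distance.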
Keywords/Search Tags:Gene selection, Semi-Supervised manifold learning, Maximum Local Margin, Gene expression dataset, Manifold distance