
Research on the Application Strategy of Feature Selection for Gene Expression Profiling

Posted on: 2017-07-11
Degree: Master
Type: Thesis
Country: China
Candidate: G M Li
GTID: 2310330488988611
Subject: Epidemiology and Health Statistics
Abstract/Summary:
Background: With the continuing development of molecular biology and gene microarray technology, large quantities of gene expression data can now be acquired easily through quantitative measurement. Gene expression data are characterized by high dimensionality, small sample sizes, and substantial noise. Based on these characteristics, researchers have carried out effective data mining using statistical analysis and pattern recognition. Currently, the most successful approach is to reduce the dimensionality of the expression data and pick out representative informative genes. While maintaining a high level of classification accuracy, this approach also aims to improve the performance and efficiency of the learning algorithms, so that tumor types can be identified efficiently. By eliminating irrelevant and redundant features, feature selection reduces the number of features, improves model accuracy, and shortens run time. Although a large number of feature selection algorithms exist, it remains difficult to choose the best algorithm for the structural characteristics of a given gene expression data set. This thesis therefore combines knowledge from biology and pattern recognition, compares several common feature selection algorithms, and proposes reliable selection criteria.

Method: Data sets were simulated with different numbers of features, sample sizes, classification status, and noise levels. Eight feature selection algorithms were chosen and tested with three kinds of classifiers. The advantages and disadvantages of each method were evaluated using classification accuracy and computational complexity as indexes. The methods were then applied to real data sets, and the results of each method were analyzed and compared in order to choose the best feature selection method.

Result: All three types of feature selection approaches, applied separately to different kinds of expression profile data sets, effectively reduced the feature dimensionality. Comparison and analysis showed the following. The SVM-RFE algorithm produces satisfactory results with fewer features and smaller sample sizes. The Wrapper SVM algorithm classifies better with smaller sample sizes and more PCR feature genes. The Wrapper k-NN algorithm is more practical for data sets with fewer features and more FCR feature genes. The ReliefF algorithm can rapidly acquire feature subsets from high-dimensional data sets, and performs better with more features and larger sample sizes. The mRMR algorithm is also applicable to cases with large numbers of features, and performs very well at higher signal-to-noise ratios.

Conclusion: When dividing tumors into categories and mining further knowledge of biological significance, screening for informative genes is a crucial step. Eliminating genes that are irrelevant to classification reduces the size or dimensionality of the data and thereby improves classifier performance. From the large number of existing feature selection algorithms, this thesis selected several common ones, compared them on expression profile data with different characteristics, and established an application strategy intended to provide methodological guidance for the analysis of gene expression profile data.
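The simulate-select-classify workflow described under Method can be illustrated with a minimal sketch. It is not the thesis's code and does not implement any of the eight algorithms compared there: a simple signal-to-noise filter score stands in for the feature selection step, and a 1-nearest-neighbour classifier with leave-one-out evaluation stands in for the three classifiers. All function names, the two-class Gaussian simulation, and the sizes used (40 samples, 200 features, 10 informative genes, mean shift 3.0) are assumptions chosen for illustration only.

```python
import math
import random


def simulate(n_samples, n_features, n_informative, shift, seed=0):
    """Simulate a two-class expression matrix (hypothetical design).

    The first n_informative features have their mean shifted by `shift`
    in class 1; every other feature is pure Gaussian noise.
    """
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2  # alternate labels so the classes are balanced
        row = [rng.gauss(shift * label if j < n_informative else 0.0, 1.0)
               for j in range(n_features)]
        X.append(row)
        y.append(label)
    return X, y


def mean_sd(values):
    """Sample mean and sample standard deviation of a list."""
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / (len(values) - 1)
    return m, math.sqrt(var)


def snr_scores(X, y):
    """Filter score per feature: |mean0 - mean1| / (sd0 + sd1)."""
    scores = []
    for j in range(len(X[0])):
        m0, s0 = mean_sd([row[j] for row, lab in zip(X, y) if lab == 0])
        m1, s1 = mean_sd([row[j] for row, lab in zip(X, y) if lab == 1])
        scores.append(abs(m0 - m1) / (s0 + s1 + 1e-12))
    return scores


def top_k(scores, k):
    """Indices of the k highest-scoring features."""
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order[:k]


def loo_1nn_accuracy(X, y, k=None):
    """Leave-one-out accuracy of 1-NN; k=None uses all features.

    Features are re-selected on the training samples of each fold, so the
    held-out sample never influences which features are chosen.
    """
    n = len(X)
    correct = 0
    for i in range(n):
        train = [m for m in range(n) if m != i]
        if k is None:
            feats = list(range(len(X[0])))
        else:
            feats = top_k(snr_scores([X[m] for m in train],
                                     [y[m] for m in train]), k)
        nearest = min(
            train,
            key=lambda m: sum((X[i][j] - X[m][j]) ** 2 for j in feats),
        )
        correct += y[nearest] == y[i]
    return correct / n


if __name__ == "__main__":
    X, y = simulate(n_samples=40, n_features=200, n_informative=10, shift=3.0)
    print("all features :", loo_1nn_accuracy(X, y))
    print("top 10 by SNR:", loo_1nn_accuracy(X, y, k=10))
```

Varying `n_samples`, `n_features`, `n_informative`, and `shift` mimics the kind of comparison across data characteristics that the abstract describes. Selecting features inside each fold, rather than once on the full data, keeps the accuracy estimate from being optimistically biased.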
Keywords/Search Tags:Gene expression profiling, Feature gene selection, Tumor classification, Filter, Wrapper, Embedded