
Research On Key Technologies Of Semi-Supervised Feature Selection

Posted on: 2017-04-20; Degree: Master; Type: Thesis
Country: China; Candidate: L S Xiao
GTID: 2308330485988567; Subject: Computer technology
Abstract/Summary:
Feature selection is an important preprocessing method for high-dimensional data. It chooses the best feature subset from the original data and improves the accuracy of learning algorithms. Traditional feature selection methods that rely on pairwise constraints and mutual information between features usually ignore the dependency between features. This thesis therefore first proposes a semi-supervised feature selection algorithm based on attribute dependency. The algorithm has three steps. First, attribute dependency is analyzed to reconstruct the original data according to feature correlation. Second, an objective function is defined that computes the mutual information between features and ranks the features by their scores. Third, the selected features are evaluated by clustering with the K-means method, and the performance is analyzed. Experiments on UCI datasets compare the method with five other feature selection algorithms (UFSMI, Laplacian Score, MCFS, SPECFS, and LDA). Theoretical analysis and experiments demonstrate that, by exploiting the attribute dependency between features, the proposed method can effectively improve the accuracy and efficiency of feature selection.

Gene expression data is the focus of DNA microarray data analysis. However, because the number of samples is far smaller than the number of dimensions, feature selection algorithms for high-dimensional data are important. This thesis presents a semi-supervised feature selection method based on the l2,1-norm. By combining a loss function with regularization, the method can effectively remove outliers. By exploiting sparsity for feature selection, it addresses the complexity of real-world high-dimensional data. The key to a data diagnosis model is to process the high-dimensional data, then classify the data with different classifiers and compare the performance and efficiency of the resulting classification models.
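As a rough illustration of the l2,1-norm mentioned above: for a weight matrix W with one row per feature, the l2,1-norm is the sum of the Euclidean norms of the rows, so penalizing it tends to drive whole rows (whole features) to zero. The sketch below is not the thesis's objective function, which the abstract does not give. The function names and the toy matrix `W` are hypothetical, and ranking features by row norm is a common convention assumed here for illustration.

```python
import math

def row_norms(W):
    """Euclidean norm of each row of W (a list of equal-length lists)."""
    return [math.sqrt(sum(w * w for w in row)) for row in W]

def l21_norm(W):
    """l2,1-norm: the sum of the Euclidean norms of the rows of W."""
    return sum(row_norms(W))

def rank_features_by_row_norm(W):
    """Feature indices sorted by decreasing row norm (assumed ranking rule)."""
    norms = row_norms(W)
    return sorted(range(len(W)), key=lambda i: norms[i], reverse=True)

# Hypothetical weight matrix: 3 features (rows), 2 classes (columns).
W = [
    [3.0, 4.0],  # feature 0: row norm 5
    [0.0, 0.0],  # feature 1: row norm 0, i.e. a feature zeroed out by sparsity
    [0.0, 1.0],  # feature 2: row norm 1
]

print(l21_norm(W))                   # 6.0
print(rank_features_by_row_norm(W))  # [0, 2, 1]
```

In this toy case the all-zero row contributes nothing to the norm and is ranked last, which is the sense in which row sparsity performs feature selection.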
The experiments are performed on gene expression datasets. First, we analyze how selecting different features affects classification accuracy. Second, we compare the classification accuracy of SVM and ELM before and after feature selection. Third, we analyze the training time complexity of the classifiers. Theoretical analysis and experimental results show that selecting appropriate feature selection algorithms and classifiers can effectively improve the accuracy and performance of the laboratory diagnosis model.
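The mutual-information scoring used by the first method in the abstract can be sketched for discrete data as follows. This is a minimal illustration and not the thesis's objective function: the abstract does not state how scores are formed, so ranking each feature by its mutual information with the labels of the labeled samples is an assumption made here, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information in bits between two equal-length discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi

def rank_features(columns, labels):
    """Feature indices sorted by decreasing MI with the labels (assumed rule)."""
    scores = [mutual_information(col, labels) for col in columns]
    return sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)

# Hypothetical labeled subset: 4 samples, 3 discrete features stored column-wise.
labels = [0, 0, 1, 1]
columns = [
    [0, 0, 1, 1],  # feature 0: identical to the labels
    [0, 1, 0, 1],  # feature 1: independent of the labels
    [0, 0, 1, 0],  # feature 2: partially informative
]

print(rank_features(columns, labels))  # [0, 2, 1]
```

Feature 0 carries one full bit about the labels, feature 1 carries none, and feature 2 falls in between, so the ranking orders them accordingly.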
Keywords/Search Tags: Dimension Reduction, Feature Selection, Attribute Dependency, Clustering, l2,1-norm, Classification