Research And Application Of Max-Correlation And Min-Redundancy Unsupervised Feature Selection

Posted on:2011-11-13

Degree:Master

Type:Thesis

Country:China

Candidate:R Y Liu

GTID:2178330332464799

Subject:Computer software and theory

Abstract/Summary:

Research on and application of unsupervised feature selection has attracted growing attention and plays an important role in processing unlabeled data, as many unlabeled datasets have appeared. This thesis first surveys unsupervised feature selection and then investigates unsupervised feature selection based on the filter model in more depth. The challenge for the filter model is how to define redundant and irrelevant features [1]. Given these two challenges and the state of the art, current filter-based unsupervised feature selection has two disadvantages: (1) Redundant features are defined through feature extraction or feature clustering. Feature extraction, however, yields only a transformation of the features and cannot return a subset of the original features; and when features are clustered with k-means, the uncertain value of k makes it difficult to remove redundant features. (2) The purpose of filter-based unsupervised feature selection is to remove redundant and irrelevant features, but existing methods consider only one aspect, removing either redundant or irrelevant features. To address these two disadvantages, the thesis proposes two methods of removing redundant features, one based on statistics and one on ensemble clustering, and proposes two unsupervised feature selection algorithms based on the Laplacian Score: LS-CORR (Laplacian Score and Correlation) and LS-EC (Laplacian Score and Ensemble Clustering). Experiments on standard UCI datasets and a manually constructed dataset demonstrate that LS-CORR and LS-EC handle redundant and irrelevant features well, obtain smaller feature subsets, and can also improve clustering accuracy.
Finally, the thesis applies the Laplacian Score and LS-CORR algorithms to the analysis of tobacco aroma features, selecting key aroma features according to the intrinsic properties and distribution of the data. Comparative experiments with other methods show the effectiveness and practicality of the application.
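
The abstract does not spell out how LS-CORR combines its two ingredients, so the following is only a minimal sketch under stated assumptions, not the thesis's algorithm. It assumes features are ranked by the standard Laplacian Score (lower is better) and that a feature is then dropped as redundant when its absolute Pearson correlation with an already kept feature reaches a threshold. The function names and the parameters `k`, `t`, and `max_corr` are hypothetical.

```python
import numpy as np


def laplacian_score(X, k=5, t=1.0):
    """Laplacian Score of each column of X (lower = preserves locality better).

    Assumes X has shape (n_samples, n_features) with k < n_samples.
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances between samples.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    np.fill_diagonal(d2, np.inf)  # a sample is never its own neighbour

    # Symmetric k-nearest-neighbour graph with heat-kernel weights.
    idx = np.argsort(d2, axis=1)[:, :k]
    adj = np.zeros((n, n), dtype=bool)
    adj[np.repeat(np.arange(n), k), idx.ravel()] = True
    adj |= adj.T
    S = np.where(adj, np.exp(-d2 / t), 0.0)

    deg = S.sum(axis=1)       # diagonal of the degree matrix D
    L = np.diag(deg) - S      # graph Laplacian L = D - S

    # Remove the degree-weighted mean from every feature.
    Xc = X - (deg @ X) / deg.sum()
    num = np.sum(Xc * (L @ Xc), axis=0)          # f^T L f per feature
    den = np.sum(Xc * deg[:, None] * Xc, axis=0)  # f^T D f per feature
    # Features with (near-)zero weighted variance get the worst score.
    return np.where(den > 1e-12, num / np.maximum(den, 1e-12), np.inf)


def ls_corr_select(X, k=5, t=1.0, max_corr=0.9):
    """Hypothetical LS-CORR-style filter: rank by Laplacian Score, then
    skip any feature too correlated with one that was already kept."""
    scores = laplacian_score(X, k=k, t=t)
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in np.argsort(scores):  # best (lowest) score first
        if all(corr[j, i] < max_corr for i in kept):
            kept.append(int(j))
    return kept, scores
```

Calling `ls_corr_select(X)` on a samples-by-features array returns the indices of the retained features together with all Laplacian Scores. The greedy pass keeps the better-scored member of any highly correlated pair, which is one simple way to address irrelevance and redundancy in a single filter; the threshold `max_corr` and the kernel width `t` would need tuning per dataset.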

Keywords/Search Tags:

Unsupervised feature selection, Irrelevance and Redundancy, Pearson Correlation Coefficient, Ensemble Clustering

Related items

1	Research On Intrusion Detection Method Based On Dual Feature Selection And Stacking
2	Research On Feature Selection Algorithm Based On Similarity
3	Research On Feature Selection Method Based On Clustering Ensemble
4	Research On Feature Selection Method Based On Feature Ensemble Clustering
5	A Course Recommender System Of MOOC Based On Collaborative Filtering Algorithm With Improved Pearson Correlation Coefficient
6	The Research And Application Of Clustering Feature Selection Methods
7	Research On Feature Selection Method Based On Differential Evolution Algorithm
8	Differentially Private Decision Tree Based On Pearson’s Correlation Coefficient
9	Research On Feature Selection Algorithm Of High-dimensional Data Based On Intelligent Optimization
10	Research On Feature Point Detection And Descriptor Extraction Based On Deep Learning