
Research On Feature Selection Method Based On Three-way Decisions Theory And Feature Clustering

Posted on: 2018-04-06    Degree: Master    Type: Thesis
Country: China    Candidate: M Yang    Full Text: PDF
GTID: 2348330569986440    Subject: Computer Science and Technology
Abstract/Summary:
Over the past decade, computer network and storage technology have developed rapidly, steadily lowering the cost of collecting data in all walks of life. As a result, both the number of samples and the number of features in collected data sets have grown exponentially. As the number of features grows, the curse of dimensionality often arises: applying data mining or machine learning methods directly to such data may be infeasible, or prohibitively time-consuming. Feature selection is a research focus in machine learning and data mining. Often used as a data preprocessing step, it selects an effective and representative feature subset from the data, reduces the data dimension, and speeds up subsequent processing. Researchers have studied feature selection extensively, and feature clustering, as an unsupervised feature selection approach, is a hotspot in this area.

In this thesis, a new feature selection method based on feature clustering is proposed, which fully accounts for the redundancy between features and can handle mixed data. In addition, an incremental method is proposed to solve the feature selection problem on dynamic data. The main contents are as follows:

1. A new feature selection method based on feature clustering is studied. First, according to three-way decisions clustering theory, the features of the original feature space are preliminarily divided into several feature subspaces. Then, based on spectral clustering theory, a maximum neighborhood-mutual-information spanning tree is constructed and partitioned within each feature subspace to obtain new feature clusters; the representative features selected from these clusters have lower redundancy. Finally, taking the correlation between the remaining features and the class feature as heuristic information, a wrapper is used to select features iteratively, yielding the feature subset with the lowest classification error rate. Experiments on 10 UCI data sets show that the algorithm selects better feature subsets and achieves higher classification accuracy than the original feature set and the subsets selected by several comparison algorithms.

2. An incremental feature selection method is studied. To solve the feature selection problem on incremental data, the possibility of incremental feature selection based on three-way decisions and feature clustering is discussed. For the initial rapid-division stage, a method for incrementally updating the learned division thresholds is analyzed for the case where the incremental data does not change the original data distribution. For the spectral clustering stage, methods for updating the neighborhood mutual information matrix of each feature subspace under different cases are discussed. Finally, an incremental feature selection algorithm is proposed that reduces computation time while maintaining classification accuracy. Experimental results on 16 UCI data sets verify the validity, feasibility, and applicability of the incremental feature selection algorithm.
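The spanning-tree clustering step within a feature subspace can be illustrated as follows. This is a minimal sketch, not the thesis algorithm: it substitutes plain discrete mutual information for neighborhood mutual information (which handles mixed data), builds the maximum spanning tree over pairwise MI weights with Prim's algorithm, and cuts the k-1 weakest tree edges so the connected components form k feature clusters.

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(x, y):
    """Empirical mutual information between two discrete feature columns."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

def mst_feature_clusters(columns, k):
    """Build the maximum spanning tree over pairwise-MI edge weights
    (Prim's algorithm), cut the k-1 weakest tree edges, and return the
    resulting connected components as feature clusters."""
    m = len(columns)
    w = {(i, j): mutual_information(columns[i], columns[j])
         for i, j in combinations(range(m), 2)}
    weight = lambda i, j: w[(i, j) if i < j else (j, i)]
    in_tree, edges = {0}, []
    while len(in_tree) < m:  # Prim's: repeatedly add the heaviest crossing edge
        i, j = max(((a, b) for a in in_tree for b in range(m) if b not in in_tree),
                   key=lambda e: weight(*e))
        in_tree.add(j)
        edges.append((i, j))
    edges.sort(key=lambda e: weight(*e))  # ascending: weakest tree edges first
    kept = edges[k - 1:]                  # drop the k-1 weakest edges
    parent = list(range(m))               # union-find over the kept edges
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    for i, j in kept:
        parent[find(i)] = find(j)
    clusters = {}
    for f in range(m):
        clusters.setdefault(find(f), []).append(f)
    return list(clusters.values())
```

In the thesis a representative feature would then be chosen from each cluster; here the clusters themselves are the output. Redundant features (high mutual information) stay connected and land in the same cluster.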
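The wrapper stage — scanning candidate features in relevance order and keeping the subset with the lowest error estimate — can be sketched like this. A hedged illustration only: the thesis does not fix the classifier here, so a leave-one-out 1-nearest-neighbour error estimate stands in for it.

```python
def loo_1nn_error(X, y, feats):
    """Leave-one-out 1-nearest-neighbour error using only the given features."""
    if not feats:
        return 1.0
    n, errors = len(X), 0
    for i in range(n):
        best, pred = None, None
        for j in range(n):
            if i == j:
                continue
            d = sum((X[i][f] - X[j][f]) ** 2 for f in feats)
            if best is None or d < best:
                best, pred = d, y[j]
        errors += pred != y[i]
    return errors / n

def wrapper_select(X, y, ranked_feats):
    """Greedy wrapper: visit features in relevance order and keep each
    one only if adding it strictly lowers the error estimate. Returns
    the subset with the lowest error seen and that error."""
    selected, best_err = [], 1.0
    for f in ranked_feats:
        err = loo_1nn_error(X, y, selected + [f])
        if err < best_err:
            selected, best_err = selected + [f], err
    return selected, best_err
```

On data where one feature separates the classes and another is noise, the wrapper keeps only the informative feature, since adding the noisy one cannot lower the error.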
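The core idea of the incremental stage — updating the mutual-information matrix for new samples without revisiting old data — can be sketched with running count tables. Again a simplification: the thesis maintains neighborhood mutual information under several distinct cases, whereas this sketch uses discrete MI and a single update rule.

```python
import math
from collections import Counter
from itertools import combinations

class IncrementalMI:
    """Pairwise mutual-information matrix maintained from running count
    tables: each new sample updates the counts in O(m^2) for m features,
    so no pass over previously seen data is needed."""
    def __init__(self, num_features):
        self.m, self.n = num_features, 0
        self.marg = [Counter() for _ in range(num_features)]  # per-feature value counts
        self.joint = {p: Counter() for p in combinations(range(num_features), 2)}

    def add_sample(self, row):
        """Fold one new sample into the marginal and joint counts."""
        self.n += 1
        for f, v in enumerate(row):
            self.marg[f][v] += 1
        for (i, j), counts in self.joint.items():
            counts[(row[i], row[j])] += 1

    def mi(self, i, j):
        """Current MI estimate for a feature pair, read off the counts."""
        i, j = min(i, j), max(i, j)
        return sum((c / self.n) * math.log(c * self.n /
                                           (self.marg[i][a] * self.marg[j][b]))
                   for (a, b), c in self.joint[(i, j)].items())
```

Two perfectly dependent features yield MI equal to their entropy (log 2 for a balanced binary feature); a new sample that breaks the dependence lowers the estimate without any recomputation over earlier samples.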
Keywords/Search Tags:feature selection, three-way decisions, feature clustering, neighborhood mutual information, incremental learning