
Research On Feature Selection Method Based On Three-way Decisions Theory And Feature Clustering

Posted on: 2018-04-06    Degree: Master    Type: Thesis
Country: China    Candidate: M Yang    Full Text: PDF
GTID: 2348330569986440    Subject: Computer Science and Technology
Abstract/Summary:
Over the past decade, computer network and storage technology have developed rapidly, steadily lowering the cost of collecting data in all walks of life. As a result, both the number of samples and the number of features in collected data sets have grown exponentially. As the number of features grows, the curse of dimensionality often arises: applying data mining or machine learning methods directly to such data may be infeasible, or prohibitively time-consuming. Feature selection is a research focus in machine learning and data mining. Often used as a data preprocessing step, it selects an effective and representative feature subset from the data, reduces the data dimension, and speeds up subsequent processing. Researchers have studied feature selection extensively, and feature clustering, as an unsupervised feature selection approach, is a hotspot in this area.

In this thesis, a new feature selection method based on feature clustering is proposed, which fully accounts for the redundancy between features and can handle mixed data. In addition, an incremental method is proposed to solve the feature selection problem on dynamic data. The main contents are as follows:

1. A new feature selection method based on feature clustering is studied. First, according to three-way decisions clustering theory, the features of the original feature space are preliminarily divided into several feature subspaces. Then, based on spectral clustering theory, a maximum neighborhood-mutual-information spanning tree is constructed and partitioned within each feature subspace to obtain new feature clusters; the representative features selected from these clusters have lower redundancy. Finally, taking the correlation between the remaining features and the class feature as heuristic information, a wrapper is used to select features iteratively, yielding the feature subset with the lowest classification error rate. Experiments on 10 UCI data sets show that the algorithm selects better feature subsets and achieves higher classification accuracy than the original feature set and the subsets selected by several comparison algorithms.

2. An incremental feature selection method is studied. To solve the feature selection problem on incremental data, the possibility of incremental feature selection based on three-way decisions and feature clustering is discussed. For the initial rapid-division stage, a method for incrementally updating the learned division thresholds is analyzed for the case where the incremental data does not change the original data distribution. For the spectral clustering stage, methods for updating the neighborhood mutual information matrix of each feature subspace under different cases are discussed. Finally, an incremental feature selection algorithm is proposed that reduces computation time while maintaining classification accuracy. Experimental results on 16 UCI data sets verify the validity, feasibility, and applicability of the incremental feature selection algorithm.
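The spanning-tree clustering step within a feature subspace can be illustrated as follows. This is a minimal sketch, not the thesis algorithm: it substitutes plain discrete mutual information for neighborhood mutual information (which handles mixed data), builds the maximum spanning tree over pairwise MI weights with Prim's algorithm, and cuts the k-1 weakest tree edges so the connected components form k feature clusters.

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(x, y):
    """Empirical mutual information between two discrete feature columns."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

def mst_feature_clusters(columns, k):
    """Build the maximum spanning tree over pairwise-MI edge weights
    (Prim's algorithm), cut the k-1 weakest tree edges, and return the
    resulting connected components as feature clusters."""
    m = len(columns)
    w = {(i, j): mutual_information(columns[i], columns[j])
         for i, j in combinations(range(m), 2)}
    weight = lambda i, j: w[(i, j) if i < j else (j, i)]
    in_tree, edges = {0}, []
    while len(in_tree) < m:  # Prim's: repeatedly add the heaviest crossing edge
        i, j = max(((a, b) for a in in_tree for b in range(m) if b not in in_tree),
                   key=lambda e: weight(*e))
        in_tree.add(j)
        edges.append((i, j))
    edges.sort(key=lambda e: weight(*e))  # ascending: weakest tree edges first
    kept = edges[k - 1:]                  # drop the k-1 weakest edges
    parent = list(range(m))               # union-find over the kept edges
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    for i, j in kept:
        parent[find(i)] = find(j)
    clusters = {}
    for f in range(m):
        clusters.setdefault(find(f), []).append(f)
    return list(clusters.values())
```

In the thesis a representative feature would then be chosen from each cluster; here the clusters themselves are the output. Redundant features (high mutual information) stay connected and land in the same cluster.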
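The wrapper stage — scanning candidate features in relevance order and keeping the subset with the lowest error estimate — can be sketched like this. A hedged illustration only: the thesis does not fix the classifier here, so a leave-one-out 1-nearest-neighbour error estimate stands in for it.

```python
def loo_1nn_error(X, y, feats):
    """Leave-one-out 1-nearest-neighbour error using only the given features."""
    if not feats:
        return 1.0
    n, errors = len(X), 0
    for i in range(n):
        best, pred = None, None
        for j in range(n):
            if i == j:
                continue
            d = sum((X[i][f] - X[j][f]) ** 2 for f in feats)
            if best is None or d < best:
                best, pred = d, y[j]
        errors += pred != y[i]
    return errors / n

def wrapper_select(X, y, ranked_feats):
    """Greedy wrapper: visit features in relevance order and keep each
    one only if adding it strictly lowers the error estimate. Returns
    the subset with the lowest error seen and that error."""
    selected, best_err = [], 1.0
    for f in ranked_feats:
        err = loo_1nn_error(X, y, selected + [f])
        if err < best_err:
            selected, best_err = selected + [f], err
    return selected, best_err
```

On data where one feature separates the classes and another is noise, the wrapper keeps only the informative feature, since adding the noisy one cannot lower the error.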
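The core idea of the incremental stage — updating the mutual-information matrix for new samples without revisiting old data — can be sketched with running count tables. Again a simplification: the thesis maintains neighborhood mutual information under several distinct cases, whereas this sketch uses discrete MI and a single update rule.

```python
import math
from collections import Counter
from itertools import combinations

class IncrementalMI:
    """Pairwise mutual-information matrix maintained from running count
    tables: each new sample updates the counts in O(m^2) for m features,
    so no pass over previously seen data is needed."""
    def __init__(self, num_features):
        self.m, self.n = num_features, 0
        self.marg = [Counter() for _ in range(num_features)]  # per-feature value counts
        self.joint = {p: Counter() for p in combinations(range(num_features), 2)}

    def add_sample(self, row):
        """Fold one new sample into the marginal and joint counts."""
        self.n += 1
        for f, v in enumerate(row):
            self.marg[f][v] += 1
        for (i, j), counts in self.joint.items():
            counts[(row[i], row[j])] += 1

    def mi(self, i, j):
        """Current MI estimate for a feature pair, read off the counts."""
        i, j = min(i, j), max(i, j)
        return sum((c / self.n) * math.log(c * self.n /
                                           (self.marg[i][a] * self.marg[j][b]))
                   for (a, b), c in self.joint[(i, j)].items())
```

Two perfectly dependent features yield MI equal to their entropy (log 2 for a balanced binary feature); a new sample that breaks the dependence lowers the estimate without any recomputation over earlier samples.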
Keywords/Search Tags:feature selection, three-way decisions, feature clustering, neighborhood mutual information, incremental learning