
Research On Unsupervised Balanced Feature Selection

Posted on: 2022-04-02
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Chen
GTID: 2518306542963759
Subject: Software engineering
Abstract/Summary:
In recent years, the development and spread of data-collection equipment have made real-world data far easier to gather. Because much of this data is semi-structured or even unstructured, however, the feature dimension of a sample can reach millions or more, and feature selection methods are needed to counter the curse of dimensionality. Traditional feature selection methods favor discriminative features and ignore the balance property of the data, which can yield incorrect results. Data mining and machine learning therefore need an effective way to select relevant features from the original feature set.

Many real-world data mining applications depend on balance. In the energy load balancing problem of wireless sensor networks, for example, sensor nodes must be divided sensibly into clusters; otherwise the nodes consume energy unevenly and the network lifetime is shortened. Such problems call for a clustering result that reflects a balanced distribution. In many data sets, especially high-dimensional ones, noisy and redundant features can obscure this balanced structure in the original feature space. The difficulty is greater in unsupervised learning, where the lack of prior knowledge makes the intrinsic structure of the data hard to reveal. To tackle these problems, this thesis proposes a new balanced unsupervised feature selection method. The main contributions are as follows.

First, the thesis proposes a balanced k-means feature selection algorithm. Building on k-means, the algorithm introduces a balanced regularization term that favors features which tend to produce balanced clusters. The selected features are therefore not only discriminative but also reflect the balance property of the data, and balanced clustering and feature selection are integrated seamlessly in a unified framework. The objective function is solved with the Alternating Direction Method of Multipliers (ADMM). Experimental results on benchmark data show that the proposed method outperforms other mainstream feature selection algorithms in both balance and accuracy.

Second, to improve the efficiency of the algorithm and to handle data that do not follow a cluster-shaped distribution, the thesis proposes a balanced spectral feature selection algorithm, which replaces the k-means structure with that of spectral clustering. Spectral clustering is better suited to balanced partitioning and can also handle non-cluster-distributed data, so it applies to more scenarios. Its time complexity is also greatly reduced. Experiments on benchmark data sets demonstrate the validity and superiority of the proposed method.

In summary, this thesis addresses a fundamental problem in feature selection for machine learning. By retaining balanced features during unsupervised feature selection so as to reflect the internal balance structure of the data, it proposes an embedded unsupervised feature selection algorithm based on balance, which differs markedly from traditional algorithms. The method both maintains the accuracy of feature subset selection and preserves the balance of the selected feature subset.
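The abstract does not state the form of the balanced regularization term used in the first contribution. As a minimal sketch of the idea only, the Python below assumes one common choice from the balanced clustering literature: the sum of squared cluster sizes, which for a fixed number of samples is smallest when all clusters have equal size. The function names `balance_penalty` and `regularized_cost` and the weight `lam` are hypothetical and are not taken from the thesis; the feature selection step and the ADMM solver are not shown.

```python
import numpy as np

def balance_penalty(labels, n_clusters):
    """Sum of squared cluster sizes (assumed penalty, not the thesis's exact term).

    For a fixed number of samples this is minimized by equal-sized clusters.
    """
    sizes = np.bincount(labels, minlength=n_clusters)
    return int(np.sum(sizes ** 2))

def regularized_cost(X, labels, n_clusters, lam):
    """k-means within-cluster squared error plus lam times the balance penalty."""
    sse = 0.0
    for k in range(n_clusters):
        members = X[labels == k]
        if len(members) > 0:
            # Squared distance of each member to its cluster centroid.
            sse += float(np.sum((members - members.mean(axis=0)) ** 2))
    return sse + lam * balance_penalty(labels, n_clusters)

# Six samples, three clusters: an even split versus a skewed split.
balanced = np.array([0, 0, 1, 1, 2, 2])
skewed = np.array([0, 0, 0, 0, 1, 2])

print(balance_penalty(balanced, 3))  # 2^2 + 2^2 + 2^2 = 12
print(balance_penalty(skewed, 3))    # 4^2 + 1^2 + 1^2 = 18
```

Under this assumed penalty, a feature subset whose clustering is more evenly sized receives a lower regularization cost, which is the behavior the abstract attributes to the balanced regularization term.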
Keywords/Search Tags: unsupervised learning, clustering, feature selection, balance, machine learning