Research On Semi-Supervised Feature Selection Algorithms

Posted on: 2020-11-17
Degree: Master
Type: Thesis
Country: China
Candidate: G W Yuan
Full Text: PDF
GTID: 2428330599454642
Subject: Computer Science and Technology
Abstract/Summary:
Data is a valuable resource that records the characteristics of objects from many aspects. With the advent of the information age, data has become increasingly important, and mining potentially valuable information from it can improve many aspects of life. Many data mining technologies, such as clustering, classification and recommendation algorithms, have become research hotspots. As technology progresses, data acquisition and storage have become more convenient, and various industries have accumulated large amounts of data, such as disease data in biomedicine, image data in computer vision and text data in natural language processing. Although more data can depict objects more comprehensively and preserve more valuable information, processing it is difficult for two reasons: 1) the dimensionality of the samples is high (a large number of features), and 2) large amounts of data have not been annotated. This paper mainly aims at solving these two problems. Feature selection is used to address the problem caused by the large number of features, and semi-supervised learning is used to address the problem caused by the lack of annotation. Based on the idea of least squares regression, three semi-supervised feature selection algorithms are proposed.

This paper makes the following four contributions. Firstly, we propose rescaled linear square regression (RLSR) to obtain a more feasible solution. Within the semi-supervised learning mechanism, it first uses the labelled samples to train the model parameters, then learns labels for the unlabelled samples, and repeats the process until the model converges. In the RLSR algorithm, a scale factor measuring feature importance is introduced for feature selection, which provides theoretical support for the calculation of feature weights. Secondly, to better control the sparsity of the RLSR model, a novel model named Sparse Rescaled Linear Square Regression (SRLSR) is proposed, which uses the L2,p-norm as implicit regularization. A smaller p results in sparser feature weights; in particular, when p is 1, SRLSR is equivalent to RLSR. Thirdly, to increase the discriminability of the model, the SDSSFS algorithm is proposed. In this algorithm, we extend the ε-dragging technique from supervised tasks to the semi-supervised task, which enlarges the distance between different classes by learning a dragging distance and direction for each sample. Finally, this paper analyses the influence of each parameter on the performance of the algorithms, and discusses the impact on classifier accuracy of using different proportions of labelled data and selecting different numbers of features. On six benchmark data sets, we verify the superiority of the proposed algorithms by comparing them with feature selection algorithms proposed in recent years.
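
As a rough illustration of the L2,p-norm mentioned in the abstract, the sketch below computes it with NumPy under the common definition (sum over rows of the row-wise L2 norm raised to the power p, then the 1/p-th root), where each row of the weight matrix corresponds to one feature. The abstract does not give the thesis's exact objective, so the function names `l2p_norm` and `rank_features`, the example matrix, and the idea of ranking features by row norm are all assumptions made for illustration, not the author's implementation.

```python
import numpy as np


def l2p_norm(W, p):
    """L2,p-norm of W under the usual definition (an assumption here):
    (sum_i ||w_i||_2^p)^(1/p), where w_i is the i-th row (one feature)."""
    row_norms = np.linalg.norm(W, axis=1)  # L2 norm of each row
    return float(np.sum(row_norms ** p) ** (1.0 / p))


def rank_features(W, k):
    """Hypothetical selection rule: keep the k features whose weight
    rows have the largest L2 norm."""
    row_norms = np.linalg.norm(W, axis=1)
    return np.argsort(-row_norms)[:k]


# Toy weight matrix: 3 features (rows) x 2 classes (columns).
W = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [1.0, 0.0]])

norm_p1 = l2p_norm(W, 1.0)    # p = 1 gives the L2,1-norm
norm_p2 = l2p_norm(W, 2.0)    # p = 2 gives the Frobenius norm
top2 = rank_features(W, 2)    # indices of the two strongest features
```

A regularizer of this form penalizes whole rows, so driving a row to zero discards the corresponding feature for every class at once.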
Keywords/Search Tags: Regression algorithm, Semi-supervised feature selection, Sparse feature selection, Discriminant feature selection