Research On Semi-Supervised Feature Selection Algorithms

Posted on:2020-11-17

Degree:Master

Type:Thesis

Country:China

Candidate:G W Yuan

Full Text:PDF

GTID:2428330599454642

Subject:Computer Science and Technology

Abstract/Summary:

Data is a valuable resource that records many aspects of the objects it describes. With the advent of the information age, data has become increasingly important, and mining the potentially valuable information it contains can improve many aspects of life. Data mining techniques such as clustering, classification and recommendation algorithms have therefore become research hotspots. As technology has progressed, data acquisition and storage have become more convenient, and many industries have accumulated large amounts of data, such as disease data in biomedicine, image data in computer vision and text data in natural language processing. Although more data describes objects more comprehensively and preserves more valuable information, handling large amounts of data is difficult. Two problems stand out: 1) the dimensionality of the samples is high (there are a large number of features), and 2) large amounts of data have not been annotated. This thesis aims to address these two problems. Feature selection is used to mitigate the problem caused by the large number of features, and semi-supervised learning is used to mitigate the problem caused by the lack of annotation. Based on the idea of least squares regression, three semi-supervised feature selection algorithms are proposed.

The thesis makes the following four contributions. First, we propose rescaled linear square regression (RLSR) to obtain a more feasible solution. In its semi-supervised learning scheme, RLSR first trains the model parameters on the labelled samples, then learns labels for the unlabelled samples, and repeats the process until the model converges. RLSR introduces a scale factor that measures feature importance for feature selection, which provides theoretical support for computing feature weights. Second, to better control the sparsity of the RLSR model, we propose a new model, Sparse Rescaled Linear Square Regression (SRLSR), which uses the L_{2,p}-norm as an implicit regularizer. A smaller p yields sparser feature weights; in particular, when p is 1, SRLSR is equivalent to RLSR. Third, to increase the discriminability of the model, we propose the SDSSFS algorithm, which extends the ε-dragging technique from supervised to semi-supervised tasks; it enlarges the distance between different classes by learning a dragging distance and direction for each sample. Finally, we analyse the influence of each parameter on the performance of the algorithms and discuss how classifier accuracy changes with the proportion of labelled data and the number of selected features. On six benchmark data sets, we verify the superiority of the proposed algorithms by comparing them with feature selection algorithms proposed in recent years.
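
The abstract's remark that a smaller p gives sparser feature weights can be illustrated with a small sketch. The abstract does not state the formula, so the code assumes the definition commonly used in the sparse-learning literature: for a weight matrix W with one row per feature, the L_{2,p}-norm is (sum_i ||w_i||_2^p)^(1/p). The function names `row_norms`, `l2p_norm` and `rank_features` and the two example matrices are hypothetical and are not taken from the thesis.

```python
import math


def row_norms(W):
    """Euclidean norm of each row of W (one row per feature)."""
    return [math.sqrt(sum(v * v for v in row)) for row in W]


def l2p_norm(W, p):
    """Assumed definition: (sum_i ||w_i||_2 ** p) ** (1 / p), for p > 0."""
    if p <= 0:
        raise ValueError("p must be positive")
    return sum(n ** p for n in row_norms(W)) ** (1.0 / p)


def rank_features(W, k):
    """Indices of the k features whose weight rows have the largest norms."""
    norms = row_norms(W)
    order = sorted(range(len(norms)), key=lambda i: -norms[i])
    return order[:k]


# Two hypothetical weight matrices with the same Frobenius norm (2.0):
# one spreads the weight over all four features, the other uses one feature.
W_dense = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
W_sparse = [[2.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]

for p in (2.0, 1.0, 0.5):
    print(p, l2p_norm(W_dense, p), l2p_norm(W_sparse, p))
```

Under this assumed definition the two matrices have the same value at p = 2, but as p decreases the value for the dense matrix grows while the value for the single-feature matrix stays near 2. A penalty of this form therefore favours solutions in which few rows are non-zero, and features can then be ranked by row norm as in `rank_features`.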

Keywords/Search Tags:

Regression algorithm, Semi-supervised feature selection, Sparse feature selection, Discriminant feature selection

Related items

1	Research On Semi-supervised Feature Sparse Selection Method Algorithm Based On Least Squares Regression
2	Research And Application On Rough Set Based Feature Selection Algorithm
3	Research On Graph Regularized And Discriminant Information Based Feature Selection
4	Research On Semi-supervised Feature Selection Algorithm Based On Graph Learning
5	Research On Semi-supervised Sparse Feature Selection For Image Annotation In Web Space
6	Research And Application For Website Accessibility Evaluation Oriented Group Sparse Feature Selection
7	Research On Semi-Supervised Feature Selection Algorithm Based On Graph
8	Research On Feature Selection Algorithm And Its Application In Image Recognition
9	Semi-supervised Feature Selection Based On Kernel Density Estimation
10	Research And Improvement Of Feature Selection Algorithms Based On Sparse Learning