Dimension Reduction Under Complex Data Environment

Posted on:2019-09-10

Degree:Master

Type:Thesis

Country:China

Candidate:Q Xu

GTID:2428330593451028

Subject:Computer Science and Technology

Abstract/Summary:

With the rapid development of Internet technology, more and more learning tasks must deal with high-dimensional data. High-dimensional data cause several problems for machine learning. A model may need too many parameters, which easily leads to overfitting and weak generalization. Processing high-dimensional data also incurs high time complexity. In addition, high-dimensional data are prone to the "concentration of measure" effect: for some samples, the distance to the nearest sample and the distance to the farthest sample tend to become equal. Dimensionality reduction methods are therefore needed to reduce the dimensionality of the data.

For the feature space, feature selection is an effective dimensionality reduction method. It selects the most representative subset of the original features and uses that subset in place of the full feature set for the learning task. For the label space, dimensionality reduction can be achieved with label selection or label space transformation algorithms.

Dimensionality reduction methods also face various complex data environments. In unsupervised feature selection, the task is considerably more challenging because no data labels are available. In supervised feature selection, traditional methods often require a complete label matrix. In practice, however, the label matrix is often incomplete: some labels are missing because manual annotation is costly and some labels are ambiguous. Beyond the curse of dimensionality in the feature space, the label space can also be high-dimensional. If a traditional classifier is used for label prediction, the time complexity is proportional to the number of labels.

To address the high dimensionality of the feature space and the label space under complex data environments, this thesis proposes three dimension reduction methods. The main contributions are as follows:

1. A co-regularized unsupervised feature selection method is proposed. The model jointly considers the data distribution, the data reconstruction capability, and the manifold structure of the data.
2. Using a linear regression model, multi-label feature selection is performed while the missing labels are recovered. Manifold regularization terms are added to the model so that distances in the original space are preserved in the new space.
3. Because the label space is high-dimensional, a low-dimensional latent space shared by the feature space and the label space is learned through dictionary learning, thereby reducing the dimension of the label space.
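
The abstract notes that in high dimensions the nearest and farthest distances from a sample tend to become equal. The short Python sketch below illustrates that effect only; it is not taken from the thesis. It assumes points drawn uniformly from the unit hypercube and Euclidean distance, and the function name `relative_contrast`, the sample count, and the chosen dimensions are all hypothetical choices made for illustration.

```python
import numpy as np


def relative_contrast(n_samples, dim, seed=0):
    """Return (d_max - d_min) / d_min for one query point.

    Distances are Euclidean, measured from a random query point to
    n_samples points drawn uniformly from the unit hypercube [0, 1]^dim.
    A value near zero means nearest and farthest neighbors are almost
    equally far away.
    """
    rng = np.random.default_rng(seed)
    points = rng.random((n_samples, dim))
    query = rng.random(dim)
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()


# Compare the contrast across a few illustrative dimensionalities.
contrasts = {dim: relative_contrast(1000, dim) for dim in (2, 10, 100, 1000)}
for dim, value in contrasts.items():
    print(f"dim={dim:5d}  relative contrast={value:.3f}")
```

Under these assumptions the relative contrast in 1000 dimensions should come out far smaller than in 2 dimensions, which is one way to see why distance-based learning becomes harder without dimensionality reduction.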

Keywords/Search Tags:

Dimension reduction, Unsupervised learning, Missing labels, Label space dimension reduction

Related items

1	Research On Key Problems Of Multi-label Learning
2	Research Of Manifold Learning In Data Dimension Reduction And Classification
3	Multi-label Learning With Limited Or Missing Labels
4	Research On Dimensionality Reduction Technique And Its Application Based On Manifold Learning
5	Multi-view Multi-label Learning Based On Dimension Reduction Of Label Space
6	Multi-label Classification Based On Semi-supervised And Localized Dimension Reduction
7	Topics on supervised and unsupervised dimension reduction
8	Missing Multi-label Learning For Label Semantic Space Mining
9	Research On Dimension Reduction Methods For High-dimensional Complex Data
10	Research On Multi Label Learning Via Feature Space And Label Space Dimension Reduction Method