Dimension Reduction Under Complex Data Environment

Posted on: 2019-09-10
Degree: Master
Type: Thesis
Country: China
Candidate: Q Xu
Full Text: PDF
GTID: 2428330593451028
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of Internet technology, more and more learning tasks must deal with high-dimensional data. High-dimensional data cause several problems for machine learning: models have too many parameters, which easily leads to overfitting and weak generalization; processing the data incurs high time complexity; and the data are prone to the "concentration of measure" effect, in which, for some samples, the distances to the nearest and the farthest sample tend to become equal. Dimensionality reduction methods are therefore needed.

For the feature space, feature selection is an effective dimensionality reduction method. It selects the most representative subset of the original features and uses that subset in place of the full feature set for the learning task. For the label space, dimensionality reduction can be achieved with label-selection-based or label-space-transformation algorithms.

Dimensionality reduction methods also face various complex data environments. In unsupervised feature selection, the task is much more challenging because no labels are available. In supervised feature selection, traditional methods often require a complete label matrix, whereas in practice the label matrix is incomplete: some labels are missing because manual annotation is costly and some labels are ambiguous. Beyond the curse of dimensionality in the feature space, high dimensionality also arises in the label space. If a traditional classifier is used for label prediction, the time complexity is proportional to the number of labels.

To address the high dimensionality of the feature space and the label space under complex data environments, this thesis proposes three dimension reduction methods. The main contributions are as follows:

1. A co-regularized unsupervised feature selection method is proposed. The model jointly considers the data distribution, the data reconstruction capability, and the manifold structure of the data.
2. Using a linear regression model, multi-label feature selection is performed while the missing labels are recovered. Manifold regularization terms are added to the model so that distances in the original space are preserved in the new space.
3. Because the label space is high-dimensional, a low-dimensional latent space shared by the feature space and the label space is learned through dictionary learning, which reduces the dimension of the label space.
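
The effect described in the abstract, where the distances from a sample to its nearest and farthest neighbours tend to become equal in high dimensions, can be illustrated with a small simulation. This is only a sketch and not part of the thesis: the function name `nearest_farthest_ratio`, the uniform sampling in the unit cube, the Euclidean distance, and the sample sizes are all assumptions chosen for illustration.

```python
import math
import random


def nearest_farthest_ratio(dim, n_points=200, seed=0):
    """Return d_min / d_max for one random query point.

    Assumption for illustration: the query and all samples are drawn
    uniformly from the unit cube [0, 1]^dim and compared with the
    Euclidean distance.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    # A ratio close to 1 means the nearest and farthest samples are
    # almost equally far away, so distance carries little information.
    return min(dists) / max(dists)


if __name__ == "__main__":
    for dim in (2, 10, 100, 1000):
        print(dim, round(nearest_farthest_ratio(dim), 3))
```

Under these assumptions the ratio moves towards 1 as the dimension grows, which is one reason distance-based learners tend to benefit from reducing the dimension first.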
Keywords/Search Tags: Dimension reduction, Unsupervised learning, Missing labels, Label space dimension reduction