
The Research On Common Subspace Recognition Method For High Dimensional Data

Posted on: 2016-11-05
Degree: Master
Type: Thesis
Country: China
Candidate: S Zou
GTID: 2308330467480836
Subject: Computer Science and Technology
Abstract/Summary:
With the development of information technology and the Internet, high-dimensional data are constantly emerging in many areas, such as data collected by telescopes, Web data, multimedia data, and biomedical genetic data. Such data lie in high-dimensional feature spaces, which gives rise to the curse of dimensionality, and mining valuable information from them is both a challenging and an urgent task. An effective strategy for handling high-dimensional data is dimension reduction: representing the original data in a low-dimensional space in which the underlying structure or patterns are easier to identify. This thesis focuses on multi-label learning and clustering analysis, where the original data can be re-represented via a common latent subspace.

In multi-label classification, an instance may belong to more than one class, so the labels are correlated, and label correlation in multi-label classification has attracted increasing attention from researchers. Most existing multi-label learning methods, however, operate directly on the original data, whose drawbacks, such as high dimensionality and information redundancy, degrade learning performance. This thesis proposes a common subspace identification model. The model uses label information to extract high-level features from the original high-dimensional features and expresses the high-level pattern with two matrices: one describes the high-level feature space, and the other describes the label correlations within that common high-level feature space. Based on these high-level features, the original high-dimensional data are effectively re-represented in a low-dimensional space. Experimental results on real-world data sets show that the model effectively improves multi-label classification performance. Furthermore, we extend this idea to clustering analysis.
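As a rough illustration of a two-matrix common subspace model, the sketch below uses reduced-rank ridge regression, a swapped-in stand-in rather than the thesis's actual formulation: Y ≈ X W B, where W (d×k) maps the original features into a shared k-dimensional subspace and B (k×L) maps that subspace to all labels at once, so correlated labels load on the same subspace directions. The function names `fit_common_subspace`, `predict` and `label_similarity`, the ridge weight `lam`, and the synthetic data are all hypothetical.

```python
import numpy as np

# Hypothetical sketch: reduced-rank ridge regression stands in for the
# thesis's common subspace model. Y (n, L) ~ X (n, d) @ W (d, k) @ B (k, L).


def fit_common_subspace(X, Y, k, lam=1.0):
    """Return W (d, k), B (k, L) and per-label means (used as intercepts).

    Assumes X is roughly centred and Y is a 0/1 label matrix, n >= L, k <= L.
    """
    d = X.shape[1]
    if not 0 < k <= Y.shape[1]:
        raise ValueError("k must be between 1 and the number of labels")
    y_mean = Y.mean(axis=0)
    Yc = Y - y_mean
    # Full-rank ridge solution: theta = (X^T X + lam*I)^{-1} X^T Yc, shape (d, L).
    theta = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Yc)
    # Top-k right singular vectors of the fitted labels span the label
    # subspace that all labels share.
    _, _, vt = np.linalg.svd(X @ theta, full_matrices=False)
    vk = vt[:k].T      # (L, k), orthonormal columns
    W = theta @ vk     # features -> common k-dim subspace
    B = vk.T           # common subspace -> labels
    return W, B, y_mean


def predict(X, W, B, y_mean, threshold=0.5):
    """Re-represent X as Z = X @ W, then score every label from Z."""
    Z = X @ W
    return (Z @ B + y_mean >= threshold).astype(int)


def label_similarity(B):
    """Cosine similarity between the labels' loadings on the subspace."""
    Bn = B / (np.linalg.norm(B, axis=0) + 1e-12)
    return Bn.T @ Bn   # (L, L)


# Synthetic example: 6 correlated labels generated from 3 hidden factors.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
A = rng.standard_normal((50, 3))
C = rng.standard_normal((3, 6))
Y = (X @ A @ C > 0).astype(int)

W, B, y_mean = fit_common_subspace(X, Y, k=3)
Z = X @ W                                   # (200, 3) low-dimensional data
accuracy = (predict(X, W, B, y_mean) == Y).mean()
S = label_similarity(B)
print(Z.shape, round(float(accuracy), 3))
```

Because every label is predicted from the same k columns of `X @ W`, the rank constraint forces the labels to share structure; cosine similarity between the columns of B is one simple, assumed way to read label correlation off the second matrix.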
For clustering, our aim is to improve the reliability of feature selection and the clustering performance by extracting the common information shared between features and clusters. Comparison with existing feature selection models verifies the effectiveness of the algorithm and demonstrates that identifying a common subspace is also valuable for unsupervised learning tasks.
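For the clustering side, one common way to let samples and features share a latent subspace is nonnegative matrix factorization (NMF), X ≈ U V^T, where U weights each sample on k clusters and V ties each feature to those same clusters. The sketch below assumes nonnegative data and uses the standard Lee-Seung multiplicative updates; NMF itself, the max-loading feature score, and the name `nmf_cluster_select` are assumptions for illustration, not the algorithm proposed in the thesis.

```python
import numpy as np

# Hypothetical sketch: NMF as a stand-in for the thesis's clustering model.
# X (n, d) >= 0 is factorised as U (n, k) @ V.T with V (d, k); the k factors
# are the latent subspace that samples (clusters) and features share.


def nmf_cluster_select(X, k, n_iter=500, seed=0, eps=1e-10):
    """Return cluster labels, per-feature scores, and the factors U, V."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    U = rng.random((n, k)) + eps
    V = rng.random((d, k)) + eps
    for _ in range(n_iter):
        # Lee-Seung multiplicative updates for the squared Frobenius loss.
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        V *= (X.T @ U) / (V @ (U.T @ U) + eps)
    # Give every column of V unit norm and push the scale into U;
    # U @ V.T is unchanged, but feature loadings become comparable.
    scale = np.linalg.norm(V, axis=0) + eps
    V /= scale
    U *= scale
    labels = U.argmax(axis=1)          # hard cluster assignment per sample
    feature_scores = V.max(axis=1)     # strongest link to any cluster
    return labels, feature_scores, U, V


# Synthetic example: 2 clusters; features 0-5 informative, 6-19 noise.
rng = np.random.default_rng(1)
X = 0.1 * rng.random((100, 20))
X[:50, 0:3] += 1.0
X[50:, 3:6] += 1.0

labels, scores, U, V = nmf_cluster_select(X, k=2)
top_features = np.argsort(scores)[::-1][:6]
print(labels[:5], labels[-5:], sorted(top_features.tolist()))
```

Features whose loadings in V are small for every cluster contribute little to any cluster and are candidates for removal, which is the sense in which clustering and feature selection share information in this sketch.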
Keywords/Search Tags: high-dimensional data, common subspace, multi-label learning, label correlation, data representation, clustering analysis