
Research On Dimension Reduction Methods For High-dimensional Complex Data

Posted on: 2015-01-28
Degree: Master
Type: Thesis
Country: China
Candidate: W F Liu
Full Text: PDF
GTID: 2308330464964567
Subject: Signal and Information Processing
Abstract/Summary:
With the rapid development and spread of information technologies such as computer science, Internet communication and data acquisition, data has gradually penetrated every area of modern production and daily life. Acquiring research data or production and living information has become more convenient and efficient, but it has also made the data stored in databases ever larger in scale, more diverse in type, more complicated in structure and lower in value density. Dimension reduction has therefore become a research focus in data mining, machine learning and pattern recognition: it extracts useful information from high-dimensional, massive and redundant data so that subsequent classification, recognition and retrieval tasks can be performed accurately and quickly. Aiming at the problems and deficiencies of existing algorithms, this thesis conducts a thorough study of dimension reduction for high-dimensional complex data. The main work and contributions are summarized as follows (illustrative sketches of the three methods are given after the abstract).

First, considering the limited discriminative capacity of the low-dimensional features extracted by any single algorithm, a discriminative dimension selection method based on dense subgraph detection is proposed. It defines two dimension selection criteria, a discrimination-preserving criterion and an independence-preserving criterion, embeds them in a graph model, and detects a compact dense subgraph to reselect the low-dimensional features obtained by multiple conventional dimension reduction algorithms. The proposed method integrates the advantages of various algorithms while eliminating redundant information among the different low-dimensional features.

Second, a dimension reduction algorithm inspired by the signal-to-noise ratio is proposed. It first defines the local signal and local noise around each sample through the Euclidean distances between nearest neighbors. It then constructs different signal-to-noise-ratio objective functions from different combinations of local signal and local noise, and simplifies the multi-class dimension reduction task with the one-against-one or one-against-all strategy. The feature projection vectors are obtained by solving generalized eigen-decomposition problems. Finally, the method is extended to nonlinear feature extraction by building a hierarchical structure with simple intermediate nonlinear transformations. The algorithm does not rely on a Gaussian distribution assumption, and each extracted low-dimensional feature can be interpreted intuitively and reasonably.

Third, an unsupervised method called structured PFC is proposed. Clustering information of the original data is first obtained by cluster analysis. The cluster centers are then embedded in a low-dimensional sub-manifold to preserve the global structure of the high-dimensional space, and a neighborhood matrix is constructed to preserve local similarity information. Finally, the projected values of the cluster centers on the low-dimensional sub-manifold and the neighborhood matrix are used to initialize the PFC model, whose parameters are estimated by Gibbs sampling to compute the projection matrix. Structured PFC preserves both the local similarity information and the global structure of the original data, which improves the performance of unsupervised feature extraction.
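The following is a minimal sketch of the first contribution, dimension selection via dense subgraph detection. The concrete scores are assumptions made here for illustration (a per-dimension Fisher ratio as the discrimination-preserving criterion, one minus the absolute correlation as the independence-preserving criterion, and greedy peeling as the dense-subgraph detector); the thesis' exact criteria and detection algorithm may differ.

```python
import numpy as np

def select_dimensions(F, y, m=10):
    """Hedged sketch: reselect m dimensions from candidate low-dimensional
    features produced by several dimension-reduction algorithms.
    F: (n_samples, n_candidate_dims) stacked candidate features."""
    n, D = F.shape
    classes = np.unique(y)
    mu = F.mean(axis=0)
    # assumed discrimination-preserving score: Fisher-style ratio per dimension
    between = sum((y == c).sum() * (F[y == c].mean(axis=0) - mu) ** 2
                  for c in classes)
    within = sum(((F[y == c] - F[y == c].mean(axis=0)) ** 2).sum(axis=0)
                 for c in classes)
    disc = between / (within + 1e-12)
    # assumed independence-preserving score: 1 - |correlation| between dimensions
    indep = 1.0 - np.abs(np.corrcoef(F, rowvar=False))
    # graph edge weights combine independence and discriminative power
    W = indep * np.outer(disc, disc)
    np.fill_diagonal(W, 0.0)
    # greedy dense-subgraph peeling: repeatedly drop the weakest node
    keep = list(range(D))
    while len(keep) > m:
        sub = W[np.ix_(keep, keep)]
        keep.pop(int(np.argmin(sub.sum(axis=0))))
    return keep  # indices of the reselected dimensions
```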
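The second contribution can be illustrated with the sketch below for a two-class case: local noise is accumulated from same-class nearest neighbors, local signal from different-class nearest neighbors, and the projection vectors come from a generalized eigen-decomposition of the resulting scatter matrices. The specific scatter definitions, the ridge term and the Fisher-style reading of the signal-to-noise ratio are assumptions for illustration, not the thesis' exact formulation.

```python
import numpy as np
from scipy.linalg import eigh

def snr_projection(X, y, k=5, d=2):
    """Hedged sketch: local signal/noise scatters from k-nearest-neighbor
    offsets, projection vectors from a generalized eigenproblem."""
    n, p = X.shape
    S_sig = np.zeros((p, p))   # local signal (different-class) scatter
    S_noi = np.zeros((p, p))   # local noise (same-class) scatter
    for i in range(n):
        diff = X - X[i]                       # offsets to every sample
        dist = np.linalg.norm(diff, axis=1)
        same = (y == y[i])
        same_idx = np.argsort(np.where(same, dist, np.inf))[1:k + 1]  # skip self
        diff_idx = np.argsort(np.where(~same, dist, np.inf))[:k]
        for j in same_idx:                    # local noise around sample i
            S_noi += np.outer(diff[j], diff[j])
        for j in diff_idx:                    # local signal around sample i
            S_sig += np.outer(diff[j], diff[j])
    # maximize w^T S_sig w / w^T S_noi w  ->  generalized eigen-decomposition
    evals, evecs = eigh(S_sig, S_noi + 1e-6 * np.eye(p))
    return evecs[:, np.argsort(evals)[::-1][:d]]  # top-d projection vectors
```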
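For the third contribution, only the initialization stage described in the abstract is sketched: cluster analysis, embedding of the cluster centers on a low-dimensional sub-manifold, and a neighborhood matrix for local similarity. K-means as the clustering step, PCA as the center embedding and a symmetric k-NN graph as the neighborhood matrix are assumptions; the Gibbs-sampling estimation of the PFC parameters is not shown.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.neighbors import kneighbors_graph

def structured_pfc_init(X, n_clusters=10, d=2, k=5):
    """Hedged sketch of the structured PFC initialization stage."""
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(X)
    centers = km.cluster_centers_
    # project cluster centers onto a d-dimensional sub-manifold (PCA assumed)
    center_embedding = PCA(n_components=d).fit_transform(centers)
    # symmetric k-NN affinity matrix encoding local similarity
    W = kneighbors_graph(X, n_neighbors=k, mode='connectivity')
    W = 0.5 * (W + W.T)
    return center_embedding, W, km.labels_
```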
Keywords/Search Tags: Dimension Reduction, Dimension Selection, Signal-to-Noise Ratio, Structured PFC