Font Size: a A A

High dimensional data set analysis using a large-scale manifold learning approach

Posted on:2015-10-17Degree:Ph.DType:Dissertation
University:Old Dominion UniversityCandidate:Tran, LocFull Text:PDF
GTID:1478390017998444Subject:Engineering
Abstract/Summary:
Because of technological advances, a trend occurs for data sets increasing in size and dimensionality. Processing these large scale data sets is challenging for conventional computers due to computational limitations. A framework for nonlinear dimensionality reduction on large databases is presented that alleviates the issue of large data sets through sampling, graph construction, manifold learning, and embedding. Neighborhood selection is a key step in this framework and a potential area of improvement. The standard approach to neighborhood selection is setting a fixed neighborhood. This could be a fixed number of neighbors or a fixed neighborhood size. Each of these has its limitations due to variations in data density. A novel adaptive neighbor-selection algorithm is presented to enhance performance by incorporating sparse ℓ 1-norm based optimization. These enhancements are applied to the graph construction and embedding modules of the original framework. As validation of the proposed ℓ1-based enhancement, experiments are conducted on these modules using publicly available benchmark data sets. The two approaches are then applied to a large scale magnetic resonance imaging (MRI) data set for brain tumor progression prediction. Results showed that the proposed approach outperformed linear methods and other traditional manifold learning algorithms.
Keywords/Search Tags:Data, Manifold learning, Large
Related items