
Extracting trends in high-dimensional datasets

Posted on: 2010-02-08
Degree: M.S.
Type: Thesis
University: Wayne State University
Candidate: Pokharkar, Snehal
GTID: 2448390002488004
Subject: Computer Science
Abstract/Summary:
High-dimensional data analysis has become an important research area because of the rapid growth in the amount of data being collected. To that end, this thesis seeks an information-revealing representation for high-dimensional data distributions that may contain local trends in certain subspaces, for example data with continuous support in simple shapes that have identifiable branches. Such data can be represented by a principal graph: a graph made up of segments of locally fitted principal curves or surfaces, each summarizing one identifiable branch. This thesis describes a new algorithm that finds the optimal paths through such a principal graph. The paths are optimal in the sense that they represent the longest smooth trends through the data set and that, taken together, they cover the entire data set with minimum overlap. The algorithm is suited to hypothesizing trends in high-dimensional data and can assist exploratory data analysis and visualization. The thesis also develops a second algorithm, IRST, which identifies Information-Rich Subsets of high-dimensional data and extracts the order-based Subspace Trends present in them. The notion of a trend, the implementation details, the complexity and analysis of the algorithms, and results on synthetic and real-world sample data sets are described.
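The idea of covering a principal graph with long, smooth paths can be illustrated with a minimal sketch. It assumes the graph is given as junction points (nodes) joined by straight segments (edges), so each locally fitted curve segment is approximated by its chord. "Smooth" is assumed to mean that the turning angle at every interior junction stays below a threshold (`max_turn`, 30 degrees here, a hypothetical choice), and the covering step is a greedy set-cover heuristic that repeatedly picks the smooth path adding the most uncovered length, breaking ties by least overlap. This is a swapped-in heuristic for illustration, not the algorithm developed in the thesis; the names `turn_angle`, `smooth_paths`, and `greedy_trend_cover` are hypothetical.

```python
import math


def turn_angle(a, b, c):
    """Angle (radians) between direction a->b and direction b->c; 0 means straight.

    Assumes a != b and b != c (no zero-length segments). Works in any dimension.
    """
    u = [bi - ai for ai, bi in zip(a, b)]
    v = [ci - bi for bi, ci in zip(b, c)]
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def smooth_paths(nodes, edges, max_turn):
    """Enumerate every edge-simple path whose turns all stay within max_turn.

    Exhaustive DFS: exponential in the worst case, so only for small graphs.
    Each path is returned as (node list, frozenset of edge ids).
    """
    adj = {n: [] for n in nodes}
    for eid, (i, j) in enumerate(edges):
        adj[i].append((j, eid))
        adj[j].append((i, eid))

    paths = []

    def extend(path, used):
        if used:  # record every non-empty path, not only maximal ones
            paths.append((list(path), frozenset(used)))
        last = path[-1]
        for nxt, eid in adj[last]:
            if eid in used:
                continue
            # Reject the step if it bends more sharply than allowed.
            if len(path) >= 2 and turn_angle(nodes[path[-2]], nodes[last], nodes[nxt]) > max_turn:
                continue
            path.append(nxt)
            used.add(eid)
            extend(path, used)
            path.pop()
            used.remove(eid)

    for start in nodes:
        extend([start], set())
    return paths


def greedy_trend_cover(nodes, edges, max_turn=math.radians(30)):
    """Greedily pick smooth paths until every edge (segment) is covered."""
    lengths = [math.dist(nodes[i], nodes[j]) for i, j in edges]
    candidates = smooth_paths(nodes, edges, max_turn)
    uncovered = set(range(len(edges)))
    chosen = []

    def score(candidate):
        _, used = candidate
        gained = sum(lengths[e] for e in used & uncovered)
        overlap = sum(lengths[e] for e in used - uncovered)
        return (gained, -overlap)  # most new length first, then least overlap

    # Every single edge is itself a candidate, so each round covers something new.
    while uncovered:
        path, used = max(candidates, key=score)
        chosen.append(path)
        uncovered -= used
    return chosen


if __name__ == "__main__":
    # Toy T-shaped graph (2-D for readability): straight line 0-1-2, side branch 1-3.
    nodes = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0), 3: (1.0, 1.0)}
    edges = [(0, 1), (1, 2), (1, 3)]
    print(greedy_trend_cover(nodes, edges))  # [[0, 1, 2], [1, 3]]
```

In the toy example the straight run 0-1-2 is reported as one trend and the 90-degree branch 1-3 as a second, because joining the branch to the line would exceed the turning threshold. Greedy set cover is only an approximation, so this sketch does not guarantee the minimum-overlap cover the abstract describes, and its exhaustive path enumeration limits it to small graphs.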
Keywords/Search Tags: High-dimensional data, Trends, Data analysis