Extracting trends in high-dimensional datasets

Posted on:2010-02-08

Degree:M.S.

Type:Thesis

University:Wayne State University

Candidate:Pokharkar, Snehal

Full Text:PDF

GTID:2448390002488004

Subject:Computer Science

Abstract/Summary:

High-dimensional data analysis has become an important research area because of the rapid growth in the amount of data collected. This thesis seeks an information-revealing representation for high-dimensional data distributions that may contain local trends in certain subspaces. Examples include data with continuous support in simple shapes with identifiable branches. Such data can be represented by a graph consisting of segments of locally fit principal curves or surfaces, each summarizing one identifiable branch. The thesis describes a new algorithm for finding the optimal paths through such a principal graph. The paths are optimal in the sense that they represent the longest smooth trends through the data set and jointly cover the entire data set with minimum overlap. The algorithm is suitable for hypothesizing trends in high-dimensional data and can assist exploratory data analysis and visualization. The thesis also develops a second algorithm, IRST, which identifies Information Rich Subsets of high-dimensional data and extracts the order-based Subspace Trends present in them. The notion of trends, the implementation details, the complexity analysis, and results on synthetic and real-world sample datasets are described.
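The abstract does not give the algorithm itself, so the following is only a rough sketch of what "longest smooth paths that jointly cover a principal graph with minimum overlap" could look like, not the method developed in the thesis. It assumes the principal graph is already available as node coordinates plus undirected edges, measures smoothness by a maximum turning angle (`max_turn`, a hypothetical parameter), and uses a simple greedy cover. All function names are illustrative.

```python
import itertools
import math


def edge_key(a, b):
    """Order-independent key for an undirected edge."""
    return (a, b) if a <= b else (b, a)


def path_edges(path):
    """Set of undirected edges traversed by a node path."""
    return {edge_key(a, b) for a, b in zip(path, path[1:])}


def turning_angle(p, q, r):
    """Angle in radians between directions p->q and q->r (0 means straight).
    Assumes the three points are pairwise distinct."""
    u = [qi - pi for pi, qi in zip(p, q)]
    v = [ri - qi for qi, ri in zip(q, r)]
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def simple_paths(adj, start, goal):
    """Yield every simple path from start to goal (depth-first search)."""
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == goal:
            yield path
            continue
        for nxt in adj[node]:
            if nxt not in path:
                stack.append((nxt, path + [nxt]))


def extract_trends(coords, edges, max_turn=math.pi / 3):
    """Greedy sketch: choose smooth endpoint-to-endpoint paths that cover
    the graph, rewarding newly covered length and penalizing overlap.
    Returns the chosen paths and any edges no smooth path could cover."""
    adj = {n: set() for n in coords}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    length = {edge_key(a, b): math.dist(coords[a], coords[b]) for a, b in edges}

    # Assumption: candidate trends run between degree-1 nodes (branch tips).
    leaves = sorted(n for n in adj if len(adj[n]) == 1)
    candidates = []
    for s, t in itertools.combinations(leaves, 2):
        for path in simple_paths(adj, s, t):
            triples = zip(path, path[1:], path[2:])
            if all(turning_angle(coords[p], coords[q], coords[r]) <= max_turn
                   for p, q, r in triples):
                candidates.append(path)

    uncovered = set(length)
    trends = []

    def gain(path):
        keys = path_edges(path)
        new = sum(length[k] for k in keys & uncovered)
        overlap = sum(length[k] for k in keys - uncovered)
        return new - overlap

    while uncovered and candidates:
        best = max(candidates, key=gain)
        if gain(best) <= 0:
            break
        trends.append(best)
        uncovered -= path_edges(best)
    return trends, uncovered


# Y-shaped toy graph: a stem a-b that forks into branches b-c and b-d.
coords = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (3.0, -0.9), "d": (2.0, 0.9)}
edges = [("a", "b"), ("b", "c"), ("b", "d")]
trends, uncovered = extract_trends(coords, edges)
print(trends, uncovered)  # [['a', 'b', 'c'], ['a', 'b', 'd']] set()
```

On this toy graph the sharp turn from one fork branch into the other is rejected as non-smooth, so the two reported trends each follow the stem into one branch and share only the stem edge. Enumerating all simple paths is exponential in general, so this is workable only for small, tree-like graphs.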

Keywords/Search Tags:

High-dimensional data, Trends, Data analysis

Related items

1	Research On Dimensions And Data Layout Methods In High-Dimensional Data Visual Analysis
2	Research On Visual Analysis Of High Dimensional Data
3	Research On Visual Analysis Mechanism Of High-dimensional Data Based On Information Entropy
4	Study On Bi-clustering Algorithms Towards High Dimensional Data
5	Research And Design Of Clustering Method Based On Large Data And High Dimensional Data
6	The Research On A Few Key Issues In High Dimensional Data Mining
7	A New High-dimensional Data Clustering Algorithm Based On GAs
8	High-Dimensional Data Analysis by Exploiting Low-Dimensional Models with Applications in Synchrophasor Data Analysis in Power System
9	High-dimensional Data Analysis Based On Scatter Plot Classification
10	Geometric Analysis On High-Dimensional Data: Theories, Algorithms And Applications