
Realizing a feature-based framework for scientific data mining

Posted on: 2007-02-11
Degree: Ph.D.
Type: Dissertation
University: The Ohio State University
Candidate: Mehta, Sameep
GTID: 1448390005966302
Subject: Computer Science
Abstract/Summary:
This dissertation presents an efficient realization of a feature-based framework for analyzing scientific data. The main components of the framework are feature detection, feature classification, feature verification, and modeling of the evolutionary behavior of features. The usefulness of the first three steps is demonstrated on datasets from computational molecular dynamics. Modeling the evolutionary behavior of features involves (i) understanding the trajectory of an individual feature; (ii) discovering the changes that features undergo through their interactions; and (iii) understanding and deriving spatio-temporal relationships among features.

A rule-based feature detection algorithm extracts the features. The rules are derived from domain-specific properties. The algorithm is highly robust to noise: the features detected in noisy datasets are consistent with those detected in noise-free data.

The trajectory of a feature is represented by physically meaningful parameters: linear velocity, angular velocity, and scale. Most existing techniques abstract a feature to a single point and account only for changes in its position. The proposed representation accounts for changes in the position, orientation, and size of the feature, and it also helps establish spatial and spatio-temporal relationships among features. The scheme is evaluated on datasets from molecular dynamics and fluid flows.

The interactions among co-existing features are captured by a set of critical events: continuation, merging, bifurcation, creation, and dissipation. The algorithms establish correspondence among features based on the degree of overlap between features in consecutive time steps.

Finally, a visual toolkit is developed that helps the user establish spatial and spatio-temporal relationships. The toolkit achieves real-time performance, and its usefulness is shown on 2D fluid-flow datasets.

Before these algorithms were developed, manual analysis of a small 100 MB dataset took around 6 weeks. Now, feature extraction and classification for a 10 GB molecular dynamics dataset can be performed in 25 hours, which is faster than the 35 hours needed to generate the data. (Abstract shortened by UMI.)...
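The abstract describes rule-based feature detection built on domain-specific properties but does not state the rules. As a minimal illustration only, the sketch below assumes a hypothetical rule of the kind used for lattice defects: a point (atom) is anomalous when its coordination number (neighbours within a cutoff) differs from the value expected in a defect-free structure, and anomalous points lying within the cutoff of one another are grouped into a single feature. `CUTOFF`, `EXPECTED_COORD`, `coordination`, and `detect_features` are all hypothetical names, and the defaults describe a toy chain in which each interior point has two neighbours.

```python
import math
from collections import deque

# Hypothetical rule parameters: illustrative assumptions, not values from the dissertation.
CUTOFF = 1.1          # neighbour cutoff distance
EXPECTED_COORD = 2    # coordination number expected in the defect-free structure


def coordination(points, cutoff=CUTOFF):
    """Count the neighbours within `cutoff` of each point (O(n^2), for clarity)."""
    counts = [0] * len(points)
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= cutoff:
                counts[i] += 1
                counts[j] += 1
    return counts


def detect_features(points, cutoff=CUTOFF, expected=EXPECTED_COORD):
    """Hypothetical rule: a point is anomalous if its coordination differs from
    `expected`; anomalous points within `cutoff` of one another form one feature."""
    counts = coordination(points, cutoff)
    unvisited = {i for i, c in enumerate(counts) if c != expected}
    features = []
    while unvisited:
        seed = unvisited.pop()
        group, queue = [seed], deque([seed])
        while queue:  # breadth-first region growing over anomalous points only
            i = queue.popleft()
            near = [j for j in unvisited if math.dist(points[i], points[j]) <= cutoff]
            for j in near:
                unvisited.remove(j)
                group.append(j)
                queue.append(j)
        features.append(sorted(group))
    return sorted(features)
```

On a straight chain of six unit-spaced points with one extra point placed near the third, the two chain ends and the locally over-coordinated pair are reported as three separate features. A real detector would replace the toy rule with the domain's own properties and use a spatial index instead of the quadratic neighbour search.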
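The trajectory representation in the abstract tracks position, orientation, and size rather than a single point. One simple way to estimate such parameters between two time steps, assuming each feature is a 2D point set, is to use second-order moments: the centroid gives position, the principal axis of the covariance gives orientation, and the square roots of its eigenvalues give the extents along the principal axes. This moment-based approach is an assumed stand-in, not necessarily the dissertation's method, and `moments` and `motion_parameters` are hypothetical names.

```python
import math


def moments(points):
    """Centroid, principal-axis angle, and principal extents of a 2-D point set."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points) / n
    syy = sum((y - cy) ** 2 for _, y in points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in points) / n
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)  # major-axis orientation
    half_trace = (sxx + syy) / 2
    r = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    # Square roots of the covariance eigenvalues (major, minor).
    extents = (math.sqrt(half_trace + r), math.sqrt(max(half_trace - r, 0.0)))
    return (cx, cy), angle, extents


def motion_parameters(prev, curr, dt=1.0):
    """Linear velocity, angular velocity, and per-axis scale between two snapshots."""
    (c0, a0, e0), (c1, a1, e1) = moments(prev), moments(curr)
    velocity = ((c1[0] - c0[0]) / dt, (c1[1] - c0[1]) / dt)
    # Principal axes are sign-ambiguous, so wrap the angle change into [-pi/2, pi/2).
    dtheta = (a1 - a0 + math.pi / 2) % math.pi - math.pi / 2
    angular_velocity = dtheta / dt
    scale = tuple(b / a if a > 0 else float("nan") for a, b in zip(e0, e1))
    return velocity, angular_velocity, scale
```

For a rectangle that is scaled by 2, rotated by 30 degrees about its centroid, and shifted by (3, 1) in one time step, this returns a velocity of (3, 1), an angular velocity of pi/6, and a scale of 2 along both axes. Because the axis of a point set has no direction, rotations of more than 90 degrees per time step cannot be resolved this way.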
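The abstract states that correspondence is established from the degree of overlap between features in consecutive time steps and is classified into continuation, merging, bifurcation, creation, and dissipation. The sketch below shows that classification, assuming each feature is a set of grid-cell (or atom) ids and using a hypothetical absolute threshold `min_overlap` on the number of shared ids. The dissertation's exact overlap measure is not given here, and `classify_events` is a hypothetical name.

```python
def classify_events(prev, curr, min_overlap=1):
    """prev, curr: dicts mapping feature id -> set of cell ids at times t and t+1.
    Returns (event, sources, targets) tuples."""
    # Forward and backward overlap lists (hypothetical absolute-count threshold).
    fwd = {a: [b for b, cb in curr.items() if len(ca & cb) >= min_overlap]
           for a, ca in prev.items()}
    bwd = {b: [a for a, ca in prev.items() if len(ca & cb) >= min_overlap]
           for b, cb in curr.items()}
    events = []
    for a, targets in fwd.items():
        if not targets:
            events.append(("dissipation", (a,), ()))
        elif len(targets) > 1:
            events.append(("bifurcation", (a,), tuple(targets)))
        elif len(bwd[targets[0]]) == 1:  # one-to-one in both directions
            events.append(("continuation", (a,), (targets[0],)))
    for b, sources in bwd.items():
        if not sources:
            events.append(("creation", (), (b,)))
        elif len(sources) > 1:
            events.append(("merging", tuple(sources), (b,)))
    return events
```

In this sketch a feature that both splits and has a piece absorbed elsewhere receives both a bifurcation and a merging label. Resolving such many-to-many cases, for example with a normalized overlap ratio, is left open as a design choice.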
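The closing performance claim can be checked as a rate comparison. Assuming, as the abstract implies, that the 35-hour generation time refers to the same 10 GB molecular dynamics dataset, processing throughput exceeds the rate at which the simulation produces data:

```latex
\frac{10\ \text{GB}}{25\ \text{h}} = 0.4\ \text{GB/h}
\;>\;
\frac{10\ \text{GB}}{35\ \text{h}} \approx 0.29\ \text{GB/h}
```

Under that assumption, extraction and classification can keep pace with data generation.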
Keywords/Search Tags: Feature, Data, Framework