Dynamic data mining on multi-dimensional data

Posted on:2007-05-02

Degree:Ph.D

Type:Dissertation

University:State University of New York at Buffalo

Candidate:Shi, Yong

GTID:1458390005480049

Subject:Computer Science

Abstract/Summary:

The generation of multi-dimensional data has proceeded at an explosive rate in many disciplines with the advance of modern technology, greatly increasing the challenge of comprehending and interpreting the resulting mass of data. Existing data analysis techniques have difficulty handling multi-dimensional data. Multi-dimensional data is challenging for data analysis because of the inherent sparsity of the points.

A first step toward addressing this challenge is the use of clustering techniques, which are essential in the data mining process for revealing natural structures and identifying interesting patterns in the underlying data. Cluster analysis identifies homogeneous and well-separated groups of objects in databases. The need to cluster large quantities of multi-dimensional data is widely recognized. Clustering is a classical problem in the database, artificial intelligence, and theoretical literature, and it plays an important role in many fields of business and science.

Many approaches have also been designed for outlier detection. In many situations, clusters and outliers are concepts whose meanings are inseparable from each other, especially in data sets with noise. Thus, it is necessary to treat clusters and outliers as equally important concepts in data analysis.

It is well acknowledged that in the real world a large proportion of data has irrelevant features, which may reduce the accuracy of some algorithms. High-dimensional data sets continue to pose a challenge to clustering algorithms at a very fundamental level. One well-known technique for improving data analysis performance is dimension reduction, which is often used in clustering, classification, and many other machine learning and data mining applications.

Many approaches have been proposed to index high-dimensional data sets for efficient querying.
Although most of them efficiently support nearest neighbor search for low-dimensional data sets, their performance degrades rapidly as dimensionality increases. In addition, the dynamic insertion of new data can leave the original structures unable to handle the data sets efficiently, since it may greatly increase the amount of data accessed per query.

In this dissertation, we study the problems mentioned above. We propose a novel data pre-processing technique called shrinking, inspired by Newton's Universal Law of Gravitation, which optimizes the inner structure of the data. We then propose a shrinking-based clustering algorithm for multi-dimensional data and extend it to the dimension reduction field, resulting in a shrinking-based dimension reduction algorithm. (Abstract shortened by UMI.)
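The abstract does not give the shrinking formulas, so what follows is only a hypothetical sketch of the general idea of gravity-inspired shrinking, not the dissertation's algorithm. It assumes unit masses and lets each point be attracted only by neighbors within a fixed `radius`. Each neighbor is weighted by the inverse square of its distance, echoing Newton's law, and the point moves a fraction `alpha` of the way toward that weighted center. The function name `shrink` and the parameters `radius`, `alpha`, and `iterations` are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def shrink(
    points: Sequence[Sequence[float]],
    radius: float,
    alpha: float = 0.5,
    iterations: int = 3,
) -> List[Point]:
    """Hypothetical gravity-inspired shrinking (illustrative assumption only).

    In each iteration, every point moves a fraction `alpha` of the way toward
    the inverse-square-weighted center of its neighbors within `radius`.
    All points are updated synchronously from the previous positions.
    """
    pts: List[Point] = [tuple(float(x) for x in p) for p in points]
    for _ in range(iterations):
        new_pts: List[Point] = []
        for i, p in enumerate(pts):
            total_w = 0.0
            acc = [0.0] * len(p)
            for j, q in enumerate(pts):
                if i == j:
                    continue
                d = math.dist(p, q)
                # Skip coincident points and points outside the neighborhood.
                if d < 1e-12 or d > radius:
                    continue
                w = 1.0 / (d * d)  # inverse-square weight, assuming unit masses
                total_w += w
                for k, qk in enumerate(q):
                    acc[k] += w * qk
            if total_w == 0.0:
                new_pts.append(p)  # isolated point: no neighbors, no movement
                continue
            center = [a / total_w for a in acc]
            new_pts.append(tuple(pk + alpha * (ck - pk) for pk, ck in zip(p, center)))
        pts = new_pts
    return pts


if __name__ == "__main__":
    square_a = [(0, 0), (1, 0), (0, 1), (1, 1)]
    square_b = [(10, 10), (11, 10), (10, 11), (11, 11)]
    for point in shrink(square_a + square_b, radius=3.0):
        print(tuple(round(x, 3) for x in point))
```

Run on two well-separated unit squares with `radius=3.0`, each square contracts toward its own center, and the squares do not attract each other. This yields the kind of tighter, better-separated structure a shrinking pre-processing step is meant to hand to a later clustering algorithm. The sketch moves each point toward a weighted neighbor center instead of applying the raw 1/d² force. This keeps every step bounded even when two points are very close.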

Keywords/Search Tags:

Data

Related items

1	Seismic Achievement Data ETL Platform Architecture Design And Software System Implementation
2	The Research And Application Of Data Preprocessing In XML Data Warehouse
3	Research On Related Issues Of Unstructured Data
4	The Data Integration, Analysis And Utilization For Hospital Information Based On The Data Warehouse
5	Design And Implementation Of Data Mining Support Subsystem Based On Big Data Of Power
6	Design And Implementation Of Environmental Monitoring Data Management System
7	Research On The Problems And Countermeasures Of Domestic Data Journalism Practice
8	Study On Data Dependency-Based Data Quality Processing Techniques In Data Integration
9	Big Data And Research Of Big Data In Modern Internet Applications
10	Design And Implementation Of The Bayonet Data Integration Platform