
Dynamic data mining on multi-dimensional data

Posted on: 2007-05-02
Degree: Ph.D.
Type: Dissertation
University: State University of New York at Buffalo
Candidate: Shi, Yong
Full Text: PDF
GTID: 1458390005480049
Subject: Computer Science
Abstract/Summary:
With the advance of modern technology, multi-dimensional data is being generated at an explosive rate in many disciplines, which greatly increases the challenge of comprehending and interpreting the resulting mass of data. Existing data analysis techniques have difficulty handling multi-dimensional data, largely because of the inherent sparsity of the points.

A first step toward addressing this challenge is the use of clustering techniques, which are essential in the data mining process for revealing natural structures and identifying interesting patterns in the underlying data. Cluster analysis is used to identify homogeneous and well-separated groups of objects in databases. The need to cluster large quantities of multi-dimensional data is widely recognized. Clustering is a classical problem in the database, artificial intelligence, and theoretical literature, and plays an important role in many fields of business and science.

Many approaches have also been designed for outlier detection. In many situations, clusters and outliers are concepts whose meanings are inseparable from each other, especially for data sets with noise. Thus, it is necessary to treat clusters and outliers as concepts of equal importance in data analysis.

It is well acknowledged that a large proportion of real-world data has irrelevant features, which may reduce the accuracy of some algorithms. High-dimensional data sets continue to pose a challenge to clustering algorithms at a very fundamental level. One well-known technique for improving data analysis performance is dimension reduction, which is often used in clustering, classification, and many other machine learning and data mining applications.

Many approaches have been proposed to index high-dimensional data sets for efficient querying. Although most of them can efficiently support nearest neighbor search for low-dimensional data sets, they degrade rapidly as dimensionality increases. In addition, the dynamic insertion of new data can leave the original structures unable to handle the data sets efficiently, since it may greatly increase the amount of data accessed for a query.

In this dissertation, we study the problems mentioned above. We propose a novel data pre-processing technique called shrinking, inspired by Newton's Universal Law of Gravitation, which optimizes the inner structure of data. We then propose a shrinking-based clustering algorithm for multi-dimensional data and extend the algorithm to dimension reduction, resulting in a shrinking-based dimension reduction algorithm. (Abstract shortened by UMI.)...
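
The abstract names the shrinking idea but does not spell out its mechanics. The sketch below is only an illustration of the general intuition, not the dissertation's algorithm: it assumes each point is pulled part of the way toward the mean of its nearby points, so dense groups contract while isolated points stay put. The function names and the `radius`, `rate`, and `iterations` parameters are hypothetical.

```python
import math


def shrink_step(points, radius=1.0, rate=0.5):
    """Move each point part of the way toward the mean of its neighbors.

    Assumption for illustration: "attraction" is modeled as a pull toward
    the centroid of all points within `radius`, not as a literal
    inverse-square force.
    """
    moved = []
    for p in points:
        # A point is always its own neighbor, so this list is never empty.
        neighbors = [q for q in points if math.dist(p, q) <= radius]
        centroid = [sum(coord) / len(neighbors) for coord in zip(*neighbors)]
        moved.append(tuple(a + rate * (c - a) for a, c in zip(p, centroid)))
    return moved


def shrink(points, radius=1.0, rate=0.5, iterations=5):
    """Apply the hypothetical shrinking step several times."""
    for _ in range(iterations):
        points = shrink_step(points, radius, rate)
    return points


def spread(points):
    """Largest pairwise distance within a group of points."""
    return max(math.dist(p, q) for p in points for q in points)
```

Under these assumptions, a tight group of points becomes tighter after each pass, while a point with no neighbors inside `radius` does not move, which is one way clusters and outliers could become easier to tell apart before a clustering step.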
Keywords/Search Tags:Data