
The Analysis And Application Of Clustering Algorithm For Multi-Dimensional Data Streams

Posted on: 2010-11-15 | Degree: Master | Type: Thesis
Country: China | Candidate: X L Yang | Full Text: PDF
GTID: 2178360275451294 | Subject: Computer software and theory
Abstract/Summary:
As computer science has matured and the demands of the information society have grown, data mining has emerged as a way to discover interesting knowledge in huge amounts of information. With the rapid development of information technology and the widespread use of the Internet, more and more data are generated as streams, so data mining must now handle online, real-time data streams in addition to static data in databases. A data stream is a sequence of successive, ordered data items. Data streams are unbounded, arrive at high speed, and do not recur, so processing them effectively has become a new challenge in data mining and has attracted wide attention.
Because storage space is finite and a data stream is unbounded, it is impractical to store all of the data from a stream in order to obtain precise mining results. In the data stream processing model, mining algorithms therefore store only synopsis information about the stream and update it as new data arrive. Approximate answers to user queries can then be obtained from the synopsis that the mining algorithm maintains.
Cluster analysis is a significant topic in data mining and plays an important part in its development. Data mining technology must be applicable to the real world, where data have many attributes, so it must be able to process high-dimensional data. Different clustering algorithms use different techniques to handle high dimensionality.
This thesis explores clustering algorithms for high-dimensional data streams. Its contents fall into three parts:
(1) Discuss the window strategy in data stream mining, analyze the problems of the Cell Tree algorithm, and propose a new memory structure, LIST TREE.
(2) Based on the LIST TREE structure, present a new clustering algorithm, LTC, for processing high-dimensional data streams.
(3) Analyze the efficiency and performance of the algorithms through experiments.
The experimental results show that LTC not only adapts well to high-dimensional data streams but is also more efficient than Cell Tree.
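The synopsis-based processing model described in the abstract (keep a compact summary, update it as each item arrives, answer queries approximately from the summary) can be illustrated with a minimal Python sketch. This is a generic grid-count synopsis written for illustration only. It is not the thesis's LIST TREE structure or LTC algorithm, and it is not the Cell Tree algorithm. The class name `GridSynopsis`, the `cell_width` parameter, and the `min_fraction` density threshold are all assumptions introduced here.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple

Cell = Tuple[int, ...]


class GridSynopsis:
    """Hypothetical synopsis: one counter per occupied grid cell.

    Raw points are discarded after each update, so memory depends on the
    number of occupied cells rather than on the length of the stream.
    """

    def __init__(self, cell_width: float) -> None:
        if cell_width <= 0:
            raise ValueError("cell_width must be positive")
        self.cell_width = cell_width
        self.counts: Dict[Cell, int] = defaultdict(int)
        self.total = 0

    def _cell_of(self, point: Sequence[float]) -> Cell:
        # Map each coordinate to the index of the grid interval containing it.
        return tuple(int(x // self.cell_width) for x in point)

    def update(self, point: Sequence[float]) -> None:
        # Single pass: fold the new point into the synopsis, then forget it.
        self.counts[self._cell_of(point)] += 1
        self.total += 1

    def dense_cells(self, min_fraction: float) -> Dict[Cell, int]:
        # Approximate query: cells holding at least min_fraction of all points.
        threshold = min_fraction * self.total
        return {c: n for c, n in self.counts.items() if n >= threshold}


if __name__ == "__main__":
    synopsis = GridSynopsis(cell_width=0.5)
    stream = [(0.1, 0.2, 0.3), (0.2, 0.1, 0.4), (0.9, 0.9, 0.9), (0.15, 0.25, 0.35)]
    for point in stream:
        synopsis.update(point)
    print(synopsis.dense_cells(min_fraction=0.5))  # {(0, 0, 0): 3}
```

The answers are approximate because exact positions inside a cell are lost. In this naive form the number of possible cells grows exponentially with the number of dimensions, which is one reason high-dimensional streams call for more careful memory structures.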
Keywords/Search Tags:data mining, data stream, high dimension, clustering