
Research on a High-Dimensional Data Stream Clustering Algorithm Based on Damped Window and Pruning List Tree

Posted on: 2011-12-28
Degree: Master
Type: Thesis
Country: China
Candidate: D X Wang
GTID: 2178330332967474
Subject: Computer application technology
Abstract/Summary:
In recent years, with the spread of computer and information technology represented by the Internet, data volumes have grown rapidly. Accumulated data now reaches the terabyte (TB) or even petabyte (PB) scale. In practice, most of this data takes the form of dynamic, continuous data streams, which differ from traditional data in that traditional data is stored on a static medium and can be accessed repeatedly. The characteristics of a data stream are: (1) large data scale; (2) high dimensionality; (3) fast arrival rate; (4) potentially unordered arrival; (5) each element can be accessed only once. As a result, many traditional clustering algorithms struggle to produce meaningful clustering results on data streams.

To address the "curse of dimensionality" in high-dimensional data streams, this thesis considers the following questions: (1) How can an effective clustering algorithm be designed that adapts to a continuously arriving high-dimensional data stream? (2) During clustering, how can more clusters be discovered and the clustering results improved? (3) During clustering, how can memory consumption be reduced? (4) During clustering, how can the algorithm's efficiency be increased and its running time shortened?

Building on a study of the classic algorithms and aiming to overcome their shortcomings, this thesis proposes a new high-dimensional data stream clustering algorithm. The main contributions are: (1) To control memory size and reduce memory consumption, the thesis proposes a synopsis data structure, the pruning list tree (PL-Tree for short), which stores summary information about the data stream. When a clustering is requested, it can output approximate clustering results online. Data elimination and pruning strategies are the core techniques used to control memory size and improve the algorithm's efficiency. (2) To obtain an efficient clustering algorithm that adapts to a continuously arriving high-dimensional data stream, the thesis proposes, on top of the PL-Tree synopsis structure, a high-dimensional data stream clustering algorithm based on the damped window and the PL-Tree, PLStream for short. Experiments are used to demonstrate its effectiveness. (3) To verify that the new algorithm is effective, the thesis compares it with the classic CELL TREE algorithm; the experiments show that the new algorithm is clearly better in spatial scalability and clustering efficiency.
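The abstract does not describe the internals of PL-Tree or PLStream, so the following is only a minimal, generic sketch of the two ideas it names: a damped window (older points count for exponentially less) and pruning (summaries whose decayed weight becomes negligible are discarded to bound memory). The flat grid of cells, the decay form 2^(-lam * age), and every name and parameter here (`Summary`, `DampedSynopsis`, `lam`, `cell_width`, `min_weight`) are assumptions made for illustration, not the thesis's actual design.

```python
import math
from dataclasses import dataclass


@dataclass
class Summary:
    """Decayed summary of one grid cell (hypothetical structure, not PL-Tree)."""
    weight: float = 0.0   # decayed count of the points that fell into the cell
    last_update: int = 0  # timestamp at which `weight` was last brought up to date

    def decay_to(self, now: int, lam: float) -> None:
        # Damped window: a contribution of age dt is scaled by 2 ** (-lam * dt).
        self.weight *= 2.0 ** (-lam * (now - self.last_update))
        self.last_update = now


class DampedSynopsis:
    """Keeps one decayed summary per non-empty grid cell (assumed layout)."""

    def __init__(self, lam: float = 0.5, cell_width: float = 1.0,
                 min_weight: float = 0.1) -> None:
        self.lam = lam                # decay rate (assumed parameter)
        self.cell_width = cell_width  # side length of a grid cell (assumed)
        self.min_weight = min_weight  # pruning threshold (assumed)
        self.cells = {}               # maps cell key (tuple of ints) -> Summary

    def _key(self, point):
        # Map a point to the integer coordinates of the cell containing it.
        return tuple(math.floor(x / self.cell_width) for x in point)

    def insert(self, point, now: int) -> None:
        # Each stream element is read once and folded into its cell summary.
        key = self._key(point)
        summary = self.cells.get(key)
        if summary is None:
            summary = self.cells[key] = Summary(last_update=now)
        summary.decay_to(now, self.lam)
        summary.weight += 1.0

    def prune(self, now: int) -> int:
        # Drop cells whose decayed weight fell below the threshold.
        stale = []
        for key, summary in self.cells.items():
            summary.decay_to(now, self.lam)
            if summary.weight < self.min_weight:
                stale.append(key)
        for key in stale:
            del self.cells[key]
        return len(stale)
```

In this sketch memory is proportional to the number of cells that still carry non-negligible weight, not to the length of the stream, which is the general motivation for combining decay with pruning.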
Keywords/Search Tags: Data stream, High-dimension, Clustering, Damped window, Pruning list tree