
Research On Intervals Division-Based Clustering Algorithm For High-Dimension Data Streams

Posted on: 2011-11-07
Degree: Master
Type: Thesis
Country: China
Candidate: L N Xu
Full Text: PDF
GTID: 2178360302994528
Subject: Computer application technology
Abstract/Summary:
As data sets grow in both size and dimensionality, traditional clustering algorithms can no longer produce meaningful clustering results. To cluster high-dimensional data streams, this paper focuses on three problems in the clustering process: the effective use of storage space, the updating of clusters, and the applicability to the data. Two clustering algorithms are presented: a dynamic clustering algorithm based on optimal interval division, and an information entropy clustering algorithm based on space division.

Firstly, an approach to memory-based data partitioning is defined. With this method, the intervals obtained match the size of the memory cell space, which avoids the waste of resources caused by idle memory cells. Two kinds of interval partitioning are then proposed: partitioning the optimal interval into high-density grids, and partitioning the data space into unit spaces.

Secondly, a clustering algorithm based on optimal interval division for high-dimensional data streams (DOIC for short) is proposed. With the memory-based data partition and optimal interval division, the algorithm obtains high-density grids that are closer to the actual distribution of the stream data. New data streams are inserted by merging HDU-trees. Meanwhile, to eliminate the impact of historical data on the clustering results, a weight is used to decay the historical data. The algorithm achieves better space scalability and higher clustering quality.

Lastly, IEC (Information Entropy Clustering algorithm based on space division) is proposed. This algorithm uses information entropy to guide the clustering of data streams. To reduce the cost of projection calculations, the data set is divided into space units. The information entropy of two space units is compared, and the space units with smaller information entropy are merged.
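
The abstract does not give the form of the weight used to decay historical data in DOIC. As a rough illustration only, the sketch below assumes equal-width grid cells and an exponential decay factor per time unit. The class `DecayedGrid`, its parameters `cell_width` and `decay`, and the density threshold are all hypothetical names chosen for this example; the sketch does not reproduce the optimal interval division or the HDU-tree of the thesis.

```python
import math
from collections import defaultdict


class DecayedGrid:
    """Hypothetical sketch: grid cells whose weights fade as data ages."""

    def __init__(self, cell_width, decay=0.5):
        self.cell_width = cell_width      # assumed equal-width intervals
        self.decay = decay                # assumed multiplier per time unit
        self.weight = defaultdict(float)  # cell -> weight at last update
        self.last_update = {}             # cell -> time of last update

    def cell_of(self, point):
        # Map each coordinate to the index of the interval containing it.
        return tuple(math.floor(x / self.cell_width) for x in point)

    def insert(self, point, t):
        # Decay the cell's old weight up to time t, then add the new point.
        cell = self.cell_of(point)
        elapsed = t - self.last_update.get(cell, t)
        self.weight[cell] = self.weight[cell] * self.decay ** elapsed + 1.0
        self.last_update[cell] = t

    def density(self, cell, t):
        # Decayed weight of a cell as seen at time t (0.0 if never updated).
        if cell not in self.last_update:
            return 0.0
        return self.weight[cell] * self.decay ** (t - self.last_update[cell])

    def dense_cells(self, t, threshold):
        # Cells whose decayed weight still reaches the threshold at time t.
        return [c for c in self.last_update if self.density(c, t) >= threshold]


grid = DecayedGrid(cell_width=1.0, decay=0.5)
grid.insert((0.2, 0.7), t=0)
grid.insert((0.9, 0.1), t=1)   # same cell (0, 0); old weight is halved first
grid.insert((5.5, 5.5), t=2)   # a different cell
recent = grid.dense_cells(t=2, threshold=0.8)
```

Under these assumptions, a cell that stops receiving points drops below the threshold after a few time units, so older data gradually loses influence on which grids count as high-density.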
Keywords/Search Tags: Data stream, High-dimension, Clustering, Intervals division, Information Entropy