
Research On Clustering Algorithm Based On Irregular Grid And Subspace Of Descending Dimension

Posted on: 2013-02-10
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Cui
Full Text: PDF
GTID: 2218330362462918
Subject: Computer software and theory
Abstract/Summary:
Clustering over data streams has become a research hotspot, with wide application in bioinformatics, meteorological information analysis, intrusion detection, and other fields. Grid-based clustering algorithms can achieve good clustering quality, but that quality depends directly on the granularity of the grid partition. Moreover, partitioning the grid over the whole data space greatly increases the algorithm's complexity, which makes the approach unsuitable for clustering high-dimensional data streams. To address these shortcomings, this thesis studies clustering over data streams based on an irregular grid, together with a dimension-reduction method for high-dimensional data streams.

First, we introduce and discuss the related concepts and techniques, mainly data streams, data stream models, data slope technology, and the classification of clustering algorithms.

Second, we propose a clustering algorithm based on an adaptive, irregular grid. The algorithm consists of an online component and an offline component. In the online component, data records are read continuously, the grid is partitioned for newly arriving data, and the grid structure is adjusted incrementally. In the offline component, dense grids are clustered and boundary grids are handled; clusters are adjusted dynamically and noise points are deleted in real time.

Finally, we propose a clustering method for high-dimensional data streams based on effective dimensions and grid density. The method also consists of an online component and an offline component. In the online component, as the data stream arrives, the algorithm maps each data record to the original grid structure. In the offline component, the subspace is generated by the effective-dimension method; the original grid structure is then projected onto the subspace to form a new grid structure, on which clustering is performed.

The algorithms above are implemented in C++. Experimental results show that the proposed algorithms achieve high clustering quality and outperform similar algorithms, and that the anticipated results were achieved.
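
The online/offline split described in the abstract can be illustrated with a minimal sketch. This is not the thesis's algorithm: it assumes a uniform grid with a fixed cell width rather than an adaptive, irregular one, and it takes the retained dimensions as given because the abstract does not say how effective dimensions are chosen. The class `GridClusterer` and its parameters `cellWidth`, `minDensity`, and `keep` are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <initializer_list>
#include <map>
#include <queue>
#include <vector>

// Integer grid coordinates of a cell, one entry per dimension.
using Cell = std::vector<long>;

// Hypothetical illustration: uniform grid, fixed cell width (an assumption;
// the thesis uses an adaptive, irregular grid).
class GridClusterer {
public:
    GridClusterer(std::size_t dims, double cellWidth, std::size_t minDensity)
        : dims_(dims), width_(cellWidth), minDensity_(minDensity) {
        assert(dims > 0 && cellWidth > 0.0);
    }

    // Online step: map an arriving record to its cell and increment the count.
    void insert(const std::vector<double>& point) {
        assert(point.size() == dims_);
        Cell cell(dims_);
        for (std::size_t d = 0; d < dims_; ++d)
            cell[d] = static_cast<long>(std::floor(point[d] / width_));
        ++density_[cell];
    }

    // Offline step: give face-adjacent dense cells the same cluster label.
    // Cells below minDensity are treated as noise and receive no label.
    std::map<Cell, int> cluster() const {
        std::map<Cell, int> label;
        int next = 0;
        for (const auto& entry : density_) {
            if (entry.second < minDensity_ || label.count(entry.first)) continue;
            std::queue<Cell> frontier;
            frontier.push(entry.first);
            label[entry.first] = next;
            while (!frontier.empty()) {
                Cell cur = frontier.front();
                frontier.pop();
                for (std::size_t d = 0; d < dims_; ++d) {
                    for (long step : {-1L, 1L}) {
                        Cell nb = cur;
                        nb[d] += step;
                        auto it = density_.find(nb);
                        if (it != density_.end() && it->second >= minDensity_ &&
                            !label.count(nb)) {
                            label[nb] = next;
                            frontier.push(nb);
                        }
                    }
                }
            }
            ++next;
        }
        return label;
    }

    // Project the grid onto the dimensions listed in `keep`, summing the
    // counts of cells that coincide in the subspace. How `keep` is chosen
    // is outside this sketch.
    GridClusterer project(const std::vector<std::size_t>& keep) const {
        GridClusterer sub(keep.size(), width_, minDensity_);
        for (const auto& entry : density_) {
            Cell cell;
            for (std::size_t d : keep) {
                assert(d < dims_);
                cell.push_back(entry.first[d]);
            }
            sub.density_[cell] += entry.second;
        }
        return sub;
    }

private:
    std::size_t dims_;
    double width_;
    std::size_t minDensity_;
    std::map<Cell, std::size_t> density_;  // record count per occupied cell
};
```

In use, records would be passed to `insert` as they arrive, and `cluster` (optionally after `project`) would be called periodically. Only occupied cells are stored, so memory grows with the number of non-empty cells, not with the size of the full data space.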
Keywords/Search Tags: data stream, high dimension, irregular grid, grid density, clustering
Related items