
Research On Clustering Algorithm Over High Dimensional Data Stream Based On Irregular Grid Data

Posted on: 2015-06-01
Degree: Master
Type: Thesis
Country: China
Candidate: G H Hu
Full Text: PDF
GTID: 2298330422970946
Subject: Computer technology
Abstract/Summary:
Data stream clustering is an important research field in data stream mining. Existing algorithms, both at home and abroad, still have many problems when clustering high-dimensional data streams. Existing grid-based clustering algorithms are efficient, but their cluster quality is directly influenced by the grid granularity, and they are unable to deal with high-dimensional data streams. To address these problems, this thesis focuses on improving the cluster quality of grid- and density-based algorithms over data streams, and on the problem of clustering over high-dimensional data streams. Both are important data mining problems with broad applications, including network security, wireless sensors, industrial control, e-commerce, network communication, business intelligence, and so on.

First, an irregular grid-based subspace clustering algorithm over high-dimensional data streams is developed. An irregular grid structure is generated and dynamically maintained by splitting each dimension into different grid cells. The final clusters are obtained in subspaces formed by the dimensions associated with the corresponding clusters.

Second, a clustering algorithm based on a grid and a matrix over high-dimensional data streams is proposed. The algorithm adopts a two-phase framework. In the online component, grid cells (GCs) are employed to monitor the one-dimensional statistical data distribution of each dimension independently. Sparse grid cells that need to be deleted are identified using a predefined threshold. In the offline component, a grid matrix structure is generated from the dense GCs. When a clustering request arrives, the final multi-dimensional clusters are obtained by traversing the whole data space.

Finally, the above algorithms are implemented in Java. All experiments are performed on real and synthetic datasets. The experimental results show the feasibility and effectiveness of the algorithms.
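The online component described above can be illustrated with a minimal Java sketch. It is not the thesis implementation: the class name `OneDimGridMonitor`, its methods, and the parameters `cellWidth` and `minCount` are hypothetical, and the cells here have a fixed width for simplicity, whereas the thesis uses irregular cells. The sketch only shows the general idea of keeping one-dimensional counts per dimension and deleting cells that fall below a predefined threshold; the offline grid matrix step is not covered.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical sketch: per-dimension one-dimensional grid counts with threshold pruning. */
final class OneDimGridMonitor {
    // Assumption: fixed-width cells; the thesis describes irregular cells instead.
    private final double cellWidth;
    // One map per dimension: cell index -> number of points seen in that cell.
    private final List<Map<Integer, Long>> counts = new ArrayList<>();

    OneDimGridMonitor(int dimensions, double cellWidth) {
        if (dimensions <= 0 || cellWidth <= 0) {
            throw new IllegalArgumentException("dimensions and cellWidth must be positive");
        }
        this.cellWidth = cellWidth;
        for (int d = 0; d < dimensions; d++) {
            counts.add(new HashMap<>());
        }
    }

    /** Online step: update the cell count of each dimension independently. */
    void insert(double[] point) {
        if (point.length != counts.size()) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        for (int d = 0; d < point.length; d++) {
            int cell = (int) Math.floor(point[d] / cellWidth);
            counts.get(d).merge(cell, 1L, Long::sum);
        }
    }

    /** Deletes every cell whose count is below minCount (the predefined threshold). */
    void pruneSparse(long minCount) {
        for (Map<Integer, Long> perDimension : counts) {
            perDimension.values().removeIf(count -> count < minCount);
        }
    }

    /** Returns the surviving cell indices of one dimension in ascending order. */
    List<Integer> cells(int dimension) {
        return new ArrayList<>(new TreeSet<>(counts.get(dimension).keySet()));
    }
}
```

In this sketch a caller would feed each arriving point to `insert`, call `pruneSparse` periodically, and pass the surviving cells of each dimension to whatever offline step builds the multi-dimensional clusters.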
Keywords/Search Tags: High-dimensional data streams, data stream clustering, irregular grid, grid matrix