Research On Clustering Algorithm For Heterogeneous Objects Based On Information Dissimilarity And Irregular Grid

Posted on: 2013-05-03
Degree: Master
Type: Thesis
Country: China
Candidate: X F Wang
Full Text: PDF
GTID: 2248330392454810
Subject: Computer application technology
Abstract/Summary:
Clustering of heterogeneous objects has become a research hotspot. It can be widely applied to bioinformatics, meteorological information analysis, and intrusion detection. The clustering quality of an algorithm that deals with heterogeneous objects is determined by its dissimilarity measure. Moreover, the partition of the grid also directly affects the clustering precision. To address these problems, this paper focuses on a clustering algorithm for heterogeneous objects based on information dissimilarity and a clustering algorithm for heterogeneous streams based on an irregular grid.

Firstly, we analyze and discuss the related concepts and techniques. These mainly include partition-based clustering algorithms, hierarchical clustering algorithms, grid-based and density-based clustering algorithms, and algorithms for dealing with heterogeneous objects.

Secondly, we propose a clustering algorithm for heterogeneous objects based on information dissimilarity. The algorithm defines the heterogeneous information dissimilarity between two heterogeneous objects based on Kolmogorov information theory. In the clustering process, the algorithm selects the initial cluster centers by the maximum sum of dissimilarity. After that, each remaining object is assigned to the cluster center with which it has the smallest dissimilarity, and the criterion function is calculated. The cluster centers are updated iteratively, and the process stops when the criterion function converges or the iteration number reaches the pre-set threshold.

Thirdly, we propose a clustering algorithm for heterogeneous streams based on an irregular grid. The algorithm consists of an online component and an offline component. In the online component, data records are read continuously, the grid is constructed according to the continuous property and the grid radius, and the grid feature vector is updated. In the offline component, the grids constructed during the online process are clustered. An undirected graph is built from the grid central points and the distances between them. A minimum spanning tree (MST) is obtained from the undirected graph, and k clusters are obtained by cutting the k-1 longest edges of the MST.

Finally, experimental results show that the proposed methods are feasible and effective, and an analysis of the experimental results is given.
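
The offline step described in the abstract (build an undirected graph over grid central points, take its MST, cut the k-1 longest edges) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the Euclidean distance, the complete graph over all center points, Kruskal's algorithm, and the names `kruskal_mst` and `mst_clusters` are all assumptions made for illustration.

```python
import math
from itertools import combinations


def find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def kruskal_mst(points):
    """Return the MST edges (weight, i, j) of the complete graph on `points`.

    Assumption: edge weight is the Euclidean distance between center points.
    """
    n = len(points)
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    parent = list(range(n))
    mst = []
    for w, i, j in edges:
        ri, rj = find(parent, i), find(parent, j)
        if ri != rj:  # edge joins two different components, so keep it
            parent[ri] = rj
            mst.append((w, i, j))
    return mst


def mst_clusters(points, k):
    """Cut the k-1 longest MST edges and label the resulting components."""
    n = len(points)
    mst = sorted(kruskal_mst(points))
    kept = mst[:max(len(mst) - (k - 1), 0)]  # drop the k-1 heaviest edges
    parent = list(range(n))
    for _, i, j in kept:
        parent[find(parent, i)] = find(parent, j)
    labels = {}
    # Number clusters in order of first appearance.
    return [labels.setdefault(find(parent, i), len(labels)) for i in range(n)]


if __name__ == "__main__":
    # Hypothetical grid center points forming three well-separated groups.
    centers = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (20, 0)]
    print(mst_clusters(centers, 3))
```

Removing the k-1 longest edges from a spanning tree always leaves exactly k connected components, which is why no separate check on the number of clusters is needed when k does not exceed the number of grid centers.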
Keywords/Search Tags:data stream, clustering, heterogeneous objects, information dissimilarity, irregular grid
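
The abstract defines information dissimilarity on the basis of Kolmogorov information theory but does not give the formula. Kolmogorov complexity is not computable, so a common practical stand-in is the normalized compression distance (NCD), which replaces complexity with compressed length. The sketch below is an assumption in that spirit, not the measure defined in the thesis: the use of `zlib`, the byte-string representation of objects, and the names `ncd` and `assign_to_centers` are all hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Compressed length in bytes, used as a rough proxy for information content."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    Values near 0 suggest the objects share most of their information;
    values near 1 suggest they share little.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def assign_to_centers(objects, centers):
    """Assign each object to the index of the center with the smallest dissimilarity."""
    return [
        min(range(len(centers)), key=lambda j: ncd(obj, centers[j]))
        for obj in objects
    ]


if __name__ == "__main__":
    # Hypothetical objects, serialized to bytes before comparison.
    centers = [b"abc" * 100, bytes(range(256)) * 2]
    objects = [b"abc" * 90, bytes(range(256)) * 3]
    print(assign_to_centers(objects, centers))
```

Because any object that can be serialized to bytes can be compared this way, a compression-based measure suits heterogeneous data without a hand-crafted per-attribute distance. The sketch covers only the assignment step; the initial-center selection and the criterion function mentioned in the abstract are not shown.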