Font Size: a A A

Study On Clustering Algorithm Based On Graph Model

Posted on:2009-07-09Degree:MasterType:Thesis
Country:ChinaCandidate:X Q ZhangFull Text:PDF
GTID:2178360242982993Subject:Computer system architecture
Abstract/Summary:PDF Full Text Request
With the fast development of internet technique, more and more kinds of meta-data are displayed in mass web pages such as text, pictures, video and audio, which indicates a trend of cross-media. If there is not an effective search mechanism, it is very hard for us to gain useful information from such a large scaled web based database. Therefore, it is essential for us to study methods for searching in large scale of cross-media information. Generally, the information which user searches for is not stored in the database in a straightforward form. To meet the search demand of customers, some intelligent process is needed for the search engine, such as abstract generating, classifying, duplication removing and clustering.Based on survey of related papers and further study of clustering algorithms, the following work has been done in improvement and application of clustering algorithm based on graph model.1. We present an algorithm called voting partition affinity propagation which combines voting scheme with affinity propagation clustering algorithm. It is composed of three parts: relaxed multi-root minimum spanning tree assign dicipline, partition affinity propagation clustering algorithm and voting clustering ensemble scheme. It has advantages of both affinity propagation and voting k-means. Experiments on both simulated and real data set approve its validity.2. We display a scheme that can deal with problem of clustering large scaled data set. It was based on voting clustering ensemble algorithm and random partition. And this framework is used to extend affinity propagation clustering algorithm to deal with larger scaled and arbitrary shaped data set. Experiment results validate the feasibility and effectiveness of this extension.3. We propose a model for image collection which is based on voting partition affinity propagation.The main contribution and innovation of this paper are displayed on the improvement and application of the existing algorithms: 1. Proposing partition affinity propagation clustering algorithm. The algorithm can generate random partition within the substantial cluster when output number of clusters is larger than the intrinsic one, thus to meet the demand for randomicity of voting clustering ensemble scheme.2. Presenting relaxed multi-root minimum spanning tree data point assign algorithm. The discipline can reduce the possibility of mis-assign while mostly keep the randomicity of clustering result.3. Proposing to combine affinity propagation clustering algorithm with voting clustering ensemble scheme through partition affinity propagation using relaxed multi-root minimum spanning tree assign discipline. The problem of how to choose an appropriate threshold for the voting scheme to get a good consistent clustering result is also discussed under the measurement of partition consistent index.4. Displaying a framework that can extend clustering algorithms of relative high time and space complexity so that they can deal with larger scaled data set. Firstly, it divides the data set into several sub-blocks randomly and clusters these sub-blocks. Such operation is done for certain times. Then, the above clustering results are ensembled using voting scheme. Affinity propagation clustering algorithm is extended by this framework to deal with larger scaled and arbitrary shaped data set.5. Introducing a model for image collection which is based on voting partition affinity propagation. Developers can build an application based on this proposed framework to facilitate people to get image resources of certain subject from existing image search engines more conveniently.
Keywords/Search Tags:clustering, graph model, affinity propagation, clustering ensemble, voting
PDF Full Text Request
Related items