Font Size: a A A

Research On K-medoids Clustering Algorithm Based On Breadth-first Search

Posted on:2016-07-27Degree:MasterType:Thesis
Country:ChinaCandidate:Y M ZhouFull Text:PDF
GTID:2348330488482012Subject:Communication and Information System
Abstract/Summary:PDF Full Text Request
The rapid development of network and the constant improvement of computer storage technology makes data explosion, it is becoming more and more difficult to find useful information from the vast amount of data. And the data mining technology comes into being to digging valuable information from the huge heap of data. Data mining is the procedure of extracting of original unknown and potentially valuable knowledge and rules in the database, which is widely applied in many fields in recent years. As an important research branch of the data mining technology, cluster analysis is an important method of data partition or grouping, it aims to divide data into meaningful clusters by the similarity between data objects.The main research object of this paper is the K-medoids clustering algorithm,is one of a partition method. First of all, this paper introduces the related concepts of clustering analysis then analyzes K-medoids clustering algorithm emphatically. K-medoids algorithm has the advantage of not easily affected by extreme data and broad application. But it is sensitive to the initial centers, the centers of random selection, as well as the shortcoming of poor accuracy. The main researches of this paper are as follows and make corresponding improvement according to the disadvantages.Firstly, aiming at the problem that traditional K-medoids clustering algorithm is sensitive to the initial centers, we propose a method using the granular computing to make data protocol for the traditional K-medoids algorithm, then gaining effective particles, and selecting K corresponding center points as the initial center points from them. Using the data protocol algorithm to test based on the Iris and Wine dataset, the experimental results show that the initial center points which are located in different clusters respectively. As a result, effectively avoid the problem that the traditional algorithm is sensitive to the data protocol.Then, under the condition of effective data protocol, the breadth-first search strategy is proposed to solve the problems that the traditional K-medoids clustering algorithm has defects that its convergence rate is slow and cluster accuracy is not high enough. According to the similarity between the objects, we use the breath-first search to traverse the binary tree to find out K optimal centers in order to reduce the number of iterations in the processes of clustering algorithm. Meanwhile, a novel criterion function balancing the inner-cluster distance and among-clusters distance is proposed in order to enhance the adaptability and precision of clustering algorithm. Tested on a number of standard data set Iris and Wine in UCI, the experimental results show that this new algorithm effectively reduces the number of iterations and improves the accuracy of clustering at the same time.
Keywords/Search Tags:K-medoids clustering algorithm, granular computing, breadth-first search strategy, binary tree of similar object, criterion function
PDF Full Text Request
Related items