Improvement And Application Of K-means Clustering Algorithm

Posted on: 2016-05-07
Degree: Master
Type: Thesis
Country: China
Candidate: Y D Xu
Full Text: PDF
GTID: 2308330464952604
Subject: Computer application technology
Abstract/Summary:
With the rapid development and spread of computer and network technology, and the wide adoption of databases and database management systems, the volume of data has grown explosively. Extracting valuable information from massive data has become an important problem, and data mining was proposed to address it. Data mining is the process of extracting valuable information or knowledge from massive data in order to support better-informed decision making. Cluster analysis, an unsupervised technique, partitions a data set into several subsets without prior knowledge. Each subset is regarded as a cluster: objects in the same cluster are similar to each other, while objects in different clusters are dissimilar. Cluster analysis continues to develop and improve, and it is now widely used in many fields.
Many clustering algorithms have been proposed for real applications. Among them, the K-means algorithm is widely used because it is simple to operate and efficient. However, K-means has several shortcomings that call for improvement. For example, its clustering result is very sensitive to the initial centroids, and it is prone to the dead-unit problem. Based on these observations, this thesis proposes an improved K-means algorithm that optimizes the choice of initial centroids in order to obtain good clustering results in linear time. Specifically, the number of centroids k is supplied by the user, as in the traditional K-means algorithm. The proposed method then partitions the data space into several subspaces, counts the data points in each subspace, and selects the subspaces containing the most data points as the initial centers. If different segments share the same highest point count, the method merges them into one segment.
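The initialization step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function name `grid_density_init`, the `bins` parameter, the equal-width partitioning of each dimension, and the use of each cell's mean point as its center are all assumptions, since the abstract does not specify them. The merge rule for tied segments is left out; ties are simply broken by cell index.

```python
from collections import defaultdict


def grid_density_init(points, k, bins=4):
    """Pick up to k initial centers from the most populated grid cells.

    Hypothetical helper: `bins` (segments per dimension) is an assumed
    parameter, not something the abstract defines.
    """
    dims = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]

    def cell_of(p):
        # Map a point to the index of its equal-width grid cell.
        idx = []
        for d in range(dims):
            span = highs[d] - lows[d]
            if span == 0:
                idx.append(0)
            else:
                i = int((p[d] - lows[d]) / span * bins)
                idx.append(min(i, bins - 1))  # the maximum falls in the last bin
        return tuple(idx)

    # Count points per cell by collecting the members of each cell.
    cells = defaultdict(list)
    for p in points:
        cells[cell_of(p)].append(p)

    # Densest cells first; ties broken by cell index (assumption, no merging).
    ranked = sorted(cells.items(), key=lambda kv: (-len(kv[1]), kv[0]))

    centers = []
    for _, members in ranked[:k]:
        n = len(members)
        centers.append(tuple(sum(m[d] for m in members) / n for d in range(dims)))
    return centers


sample = [(0.0, 0.0), (0.1, 0.1), (0.2, 0.0), (10.0, 10.0), (9.0, 9.0)]
centers = grid_density_init(sample, k=2, bins=2)
```

The bounds and bucketing passes are each linear in the number of points, and the sort runs only over non-empty cells. If fewer than k cells are non-empty, this sketch returns fewer than k centers, which a real implementation would need to handle.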
This thesis also defines a threshold distance for each cluster center, against which the distance between a data point and that center is compared; this reduces the computation spent on point-to-center distances. Experimental results on UCI datasets demonstrate the accuracy and efficiency of the improved algorithm in comparison with the conventional K-means algorithm.
The thesis then applies the proposed K-means algorithm to image segmentation by designing a region-based image segmentation method. The method first chooses a color space for the image pixels and then extracts each pixel's color, texture, and position features to form feature vectors. The proposed clustering algorithm is then applied to the derived feature vectors to carry out image segmentation, feature extraction, and image region partitioning. Finally, experimental results on image segmentation datasets indicate that the proposed clustering algorithm outperforms the conventional K-means algorithm.
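To make the feature-vector step concrete, here is a small sketch of how per-pixel color, texture, and position features might be assembled before clustering. Everything specific in it is an assumption: the abstract does not name the color space or the texture measure, so the sketch uses scaled RGB as a stand-in color space and the mean absolute gray-level difference to the neighbouring pixels as a stand-in texture feature. The function name `pixel_features` is hypothetical.

```python
def pixel_features(image):
    """Build one feature vector per pixel: (r, g, b, texture, x, y).

    `image` is a list of rows, each row a list of (r, g, b) tuples in 0..255.
    All features are scaled to the range 0..1.
    """
    height, width = len(image), len(image[0])
    gray = [[sum(px) / 3.0 for px in row] for row in image]

    def local_contrast(y, x):
        # Assumed texture measure: mean absolute gray difference
        # between the pixel and its existing 3x3 neighbours.
        diffs = []
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (dy or dx) and 0 <= ny < height and 0 <= nx < width:
                    diffs.append(abs(gray[y][x] - gray[ny][nx]))
        return sum(diffs) / len(diffs) / 255.0 if diffs else 0.0

    features = []
    for y in range(height):
        for x in range(width):
            r, g, b = image[y][x]
            features.append((
                r / 255.0, g / 255.0, b / 255.0,  # color (RGB stand-in)
                local_contrast(y, x),             # texture (assumed measure)
                x / max(width - 1, 1),            # horizontal position
                y / max(height - 1, 1),           # vertical position
            ))
    return features


tiny_image = [[(0, 0, 0), (255, 255, 255)]]
features = pixel_features(tiny_image)
```

The resulting vectors can be passed to any K-means variant; pixels assigned to the same cluster then form the candidate regions. Scaling every feature to 0..1 is a design choice of this sketch so that no single feature dominates the distance computation.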
Keywords/Search Tags: Cluster analysis, K-means algorithm, Image segmentation, Data mining