Research On Automatic Clustering Algorithm Based On Bayesian Decision Theory

Posted on: 2018-06-15
Degree: Master
Type: Thesis
Country: China
Candidate: F Q Zhao
GTID: 2348330542452396
Subject: Applied Mathematics
Abstract/Summary:
Clustering analysis is an unsupervised learning method that plays an important role in exploring the key characteristics of data, revealing the distribution of spatial data, and predicting the development trend of data objects. In the era of big data, clustering analysis has received increasing attention, and a variety of clustering algorithms are used to solve practical problems. However, most existing clustering algorithms require the user to specify clustering parameters in advance. These parameters are usually set without theoretical guidance, they are inconvenient for users, and unreasonable settings often have a great impact on the clustering result. For these reasons, this thesis applies the idea of Bayesian decision theory to clustering: a risk assessment function for clustering schemes is constructed, and an automatic clustering algorithm is proposed based on this function and on the framework of an improved K-means clustering algorithm.

The thesis first proposes a method for selecting the initial centers of the K-means clustering algorithm based on the idea of maximum and minimum distance. Based on the minimum distance, the data object that contributes least to the inner distance is selected; based on the maximum distance, the object most dissimilar to the existing centers is selected. Under this max-min criterion, all objects are assigned to their nearest initial center, and the object farthest from the current centers is selected as the next initial center, until the required number of initial centers is reached. Experiments are carried out on artificial data sets and on real data sets from the UCI database. The results show that the proposed method avoids the uncertain clustering results caused by randomly selected initial centers, effectively improves the quality of the clustering results, and helps determine the final clustering scheme.

The thesis then proposes an automatic clustering algorithm based on Bayesian decision theory. The research object of Bayesian decision theory is extended from a single object to a pair of objects, so that the state of a clustering scheme can be analyzed through the states of the pairs in the whole data set. On this basis, a risk assessment function for clustering schemes is constructed, and an automatic clustering algorithm is built on it. The algorithm searches for a good clustering scheme by repeatedly partitioning the given data set. In each partition, it selects initial centers using the maximum and minimum distance, runs the K-means clustering algorithm to obtain the corresponding clustering scheme, and evaluates the risk of that scheme with the risk assessment function. From the change in the risk, the algorithm chooses a reasonable number of clusters and thus obtains the final clustering scheme automatically. Experiments are carried out on artificial data sets and on UCI standard data sets. The results show that the proposed method, without any parameter specified by the user in advance, obtains effective clustering results and determines a reasonable clustering scheme.
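The procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation, and several points are assumptions: the first center is taken to be the object with the smallest total distance to all other objects (one possible reading of "least contribution to the inner distance"), the search stops as soon as the risk no longer decreases, and the names `max_min_centers`, `kmeans`, `auto_cluster`, and `placeholder_risk` are hypothetical. The abstract does not give the pair-based risk assessment function, so the sketch takes the risk as a caller-supplied function and uses a clearly labeled stand-in for demonstration only.

```python
import math


def max_min_centers(points, k):
    """Pick k initial centers with a max-min distance rule (assumed reading)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    idx = range(len(points))
    # Assumption: the first center is the object with the smallest total
    # distance to all other objects.
    first = min(idx, key=lambda i: sum(math.dist(points[i], q) for q in points))
    chosen = [first]
    # nearest[i] is the distance from object i to its closest chosen center.
    nearest = [math.dist(p, points[first]) for p in points]
    while len(chosen) < k:
        # Next center: the object farthest from its nearest current center.
        nxt = max(idx, key=lambda i: nearest[i])
        chosen.append(nxt)
        nearest = [min(d, math.dist(p, points[nxt]))
                   for d, p in zip(nearest, points)]
    return [tuple(points[i]) for i in chosen]


def _assign(points, centers):
    """Label each object with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points]


def kmeans(points, centers, max_iter=100):
    """Plain Lloyd-style K-means started from the given centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        labels = _assign(points, centers)
        updated = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                updated.append(old)  # keep the center of an empty cluster
        if updated == centers:
            break
        centers = updated
    return _assign(points, centers), centers


def auto_cluster(points, risk, k_max=None):
    """Increase the number of clusters until the risk stops decreasing.

    `risk(points, labels, centers)` is supplied by the caller; the stopping
    rule is an assumption, not taken from the thesis.
    """
    k_max = k_max or len(points)
    best = None
    for k in range(1, k_max + 1):
        labels, centers = kmeans(points, max_min_centers(points, k))
        r = risk(points, labels, centers)
        if best is not None and r >= best[0]:
            break
        best = (r, labels, centers)
    return best[1], best[2]


def placeholder_risk(points, labels, centers, penalty=5.0):
    """Stand-in only: within-cluster squared error plus a per-cluster penalty.

    This is NOT the pair-based Bayesian risk function of the thesis.
    """
    sse = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return sse + penalty * len(centers)
```

Calling `auto_cluster(points, placeholder_risk)` on a list of coordinate tuples returns the labels and centers of the first scheme after which the stand-in risk stops falling. Passing the risk in as a function keeps the search loop independent of how the risk is defined, so the actual risk assessment function could be substituted without changing the rest of the sketch.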
Keywords/Search Tags: K-means clustering algorithm, Bayesian decision theory, Max-min distance, Automatic clustering algorithm