
Design And Implementation Of Initial Cluster Center Selection Algorithm Based On Coupling Attribute

Posted on: 2019-03-08
Degree: Master
Type: Thesis
Country: China
Candidate: J Wen
Full Text: PDF
GTID: 2428330551460303
Subject: Software engineering
Abstract/Summary:
Much real-world data is categorical: its attribute values are states that lack the geometric properties of numerical data and cannot be used in numerical calculations, so clustering algorithms designed for numerical data cannot be applied to categorical data directly. In a clustering algorithm, the dissimilarity between objects is determined by a distance function. This measure of dissimilarity, or distance, is the basis of the algorithm and strongly influences its clustering efficiency, so it is important to compute the similarity or distance between two categorical objects in a reasonable way. Distance measures for categorical data remain an active research topic that many scholars have studied from different angles.

Second, the performance of a partition-based clustering algorithm depends directly on the selection of the initial cluster centers. Common partition-based algorithms usually use randomly selected samples as the initial centers, which may cause the clustering process to converge to a locally optimal result, whereas initial centers that truly reflect the data distribution may approach the globally optimal cluster centers. Many scholars have also studied this problem from different angles.

In view of these two topics, this thesis studies distance learning and initial center selection for categorical data and analyzes existing algorithms. The main research results are as follows:

A weighted distance based on attribute coupling is proposed. The distance has two parts: the weighted intra-coupled distance within an attribute and the weighted inter-coupled distance between attributes. The degree of dependence between attributes is used to weight the coupling between attributes, which improves the accuracy of the distance measure between objects. The attribute weight measures the importance of each attribute in the distance measure and improves the accuracy of the clustering results.

Based on the theory of attribute coupling, the method of selecting initial centers using density and distance is improved: the initial centers are selected by calculating the density of each object based on attribute coupling, which improves the efficiency of partition-based clustering.

These results enrich research on distance measurement for categorical data and, to some extent, provide a new approach to clustering categorical data. We believe that continued research on such algorithms can solve more practical problems.
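The general idea of choosing initial centers by density and distance can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the abstract does not give the weighted coupled distance, so a plain Hamming (overlap) distance stands in for it, and the density formula, the density-times-distance score, and the names `hamming` and `select_initial_centers` are all assumptions made for illustration.

```python
def hamming(a, b):
    """Stand-in dissimilarity: number of attributes on which two objects differ.
    The thesis uses a weighted coupled distance instead (not specified here)."""
    return sum(x != y for x, y in zip(a, b))


def select_initial_centers(data, k, dist=hamming):
    """Pick k initial centers (as indices into data) by density and distance.

    Assumed scheme: the first center is the densest object; each later
    center maximizes density * distance to the nearest chosen center.
    """
    n = len(data)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of objects")

    # Pairwise distance matrix under the chosen dissimilarity.
    d = [[dist(data[i], data[j]) for j in range(n)] for i in range(n)]

    # Assumed density: larger when the average distance to all objects is small.
    density = [1.0 / (1.0 + sum(row) / n) for row in d]

    centers = [max(range(n), key=lambda i: density[i])]
    while len(centers) < k:
        candidates = (i for i in range(n) if i not in centers)
        # Favor objects that are both dense and far from existing centers.
        best = max(candidates,
                   key=lambda i: density[i] * min(d[i][c] for c in centers))
        centers.append(best)
    return centers


# Toy categorical data set with two visibly separate groups.
data = [
    ("red", "small", "round"),
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("blue", "large", "square"),
    ("blue", "large", "square"),
    ("blue", "medium", "square"),
]
centers = select_initial_centers(data, 2)
print([data[i] for i in centers])
```

On this toy data the sketch picks one object from each group. Any other categorical dissimilarity, such as a coupling-based one, could be passed in through the `dist` parameter without changing the selection logic.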
Keywords/Search Tags: attribute coupling, initial center, distance learning