
A New Nearest Neighbor Measurement Method And Its Application In Clustering Algorithm

Posted on: 2019-02-26  Degree: Master  Type: Thesis
Country: China  Candidate: R F Cheng
GTID: 2428330578472683  Subject: Computer software and theory
Abstract/Summary:
Evaluating the neighborhood relationship between objects is a basic requirement in data mining, natural language processing, and information retrieval. The performance of classification and clustering algorithms depends strongly on the choice of neighbor measurement, which should be selected according to the data and the application scenario. In supervised learning, a good measurement can be obtained through a metric learning algorithm. Unsupervised learning has no training data, but a better neighbor measurement can still be designed to suit the requirements.

In this dissertation, we first summarize the commonly used neighbor measurements and analyze the characteristics and application scenarios of each. Building on ideas from related algorithms, we then propose the K-Mutual Neighbor Relative Distance (KMNRD) method to address the disadvantages of the existing methods, and analyze its characteristics and usage in depth. KMNRD must be used in combination with another measurement, but it does not depend on any specific one. It reduces the influence of measurement units while preserving the relativity of distance. In theory, KMNRD can be used in any application scenario where other measurements are used. Based on KMNRD, this dissertation explores the following two aspects.

1) Research on the application of KMNRD in clustering algorithms. KMNRD is applied to different clustering algorithms, and the improvement it brings is compared experimentally. First, clustering algorithms are analyzed and summarized, and a number of representative algorithms are chosen. Then, clustering experiments are carried out on multiple data sets with different neighbor measurements and different clustering evaluation indexes. Finally, the clustering results are compared from two aspects: the improvement obtained by the same algorithm under different neighbor measurements, and the improvement rate of different algorithms using KMNRD. The experiments show that using KMNRD improves the clustering results of these algorithms.

2) A minimum spanning tree clustering algorithm based on KMNRD. We propose an efficient minimum spanning tree clustering algorithm based on KMNRD. First, the problems of minimum spanning tree segmentation and data imbalance are studied. Next, the flow of the algorithm is described in detail and its time complexity is analyzed. Finally, comparative experiments are carried out on synthetic data sets, real data sets, and a face data set. The experimental results show that the algorithm is simple and easy to implement. It clusters data with geometric shapes very well, can find clusters of arbitrary shape, and handles imbalanced data. The algorithm is robust and outperforms the other algorithms in the comparison.
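
The general idea behind minimum spanning tree clustering can be illustrated with a minimal sketch. It is not the algorithm proposed in the dissertation, and it does not implement KMNRD, whose definition is not given in this abstract. It assumes the textbook approach: build a minimum spanning tree over all points and remove the k-1 longest edges. The function names (`mst_edges`, `mst_cluster`) and the Euclidean default are hypothetical choices for illustration; the `dist` parameter marks where a different neighbor measurement could be plugged in.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Sequence[float]
Distance = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    """Placeholder measurement (assumption); any dissimilarity could be used."""
    return math.dist(a, b)


def mst_edges(points: List[Point], dist: Distance = euclidean) -> List[Tuple[float, int, int]]:
    """Prim's algorithm on the complete graph; returns (weight, u, v) edges."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge weight connecting each point to the tree
    parent = [-1] * n       # tree endpoint of that cheapest edge
    edges = []
    if n:
        best[0] = 0.0
    for _ in range(n):
        # Pick the point outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((best[u], parent[u], u))
        # Relax the attachment cost of every point still outside the tree.
        for v in range(n):
            if not in_tree[v]:
                d = dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def mst_cluster(points: List[Point], k: int, dist: Distance = euclidean) -> List[int]:
    """Cut the k-1 longest MST edges and label the resulting components."""
    n = len(points)
    kept = sorted(mst_edges(points, dist))[: max(n - k, 0)]
    root = list(range(n))   # union-find forest over point indices

    def find(x: int) -> int:
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for _, u, v in kept:
        root[find(u)] = find(v)
    labels = {}
    # Labels are assigned in order of first appearance, starting at 0.
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = mst_cluster(points, k=2)
```

With these two well-separated groups, the single long edge bridging them is the one removed, so the first three points share one label and the last three share another. Plain edge cutting of this kind is known to be sensitive to outliers and to clusters of very different density.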
Keywords/Search Tags:Similarity Measurement, Dissimilarity Measurement, KMNRD, Clustering