
The Research Of Similarity Metric In K Nearest Neighbor Classification And Fuzzy C Means Clustering

Posted on: 2016-07-09
Degree: Master
Type: Thesis
Country: China
Candidate: J J Zeng
GTID: 2308330470973208
Subject: Computer technology
Abstract/Summary:
Pattern recognition emerged in the 1920s. With the appearance of the computer in the 1940s and the development of artificial intelligence in the 1950s, it came to play an important role in daily life and across many sectors of society, and scholars from many fields have studied its theories and methods. By the early 1960s, pattern recognition had rapidly become a discipline in its own right.

Classification and clustering are two central themes in pattern recognition research, and both are widely applied. For either task, constructing a distance measure or similarity measure is a fundamental problem: choosing an appropriate measure is a key step in obtaining good classification and clustering performance.

In this thesis, following the basic idea of the locality preserving projections (LPP) algorithm, we first construct a new similarity measure and then propose new classification and clustering algorithms that reflect the internal structure of the data. First, we give a brief overview of classification and clustering. Second, we list the similarity measures most often used in current classification and clustering algorithms. Third, we describe the K Nearest Neighbor (KNN), Fuzzy C Means (FCM), and LPP algorithms in detail; LPP has attracted much attention recently. Last, following the basic idea of LPP, we improve the KNN and FCM algorithms.

KNN and FCM conventionally rely on Euclidean distance, which treats all features equally. Mahalanobis distance takes the distribution of the data into account and is unaffected by the measurement scale of the features, but it exaggerates the effect of variables with very small variation. Both Euclidean distance and Mahalanobis distance ignore the local intrinsic geometric structure of the data. To address this problem, following the basic idea of LPP, we first introduce the locality preserving scatter matrix and the locality preserving within-class scatter matrix in detail, then use these scatter matrices to define novel distance metrics, and finally develop modified versions of the classification and clustering algorithms with improved accuracy. We carry out experiments on real data, fitting data, face data, and handwritten digit data. The results, including those obtained by cross validation, show that the proposed methods are effective and feasible. Compared with classification and clustering algorithms based on Euclidean distance and Mahalanobis distance, the proposed algorithms achieve better classification and clustering accuracy.
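The abstract does not give the formula for the locality preserving scatter matrix, so the following is only a minimal sketch of how such a metric might be plugged into KNN. It assumes, as one plausible reading of the LPP idea, that the scatter matrix is S = XᵀLX, where L = D − W is the graph Laplacian of a k-nearest-neighbor graph with heat-kernel weights, and that the distance is the Mahalanobis-style form (x − y)ᵀS⁻¹(x − y). The function names (`locality_scatter`, `knn_predict`), the parameters `k` and `t`, and the small ridge term are illustrative assumptions, not the thesis's definitions.

```python
import numpy as np

def euclidean(x, y):
    """Plain Euclidean distance: every feature is weighted equally."""
    return float(np.sqrt(np.sum((x - y) ** 2)))

def mahalanobis(x, y, cov):
    """Mahalanobis distance under a given covariance matrix."""
    d = x - y
    return float(np.sqrt(d @ np.linalg.solve(cov, d)))

def locality_scatter(X, k=3, t=1.0):
    """Hypothetical locality preserving scatter matrix S = X^T L X.

    X has one sample per row. W holds heat-kernel weights between each
    sample and its k nearest neighbors; L = D - W is the graph Laplacian.
    """
    n = X.shape[0]
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    W = np.zeros((n, n))
    for i in range(n):
        order = np.argsort(sq[i])
        nbrs = order[order != i][:k]      # k nearest neighbors, excluding i
        W[i, nbrs] = np.exp(-sq[i, nbrs] / t)
    W = np.maximum(W, W.T)                # make the graph symmetric
    L = np.diag(W.sum(axis=1)) - W
    return X.T @ L @ X

def knn_predict(X_train, y_train, x, M, k=3, ridge=1e-8):
    """Majority vote among the k training points closest to x under
    the squared distance (x - xi)^T M^{-1} (x - xi)."""
    M = M + ridge * np.eye(M.shape[0])    # assumption: guard against singular M
    diffs = X_train - x
    d2 = np.einsum("ij,ij->i", diffs, np.linalg.solve(M, diffs.T).T)
    nearest = np.argsort(d2)[:k]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy data: two well-separated groups in the plane.
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                    [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])
S = locality_scatter(X_train, k=2)
pred_a = knn_predict(X_train, y_train, np.array([0.5, 0.5]), S)
pred_b = knn_predict(X_train, y_train, np.array([5.5, 5.5]), S)
```

Passing the identity matrix as `M` reduces `knn_predict` to ordinary Euclidean KNN, and passing the sample covariance gives Mahalanobis KNN, so the same function can be used to compare the three metrics on the same data.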
Keywords/Search Tags: classification, clustering, Locality Preserving Projections, Mahalanobis distance