| As the saying goes, birds of a feather flock together. Cluster analysis is a multivariate statistical method for classifying research objects according to their own characteristics, and hierarchical clustering is one of its most common methods. Before hierarchical clustering can be applied, the similarity between research objects usually has to be defined. Distance is often used to measure this similarity, and different distances are chosen according to the characteristics of the data to measure the closeness between samples. In the era of big data, the types of data we obtain are becoming more and more diverse. This paper focuses on measuring the similarity between landmark data. Unlike most data in previous statistical studies, each research object in landmark data takes the form of a matrix and usually has certain shape characteristics, so a suitable distance is needed to measure the similarity between such objects. How to define the distance between landmark data samples and classify the samples is therefore the main research purpose of this paper. Because topology is mainly concerned with the internal topological structure or shape of objects, this paper combines methods from topology with statistical analysis to classify landmark data. Topological data analysis (TDA) is a collection of methods for finding topological structures in data, and persistent homology is one of its major methods. Using persistent homology, we can determine the significant topological features of the samples in landmark data; the topological features obtained in different dimensions also differ. Based on the topological features found, we define the distance between samples as the Wasserstein distance and compute the distance matrix, which is the basis of the subsequent hierarchical classification. In this way, a complete method system based on persistent homology is established for classifying landmark data. With the development of the times, artificial intelligence has attracted more and more attention, and face recognition has risen and spread rapidly in recent years. The technical process of face recognition usually includes four parts: face image acquisition and detection, face image preprocessing, face image feature processing, and matching and recognition. The recognition of image information is therefore the key to face recognition technology. In this paper, we take the BioID face database as an example: we use persistent homology to find the topological features of the face data set and use these features to calculate the Wasserstein distance between every pair of samples. Hierarchical classification is then performed on the distance matrix, with the results computed in R. The image samples in the BioID face data set are classified in this way, and good results are obtained. |
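
The pipeline summarized above (persistence diagrams, pairwise Wasserstein distances, hierarchical clustering on the resulting distance matrix) can be illustrated with a minimal Python sketch. It is not the R code used in this paper. It assumes that the persistence diagrams have already been computed, and the four toy diagrams, the function names, the L-infinity ground metric, the choice p = 1, and the use of single linkage are all illustrative assumptions. The matching is found by brute force, so the sketch is only suitable for diagrams with a handful of points.

```python
from itertools import permutations


def point_cost(a, b):
    """L-infinity distance between two (birth, death) points."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def diagonal_cost(a):
    """L-infinity distance from a (birth, death) point to the diagonal."""
    return (a[1] - a[0]) / 2.0


def wasserstein(d1, d2, p=1):
    """p-Wasserstein distance between two small persistence diagrams.

    Brute force over all matchings; only usable for a few points.
    """
    n, m = len(d1), len(d2)

    def cost(i, j):
        # Rows: points of d1, then m diagonal slots.
        # Columns: points of d2, then n diagonal slots.
        if i < n and j < m:
            return point_cost(d1[i], d2[j]) ** p
        if i < n:
            return diagonal_cost(d1[i]) ** p
        if j < m:
            return diagonal_cost(d2[j]) ** p
        return 0.0  # diagonal slot matched to diagonal slot

    best = min(
        sum(cost(i, j) for i, j in enumerate(perm))
        for perm in permutations(range(n + m))
    )
    return best ** (1.0 / p)


def single_linkage(dist, k):
    """Agglomerative clustering on a distance matrix until k clusters remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


# Hypothetical toy diagrams: samples 0 and 1 are similar, as are 2 and 3.
diagrams = [
    [(0.0, 1.0), (0.2, 0.5)],
    [(0.0, 1.1), (0.2, 0.6)],
    [(0.0, 3.0)],
    [(0.1, 3.2)],
]

# Symmetric matrix of pairwise Wasserstein distances.
size = len(diagrams)
dist = [[0.0] * size for _ in range(size)]
for i in range(size):
    for j in range(i + 1, size):
        dist[i][j] = dist[j][i] = wasserstein(diagrams[i], diagrams[j])

clusters = single_linkage(dist, k=2)
print(clusters)
```

In this toy example the two pairs of similar diagrams end up in separate clusters. For real data, a polynomial-time assignment solver or an existing TDA package would replace the brute-force matching, and the linkage criterion would be chosen to suit the data.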