
Speaker Verification And Person Re-identification Based On Deep Metric Learning

Posted on: 2021-01-04
Degree: Master
Type: Thesis
Country: China
Candidate: J W Xu
GTID: 2518306104493614
Subject: Information and Communication Engineering
Abstract/Summary:
With the development of deep learning, biometric recognition technology (face recognition, speaker verification, person re-identification, etc.) has made significant progress in recent years. Among these tasks, face recognition is relatively mature and well researched. However, its application scenarios are relatively simple, and it often requires a clear face image. Directly applying the mature deep learning methods of face recognition to other biometric recognition tasks (such as speaker verification and person re-identification) does not give ideal results. Biometric recognition is usually not a simple classification problem but an open-set task, so traditional classification methods are not well suited to it. Deep metric learning combined with deep neural networks is the current mainstream approach to biometric recognition. Based on deep metric learning, this thesis studies speaker verification and person re-identification algorithms across different data forms and application backgrounds. The main work and innovations are as follows:

(1) For speaker verification in unrestricted natural scenes, a multi-metric learning method based on deep neural networks is proposed. Methods based on deep metric learning have achieved some success in speaker verification, but their accuracy is not ideal when the signal environment is poor and background noise is heavy. The problem is that existing methods usually rely on a single metric learning loss, so the information they consider is incomplete. This thesis therefore proposes a multi-metric learning method that uses several metric learning loss functions to optimize the feature vectors during training, so that multiple loss functions with different weights jointly account for the differences among multiple samples in the same batch. Experiments on two large public datasets (VoxCeleb1 and VoxCeleb2) show that this method significantly outperforms previous single-metric learning methods.

(2) To address the weak generalization of the traditional triplet loss function and its variants on person re-identification tasks, this thesis proposes a new triple-batch-center loss function. In each batch, the center point of the samples sharing a label is computed first; the loss then constrains the distance from each sample to the center of its own class and its distance to the centers of the other classes. Because each center aggregates information from multiple samples, the information considered is more comprehensive, and the loss better maintains a small intra-class distance and a large inter-class distance. Experimental results on three large-scale datasets (Market-1501, DukeMTMC-reID, and CUHK03) show that deep network features trained with the triple-batch-center loss have better generalization and discrimination ability.

(3) To further improve the performance of person re-identification in complex scenarios, a loss function based on the earth mover's distance (Wasserstein distance) is proposed. Existing metric learning methods often optimize only distances between single data points or between data points in a batch, whereas the target task usually calls for optimizing distances between different classes. The earth mover's distance measures the difference between the distributions of different classes, so we introduce it into the person re-identification task to increase the difference between samples of different classes. Used as an auxiliary term alongside existing metric learning methods, it can further improve the generalization of the model and the distinguishability of the feature vectors. In terms of application, this method is also a special case of the multi-metric learning method.

Although sound and image differ clearly in data form, extensive experiments show that the proposed methods are effective on both person re-identification and speaker verification tasks. In summary, based on deep metric learning, this thesis studies speaker verification and person re-identification in two fields, speech processing and computer vision, with two data forms, speech and image, and proposes three novel methods for constructing loss functions.
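The batch-center idea in item (2) can be illustrated with a small sketch. The abstract does not give the exact formula, so everything specific here is an assumption made for illustration: the hinge form with a `margin`, the choice of the nearest other-class center as the negative, the Euclidean distance, and the function names `batch_centers` and `batch_center_loss`. It is not the thesis's implementation.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def batch_centers(features, labels):
    """Mean feature vector of each label present in the batch."""
    groups = defaultdict(list)
    for feat, label in zip(features, labels):
        groups[label].append(feat)
    return {
        label: [sum(col) / len(feats) for col in zip(*feats)]
        for label, feats in groups.items()
    }


def batch_center_loss(features, labels, margin=1.0):
    """Assumed hinge form: each sample should be closer to its own class
    center than to the nearest other-class center by at least `margin`.
    The batch must contain at least two distinct labels."""
    centers = batch_centers(features, labels)
    total = 0.0
    for feat, label in zip(features, labels):
        d_own = euclidean(feat, centers[label])
        d_other = min(
            euclidean(feat, c) for other, c in centers.items() if other != label
        )
        total += max(0.0, d_own - d_other + margin)
    return total / len(features)


# Toy batch: two identities, two 2-D embeddings each.
features = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]

# Identities form two tight, distant clusters: the margin is already satisfied.
well_separated = batch_center_loss(features, ["a", "a", "b", "b"], margin=1.0)

# Same points with interleaved identities: the class centers sit close
# together, so every sample violates the margin and the loss is positive.
overlapping = batch_center_loss(features, ["a", "b", "a", "b"], margin=1.0)
```

In this toy batch the loss is zero when the two identities form distant clusters and positive when their centers nearly coincide, which is the behavior the abstract describes: small intra-class distance and large inter-class distance are rewarded. A real training setup would compute the same quantity on network embeddings with an automatic-differentiation framework.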
Keywords/Search Tags: Deep metric learning, Speaker verification, Person re-identification, Triple-batch-center loss, Wasserstein distance, Biometric recognition