Research On Multimodal Fusion Of Voiceprint And Infrared Face Recognition

Posted on: 2022-11-09, Degree: Master, Type: Thesis
Country: China, Candidate: T K Zhi, Full Text: PDF
GTID: 2518306746451954, Subject: Computer technology
Abstract/Summary:
Biometric authentication technologies such as voiceprint recognition, iris recognition, and face recognition have made great progress and are now widely used. However, single-modality systems still suffer from low security and insufficient robustness. Multimodal identity authentication can address these problems and improve both information security and authentication accuracy. Multimodal authentication based on visible-light face images and voice has attracted much attention because it significantly improves recognition accuracy. However, visible face data is privacy-sensitive, and a leak may have very serious consequences. Starting from the idea of multimodality, this thesis therefore proposes a multimodal identity authentication method that combines voiceprint and infrared face. Compared with other multimodal systems, the fusion of voiceprint and infrared face offers low privacy sensitivity, high security, and high precision, which makes it an attractive option at present. The main work of this thesis is as follows:

(1) Participated in the construction of THS2021, the first multimodal data set of voice, infrared video, and visible video. The data set was recorded by the voice and language technology center of Tsinghua University. It contains 245 speakers, with 200 sentences recorded per person. The spoken content covers three kinds of material: Chinese, numbers, and English letters. Visible video and infrared video of each speaker were recorded at the same time.

(2) Studied the performance of voiceprint recognition and face recognition on the THS2021 data set. The voiceprint recognition baseline is a ResNet34 network trained on the VoxCeleb2 development set. Tested on the THS2021 voice data, this model reaches an EER of 8.61%. The face recognition baseline is the Inception-ResNet-v1 model pre-trained on VGGFace2 data. Its EER is 4.80% on the visible face images of THS2021 and 16.75% on the infrared face images.

(3) Studied multimodal identity authentication with voiceprint and infrared face. Feature-level fusion is used: the voiceprint features and face features are concatenated. On the whole THS2021 data set, when each person enrolls one face image and one utterance, the EER of voiceprint and infrared face fusion is 7.91%, the EER of voiceprint recognition alone is 8.61%, the EER of infrared face recognition alone is 14.75%, and the EER of voiceprint and visible face fusion is 2.29%. The fusion of voiceprint and infrared face therefore outperforms either single modality, but it is weaker than the fusion of voiceprint and visible face. On the THS2021 test set, when each person enrolls two or more face images and utterances, the fusion of infrared face and voiceprint outperforms the fusion of visible face and voiceprint after passing through the DNF model. Improving the performance of infrared face recognition itself needs further research.
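The feature-level fusion and EER evaluation described in item (3) can be illustrated with a minimal sketch. The abstract does not specify the embedding dimensions, the scoring back end, or the details of the DNF model, so everything below is an assumption made for illustration: each modality's embedding is length-normalized before concatenation, trials are scored with cosine similarity, and the EER is approximated by a simple threshold sweep. The function names `fuse`, `cosine_score`, and `equal_error_rate` are hypothetical and not taken from the thesis.

```python
import numpy as np


def l2_normalize(x):
    """Scale a vector (or each row of a matrix) to unit Euclidean length."""
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def fuse(voice_emb, face_emb):
    """Feature-level fusion (assumed form): normalize each modality so that
    neither dominates, then concatenate into one joint vector."""
    return np.concatenate([l2_normalize(voice_emb), l2_normalize(face_emb)], axis=-1)


def cosine_score(enroll_vec, test_vec):
    """Cosine similarity between an enrollment vector and a test vector."""
    return float(np.dot(l2_normalize(enroll_vec), l2_normalize(test_vec)))


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping the threshold over all observed scores
    and taking the point where false accepts and false rejects are closest."""
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    best_gap, eer = np.inf, 1.0
    for t in np.sort(np.concatenate([target, nontarget])):
        frr = np.mean(target < t)       # genuine trials rejected at threshold t
        far = np.mean(nontarget >= t)   # impostor trials accepted at threshold t
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return float(eer)
```

In such a setup, the fused enrollment vector of a claimed identity would be compared against the fused vector of the test sample, and the resulting scores over all genuine and impostor trials would be passed to `equal_error_rate`. Enrolling several images and utterances per person could be handled by averaging the fused enrollment vectors, which is again an assumption rather than the procedure reported in the thesis.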
Keywords/Search Tags:Voiceprint recognition, Infrared face recognition, Multimodal fusion, DNF model