Research On Multi-modal Fusion Speaker Recognition Based On Audio-visual Data

Posted on:2022-06-25

Degree:Master

Type:Thesis

Country:China

Candidate:Y M Li

Full Text:PDF

GTID:2518306485959319

Subject:Computer software and theory

Abstract/Summary:


Today, with the rapid development of information technology, identifying a person quickly and accurately while keeping that person's information secure has become an essential research problem. Although single-modal identification is widely applied in many scenarios, it still has drawbacks such as low security and susceptibility to environmental interference. To address these drawbacks, identification based on multi-modal fusion has become a research hotspot and is considered the future direction of identification. Using the voiceprint and face modalities, this thesis studies biometric recognition based on multi-modal fusion and discusses its adaptability to the environment. The main work of this thesis is as follows:

Firstly, the fusion methods and strategies for multi-modal data are studied, and the advantage of multi-modal fusion identification in recognition accuracy is analyzed. Based on the VoxCeleb2 dataset, a deep residual network (ResNet) and a bidirectional gated recurrent unit (Bi-GRU) are used for feature-level fusion of audio-visual data, and end-to-end voiceprint recognition, face recognition, and multi-modal fusion recognition are each implemented. Comparison and analysis of the experimental results show that the accuracy of multi-modal fusion is 17.55% and 2.12% higher than that of single-modal voiceprint recognition and face recognition, respectively.

Secondly, the performance of the multi-modal fusion identification system in noisy environments is studied. Different degrees of noise are added to the original data, and single-modal and multi-modal fusion identification are compared on the noisy data. The experiments show that multi-modal fusion identification improves accuracy over single-modal identification to varying degrees under noise.
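The two ideas in the abstract, feature-level fusion and evaluation under added noise, can be illustrated with a minimal sketch. This is not the thesis's ResNet and Bi-GRU model: it assumes that per-modality embeddings are already available, fuses them by simple normalization and concatenation, and identifies a speaker by cosine similarity. The noise helper assumes white Gaussian noise at a chosen signal-to-noise ratio, since the abstract does not state the noise type or levels. All function names and the toy embeddings are hypothetical.

```python
import math
import random


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_features(voice_emb, face_emb):
    """Feature-level fusion: normalize each modality, then concatenate."""
    return l2_normalize(voice_emb) + l2_normalize(face_emb)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def identify(probe, gallery):
    """Return the enrolled identity whose fused vector is closest to the probe."""
    return max(gallery, key=lambda name: cosine(probe, gallery[name]))


def add_noise(signal, snr_db, rng):
    """Add white Gaussian noise so the expected SNR equals snr_db (assumed noise model)."""
    power = sum(x * x for x in signal) / len(signal)
    sigma = math.sqrt(power / (10 ** (snr_db / 10)))
    return [x + rng.gauss(0.0, sigma) for x in signal]


# Hypothetical toy embeddings: 3-dim voiceprint, 2-dim face.
gallery = {
    "alice": fuse_features([0.9, 0.1, 0.0], [0.2, 0.8]),
    "bob": fuse_features([0.1, 0.9, 0.2], [0.7, 0.1]),
}
probe = fuse_features([0.8, 0.2, 0.1], [0.3, 0.9])
best = identify(probe, gallery)

# A noisy copy of a toy waveform at 10 dB SNR, seeded for repeatability.
noisy = add_noise([1.0, -1.0, 0.5, -0.5], 10, random.Random(0))
```

Normalizing each modality before concatenation keeps either embedding from dominating the fused vector purely because of its scale; a learned fusion layer, as in an end-to-end model, would replace this fixed step.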

Keywords/Search Tags:

multimodal fusion, voiceprint recognition, face recognition, neural network, end-to-end model


Related items

1	Research On Multimodal Fusion Of Voiceprint And Infrared Face Recognition
2	Design And Implementation Of Identity Recognition System Based On Combination Verification Of Face And Voice Pattern
3	Research On Voiceprint Recognition Based On Speech Feature Fusion
4	Enhanced Voiceprint Recognition For Birds By Information Fusion
5	Research On 3D Face Recognition Based On Convolutional Neural Network
6	Research On Emotion Recognition Based On Multimodal Fusion
7	Research On Key Technologies Of Voiceprint Recognition In Household Scenarios
8	Research On Biological Characteristics Recognition Based On Deep Learning
9	Research And Implementation Of High Recognition Rate Voiceprint Recognition Technology Based On Convolutional Neural Network
10	3D Face Recognition Based On Multimodal Fusion