Research On Multi-modal Fusion Speaker Recognition Based On Audio-visual Data

Posted on: 2022-06-25  Degree: Master  Type: Thesis
Country: China  Candidate: Y M Li
GTID: 2518306485959319  Subject: Computer software and theory
Abstract/Summary:
With the rapid development of information technology, identifying a person quickly and accurately while keeping that person's information secure has become an essential research task. Although single-modal identification has been widely applied in many scenarios, it still has disadvantages such as low security and susceptibility to environmental interference. To address these disadvantages, identification based on multi-modal fusion has become a research hotspot and is considered the future direction of identification. Based on the voiceprint and face modalities, this thesis studies multi-modal fusion biometric recognition and discusses its adaptability to the environment. The main work of this thesis is as follows. Firstly, the fusion methods and strategies for multi-modal data are studied, and the advantage of multi-modal fusion identification in recognition accuracy is analyzed. On the VoxCeleb2 data set, a deep residual network (ResNet) and a bidirectional gated recurrent unit (Bi-GRU) are used for feature-level fusion of audio-visual data, and end-to-end voiceprint recognition, face recognition, and multi-modal fusion are each implemented. Comparison and analysis of the experimental results show that the accuracy of multi-modal fusion is 17.55% and 2.12% higher than that of single-modal voiceprint recognition and face recognition, respectively. Secondly, the performance of the multi-modal fusion identification system in noisy environments is studied. Different degrees of noise are added to the original data, and the performance of single-modal and multi-modal fusion identification is compared under noise; the results show that the accuracy of multi-modal fusion identification on noisy data improves to varying degrees over the single-modal systems.
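The abstract does not say what kind of noise was added or how its "degree" was measured. As a minimal sketch of one common way to do this, the example below mixes white Gaussian noise into a waveform at a chosen signal-to-noise ratio (SNR) in decibels. The noise type, the SNR levels, the function names, and the sine-tone stand-in for speech are all assumptions made for illustration and are not taken from the thesis.

```python
import math
import random


def signal_power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)


def add_noise_at_snr(clean, snr_db, rng=None):
    """Return clean + white Gaussian noise scaled to the requested SNR (dB).

    Assumption: white Gaussian noise; the thesis does not name its noise source.
    """
    rng = rng or random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in clean]
    # Scale the noise so that P_clean / P_noise == 10 ** (snr_db / 10).
    target_noise_power = signal_power(clean) / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / signal_power(noise))
    return [c + scale * n for c, n in zip(clean, noise)]


def measured_snr_db(clean, noisy):
    """SNR in dB implied by the difference between noisy and clean signals."""
    residual = [n - c for c, n in zip(clean, noisy)]
    return 10 * math.log10(signal_power(clean) / signal_power(residual))


# Hypothetical stand-in for a speech waveform: one second of a 440 Hz tone at 16 kHz.
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]

# Hypothetical noise levels; lower SNR means a noisier signal.
for snr in (20, 10, 0):
    noisy = add_noise_at_snr(clean, snr)
    print(snr, round(measured_snr_db(clean, noisy), 2))
```

Under these assumptions, the same clean recordings can be degraded at several SNR levels and passed to the single-modal and fused systems, so that accuracy can be compared level by level.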
Keywords/Search Tags: multimodal fusion, voiceprint recognition, face recognition, neural network, end-to-end model