With the rapid development of information technology, identifying a person quickly and accurately while keeping that person's information secure has become an essential research problem. Although single-modal identification technology is widely deployed, it still has drawbacks, such as low security and susceptibility to environmental interference. To address these drawbacks, identification based on multi-modal fusion has become a research hotspot and is widely regarded as the future direction of identity recognition. This thesis studies multi-modal fusion biometric recognition based on the voiceprint and face modalities and examines its adaptability to the environment. The main work of this thesis is as follows.

First, fusion methods and strategies for multi-modal data are studied, and the advantage of multi-modal fusion identification in recognition accuracy is analyzed. Using the VoxCeleb2 dataset, a deep residual network (ResNet) and a bidirectional gated recurrent unit (Bi-GRU) are used for feature-level fusion of audio-visual data, and end-to-end voiceprint recognition, face recognition, and multi-modal fusion recognition are each implemented. A comparison of the experimental results shows that the accuracy of multi-modal fusion is 17.55% and 2.12% higher than that of single-modal voiceprint recognition and face recognition, respectively.

Second, the performance of the multi-modal fusion identification system in noisy environments is studied. Noise of different levels is added to the original data, and the performance of single-modal and multi-modal fusion identification is compared under these conditions. The experiments show that, on noisy data, multi-modal fusion identification improves accuracy over single-modal identification to varying degrees.
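
The noise experiments described above depend on corrupting clean data at controlled levels. The abstract does not state the noise type or how its level is measured, so the following is only a minimal sketch under two assumptions: the noise is white Gaussian, and its level is set by a target signal-to-noise ratio (SNR) in decibels. The function names `add_noise_at_snr` and `measured_snr_db` are hypothetical and do not come from the thesis.

```python
import math
import random


def add_noise_at_snr(signal, snr_db, rng=None):
    """Return a noisy copy of `signal` with white Gaussian noise at `snr_db` dB.

    Assumption: white Gaussian noise and an SNR-based noise level; the
    thesis abstract specifies neither.
    """
    rng = rng or random.Random()
    # Average power of the clean signal.
    signal_power = sum(x * x for x in signal) / len(signal)
    # SNR(dB) = 10 * log10(P_signal / P_noise), solved for P_noise.
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Estimate the SNR in dB actually obtained, given clean and noisy signals."""
    signal_power = sum(c * c for c in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    # One second of a 440 Hz tone sampled at 16 kHz stands in for speech.
    clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
    for snr in (20, 10, 0):
        noisy = add_noise_at_snr(clean, snr, rng=random.Random(0))
        print(snr, round(measured_snr_db(clean, noisy), 2))
```

Sweeping the target SNR from high to low yields progressively harder test conditions, under which single-modal and fused systems can be compared on the same corrupted inputs.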