
Research On Deep Feature-Level Fusion Of Face-Audio Multimodal Personal Identification

Posted on: 2020-11-14
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Liu
Full Text: PDF
GTID: 2428330590963048
Subject: Computer Science and Technology
Abstract/Summary:
Multimodal biometric recognition, which uses more than one type of biometric feature to identify a person, achieves more accurate and reliable results than recognition based on a single biometric. In particular, multimodal recognition based on face and audio has been an active research area because of its advantages in acquisition, accuracy, and safety. In recent years, deep learning methods have achieved state-of-the-art feature extraction ability, and their end-to-end learning paradigm is well suited to multimodal feature fusion. This thesis therefore explores the feature-level fusion of face and audio data for recognition using deep learning methods. The main contributions are as follows:

(1) We propose a feature fusion and recognition model for face and audio that combines a multimodal CNN (Convolutional Neural Network) with an RNN (Recurrent Neural Network). First, we explore how to use a CNN to fuse face and audio features for recognition; we design four different multimodal CNN architectures and compare them experimentally. The multimodal CNN can extract discriminative fused features because the CNN excels at feature extraction and because, during training, the feature-extraction and fusion parts of the network learn to adapt to each other. Moreover, we propose a method that combines the multimodal CNN with an RNN so that sequential information can be exploited for more accurate recognition: audio-visual data spanning a period of time is divided into frames, fused features are extracted from each frame by the multimodal CNN, and the resulting feature sequence is fed into the RNN. We also propose three different methods of using the RNN to classify sequential features and compare them experimentally.

(2) We propose an attention-based feature fusion and recognition method for face and audio. For feature extraction, we use ResNet (an advanced CNN variant) to extract face features and LSTMs (Long Short-Term Memory networks) to extract audio features, which reduces the number of parameters. For feature fusion, an attention mechanism jointly analyzes a sequence's face and audio features and produces attention weights for the fused features in the sequence. These weights then rescale the fused features so that discriminative features receive larger weights and noisy features receive smaller ones, alleviating the effect of noisy features. Moreover, we compare different feature fusion methods and network structures through experiments. Finally, we explore how to apply the model to both real-time and sequential face-audio feature fusion for recognition; the experiments demonstrate that the model is suitable for both settings.
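The frame-wise fusion and sequential aggregation described in contribution (1) can be sketched in NumPy. This is a minimal illustration, not the thesis's trained model: the feature dimensions are hypothetical, random linear projections stand in for the learned CNN extractors, and a vanilla RNN update stands in for the recurrent classifier.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; the thesis does not specify these values.
FACE_DIM, AUDIO_DIM, FUSED_DIM, NUM_FRAMES = 128, 64, 32, 5

# Stand-ins for the CNN feature extractors: fixed random projections.
W_face = rng.standard_normal((FACE_DIM, FUSED_DIM)) * 0.1
W_audio = rng.standard_normal((AUDIO_DIM, FUSED_DIM)) * 0.1

def fuse_frame(face_feat, audio_feat):
    """Feature-level fusion of one frame: project each modality into a
    shared space and combine (a real multimodal CNN learns this end to end)."""
    return np.tanh(face_feat @ W_face + audio_feat @ W_audio)

# Vanilla RNN update as a stand-in for the sequential classifier.
W_h = rng.standard_normal((FUSED_DIM, FUSED_DIM)) * 0.1
W_x = rng.standard_normal((FUSED_DIM, FUSED_DIM)) * 0.1

def aggregate(fused_seq):
    h = np.zeros(FUSED_DIM)
    for x in fused_seq:
        h = np.tanh(h @ W_h + x @ W_x)  # recurrent state update
    return h  # final state summarizes the whole sequence

# A sequence of frames, each with one face and one audio feature vector.
frames = [(rng.standard_normal(FACE_DIM), rng.standard_normal(AUDIO_DIM))
          for _ in range(NUM_FRAMES)]
fused_seq = [fuse_frame(f, a) for f, a in frames]
embedding = aggregate(fused_seq)
```

In a full system, `embedding` would be passed to a classification layer; here it only shows the data flow: per-frame fusion first, sequence modeling second.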
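The attention-weighted pooling in contribution (2) can likewise be sketched: each fused frame feature gets a scalar score, a softmax turns the scores into weights, and a weighted sum downweights noisy frames. The scoring vector is random here where the thesis would learn it; all sizes are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
FUSED_DIM, SEQ_LEN = 32, 6  # hypothetical sizes

# A sequence of fused face-audio features (random stand-ins).
fused_seq = rng.standard_normal((SEQ_LEN, FUSED_DIM))
score_w = rng.standard_normal(FUSED_DIM) * 0.1  # scoring vector (learned in practice)

def attention_pool(seq, w):
    """Softmax attention over a sequence of fused features."""
    scores = seq @ w                        # one scalar score per frame
    weights = np.exp(scores - scores.max()) # numerically stable softmax
    weights /= weights.sum()
    pooled = weights @ seq                  # weighted sum of frame features
    return pooled, weights

pooled, weights = attention_pool(fused_seq, score_w)
```

Frames the scorer judges discriminative receive larger weights, so they dominate `pooled`, which is the mechanism the abstract describes for alleviating noisy features.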
Keywords/Search Tags:Face and Audio, Feature Fusion, Multimodal Recognition, Deep Networks, Sequential Recognition