
Research On Deep Feature-Level Fusion Of Face-Audio Multimodal Personal Identification

Posted on: 2020-11-14
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Liu
Full Text: PDF
GTID: 2428330590963048
Subject: Computer Science and Technology
Abstract/Summary:
Multimodal biometric recognition, which uses more than one type of biometric feature to identify a person, achieves more accurate and reliable results than recognition based on a single biometric. In particular, multimodal recognition based on face and audio has been an active research area because of its advantages in acquisition, accuracy, and safety. In recent years, deep learning methods have achieved state-of-the-art feature extraction ability, and their end-to-end learning paradigm is well suited to multimodal feature fusion. This thesis therefore explores the feature-level fusion of face and audio data for recognition using deep learning methods. The main contributions are as follows:

(1) We propose a feature fusion and recognition model for face and audio that combines a multimodal CNN (Convolutional Neural Network) with an RNN (Recurrent Neural Network). First, we explore how to use a CNN to fuse face and audio features for recognition; we design four different multimodal CNN architectures and compare them experimentally. The multimodal CNN can extract discriminative fused features because the CNN excels at feature extraction and because, during training, the feature-extraction and fusion parts of the network learn to adapt to each other. Moreover, we propose a method that combines the multimodal CNN with an RNN so that sequential information can be exploited for more accurate recognition: audio-visual data spanning a period of time is divided into frames, fused features are extracted from each frame by the multimodal CNN, and the resulting feature sequence is fed into the RNN. We also propose three different methods of using the RNN to classify sequential features and compare them experimentally.

(2) We propose an attention-based feature fusion and recognition method for face and audio. For feature extraction, we use ResNet (an advanced CNN variant) to extract face features and LSTMs (Long Short-Term Memory networks) to extract audio features, which reduces the number of parameters. For feature fusion, an attention mechanism jointly analyzes a sequence's face and audio features and produces attention weights for the fused features in the sequence. These weights then rescale the fused features so that discriminative features receive larger weights and noisy features receive smaller ones, alleviating the effect of noisy features. Moreover, we compare different feature fusion methods and network structures through experiments. Finally, we explore how to apply the model to both real-time and sequential face-audio feature fusion for recognition; the experiments demonstrate that the model is suitable for both settings.
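The frame-wise fusion and sequential aggregation described in contribution (1) can be sketched in NumPy. This is a minimal illustration, not the thesis's trained model: the feature dimensions are hypothetical, random linear projections stand in for the learned CNN extractors, and a vanilla RNN update stands in for the recurrent classifier.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; the thesis does not specify these values.
FACE_DIM, AUDIO_DIM, FUSED_DIM, NUM_FRAMES = 128, 64, 32, 5

# Stand-ins for the CNN feature extractors: fixed random projections.
W_face = rng.standard_normal((FACE_DIM, FUSED_DIM)) * 0.1
W_audio = rng.standard_normal((AUDIO_DIM, FUSED_DIM)) * 0.1

def fuse_frame(face_feat, audio_feat):
    """Feature-level fusion of one frame: project each modality into a
    shared space and combine (a real multimodal CNN learns this end to end)."""
    return np.tanh(face_feat @ W_face + audio_feat @ W_audio)

# Vanilla RNN update as a stand-in for the sequential classifier.
W_h = rng.standard_normal((FUSED_DIM, FUSED_DIM)) * 0.1
W_x = rng.standard_normal((FUSED_DIM, FUSED_DIM)) * 0.1

def aggregate(fused_seq):
    h = np.zeros(FUSED_DIM)
    for x in fused_seq:
        h = np.tanh(h @ W_h + x @ W_x)  # recurrent state update
    return h  # final state summarizes the whole sequence

# A sequence of frames, each with one face and one audio feature vector.
frames = [(rng.standard_normal(FACE_DIM), rng.standard_normal(AUDIO_DIM))
          for _ in range(NUM_FRAMES)]
fused_seq = [fuse_frame(f, a) for f, a in frames]
embedding = aggregate(fused_seq)
```

In a full system, `embedding` would be passed to a classification layer; here it only shows the data flow: per-frame fusion first, sequence modeling second.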
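The attention-weighted pooling in contribution (2) can likewise be sketched: each fused frame feature gets a scalar score, a softmax turns the scores into weights, and a weighted sum downweights noisy frames. The scoring vector is random here where the thesis would learn it; all sizes are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
FUSED_DIM, SEQ_LEN = 32, 6  # hypothetical sizes

# A sequence of fused face-audio features (random stand-ins).
fused_seq = rng.standard_normal((SEQ_LEN, FUSED_DIM))
score_w = rng.standard_normal(FUSED_DIM) * 0.1  # scoring vector (learned in practice)

def attention_pool(seq, w):
    """Softmax attention over a sequence of fused features."""
    scores = seq @ w                        # one scalar score per frame
    weights = np.exp(scores - scores.max()) # numerically stable softmax
    weights /= weights.sum()
    pooled = weights @ seq                  # weighted sum of frame features
    return pooled, weights

pooled, weights = attention_pool(fused_seq, score_w)
```

Frames the scorer judges discriminative receive larger weights, so they dominate `pooled`, which is the mechanism the abstract describes for alleviating noisy features.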
Keywords/Search Tags:Face and Audio, Feature Fusion, Multimodal Recognition, Deep Networks, Sequential Recognition