
Research On Multimodal Learning Method Based On Position Attention Module

Posted on: 2021-01-07
Degree: Master
Type: Thesis
Country: China
Candidate: K X Liao
Full Text: PDF
GTID: 2518306047986359
Subject: Master of Engineering
Abstract/Summary:PDF Full Text Request
As deep learning research deepens, technologies from the field of artificial intelligence are gradually being applied to everyday life, and person identification is one of the important applications. In practical settings, person identification in videos has become an urgent task. Compared with static image data, video data contains richer dynamic temporal information, such as time-varying image content and audio. Making full use of this multi-modal information can aid identification in videos, and doing so is an important research direction for multi-modal fusion methods. This thesis proposes a multi-modal learning method based on a position attention mechanism, whose main idea is to have the system pay more attention to important modal information while suppressing relatively unrelated modal information. The method consists of three parts: single-modal feature optimization, multi-modal feature fusion, and multi-modal model fusion.

(1) Optimizing a single modal feature helps improve recognition accuracy. In practice, imperfect feature extraction algorithms and other problems may leave defects in the extracted person features. If the raw person features are fed directly into a neural network for classification training, the final recognition result may fall flat. Single-modal feature optimization can improve this situation: the position attention module proposed in this thesis enhances similar features and suppresses outliers, thereby improving person recognition accuracy.

(2) The multi-modal feature fusion method makes full use of the various person-related modalities contained in video data, even in extreme situations. For person recognition, facial features are treated as the primary cue, while head, body, and audio features serve as auxiliary cues, and all of them are fused into a new video-level feature. During fusion, the position attention module amplifies the influence of modal features that are strongly related to the facial features and suppresses those that are weakly related. By ensuring that facial features remain dominant, this method fully exploits the multi-modal information and improves the recognition accuracy of the algorithm.

(3) The multi-modal model fusion method addresses the problem of missing person features. In multi-modal model fusion, multiple weak classifiers are usually trained on the same problem and then combined, which can yield a more accurate and robust model. For the extreme case where certain person features cannot be extracted from a video, this thesis additionally extracts scene features from the video. The scene features and the video features obtained through multi-modal feature fusion are fed into the neural network separately, and the trained models are then fused to improve person recognition.

This thesis also introduces a large-scale video data set, the iQIYI-VID Data Set, containing 600K video clips for multi-modal person identification. Experimental results on the iQIYI-VID Data Set show that the proposed algorithm outperforms the compared methods.
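The abstract does not give the module's equations, but the described behavior (amplifying modal features related to the primary cue, suppressing unrelated ones) matches the standard query-key-value attention formulation. The following is a minimal NumPy sketch under that assumption; all names (`position_attention`, `Wq`, `Wk`, `Wv`) are hypothetical illustrations, and the random projections stand in for learned weights.

```python
import numpy as np

def softmax(z, axis=-1):
    # numerically stable softmax
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def position_attention(feats, Wq, Wk, Wv):
    """Re-weight per-modality features by their mutual relatedness.

    feats    : (num_modalities, dim) -- one row each for face, head, body, audio.
    Wq/Wk/Wv : (dim, dim) projections (learned in practice; random here).
    """
    q, k, v = feats @ Wq, feats @ Wk, feats @ Wv
    # attention scores: how strongly modality i attends to modality j
    attn = softmax(q @ k.T / np.sqrt(feats.shape[1]))
    # residual connection keeps the original (e.g. facial) features dominant
    return feats + attn @ v

rng = np.random.default_rng(0)
dim = 8
feats = rng.normal(size=(4, dim))  # face, head, body, audio feature vectors
Wq, Wk, Wv = (rng.normal(scale=dim ** -0.5, size=(dim, dim)) for _ in range(3))
fused = position_attention(feats, Wq, Wk, Wv)
print(fused.shape)  # one re-weighted feature vector per modality
```

The re-weighted modality features could then be pooled (for example, averaged or concatenated) into the single video-level feature the abstract describes; the thesis itself may fuse them differently.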
Keywords/Search Tags:Person identification, multi-modal learning, feature fusion, model fusion