
Deep Feature Fusion And Attention Models For Video Action Recognition

Posted on: 2018-11-21, Degree: Master, Type: Thesis
Country: China, Candidate: S C Zhao, Full Text: PDF
GTID: 2348330542477858, Subject: Computer Science and Technology
Abstract/Summary:
Video action recognition draws on computer vision, pattern recognition, artificial intelligence and other disciplines. It is an important research subject in computer vision and a very challenging, active topic at present. With the rapid growth of video data, the classification of video content has attracted wide attention. The task also has theoretical significance and broad commercial value in many scenarios, such as video surveillance, behavior detection, abnormal-event warning and virtual reality.

Early action recognition methods quantize low-level features of video frames in three steps: (1) preprocessing the video frames and extracting local features; (2) pooling and quantizing the features; (3) training a classifier on the quantized features. Among these traditional approaches, improved dense trajectory features combined with Fisher vector encoding achieve some of the best results on multiple public datasets. However, such traditional features have several problems: they require too much storage, take too long to extract, cannot meet real-time requirements, and have reached a performance bottleneck.

With the explosive growth of data and the rise of deep learning in computer vision, deep learning methods have outperformed traditional methods in many areas of the field. Action recognition is an exception: deep learning has been adopted there relatively slowly. The main reasons are the small amount of labeled video data and the greater complexity of videos compared with images, which make it much harder to train a well-performing network for videos than for images. This reflects the difficulty of video action recognition. Because deep learning has its own distinct advantages, researchers continue to pursue a breakthrough in applying it to action recognition.

This thesis addresses video action classification within a deep learning framework. Building on recent research, it proposes two approaches: (1) an action recognition method that fuses traditional visual features with deep features extracted by a deep network; (2) a deep learning recognition method based on an attention model. The first method makes full use of the temporal information captured by the traditional features and the scene information captured by the deep features, and combines this complementary information to improve recognition accuracy. The second method explores deep networks further and designs an attention model on top of a convolutional neural network for the action recognition task. Compared with the traditional method, the proposed methods are faster to run, use a smaller feature space and achieve higher performance, as verified on several public datasets.
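The two proposed ideas can be illustrated with a small sketch. The abstract does not say how the fusion or the attention model is actually implemented, so everything below is an assumption made for illustration: fusion is shown as L2 normalization followed by concatenation, and attention is shown as a softmax-weighted average of per-frame features scored against a hypothetical query vector. The function names (`fuse_features`, `attention_pool`) and the toy inputs are not from the thesis.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 norm (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_features(traditional, deep):
    """Assumed fusion scheme: normalize each descriptor, then concatenate.

    Normalizing first keeps one descriptor from dominating the other
    purely because of its scale.
    """
    return l2_normalize(traditional) + l2_normalize(deep)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frame_features, query):
    """Assumed attention scheme: soft attention over frames.

    Each frame is scored by its dot product with `query` (a hypothetical
    learned vector), the scores are turned into weights with softmax, and
    the video descriptor is the weighted average of the frame features.
    """
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in frame_features]
    weights = softmax(scores)
    dim = len(frame_features[0])
    pooled = [
        sum(w * feat[d] for w, feat in zip(weights, frame_features))
        for d in range(dim)
    ]
    return pooled, weights


if __name__ == "__main__":
    # Toy per-frame "deep" features for a three-frame video.
    frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    video_descriptor, weights = attention_pool(frames, query=[1.0, 0.0])

    # Toy "traditional" descriptor standing in for a trajectory-based feature.
    fused = fuse_features([3.0, 4.0], video_descriptor)
    print("attention weights:", weights)
    print("fused descriptor:", fused)
```

In this sketch the fused descriptor would then be passed to a classifier. Concatenation is only one option; averaging the class scores of two separately trained classifiers is a common alternative.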
Keywords/Search Tags: Action Recognition, Deep Learning, Convolutional Network, Attention Model