
Research On Fine-grained Action Recognition Algorithm Based On Deep Learning

Posted on: 2021-05-14
Degree: Master
Type: Thesis
Country: China
Candidate: L Li
Full Text: PDF
GTID: 2518306503972939
Subject: Electronics and Communications Engineering

Abstract/Summary:
With the development of the Internet and communication technology, video data is being generated ever faster and video applications are becoming increasingly widespread, so computer-assisted video analysis has a wide range of uses. As an important sub-field of video analysis, action recognition is of great value. With the rapid development of neural networks, action recognition algorithms based on deep learning have become the mainstream research direction in this field. However, most mainstream datasets and network structures rely heavily on the spatial information in video data and are not well suited to fine-grained action recognition tasks. Fine-grained action recognition has many application scenarios, such as intelligent scoring of gymnastic movements and autism detection in children. This thesis proposes a series of methods to improve network performance on an autism detection dataset in which the differences between actions are slight.

First, according to the characteristics of the dataset, we tried two video preprocessing methods: foreground segmentation and manual cropping. The former is based on a video object segmentation algorithm, which separates specified targets in the video and eliminates interference from complex backgrounds. The latter is more specific to this dataset: by manually cropping part of each video, the local area is enlarged and the important region is specified by hand for the network. Experiments show that the latter is indeed helpful for the action recognition task.

In terms of network design, this thesis also proposes several improvements for videos with fine-grained differences. First, based on the soft attention mechanism, it proposes a video-level action recognition network. The network takes the complete video as input, which provides more information than clip-level input. An attention network then focuses on the discriminative regions of the video to improve recognition ability. Experimental results show that this network outperforms the baseline.

The thesis then discusses feature fusion. The bilinear pooling layer has achieved many results in fine-grained image classification: by extracting second-order features of the feature map, it better encodes the detailed texture features of images and improves the ability to distinguish fine-grained classes. This thesis introduces the bilinear pooling layer into action recognition, applying bilinear fusion to the clip-level video features extracted by the network and then feeding the resulting second-order features to a subsequent LSTM network that encodes the temporal information of the complete video. A series of comparative experiments on the bilinear pooling algorithm is carried out, and the final results show that it effectively improves the recognition performance of the network.

In summary, based on a video dataset with fine-grained differences, this thesis designs a fine-grained action recognition network using the soft attention mechanism and a bilinear pooling network. On this dataset, the network outperforms the traditional algorithm in both average accuracy and detection accuracy.
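
The two feature-level operations named in the abstract, soft attention and bilinear pooling, can be illustrated with a minimal pure-Python sketch. This is not the thesis's network: the C3D feature extractor and the LSTM are omitted, the function names `soft_attention_pool` and `bilinear_pool` are hypothetical, the feature values are toy numbers, and the signed square root plus L2 normalization step is a common convention for bilinear features that is assumed here rather than taken from the abstract.

```python
import math


def soft_attention_pool(features, scores):
    """Weighted average of feature vectors using softmax-normalized scores.

    features: list of N vectors (one per region or clip), each of length C.
    scores:   list of N unnormalized attention scores (assumed to come from
              some attention sub-network, which is not modeled here).
    """
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(features[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return pooled, weights


def bilinear_pool(features_a, features_b):
    """Second-order feature: sum of outer products over all locations.

    features_a, features_b: lists of N vectors from two feature streams
    (passing the same list twice gives a symmetric bilinear feature).
    Returns a flat vector of length Ca * Cb after signed square root
    and L2 normalization.
    """
    ca, cb = len(features_a[0]), len(features_b[0])
    outer = [[0.0] * cb for _ in range(ca)]
    for fa, fb in zip(features_a, features_b):
        for i in range(ca):
            for j in range(cb):
                outer[i][j] += fa[i] * fb[j]
    flat = [v for row in outer for v in row]
    flat = [math.copysign(math.sqrt(abs(v)), v) for v in flat]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


# Toy clip-level feature map: 3 locations, 2 channels (illustrative values only).
toy_features = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
attended, attention_weights = soft_attention_pool(toy_features, [0.0, 0.0, 0.0])
second_order = bilinear_pool(toy_features, toy_features)
```

In a pipeline like the one described above, one such second-order vector would be computed per clip, and the sequence of vectors would then be passed to a recurrent model to capture the temporal order of the whole video.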
Keywords/Search Tags:action recognition, fine-grained, C3D, LSTM, soft attention network, bilinear network