Research on Video Motion Recognition Method Based on High-Low Layer Feature Fusion and Convolutional Attention Mechanism

Posted on: 2020-08-13
Degree: Master
Type: Thesis
Country: China
Candidate: J R Wang
Full Text: PDF
GTID: 2428330590958277
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
With the spread of computer networks and the continued construction of smart cities, video has become the largest carrier of urban data. Video motion recognition is difficult because video contains more noise than still images and spans both time and space, so it is harder and less accurate than image recognition. Using Temporal Segment Networks (TSN) as the framework, this thesis proposes several improved video motion recognition methods from two angles: strengthening feature representation and increasing the saliency of video content.

First, two video motion recognition methods based on high-low layer feature fusion are proposed: a top-down feature fusion method and a bottom-up feature fusion method. They reach 93.9% and 94.5% accuracy, respectively, on the UCF101 (split1) test set, which is 1.6% and 2.2% higher than the TSN model without feature fusion under the same conditions.

Second, to improve the saliency of video content, a video motion recognition method based on a convolutional attention mechanism is proposed. A fully convolutional attention structure is designed to capture the salient regions of motion in video. Compared with existing attention structures based on recurrent neural networks, it is easier to train and easier to couple with a variety of base networks. After cross-validation, the test accuracy on UCF101 and HMDB51 is 95.0% and 71.6%, which is 0.8% and 2.2% higher than TSN without the attention mechanism under the same experimental conditions.

Finally, building on this convolutional attention structure, a video motion recognition method based on a multi-level attention mechanism network is proposed, which accurately captures the salient regions for motion recognition at each level. After multi-modal fusion, the test accuracy is 94.4% and 72.0% on UCF101 (split1) and HMDB51 (split1), which is 0.1% and 0.4% higher than the method based on the convolutional attention mechanism and 2.1% and 2.1% higher than TSN under the same experimental conditions. These accuracy improvements verify the effectiveness of the proposed networks.
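The abstract does not specify the layers of the fully convolutional attention structure, so the following is only a minimal sketch of the general idea, not the thesis's design. It assumes a single 1x1 convolution that scores each spatial location, a softmax over locations, and element-wise reweighting of the feature maps. The function names (`conv1x1`, `spatial_softmax`, `attend`) and the weights are hypothetical, and plain Python lists stand in for tensors.

```python
import math


def conv1x1(features, weights, bias=0.0):
    """1x1 convolution mapping C channels to one score per spatial location.

    features: list of C channel maps, each an H x W list of lists.
    weights:  list of C floats (hypothetical learned parameters).
    """
    height, width = len(features[0]), len(features[0][0])
    return [
        [
            bias + sum(w * ch[y][x] for w, ch in zip(weights, features))
            for x in range(width)
        ]
        for y in range(height)
    ]


def spatial_softmax(scores):
    """Normalise an H x W score map so that its entries sum to 1."""
    peak = max(max(row) for row in scores)  # subtract the max for stability
    exps = [[math.exp(s - peak) for s in row] for row in scores]
    total = sum(sum(row) for row in exps)
    return [[e / total for e in row] for row in exps]


def attend(features, weights, bias=0.0):
    """Return (attention_map, reweighted_features) for one frame."""
    attention = spatial_softmax(conv1x1(features, weights, bias))
    reweighted = [
        [
            [a * v for a, v in zip(att_row, feat_row)]
            for att_row, feat_row in zip(attention, ch)
        ]
        for ch in features
    ]
    return attention, reweighted


# Toy input (assumed values): 2 channels on a 2 x 2 grid.
# The top-left cell has the strongest response in both channels.
features = [
    [[4.0, 0.0], [0.0, 0.0]],
    [[2.0, 1.0], [0.0, 0.0]],
]
attention, reweighted = attend(features, weights=[1.0, 0.5])
```

Because the sketch contains no recurrent state, the same module can be applied independently to each sampled frame and attached to any backbone that outputs spatial feature maps, which is the kind of coupling the abstract attributes to a convolutional design.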
Keywords/Search Tags: Video motion recognition, High-low layer feature fusion, Top-down, Bottom-up, Convolutional attention mechanism, Multi-level attention mechanism network