| With the continuous development of intelligent devices, many industries now depend on massive amounts of video data. Compared with still images, video carries far richer information: it is dynamic, continuous, and temporally correlated, and mining video data along these dimensions can yield considerable value. Video action recognition is an important research topic in computer vision and pattern recognition. It has both scientific significance and critical practical value, with a direct impact on daily life. Its main application areas include remote recognition, virtual reality, social video recommendation, autonomous driving, face recognition, and video surveillance. Because the task is both important and challenging, researchers continue to study it in depth and have made steady progress. Early work relied mainly on traditional machine learning methods; with the rise of deep learning, current research focuses on designing effective deep models for accurate action recognition. Among existing deep learning methods, 3D CNNs fully model the temporal relationships between consecutive frames and perform well at extracting spatiotemporal features, but at a substantial computational cost. To address this problem, this paper proposes a new network, the Spatiotemporal Grouping and Cooperative Network (SGCN) for video action recognition. SGCN divides the feature channels into a spatial group and a temporal group, shares weights within the temporal group to constrain the collaboratively learned features, and adds a square pooling module. SGCN has three advantages: (1) by grouping the feature channels, it models temporal and spatial information separately and effectively; (2) it replaces 3D convolution with three 2D convolutions that learn spatiotemporal features collaboratively, which saves computation and, through parameter sharing, greatly reduces the number of parameters and improves model efficiency; (3) it adds an energy model to the temporal group, which strengthens feature correlation through multiplicative features, and then learns temporal motion relationships through square pooling, further improving its effectiveness on action recognition. To verify the effectiveness of SGCN, this paper trains and tests it on the classic video action recognition datasets Something-Something V1 and Something-Something V2 and compares it with current high-performing action recognition methods. The experimental results show that SGCN achieves better results and generally outperforms the compared methods. |
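The abstract describes SGCN only at a high level, so the following is a minimal pure-Python sketch of its three ideas, not the author's implementation. It assumes a 50/50 spatial/temporal channel split, 3x3 kernels with zero "same" padding, fixed fusion weights `alpha` for the three views, and square pooling defined as the sum of squared responses of two hand-picked 2-tap temporal filters (a classic motion-energy formulation). All function names (`split_groups`, `conv2d_same`, `collaborative_conv`, `temporal_energy`, `param_counts`) are hypothetical.

```python
"""Hypothetical sketch of the three SGCN ideas on tiny pure-Python tensors.

Assumptions (not taken from the thesis): a 50/50 spatial/temporal channel
split, 3x3 kernels with zero 'same' padding, fixed fusion weights alpha for
the three views, and two hand-picked 2-tap temporal filters for square pooling.
"""
from typing import List, Sequence, Tuple

Grid = List[List[float]]
Clip = List[Grid]  # one channel, indexed clip[t][h][w]


def split_groups(channels: Sequence, temporal_ratio: float = 0.5) -> Tuple[list, list]:
    """(1) Split feature channels into a spatial group and a temporal group."""
    n_spatial = len(channels) - int(len(channels) * temporal_ratio)
    return list(channels[:n_spatial]), list(channels[n_spatial:])


def conv2d_same(x: Grid, k: Grid) -> Grid:
    """Naive 2D cross-correlation with zero padding; output has x's shape."""
    rows, cols, r = len(x), len(x[0]), len(k) // 2
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < rows and 0 <= jj < cols:
                        out[i][j] += x[ii][jj] * k[di + r][dj + r]
    return out


def collaborative_conv(clip: Clip, k: Grid, alpha=(1 / 3, 1 / 3, 1 / 3)) -> Clip:
    """(2) Apply ONE shared 2D kernel to the H-W, T-W and T-H views, then fuse."""
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    hw = [conv2d_same(clip[t], k) for t in range(T)]  # hw[t][h][w]
    tw = [conv2d_same([[clip[t][h][w] for w in range(W)] for t in range(T)], k)
          for h in range(H)]  # tw[h][t][w]
    th = [conv2d_same([[clip[t][h][w] for h in range(H)] for t in range(T)], k)
          for w in range(W)]  # th[w][t][h]
    return [[[alpha[0] * hw[t][h][w] + alpha[1] * tw[h][t][w] + alpha[2] * th[w][t][h]
              for w in range(W)] for h in range(H)] for t in range(T)]


def temporal_energy(clip: Clip, filters=((1.0, 1.0), (1.0, -0.5))) -> Clip:
    """(3) Square pooling: sum of squared 2-tap temporal filter responses.

    Squaring r = a*x[t] + b*x[t+1] produces the multiplicative term
    2*a*b*x[t]*x[t+1], which couples adjacent frames. Output has T-1 frames.
    """
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    return [[[sum((a * clip[t][h][w] + b * clip[t + 1][h][w]) ** 2 for a, b in filters)
              for w in range(W)] for h in range(H)] for t in range(T - 1)]


def param_counts(c_in: int, c_out: int, k: int = 3) -> dict:
    """Weight counts (biases ignored) for one layer of each design."""
    return {
        "conv3d": c_in * c_out * k ** 3,
        "three_2d_unshared": 3 * c_in * c_out * k ** 2,
        "three_2d_shared": c_in * c_out * k ** 2,
    }


if __name__ == "__main__":
    print(split_groups(list(range(8))))  # ([0, 1, 2, 3], [4, 5, 6, 7])
    print(param_counts(64, 64))
```

For 64 input and 64 output channels, `param_counts` gives 110,592 weights for a 3x3x3 3D kernel, the same 110,592 for three unshared 3x3 2D kernels (3 x 9 = 27 weights per channel pair), and 36,864 when one 2D kernel is shared across the three views. In this sketch, therefore, the parameter saving comes from sharing rather than from the 2D factorization itself; it counts weights only, and the actual compute cost depends on how the views are implemented.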