
Research On Semantic Understanding For Action Recognition

Posted on: 2020-06-04    Degree: Doctor    Type: Dissertation
Country: China    Candidate: Y Zhao    Full Text: PDF
GTID: 1368330623955850    Subject: Signal and Information Processing
Abstract/Summary:
With the rapid development of networks and the widespread deployment of surveillance equipment, limited human resources increasingly struggle to cope with massive volumes of image data. More and more researchers therefore aim to use action recognition technology to detect and recognize human gestures, movements, and expressions in images and videos, so that machines can intelligently analyze, learn from, and imitate human action. The recognition and understanding of human action in images and videos has thus gradually become a hot topic in the field of computer vision.

Human action recognition faces many obstacles, such as differences in visual appearance, the difficulty of modeling non-rigid deformation, heavy partial occlusion of limbs, and the semantic gap between low-level visual features and high-level semantics. As a result, it is difficult for action recognition techniques to extract robust and effective spatiotemporal features. At the same time, understanding context information and the interaction environment is essential for recognizing interactive human actions, yet existing algorithms struggle to build models capable of semantic analysis and logical understanding. How to construct a robust spatiotemporal feature representation together with an effective semantic understanding model is therefore the key to enabling machines to recognize human actions.

In recent years, along with the development of machine learning and computer vision, action recognition technology has made progress in both academia and industry. However, several problems remain: 1) how to extract robust local spatiotemporal features, the most basic and important issue in action recognition tasks; 2) how to extract context information from spatiotemporal features and construct an effective semantic analysis model, the key problem in human action recognition; 3) how to explore the temporal relationships among spatiotemporal features and understand the interaction patterns of human-human, human-object, and human-group activities, a breakthrough needed to improve the interpretability of action recognition. In view of these problems, we study action recognition and understanding for visual image data from three aspects. The main contributions and innovations are as follows.

(1) Visual reconstruction for multi-source image data. Since previous work paid little attention to the complementary relationships among different kinds of image data, this dissertation proposes a novel video data structure that simulates the human visual system by eliminating static appearance redundancy, enhancing spatial structure information, and strengthening the expression of motion trajectories. To address the missing spatial structure information in videos, a multi-task video segmentation algorithm based on sampling estimation and multiplicative iteration is proposed, along with synthetic spatial depth data derived from trajectory density. To address static appearance redundancy and the insufficient expression of motion information, a novel spatial-optical data structure that integrates optical flow data with the synthetic spatial depth data is proposed, which reduces the interference of static appearance redundancy with motion information and improves feature expression in both the spatial and temporal dimensions. The proposed algorithm improves accuracy by about 17% compared with the traditional algorithm.
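To make the flow-plus-depth fusion of contribution (1) concrete, the following is a minimal illustrative sketch assuming an OpenCV/NumPy implementation. The function name spatial_optical_frame, the Farneback flow parameters, and the simple magnitude-weighted fusion are assumptions made here for illustration; the abstract does not give the dissertation's exact formulation of the spatial-optical data structure or of the trajectory-density depth synthesis.

# Hedged sketch: fusing dense optical flow with a synthetic depth map into a
# single "spatial-optical" frame. The synthetic depth input is assumed to come
# from an upstream trajectory-density step that is not specified here.
import cv2
import numpy as np

def spatial_optical_frame(prev_gray, curr_gray, synthetic_depth):
    # Dense optical flow between consecutive grayscale frames (Farneback).
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    # Flow magnitude, normalized to [0, 1], de-emphasizes static regions.
    mag, _ = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    mag = cv2.normalize(mag, None, 0.0, 1.0, cv2.NORM_MINMAX)
    depth = cv2.normalize(synthetic_depth.astype(np.float32), None,
                          0.0, 1.0, cv2.NORM_MINMAX)
    # Stack horizontal flow, vertical flow, and motion-weighted depth
    # into a 3-channel frame usable by a standard image backbone.
    return np.dstack([flow[..., 0], flow[..., 1], mag * depth])

In this sketch, motion channels carry the temporal information while the motion-weighted depth channel carries spatial structure, which mirrors the abstract's stated goal of suppressing static appearance redundancy while strengthening spatial and trajectory cues.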
(2) Multi-level semantic parsing modeling. Because of the inconsistent mapping between low-level visual features and high-level semantic information, this dissertation proposes a bottom-up multi-level semantic parsing model. To address the difficulty of extracting local spatiotemporal features of behavioral actions, an action recognition model based on semantic features and cross classification is proposed, which enhances the semantic information of local spatiotemporal features and realizes fine-grained classification of semantic features. To address the complex temporal logic of human action in long video sequences, a spatiotemporal feature expression and high-level semantic analysis algorithm that combines a three-dimensional convolutional neural network with a recurrent neural network is proposed (see the sketch after this section), which effectively analyzes the logical relationships in long video sequences and improves the accuracy of action recognition. The proposed algorithm achieves 90% accuracy on the action recognition datasets.

(3) Object-oriented human action understanding. Because most existing models lack logical analysis from micro movements to macro actions, they have only a limited ability to explain how an action occurs. This dissertation proposes a novel action understanding algorithm based on multi-level semantic attribute detection and recognition. Action recognition techniques based on video classification offer low interpretability and are easily disturbed by the environment; we therefore build a fall detection data set for indoor scenes, which provides data support for action recognition, and propose a novel fall detection method that extracts multi-level pyramid appearance features. The proposed method derives semantic properties from the viewpoint of object-oriented detection and explores the process from micro movements to macro action, narrowing the interpretability gap between machine vision and human vision. The proposed algorithm achieves approximately 90% detection accuracy with real-time performance of 35 FPS.
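As an illustration of combining a three-dimensional convolutional network with a recurrent network for long video sequences, as described in contribution (2), the following is a minimal sketch assuming a PyTorch-style implementation. The class name C3DLSTM, the layer sizes, the clip length, and the clip-level pooling are illustrative assumptions, not the dissertation's actual architecture.

# Hedged sketch: a small 3D-CNN encodes each short clip, and an LSTM then
# models the temporal logic across successive clips of a long video.
import torch
import torch.nn as nn

class C3DLSTM(nn.Module):
    def __init__(self, num_classes, feat_dim=256, hidden_dim=256):
        super().__init__()
        # 3D convolutions extract short-range spatiotemporal features per clip.
        self.c3d = nn.Sequential(
            nn.Conv3d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),
            nn.Conv3d(64, feat_dim, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        # The recurrent layer aggregates clip features over the long sequence.
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, clips):                    # clips: (B, T, C, D, H, W)
        b, t = clips.shape[:2]
        feats = self.c3d(clips.flatten(0, 1))    # (B*T, feat_dim, 1, 1, 1)
        feats = feats.view(b, t, -1)             # (B, T, feat_dim)
        out, _ = self.lstm(feats)
        return self.fc(out[:, -1])               # classify from the last step

Each clip is encoded independently by the 3D convolutions, and the LSTM then reasons over the clip features in order, which mirrors the abstract's idea of analyzing the logical relationships that unfold across a long video.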
Keywords/Search Tags:Action Recognition, Deep Learning, Object-oriented Modeling, Semantic Understanding, Computer Vision