Research On Human Behavior Recognition Based On Visual-Semantic Relationship

Posted on:2019-05-17

Degree:Master

Type:Thesis

Country:China

Candidate:T Y You

GTID:2428330596465399

Subject:Information and Communication Engineering

Abstract/Summary:

With the breakthroughs in deep learning in recent years, behavior recognition based on computer vision has received widespread attention and made considerable progress. It has broad application prospects in fields such as security monitoring, medical monitoring, human-computer interaction, autonomous driving and unmanned stores. At present, most behavior recognition methods can only recognize the behavior of a single person, and only for a limited number of categories such as walking, running and falling. They cannot detect the many interactions between humans and environmental objects in a scene. In complex scenes with drastic background changes, methods that rely on hand-crafted features are usually not robust to environmental changes, object deformation and occlusion, which leads to low recognition accuracy. In addition, because the image data to be processed carries a large amount of information, most current vision-based behavior recognition methods have high computational complexity and cannot run in real time. To address these problems, the main research work of this thesis is as follows:

(1) For behavior recognition in videos, a Long-Short Term Spatio-Temporal Visual Model (LSTVM) that combines a three-dimensional convolutional neural network with a recurrent neural network is proposed. The method uses the three-dimensional convolutional neural network to extract short-term spatio-temporal visual features, and then feeds these generic short-term behavioral features into an improved recurrent neural network to extract specific long-term behavioral features. Experimental results show that LSTVM achieves 87.6% accuracy on the UCF101 dataset.

(2) To improve the accuracy of interactive behavior recognition in videos, the work in (1) is extended, and a Long-Short Term Spatio-Temporal Visual Model with Human-Object Visual Relationship (HOVR-LSTVM) is proposed. The method uses an object detector based on a convolutional neural network to obtain the semantic and spatial location information of humans and objects, and then constructs semantic-spatial location features that are fused with the short-term spatio-temporal visual features. Experimental results show that HOVR-LSTVM improves the accuracy on the UCF101 dataset to 92.5%, outperforming other state-of-the-art methods. In addition, HOVR-LSTVM has lower computational complexity than methods based on optical flow and runs at 125.2 frames/sec, achieving faster-than-real-time recognition.

(3) For human-object interaction detection, a Visual-Semantic Model with Attention Mechanism (VSM-AM) is proposed to detect multiple human-object interactions in an image simultaneously. The method has three parts. First, an object detector based on a convolutional neural network is used to obtain the semantic and spatial location information of humans and objects, and a 3-channel spatial location pattern is proposed to construct human-object spatial location features. Second, a convolutional neural network is used to extract generic visual features of humans and objects, and an Attention Network (AN) is proposed to construct the spatial visual features. Third, a word embedding method is used to encode the semantic information of objects into semantic features, and an action classifier that fuses these semantic features is proposed to classify the interaction. Experimental results show that VSM-AM achieves a mean average precision of 21.30% and a Top-3 recall of 56.9% on the HICO-DET dataset, outperforming other state-of-the-art methods. In addition, VSM-AM runs at 7.8 frames/sec, achieving real-time detection performance.
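
The spatial location pattern mentioned in (3) can be illustrated with a minimal sketch. The abstract does not say what the three channels contain, so the layout below (human mask, object mask, and their overlap) is only an assumption, and the function name `spatial_pattern` and the grid size are hypothetical. The idea shown is the general one: rasterise a detected human box and object box into small binary maps, drawn relative to the union of the two boxes so that the encoding does not depend on where the pair sits in the image.

```python
def spatial_pattern(human_box, object_box, size=8):
    """Rasterise a human/object box pair into a 3 x size x size binary pattern.

    Boxes are (x1, y1, x2, y2) in image pixels.
    Channel 0: human box, channel 1: object box,
    channel 2: overlap of the two (an ASSUMED choice, not taken from the thesis).
    """
    # Union box of the pair; the pattern is drawn relative to it.
    ux1 = min(human_box[0], object_box[0])
    uy1 = min(human_box[1], object_box[1])
    ux2 = max(human_box[2], object_box[2])
    uy2 = max(human_box[3], object_box[3])
    width = max(ux2 - ux1, 1e-6)   # guard against degenerate boxes
    height = max(uy2 - uy1, 1e-6)

    def mask(box):
        grid = [[0] * size for _ in range(size)]
        for row in range(size):
            for col in range(size):
                # Centre of grid cell (row, col) mapped back to image coordinates.
                x = ux1 + (col + 0.5) * width / size
                y = uy1 + (row + 0.5) * height / size
                if box[0] <= x <= box[2] and box[1] <= y <= box[3]:
                    grid[row][col] = 1
        return grid

    human = mask(human_box)
    obj = mask(object_box)
    # Cell-wise AND of the two masks.
    overlap = [[h & o for h, o in zip(h_row, o_row)]
               for h_row, o_row in zip(human, obj)]
    return [human, obj, overlap]


pattern = spatial_pattern((0, 0, 6, 8), (2, 0, 8, 8))
for channel in pattern:
    print(channel[0])  # first row of each channel
```

In a full pipeline a pattern like this would typically be passed through a small convolutional network to produce the spatial location feature. This sketch stretches the union box to a square grid and does not preserve its aspect ratio.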

Keywords/Search Tags:

behavior recognition, deep learning, computer vision, relationship detection

Related items

1	Computer Vision Object Relationship Detection Based On Deep Learning
2	The Abnormal Behavior Detection Of ATM Operation On Computer Vision
3	Research On Computer Vision Based Detection And Recognition Of Dynamic Human Gestures
4	Research On Out-of-distribution Detection Algorithm Of Vision Based On Deep Learning
5	Research And Implementation Of Human Behavior Recognition Technology Based On Surveillance Video Stream
6	Research On Behavior Recognition Algorithm Based On Deep Learning
7	Research Of Circular Marker Recognition Algorithm Based On Deep Learning
8	Research On Human Behavior Recognition Based On Series Matching
9	Research And Implementation Of Object Grasping Recognition Algorithm Based On Computer Vision
10	Research On Human Dangerous Behavior Recognition Method Based On Deep Learning