
Video-oriented Target Detection And Behavior Recognition

Posted on: 2021-09-28
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Song
Full Text: PDF
GTID: 2518306548956349
Subject: Computer technology
Abstract/Summary:
Video surveillance is everywhere in daily life. Squares, railway stations, residential areas, roads and other public places are covered by countless cameras of all sizes. Video surveillance supports crime prevention, traffic control, accident detection and other functions, and plays an increasingly important role in maintaining public security. Building on video content analysis, this thesis studies target detection and behavior recognition in surveillance video, with the aim of detecting and tracking targets as they appear and recognizing their behavior. The main work and results are as follows.

In target detection, pedestrian detection in surveillance footage must cope with complex backgrounds, targets at multiple scales and in multiple poses, and occlusion between people and surrounding objects. As a result, the YOLOv3 algorithm detects some targets inaccurately, producing false detections, missed detections or repeated detections. Therefore, on the basis of the YOLOv3 network and following the idea of the residual structure, shallow and deep features are upsampled and concatenated to obtain a 104 × 104 detection layer. Bounding box sizes obtained by k-means clustering are assigned to the detection layers at the various scales to improve detection of multi-scale and multi-pose targets. At the same time, the YOLOv3 loss function is extended with a repulsion loss between a predicted box and other surrounding targets to improve detection when targets occlude one another. Experimental results show that the proposed network model detects better than the YOLOv3 algorithm on the MOT16 dataset, which demonstrates the effectiveness of the method.

To address the unbalanced distribution of spatio-temporal information in video, this thesis proposes a 2D/3D hybrid convolutional network with an attention mechanism, which captures both the spatial information and the dynamic motion information of the video and better reveals motion features. Following the two-stream convolutional network structure, a 2D convolutional network and a 3D convolutional network are built in parallel. In the 2D branch, a residual structure and an LSTM network model are used to focus on the spatial feature information of the video behavior. In the 3D branch, a convolutional network built from the Inception structure extracts the spatio-temporal feature information of the video behavior. The attention mechanism is then introduced to fuse the two sets of high-level semantic features, and the resulting salient feature vector is used for video behavior recognition. Comparison with other network models on the UCF101 and HMDB51 datasets shows that the proposed 2D/3D hybrid convolutional network has good recognition performance and robustness.

In multi-target tracking, unreliable detections and occlusion between targets of the same category make data association ambiguous during trajectory matching, which degrades tracking performance. Therefore, this thesis proposes a multi-target tracking algorithm based on high-performance detection that fuses appearance, motion and shape information for matching and association. First, in the detection stage, the improved YOLOv3 network model performs target detection on the dataset to obtain high-performance detection results. Second, a wide residual network model extracts features from the detection results to obtain feature vectors carrying appearance and location information. Finally, multi-target tracking is achieved by computing the appearance, motion and shape similarity of the feature vectors for trajectory matching. Experimental results show that the proposed multi-target tracking model tracks well on the MOT16 dataset, which demonstrates the effectiveness of the method.
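
The abstract mentions clustering bounding box sizes with k-means to obtain the boxes used at each detection scale, but gives no details. The following is a minimal sketch of how such clustering is commonly done for YOLO-style detectors, not the thesis's actual implementation. It assumes that boxes are compared by width and height only and that similarity is measured by IoU, which is the usual convention for anchor clustering; the abstract does not confirm the distance measure. The names `iou_wh` and `kmeans_anchors` are hypothetical.

```python
import random


def iou_wh(box, centroid):
    """IoU of two (width, height) boxes that share the same top-left corner."""
    inter = min(box[0], centroid[0]) * min(box[1], centroid[1])
    union = box[0] * box[1] + centroid[0] * centroid[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (width, height) pairs into k anchor sizes.

    Assumption: each box joins the centroid with the highest IoU
    (equivalently, the smallest 1 - IoU distance).
    """
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)  # start from k distinct boxes
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            best = max(range(k), key=lambda i: iou_wh(box, centroids[i]))
            clusters[best].append(box)
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                mean_w = sum(b[0] for b in members) / len(members)
                mean_h = sum(b[1] for b in members) / len(members)
                new_centroids.append((mean_w, mean_h))
            else:
                # Keep the old centroid if no box was assigned to it.
                new_centroids.append(centroids[i])
        if new_centroids == centroids:
            break  # assignments no longer change
        centroids = new_centroids
    # Sort by area so small anchors can go to fine-grained detection layers.
    return sorted(centroids, key=lambda c: c[0] * c[1])


if __name__ == "__main__":
    # Toy (width, height) data, made up purely for illustration.
    toy_boxes = [(10, 20), (12, 22), (11, 21), (100, 200), (110, 210), (105, 205)]
    print(kmeans_anchors(toy_boxes, k=2))
```

Sorting the resulting anchors by area reflects the usual practice of giving the smallest anchors to the highest-resolution detection layer (here, presumably the 104 × 104 layer) and the largest to the coarsest one.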
Keywords/Search Tags:target detection, action recognition, multi-target tracking, deep learning, video understanding