
Crowded Scene Analysis Algorithm Based on Deep Learning

Posted on: 2019-02-10
Degree: Master
Type: Thesis
Country: China
Candidate: P C Li
Full Text: PDF
GTID: 2518305891974679
Subject: Information and Communication Engineering
Abstract/Summary:
With the development of Internet and communication technology, image and video data have become the mainstream form of transmitted content. Crowd scene analysis is a challenging and important task in computer vision, with promising applications in anomaly detection and structured video retrieval. The categorical definition of crowd scenes depends on multiple levels of information, which leads to a great deal of uncertainty between categories, and the dynamics of different crowds also vary greatly. Given the continuous development of deep learning methods in recent years, it is necessary to establish a powerful deep learning model with high accuracy and generalization ability. Three conditions need to be met: a large-scale crowd scene dataset for training deep models; quantitative descriptors of the crowd's inherent properties that hold up well across scenes; and a powerful deep neural network.

First, this work proposes a spatial-temporal feature fusion algorithm based on a two-stream Residual Network, in which static appearance features and dynamic features are learned and aggregated by a two-stream network model. The algorithm is built on the 10,000 video segments of the WWW dataset, which carry multi-label attribute annotations. The video stream is first preprocessed, and a residual neural network then extracts deep features from static images, which serve as the input of the appearance stream of the two-stream network. Meanwhile, the KLT algorithm is used to extract trajectory descriptors from each video and to generate a K-NN topology graph in each frame. This representation conforms to a Markov time-domain model, and the global quantitative features of collectiveness, conflict, and stability can be calculated from the time-domain mathematical model. Collectiveness is the consistency of behavior within a neighborhood; stability is based on the number of constant neighbors in the topological graph formed by the trajectories; and conflict is the velocity dependence between neighboring points. From these three properties, a motion map of each video is obtained as the input of the motion stream of the two-stream network. The two-stream deep model thus learns the static and dynamic characteristics of the video simultaneously. Experimental results show that the model trained with this algorithm has some advantage in accuracy for crowd scene understanding and performs well on complex scenes.

Second, this work proposes an algorithm that combines residual deep features with a long-term recurrent network. Inspired by end-to-end network training, the algorithm takes the spatial-domain image features extracted by the residual network, uses an LSTM to extract dynamic information in the time domain, reduces the risk of overfitting through a Dropout layer, and performs classification with Softmax. The thesis also discusses strategies for extracting and optimizing CNN features within this framework, including the choice of activation function and pooling method, and presents a complete experimental comparison and analysis of ResNet, GoogLeNet, and VGGNet. The experimental results show that the proposed algorithm improves training performance and computing speed, achieves better recognition accuracy on crowd scene tasks, and generalizes better across scenes.
Keywords/Search Tags: Deep learning, residual neural network, scene analysis, crowd inherent property, long short-term memory, two-stream network
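The abstract defines stability through the number of constant neighbors in the K-NN graph and collectiveness as the consistency of behavior within a neighborhood, but it does not give the formulas. The following is only a minimal sketch of how such descriptors might be computed from tracked points, not the thesis's actual method. The function names (`knn_sets`, `stability`, `collectiveness`), the use of cosine similarity for collectiveness, and the normalization by `n * k` are all assumptions made for illustration; the tracked points are assumed to come from a KLT tracker run beforehand.

```python
import math


def knn_sets(points, k):
    """For each point index, return the set of indices of its k nearest neighbours."""
    neighbours = []
    for i, p in enumerate(points):
        # Sort the other points by Euclidean distance to p (ties broken by index).
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: (math.dist(p, points[j]), j),
        )
        neighbours.append(set(others[:k]))
    return neighbours


def stability(frames, k):
    """Fraction of K-NN neighbours that stay neighbours in every frame (assumed definition)."""
    per_frame = [knn_sets(points, k) for points in frames]
    n = len(frames[0])
    kept = 0
    for i in range(n):
        # Neighbours of point i that appear in its K-NN set in all frames.
        constant = set.intersection(*(frame[i] for frame in per_frame))
        kept += len(constant)
    return kept / (n * k)


def collectiveness(points, velocities, k):
    """Mean cosine similarity between each point's velocity and its neighbours' (assumed definition)."""
    neighbours = knn_sets(points, k)
    total, count = 0.0, 0
    for i, nbrs in enumerate(neighbours):
        for j in nbrs:
            vi, vj = velocities[i], velocities[j]
            norm = math.hypot(*vi) * math.hypot(*vj)
            if norm > 0:  # skip stationary points, whose direction is undefined
                total += (vi[0] * vj[0] + vi[1] * vj[1]) / norm
                count += 1
    return total / count if count else 0.0
```

Under these assumed definitions, a group of points that translates rigidly keeps all of its neighbors (stability 1.0) and moves in a common direction (collectiveness 1.0), while two neighboring points moving in opposite directions give a collectiveness of -1.0.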