The accelerating pace of informatization has led to explosive growth in image and video data, and thanks to rapid improvements in computing power, applying deep learning algorithms to process large volumes of image and video data has become a current research hotspot. Continuous updates and optimization of convolutional neural networks have brought great progress in general object detection and behavior recognition research. It is therefore worth exploring how to detect pedestrians and identify their behavior categories in real, complex environments. Deep learning algorithms rely on strong representation ability and rich feature descriptions, which greatly improve the accuracy and efficiency of pedestrian detection across multiple domains and complex backgrounds. Building on the high accuracy of pedestrian detection, the understanding and recognition of pedestrian actions has also improved substantially.

At present, research on pedestrian detection faces the problem that, under complex and changeable backgrounds, objects occlude one another and different shooting angles deform the objects, which reduces detection accuracy. At the same time, deep network models generate large numbers of parameters and floating-point operations, demanding substantial computing power from the hardware, which cannot meet the needs of detection tasks in practical application scenarios. To address these problems, this thesis builds a lightweight pedestrian detection model for complex backgrounds. Its purpose is to extract feature information efficiently without sacrificing detection accuracy, to reduce computation and time overhead, and to adapt to detection tasks in practical application scenarios. The main idea is to build a YOLO v3 model based on the Darknet-53
framework. The YOLO v3 network can adapt to multi-scale object detection; the model's channels are then pruned to obtain a lightweight model. Finally, the model's performance is verified on a self-built dataset. This dataset contains daytime, dim-night, and rainy scenes with large lighting differences, and the backgrounds are mostly intersections with dense pedestrian and vehicle traffic and neighborhoods with dense buildings. The model achieves an mAP of 87.89%, a Precision of 90.37%, and a Recall of 77.05% on the proposed dataset. The results show that the lightweight YOLO v3-based model proposed in this thesis is feasible for pedestrian detection and can complete detection tasks efficiently.

Building on this foundation in pedestrian detection, and in order to further interpret motion information, an action recognition algorithm is also discussed. This thesis proposes a residual network model that fuses global average pooling with multi-scale feature extraction modules for action recognition tasks. The core ideas of the model are as follows: (1) The framework is based on the ResNeXt model and sets up a multi-scale feature extraction unit to fully capture the input feature information and enrich the feature dimensions, solving the problem of single-scale feature information. (2) At the end of the network, a global average pooling layer is used to avoid redundant network parameters; it effectively integrates the global spatial features of the image without losing the detail information of the original image, and it avoids overfitting. (3) In the model structure, different convolution calculations are set to adjust the dimensions. The improved algorithm is evaluated on multiple recognition scenarios, including the UT-Interaction, UCF11, UCF101, and CAVIAR datasets. These scenarios cover a variety of action
categories, and the backgrounds are complex and variable. The model's recognition accuracy on these datasets reaches 99.9%, 99.5%, 97.3%, and 100%, respectively. These results show that the model achieves a high level on each recognition task, demonstrating its excellent performance.
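The global-average-pooling idea in point (2) above can be sketched briefly: each channel of a final feature map is collapsed to a single value, so a classifier head needs only one weight per channel per class instead of one per spatial position, which is where the parameter savings come from. The function name and the toy array below are illustrative, not taken from the thesis:

```python
import numpy as np

def global_average_pool(feature_map):
    """Collapse a (channels, height, width) feature map to (channels,)
    by averaging over all spatial positions of each channel."""
    return feature_map.mean(axis=(1, 2))

# Toy feature map: 2 channels of size 2x2.
fm = np.arange(2 * 2 * 2, dtype=float).reshape(2, 2, 2)
pooled = global_average_pool(fm)
# pooled has shape (2,): one averaged value per channel,
# regardless of the spatial resolution of the input.
```

Because the output size depends only on the channel count, the same head can accept inputs of varying spatial resolution, and there is no large fully connected layer to overfit.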