| As intelligent analysis and decision-making algorithms such as object detection, target tracking, and target recognition are rapidly adopted in video surveillance scenarios, video surveillance is becoming increasingly intelligent. As technology advances, video surveillance equipment is becoming more affordable, and the number of monitoring devices is growing geometrically. Because the computational complexity of video surveillance algorithms has increased, so has their demand for hardware resources. Reducing the complexity and model size of these algorithms while preserving detection and tracking performance has therefore become an important problem in deep learning. Starting from the detection and tracking algorithms used in video surveillance scenarios, this paper studies lightweight human detection and tracking algorithms in order to build a lightweight detection and tracking system that is both accurate and fast. The main work is as follows:
(1) An experimental comparison of the parameter count and computational complexity (FLOPs) of ordinary convolution, depthwise separable convolution, and ghost convolution shows that ghost convolution reduces the number of parameters and has better feature extraction ability. To improve detection speed without greatly degrading detection accuracy, a CSP-GhostNet feature extraction network for detection is built from ghost convolution according to the needs of the actual application. In addition, a self-channel attention mechanism is proposed to compensate for the accuracy lost when the parameters of the detection network are reduced; it uses the cross-correlation between channels to enhance target features and improve network performance. Experiments verify the effectiveness of the proposed method.
(2) Video surveillance scenarios require multi-target tracking. To improve the inference speed of the tracking model, a lightweight Ghost ReID feature extraction network is built from ghost convolution. To make the features of different targets more discriminative, a feature fusion method is proposed that fuses the shallow-layer texture features of the targets with their semantic features. Because pedestrian targets deform and are occluded as they move, a new-old feature fusion method fuses the features from multiple time frames whenever the features are updated, which improves the stability of the tracking model. Experiments verify the effectiveness of the proposed method.
(3) Based on the improved detection and tracking algorithms, a pedestrian detection and tracking system for video surveillance scenes is implemented. Input images are preprocessed with Gamma correction, users can load input videos in several ways, and a collision method is used to count the flow of people within the monitored area. |
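As a rough illustration of the comparison described in item (1), the sketch below counts the weights of a single layer under the three convolution types. It is not the experimental code of this work. The layer sizes, the ghost ratio `s = 2`, and the cheap-operation kernel size `d = 3` are assumed values chosen for illustration, and all function names are hypothetical. The ghost formula follows the usual ghost-module formulation (a primary convolution that produces `c_out / s` intrinsic channels, followed by depthwise "cheap" operations that generate the remaining channels). Biases and batch normalization are ignored.

```python
import math


def standard_conv_params(c_in, c_out, k):
    """Weights of an ordinary k x k convolution (no bias)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in, c_out, k):
    """Weights of a k x k depthwise convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = c_in * k * k
    pointwise = c_in * c_out
    return depthwise + pointwise


def ghost_conv_params(c_in, c_out, k, s=2, d=3):
    """Weights of a ghost module (assumed formulation, see lead-in).

    The primary k x k convolution produces m = ceil(c_out / s) intrinsic
    channels; d x d depthwise cheap operations produce (s - 1) * m more.
    """
    m = math.ceil(c_out / s)
    primary = c_in * m * k * k
    cheap = m * (s - 1) * d * d
    return primary + cheap


def multiply_accumulates(params, h_out, w_out):
    """Multiply-accumulate count: every weight is applied once per output position."""
    return params * h_out * w_out


if __name__ == "__main__":
    # Hypothetical layer: 64 -> 128 channels, 3 x 3 kernel, 56 x 56 output map.
    c_in, c_out, k, h_out, w_out = 64, 128, 3, 56, 56
    for name, fn in [
        ("ordinary", standard_conv_params),
        ("depthwise separable", depthwise_separable_params),
        ("ghost", ghost_conv_params),
    ]:
        p = fn(c_in, c_out, k)
        print(f"{name:>20}: params={p}, MACs={multiply_accumulates(p, h_out, w_out)}")
```

With `s = 2`, the ghost layer needs roughly half the weights of the ordinary convolution, since the primary convolution produces only half of the output channels and the cheap operations add comparatively few weights. The counts are analytic estimates only and say nothing about feature extraction ability, which the abstract evaluates experimentally.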