Research On Pedestrian Detection Algorithm Based On Reparameter Convolution Block And Encoder Prediction Head

Posted on: 2023-10-22
Degree: Master
Type: Thesis
Country: China
Candidate: C Min
Full Text: PDF
GTID: 2568306917479194
Subject: Engineering
Abstract/Summary:
Pedestrian detection uses computer vision to determine whether pedestrians are present in images or video sequences and, if so, to locate them. Object detection algorithms based on deep learning are the current research frontier for this task. This thesis takes the anchor-based YOLOv5 and the anchor-free YOLOX as its research objects and studies two problems: missed detection of small targets and missed detection of occluded targets. The specific contents are as follows:

(1) Small-scale pedestrian targets occupy only a small image region, so their detailed features are difficult to capture, which easily leads to missed detection. The convolution block in YOLOv5 and YOLOX is a single-path structure composed of a convolution layer, a normalization layer and an activation function layer. Its feature extraction ability is limited, which makes missed detection of small pedestrian targets likely. In addition, the Swish activation function used by YOLOv5 and YOLOX is not smooth enough and is spatially insensitive, which reduces target localization accuracy. To address these problems, two residual-based multi-branch convolution blocks are designed to enhance the learning capability of the convolutional neural network and avoid overfitting. At the inference stage, the multi-branch convolution blocks are re-parameterized and converted into single-path convolution blocks (Reparameter Convolution Block, RepBlock) to reduce inference time and memory usage. The Mish-FReLU activation function, which incorporates spatial conditions, is then used to improve localization accuracy and reduce missed detection of small targets. Applying RepBlock and the Mish-FReLU activation function to YOLOv5 and YOLOX yields Rep-YOLOv5 and Rep-YOLOX, respectively. Experimental results on the self-built VOC dataset and the CrowdHuman dataset show that the average precision of Rep-YOLOv5 is 1.82 and 2.70 percent higher, respectively, than that of YOLOv5, and that the average precision of Rep-YOLOX is 1.56 and 2.34 percent higher, respectively, than that of YOLOX. Rep-YOLOv5 is suitable for video detection with many small targets, and Rep-YOLOX is suitable for small-target detection scenarios that require high accuracy.

(2) Occlusion between pedestrians, or of pedestrians by other objects, reduces the available pedestrian information and makes feature extraction difficult, which easily causes missed detection. The feature fusion modules of Rep-YOLOv5 and Rep-YOLOX treat features with different contributions equally and ignore the relationships between image regions, which reduces detection accuracy in occluded scenes. In addition, Rep-YOLOv5 uses single-threshold Non-Maximum Suppression (NMS) to remove redundant boxes. Single-threshold NMS handles two heavily overlapping target boxes poorly, which leads to missed detection. To address these problems, this thesis adds a weight to each feature in the feature fusion module so that the importance of different features can be learned. At the top of the feature fusion module, a Transformer encoder based on the self-attention mechanism is incorporated to capture global information and rich context information, which improves recognition accuracy in occluded scenes and reduces missed detection. The improved feature fusion module (T-BiFPN) is applied to Rep-YOLOX to obtain TRep-YOLOX. This thesis also introduces D-NMS, which uses two thresholds and applies a modified hyperbolic tangent function to attenuate the scores of boxes in the occluded region, so that Rep-YOLOv5 can reduce the missed detection rate of occluded objects without increasing false detections. T-BiFPN is combined with D-NMS and applied to Rep-YOLOv5 to obtain TRep-YOLOv5. Experimental results on the self-built VOC dataset and the CrowdHuman dataset show that the average precision of TRep-YOLOv5 is 1.25 and 1.72 percent higher, respectively, than that of Rep-YOLOv5, and that the average precision of TRep-YOLOX is 0.98 and 1.55 percent higher, respectively, than that of Rep-YOLOX. TRep-YOLOv5 is suitable for video detection with severe occlusion, and TRep-YOLOX is suitable for occluded-scene detection that requires high accuracy.
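
The re-parameterization step in part (1) can be illustrated with a minimal sketch. The abstract does not give the branch layout of the two blocks, so the example below assumes a hypothetical single-channel block with a 3x3 branch, a 1x1 branch and an identity shortcut, and it omits normalization layers. The function names (`conv2d_same`, `multi_branch`, `merge_branches`) are illustrative and are not taken from the thesis. Because convolution is linear, the three branches can be folded into one 3x3 kernel that produces the same output.

```python
def conv2d_same(image, kernel):
    """Single-channel 2D cross-correlation with zero padding (same output size)."""
    k = len(kernel)
    pad = k // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - pad, j + dj - pad
                    if 0 <= y < h and 0 <= x < w:  # outside the image counts as zero
                        acc += kernel[di][dj] * image[y][x]
            out[i][j] = acc
    return out


def multi_branch(image, k3, k1):
    """Assumed training-time form: 3x3 branch + 1x1 branch + identity shortcut."""
    a = conv2d_same(image, k3)
    b = conv2d_same(image, [[k1]])
    h, w = len(image), len(image[0])
    return [[a[i][j] + b[i][j] + image[i][j] for j in range(w)] for i in range(h)]


def merge_branches(k3, k1):
    """Fold the 1x1 branch and the identity into the centre tap of the 3x3 kernel."""
    merged = [row[:] for row in k3]
    merged[1][1] += k1 + 1.0  # a 1x1 kernel and the identity both act on the centre pixel only
    return merged


# Illustrative values only; they are not weights from the thesis.
image = [[1.0, 2.0, 0.0, 1.0],
         [0.5, 1.5, 2.0, 0.0],
         [1.0, 0.0, 1.0, 2.0],
         [2.0, 1.0, 0.5, 1.0]]
k3 = [[0.5, -0.25, 0.0],
      [1.0, 0.5, -0.5],
      [0.25, 0.0, 0.125]]
k1 = 0.75

y_train = multi_branch(image, k3, k1)                   # three branches
y_infer = conv2d_same(image, merge_branches(k3, k1))    # one merged convolution
max_diff = max(abs(p - q) for r1, r2 in zip(y_train, y_infer) for p, q in zip(r1, r2))
print("max difference between the two forms:", max_diff)
```

The merged form runs one convolution instead of three operations, which is the source of the inference-time and memory savings the abstract attributes to RepBlock. A multi-channel block with normalization layers would additionally need the normalization statistics folded into the kernel and bias before merging.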
Keywords/Search Tags: Pedestrian Detection, RepBlock, Mish-FReLU, T-BiFPN, D-NMS
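
The per-feature weights that part (2) of the abstract adds to the feature fusion module can be sketched as follows. The abstract does not state how the weights are normalized, so this example assumes the fast normalized fusion commonly associated with BiFPN: each weight is clipped to be non-negative and divided by the sum of all weights. The function name `fast_normalized_fusion`, the `eps` parameter and the flattened list representation of feature maps are assumptions made for illustration.

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse same-length feature vectors using non-negative, normalized weights.

    features: list of flattened feature maps (lists of floats, equal length)
    weights:  one learnable scalar per feature map
    eps:      small constant that keeps the denominator away from zero
    """
    if len(features) != len(weights):
        raise ValueError("need exactly one weight per feature map")
    clipped = [max(0.0, w) for w in weights]  # ReLU keeps each weight non-negative
    total = sum(clipped) + eps
    length = len(features[0])
    return [
        sum(w * f[j] for w, f in zip(clipped, features)) / total
        for j in range(length)
    ]


# Illustrative values only: two feature maps, the second weighted three times as heavily.
fused = fast_normalized_fusion([[1.0, 2.0], [3.0, 4.0]], [1.0, 3.0])
print(fused)
```

With equal weights this reduces to plain averaging, which corresponds to the equal treatment of features that the abstract identifies as a weakness of the original fusion modules.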