A fast, accurate vehicle detection system not only helps traffic controllers manage the road traffic system, but also quickly extracts information about the vehicles involved in traffic accidents and other emergencies, improving the efficiency of emergency response. Improving the accuracy and speed of vehicle detection therefore has great significance and practical value for strengthening urban road traffic management. This paper studies the Mask R-CNN network model in depth, improves its backbone network and RoIAlign, and proposes the CA-PS Mask R-CNN network model for detecting different vehicle types (car, bus, truck) against real urban road backgrounds. The major improvements are as follows: (1) A new backbone network, ResNet59-FPN-CA, is proposed. First, a sixth stage is added to ResNet50 to extract the feature map P6 in FPN. The feature maps of the fifth and sixth stages are 1/16 and 1/32 the size of the input image, respectively, and the residual structure is replaced by a dilated (atrous) convolution residual structure, which increases the receptive field and resolution of the deep features and yields more accurate localization information. At the same time, the number of channels of the feature maps output by the fifth and sixth stages is reduced to 256, which reduces the number of network parameters and the running time of the model. Second, a channel attention network (CANet) is introduced into the FPN. The feature maps are reweighted channel by channel, which makes their important features more discriminative, so that the subsequent RPN can make full use of them. (2) As the number of ResNet layers increases, the ability of the Mask R-CNN network model to perceive target location information in the localization task decreases. To solve this problem, this paper introduces a 588-dimensional Position-Sensitive
Score Map to improve the sensitivity of deep features to target location information, and uses PS RoIAlign pooling to obtain 7×7×12 and 14×14×3 RoIs, which reduces the number of channels in the head structure and improves the detection speed of the model. Mask R-CNN models based on different backbone networks, Mask R-CNN models based on position-sensitive score maps of different dimensions, and the CA-PS Mask R-CNN model are trained on the Cityscapes dataset. The results show that the detection accuracy of the Mask R-CNN model based on ResNet59-FPN-CA is significantly higher than that of models based on other backbone networks, and that, among the Mask R-CNN models based on position-sensitive score maps of different dimensions, the one based on the 588-dimensional Position-Sensitive Score Map performs better than the others. The CA-PS Mask R-CNN network model raises the detection accuracy to 87.76% and shortens the test time to 128 ms.
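
The RoI sizes quoted above are consistent with splitting a 588-channel score map over a k×k grid of bins, since 7·7·12 = 14·14·3 = 588. The following is a minimal NumPy sketch of position-sensitive pooling under stated assumptions, not the paper's implementation: the function name `ps_roi_pool`, the bin-major channel layout, and the use of plain averaging over integer-aligned bins (instead of the bilinear sampling that RoIAlign performs) are illustrative choices that the abstract does not specify.

```python
import numpy as np


def ps_roi_pool(score_map, roi, k, out_channels):
    """Position-sensitive average pooling of one RoI (illustrative sketch).

    score_map:    array of shape (k * k * out_channels, H, W)
    roi:          (x0, y0, x1, y1) in feature-map coordinates, end-exclusive
    k:            number of bins per side of the output grid
    out_channels: channels kept per bin after pooling

    Assumption: channels are laid out bin-major, so bin (i, j) reads the
    channel group starting at (i * k + j) * out_channels.
    """
    channels = score_map.shape[0]
    if channels != k * k * out_channels:
        raise ValueError(
            f"expected {k * k * out_channels} channels, got {channels}"
        )

    x0, y0, x1, y1 = roi
    ys = np.linspace(y0, y1, k + 1)  # bin edges along the height
    xs = np.linspace(x0, x1, k + 1)  # bin edges along the width

    pooled = np.zeros((out_channels, k, k), dtype=np.float64)
    for i in range(k):
        r0 = int(np.floor(ys[i]))
        r1 = max(int(np.ceil(ys[i + 1])), r0 + 1)  # at least one row per bin
        for j in range(k):
            c0 = int(np.floor(xs[j]))
            c1 = max(int(np.ceil(xs[j + 1])), c0 + 1)  # at least one column
            g = (i * k + j) * out_channels  # first channel of this bin's group
            # Each bin only looks at its own channel group: this is what makes
            # the pooled output sensitive to position inside the RoI.
            pooled[:, i, j] = score_map[g:g + out_channels, r0:r1, c0:c1].mean(
                axis=(1, 2)
            )
    return pooled


# Usage: the same 588-channel score map supports both RoI shapes.
demo_map = np.random.rand(588, 28, 28)
box_roi = ps_roi_pool(demo_map, (0, 0, 28, 28), k=7, out_channels=12)   # shape (12, 7, 7)
mask_roi = ps_roi_pool(demo_map, (0, 0, 28, 28), k=14, out_channels=3)  # shape (3, 14, 14)
```

Because each bin reads only its own channel group, the pooled RoI carries 12 or 3 channels into the head rather than the full 588, which matches the abstract's point about reducing the number of channels in the head structure.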