
Vehicle Detection In Traffic Videos Using Convolutional Neural Network And Inter-frame Information

Posted on: 2022-09-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y N Yang
Full Text: PDF
GTID: 1522307106467274
Subject: Intelligent Transportation Systems Engineering and Information
Abstract/Summary:
Vehicles are one of the basic components of a transportation system, and vehicle detection in videos is an important task in transportation video analysis. Intelligent video vehicle detection technologies provide guidance for traffic management, traffic safety, and traffic decision making, and are of great significance for improving the level of intelligent traffic management. Due to the quality of traffic videos, vehicle scale, occlusion, lighting, and other factors, video vehicle detection suffers from problems such as low detection accuracy and low detection efficiency. To address these problems and improve the accuracy and speed of video vehicle detection, this paper focuses on the key technologies of video vehicle detection and explores ways of applying inter-frame information at the output and feature layers. The main research works and contributions are as follows:

(1) In view of the problem that the static detector SSD (Single Shot MultiBox Detector) produces many false negatives and false positives when detecting vehicles in videos, this paper proposes a video vehicle detection method leveraging feature fusion and inter-frame optimization. Specifically, a feature-fused SSD detector is first proposed, which aggregates deeper-layer features into the shallower layers to enhance the feature representation of small vehicles. In the post-processing stage of the detection network, a tracking-based detection optimization (TDO) strategy is proposed to select detections and compensate for missed and false detections. The TDO strategy first selects the best results within each frame by intra-frame matching, then links the best results across frames by inter-frame matching. Thus false negatives and false positives can be compensated by the propagated results, and the confidence of the final results can be optimized in the time domain. Experiments on the VHS (Vehicle in Highway Scene) dataset show that the feature-fused and inter-frame optimization based method improves the mean Average Precision (mAP) of SSD by 8.2%, effectively reducing the missed and false detections of SSD.

(2) Output layer-based video vehicle detection methods focus on mining the inter-frame information in detection results and ignore the inter-frame information in the convolutional features inside the network, which limits their ability to improve detection accuracy. Aiming at this problem, this paper explores improving detection accuracy at the feature level and proposes a method based on multi-scale features and memory. Specifically, a multi-scale feature generation network (MFGN) is proposed to improve the detector's adaptability to vehicle scale. MFGN generates features at two scales and predefines multi-scale anchors for each feature scale. Based on MFGN, this paper proposes a memory-based multi-scale feature aggregation network, which aggregates historical features with current features through two parallel ConvLSTM (Convolutional Long Short-Term Memory) modules. The multi-scale feature and memory based method enhances the feature representation of the current frame in both the spatial and temporal dimensions, thus improving vehicle detection accuracy. Experiments on the UA-DETRAC dataset show that the mAP of the multi-scale feature and memory based method is 7.4% higher than that of the corresponding static detector, Faster R-CNN.
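As a minimal sketch of the memory-based multi-scale feature aggregation described in (2), assuming a PyTorch-style implementation, one ConvLSTM cell per feature scale can fuse historical features with the current frame's features before they are passed to the detection head. The channel sizes, feature resolution, and module names below are illustrative assumptions, not the dissertation's actual configuration.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Minimal ConvLSTM cell: gates are computed by one convolution over the
    concatenation of the current feature map and the previous hidden state."""
    def __init__(self, in_channels, hidden_channels, kernel_size=3):
        super().__init__()
        self.hidden_channels = hidden_channels
        # A single conv produces the input, forget, output and candidate gates.
        self.gates = nn.Conv2d(in_channels + hidden_channels,
                               4 * hidden_channels,
                               kernel_size, padding=kernel_size // 2)

    def forward(self, x, state=None):
        if state is None:
            b, _, h, w = x.shape
            zeros = x.new_zeros(b, self.hidden_channels, h, w)
            state = (zeros, zeros)                      # (hidden, cell) memory
        h_prev, c_prev = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h_prev], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c_prev + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, (h, c)


# Hypothetical usage: one ConvLSTM per feature scale; the aggregated map h
# replaces the per-frame feature that is fed to the detection head.
cell_scale1 = ConvLSTMCell(in_channels=256, hidden_channels=256)
state = None
for feat in torch.randn(5, 1, 256, 38, 38):             # dummy 5-frame clip, one scale
    aggregated, state = cell_scale1(feat, state)
```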
(3) Since static detectors treat video frames as independent input images, they ignore the temporal information of vehicles when detecting vehicles in videos, producing redundant calculations in the feature extraction process. To address the low detection efficiency caused by these redundant calculations, a temporal attention based video object detection method is proposed for fast detection. The core idea is to use a dynamic detection network to reduce the feature extraction time for most video frames. Specifically, the dynamic network performs time-consuming semantic feature extraction only on sparse key frames; for the dense non-key frames, a lightweight feature propagation network is used to estimate semantic features. As the theoretical basis of the feature propagation network, the feature temporal attention model analyzes the spatial self-attention of feature pairs to produce attention weights, which are used for feature estimation. Furthermore, to further improve detection performance, a new key frame decision method is developed to determine the attribute of the current frame by measuring feature similarity, as sketched in the example after the summary. Experiments show that the temporal attention based method runs at twice the speed of the corresponding static detector with only a small decrease in detection accuracy, indicating that the method significantly improves detection speed while largely maintaining detection accuracy.

(4) In view of the importance of image features and RoI (Region of Interest) features to the detection results of two-stage static detection networks, this paper explores enhancing feature representation from both the pixel and instance perspectives, and proposes a cascaded attention based network (CAN), which uses two cascaded feature networks to model image features and RoI features, respectively. Specifically, a pixel-level feature optimization network is proposed to optimize the features of the input frame within the frame and produce more detailed features. For the RoIs, a prior-location based instance-level feature aggregation network (PIFAN) is proposed, which uses the RoI features and detection results of the previous frame to enhance the RoI features of the current frame in two steps through an instance-level feature aggregation model: global enhancement of the RoI features is performed first, and then the detection results of the previous frame are used as prior positions to filter the best RoIs of the current frame, whose features are enhanced locally. To improve detection efficiency, a cascaded attention and prediction based method (CAPM) is constructed, which introduces a dynamic detection network into the CAN model and combines the accuracy and speed advantages of vehicle tracking algorithms for predicting vehicle positions. Experiments on the ImageNet VID_vehicle dataset show that CAPM is three times faster and achieves 2.2% higher mAP than the corresponding static detector, Faster R-CNN. These results also verify the effectiveness of the cascaded attention model in improving detection accuracy.

In summary, this paper focuses on the task of vehicle detection in traffic video analysis and studies the key technologies of video vehicle detection from the perspectives of outputs, intra-frame feature representation, inter-frame feature computation, and key frame strategy. A feature fusion and inter-frame optimization method, a multi-scale feature and memory based method, a temporal attention based method, and a cascaded attention and prediction method are proposed; the first belongs to the output layer based methods, while the last three are feature layer based methods. The methods proposed in this paper have high reference value for decision making in intelligent transportation systems.
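As a minimal sketch of the key frame strategy and dynamic detection loop described in (3), assuming a PyTorch-style pipeline, a new key frame can be declared whenever the cheap features of the current frame drift too far from those of the last key frame; only key frames then pass through the heavy backbone. The module names (heavy_backbone, light_net, propagate, detect_head) and the 0.9 similarity threshold are hypothetical placeholders, not the dissertation's actual components.

```python
import torch
import torch.nn.functional as F

def is_new_key_frame(curr_feat, key_feat, threshold=0.9):
    """Hypothetical key frame test: flag a new key frame when the cosine
    similarity between cheap features of the current frame and of the last
    key frame drops below `threshold`."""
    sim = F.cosine_similarity(curr_feat.flatten(1), key_feat.flatten(1), dim=1)
    return bool((sim < threshold).any())

def detect_video(frames, heavy_backbone, light_net, propagate, detect_head):
    """Dynamic detection loop: full feature extraction on sparse key frames,
    lightweight feature propagation on dense non-key frames."""
    key_feat, key_sem = None, None
    outputs = []
    for frame in frames:
        shallow = light_net(frame)                        # cheap per-frame features
        if key_feat is None or is_new_key_frame(shallow, key_feat):
            key_feat = shallow                            # key frame: heavy extraction
            key_sem = heavy_backbone(frame)
            sem = key_sem
        else:
            sem = propagate(key_sem, shallow)             # non-key frame: estimate features
        outputs.append(detect_head(sem))
    return outputs
```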
Keywords/Search Tags:Intelligent transportation, Vehicle detection, Convolutional neural network, Inter-frame information, Feature aggregation, Key frame strategy