| Semantic segmentation is a pixel-level image classification task. Compared with other computer vision tasks such as image classification and object detection, semantic segmentation provides richer semantic information. With the rapid development of deep learning in recent years, deep-learning-based semantic segmentation algorithms have been widely used in areas such as autonomous driving, defect detection, smart agriculture, and medical image analysis. However, current deep-learning-based semantic segmentation algorithms often require large amounts of computation, which limits their application on resource-constrained platforms such as embedded devices. This thesis first analyzes existing video semantic segmentation models based on the optical flow method, then designs a lightweight real-time video semantic segmentation model, and then demonstrates the efficiency of the proposed model through experiments on street-view datasets. Finally, the inference performance of the model on embedded devices is further optimized. The research content of this thesis is as follows: 1. A real-time video semantic segmentation model based on feature propagation is proposed. Existing video semantic segmentation models based on the optical flow method have several problems: (1) Using an off-the-shelf optical flow network to predict the optical flow between frames takes too much time. (2) The optical flow network needs to be pre-trained and then jointly trained with the semantic segmentation network. (3) Low-level features rich in spatial detail are not used. (4) Dynamic scheduling of key frames is not implemented. To address the problems that optical flow estimation takes too long and that the optical flow network needs to be pre-trained, this thesis proposes an efficient feature propagation module, which can quickly predict the flow field between frames and perform feature propagation, and which can be directly embedded in the semantic
segmentation network for end-to-end training. To address the problem that low-level spatial features are not used, this thesis proposes an efficient feature update module that extracts low-level features from each frame; fusing these low-level features with the high-level features improves the segmentation accuracy of the model. To address the lack of dynamic key frame scheduling, this thesis implements a dynamic key frame scheduling mechanism based on a flow field value threshold. This mechanism enables the model to select key frames according to the changes between video frames. In addition, this thesis proposes a feature fusion module with a dual attention mechanism, which further improves the segmentation accuracy of the model. 2. Ablation experiments were carried out on the CamVid dataset, and the effectiveness of each module in the designed model was verified one by one. Experiments were then carried out on several datasets to evaluate the inference speed and accuracy of the model. Feature map visualization showed intuitively that the designed feature propagation module effectively propagates the high-level features of the key frame to the non-key frames and aligns them with the low-level features of the non-key frames, so that the high-level features have a stronger ability to express spatial information. Finally, the key frame scheduling experiment demonstrated the effectiveness of the designed dynamic key frame scheduling mechanism based on the flow field value threshold. 3. Based on the Jetson Nano embedded device, the inference performance of the model on the embedded platform is optimized. First, each module of the model is quantized and compressed independently through the TensorRT inference engine, and the best combination of parameter precisions for the different modules is then found, so that the model’s inference accuracy and inference speed reach the best balance. Finally, the actual running speed
of the model is further improved by using the designed multi-threaded asynchronous pipeline architecture. |
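
The key frame scheduling idea summarized above (a frame becomes a new key frame when the flow field relative to the current key frame exceeds a threshold) can be illustrated with a minimal sketch. This is only an illustration under stated assumptions, not the thesis implementation: the abstract does not say which statistic of the flow field is thresholded, so the mean vector magnitude is assumed here, and the names `mean_flow_magnitude`, `KeyFrameScheduler`, and the threshold value are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple

# A flow field as an H x W grid of (dx, dy) displacement vectors.
# This plain-Python representation is an assumption made for illustration.
Flow = Sequence[Sequence[Tuple[float, float]]]


def mean_flow_magnitude(flow: Flow) -> float:
    """Average Euclidean length of all displacement vectors in the field."""
    total, count = 0.0, 0
    for row in flow:
        for dx, dy in row:
            total += math.hypot(dx, dy)
            count += 1
    return total / count if count else 0.0


class KeyFrameScheduler:
    """Hypothetical scheduler: flags a frame as a key frame when the flow
    field relative to the current key frame is large (assumed criterion)."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold

    def is_key_frame(self, flow: Optional[Flow]) -> bool:
        # No flow is available for the first frame, so it is always a key frame.
        if flow is None:
            return True
        # Large motion means propagated features are likely stale.
        return mean_flow_magnitude(flow) > self.threshold


if __name__ == "__main__":
    scheduler = KeyFrameScheduler(threshold=1.0)  # threshold value is illustrative
    nearly_static = [[(0.1, 0.0), (0.0, 0.1)], [(0.0, 0.0), (0.1, 0.1)]]
    fast_motion = [[(3.0, 4.0), (3.0, 4.0)]]
    for name, flow in [("first", None), ("static", nearly_static), ("fast", fast_motion)]:
        print(name, scheduler.is_key_frame(flow))
```

In such a scheme, a frame flagged as a key frame would be passed through the full segmentation network, while the remaining frames would reuse the key frame's high-level features through feature propagation.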