| In recent years, with the development of deep learning, applications of artificial intelligence have become increasingly part of daily life. Understanding urban street scenes and surveillance scenes requires accurate semantic segmentation of video images, which has become an active research topic. To improve the accuracy of semantic segmentation for urban street views, this paper studies deep-learning-based semantic segmentation methods, proposes an improved semantic segmentation model based on DeepLabv3+, and tests the algorithm on the platform of a city-level monitoring quality evaluation system. The main contributions and innovations are as follows: (1) Addressing the class imbalance of the Cityscapes dataset. The segmentation accuracy of the model is improved by adjusting its loss function. The imbalance in the number of samples per category is first addressed by weighting the cross-entropy (CE) loss of the original model. After comparative tests, focal loss is used to replace the cross-entropy loss, so that the imbalance in category frequency and the imbalance in training difficulty are handled at the same time. Experimental results show that weighted focal loss performs better than weighted CE loss, and that both perform better than unweighted CE loss on the evaluation set. (2) Adjusting the input features of the decoder. The number of shallow features fed to the decoder is increased from one to three; these are the outputs of the depthwise separable convolution layers of entry_flow/block2, entry_flow/block3 and exit_flow/block1. The shallow features are then adaptively fused by ASFF and concatenated with the encoder output along the channel dimension. Experimental results show that using ASFF in the fusion process improves segmentation accuracy. (3) Strengthening the feature extraction capability of the backbone network. A channel attention mechanism is used to optimize the addition step of the residual structure in the backbone network: an SE structure recalibrates the residual branch before the addition operation, in order to strengthen the main features and suppress the secondary ones. Experimental results show that the backbone network with SE-blocks effectively improves segmentation accuracy. In summary, by using focal loss as the loss function, using ASFF to optimize the decoder structure, and using SE-blocks to strengthen the residual structure, the improved DeepLabv3+ model proposed in this paper achieves an mIoU of 85.20% on the Cityscapes evaluation set. Finally, the algorithm is tested in a city-level monitoring quality evaluation system platform project, with good results. |
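The weighted focal loss described in contribution (1) can be illustrated with a minimal NumPy sketch. This is not the implementation used in the paper: the function name `weighted_focal_loss`, the focusing parameter `gamma=2.0`, and the per-class weights are assumptions chosen for illustration, since the abstract does not state them. The sketch follows the standard focal loss form, in which each pixel's cross-entropy term is scaled by a class weight and by `(1 - p_t) ** gamma`, so that easy, well-classified pixels contribute less.

```python
import numpy as np


def weighted_focal_loss(logits, labels, class_weights, gamma=2.0):
    """Mean class-weighted focal loss over a batch of pixels.

    logits:        (N, C) float array of per-pixel class scores.
    labels:        (N,) int array of ground-truth class ids.
    class_weights: (C,) float array, e.g. larger for rarer classes
                   (hypothetical values, not taken from the paper).
    gamma:         focusing parameter (assumed); gamma=0 reduces the
                   loss to weighted cross entropy.
    """
    # Numerically stable log-softmax along the class axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Log-probability and probability assigned to the true class.
    idx = np.arange(labels.shape[0])
    log_pt = log_probs[idx, labels]
    pt = np.exp(log_pt)

    # Class weight handles frequency imbalance; the modulating factor
    # (1 - pt) ** gamma down-weights pixels that are already easy.
    loss = -class_weights[labels] * (1.0 - pt) ** gamma * log_pt
    return loss.mean()
```

With `gamma=0.0` and uniform weights the function returns plain cross entropy, which makes it easy to compare the three loss variants mentioned above (CE, weighted CE, weighted focal loss) within the same training loop.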