| Automatically extracting road networks from very high resolution(VHR)images is a classical task in remote sensing field,which has played important roles in many real applications,such as urban design,geo-referencing,vehicle navigation,intelligent transportation system and geo-spatial data integration.However,it’s extremely time-consuming and tedious to manually annotate road networks from VHR images.Based on the current deep learning methods,this paper presents a novel method,which is a multi-task and end-to-end convolutional neural network,to simultaneously extracting road surface,road edges and road centerlines from VHR images.During designing the model architecture,we apply the strategy of deeply supervised nets to provide integrated direct supervision for each convolutional stage.It helps to learn multi-scale,multi-level features for our tasks.Then,the feature map of each convolutional stage are concatenated to make the final predictions.During the training,the three sub-tasks are holistically trained in a cascaded model.Considering their correlation of these tasks,we adjust the model architecture elaborately and make it be effective and low-computation.As for the loss functions,we use a re-weighting cross-entropy loss function to deal with the imbalanced distribution issue.Considering the over-fitting problem of re-weighting method,a construction loss function is applied to alleviate the issue.Hence,the model seems to generate single-pixel width road edges and centerlines even without applying non-maximum suppression.During the post-processing,we crop the large VHR images into small image patches with overlapping.The predicted maps are stitched into panoramas with our proposed bilinear blending method,which solves the inconsistent issue in the overlapping regions.We introduce a rough and simple user interaction method,in which users can use a single-pixel width brush to provide additional in formation for the model,to obtain desired predictions in the challenging regions covered with shadows and occlusions.Finally,we apply a RANSAC-based method to fit the road centerlines and optimize the results via some topology constraints.In order to verify the effectiveness of our method,we build a benchmark dataset,which consist of a series of VHR images with 0.21m spatial resolution.It provides detailed manually annotation of the mentioned three tasks that will helps the following researchers in this field.Experimental results show that our proposed method is superior to several current state-of-the-art convolutional neural networks for semantic segmentation in computer vision field.Specially,our method achieves the best performances in both our proposed dataset and the other dataset built by former researchers. |