| Urban villages are a product of China's rapid urbanization. They consist mainly of low, densely packed buildings and suffer from problems such as poor environmental quality and outdated municipal infrastructure. How to combine fully convolutional deep learning models with high-resolution remote sensing imagery to automatically extract the vector contours of urban village buildings is therefore a question worth investigating for intelligent urban planning and management. However, current deep-learning-based remote sensing image segmentation algorithms yield only pixel masks of urban village buildings, and the segmentation result carries no direct information such as location and quantity. In addition, redundant model structures not only slow down processing but also reduce classification and localization accuracy, which limits the practical value of these algorithms. To address these problems, this research uses Gaofen-2 imagery of Hangzhou from 2017 as experimental data: the large-scale remote sensing image is divided into slices, and the shape masks and bounding boxes of urban village buildings are annotated in each slice. Data augmentation is then applied to complete a remote sensing data set of urban village buildings. Building on current anchor-free, single-stage object detection and pixel-modeling-based instance segmentation algorithms, an end-to-end fully convolutional model for vector polygon extraction, named Polar Mask-UV (Polar Mask for Urban Villages), is designed. The algorithm describes the target contour with a polar representation instead of the traditional pixel representation, which simplifies the heavy pixel-wise prediction task to instance center classification and sparse distance regression. First, a residual network and a pyramid network perform multi-scale information fusion and extraction to obtain rich semantic and spatial information. Second, focal loss is introduced into the classification branch to strengthen the classification of instance center points and obtain the category confidence of each one; meanwhile, the center-ness branch outputs the distance center-ness used to select high-quality positive samples. Third, the polar loss is used for sparse distance regression in the localization branch, which outputs the distances to the contour points. Finally, the confidence and contour parameters of each instance are combined to obtain the boundary coordinates, and connecting the boundary points in sequence forms the vector contour of the buildings in urban villages. Experimental results show that, for classification, the improved classification branch effectively mitigates category imbalance and reduces false positives and false negatives, yielding a 1.5% gain in average precision. For localization, the improved feature fusion network restores boundary details more completely and finely, increasing average precision by 3.1%. By optimizing ray uniformity and the position of the instance center, the localization branch further improves mask quality, raising center-ness by 11%. Compared with other pixel-wise modeling methods, the proposed model offers simple processing, accurate classification, and accurate localization, with a 7.6% gain in average precision, and it directly outputs the vector boundary without any post-processing. In terms of model size, the polar representation effectively simplifies the segmentation problem. Compared with other methods, the designed algorithm converges quickly during training and supports real-time detection, reducing the model size by half and increasing the processing speed twofold. The results can provide intuitive and accurate building location, shape, and quantity information for the planning and management of urban villages. |
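
The final step described above, turning an instance center and its regressed ray distances into a vector contour, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `polar_to_polygon`, the ray count, and the angle convention (evenly spaced rays starting at angle 0) are assumptions made for illustration only.

```python
import math


def polar_to_polygon(center, distances):
    """Convert ray lengths around an instance center into polygon vertices.

    Assumption: the rays are evenly spaced over 360 degrees, with the first
    ray at angle 0. The source does not state the convention it uses.
    """
    cx, cy = center
    n = len(distances)
    vertices = []
    for k, d in enumerate(distances):
        theta = 2.0 * math.pi * k / n  # angle of the k-th ray
        vertices.append((cx + d * math.cos(theta), cy + d * math.sin(theta)))
    return vertices


# Hypothetical example: 4 rays of length 10 around the center (50, 50).
polygon = polar_to_polygon((50.0, 50.0), [10.0, 10.0, 10.0, 10.0])
```

Connecting the returned vertices in order gives a closed polygon, which is why a representation of this kind needs no mask-to-vector post-processing.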