| Urban landmarks are spatial features that are more salient than their neighbors in structure, cognition, or appearance. They play a significant role as spatial references and are important in spatial cognition and wayfinding. At present, landmarks are mainly extracted manually, which is time-consuming and labor-intensive. A growing number of scholars are paying attention to the automatic extraction of landmarks, but current extraction methods cannot satisfy the needs of navigation. This study introduces a random forest classifier for urban landmark extraction and extracts landmarks from basic geographic information databases. The urban feature database is highly imbalanced: non-landmarks far outnumber urban landmarks, which leads to low classification accuracy for urban landmarks. To address this low recognition rate, this paper works on two aspects, data and algorithms, to reduce the influence of imbalanced data on the classifier. This paper selects 15 salience indicators for urban features to construct a feature space. These indicators are available from basic geographic information databases or social sensing data, and can be divided into three categories: structural, cognitive, and perceptual. From the data-balance perspective, given the imbalance of the urban POI dataset, random oversampling (ROS), SMOTE, and ADASYN are applied to reduce the imbalance ratio. We then apply the random forest algorithm to extract urban landmarks from the three balanced datasets produced by ROS, SMOTE, and ADASYN. To determine the best feature set, we evaluate the importance of each feature and test different combinations of features. The results show that the improved algorithm based on oversampling performs well in urban landmark extraction: the recall and AUC are above 90%, and ROS is the best method for oversampling the datasets. In addition, we obtain the best combination 
of indicators for the model, which can help reduce the difficulty of data collection. From the algorithm-design perspective, to improve the precision of the minority class, this paper proposes a cost-sensitive random forest. The class distribution is added to the cost function, and a corresponding weight is given to each sample according to its spatial scale. To improve the random forest, each decision tree is weighted by its classification performance. The results show that, compared with random forest and cost-sensitive decision tree, the cost-sensitive random forest achieves higher classification precision, with recall and AUC above 90%. Additionally, this method is suitable for small datasets, which can help reduce the manual labeling workload. |
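
The random oversampling (ROS) step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `random_oversample`, the toy feature rows, and the 2-versus-8 class split are all hypothetical and chosen only to show the idea of duplicating minority-class rows until the classes are balanced. It uses only the Python standard library.

```python
import random
from collections import Counter


def random_oversample(features, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(features), list(labels)
    for cls, n in counts.items():
        # Rows of this class in the original data; sampled with replacement.
        pool = [x for x, y in zip(features, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_x.extend(extra)
        out_y.extend([cls] * len(extra))
    return out_x, out_y


# Hypothetical toy data: 2 landmarks (label 1) among 8 non-landmarks (label 0).
X = [[float(i), float(i % 3)] for i in range(10)]
y = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

X_bal, y_bal = random_oversample(X, y)
balanced_counts = Counter(y_bal)
```

In a setting like the one described, resampling of this kind would normally be applied to the training split only, so that duplicated minority rows do not leak into the data used to measure recall and AUC.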