| With the rapid development of Internet technology and the spread of mobile devices, more and more geographic information tied to spatial locations is appearing on the Internet. These data are used in many fields, such as map navigation, environmental monitoring, urban planning, land use, and emergency management. Points of interest (POIs) are geographic data closely tied to people's daily lives and work, and they underpin location-based services and related applications. People routinely use mobile applications such as Dianping and Weibo to upload and publish information about POIs such as restaurants, hotels, and scenic spots, which produces a large volume of frequently updated POI data on the web. How to classify and use these POI data and unlock their potential value has become a key focus and challenge of geographic information data analysis. Because a POI's name best reflects its subject, this thesis uses POI names to identify and determine POI categories. To address the sparse feature vectors and simple semantic relationships of POI name text, it achieves automatic POI classification by improving on the BERT model and building an ERNIE-RCNN model. Eight categories of POI data from Chengdu, Sichuan Province, were used for parameter tuning, and the model was compared with BERT-RCNN, BERT-CNN, Word2vec-RCNN, Word2vec-CNN, ERNIE-CNN, ERNIE-RNN, ERNIE-FC, and other algorithms to verify its effectiveness. The main research contents of this thesis are as follows:
(1) POI name text feature extraction based on TF-IDF. Since POI name text contains feature words that indicate the POI's topic type, TF-IDF is used to extract feature words from the name text, so that words related to the POI topic are selected and given higher weights. These weighted words provide the input feature vectors for the POI classification model in the next step.
(2) POI classification with an improved BERT model. To address the sparse feature vectors, simple semantic relationships, and weak contextual connections of POI names, the BERT word-vector model is improved and a POI name classification method based on the ERNIE-RCNN model is proposed. First, the bidirectional Transformer structure and fine-tuning of the ERNIE model produce high-quality POI name vector representations that carry contextual semantic information. These representations are then fed into the RCNN structure, whose bidirectional recurrent neural network extracts deeper semantic information from the POI names. Finally, a Softmax classifier assigns each POI to a category.
(3) Verification of automatic POI classification. To verify the effectiveness of the proposed model, experiments were conducted on eight categories of POI data from Chengdu, Sichuan Province. The results show that the proposed model achieves an overall classification accuracy of 95.65% across the eight categories, with a classification accuracy above 91% and an F1 score above 92% for every category, outperforming the other comparison models. This shows that the ERNIE-RCNN model classifies POIs automatically with good results, which benefits the standardization and management of web-based geographic information data. |
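The TF-IDF feature-word step in research content (1) can be illustrated with a small pure-Python sketch. It assumes POI names have already been word-segmented (the example tokens are hypothetical and romanized for readability) and uses the common `tf * log(N / df)` weighting; the exact TF-IDF variant and smoothing used in the thesis are not stated, so treat this as one plausible choice rather than the author's implementation. The function names `tfidf` and `top_feature_words` are illustrative.

```python
import math
from collections import Counter


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Compute TF-IDF weights for each tokenized POI name.

    tf  = occurrences of the term in the name / number of tokens in the name
    idf = log(N / df), where N is the number of names and df is the number
          of names containing the term (no smoothing; an assumption)
    """
    n_docs = len(docs)
    # Document frequency: in how many names does each term occur?
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def top_feature_words(weights: dict[str, float], k: int = 2) -> list[str]:
    """Return the k highest-weighted terms of one name (ties broken alphabetically)."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical, already-segmented POI names.
names = [
    ["chengdu", "spicy", "hotpot"],
    ["chengdu", "jinjiang", "hotel"],
    ["chengdu", "wuhou", "shrine"],
]
weights = tfidf(names)
print(top_feature_words(weights[0]))  # ['hotpot', 'spicy']
```

Because "chengdu" occurs in every name, its IDF is `log(1) = 0`, so the city name carries no weight while the category-bearing words ("hotpot", "spicy") rise to the top. This is the intuition behind using TF-IDF to pick out topic feature words.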
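The RCNN stage in research content (2) follows the general recurrent convolutional pattern: each token vector is concatenated with left and right context vectors from a bidirectional recurrence, passed through a nonlinear projection, max-pooled over the tokens, and classified with Softmax. The following dependency-free sketch shows only that forward pass. Several assumptions are not from the thesis: a plain tanh RNN stands in for whatever recurrent cell the thesis uses (LSTM or GRU would be typical), the weights are random and untrained, the initial contexts are zero vectors, and random vectors stand in for ERNIE's contextual token embeddings. The class name `RCNNHead` and all dimensions are hypothetical.

```python
import math
import random


def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def add(a: list[float], b: list[float]) -> list[float]:
    return [x + y for x, y in zip(a, b)]


def tanh_vec(v: list[float]) -> list[float]:
    return [math.tanh(x) for x in v]


def softmax(z: list[float]) -> list[float]:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in z]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> list[list[float]]:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class RCNNHead:
    """RCNN-style classifier over contextual token vectors (hypothetical sketch).

    For each token i:
        c_l[i] = tanh(W_l c_l[i-1] + W_sl e[i-1])   left context
        c_r[i] = tanh(W_r c_r[i+1] + W_sr e[i+1])   right context
        y[i]   = tanh(W_2 [c_l[i]; e[i]; c_r[i]] + b_2)
    then max-pool y over tokens and apply a Softmax output layer.
    """

    def __init__(self, emb_dim: int, ctx_dim: int, hidden_dim: int,
                 n_classes: int, seed: int = 0):
        rng = random.Random(seed)
        self.ctx_dim = ctx_dim
        self.W_l = rand_matrix(ctx_dim, ctx_dim, rng)
        self.W_sl = rand_matrix(ctx_dim, emb_dim, rng)
        self.W_r = rand_matrix(ctx_dim, ctx_dim, rng)
        self.W_sr = rand_matrix(ctx_dim, emb_dim, rng)
        self.W_2 = rand_matrix(hidden_dim, 2 * ctx_dim + emb_dim, rng)
        self.b_2 = [0.0] * hidden_dim
        self.W_out = rand_matrix(n_classes, hidden_dim, rng)
        self.b_out = [0.0] * n_classes

    def forward(self, embs: list[list[float]]) -> list[float]:
        n = len(embs)
        zero = [0.0] * self.ctx_dim
        # Left-to-right pass: context of token i summarizes tokens before it.
        left = [zero] * n
        for i in range(1, n):
            left[i] = tanh_vec(add(matvec(self.W_l, left[i - 1]),
                                   matvec(self.W_sl, embs[i - 1])))
        # Right-to-left pass: context of token i summarizes tokens after it.
        right = [zero] * n
        for i in range(n - 2, -1, -1):
            right[i] = tanh_vec(add(matvec(self.W_r, right[i + 1]),
                                    matvec(self.W_sr, embs[i + 1])))
        # Concatenate [left; token; right] and project each token.
        ys = [tanh_vec(add(matvec(self.W_2, left[i] + embs[i] + right[i]), self.b_2))
              for i in range(n)]
        # Max-pool each hidden unit over all tokens, then classify.
        pooled = [max(col) for col in zip(*ys)]
        return softmax(add(matvec(self.W_out, pooled), self.b_out))


rng = random.Random(42)
# Stand-in for encoder outputs of one POI name: 5 tokens x 16 dims
# (real ERNIE outputs are much wider).
tokens = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(5)]
head = RCNNHead(emb_dim=16, ctx_dim=8, hidden_dim=12, n_classes=8)
probs = head.forward(tokens)
predicted = max(range(len(probs)), key=probs.__getitem__)
```

Max-pooling over tokens is what lets a name of any length map to a fixed-size vector, so a single keyword anywhere in the name can dominate the decision. Training (backpropagation through these weights and fine-tuning the encoder) is omitted here.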
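Research content (3) reports an overall accuracy together with per-category accuracy and F1 scores. The sketch below computes overall accuracy and the standard per-class precision, recall, and F1 from true and predicted labels. It reports both precision and recall because the abstract does not say which per-category measure "accuracy" refers to. The labels are toy values for illustration only, not the thesis data, and `classification_report` is a hypothetical helper name.

```python
from collections import Counter


def classification_report(y_true: list[str], y_pred: list[str]):
    """Return overall accuracy and per-class precision, recall and F1."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # true positives
    pred_count = Counter(y_pred)  # how often each class was predicted
    true_count = Counter(y_true)  # how often each class actually occurs
    per_class = {}
    for c in sorted(set(y_true) | set(y_pred)):
        precision = tp[c] / pred_count[c] if pred_count[c] else 0.0
        recall = tp[c] / true_count[c] if true_count[c] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[c] = {"precision": precision, "recall": recall, "f1": f1}
    return accuracy, per_class


# Toy labels for illustration only.
y_true = ["food", "food", "hotel", "hotel", "scenic", "scenic"]
y_pred = ["food", "hotel", "hotel", "hotel", "scenic", "food"]
accuracy, per_class = classification_report(y_true, y_pred)
print(round(accuracy, 4))                  # 0.6667
print(round(per_class["hotel"]["f1"], 4))  # 0.8
```

F1 is the harmonic mean of precision and recall, so a class such as "hotel" here (precision 2/3, recall 1) scores 0.8: over-predicting a class is penalized even when every true instance is found.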