
Research On POI Acquisition Technology For Web Text Geographic Information

Posted on: 2018-11-21
Degree: Master
Type: Thesis
Country: China
Candidate: F Zhao
Full Text: PDF
GTID: 2348330542972259
Subject: Computer Science and Technology
Abstract/Summary:
A POI (Point of Interest) is a data record that represents a ground object. Each POI has four attributes: name, category, longitude, and latitude. Rich, complete POI data is an essential resource for location-based services (LBS). However, there is a sharp gap between the industry's growing service demand and traditional collection methods, which are slow and have a long update cycle and therefore cannot keep up with the demand for POI data. Meanwhile, the web is rich in POI information. This thesis therefore studies techniques for collecting POIs from web text. The research covers three aspects.

First, the thesis studies topical web crawler technology for crawling web pages that contain POI information. To obtain pages that are highly correlated with POI information, it adds structural information about the page content to the traditional algorithm for generating web page feature vectors. This strengthens the feature vector's ability to represent page content and improves crawling accuracy. In addition, to improve the crawler's discovery rate for POI-sensitive pages, automatic adjustment of the topic threshold and the topic vector is added to the topic judgment process.

Second, drawing on a study of the conditional random field (CRF) model and its application to Chinese named entity recognition, the thesis uses a CRF model to identify POI names and place names. Because the identified name strings cannot be used directly as address information, the thesis analyzes address structure and proposes an address recognition method based on an address model, which identifies address information from the name strings.

Finally, the thesis studies the association between POI names and addresses. A POI-sensitive web page often contains multiple POI names and addresses, and linking them correctly is essential for acquiring POI information successfully. Based on the structural characteristics of web pages, the thesis proposes a weighted statistical method and a path matching method to solve the name-address association problem. In addition, because latitude and longitude are difficult to obtain from the web page itself, the thesis uses the services provided by Google Maps to obtain them from the successfully associated name and address. The acquired information is then combined and saved in the following format: "POI name,address,longitude,latitude".

Based on this theoretical research, a system was developed for POI acquisition experiments, and it accomplishes the collection of POI data from web pages.
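
The topic judgment step described in the abstract (a structure-aware feature vector compared against a topic vector, with the threshold and topic vector adjusted automatically) might look roughly like the sketch below. This is an illustration under stated assumptions, not the thesis's algorithm: the field weights, the use of weighted term frequency and cosine similarity, the update rule, and every name (`FIELD_WEIGHTS`, `page_vector`, `TopicJudge`) are hypothetical. Whitespace tokenization is also assumed for brevity; Chinese pages would need a word segmenter.

```python
import math
from collections import Counter

# Hypothetical weights: terms found in structurally prominent parts of a
# page count more than terms in the body text.
FIELD_WEIGHTS = {"title": 3.0, "heading": 2.0, "body": 1.0}


def page_vector(fields):
    """Build a weighted term-frequency vector from {field_name: text}."""
    vector = Counter()
    for field, text in fields.items():
        weight = FIELD_WEIGHTS.get(field, 1.0)
        # Assumption: whitespace tokenization (not suitable for Chinese).
        for term in text.lower().split():
            vector[term] += weight
    return vector


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class TopicJudge:
    """Judges page relevance and adapts its threshold and topic vector."""

    def __init__(self, topic_vector, threshold=0.3, rate=0.1):
        self.topic = Counter(topic_vector)
        self.threshold = threshold
        self.rate = rate  # assumed step size for both adjustments

    def is_relevant(self, fields):
        """Return (relevant, score) for one page given as {field: text}."""
        vector = page_vector(fields)
        score = cosine(vector, self.topic)
        relevant = score >= self.threshold
        if relevant:
            # Assumed update rule: nudge the threshold toward the score of
            # accepted pages and blend their terms into the topic vector.
            self.threshold += self.rate * (score - self.threshold)
            for term, weight in vector.items():
                self.topic[term] += self.rate * weight
        return relevant, score


judge = TopicJudge({"restaurant": 1.0, "address": 1.0, "street": 1.0})
on_topic = judge.is_relevant(
    {"title": "restaurant address", "body": "street road"}
)
off_topic = judge.is_relevant(
    {"title": "football scores", "body": "league table"}
)
```

Here `on_topic` is accepted because its title terms overlap the topic vector and carry the higher title weight, while `off_topic` shares no terms with the topic and is rejected. A real crawler would derive the fields from parsed HTML and would likely use a more careful adjustment rule than the one assumed here.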
Keywords/Search Tags: POI, Topical crawler, Conditional Random Field Model, Path matching