Research On Extraction Of Web Textual Geographic Information

Posted on: 2018-10-10
Degree: Master
Type: Thesis
Country: China
Candidate: X F Zhang
GTID: 2348330542990829
Subject: Engineering
Abstract/Summary:
The Internet holds a wealth of geographic information and has become an important source for acquiring it. Traditional methods of obtaining geographic information rely on a large amount of manpower and time, which leads to long update cycles and limited access to information in industries that depend on geographic data. This paper focuses on the large amount of unstructured geographic information in web text and proposes a cascaded model based on Conditional Random Fields (CRFs) and Support Vector Machines (SVM) to obtain structured geographic information, expressed as <geographic entity, attribute, attribute value> triples. Experimental results show that geographic information extraction reaches a precision of 83.98%, a recall of 82.03%, and an F-value of 82.99%, which demonstrates the effectiveness of the method. The main research work covers the following two aspects:

(1) In the first-layer model, the attributes and attribute values of geographic entities are treated as two kinds of named entities, and they are recognized together with the geographic entities using the CRFs model. Because some geographic entities and attribute entities are regular and stable, dictionary features are added when constructing the feature templates for geographic entities and attribute entities. Comparative experiments confirm both the effectiveness of the CRFs model for recognizing geographic information entities and the benefit of the dictionary features.

(2) In the second-layer model, the geographic information entities (geographic entities, attribute entities, attribute value entities) recognized by the first layer do not by themselves indicate which geographic entity corresponds to which attribute entity, or which attribute entity corresponds to which attribute value entity. The SVM model is therefore used to judge whether a correspondence exists between geographic information entities (geographic entity, attribute entity, attribute value entity). In addition, the large number of feature items obtained after preprocessing the geographic information corpus may include useless and redundant information. This paper therefore adopts the information gain feature selection algorithm from other research and improves it to address its shortcomings in feature selection for geographic information entity relations. Comparative experiments confirm the effectiveness of the SVM model for extracting relations between geographic information entities and show that the improved information gain feature selection algorithm is more effective.
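Two calculations mentioned in the abstract can be illustrated with a short Python sketch. It assumes the F-value is the balanced harmonic mean of precision and recall, which is consistent with the reported figures. It also shows the standard information gain criterion for binary features, not the improved variant developed in the thesis, whose details the abstract does not give. The feature names and toy samples are hypothetical.

```python
import math
from collections import Counter


def f_measure(precision, recall):
    """Balanced F-value: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(samples, labels, feature):
    """Entropy reduction from knowing whether `feature` occurs in a sample."""
    with_f = [y for x, y in zip(samples, labels) if feature in x]
    without_f = [y for x, y in zip(samples, labels) if feature not in x]
    total = len(labels)
    conditional = sum(
        (len(part) / total) * entropy(part)
        for part in (with_f, without_f)
        if part  # skip an empty partition to avoid dividing by zero
    )
    return entropy(labels) - conditional


def select_features(samples, labels, k):
    """Keep the k features with the highest information gain."""
    vocabulary = sorted(set().union(*samples))
    ranked = sorted(
        vocabulary,
        key=lambda f: information_gain(samples, labels, f),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy data: each sample is the feature set of one candidate
# entity pair; the label is 1 if the pair is related, 0 otherwise.
samples = [
    {"near", "verb:is"},
    {"near", "punct"},
    {"far", "verb:is"},
    {"far", "punct"},
]
labels = [1, 1, 0, 0]

reported_f = f_measure(83.98, 82.03)  # precision and recall from the abstract
top_features = select_features(samples, labels, k=2)
print(round(reported_f, 2), top_features)
```

In the toy data the distance features separate the two classes perfectly, so they rank above the features that are evenly spread across both classes. A feature selector of this kind would run before SVM training to discard low-gain feature items.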
Keywords/Search Tags: geographic information entities, SVM, CRFs, entity recognition, relation extraction