Research On Approaches To Construction Of Knowledge Base In The Field Of Geography

Posted on: 2018-03-21
Degree: Master
Type: Thesis
Country: China
Candidate: F F Xu
GTID: 2348330542469380
Subject: Software engineering
Abstract/Summary:
A geographical domain knowledge base is important to research on question answering systems for the national matriculation examination (GaoKao). Manually constructing a large knowledge base costs a great deal of manpower and time. This thesis therefore studies the construction of a geographical domain knowledge base, covering named entity recognition, attribute value extraction, and comparative element extraction, which together support answering geographical questions in GaoKao. The main work of this thesis includes:

(1) Geographical named entity recognition. Two different models are used to recognize two categories of entities: geographical core terms and geographical locations. Geographical features are used with Conditional Random Fields (CRF) to recognize named entities. In addition, to investigate the effectiveness of word embeddings for geographical entity recognition, word embeddings are learned from a large-scale unlabeled corpus and combined with basic word features as the input features of an Elman neural network.

(2) Attribute value extraction for geographical entities. Given the recognized entities and the common attributes of each entity, the page content of the entity is first collected from encyclopedia data. Second, a string similarity algorithm is applied to extract the attribute values. Finally, the attribute values are cleaned and checked to complete the extraction.

(3) Geographical comparative element extraction. A method based on comparative keywords and a method based on class sequence rules are used for comparative sentence classification. Taking the classified comparative sentences as a basis, a new comparative element extraction framework is proposed based on the Answer Set Programming (ASP) language. First, the POS tags and dependency relations of words are represented as ASP facts. Second, existing comparative element extraction rules are translated into ASP rules. Finally, the extraction rules are applied automatically using existing ASP solvers.

Experimental results verify the effectiveness of the proposed methods. Experiments on the geographical entity corpus show that the F1 values of both the CRF-based model and the Elman neural network based model exceed 77.69% on the two entity categories mentioned above. In addition, experiments on the geographical comparative corpus show that the ASP-based method is not only efficient but also achieves better results than the CRF-based model on most of the comparative elements.
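
The first step of the ASP-based framework, representing POS tags and dependency relations as ASP facts, might look roughly like the sketch below. It is only an illustration under stated assumptions: the predicate names `word/2`, `pos/2` and `dep/3`, the `Token` structure, the English example sentence with its tags, and the single rule `subject_entity` are all hypothetical and are not taken from the thesis.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    index: int      # 1-based position in the sentence
    word: str       # surface form
    pos: str        # part-of-speech tag
    head: int       # index of the syntactic head, 0 for the root
    relation: str   # dependency label linking the token to its head


def quote(text: str) -> str:
    """Render text as an ASP string constant, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_asp_facts(tokens: List[Token]) -> List[str]:
    """Emit three facts per token (predicate names are assumptions)."""
    facts = []
    for t in tokens:
        facts.append(f"word({t.index},{quote(t.word)}).")
        facts.append(f"pos({t.index},{quote(t.pos)}).")
        facts.append(f"dep({quote(t.relation)},{t.head},{t.index}).")
    return facts


# Hypothetical, hand-written parse of "Asia is larger than Europe".
SENTENCE = [
    Token(1, "Asia", "NNP", 3, "nsubj"),
    Token(2, "is", "VBZ", 3, "cop"),
    Token(3, "larger", "JJR", 0, "root"),
    Token(4, "than", "IN", 5, "case"),
    Token(5, "Europe", "NNP", 3, "obl"),
]

# Illustrative extraction rule (not one of the thesis's rules): the subject
# of a comparative adjective is treated as the subject entity.
RULE = 'subject_entity(S) :- pos(K,"JJR"), dep("nsubj",K,S).'

if __name__ == "__main__":
    print("\n".join(to_asp_facts(SENTENCE)))
    print(RULE)
```

The printed facts and rule form a plain-text logic program that could then be passed to an ASP solver. Words and tags are emitted as quoted strings because unquoted ASP constants must begin with a lowercase letter.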
Keywords/Search Tags:Knowledge Base, Named Entity Recognition, Comparative Sentence Mining