Research On Network Text Geographic Information Extraction And Credibility Evaluation Technology

Posted on: 2019-02-03
Degree: Master
Type: Thesis
Country: China
Candidate: Y Yue
GTID: 2428330548987381
Subject: Engineering
Abstract/Summary:
Information extraction spares users from searching large volumes of tedious text for the important content. With the spread of the Internet, the era of big data has arrived: the textual information available online has grown richer and its volume has increased dramatically. This makes research on information extraction technology urgent. At the same time, the authenticity of information drawn from the vast body of Internet content must be verified, because false information is not only unhelpful but can also cause unnecessary losses. Effective use of Internet geographic information can make travel much more convenient, and effective extraction and credibility assessment of that information also supply data sources for geographic information software, saving considerable manpower and money. This thesis therefore studies techniques for extracting geographic information from network text and assessing its credibility. The main research work is as follows:

(1) A conditional random field (CRF) model is used to identify geographic name entities, geographic attribute entities, and geographic attribute value entities in network text. During entity recognition, an external auxiliary vocabulary and a place name database were used to construct the feature templates. Comparison experiments show that these features effectively improve entity recognition performance.

(2) Geographic name entities, geographic attribute entities, and geographic attribute value entities that occur in the same sentence are combined two at a time to find all possible entity pairs. There are four categories of relationships between entity pairs, including geographic name and geographic attribute, geographic attribute and geographic attribute value, and geographic name and geographic attribute value. A BP (back-propagation) neural network is used to determine the relationship within each pair. A geographic information fusion algorithm then assembles the results into geographic information groups and stores them in the database. Comparison experiments show that the LM algorithm for the BP neural network performs better at extracting geographic entity relationships.

(3) To assess the credibility of textual geographic information, an evaluation method based on a search engine is proposed. First, the suspicious information is appended to the search engine's URL and sent to the search engine server. Next, the similarity between each returned document and the suspicious information is calculated, along with the credibility of the returned web page. Finally, the two are combined to measure the credibility of the suspicious information. Applied to the geographic information extracted in this thesis, the method classified 85.38% as trusted, 9.10% as suspicious, and 5.63% as untrusted.
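
The scoring idea in step (3) can be illustrated with a minimal Python sketch. The abstract does not say which similarity measure, combination formula, or class thresholds the thesis uses, so everything here is an assumption made for illustration: bag-of-words cosine similarity, a page-credibility-weighted mean, the hypothetical function names `cosine_similarity`, `credibility_score`, and `classify`, and the threshold values 0.6 and 0.3. Fetching results from a search engine is left out; the caller supplies the returned page texts and a credibility value for each page.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"\w+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between the bag-of-words vectors of two strings."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def credibility_score(claim, results):
    """Combine text similarity with page credibility (assumed formula).

    `results` is a list of (page_text, page_credibility) pairs, where
    page_credibility is a caller-supplied number in [0, 1]. The score is
    the mean similarity to the claim, weighted by page credibility.
    """
    total_weight = sum(cred for _, cred in results)
    if total_weight == 0:
        return 0.0
    weighted = sum(cosine_similarity(claim, text) * cred for text, cred in results)
    return weighted / total_weight


def classify(score, trusted_at=0.6, untrusted_below=0.3):
    """Map a score to one of three classes; both thresholds are hypothetical."""
    if score >= trusted_at:
        return "trusted"
    if score < untrusted_below:
        return "untrusted"
    return "suspicious"


if __name__ == "__main__":
    # Made-up claim and pages, used only to exercise the functions.
    claim = "Lake X area 10 km"
    pages = [("Lake X area 10 km", 0.9), ("unrelated words here", 0.1)]
    score = credibility_score(claim, pages)
    print(score, classify(score))
```

Weighting by page credibility means a close textual match on a low-credibility page contributes less than the same match on a high-credibility page, which is one plausible way to combine the two quantities the abstract describes.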
Keywords/Search Tags: information extraction, classification technology, conditional random field, BP neural network, information credibility, text similarity