Research On The Analytical Method Of The Geographical Elements For The Chinese Address Of The Internet

Posted on: 2017-01-23
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Duan
GTID: 2348330512465149
Subject: Computer technology
Abstract/Summary:
As location information services have spread, more and more enterprises combine their own address data with software functions to build location service applications that make daily life more convenient, such as mobile map apps. This requires mapping large numbers of Chinese addresses written in natural language to geographic coordinates so that they can be positioned accurately on an electronic map, which in turn supports information retrieval, query, and positioning services. However, Chinese address information obtained from the Internet is often incomplete and non-standardized; that is, the address data are not organized according to the hierarchy of geographical elements. To establish an accurate mapping between spatial and non-spatial information, it is therefore important to study how to parse and standardize Chinese addresses obtained from the Internet. The research object of this thesis is Chinese addresses collected by a web crawler. First, the thesis presents a conditional random field algorithm, which mainly uses four-tag character (word-position) annotation and builds a conditional random field model to parse the geographical elements in an address. Second, the thesis proposes a multi-factor algorithm for calculating the credibility of administrative divisions, whose main purpose is to identify the administrative divisions among the geographical elements of an address. The method first uses a dictionary of administrative divisions to match a set of candidate divisions, assigns a position-matching factor to each candidate, then selects the best administrative division result using the relationships among the factors and calculates the credibility of each candidate. Finally, the thesis proposes an improved algorithm based on the conditional random field, which can effectively identify both the administrative divisions and the other geographical elements in a Chinese address. The method builds a feature library, constructs an empirical word transition matrix from a standard address corpus, extracts the feature words of an address string to form a random field, and uses the empirical transition probability matrix to find the parse best suited to the address string being processed. Because the feature library contains a limited number of features, the algorithm cannot judge low-frequency features well. For addresses that contain Chinese feature words, however, it identifies the geographical elements effectively. The three algorithms were tested on different address databases, and the final results were compared both horizontally and vertically. Extensive experiments show that the multi-factor algorithm and the improved conditional random field algorithm perform well, separate the different geographical elements effectively, and lay a foundation for location-based applications and development.
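The abstract does not give implementation details, so the following is only a minimal sketch of two ideas it mentions: four-tag character annotation and an empirical transition matrix estimated from a standard address corpus. It assumes the four tags are the common B/M/E/S scheme (begin, middle, end, single) and that transitions are counted between tags; both are assumptions, not the author's stated method. The function names and the sample addresses are hypothetical and chosen for illustration.

```python
from collections import Counter, defaultdict


def tag_element(element: str) -> list[str]:
    """Return one B/M/E/S tag per character of a single address element.

    Assumed scheme: B = first character, M = middle, E = last, S = single.
    """
    if not element:
        return []
    if len(element) == 1:
        return ["S"]
    return ["B"] + ["M"] * (len(element) - 2) + ["E"]


def tag_address(elements: list[str]) -> list[tuple[str, str]]:
    """Turn a pre-segmented address into (character, tag) pairs."""
    pairs = []
    for element in elements:
        pairs.extend(zip(element, tag_element(element)))
    return pairs


def transition_probabilities(corpus: list[list[str]]) -> dict[str, dict[str, float]]:
    """Estimate tag-to-tag transition probabilities by counting over a corpus."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for elements in corpus:
        tags = [tag for _, tag in tag_address(elements)]
        for prev_tag, next_tag in zip(tags, tags[1:]):
            counts[prev_tag][next_tag] += 1
    return {
        prev_tag: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for prev_tag, c in counts.items()
    }


# Hypothetical, hand-segmented sample address (illustrative only).
sample_corpus = [["浙江省", "杭州市", "西湖区", "文三路", "90号"]]
labelled = tag_address(sample_corpus[0])
matrix = transition_probabilities(sample_corpus)
```

Labelled pairs of this kind are the usual training input for a sequence model such as a conditional random field, and a counted matrix of this kind is one simple way to obtain empirical transition probabilities from a corpus.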
Keywords/Search Tags:Geographical elements, Conditional random field, Multi-factor Algorithm, Empirical transfer matrix