
Identification And Extraction Of Place Names And Address From Webpage Text

Posted on: 2018-07-27   Degree: Master   Type: Thesis
Country: China   Candidate: Z B Du
GTID: 2348330518497640   Subject: Cartography and Geographic Information System
Abstract/Summary:
With the rapid development of information technologies such as networking, wireless communication and cloud computing, society is entering the "big data" era. In particular, the explosive growth of Internet media, which carries a huge amount of information, has made the Internet an important data source for the geographic information industry. As the most important channel for releasing, disseminating and exchanging information, the Internet contains a wealth of geospatial information and has become an effective complement to traditional geographic information collection. However, because network information is diverse, random and interactive, it is difficult to identify and extract this information automatically and convert it into a data source that GIS tools can use for further statistics and analysis. Identifying and extracting the geospatial information contained in the network is therefore critical to the effective use of "Internet big data". Network information is usually presented as text, and the geospatial information in it appears mainly as place names and addresses, so identifying and extracting those place names and addresses is an effective solution. Identification of place names and addresses means analysing the text with semantic analysis techniques to find both standard and non-standard place names and addresses. Extraction of place names and addresses means using the attributes of place names and addresses, expressed in mathematical form, to extract the target address accurately. Based on an analysis of the characteristics of network information and of place names and addresses, this thesis presents a method for identifying and extracting place names and addresses from webpage text based on a "Place names and address gene library".
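The identification step defined above can be illustrated with a small dictionary-based segmenter. This is a minimal sketch under stated assumptions, not the thesis's algorithm: the thesis says only that it improves a Chinese word segmentation algorithm and uses the gene library as the dictionary, so plain forward maximum matching (FMM) is substituted here. The five-entry `GENE_LIBRARY`, its hierarchy levels, and the combination rule in `combine_genes` (a run of genes whose level strictly increases forms one address) are all hypothetical.

```python
# Hypothetical gene library for one region: each "gene" is a place-name or
# address component mapped to an assumed hierarchy level (lower = coarser).
GENE_LIBRARY = {
    "江苏省": 1,  # province
    "南京市": 2,  # city
    "鼓楼区": 3,  # district
    "中山路": 4,  # road
    "1号": 5,     # house number
}
MAX_LEN = max(len(gene) for gene in GENE_LIBRARY)


def fmm_segment(text: str) -> list[str]:
    """Forward maximum matching with the gene library as the dictionary.

    At each position, take the longest library entry that matches;
    otherwise emit a single character and move on.
    """
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(MAX_LEN, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in GENE_LIBRARY or size == 1:
                tokens.append(piece)
                i += size
                break
    return tokens


def combine_genes(tokens: list[str]) -> list[str]:
    """Restore address strings from genes (assumed rule: a run of genes whose
    hierarchy level strictly increases forms one address)."""
    addresses, current, last_level = [], [], 0
    for tok in tokens:
        level = GENE_LIBRARY.get(tok)
        if level is not None and level > last_level:
            current.append(tok)  # extend the current address
            last_level = level
            continue
        if current:
            addresses.append("".join(current))  # close the current address
        if level is not None:
            current, last_level = [tok], level  # a gene that starts a new address
        else:
            current, last_level = [], 0  # ordinary text resets the state
    if current:
        addresses.append("".join(current))
    return addresses
```

For example, `combine_genes(fmm_segment("地址：江苏省南京市鼓楼区中山路1号，电话"))` returns `["江苏省南京市鼓楼区中山路1号"]`: the non-gene characters around the address are split off one by one, and the five genes in between merge because their levels rise from province to house number.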
The main results are as follows:
(1) A method for identifying place names and addresses in webpage text. This study defines the concept of place name and address genes, proposes that place names and addresses are composed of such genes, and constructs a "Place names and address gene library" for a given region. It improves a Chinese word segmentation algorithm, using the "Place names and address gene library" as the dictionary, to identify place name and address genes in webpage text. The address strings are then restored by combining these genes according to the relevant rules, which completes the identification.
(2) A method for extracting place names and addresses from webpage text. Traditional extraction methods cannot reliably extract the target place name. This study attempts to describe the inherent attributes of place names and addresses mathematically. From four attributes (event, position, length and word frequency), it generates an extraction rule tree, calculates an extraction index, and uses that index to extract the target address accurately.
(3) Experiments show that the method is feasible, with good efficiency and accuracy. The method has also been applied in a real project, where it achieved real-time acquisition of network information and its visual display in a web front end.
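To make the idea of an extraction index concrete, here is a minimal sketch that scores candidate addresses on the four attributes named in (2). The thesis abstract does not give the rule tree or the index formula, so this sketch substitutes a hypothetical weighted linear score. The `WEIGHTS` values, the sentence-level test used as the "event" attribute, the normalisations, and the helper names `build_candidates`, `extraction_index` and `extract_target` are all assumptions for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    text: str
    first_pos: int    # character offset of the first occurrence in the page
    frequency: int    # number of occurrences in the page (word frequency)
    near_event: bool  # shares a sentence with an event keyword (assumed "event" signal)


# Hypothetical weights for the four attributes; not taken from the thesis.
WEIGHTS = {"event": 0.4, "position": 0.2, "length": 0.2, "frequency": 0.2}


def build_candidates(page_text: str, addresses: list[str], event_words: list[str]) -> list[Candidate]:
    """Measure each identified address that actually occurs in the page."""
    sentences = re.split(r"[。！？!?\n]", page_text)
    cands = []
    for addr in addresses:
        if addr not in page_text:
            continue
        near = any(addr in s and any(w in s for w in event_words) for s in sentences)
        cands.append(Candidate(addr, page_text.find(addr), page_text.count(addr), near))
    return cands


def extraction_index(c: Candidate, text_len: int, max_len: int, max_freq: int) -> float:
    """Hypothetical linear index in [0, 1]; higher means more likely the target."""
    position = 1.0 - c.first_pos / text_len  # earlier mention -> higher score
    length = len(c.text) / max_len           # longer (more complete) address -> higher
    frequency = c.frequency / max_freq       # more repetitions -> higher
    event = 1.0 if c.near_event else 0.0
    return (WEIGHTS["event"] * event + WEIGHTS["position"] * position
            + WEIGHTS["length"] * length + WEIGHTS["frequency"] * frequency)


def extract_target(page_text: str, addresses: list[str], event_words: list[str]) -> Optional[str]:
    """Return the candidate address with the highest extraction index, or None."""
    cands = build_candidates(page_text, addresses, event_words)
    if not cands:
        return None
    max_len = max(len(c.text) for c in cands)
    max_freq = max(c.frequency for c in cands)
    best = max(cands, key=lambda c: extraction_index(c, len(page_text), max_len, max_freq))
    return best.text
```

On a page such as "本报讯 南京市消息。昨日鼓楼区中山路1号发生火灾，鼓楼区中山路1号附近交通管制。" with the event word "火灾", this sketch picks "鼓楼区中山路1号" over "南京市", because the street address is longer, repeated, and shares a sentence with the event. A linear score was chosen only because it is the simplest way to combine the four attributes; note that `str.count` also counts a short name nested inside a longer address, which a fuller implementation would need to handle.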
Keywords/Search Tags: Webpage text, Place names and addresses, Identification, Extraction, Place names and address genes