
Research On Chinese Gazetteer Services Based On Trie-tree Index

Posted on: 2018-06-30
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Dong
Full Text: PDF
GTID: 2370330623950633
Subject: Information and Communication Engineering
Abstract/Summary:
With the rapid development of computer technology, the construction of smart cities and digital cities is in full swing, and geographic information systems (GIS) have become an indispensable part of daily life. As a basic geographic element of urban digitization and GIS, the names and addresses (gazetteer) service is of great significance to the development of geographic information and is widely applied in everyday life. Improving the efficiency of gazetteer retrieval and the accuracy of search results has therefore become a focus of current geographic information research, and it is the main subject of this thesis.

Retrieving Chinese gazetteer entries first requires segmenting the gazetteer phrases into words. Because Chinese phrases contain no spaces between words, Chinese word segmentation is comparatively difficult. Based on the characteristics of Chinese names and address phrases, the thesis studies existing Chinese word segmentation methods, designs a set of segmentation dictionaries (an administrative division proper noun dictionary, a word segmentation classification dictionary, and a synonym dictionary), and establishes accurate and efficient word segmentation rules. Segmentation is completed through feature processing, ambiguity processing, and synonym processing, and a method for correcting the gazetteer dictionary is provided to further improve segmentation accuracy.

On the basis of accurate word segmentation, a suitable and efficient indexing mechanism is needed to improve retrieval efficiency. A trie-tree is an efficient prefix-matching index that is well suited to a names and addresses service. A trie-tree index over Chinese gazetteer phrases is implemented, which increases the efficiency of names and addresses retrieval. Several fuzzy matching modes, such as fuzzy prefix matching and pinyin matching, are also implemented on top of this index, so that the service supports fuzzy retrieval as well as exact retrieval and offers more diverse search options. Finally, a breadth-first algorithm is used to sort the search results by relevance and accuracy, improving user satisfaction.

To make full use of the hierarchical information in gazetteer data and give users a better retrieval experience, the service implements classification retrieval of names and addresses. A three-level grade structure is built from the spatial hierarchy of the names and addresses, and a three-level classification structure is built from their hierarchical semantic information. By selecting a specific level and category, the user can limit the scope of a search and reduce redundant results. To support classification retrieval, a hierarchical data structure is designed on a PostgreSQL database to keep the search service efficient.

Based on the word segmentation rules, the search index, and the classification structure, a prototype names and addresses service is designed and implemented. The prototype uses a browser/server (B/S) architecture with a web server, and its logical framework design is presented in the thesis. In addition to gazetteer retrieval, the prototype provides other basic functions of a names and addresses service, such as gazetteer data management, reverse retrieval, and frame retrieval, which give users a variety of retrieval options and improve the usability and friendliness of the system. The key technology of the system has been applied in the Digital Xiangxi system.

In conclusion, the thesis carries out a series of studies on gazetteer retrieval and implements a names and addresses service. The work addresses how to improve retrieval efficiency, how to make results more accurate, and how to make the service friendlier, and it concludes with the design and implementation of a prototype system.
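
The abstract does not give implementation details, so the following is only a minimal sketch of how a trie-tree prefix index with breadth-first result ordering might look. The names `TrieNode`, `GazetteerTrie`, `insert`, and `search_prefix`, the `limit` parameter, and the sample place names are all assumptions made for illustration; the thesis's actual index also handles segmentation, fuzzy prefix matching, and pinyin matching, which are not shown here.

```python
from collections import deque


class TrieNode:
    """One character position in the index (hypothetical structure)."""

    def __init__(self):
        self.children = {}     # maps a character to the next TrieNode
        self.is_entry = False  # True if a gazetteer entry ends at this node


class GazetteerTrie:
    """Illustrative character-level trie over place-name strings."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, name):
        """Add one place name to the index, one character per level."""
        node = self.root
        for ch in name:
            node = node.children.setdefault(ch, TrieNode())
        node.is_entry = True

    def search_prefix(self, prefix, limit=10):
        """Return up to `limit` entries that start with `prefix`.

        The subtree below the prefix is walked breadth-first, so entries
        closest in length to the query are returned first. Treating this
        as the relevance order is an assumption of this sketch.
        """
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []  # no entry starts with this prefix
        results = []
        queue = deque([(node, prefix)])
        while queue and len(results) < limit:
            current, text = queue.popleft()
            if current.is_entry:
                results.append(text)
            for ch, child in current.children.items():
                queue.append((child, text + ch))
        return results


# Usage with made-up sample data
trie = GazetteerTrie()
for name in ["长沙市", "长沙市岳麓区", "长沙县", "湘西州"]:
    trie.insert(name)
print(trie.search_prefix("长沙"))  # ['长沙市', '长沙县', '长沙市岳麓区']
```

In this sketch a lookup costs time proportional to the prefix length plus the number of nodes visited below it, independent of the total number of entries, which is the property that makes a trie attractive for prefix retrieval.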
Keywords/Search Tags: gazetteer retrieval, word segmentation with ambiguity removal, Chinese trie-tree index, grade and classification retrieval, prototype system