The Design And Implementation Of Segmentation Engine For Chinese Place Names

Posted on: 2016-03-12
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Chen
GTID: 2298330467493770
Subject: Software engineering
Abstract/Summary:
According to surveys, 80 percent of urban information involves geographic location. However, most of this data has neither coordinates nor a specific Chinese address, which has become a major obstacle to urban informatization. Geocoding technology links spatial and non-spatial data, and the Chinese address segmentation engine is the key component of the geocoding process: it bridges the gap between the Chinese geocoding engine and Chinese address information. The segmentation engine improves data integration in urban informatization, supports data visualization and decision-making, and thereby simplifies management work. This thesis therefore surveyed current segmentation techniques and designed and implemented a new technique suited to real-world Chinese addresses. The main work carried out is as follows:

(1) A database of Chinese addresses was designed. Based on the analysis of a large number of Chinese address names and with reference to the features of China's administrative divisions, all address names were classified using hierarchical rules. Drawing on word library resources currently available on the internet and on mainstream storage technology, a new knowledge base covering all place names was designed.

(2) An improved two-way (bidirectional) maximum matching algorithm based on the place-name knowledge base was proposed. According to the type of Chinese address, the algorithm adds a spatial judgment and level recognition model to the forward and reverse maximum matching algorithms. The new algorithm simplified the matching logic for place names and improved address segmentation accuracy.

(3) A caching strategy was proposed. Using mainstream cache technology, the urban code, level information and place address are stored together directly in the cache. In addition, hash table technology is used to read the place-name knowledge base quickly. Both the implementation process and the performance of the segmentation engine were improved.

(4) A Chinese address segmentation engine was designed and implemented. The experimental results show that the initialization time for the national place-name library is about 35 seconds, and that the accuracy and availability of the Chinese address segmentation engine are 96.5% and 99.99%, respectively, which reaches a practical level.
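The two-way maximum matching of item (2) and the hash-table lookup of item (3) can be illustrated with a minimal sketch. This is not the thesis's implementation: the gazetteer entries, the numeric level scheme, the function names and the tie-breaking rules are all assumptions made for illustration. A Python dict stands in for the hash-table cache, holding the level and region code next to each place name, and a simple "levels must not decrease from left to right" check stands in for the level recognition model.

```python
# Hypothetical gazetteer cache: place name -> (level, region code).
# A dict is a hash table, so each lookup is O(1) on average.
# Entries, levels and codes are illustrative only; a smaller level
# number means a larger administrative unit.
GAZETTEER = {
    "北京市": (1, "110000"),
    "海淀区": (3, "110108"),
    "中关村大街": (5, "110108"),
    "上海": (1, "310000"),
    "上海市": (1, "310000"),
    "市场路": (5, "310000"),
}
MAX_LEN = max(len(name) for name in GAZETTEER)


def forward_max_match(text):
    """Scan left to right, always taking the longest known name."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(MAX_LEN, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when nothing matches.
            if size == 1 or piece in GAZETTEER:
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_max_match(text):
    """Scan right to left, always taking the longest known name."""
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(MAX_LEN, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in GAZETTEER:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()
    return tokens


def levels_in_order(tokens):
    """Assumed level check: known names must run from large to small units."""
    levels = [GAZETTEER[t][0] for t in tokens if t in GAZETTEER]
    return all(a <= b for a, b in zip(levels, levels[1:]))


def score(tokens):
    """Lower is better: level order first, then unknown pieces, then length."""
    unknown = sum(1 for t in tokens if t not in GAZETTEER)
    return (not levels_in_order(tokens), unknown, len(tokens))


def segment(text):
    """Run both directions and keep the better-scoring segmentation."""
    fwd = forward_max_match(text)
    bwd = backward_max_match(text)
    # min() keeps the first argument on ties, so the backward result wins.
    return min(bwd, fwd, key=score)


print(segment("上海市场路"))
print(segment("北京市海淀区中关村大街"))
```

For the input "上海市场路", forward matching greedily takes "上海市" and leaves two unmatched characters, while reverse matching finds "上海" and "市场路"; the scoring step keeps the reverse result. Because the level and region code sit in the same dict entry as the name, a single lookup returns everything needed about a matched segment.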
Keywords/Search Tags: segmentation, knowledge base, word library, place address, forward/reverse maximum matching algorithm, cache