Chinese address recognition converts the address text entered by a user into geographic coordinates. It is one of the most important functions in modern automotive navigation software, geographic information systems, and location-based services. Baidu Map, Gaode Map, and other domestic providers mostly segment addresses with the "forward maximum matching" or "reverse maximum matching" word segmentation algorithm, which returns the correct result when the address exists in the database. When the input contains mistakes, however, the user gets unsatisfactory results or no result at all. This paper infers the user's intention from an address dictionary using natural language processing techniques, which greatly increases the robustness and intelligence of address matching.

The raw address database comes from the supplier and is fixed. The first requirement is that any input present in the original database is matched to its address correctly. To this end, this paper builds a Trie from the original address database and performs word segmentation by dynamic programming, which greatly improves both accuracy and efficiency.

When the user's input does not exist in the address database, the above method fails. To solve this problem, this paper annotates the whole address database with word-position tags (B: begin, M: middle, E: end, S: single), builds a hidden Markov model, and trains its parameters. For performance reasons, decoding uses the Viterbi algorithm, which is based on dynamic programming. Together, these steps widen the range of inputs that can be segmented.

After segmentation, the type of each word still has to be determined, that is, whether it is a province, city, county, street, or point of interest, and the user's input has to be corrected by reference to the address database. This paper builds an n-gram-based inverted index that quickly retrieves the entries related to a given set of words, then obtains the best match by computing similarity scores and ranking the candidates. Word segmentation and matching are the two most important parts of address recognition, and how to handle these two steps is the focus of this paper.
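
The matching step can be illustrated with a minimal sketch. It is not the implementation used in this paper: the class name NGramIndex, the helper char_ngrams, the choice of character bigrams, the Jaccard similarity measure, and the sample addresses are all assumptions made for illustration. The sketch builds an inverted index from n-grams to address entries, collects the entries that share at least one n-gram with the query, and ranks them by similarity.

```python
from collections import defaultdict


def char_ngrams(text, n=2):
    """Return the set of character n-grams of text (the whole text if shorter than n)."""
    if len(text) < n:
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


class NGramIndex:
    """Inverted index from character n-grams to address ids (hypothetical helper)."""

    def __init__(self, addresses, n=2):
        self.n = n
        self.addresses = list(addresses)
        self.grams = [char_ngrams(a, n) for a in self.addresses]
        self.postings = defaultdict(set)
        for idx, grams in enumerate(self.grams):
            for g in grams:
                self.postings[g].add(idx)

    def search(self, query, top_k=3):
        """Return up to top_k (address, score) pairs, best match first."""
        q = char_ngrams(query, self.n)
        # Candidates: every address sharing at least one n-gram with the query.
        candidates = set()
        for g in q:
            candidates |= self.postings.get(g, set())
        scored = []
        for idx in candidates:
            grams = self.grams[idx]
            # Jaccard similarity (an assumed measure, not taken from the paper).
            score = len(q & grams) / len(q | grams)
            scored.append((self.addresses[idx], score))
        # Sort by descending score, breaking ties by address text.
        scored.sort(key=lambda pair: (-pair[1], pair[0]))
        return scored[:top_k]


# Illustrative sample addresses, not data from the paper.
index = NGramIndex([
    "北京市海淀区中关村大街",
    "北京市朝阳区建国路",
    "上海市浦东新区世纪大道",
])
# The query contains one wrong character ("忖" instead of "村").
results = index.search("北京市海淀区中关忖大街")
```

In this example the misspelled query still ranks the intended address first, because most of its bigrams are unaffected by the single wrong character. Only entries that share an n-gram with the query are scored, so the index avoids comparing the query against the whole database.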