Research Of Chinese Address Standardization Based On ALBERT

Posted on: 2022-05-11  Degree: Master  Type: Thesis
Country: China  Candidate: S Q Sun  Full Text: PDF
GTID: 2518306476990739  Subject: Communication and Information System
Abstract/Summary:
Address data not only describe the spatial relationships of objects but also serve as an important spatial reference in human social activity. In the era of big data, the volume of address data grows every day. Such data can link different fields, which benefits social governance and operations. However, China lacks a unified standard for address planning, and consequently there is no complete standard system for using Chinese addresses. Standardizing Chinese addresses is therefore necessary. To this end, this thesis builds a Chinese address standardization system and carries out the following research work:

1. Research on a Chinese address segmentation model. Because Chinese addresses have no natural delimiters between address levels, segmentation is a prerequisite for Chinese address research. This thesis uses the pre-trained model ALBERT to generate word vectors and, following a fine-tuning strategy, adds downstream network layers consisting of a bidirectional long short-term memory network (BiLSTM) and a conditional random field (CRF), yielding a new Chinese address segmentation model, ALBERT-BiLSTM-CRF. Chinese address segmentation is thereby recast as a named entity recognition (NER) problem. This approach reduces the amount of training the model requires and substantially improves its segmentation ability.

2. Design and implementation of a Chinese address standardization system. To make Chinese address data usable in practice, the thesis designs and implements a complete standardization system. The system takes Chinese address segmentation as its core, converts Chinese addresses into structured data, builds a standard address library from existing standard addresses, and applies an efficient address matching algorithm to standardize raw addresses, so that address data can be put to better use.

Experiments show that the proposed Chinese address segmentation model reaches a precision, recall, and F1 score of 97.88%, 97.94%, and 97.91%, respectively, while the Chinese address standardization method achieves an accuracy above 92% on different types of address data sets, a clear improvement over traditional methods.
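
To illustrate how address segmentation can be treated as an NER problem, the sketch below converts a segmented address into per-character BIO labels, the kind of target a sequence tagger such as a BiLSTM-CRF is typically trained on. The example address, the level names (`PROV`, `CITY`, `DIST`, `ROAD`, `NUM`), and the helper `to_bio` are illustrative assumptions, not the label scheme used in the thesis.

```python
from typing import List, Tuple


def to_bio(segments: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Convert (text, level) address segments into per-character BIO labels."""
    labeled = []
    for text, level in segments:
        for i, ch in enumerate(text):
            # The first character of a segment opens the entity (B-),
            # every following character continues it (I-).
            prefix = "B" if i == 0 else "I"
            labeled.append((ch, f"{prefix}-{level}"))
    return labeled


# Hypothetical address and level names, chosen only for illustration.
segments = [
    ("江苏省", "PROV"),
    ("南京市", "CITY"),
    ("玄武区", "DIST"),
    ("四牌楼", "ROAD"),
    ("2号", "NUM"),
]

for ch, tag in to_bio(segments):
    print(ch, tag)
```

Under this framing, recovering the address levels amounts to predicting one label per character and then merging each `B-` label with the `I-` labels that follow it.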
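
As a quick consistency check on the reported segmentation scores, the F1 value can be recomputed as the harmonic mean of the first two figures, treating the first figure as precision. The function name `f1_score` is an illustrative choice; only the three percentages come from the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures reported in the abstract, in percent.
print(round(f1_score(97.88, 97.94), 2))  # prints 97.91
```

The recomputed value matches the reported 97.91%, which supports reading the first figure as precision rather than overall accuracy.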
Keywords/Search Tags: Chinese address segmentation, Chinese address standardization, ALBERT, BiLSTM-CRF