Automatically Chinese Address Recognition And Normalization

Posted on:2011-10-17

Degree:Master

Type:Thesis

Country:China

Candidate:H Sun

Full Text:PDF

GTID:2178330338981793

Subject:Computer software and theory

Abstract/Summary:

With the development of the Internet, both the volume of online data and the number of users are growing exponentially. Two key problems today are how to locate accurate information and how to handle the different formats in which different users produce data. These problems are especially serious in the field of Emergency Management, so we used address data from that field as our experimental data and focused on automatic Chinese address recognition and standardization.

Automatic address recognition is a subtask of Named Entity Recognition (NER), which belongs to Natural Language Processing. Existing research usually employs either rule-based methods or statistical learning methods. We adopted the latter, using a Maximum Entropy model to identify addresses in plain text. This work included:
1. Characteristic analysis: analyzing the word-frequency features of Chinese addresses and of their contexts.
2. Feature selection and modeling: defining the features used in the Maximum Entropy model and applying the model to address recognition.
3. Experiments: validating the method experimentally. The results showed a notable improvement.

The other part of our work was Chinese address standardization, which consists of address labeling and normalization. Chinese address labeling splits a long address into components according to their semantic roles and assigns a label to each component. We used a Conditional Random Field (CRF) model for the labeling experiments. The standardization work included:
1. Chinese long-address segmentation: we used heuristic methods and a statistical language model to improve the tokenization produced by an existing segmentation tool.
2. Chinese address structured labeling: we used a CRF model to label the address elements and validated the results. We also built a corpus of 6,000 labeled long addresses. The experimental results showed a substantial improvement.
3. Chinese address normalization: we used a rule-based method to handle missing words, misspellings, and duplicated names.
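The abstract does not list the features actually used in the Maximum Entropy model. As a minimal sketch, assuming token-level tagging with two hypothetical labels (`ADDR`, `O`), a hand-picked list of address suffixes, and invented toy sentences, a MaxEnt classifier (equivalently, multinomial logistic regression) can be trained by stochastic gradient ascent on the conditional log-likelihood. The thesis may have used a different training algorithm (e.g. GIS) and a richer feature set.

```python
import math
from collections import defaultdict

# Hypothetical suffix cue list (province, city, district, county, town, road, street, number).
ADDRESS_SUFFIXES = ("省", "市", "区", "县", "镇", "路", "街", "号")


def features(tokens, i):
    """Binary context features for token i: the word itself, its neighbours, a suffix cue."""
    prev_w = tokens[i - 1] if i > 0 else "<s>"
    next_w = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
    feats = ["bias", f"w={tokens[i]}", f"prev={prev_w}", f"next={next_w}"]
    if tokens[i].endswith(ADDRESS_SUFFIXES):
        feats.append("suffix_cue")
    return feats


class MaxEnt:
    """Conditional MaxEnt model: p(y|x) is proportional to exp(sum of weights of active (feature, y) pairs)."""

    def __init__(self, labels):
        self.labels = list(labels)
        self.w = defaultdict(float)  # one weight per (feature, label) pair

    def prob(self, feats):
        scores = {y: sum(self.w[(f, y)] for f in feats) for y in self.labels}
        top = max(scores.values())  # subtract the max for numerical stability
        exps = {y: math.exp(s - top) for y, s in scores.items()}
        z = sum(exps.values())
        return {y: e / z for y, e in exps.items()}

    def train(self, examples, epochs=50, lr=0.5, l2=0.01):
        """Stochastic gradient ascent on the conditional log-likelihood, with a simple per-update L2 penalty."""
        for _ in range(epochs):
            for feats, gold in examples:
                p = self.prob(feats)
                for f in feats:
                    for y in self.labels:
                        grad = (1.0 if y == gold else 0.0) - p[y]  # observed minus expected count
                        self.w[(f, y)] += lr * (grad - l2 * self.w[(f, y)])

    def predict(self, feats):
        p = self.prob(feats)
        return max(p, key=p.get)


def tag(model, tokens):
    return [model.predict(features(tokens, i)) for i in range(len(tokens))]


# Invented toy data: pre-tokenised sentences with per-token labels.
TRAIN = [
    (["事故", "发生", "在", "北京市", "海淀区", "中关村", "大街", "。"],
     ["O", "O", "O", "ADDR", "ADDR", "ADDR", "ADDR", "O"]),
    (["请", "前往", "上海市", "浦东新区", "世纪", "大道", "处理"],
     ["O", "O", "ADDR", "ADDR", "ADDR", "ADDR", "O"]),
]

model = MaxEnt(["ADDR", "O"])
model.train([(features(toks, i), labs[i]) for toks, labs in TRAIN for i in range(len(toks))])
print(tag(model, ["火灾", "发生", "在", "广州市", "天河区"]))
```

Context features such as `prev=在` and the suffix cue let the model generalize to address words never seen in training, which is the main reason to prefer a feature-based statistical model over a plain dictionary lookup.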
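The abstract says a statistical language model was used to improve an existing tool's tokenization but does not say which model. Purely as an illustration, assuming a unigram model with invented counts, the most probable segmentation of a long address can be found by dynamic programming over split points. How the thesis combined such scores with its heuristics and with the existing tool's output is not described, so that part is not sketched.

```python
import math

# Hypothetical unigram counts; a real system would estimate these from an address corpus.
COUNTS = {
    "北京": 50, "北京市": 40, "市": 30, "海淀": 20, "海淀区": 35, "区": 30,
    "中关村": 25, "大街": 30, "大": 10, "街": 10, "27": 5, "号": 40, "27号": 8,
}
TOTAL = sum(COUNTS.values())
MAX_LEN = max(len(w) for w in COUNTS)


def word_logprob(word):
    """Unigram log-probability; unknown single characters get a small penalty
    so that every string remains segmentable, unknown longer strings are disallowed."""
    if word in COUNTS:
        return math.log(COUNTS[word] / TOTAL)
    if len(word) == 1:
        return math.log(0.1 / TOTAL)
    return float("-inf")


def segment(text):
    """best[i] holds the highest log-probability of any segmentation of text[:i]."""
    n = len(text)
    best = [float("-inf")] * (n + 1)
    back = [0] * (n + 1)
    best[0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(0, i - MAX_LEN), i):
            score = best[j] + word_logprob(text[j:i])
            if score > best[i]:
                best[i], back[i] = score, j
    # Follow back-pointers from the end to recover the words.
    words, i = [], n
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1]


print(segment("北京市海淀区中关村大街27号"))
```

Because every extra word multiplies in another probability below 1, the model prefers longer dictionary entries such as 北京市 over 北京 + 市 unless the counts say otherwise.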
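For the structured-labeling step, a linear-chain CRF scores a whole label sequence as a sum of emission and transition feature weights; because the partition function does not depend on the labels, the best sequence can be decoded with the Viterbi algorithm. The sketch below assumes a hypothetical label set (`CITY`, `DISTRICT`, `AREA`, `ROAD`, `NUMBER`) and hand-set weights standing in for weights that would be learned from a labeled corpus such as the thesis's 6,000 addresses; training is not shown.

```python
# Hypothetical label set for address elements.
LABELS = ["CITY", "DISTRICT", "AREA", "ROAD", "NUMBER"]

# Hand-set illustrative weights; a trained CRF would learn these.
SUFFIX_WEIGHTS = {
    ("市", "CITY"): 2.0, ("区", "DISTRICT"): 2.0, ("村", "AREA"): 2.0,
    ("街", "ROAD"): 2.0, ("路", "ROAD"): 2.0, ("号", "NUMBER"): 2.0,
}
TRANSITIONS = {
    ("CITY", "DISTRICT"): 1.0, ("DISTRICT", "AREA"): 0.5, ("DISTRICT", "ROAD"): 0.5,
    ("AREA", "ROAD"): 0.5, ("ROAD", "NUMBER"): 1.0,
}


def emission(token, label):
    """Emission score from a single feature: the token's last character paired with the label."""
    return SUFFIX_WEIGHTS.get((token[-1], label), 0.0)


def viterbi(tokens):
    """Return the highest-scoring label sequence under the linear-chain model."""
    # delta[y]: best score of any label prefix ending in y at the current position.
    delta = {y: emission(tokens[0], y) for y in LABELS}
    backpointers = []
    for tok in tokens[1:]:
        new_delta, bp = {}, {}
        for y in LABELS:
            prev = max(LABELS, key=lambda p: delta[p] + TRANSITIONS.get((p, y), 0.0))
            new_delta[y] = delta[prev] + TRANSITIONS.get((prev, y), 0.0) + emission(tok, y)
            bp[y] = prev
        delta = new_delta
        backpointers.append(bp)
    # Trace back from the best final label.
    path = [max(LABELS, key=delta.get)]
    for bp in reversed(backpointers):
        path.append(bp[path[-1]])
    return path[::-1]


print(viterbi(["北京市", "海淀区", "中关村", "大街", "27号"]))
```

Transition weights are what distinguish a CRF from labeling each token independently: they reward plausible orderings such as a district followed by a road.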
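The abstract names three normalization problems but not the rules used to solve them. A minimal rule-based sketch, assuming hypothetical lookup tables, might correct known misspellings, drop a component that is repeated consecutively (one possible reading of "duplication of name"; the thesis may instead mean homonymous place names), and restore a missing city from a district-to-city gazetteer.

```python
# Hypothetical lookup tables; a real system would build them from an administrative-division gazetteer.
MISSPELLINGS = {"海定区": "海淀区", "中官村": "中关村"}
DISTRICT_TO_CITY = {"海淀区": "北京市", "浦东新区": "上海市"}


def normalize(components):
    """Normalise a list of (label, text) address components produced by the labeling step."""
    # Rule 1: replace known misspellings with their canonical form.
    fixed = [(label, MISSPELLINGS.get(text, text)) for label, text in components]

    # Rule 2: drop a component identical to the one immediately before it.
    deduped = []
    for item in fixed:
        if not deduped or deduped[-1] != item:
            deduped.append(item)

    # Rule 3: if no city is present, infer it from a known district and prepend it.
    if all(label != "CITY" for label, _ in deduped):
        city = next(
            (DISTRICT_TO_CITY[text] for label, text in deduped
             if label == "DISTRICT" and text in DISTRICT_TO_CITY),
            None,
        )
        if city is not None:
            deduped.insert(0, ("CITY", city))
    return deduped


print(normalize([("DISTRICT", "海定区"), ("DISTRICT", "海淀区"), ("AREA", "中官村"), ("ROAD", "大街")]))
```

Running the spelling fix before de-duplication matters here: 海定区 only becomes a duplicate of 海淀区 once it has been corrected.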

Keywords/Search Tags:

Named Entity Recognition, Statistical Learning Methods, Maximum Entropy Model, Conditional Random Fields, Feature Weight

Related items

1	Research Of Named Entity Recognition Based On Conditional Random Fields
2	Named Entity Recognition Based On Conditional Random Fields Chinese Research
3	Named Entity Recognition Based On Conditional Random Fields
4	Research On Method Of Chinese Named Entity Recognition Based On Maximum Entropy Model
5	Research On A Two-Stage Method For Chinese Named Entity Recognition
6	Statistical Model Based Chinese Named Entity Recognition Methods And Its Application To Medical Records
7	Chinese Named Entity Recognition With A Hybrid-Statistical Model
8	Chinese Nested Named Entity Recognition Research
9	Research On Chinese Named Entity Recognition Model Based On Deep Learning
10	Based On Maximum Entropy Model Of Chinese Named Entity Recognition