Chinese Named Entity Recognition And Disambiguation Research

Posted on: 2012-05-31
Degree: Master
Type: Thesis
Country: China
Candidate: L H Gong
Full Text: PDF
GTID: 2208330335497713
Subject: Computer application technology
Abstract/Summary:
Named Entity Recognition (NER) is an important basis for information extraction, information retrieval, machine translation, chunking, and question answering (Q&A) systems, and it is a fundamental research area in Natural Language Processing (NLP). Research on Named Entity Recognition is therefore of great practical significance.

This thesis takes advantage of the characteristics of modern Chinese and focuses on the Chinese NER problem, whose core is the recognition of person names, location names, and organization names. We employ a new statistical model, the Conditional Random Field (CRF), as the base framework, and design and implement a Chinese NER system. In addition, we implement named entity disambiguation based on Latent Semantic Analysis (LSA). Specifically, the main content of this thesis is as follows:

First, this thesis analyzes the difficulties of NER and the characteristics of the various types of named entities, and briefly introduces some existing NER methods and Chinese NER systems.

After that, this thesis introduces CRF in detail, including its definition, mathematical model, parameter estimation, and model training methods. Furthermore, we apply CRF to the Chinese NER task and implement a CRF-based Chinese NER system with a two-layer structure. We design several feature templates, compare and verify their performance, and select the best one.

Then, this thesis introduces research on named entity disambiguation, designs a named entity disambiguation algorithm, NED-FS-LSA, based on feature selection and LSA, and verifies the feasibility of building a valid entity library with the algorithm.

Finally, this thesis summarizes its contribution, a possible solution for a complete system that turns text directly into an entity library, and gives an outlook on further research based on it.
Keywords/Search Tags: Named Entity Recognition, Disambiguation, Conditional Random Field, Latent Semantic Analysis