Research Of Named Entity Recognition Based On Conditional Random Fields

Posted on: 2008-10-23
Degree: Master
Type: Thesis
Country: China
Candidate: J Q Guo
GTID: 2178360212483659
Subject: Computer application technology
Abstract/Summary:
Named entity recognition (NER) is one of the central and essential tasks in natural language processing research. Its main work is to classify every word in a document as a person name, location, organization, date, time, number, or other named entity. NER plays a significant role in natural language processing and has been applied in information retrieval, information extraction, machine translation, and other domains. The following research has been done in this dissertation:

Firstly, NER is summarized and reviewed, and the state of the art and the various methods applied to NER are investigated.

Secondly, maximum entropy modeling (MEM) is applied to Chinese word segmentation (CWS), and an approach to CWS that tags the check points between Chinese characters is presented. CWS and Chinese NER interact and can improve each other: CWS can serve as a preprocessing step for Chinese NER, while one of the purposes of NER is to improve the performance of CWS. This thesis therefore presents a check-point tagging approach to CWS, and a CWS system based on MEM was developed to participate in the Third International Chinese Language Processing Bakeoff.

Thirdly, NER techniques based on conditional random fields (CRFs) are investigated. CRFs are a statistical machine learning approach that outperforms other approaches in segmenting and labeling sequence data. Maximum probability segmentation (MPS) has a higher sentence recall rate than other rough segmentation models. This thesis presents an approach to NER that incorporates MPS into the CRF model. Experiments on person name recognition and location recognition are conducted separately, and the results demonstrate the validity of the approach.
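The check-point formulation of CWS mentioned above can be illustrated with a small sketch. It only shows how a segmented sentence might be converted to and from labels on the gaps between adjacent characters. The binary label scheme (1 = word boundary, 0 = no boundary) and the function names `words_to_checkpoints` and `checkpoints_to_words` are assumptions made for illustration; the abstract does not specify the tag set or features the thesis actually uses, and the classifier itself (MEM) is not shown.

```python
from typing import List


def words_to_checkpoints(words: List[str]) -> List[int]:
    """Label each gap between adjacent characters.

    Assumed scheme (not taken from the thesis): 1 = word boundary, 0 = no boundary.
    A sentence of n characters has n - 1 check points.
    """
    labels: List[int] = []
    for i, word in enumerate(words):
        # Gaps inside a word are not boundaries.
        labels.extend([0] * (len(word) - 1))
        # The gap between this word and the next one is a boundary.
        if i < len(words) - 1:
            labels.append(1)
    return labels


def checkpoints_to_words(chars: str, labels: List[int]) -> List[str]:
    """Rebuild the segmentation from the per-gap labels."""
    if len(labels) != max(len(chars) - 1, 0):
        raise ValueError("expected one label per gap between characters")
    words: List[str] = []
    start = 0
    for i, label in enumerate(labels):
        if label == 1:
            # A boundary after character i closes the current word.
            words.append(chars[start:i + 1])
            start = i + 1
    if chars:
        words.append(chars[start:])
    return words


if __name__ == "__main__":
    segmented = ["我", "爱", "北京", "天安门"]
    gaps = words_to_checkpoints(segmented)
    print(gaps)
    print(checkpoints_to_words("".join(segmented), gaps))
```

Under this encoding, segmentation becomes a classification problem over check points: a model such as MEM would predict each gap label from the surrounding characters, and the predicted labels are decoded back into words as in `checkpoints_to_words`.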
Keywords/Search Tags: Chinese word segmentation, named entity recognition, maximum entropy modeling, conditional random fields, maximum probability word segmentation model