The Research On Named Entity Recognition In Chinese Information Processing

Posted on: 2007-05-17
Degree: Master
Type: Thesis
Country: China
Candidate: J T Zhu
Full Text: PDF
GTID: 2178360212983694
Subject: Computer application technology
Abstract/Summary:
Named Entity Recognition (NER), an essential task in Natural Language Processing (NLP), plays an important role in Natural Language Understanding, Information Retrieval (IR), and Information Extraction (IE). Researchers internationally have studied this area extensively and achieved excellent results. Because of the particular characteristics of the Chinese language, however, NER remains a difficult task in Chinese Information Processing (CIP). This thesis studies four problems in Chinese NER, as follows.

First, an improved Hidden Markov Model (HMM) is presented for Chinese NER. The HMM is a simple and efficient tool, but it encounters some problems in NER systems. After analyzing the characteristics of Chinese NER, we propose an improved form of the HMM and apply it to identify Chinese named entities (NEs). This method rebuilds the relation between the context words and the current word, and it achieves higher performance than the general HMM.

Second, a Maximum Entropy (ME) model is proposed to identify Chinese organization names, which is the most difficult task in Chinese NER. A new ME-based approach to organization name identification is presented. By integrating word, part-of-speech (POS), and semantic information, together with knowledge, in feature selection, the approach achieves good results.

Third, a new approach in which the segmentation result is corrected according to heuristic information is proposed to enhance NER performance, and it is applied to recognizing Chinese person names. In a typical Chinese NER system, segmentation is always followed by NER, so segmentation errors propagate to named entity identification. This thesis analyzes the relation between segmentation, POS tagging, and NER explicitly, and then proposes a new method that enhances NER by revising the segmentation. In this method, candidate Chinese names are detected using mutual information and surnames, and all possible segmentation paths in the context of each candidate name are then found. Finally, an HMM tagging model is used to find the best path and recognize the Chinese person name. By combining segmentation and POS tagging with NER, the method reduces the effect of segmentation errors and improves the performance of Chinese name recognition.

Finally, many tasks in Chinese information processing, such as IR, IE, and Machine Translation (MT), process information at the level of a whole document. In these tasks, context information in the document plays an important role in NER. A method that integrates string and word statistics to identify candidate NEs can find the NEs in a document. Subsequently, abbreviations of the NEs are identified from the NEs according to abbreviation rules. Experiments show that information from document analysis can enhance the performance of an NER system.
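
As an illustration of the HMM tagging step mentioned in the abstract, where the best path is chosen among candidates, the following is a minimal sketch of standard Viterbi decoding over a toy character-level model. It is not the improved HMM of the thesis, whose formulation the abstract does not give. The tag set (`B` for a surname, `I` for a given-name character, `O` for other), the function name `viterbi`, and all probabilities are assumptions made up for this example.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most probable state sequence for the observations.

    Standard first-order HMM decoding in log space. Missing or zero
    probabilities are floored to avoid log(0).
    """
    def lp(p):
        return math.log(max(p, floor))

    # best[t][s] = (log-probability of the best path ending in s at t, previous state)
    first = observations[0]
    best = [{s: (lp(start_p.get(s, 0)) + lp(emit_p[s].get(first, 0)), None)
             for s in states}]

    for obs in observations[1:]:
        column = {}
        for s in states:
            # Pick the predecessor state that maximizes the path score into s.
            prev, score = max(
                ((p, best[-1][p][0] + lp(trans_p[p].get(s, 0))) for p in states),
                key=lambda pair: pair[1],
            )
            column[s] = (score + lp(emit_p[s].get(obs, 0)), prev)
        best.append(column)

    # Backtrack from the best final state.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        last = best[t][last][1]
        path.append(last)
    path.reverse()
    return path


# Toy model (all numbers are invented for illustration only).
STATES = ["B", "I", "O"]
START_P = {"B": 0.4, "I": 0.0, "O": 0.6}
TRANS_P = {
    "B": {"B": 0.0, "I": 0.9, "O": 0.1},
    "I": {"B": 0.05, "I": 0.45, "O": 0.5},
    "O": {"B": 0.2, "I": 0.0, "O": 0.8},
}
EMIT_P = {
    "B": {"王": 0.5, "李": 0.5},
    "I": {"小": 0.4, "明": 0.4, "王": 0.1, "李": 0.1},
    "O": {"说": 0.5, "小": 0.2, "明": 0.1, "王": 0.1, "李": 0.1},
}

if __name__ == "__main__":
    chars = ["王", "小", "明", "说"]
    print(list(zip(chars, viterbi(chars, STATES, START_P, TRANS_P, EMIT_P))))
```

With this toy model, the first three characters are tagged as one person name (`B`, `I`, `I`) and the last character as `O`. In a real system the probabilities would be estimated from an annotated corpus, and the observations could be segmented words with POS tags rather than single characters.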
Keywords/Search Tags:Chinese Information Processing, Lexical Analysis, Segmentation, Named Entity Recognition, Hidden Markov Model, Maximum Entropy, Named Entity Recognition in Document