
Chinese Name Entity Recognition Based On Rules And Statistics

Posted on: 2008-06-12
Degree: Master
Type: Thesis
Country: China
Candidate: Y B Qiao
GTID: 2178360212993951
Subject: Computer software and theory
Abstract/Summary:
Natural language processing is an important research field within artificial intelligence. It is the technique by which computers acquire, express, and apply linguistic knowledge, and it provides an efficient means of communication between humans and computers.
Because Chinese is written without spaces between words, word segmentation is the first and fundamental step in most Chinese information processing systems. Named entities, however, are the primary cause of reduced word segmentation accuracy, and the performance of named entity recognition directly affects the subsequent part-of-speech tagging and syntactic analysis. In addition, research on named entities benefits many applications, such as text classification, information retrieval, and question answering. In short, identifying and classifying named entities is of great theoretical and practical significance.
To date, research on named entities has fallen short in automation and in its use of lexical and semantic information. Moreover, most work targets Chinese personal names rather than place names or organization names. To overcome these shortcomings, this thesis proposes a combined method based on rules and statistics, which adopts a double-deck (two-layer) model to recognize all kinds of named entities, including nested place names and organization names.
The main idea is as follows. First, before word segmentation, a lower-deck model based on a named entity searching algorithm is established; it recognizes some named entities using character words, lexical rules, statistical information, and so on. Then, after word segmentation, a higher-deck model based on a Hidden Markov Model is established to recognize personal names, translated names, place names, and organization names, including nested named entities, with the help of the lower-deck model.
This thesis focuses on how to perform named entity recognition separately before and after word segmentation, and on how to integrate both stages into the complete lexical-syntactic system, so that named entity recognition is highly compatible with the lexical system and supports the syntactic system well.
In the double-deck model, the lower-deck model helps the higher one correct the errors and omissions caused by word segmentation. A filtered decoding algorithm based on dynamic programming is proposed to keep named entity recognition fast. The experimental results show that the precision and recall rates are all above 90% and that the syntactic system can correctly analyze the structure of sentences containing named entities. Overall, the double-deck model has research value and practical applicability.
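The abstract does not spell out the filtered decoding algorithm, so the following is only a minimal sketch of the general idea, assuming it resembles standard Viterbi decoding (dynamic programming over an HMM) with a filter that keeps just the best-scoring states at each step. The `beam` parameter, the B/I/O tag set, and all probabilities below are hypothetical values chosen for illustration; none of them come from the thesis.

```python
import math

NEG_INF = float("-inf")


def logs(probabilities):
    """Convert a dict of probabilities into log-probabilities."""
    return {key: math.log(value) for key, value in probabilities.items()}


def viterbi_filtered(obs, states, log_start, log_trans, log_emit, beam=None):
    """Viterbi decoding; if `beam` is set, only the `beam` best states
    of the previous step are considered as predecessors (the filter)."""
    if not obs:
        return []
    # scores[s] = best log-probability of any path ending in state s
    scores = {
        s: log_start.get(s, NEG_INF) + log_emit[s].get(obs[0], NEG_INF)
        for s in states
    }
    backpointers = []
    for symbol in obs[1:]:
        # Filtering step: rank previous states and keep the top `beam`.
        survivors = sorted(scores, key=scores.get, reverse=True)
        if beam is not None:
            survivors = survivors[:beam]
        new_scores, pointers = {}, {}
        for s in states:
            best_prev = max(
                survivors,
                key=lambda p: scores[p] + log_trans[p].get(s, NEG_INF),
            )
            new_scores[s] = (
                scores[best_prev]
                + log_trans[best_prev].get(s, NEG_INF)
                + log_emit[s].get(symbol, NEG_INF)
            )
            pointers[s] = best_prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    path = [max(scores, key=scores.get)]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical toy model: B = name begins, I = inside a name, O = other.
STATES = ["O", "B", "I"]
LOG_START = logs({"O": 0.6, "B": 0.4})
LOG_TRANS = {
    "O": logs({"O": 0.7, "B": 0.3}),
    "B": logs({"I": 0.8, "O": 0.2}),
    "I": logs({"I": 0.3, "O": 0.7}),
}
LOG_EMIT = {
    "O": logs({"王": 0.05, "明": 0.05, "说": 0.9}),
    "B": logs({"王": 0.9, "明": 0.05, "说": 0.05}),
    "I": logs({"王": 0.1, "明": 0.8, "说": 0.1}),
}

tags = viterbi_filtered(["王", "明", "说"], STATES, LOG_START, LOG_TRANS, LOG_EMIT, beam=2)
print(tags)  # ['B', 'I', 'O']
```

With `beam=None` the function is plain Viterbi and returns the exact best path. A smaller `beam` reduces the work per step from all state pairs to `beam` predecessors per state, at the cost of possibly missing the best path on harder inputs.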
Keywords/Search Tags: Natural Language Processing, Name Entity, Hidden Markov Model, Lexical Rule, Filtered Decoding Algorithm