Research On The Identification For Chinese Named Entity Based On Combination Of Rules And Statistic Analysis

Posted on:2013-05-26

Degree:Master

Type:Thesis

Country:China

Candidate:P Yan

Full Text:PDF

GTID:2248330377958330

Subject:Signal and Information Processing

Abstract/Summary:

Chinese named entity recognition is a foundational task in Chinese information processing. It is a key technique in many applications, such as text understanding, text proofreading, text clustering, text mining, text filtering, information extraction, and machine translation. Research on Chinese named entity recognition is therefore important for lexical, syntactic, and semantic analysis, and for Chinese information processing as a whole.

This paper focuses primarily on the automatic recognition of Chinese personal names in contemporary Chinese text. We carry out a statistical analysis of a personal name sample set and a personal name corpus. In particular, we study the statistical regularities of the contexts in which the top 300 surnames occur as single-character words, and the part-of-speech patterns of each surname. On this basis, the paper presents a Chinese named entity recognition system that combines statistics-based and rule-based methods. The main work is as follows.

The paper analyzes the difficulties of Chinese personal name recognition, introduces existing approaches, and compares them. We then build linguistic resources, including a personal name sample set, a surname set, and a personal name corpus. From a statistical analysis of these resources we derive a personal name word list, a surname probability list, a personal name context information list, and prefix and suffix lists for surnames, all of which are needed to recognize personal names in text. The recognition model works in two stages. First, the text is preprocessed; the main change here is an improved dictionary for the reverse maximum matching algorithm, which increases segmentation speed. Second, personal names are identified by combining probability statistics with rules.
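The two-stage pipeline described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: the dictionary `WORDS`, the surname probabilities in `SURNAME_PROB`, the context words in `NAME_CONTEXT`, and the scoring rule (surname probability plus a fixed bonus for each neighbouring context word, compared against a threshold) are all hypothetical stand-ins for the resources the abstract mentions. Reverse maximum matching scans the text right to left and always takes the longest dictionary word that ends at the current position.

```python
# Toy resources standing in for the thesis's dictionary, surname
# probability list and context list. All entries and values are hypothetical.
WORDS = {"研究", "研究生", "生命", "起源", "记者", "教授", "表示"}
MAX_LEN = max(len(w) for w in WORDS)
SURNAME_PROB = {"王": 0.92, "张": 0.95, "高": 0.35}
NAME_CONTEXT = {"记者", "教授", "说", "表示"}


def reverse_max_match(text, words=WORDS, max_len=MAX_LEN):
    """Segment text right to left, always taking the longest dictionary word."""
    tokens = []
    end = len(text)
    while end > 0:
        # Shrink the window until it matches a word; a single character
        # is always accepted so the loop makes progress.
        for size in range(min(max_len, end), 0, -1):
            piece = text[end - size:end]
            if size == 1 or piece in words:
                tokens.append(piece)
                end -= size
                break
    tokens.reverse()
    return tokens


def find_names(tokens, threshold=0.5, context_bonus=0.3):
    """Return candidate personal names found in a token list (hypothetical model)."""
    names = []
    i = 0
    while i < len(tokens):
        surname = tokens[i]
        if surname not in SURNAME_PROB:
            i += 1
            continue
        # Rule: the given name is one or two single-character tokens
        # that are not themselves context words.
        j = i + 1
        while (j < len(tokens) and j - i <= 2 and len(tokens[j]) == 1
               and tokens[j] not in NAME_CONTEXT):
            j += 1
        # Statistics: start from the surname probability, then add a bonus
        # for each neighbouring context word.
        score = SURNAME_PROB[surname]
        if i > 0 and tokens[i - 1] in NAME_CONTEXT:
            score += context_bonus
        if j < len(tokens) and tokens[j] in NAME_CONTEXT:
            score += context_bonus
        if j > i + 1 and score >= threshold:
            names.append("".join(tokens[i:j]))
            i = j
        else:
            i += 1
    return names


if __name__ == "__main__":
    print(reverse_max_match("研究生命起源"))
    print(find_names(reverse_max_match("记者王小明说")))
```

With this toy dictionary, `reverse_max_match("研究生命起源")` returns `['研究', '生命', '起源']`, whereas forward maximum matching would produce 研究生 / 命 / 起源; reverse matching is generally reported to make fewer such errors on Chinese text. `find_names(reverse_max_match("记者王小明说"))` returns `['王小明']`, while 高山 is rejected because 高 has a low (hypothetical) surname probability and no context word nearby.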
At the same time, mutual information is introduced into the algorithm to handle intersection-type (overlapping) ambiguity, which resolves the automatic recognition of such cases under certain conditions. As a result, the improved name recognition method raises the performance of the word segmentation system. Tests show that the model reaches a relatively high level of accuracy and recall in named entity recognition, and that it enables the Chinese syntactic analysis system to correctly analyze sentences containing named entities. In summary, the model has both research significance and practical value.
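The abstract does not give the exact mutual-information formulation, so the following Python sketch assumes pointwise mutual information over character bigrams, MI(x, y) = log2( P(xy) / (P(x) P(y)) ), estimated from a toy corpus. For a three-character string ABC in which both AB and BC are possible words (for example 和平等, readable as 和平 / 等 or 和 / 平等), the middle character is attached to the neighbour with which it has the higher mutual information. The function names and the corpus are hypothetical.

```python
import math
from collections import Counter


def char_stats(corpus):
    """Count character unigrams and adjacent-character bigrams."""
    unigrams = Counter()
    bigrams = Counter()
    for sentence in corpus:
        unigrams.update(sentence)
        bigrams.update(sentence[k:k + 2] for k in range(len(sentence) - 1))
    return unigrams, bigrams


def mutual_information(x, y, unigrams, bigrams):
    """Pointwise mutual information of characters x and y appearing as bigram xy."""
    joint = bigrams[x + y]
    if joint == 0 or unigrams[x] == 0 or unigrams[y] == 0:
        return float("-inf")  # never seen together (or unseen characters)
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    p_xy = joint / n_bi
    return math.log2(p_xy / ((unigrams[x] / n_uni) * (unigrams[y] / n_uni)))


def resolve_overlap(abc, unigrams, bigrams):
    """Split ABC as AB/C or A/BC, whichever pair has higher mutual information."""
    a, b, c = abc
    left = mutual_information(a, b, unigrams, bigrams)
    right = mutual_information(b, c, unigrams, bigrams)
    return [a + b, c] if left >= right else [a, b + c]


if __name__ == "__main__":
    uni, bi = char_stats(["平等", "平等", "平等", "和平", "和", "等"])
    print(resolve_overlap("和平等", uni, bi))  # ['和', '平等']
```

With this corpus, MI(平, 等) ≈ 2.23 exceeds MI(和, 平) ≈ 1.64, so 平 is attached to 等. In a name recognizer the same comparison could decide whether a name's last character belongs to the name or to the following word; how the thesis applies it is not specified in the abstract.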

Keywords/Search Tags:

named entity, rules, probability statis

Related items

1	Entity Recognition Research And Application On Hotspot Information Of Internet Web
2	The Field Of Music, A Combination Of Rules And Statistical Named Entity Recognition
3	Named Entity Linking Based On Multisource Knowledge
4	Research On Chinese Named Entity Recognition Based On Rules And Conditional Random Fields
5	Research On Named Entity Recognition Based On Rules
6	Research On Named Entity Recognition And Disambiguation Based On Network Semantic Resource
7	Research On Product Named Entity Recognition And Normalization
8	Research And Application Of The Chinese Organization Names Recognition And Disambiguation
9	Research On Association Rules Mining Based On Multi-topic Classification And Named Entity Recognition
10	Combination Of Machine Learning Methods Named Entity Recognition Research