The Research And Implementation Of Chinese Names Recognition Based On Probability Distribution And Rules

Posted on: 2014-02-02; Degree: Master; Type: Thesis
Country: China; Candidate: Y Z Zang; Full Text: PDF
GTID: 2248330398975420; Subject: Computer technology
Abstract/Summary:
Chinese word segmentation is a foundational task in Chinese information processing and is widely used in information extraction, search engines, machine translation, text clustering, and other fields. At present, segmentation ambiguity and the identification of unknown words are the factors that most affect segmentation quality. Among unknown words, Chinese personal names are the most numerous and the most difficult to identify, and Chinese word segmentation systems always include a module dedicated to personal name identification. Improving the quality of Chinese personal name recognition benefits not only the accuracy of word segmentation but also information extraction and lexical analysis. This thesis addresses the automatic identification of Chinese personal names in modern Chinese text. We carried out a statistical analysis of a large-scale personal name sample set and corpus, studied the words used in personal names and their context information, and summarized the regularities they follow. We also modified a reliability-based statistical model and designed rules, based on the characteristics of the system, for use in the personal name recognition process.
Specifically, the work is organized as follows. First, we summarize the difficulties encountered in the automatic identification of Chinese personal names, and introduce and compare several existing identification methods. Second, we use statistics based on relative reliability to train the model on a large-scale corpus and build a list of personal name words. Third, on the basis of statistical analysis, we classify the context words of personal names by part of speech and use them to adjust the estimated probability of a name. We also design a number of rules for the recognition process, used mainly to extract candidate names and to correct the recognition results, and then learn the corresponding thresholds and parameters from experiments. Finally, we design experiments to compare the methods used in the course of the research. The experimental results show that the proposed name recognition model gives satisfactory results: tests on the January 1998 《People’s Daily》 corpus achieved good recall and precision.
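The pipeline summarized above (candidate extraction by rule, a statistical score, a context adjustment, and a threshold) can be illustrated with a minimal Python sketch. It is not the thesis's model: the abstract does not give the reliability formula, the rules, or the learned thresholds, so every table, number, and function name below is a hypothetical placeholder. The score here is a simple geometric mean of per-character probabilities, used only as a stand-in.

```python
# Toy illustration only. All probabilities, context weights, and the
# threshold are invented placeholders, not values from the thesis.
SURNAME_PROB = {"王": 0.9, "李": 0.9, "张": 0.85}          # hypothetical
GIVEN_PROB = {"伟": 0.6, "芳": 0.7, "明": 0.5, "华": 0.5}   # hypothetical
CONTEXT_BOOST = {"先生": 1.5, "同志": 1.5, "说": 1.3}       # hypothetical
DEFAULT_PROB = 0.01   # assumed fallback for characters not in the tables
THRESHOLD = 0.5       # assumed; the thesis learns its thresholds experimentally


def candidate_score(candidate, next_word=""):
    """Geometric mean of character probabilities, scaled by a context weight."""
    product = SURNAME_PROB.get(candidate[0], DEFAULT_PROB)
    for ch in candidate[1:]:
        product *= GIVEN_PROB.get(ch, DEFAULT_PROB)
    score = product ** (1.0 / len(candidate))
    return score * CONTEXT_BOOST.get(next_word, 1.0)


def find_names(text):
    """Rule: a candidate starts at a known surname; try 3 characters, then 2."""
    names = []
    i = 0
    while i < len(text):
        accepted = None
        if text[i] in SURNAME_PROB:
            for length in (3, 2):
                candidate = text[i:i + length]
                if len(candidate) < length:
                    continue  # not enough characters left in the text
                rest = text[i + length:]
                # Context word immediately following the candidate, if any.
                next_word = next(
                    (w for w in CONTEXT_BOOST if rest.startswith(w)), "")
                if candidate_score(candidate, next_word) >= THRESHOLD:
                    accepted = candidate
                    break
        if accepted:
            names.append(accepted)
            i += len(accepted)
        else:
            i += 1
    return names
```

Calling `find_names` on a short string returns the candidates whose adjusted score reaches the threshold. A real system would estimate the tables from a large corpus and classify context words by part of speech, as the abstract describes.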
Keywords/Search Tags: Chinese personal name identification, rules, large-scale corpus, statistical model