The Research And Implementation Of Chinese Names Recognition Based On Probability Distribution And Rules

Posted on:2014-02-02

Degree:Master

Type:Thesis

Country:China

Candidate:Y Z Zang

GTID:2248330398975420

Subject:Computer technology

Abstract/Summary:

Chinese word segmentation is a foundational task in Chinese information processing and is widely used in information extraction, search engines, machine translation, text clustering and other fields. At present, segmentation ambiguity and the identification of unknown words have the greatest impact on segmentation quality. Among unknown words, Chinese personal names are the most numerous and the most difficult to identify, so Chinese word segmentation systems typically include a module dedicated to personal name identification. Improving the quality of Chinese personal name recognition benefits not only segmentation accuracy but also information extraction and lexical analysis. This paper addresses the automatic identification of Chinese personal names in modern Chinese text. We studied the characters used in personal names and their context information by performing statistical analysis on a large-scale personal name sample set and corpus, and then summarized the regularities of both. We also modified a reliability-based statistical model and designed rules, based on the characteristics of the system, for use in the personal name recognition process.
Specifically, this paper is organized as follows. First, it summarizes the difficulties encountered in the automatic identification of Chinese personal names, and introduces and compares some existing personal name identification methods. Second, it uses statistics based on relative reliability to train the model on a large-scale corpus and build a list of personal name words. Third, it classifies the context words of personal names by part of speech, on the basis of statistical analysis, in order to adjust the estimated probability of a name. It also designs a number of rules for the name recognition process, mainly for extracting candidate names and correcting recognition results, and then learns the corresponding thresholds and parameters from experiments. Finally, it designs experiments and compares several of the methods used in the research process. Experimental results show that the name recognition model proposed in this paper gives satisfactory results. Tests carried out on the January 1998 People’s Daily corpus achieved good recall and precision.
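
The abstract does not give the model's formulas, so the following is only a minimal sketch of the general approach it describes: rule-based extraction of candidate names, a per-character reliability score, a context adjustment by part of speech, and a decision threshold. Every name, count, bonus and threshold here (`CHAR_TOTAL`, `SURNAME_COUNT`, `GIVEN_COUNT`, `CONTEXT_BONUS`, `THRESHOLD`, `FLOOR`) is a hypothetical value chosen for illustration, and "reliability" is assumed to mean the fraction of a character's corpus occurrences in which it is part of a name, which may differ from the thesis's definition.

```python
import math

# All counts below are hypothetical toy values, not figures from the thesis.
# CHAR_TOTAL: how often each character occurs in the (imaginary) corpus.
CHAR_TOTAL = {"王": 100, "李": 100, "伟": 80, "明": 150, "的": 1000, "说": 300}
# How often each character occurs as a surname / as part of a given name.
SURNAME_COUNT = {"王": 90, "李": 85}
GIVEN_COUNT = {"伟": 40, "明": 30}

# Assumed log-score bonuses keyed by (side, part of speech) of a neighbour word.
CONTEXT_BONUS = {("left", "title"): 0.5, ("right", "v"): 0.3}

THRESHOLD = -1.5   # assumed; the thesis learns its thresholds from experiments
FLOOR = 1e-3       # smoothing value for characters never seen in a name


def reliability(ch, name_counts):
    """Fraction of a character's occurrences that are name uses (assumed definition)."""
    total = CHAR_TOTAL.get(ch, 0)
    if total == 0:
        return FLOOR
    return max(name_counts.get(ch, 0) / total, FLOOR)


def extract_candidates(text):
    """Rule: a known surname followed by one or two characters is a candidate."""
    for i, ch in enumerate(text):
        if ch in SURNAME_COUNT:
            for given_len in (1, 2):
                end = i + 1 + given_len
                if end <= len(text):
                    yield i, text[i:end]


def score_candidate(candidate, left_pos=None, right_pos=None):
    """Sum of log reliabilities, adjusted by the neighbours' parts of speech."""
    surname, given = candidate[0], candidate[1:]
    score = math.log(reliability(surname, SURNAME_COUNT))
    for ch in given:
        score += math.log(reliability(ch, GIVEN_COUNT))
    score += CONTEXT_BONUS.get(("left", left_pos), 0.0)
    score += CONTEXT_BONUS.get(("right", right_pos), 0.0)
    return score


def is_name(candidate, left_pos=None, right_pos=None):
    """Accept the candidate when its adjusted score reaches the threshold."""
    return score_candidate(candidate, left_pos, right_pos) >= THRESHOLD
```

With these toy values, `extract_candidates("王伟说")` proposes both 王伟 and 王伟说, and scoring then separates them: 说 is never seen as a given-name character, so the longer candidate falls below the threshold. A result-correction rule of the kind the abstract mentions would run after this step, for example to resolve overlapping candidates.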

Keywords/Search Tags:

Chinese personal name identification, rules, large-scale corpus, statistical model

Related items

1	Chinese New Word Identification Based On Large-scale Corpus
2	Researches Into New Chinese Words Identification Based On Large-Scale Corpus
3	Research On Chinese New Word Discovery Technology Based On Large Scale Network Corpus
4	Design And Implementation Of A Large Corpus Of Multi-level Feature Index Retrieval Algorithm
5	Automatic Approaches To Develop Large-scale TCM Electronic Medical Record Corpus For Named Entity Recognition Tasks
6	Research On Segmentation Consistency Checking Technology Of The Large-scale Chinese Corpus
7	Automatic Chinese Collocation Extraction Based On Large-scale Corpus
8	Research On Corpus Parallel Processing In Chinese Proofreading
9	Research On Fast Retrieval Algorithm Chinese Expressions And Sentences Based On Chinese Corpus
10	The Method Of Chinese Synonym Extraction Based On Large-scale Corpus