Research On Chinese Person Name Recognition Based On Hybrid Models

Posted on: 2016-05-06, Degree: Master, Type: Thesis
Country: China, Candidate: J Wang, Full Text: PDF
GTID: 2308330452468983, Subject: Computer software and theory
Abstract/Summary:
Chinese named entity recognition is a fundamental task in natural language processing. Among Chinese named entities, person names are the most numerous and the most difficult to identify, and their presence is an important factor affecting word segmentation accuracy. Chinese word segmentation systems therefore usually include a module dedicated to person name recognition, which makes the task important in its own right. This thesis studies the automatic recognition of Chinese person names in modern Chinese text. We propose the following two methods:

(1) The traditional Naive Bayesian classification algorithm considers only the characters used in Chinese person names. We extend it with the boundary words that appear next to names. To overcome the difficulty of defining name boundaries, we count the frequencies of name characters and of boundary templates in a tagged corpus. The names recognized in this way are then matched against the text to recover occurrences that were missed. The method is simple and gives good results.

(2) First, tokens are tagged by a Conditional Random Fields model with roles that reflect their function in forming a Chinese person name. Then the roles are corrected by transformation-based error-driven learning. Finally, candidate names are recognized from the role sequence. A sequence labeling model based on Conditional Random Fields can achieve better performance than traditional classification methods. Error-driven learning lets us make full use of the contextual information around Chinese person names, and the problem of data sparseness in the corpus can be addressed.
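Method (1) can be illustrated with a small sketch. The abstract does not give the exact features, smoothing, or corpus, so everything below is an assumption made for illustration: the toy samples, the add-one smoothing, the use of a single left and right boundary word, and the function names `count_features`, `class_score`, and `is_name` are all hypothetical. The sketch only shows how character frequencies and boundary-word frequencies counted from tagged data might be combined in a Naive Bayes decision.

```python
import math
from collections import Counter

# Hypothetical toy data: (left boundary word, candidate string, right boundary word).
NAME_SAMPLES = [
    ("记者", "王军", "报道"),
    ("记者", "李明", "报道"),
    ("主席", "王明", "说"),
]
NON_NAME_SAMPLES = [
    ("今天", "天气", "很好"),
    ("记者", "现场", "报道"),
]

def count_features(samples):
    """Count character frequencies and boundary-word frequencies for one class."""
    chars, bounds = Counter(), Counter()
    for left, text, right in samples:
        chars.update(text)                         # one count per character
        bounds.update([("L", left), ("R", right)])  # left/right boundary features
    return chars, bounds

NAME_STATS = count_features(NAME_SAMPLES)
NON_NAME_STATS = count_features(NON_NAME_SAMPLES)

# Vocabulary sizes for add-one smoothing (an assumption, not from the thesis).
CHAR_VOCAB = len(set(NAME_STATS[0]) | set(NON_NAME_STATS[0]))
BOUND_VOCAB = len(set(NAME_STATS[1]) | set(NON_NAME_STATS[1]))

TOTAL = len(NAME_SAMPLES) + len(NON_NAME_SAMPLES)
NAME_PRIOR = len(NAME_SAMPLES) / TOTAL
NON_NAME_PRIOR = len(NON_NAME_SAMPLES) / TOTAL

def smoothed_log_prob(counter, key, vocab_size):
    """log P(key | class) with add-one smoothing."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + vocab_size))

def class_score(stats, prior, left, text, right):
    """Naive Bayes log score: prior + character terms + boundary-word terms."""
    chars, bounds = stats
    score = math.log(prior)
    for ch in text:
        score += smoothed_log_prob(chars, ch, CHAR_VOCAB)
    score += smoothed_log_prob(bounds, ("L", left), BOUND_VOCAB)
    score += smoothed_log_prob(bounds, ("R", right), BOUND_VOCAB)
    return score

def is_name(left, text, right):
    """Label the candidate as a person name if the name class scores higher."""
    name_score = class_score(NAME_STATS, NAME_PRIOR, left, text, right)
    other_score = class_score(NON_NAME_STATS, NON_NAME_PRIOR, left, text, right)
    return name_score > other_score

if __name__ == "__main__":
    print(is_name("记者", "王明", "报道"))
    print(is_name("今天", "天气", "很好"))
```

With counts this small the decision is driven almost entirely by the smoothing, so the sketch says nothing about real accuracy. The step described in the abstract of matching recognized names back against the text to recover missed occurrences is not shown here.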
Keywords/Search Tags: person name recognition, Naive Bayesian classification, boundary templates, Conditional Random Fields, error-driven learning