Chinese Automatic Segmentation And Chinese Personal Name Recognition Technology Research

Posted on:2007-11-26

Degree:Master

Type:Thesis

Country:China

Candidate:D M Xiong

Full Text:PDF

GTID:2208360182466720

Subject:Computer application technology

Abstract/Summary:

Automatic Chinese word segmentation and named entity recognition are two key tasks in natural language processing and computational linguistics, and research on them has both theoretical and practical significance. As demand for automatic natural language processing grows, high-accuracy segmentation and named entity recognition become indispensable, because their results directly affect downstream applications such as parsing, semantic analysis, speech recognition, machine translation, information retrieval, and information filtering. Compared with other languages, Chinese poses its own difficulties for both tasks, and current results are still not fully satisfactory. This dissertation addresses automatic Chinese segmentation and Chinese person name recognition, a subtask of named entity recognition, and presents statistical models for both.

In detail, this dissertation makes the following contributions. We present an integrated hierarchical framework that incorporates automatic Chinese segmentation, disambiguation, part-of-speech tagging, and Chinese person name recognition. First, by comparing the input text against pre-segmenting characters and back-segmenting characters, the segmentation task is reduced to segmenting several shorter fields of Chinese characters. In the rough segmentation stage, which is based on the N-best strategy, we obtain the N best results produced by a maximum probability algorithm. Rough segmentation tries to cover the correct segmentation with as few candidates as possible. These N best candidates are the input to the subsequent evaluation stage.
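
The rough segmentation stage can be illustrated with a minimal sketch. It assumes that "maximum probability" means maximizing the product of unigram word probabilities, which the abstract does not state. The lexicon, its probability values, and the names `LEXICON`, `MAX_WORD_LEN`, and `n_best_segment` are hypothetical and are not taken from the dissertation.

```python
import heapq
import math

# Hypothetical toy lexicon: word -> unigram probability (illustrative values only).
LEXICON = {
    "研究": 0.04, "研究生": 0.02, "生命": 0.03, "命": 0.005,
    "生": 0.01, "的": 0.1, "起源": 0.02, "研": 0.001, "究": 0.001,
}
MAX_WORD_LEN = 3  # longest word in the toy lexicon


def n_best_segment(text, n=2):
    """Return up to n segmentations of text, best first, as (log_prob, words)."""
    # best[i] holds up to n (log_prob, words) pairs that cover text[:i].
    best = [[] for _ in range(len(text) + 1)]
    best[0] = [(0.0, [])]
    for end in range(1, len(text) + 1):
        candidates = []
        for start in range(max(0, end - MAX_WORD_LEN), end):
            word = text[start:end]
            if word not in LEXICON:
                continue
            word_lp = math.log(LEXICON[word])
            # Extend every kept partial segmentation that ends at `start`.
            for lp, words in best[start]:
                candidates.append((lp + word_lp, words + [word]))
        # Keep only the n highest-scoring partial segmentations at this position.
        best[end] = heapq.nlargest(n, candidates, key=lambda c: c[0])
    return best[len(text)]


for log_prob, words in n_best_segment("研究生命的起源", n=2):
    print(round(log_prob, 3), "/".join(words))
```

Keeping N candidates per position, rather than a single best path, is what lets a later evaluation stage recover the correct segmentation when it is not the top-scoring one.
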
We have observed that the distribution of parts of speech in language is relatively stable, so we use each word's part of speech and the collocations between parts of speech to assign an evaluation score to each candidate. The top-scoring candidate is the result of our first segmentation pass. Chinese person name recognition operates on this top segmentation result. We treat the roles that make up a Chinese person name as parts of speech, which turns recognition into a part-of-speech tagging problem. Tagging the top segmentation result with an HMM yields a part-of-speech sequence, and Chinese person names are recognized by matching rules against this sequence. Little additional processing is needed to obtain the final segmentation result or the part-of-speech tag sequence.

The framework presented in this dissertation is hierarchical and easy to understand. Experiments show that our Chinese segmentation model and Chinese person name recognition model are effective.
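
The role-tagging idea can be sketched as a small HMM decoded with the Viterbi algorithm, followed by a pattern match over the tag sequence. Everything specific here is an assumption for illustration: the three role tags (`S` surname, `G` given-name character, `O` other), all probability values, the unseen-token floor, and the single rule `SG{1,2}` are hypothetical and do not come from the dissertation.

```python
import math
import re

# Hypothetical role tags: S = surname, G = given-name character, O = other.
STATES = ["S", "G", "O"]
START = {"S": 0.3, "G": 0.05, "O": 0.65}
TRANS = {
    "S": {"S": 0.02, "G": 0.9, "O": 0.08},
    "G": {"S": 0.05, "G": 0.45, "O": 0.5},
    "O": {"S": 0.15, "G": 0.05, "O": 0.8},
}
EMIT = {
    "S": {"王": 0.5, "李": 0.4},
    "G": {"小": 0.3, "明": 0.4},
    "O": {"王": 0.01, "小": 0.05, "明": 0.02, "喜欢": 0.3, "研究": 0.3},
}
UNSEEN = 1e-6  # floor probability for unseen (state, token) pairs


def emit_lp(state, token):
    return math.log(EMIT[state].get(token, UNSEEN))


def viterbi(tokens):
    """Return the most probable role tag for each token."""
    if not tokens:
        return []
    score = {s: math.log(START[s]) + emit_lp(s, tokens[0]) for s in STATES}
    backpointers = []
    for token in tokens[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            # Best previous state for reaching state s at this token.
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = score[prev] + math.log(TRANS[prev][s]) + emit_lp(s, token)
            pointers[s] = prev
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    tags = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        tags.append(state)
    return tags[::-1]


# Hypothetical rule: a surname followed by one or two given-name characters.
NAME_RULE = re.compile(r"SG{1,2}")


def find_names(tokens):
    """Tag the tokens, then join each token span whose tags match the rule."""
    tags = "".join(viterbi(tokens))
    return ["".join(tokens[m.start():m.end()]) for m in NAME_RULE.finditer(tags)]


tokens = ["王", "小", "明", "喜欢", "研究"]
print(viterbi(tokens))
print(find_names(tokens))
```

Because each role tag is one character, positions in the joined tag string map directly to token indices, so a regular expression over the tags can stand in for the rule matching step.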

Keywords/Search Tags:

automatic Chinese segmentation, Chinese person name recognition, part-of-speech tagging, N-best strategy, HMM

Related items

1	Research On The Methods Of Automatic Correction Of Chinese Word Segmentation And Part-of-Speech Tagging
2	Research On Chinese Word Segmentation And Part-of-speech Tagging Based On Deep Learning Methods
3	The Research Of Applying Conditional Random Fields To Chinese Word Segmentation And Part-Of-Speech Tagging
4	BiLSTM And CNN Based Joint Model For Chinese Word Segmentation And Part-of-speech Tagging
5	Research On The Learning Of Integrating Chinese Word Segmentation With Part-of-Speech Tagging And Domain Adaption Approach
6	Word Segmentation And Pos Tagging In Chinese
7	The Research Of Chinese Automatic Segmentation Method Based On HowNet Semantic Relevancy Computing
8	Full-text Search For The Modern Chinese Text Processing, Automatic Word Generic System
9	Research And Implementation For Chinese Lexicon Analysis System Based On Neural Network
10	Chinese Part-of-Speech Tagging Based On Ameliorated Hidden Markov Model