
Chinese Automatic Segmentation And Chinese Personal Name Recognition Technology Research

Posted on: 2007-11-26
Degree: Master
Type: Thesis
Country: China
Candidate: D M Xiong
GTID: 2208360182466720
Subject: Computer application technology
Abstract/Summary:
Automatic Chinese word segmentation and named entity recognition are two key tasks in natural language processing and computational linguistics, and research on them has both theoretical and practical significance. As demand for automatic language processing grows, accurate segmentation and named entity recognition become indispensable, because their output directly affects applications such as parsing, semantic analysis, speech recognition, machine translation, information retrieval, and information filtering.

Compared with other languages, Chinese poses its own difficulties for both tasks, and current results are still not satisfactory. This dissertation addresses automatic Chinese segmentation and Chinese person name recognition, a subtask of named entity recognition, and presents statistical models for both.

In detail, this dissertation carries out the following research. We present an integrated hierarchical framework that incorporates automatic Chinese segmentation, disambiguation, part-of-speech tagging, and Chinese person name recognition. First, by comparing the input text against pre-segmenting and back-segmenting characters, the segmentation task is reduced to segmenting several shorter fields of Chinese characters. The rough segmentation stage follows an N-best strategy: the maximum probability algorithm produces the N best candidate segmentations. Rough segmentation tries to cover the correct segmentation with as few candidates as possible. These N candidates are the input to the subsequent evaluation stage.
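The N-best rough segmentation stage can be illustrated with a minimal sketch. It assumes a unigram word model and uses a toy dictionary with made-up probabilities (`WORD_PROB`, `MAX_WORD_LEN`, and `n_best_segment` are hypothetical names, not taken from the dissertation). For each prefix of the input, the dynamic program keeps the N highest-probability segmentations.

```python
import heapq
import math

# Hypothetical unigram probabilities; a real system would estimate them from a corpus.
WORD_PROB = {
    "研究": 0.02, "研究生": 0.005, "生命": 0.01, "命": 0.001, "起源": 0.004,
    "研": 0.0005, "究": 0.0002, "生": 0.002, "起": 0.001, "源": 0.0005,
}
MAX_WORD_LEN = 3  # longest dictionary word, in characters


def n_best_segment(text, n=2):
    """Return up to n (log_prob, words) pairs for text, best first."""
    # best[i] holds the n most probable segmentations of text[:i].
    best = [[] for _ in range(len(text) + 1)]
    best[0] = [(0.0, [])]
    for end in range(1, len(text) + 1):
        candidates = []
        for start in range(max(0, end - MAX_WORD_LEN), end):
            word = text[start:end]
            if word not in WORD_PROB:
                continue
            log_p = math.log(WORD_PROB[word])
            # Extend every kept segmentation of the prefix with this word.
            for score, words in best[start]:
                candidates.append((score + log_p, words + [word]))
        best[end] = heapq.nlargest(n, candidates, key=lambda c: c[0])
    return best[len(text)]


for score, words in n_best_segment("研究生命起源", n=2):
    print(round(score, 2), "/".join(words))
```

With these toy numbers, both readings of the ambiguous string survive as candidates, so a later evaluation stage can choose between them.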
We have observed that the distribution of parts of speech in the language is relatively stable, so we use each word's part of speech and the collocations between parts of speech to assign an evaluation score to each candidate. The top-scoring candidate is the result of our first segmentation, and Chinese person name recognition is based on it. We treat the roles that make up a Chinese person name as parts of speech, which turns recognition into a part-of-speech tagging problem. Tagging the top segmentation result with an HMM yields a part-of-speech sequence, and Chinese person names are recognized by matching rules against this sequence. Little additional processing is needed to obtain the final segmentation result or part-of-speech tagging sequence.

The framework presented in this dissertation is hierarchical and easy to understand. Experiments show that our Chinese segmentation model and Chinese person name recognition model are effective.
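The role-tagging idea can be sketched as follows. The tag set (`S` for surname, `G` for given-name character, `O` for other), all probabilities, and the rule `SG{1,2}` are assumptions chosen for illustration; the dissertation's actual roles, parameters, and rules are not given in this abstract. The sketch decodes the most likely role sequence with the Viterbi algorithm and then matches a rule against that sequence.

```python
import math
import re

TAGS = ("S", "G", "O")  # hypothetical roles: surname, given-name char, other
# Hypothetical HMM parameters; a real system would estimate them from tagged text.
START = {"S": 0.2, "G": 0.05, "O": 0.75}
TRANS = {
    "S": {"S": 0.05, "G": 0.9, "O": 0.05},
    "G": {"S": 0.05, "G": 0.35, "O": 0.6},
    "O": {"S": 0.15, "G": 0.05, "O": 0.8},
}
EMIT = {
    "S": {"张": 0.3},
    "G": {"伟": 0.2},
    "O": {"张": 0.001, "伟": 0.001, "在": 0.1, "北京": 0.05},
}
FLOOR = 1e-6  # emission probability assigned to unseen (tag, token) pairs


def viterbi(tokens):
    """Return the most likely role tag for each token."""
    if not tokens:
        return []

    def emit(tag, tok):
        return math.log(EMIT[tag].get(tok, FLOOR))

    scores = [{t: math.log(START[t]) + emit(t, tokens[0]) for t in TAGS}]
    back = []
    for tok in tokens[1:]:
        prev, cur, ptr = scores[-1], {}, {}
        for t in TAGS:
            best_prev = max(TAGS, key=lambda p: prev[p] + math.log(TRANS[p][t]))
            cur[t] = prev[best_prev] + math.log(TRANS[best_prev][t]) + emit(t, tok)
            ptr[t] = best_prev
        scores.append(cur)
        back.append(ptr)
    # Follow back-pointers from the best final tag.
    tag = max(TAGS, key=lambda t: scores[-1][t])
    path = [tag]
    for ptr in reversed(back):
        tag = ptr[tag]
        path.append(tag)
    return path[::-1]


NAME_RULE = re.compile(r"SG{1,2}")  # surname followed by one or two given-name chars


def find_names(tokens):
    """Match the name rule on the tag string and join the matched tokens."""
    tags = "".join(viterbi(tokens))  # one character per token, so indices align
    return ["".join(tokens[m.start():m.end()]) for m in NAME_RULE.finditer(tags)]


print(find_names(["张", "伟", "在", "北京"]))
```

Because each tag is a single character, a regular expression over the tag string stands in for rule matching on the tagged sequence, and match offsets map directly back to token positions.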
Keywords/Search Tags:automatic Chinese segmentation, Chinese person name recognition, part-of-speech tagging, N-best strategy, HMM