Research On The Methods Of Automatic Correction Of Chinese Word Segmentation And Part-of-Speech Tagging

Posted on: 2004-03-17
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Qian
Full Text: PDF
GTID: 2168360095953786
Subject: Computer application technology
Abstract/Summary:
Corpus construction is foundational work in Chinese information processing. Processing a Chinese corpus includes word segmentation and part-of-speech tagging. Both are widely used in research areas such as automatic retrieval of Chinese text, machine translation, and Chinese character recognition, and they provide important resources for that research.

The effective use of a corpus depends strongly on the level and quality of its processing. A great deal of software has been written for Chinese corpus processing, and it has achieved a lot, but its output still does not fully meet practical needs and requires further improvement.

This thesis aims to improve the accuracy of Chinese word segmentation and part-of-speech tagging, and studies the two phases separately:

1. It analyzes the current state of Chinese word segmentation and describes a rule-based approach to correcting segmentation automatically. The approach compares the machine-segmented corpus with the correctly segmented corpus, acquires correction rules from the differences, and then corrects the corpus automatically using these rules.

2. It analyzes the current state of Chinese part-of-speech tagging and describes an approach to correcting tagging automatically. The approach mines rules from a correctly tagged corpus using rough set methods, and then uses these rules to correct the tagging results automatically.

3. An experimental system for correcting Chinese word segmentation and part-of-speech tagging was designed and implemented. For word segmentation correction, the closed-test and open-test results are 93.75% and 81.05% respectively. For part-of-speech tagging correction, the closed-test and open-test results are 90.40% and 84.85% respectively.
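
The rule-acquisition idea in item 1 (compare machine output with a correct segmentation, keep the differences as rules, then apply them) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names `diff_spans`, `learn_rules`, and `apply_rules` are hypothetical, the rules here carry no context conditions, and the example sentence is invented for illustration.

```python
from collections import Counter, defaultdict


def diff_spans(machine, gold):
    """Yield (machine_tokens, gold_tokens) for each minimal span where
    two segmentations of the same character string disagree."""
    assert "".join(machine) == "".join(gold), "must segment the same text"
    i = j = 0
    while i < len(machine) and j < len(gold):
        if machine[i] == gold[j]:
            i += 1
            j += 1
            continue
        # Grow both sides until they cover the same number of characters.
        mi, gj = i + 1, j + 1
        mlen, glen = len(machine[i]), len(gold[j])
        while mlen != glen:
            if mlen < glen:
                mlen += len(machine[mi])
                mi += 1
            else:
                glen += len(gold[gj])
                gj += 1
        yield tuple(machine[i:mi]), tuple(gold[j:gj])
        i, j = mi, gj


def learn_rules(pairs):
    """Map each erroneous token sequence to its most frequent correction.
    Assumption: rules are context-free rewrites, a simplification."""
    counts = defaultdict(Counter)
    for machine, gold in pairs:
        for bad, good in diff_spans(machine, gold):
            counts[bad][good] += 1
    return {bad: c.most_common(1)[0][0] for bad, c in counts.items()}


def apply_rules(tokens, rules):
    """Rewrite a token list left to right, preferring the longest rule."""
    max_len = max((len(k) for k in rules), default=0)
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in rules:
                out.extend(rules[key])
                i += n
                break
        else:
            out.append(tokens[i])
            i += 1
    return out


if __name__ == "__main__":
    # Invented training pair: machine output vs. correct segmentation.
    training = [(["研究生", "命", "起源"], ["研究", "生命", "起源"])]
    rules = learn_rules(training)
    print(apply_rules(["他", "研究生", "命", "科学"], rules))
```

A context-free rewrite like this would over-correct wherever the "erroneous" sequence is actually right, so a practical system would likely attach context conditions to each rule.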
Keywords/Search Tags: Automatic Correction of Chinese Word Segmentation, Automatic Correction of Chinese Part-of-Speech Tagging, Rough Sets, Chinese Information Processing, Quality Assurance of Chinese Corpus Processing