Research On The Methods Of Automatic Correction Of Chinese Word Segmentation And Part-of-Speech Tagging

Posted on:2004-03-17

Degree:Master

Type:Thesis

Country:China

Candidate:Y L Qian

Full Text:PDF

GTID:2168360095953786

Subject:Computer application technology

Abstract/Summary:

Corpus building is foundational work in Chinese information processing. Processing a Chinese corpus involves word segmentation and part-of-speech tagging. Both are widely used in research areas such as automatic retrieval of Chinese text, machine translation, and Chinese character recognition, and they provide important resources for that research.

The effective use of a corpus depends strongly on the level and quality of its processing. A great deal of software has been written for Chinese corpus processing, with considerable success, but its output still falls short of practical needs and requires further improvement.

This thesis aims to improve the accuracy of Chinese word segmentation and part-of-speech tagging, and studies the two phases separately:

1. It reviews the current state of Chinese word segmentation and describes a rule-based approach to correcting segmentation automatically. The approach compares the machine-processed corpus with the correct version, acquires correction rules from the differences, and then applies those rules to correct the corpus automatically.

2. It reviews the current state of Chinese part-of-speech tagging and describes an approach to correcting tagging automatically. The approach mines rules from a correctly tagged corpus using rough sets, and then applies them to correct the tagger's output automatically.

3. An experimental system for correcting Chinese word segmentation and part-of-speech tagging was designed and implemented. In closed and open tests, the segmentation-correction system scores 93.75% and 81.05% respectively, and the tagging-correction system scores 90.40% and 84.85% respectively.
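The rule-acquisition idea in the first step can be illustrated with a minimal Python sketch. The abstract does not state the thesis's actual rule format, so everything here is an assumption for illustration: rules are taken to be context-free rewrites from a wrongly segmented word sequence to the reference sequence, and the names `learn_rules`, `apply_rules`, and `min_count` are hypothetical.

```python
from collections import Counter, defaultdict


def boundaries(words):
    """Character offsets at which words end, plus 0 for the sentence start."""
    offsets, pos = [0], 0
    for w in words:
        pos += len(w)
        offsets.append(pos)
    return offsets


def chunk(words, start, end):
    """Words of a segmentation lying inside the character range [start, end)."""
    out, pos = [], 0
    for w in words:
        if start <= pos and pos + len(w) <= end:
            out.append(w)
        pos += len(w)
    return tuple(out)


def learn_rules(machine_corpus, reference_corpus, min_count=1):
    """Hypothetical rule format: machine word sequence -> reference sequence.

    Sentences are aligned on the word boundaries both segmentations share;
    every stretch between two shared boundaries where they differ is a rule.
    """
    counts = defaultdict(Counter)
    for machine, reference in zip(machine_corpus, reference_corpus):
        if "".join(machine) != "".join(reference):
            continue  # not the same underlying text, so not comparable
        shared = sorted(set(boundaries(machine)) & set(boundaries(reference)))
        for start, end in zip(shared, shared[1:]):
            wrong = chunk(machine, start, end)
            right = chunk(reference, start, end)
            if wrong != right:
                counts[wrong][right] += 1
    rules = {}
    for wrong, rights in counts.items():
        right, n = rights.most_common(1)[0]
        if n >= min_count:
            rules[wrong] = right
    return rules


def apply_rules(words, rules):
    """Rewrite a segmented sentence, preferring the longest matching rule."""
    max_len = max((len(k) for k in rules), default=0)
    out, i = [], 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            key = tuple(words[i:i + n])
            if key in rules:
                out.extend(rules[key])
                i += n
                break
        else:  # no rule starts at this word
            out.append(words[i])
            i += 1
    return out


# Toy data: one machine-segmented sentence and its reference segmentation.
rules = learn_rules([["研究生", "命", "起源"]], [["研究", "生命", "起源"]])
print(apply_rules(["他", "研究生", "命", "科学"], rules))
# prints ['他', '研究', '生命', '科学']
```

A context-free rule like this one can overcorrect when the same word sequence is right in another context, so a practical system would probably condition rules on neighbouring words or tags and raise `min_count` to filter out rare, unreliable rules.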

Keywords/Search Tags:

Automatic Correction of Chinese Word Segmentation, Automatic Correction of Chinese Part-of-Speech Tagging, Rough Sets, Chinese Information Processing, Quality Assuring of Chinese Corpus Processing

Related items

1	Research On Corpus Parallel Processing In Chinese Proofreading
2	Chinese Word Auto-segmentation Design And Algorithm Realization For Chinese Network Information Retrieval
3	Chinese Automatic Segmentation And Chinese Personal Name Recognition Technology Research
4	Research On Chinese Word Segmentation And Part-of-speech Tagging Based On Deep Learning Methods
5	Full-text Search For The Modern Chinese Text Processing, Automatic Word Generic System
6	Word Segmentation And Pos Tagging In Chinese
7	Natural Language Processing Of Chinese Text Automatic Proofreading
8	The Research Of Applying Conditional Random Fields To Chinese Word Segmentation And Part-Of-Speech Tagging
9	Research On The Traditional Chinese Spelling Error Detection
10	The Research And Implementation Of Automatic Chinese Word Segmentation System