Chinese Text Automatic Proofreading System

Posted on:2016-04-11

Degree:Master

Type:Thesis

Country:China

Candidate:M Shi

Full Text:PDF

GTID:2308330479998252

Subject:Software engineering

Abstract/Summary:


With the development of computer and information technology, statistical natural language processing has developed rapidly and made remarkable achievements. The demand for automatic proofreading of electronic text has given rise to research on automatic text proofreading. Automatic proofreading of Chinese text consists of two steps: automatic error detection and automatic correction. This paper does the following work:

1. Local proofreading of Chinese homophones

Chinese text contains many types of errors. After a detailed analysis of each error type, and because homophone errors account for a large proportion of the errors actually found, we focused on proofreading homophone errors. We first used the simplest n-gram model, the 2-gram model, and then combined the 2-gram model with context. After analyzing the results, we propose a method that generalizes the context with synonyms, which alleviates data sparseness and improves system performance. Tested on real text, the system achieved a recall of 81.2%, an accuracy of 73.4%, and a correction rate of 88.9%.

2. Long-distance proofreading of Chinese homophones

For errors that cannot be identified from local features, we used Chinese collocations. First, collocations were extracted automatically from a corpus to serve as the basic knowledge source. We then extracted the collocation information of each word in the text and computed the collocation support of every word in its confusion set. Whether the original text is wrong is judged by the size of these supports, and the two words with the highest support are offered as suggestions.

3. Non-word error proofreading

This paper also studies how to proofread non-word errors. We consider only errors in long terms of four, five, or six characters, a type of error that is common in idioms. “Non-word error” is originally a concept from English text proofreading; in this paper it refers to long terms rather than single characters.
To solve this problem, we construct a set of wrong words from a dictionary and a massive corpus by fuzzy matching, which yields pairs of the form “right word, wrong word”. If a wrong word is matched in the text, the system can supply the correct word during proofreading. We used this method to proofread compositions, and it proved clearly effective.

Finally, we built an automatic text proofreading system that mainly proofreads the two kinds of errors above. After testing it on real text, we point out some shortcomings and future research directions.
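The local homophone check under point 1 starts from a plain 2-gram model. A minimal Python sketch of that starting point follows. It assumes character-level bigrams with add-one smoothing, a toy corpus, and a hand-written confusion set; `CORPUS`, `CONFUSION`, and all function names are hypothetical. A character is flagged whenever a homophone from its confusion set gets a higher score from the two bigrams around it. The context combination and the synonym-based generalization that the thesis adds on top are not included.

```python
from collections import Counter
import math

# Hypothetical toy training corpus; a real system would use a large corpus.
CORPUS = ["我们再见", "明天再见", "我在学校", "他在家"]
# Hypothetical homophone confusion sets: each character maps to same-sounding alternatives.
CONFUSION = {"在": ["再"], "再": ["在"]}


def train_bigrams(corpus):
    """Count character unigrams and adjacent character bigrams."""
    uni, bi = Counter(), Counter()
    for sent in corpus:
        uni.update(sent)
        bi.update(zip(sent, sent[1:]))
    return uni, bi


def log_prob(a, b, uni, bi, vocab_size):
    """Add-one (Laplace) smoothed log P(b | a)."""
    return math.log((bi[(a, b)] + 1) / (uni[a] + vocab_size))


def local_score(chars, i, cand, uni, bi, vocab_size):
    """Score candidate `cand` at position i using only its left and right bigrams."""
    score = 0.0
    if i > 0:
        score += log_prob(chars[i - 1], cand, uni, bi, vocab_size)
    if i + 1 < len(chars):
        score += log_prob(cand, chars[i + 1], uni, bi, vocab_size)
    return score


def check(sentence, confusion, uni, bi):
    """Return (position, original, suggestion) wherever a homophone outscores the original."""
    chars = list(sentence)
    vocab_size = len(uni)
    suggestions = []
    for i, ch in enumerate(chars):
        best = ch
        best_score = local_score(chars, i, ch, uni, bi, vocab_size)
        for alt in confusion.get(ch, ()):
            s = local_score(chars, i, alt, uni, bi, vocab_size)
            if s > best_score:
                best, best_score = alt, s
        if best != ch:
            suggestions.append((i, ch, best))
    return suggestions


if __name__ == "__main__":
    uni, bi = train_bigrams(CORPUS)
    print(check("明天在见", CONFUSION, uni, bi))  # [(2, '在', '再')]
```

A practical checker would likely require the alternative to win by some margin rather than by any amount, to limit false alarms. With so little data, most bigrams are unseen and receive the same smoothed score, which is the data-sparseness problem that the synonym generalization is meant to ease.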
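The long-distance check under point 2 can be sketched in the same spirit. The abstract does not give the exact support formula, so this sketch assumes support is the summed corpus collocation frequency of a candidate with each word in a window of five words on either side. The collocation table, confusion set, window size, and function names are hypothetical, and the input is assumed to be already word-segmented. The original word is judged wrong when another member of its confusion set has strictly higher support, and the two highest-support candidates are returned as suggestions.

```python
# Hypothetical collocation counts, as if extracted automatically from a corpus:
# (word, collocate) -> co-occurrence frequency.
COLLOCATIONS = {
    ("滥用", "权力"): 6,
    ("手中", "权力"): 2,
    ("行使", "权力"): 3,
    ("享有", "权利"): 5,
    ("维护", "权利"): 4,
    ("行使", "权利"): 2,
}
# Hypothetical confusion sets of same-sounding words.
CONFUSION = {"权利": ["权力"], "权力": ["权利"]}


def pair_count(a, b):
    """Collocation frequency of a and b, ignoring order."""
    return COLLOCATIONS.get((a, b), 0) + COLLOCATIONS.get((b, a), 0)


def support(candidate, context):
    """Assumed support measure: total collocation frequency with every context word."""
    return sum(pair_count(candidate, w) for w in context)


def check_word(words, i, confusion, window=5):
    """Judge words[i] against its confusion set using collocations within +/- window words.

    Returns (is_error, the two highest-support candidates, support of each candidate).
    """
    context = words[max(0, i - window):i] + words[i + 1:i + 1 + window]
    candidates = [words[i]] + confusion.get(words[i], [])
    supports = {c: support(c, context) for c in candidates}
    # sorted() is stable even with reverse=True, so on ties the original word stays first.
    ranked = sorted(candidates, key=lambda c: supports[c], reverse=True)
    is_error = supports[ranked[0]] > supports[words[i]]
    return is_error, ranked[:2], supports


if __name__ == "__main__":
    words = ["他", "滥用", "了", "手中", "的", "权利"]
    print(check_word(words, 5, CONFUSION))
    # (True, ['权力', '权利'], {'权利': 0, '权力': 8})
```

In the example, the decisive collocate 滥用 sits four words away from 权利, outside the reach of the bigram check above.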
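For the non-word errors under point 3, the abstract does not say which fuzzy-matching criterion is used. The sketch below assumes the simplest one. A corpus string of four to six characters that differs from a dictionary idiom in exactly one character is recorded as a “right word, wrong word” pair, and proofreading then looks those wrong forms up in new text. The idiom list, corpus, `min_count` threshold, and function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical idiom dictionary and corpus.
IDIOMS = {"再接再厉", "迫不及待", "一如既往"}
CORPUS = ["大家再接再励，争取更好成绩", "他迫不急待地打开礼物", "我们一如继往地支持你", "再接再厉"]


def build_index(dictionary):
    """Map every one-character wildcard pattern of each term to the term(s) it came from."""
    index = defaultdict(set)
    for term in dictionary:
        for k in range(len(term)):
            index[term[:k] + "*" + term[k + 1:]].add(term)
    return index


def mine_error_pairs(dictionary, corpus, lengths=(4, 5, 6), min_count=1):
    """Collect corpus strings that differ from a dictionary term in exactly one character.

    Returns a {wrong: right} map built from (right, wrong) pairs seen at least min_count times.
    """
    index = build_index(dictionary)
    counts = Counter()
    for text in corpus:
        for n in lengths:
            for start in range(len(text) - n + 1):
                window = text[start:start + n]
                if window in dictionary:
                    continue  # correctly written term
                for k in range(n):
                    for right in index.get(window[:k] + "*" + window[k + 1:], ()):
                        counts[(right, window)] += 1
    return {wrong: right for (right, wrong), c in counts.items() if c >= min_count}


def proofread(text, pairs):
    """Report (position, wrong, right) for every known wrong form found in text."""
    hits = []
    for wrong, right in pairs.items():
        start = text.find(wrong)
        while start != -1:
            hits.append((start, wrong, right))
            start = text.find(wrong, start + 1)
    return sorted(hits)


if __name__ == "__main__":
    pairs = mine_error_pairs(IDIOMS, CORPUS)
    print(proofread("同学们要再接再励，一如继往地努力", pairs))
    # [(4, '再接再励', '再接再厉'), (9, '一如继往', '一如既往')]
```

Raising `min_count` on a large corpus would keep only wrong forms that recur, which is one plausible way to filter out accidental one-character matches.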

Keywords/Search Tags:

automatic text proofreading, homophone proofreading, Chinese collocation, fuzzy matching, non-word error proofreading


Related items

1	To Proofread Different Words Of Chang Duan Jing Between The Si Ku Versions And The Song Dynasty Versions
2	Research And Implementation Of Chinese Text Proofreading Algorithms Based On Deep Learning
3	The Design And Implementation Of Intelligent Proofreading Legal Documents System
4	The Study Of Qiandian And Shuo WenJieZiJiaoQuan
5	Chinese Punctuation Proofreading Based On Deep Learning
6	The Design Of PH Meter Based On MSP 430
7	Guodian Ziyi "school Release
8	Research And Realization Of Non-word Error Automatic Proofreading System In Chinese Text
9	An Information Input And Processing System Based On Handwriting Numeral Recognition Technology
10	Study On The Method Of Automatic Proofreading Of Word-level Chinese Text