
Research On Chinese Real-word Error Automatic Detection And Correction

Posted on: 2018-02-16
Degree: Master
Type: Thesis
Country: China
Candidate: D Z Gu
Full Text: PDF
GTID: 2348330536977522
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet and information processing technology, e-books, digital newspapers, e-mail, and other electronic texts have become an important part of people's daily lives. Meanwhile, errors in Chinese text are increasingly common. They fall into two classes: real-word errors and non-word errors. A real-word error is one in which the erroneous word is itself a valid dictionary word. This thesis studies automatic proofreading methods for Chinese real-word errors. Current research on real-word errors remains largely at the error-detection stage and mostly relies on relatively simple features and a single model. As a result, existing automatic proofreading methods for Chinese real-word errors have relatively low detection accuracy and recall, and a particularly high false alarm rate. Based on an analysis of real-word errors, this thesis proposes an automatic proofreading method that combines confusion sets, statistical models, context feature generalization, and collocation features.

(1) Analyzing the causes and types of Chinese text errors is the prerequisite for studying Chinese text proofreading. Many errors arise from typos, omissions, and erroneous input; this thesis analyzes and classifies the errors found in Chinese text.

(2) Automatic proofreading of Chinese real-word errors requires a large amount of linguistic and statistical knowledge. This thesis studies how to represent and construct the resources that real-word error proofreading requires: real-word confusion sets, synonym sets, a word N-gram model, and collocation features.

(3) This thesis presents an automatic proofreading method for Chinese real-word errors based on real-word confusion sets, context feature generalization, statistical models, and collocation features. The method considers not only local context features but also collocation features, which can capture long-distance linguistic constraints.

Experiments show that the method achieves a recall rate of 88%, an accuracy rate of 76%, and an error correction accuracy of 69%. The proposed method can correct both local and global errors in text, and it integrates error checking with error correction.
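The core idea of pairing a confusion set with a statistical language model can be illustrated with a minimal sketch. It is not the thesis's implementation: it uses only a word bigram model with add-one smoothing, and omits the synonym-based context generalization, collocation features, and Bayesian component named in the abstract and keywords. The toy corpus, the single confusion set {在, 再}, the `margin` threshold, and all function names are assumptions made for illustration.

```python
from collections import Counter
import math

# Toy, pre-segmented training sentences (hypothetical data, not from the thesis).
CORPUS = [
    ["我", "在", "家", "看书"],
    ["他", "在", "家", "休息"],
    ["我", "在", "学校", "看书"],
    ["请", "再", "说", "一遍"],
    ["我", "再", "看", "一遍"],
]

# Hypothetical confusion sets: valid words that are easily typed for one another.
CONFUSION_SETS = [{"在", "再"}]


def train_bigrams(corpus):
    """Count unigrams and bigrams over sentences padded with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_prob(tokens, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a token sequence."""
    vocab_size = len(unigrams)
    padded = ["<s>"] + tokens + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
        for a, b in zip(padded, padded[1:])
    )


def proofread(tokens, unigrams, bigrams, margin=0.5):
    """Return (corrected_tokens, changes), where changes lists
    (index, original_word, replacement) for every substitution made.

    A word is replaced only if a confusable alternative raises the sentence
    log-probability by more than `margin` (an assumed threshold meant to
    limit false alarms).
    """
    result = list(tokens)
    changes = []
    for i, word in enumerate(tokens):
        for confusion_set in CONFUSION_SETS:
            if word not in confusion_set:
                continue
            base = log_prob(result, unigrams, bigrams)
            best, best_score = word, base + margin
            for candidate in sorted(confusion_set - {word}):
                trial = result[:i] + [candidate] + result[i + 1:]
                score = log_prob(trial, unigrams, bigrams)
                if score > best_score:
                    best, best_score = candidate, score
            if best != word:
                result[i] = best
                changes.append((i, word, best))
    return result, changes


if __name__ == "__main__":
    unigrams, bigrams = train_bigrams(CORPUS)
    print(proofread(["我", "再", "家", "看书"], unigrams, bigrams))
```

Because detection and correction happen in the same scoring step, the sketch also shows in miniature what the abstract means by integrating checking with correction. A bigram window sees only adjacent words, which is the limitation that collocation features are said to address.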
Keywords/Search Tags: Real-word Error, Confusion Set, N-gram Model, Bayesian, Collocation Feature, Context Feature Generalization