The Study On Automatic Checking Technology Of Lexical Errors In Chinese

Posted on: 2013-03-19  Degree: Master  Type: Thesis
Country: China  Candidate: L P Kong  Full Text: PDF
GTID: 2298330422473789  Subject: Military communications science
Abstract/Summary:
Traditional manual text input in command automation systems inevitably produces errors, and manual review is a bottleneck of such systems, so automatic error checking methods are urgently needed. In the field of Internet content regulation, many text-interference techniques try to evade regulation and launch attacks with texts containing dangerous information by substituting characters to reduce the accuracy of text processing. Automatic error checking technology is needed there as well.

Although a great deal of research on automatic proofreading has been done in recent years, there is still a large gap between the performance of current systems and users' needs. The low error recall rate, the high false rate, and the insufficient accuracy of error correction suggestions all need to be improved.

After studying related work at home and abroad and the characteristics of errors, this thesis focuses on the theory and technology of Chinese text error checking and works on several key components to improve the performance of an automatic error checking system. The main work is as follows:

Probability model analysis in the error finding method is extended. A second-order HMM is used to combine 3-gram part-of-speech analysis with word probability analysis for Chinese text error finding. In this way, the independent bulk string analysis method, the word probability analysis method, and the part-of-speech analysis method are extended. The construction of a word dictionary that contains subsequent-word information is also discussed.

A fast method of Chinese chunking is proposed. A Support Vector Machine is used for supervised learning. By optimizing the coding, building several models, and finding the internal rules of chunks, the analysis speed and accuracy are improved. The Chinese chunk analyzer improves text chunking accuracy by 3%-6%. The analyzer is used to analyze news texts, from which high-frequency chunks are extracted. These chunks enlarge the knowledge base and save manpower.

A method for finding similar Chinese words based on pen shape is proposed. Based on an analysis of the editing and proofreading process and of current confusion set generation methods, error correction suggestions are selected by pronunciation and by shape. This shape-based method extends auto-correction research. Combined with the current confusion word set method and the long string matching method, it can effectively give correction suggestions for part of the word errors.

Finally, the work is summarized, shortcomings of the current experimental system are pointed out, and ideas and priorities for the next stage of research are discussed.
Keywords/Search Tags:error checking, continuation analysis, Hidden Markov models, chunking, pen-shaped similar