
Research And Realization Of Non-word Error Automatic Proofreading System In Chinese Text

Posted on: 2018-02-15
Degree: Master
Type: Thesis
Country: China
Candidate: H B Liu
GTID: 2348330536477494
Subject: Computer technology
Abstract/Summary:
The automatic proofreading of Chinese text has long been a difficult problem in natural language processing. With the rapid development of Internet technology, the volume of Web text keeps growing, and so does the number of errors it contains; the Non-word Error is one of the most common. At present, most research on automatic proofreading of Chinese text focuses on error detection, while error correction has received relatively little attention. To address these problems, this paper classifies Chinese text errors into “Non-word Errors” and “Real-word Errors”. We first use statistical methods to collect wrongly written words from a Web corpus and construct Typo-pairs, and then apply the Typo-pairs to the automatic proofreading of Non-word Errors. The work covers the following aspects:

1. Chinese wrongly written character analysis
This paper first analyzes the causes of wrongly written characters in Chinese text, and then classifies the errors according to their cause and the nature of the error string. Finally, it collects and analyzes statistics on Chinese errors from a Web news corpus.

2. Automatic construction method of the Typo-pairs library
Automatic proofreading of Non-word Errors requires a great deal of knowledge and resources, among which Typo-pairs are particularly important. This paper proposes two methods for constructing Typo-pairs automatically. The first combines fuzzy matching with statistics: it is based on an N-gram model and a fuzzy matching model, and obtains Typo-pairs by verifying contextual statistical information. The second is based on confusion sets: it replaces characters in the Chinese words to be matched with characters from a Chinese character confusion set to form confusable word strings, and then uses statistical knowledge to verify each confusable string and obtain Typo-pairs.

3. Automatic proofreading method for Non-word Errors based on fuzzy segmentation
This paper performs fuzzy segmentation of the target sentence, combining the generated Typo-pairs with a dictionary of correct words to form a word graph. The optimal path through the word graph is then found with a shortest path algorithm, which realizes automatic proofreading of Non-word Errors.

4. Automatic error checking method for Non-word Errors based on the N-gram model
This paper proposes an automatic error checking method for Non-word Errors based on the N-gram model. The scattered strings left after word segmentation are analyzed with bigram and trigram statistical models to detect errors automatically.
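The word-graph step described in aspect 3 can be illustrated with a small sketch. This is not the thesis's implementation: the function name `proofread`, the three fixed edge costs, and the `max_len` limit are assumptions made for illustration, whereas the thesis would presumably derive edge weights from corpus statistics. The sketch builds a word graph over character positions from a dictionary of correct words and a Typo-pairs table, then picks the cheapest path with dynamic programming, which is a shortest path search on a graph whose edges only point forward.

```python
from math import inf
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical fixed costs; real weights would come from corpus statistics.
WORD_COST = 1.0     # edge for a word found in the correct-word dictionary
TYPO_COST = 1.5     # edge that rewrites a known typo to its correction
UNKNOWN_COST = 2.0  # fallback edge for a single unmatched character


def proofread(sentence: str,
              dictionary: Set[str],
              typo_pairs: Dict[str, str],
              max_len: int = 4) -> List[str]:
    """Segment `sentence` and replace known typos via a word-graph shortest path."""
    n = len(sentence)
    # edges[i] holds (end_position, output_text, cost) for edges leaving node i.
    edges: List[List[Tuple[int, str, float]]] = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, min(n, i + max_len) + 1):
            piece = sentence[i:j]
            if piece in dictionary:
                edges[i].append((j, piece, WORD_COST))
            if piece in typo_pairs:
                # "Fuzzy" edge: the typo string maps to its corrected word.
                edges[i].append((j, typo_pairs[piece], TYPO_COST))
        # Fallback edge keeps every node reachable.
        edges[i].append((i + 1, sentence[i], UNKNOWN_COST))

    # Nodes are already in topological order, so one left-to-right pass suffices.
    best: List[float] = [inf] * (n + 1)
    back: List[Optional[Tuple[int, str]]] = [None] * (n + 1)
    best[0] = 0.0
    for i in range(n):
        for end, text, cost in edges[i]:
            if best[i] + cost < best[end]:
                best[end] = best[i] + cost
                back[end] = (i, text)

    # Walk the back-pointers from the last node to recover the path.
    words: List[str] = []
    pos = n
    while pos > 0:
        prev, text = back[pos]
        words.append(text)
        pos = prev
    words.reverse()
    return words
```

With a toy dictionary `{"我们", "今天", "再接再厉"}` and the single Typo-pair `{"再接再励": "再接再厉"}`, calling `proofread("我们今天再接再励", ...)` returns `["我们", "今天", "再接再厉"]`: the typo edge costs less than four single-character fallback edges, so the corrected word lies on the shortest path.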
Keywords/Search Tags:Non-word Error, Typo-pairs, fuzzy segmentation, N-gram model, automatic proofreading