Spelling and grammatical errors are common in everyday text. They typically arise from the writer's own mistakes, automatic speech recognition, optical character recognition, and similar sources. Text checking technology automatically identifies spelling and grammatical errors in text and is therefore of considerable research significance. Chinese text checking supports the rapid verification of large volumes of electronic text, helps beginners learn Chinese, and safeguards the input accuracy of downstream tasks, so it plays an important role in applications closely tied to people's lives. This paper focuses on Chinese spelling and grammar checking and studies two tasks in depth: Chinese Spelling Check (CSC) and Chinese Grammatical Error Diagnosis (CGED). The research work is as follows:

1. Chinese spelling check enhanced with phonetic and glyph knowledge. Most misspellings in Chinese text are caused by phonetic or glyph similarity between characters. This work addresses the problem by using a GRU to encode the pinyin and Cangjie codes of each character, so that the model captures phonetic and glyph similarity between characters while modeling semantics. Experimental results on SIGHAN 2014 and SIGHAN 2015 show that the method outperforms the baseline model, which supports its effectiveness.

2. Pre-training-based Chinese spelling check enhanced with phonetic and glyph knowledge. Building on the previous work, this part aims for the language model to acquire knowledge that matters for the CSC task, such as similarity information between characters, during pre-training. It designs masking strategies and pre-training tasks better suited to Chinese Spelling Check, which alleviates the mismatch between pre-training tasks and downstream tasks. Experimental results on SIGHAN 2014 and SIGHAN 2015 show that this work achieves state-of-the-art performance, demonstrating the effectiveness of the method.

3. Chinese grammatical error diagnosis based on multi-stage training and edit-level voting. The method comprises three types of models, each handling a different problem: a Chinese grammar error checking model, a Chinese grammar error correction model, and a Chinese spelling error correction model. The outputs of these three model types are fused through edit-level voting. In addition, a multi-stage training strategy with one pre-training stage and two fine-tuning stages is adopted. Compared with the metrics reported by previous studies on CGED 2020, this work achieves the highest F1 at all levels, demonstrating the effectiveness of the method. On the error checking sub-task in particular, a single model from this work achieves a higher F1 than the multi-model ensembles reported in previous studies.

This research is a further exploration of Chinese spelling check and Chinese grammatical error diagnosis. Experimental results show that the proposed methods achieve state-of-the-art performance at both the spelling level and the grammar level, surpassing some previous work and validating the effectiveness of the models.
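To make the idea of edit-level voting concrete, the following is a minimal sketch under stated assumptions, not the implementation used in this work. It assumes that each model's output has already been converted into a list of edits, that two edits match only when their span, error type, and correction are all identical, and that an edit is kept once a fixed number of models propose it. The `Edit` record, the `vote_edits` function, and the `min_votes` threshold are hypothetical names introduced for illustration; the error-type letters follow the usual CGED labels (R, M, S, W).

```python
from collections import Counter
from typing import List, NamedTuple


class Edit(NamedTuple):
    """One proposed edit (hypothetical record, for illustration only)."""
    start: int        # index of the first affected character
    end: int          # exclusive end index of the affected span
    error_type: str   # e.g. "S" (selection), "R" (redundant), "M", "W"
    correction: str   # proposed replacement text, "" when none is given


def vote_edits(model_outputs: List[List[Edit]], min_votes: int) -> List[Edit]:
    """Keep every edit proposed by at least `min_votes` models."""
    counts: Counter = Counter()
    for edits in model_outputs:
        # A model votes at most once for each distinct edit.
        counts.update(set(edits))
    kept = [edit for edit, votes in counts.items() if votes >= min_votes]
    # Sort by position so the fused edits read left to right.
    return sorted(kept)


# Three hypothetical model outputs for the same sentence.
outputs = [
    [Edit(2, 3, "S", "在"), Edit(5, 6, "R", "")],
    [Edit(2, 3, "S", "在")],
    [Edit(2, 3, "S", "再"), Edit(5, 6, "R", "")],
]
fused = vote_edits(outputs, min_votes=2)
```

Voting on individual edits rather than on whole output sentences lets models with different roles, such as a checking model and a correction model, agree on one error while disagreeing on another. The exact matching rule and the vote threshold are design choices that would need tuning on a development set.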