As China's international standing rises and more people around the world learn and use Chinese, Chinese text increasingly contains errors. Chinese grammatical error correction technology automatically corrects text to improve linguistic accuracy, reduces the cost of manually reviewing large volumes of text, and has auxiliary value in many application fields. Targeting the common error types in Chinese (spelling errors, grammatical errors, and semantic errors), this thesis designs a segmented Chinese grammar correction model based on natural language processing technology and implements a corresponding prototype system. The main research contents of this thesis are as follows:

(1) This thesis implements a Chinese spelling correction model that combines BiGRU and BERT. The model consists of two parts, error detection and error correction, and a residual connection joins the two sub-networks to improve correction quality. The BiGRU predicts the error probability of each character in the text, these predictions are fed into the correction network, and the correction network produces the final spelling correction. Compared with existing models, the implemented spelling correction model shows some improvement in performance.

(2) This thesis uses Seq2Seq and Seq2Edit models to correct grammatical errors in Chinese text. The first model uses the full Transformer structure to translate text containing grammatical errors into correct text. The second model uses only the Transformer encoder and predicts a series of editing operations that correct the erroneous text sequence. To combine the advantages of both model types, the two are integrated through edit-level voting to complete the grammar correction of Chinese text, which improves performance compared with existing methods.

(3) This thesis collects a large number of Chinese texts containing semantic errors, builds rule-based error templates from them, and summarizes the corresponding correction method for each semantic error template. A large corpus of film and television subtitles is also collected and collated, and its difference from a written-language corpus is computed. Using this difference formula together with the Modern Chinese Dictionary (7th Edition), a small colloquial-language corpus is built to detect possible colloquial content in Chinese text.

Finally, this thesis implements a prototype system based on the Chinese text grammar error correction model. The system corrects text that the user uploads or submits and returns the correction results to the user. Testing verifies that the system is effective and has practical value.
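
The edit-level voting mentioned in (2) can be illustrated with a minimal sketch. It is not the thesis's implementation. It assumes that each model's output is reduced to character-level edits against the source sentence (here via Python's `difflib`), that an edit is kept when at least `min_votes` outputs propose it, and that more-voted edits win when spans conflict. The function names, the threshold, and the example sentences are hypothetical and chosen only for illustration.

```python
from collections import Counter
from difflib import SequenceMatcher


def extract_edits(source, hypothesis):
    """Return the set of (start, end, replacement) edits that turn source into hypothesis."""
    matcher = SequenceMatcher(a=source, b=hypothesis, autojunk=False)
    return {
        (i1, i2, hypothesis[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    }


def conflicts(a, b):
    """Two edits conflict if they target the same span or their spans overlap."""
    return (a[0], a[1]) == (b[0], b[1]) or (a[0] < b[1] and b[0] < a[1])


def vote_on_edits(source, hypotheses, min_votes=2):
    """Apply every edit proposed by at least min_votes hypotheses (assumed voting rule)."""
    votes = Counter()
    for hyp in hypotheses:
        votes.update(extract_edits(source, hyp))

    # Consider the most-voted edits first so they win when spans conflict.
    candidates = sorted(
        (e for e, n in votes.items() if n >= min_votes),
        key=lambda e: (-votes[e], e[0], e[1], e[2]),
    )
    chosen = []
    for edit in candidates:
        if not any(conflicts(edit, kept) for kept in chosen):
            chosen.append(edit)

    # Apply from right to left so earlier offsets stay valid.
    result = source
    for start, end, repl in sorted(chosen, key=lambda e: (e[0], e[1]), reverse=True):
        result = result[:start] + repl + result[end:]
    return result


# Made-up outputs standing in for the Seq2Seq and Seq2Edit models.
source = "他昨天去学校上客"
outputs = ["他昨天去学校上课", "他昨天去了学校上课", "他昨天去学校上客"]
merged = vote_on_edits(source, outputs, min_votes=2)
```

In this toy case two of the three outputs replace the final character, so that edit is applied, while the insertion proposed by only one output is discarded. Raising `min_votes` favors precision and lowering it favors recall.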