Chinese text error correction is an important technique for automatically checking and correcting sentences. It aims to improve language quality and reduce the cost of manual verification, and its application prospects are very broad. For example, text entered into a search engine often contains wrong words, missing words, and redundant words. By analyzing the input, the search engine can automatically correct these errors and return the corrected text to the user, which makes search results more satisfying. In Chinese teaching systems, text error correction provides the core automatic correction function. In sign language recognition systems, it can be used to normalize the text output of continuous sign language recognition. In speech recognition, it is often embedded as an auxiliary component to improve recognition accuracy and the user experience. Automatic text error correction is also widely used in intelligent question answering, intelligent manuscript review, and text editing systems. Text error correction technology is therefore ubiquitous and indispensable across many fields and applications.

Most existing Chinese error correction methods are based on machine translation, but they still suffer from problems such as low accuracy and difficulty in correcting common-sense entity errors. Focusing on the Chinese text error correction task, this paper studies the different types of text errors in depth. The main contributions and innovations are: (1) For general grammatical errors, such as wrong characters and word-order errors, a retention mechanism algorithm is proposed on top of a Transformer model that treats correction as machine translation, and bidirectional decoding is enabled, which improves the model's decoding accuracy. (2) Because the Transformer model cannot handle common-sense entity errors well, a Chinese text knowledge extraction model is designed and a weighted cosine similarity matching algorithm is proposed. By building a knowledge base that carries context information, knowledge matching becomes lightweight and highly precise, and common-sense entity errors in the text can then be corrected.

The specific work is as follows:

1. Research on a Chinese grammatical error correction method based on the Transformer. This paper builds a Transformer translation model and enables the Transformer's bidirectional decoding capability, so that the model can take advantage of future information when decoding. In addition, a retention mechanism algorithm is proposed at the decoder to change the original model's sequential decoding of input characters: during decoding, the model can copy content from the input text to the output without modification, so that error-free text remains correct and decoding accuracy improves.

2. Development of a knowledge base that integrates contextual semantic information. The GloVe model and the ComplEx model are used to train word embeddings and knowledge graph embeddings, respectively. Keywords are extracted from the text in which each triple occurs, and their vectors are combined by weighted average into a text vector. The triple's representation vector and the text vector are then concatenated to form the knowledge base, which provides the dataset for the subsequent research on common-sense error correction of Chinese text.

3. Research on a Chinese common-sense error correction method based on a knowledge graph. First, a CNN-Attention deep learning model is built to extract triples from Chinese text. Second, the proposed cosine similarity algorithm is used to match the triples extracted from the input text against the knowledge base, which has been constructed with integrated context information. Finally, the triple with the highest matching score is substituted into the corresponding position of the input text, and the corrected text is output.
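
To make the knowledge base construction and matching steps described above more concrete, the following is a minimal Python sketch. It is an illustration under stated assumptions, not the implementation used in this paper: the abstract does not define the exact form of the weighted cosine similarity, so a per-dimension weighting is assumed here, and the function names (`weighted_average`, `make_entry`, `weighted_cosine`, `best_match`), the toy vectors, and the weight values are all hypothetical. In practice the triple vectors would come from a trained knowledge graph embedding model and the keyword vectors from trained word embeddings.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def weighted_average(vectors: Sequence[Vector], weights: Sequence[float]) -> Vector:
    """Combine keyword vectors into one text vector by weighted average."""
    total = sum(weights)
    dim = len(vectors[0])
    return [
        sum(w * v[i] for v, w in zip(vectors, weights)) / total
        for i in range(dim)
    ]


def make_entry(triple_vec: Vector,
               keyword_vecs: Sequence[Vector],
               keyword_weights: Sequence[float]) -> Vector:
    """Concatenate a triple vector with the text vector of its context."""
    return list(triple_vec) + weighted_average(keyword_vecs, keyword_weights)


def weighted_cosine(a: Vector, b: Vector, dim_weights: Sequence[float]) -> float:
    """Cosine similarity with one non-negative weight per dimension.

    ASSUMPTION: this per-dimension form is one common definition of a
    weighted cosine; the paper's actual weighting scheme may differ.
    """
    dot = sum(w * x * y for w, x, y in zip(dim_weights, a, b))
    norm_a = math.sqrt(sum(w * x * x for w, x in zip(dim_weights, a)))
    norm_b = math.sqrt(sum(w * y * y for w, y in zip(dim_weights, b)))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(query: Vector,
               knowledge_base: Sequence[Vector],
               dim_weights: Sequence[float]) -> Tuple[int, float]:
    """Return (index, score) of the knowledge base entry closest to query."""
    scores = [weighted_cosine(query, entry, dim_weights) for entry in knowledge_base]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Toy data: 2-d triple vectors and 2-d keyword vectors (made-up numbers).
kb = [
    make_entry([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [3.0, 1.0]),
    make_entry([0.0, 1.0], [[0.0, 1.0], [1.0, 1.0]], [1.0, 1.0]),
]
# A triple extracted from the input text, with its own context vector.
query = make_entry([0.9, 0.1], [[1.0, 0.0]], [1.0])
# Hypothetical weights: triple dimensions count more than context dimensions.
dim_weights = [0.7, 0.7, 0.3, 0.3]

index, score = best_match(query, kb, dim_weights)
print(index, round(score, 3))
```

In this sketch, the entry returned by `best_match` plays the role of the best-matching triple that is substituted back into the input text. Weighting the triple dimensions more heavily than the context dimensions is only one possible design choice; the relative weights would need to be tuned on real data.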