With the rapid development of computers and the Internet, electronic documents are used more and more frequently in work and daily life, and traditional manual proofreading can no longer meet people's needs. Chinese text error correction is the task of detecting errors in Chinese text and correcting them. Because this technology has wide practical value, it is regarded as one of the important topics in Chinese natural language processing; it is used in keyboard input methods, document editing, search engines, and speech recognition. After a review of error correction research in China and abroad, this thesis studies two kinds of errors: word errors and semantic errors. For word errors, this paper improves the traditional sequence labeling algorithm and proposes CSC-BiLSTM-CRF, an algorithm based on sequence labeling. The algorithm divides error correction into two stages: error detection and error correction. First, each target word is checked for errors using context word vectors; then, according to the sequence labeling output, the characters in the suspicious set are replaced with candidates from the confusion set; finally, the best candidate word is selected by probability statistics. For semantic errors, this paper proposes DAE-Decoder, which divides error correction into two stages: encoding and decoding. The encoder is pre-trained on BERT and uses the masked language model (MLM) to generate a set of replacement characters as candidates for each original character of the input text; the decoder then selects the correct character from these candidates according to character similarity and contextual suitability. Based on an analysis of the strengths and weaknesses of CSC-BiLSTM-CRF and DAE-Decoder, this paper proposes a hybrid algorithm. Experimental verification and analysis show that the hybrid algorithm improves both accuracy and recall, which demonstrates
the feasibility and superiority of the hybrid algorithm. The algorithm is also more general and can be used to correct errors in different corpora. It provides a reference for research on Chinese text error correction algorithms and is also of significance for research in related NLP fields.
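The word-error pipeline described above (detect suspicious characters, substitute candidates from a confusion set, keep the most probable result) can be sketched in Python. This is a minimal illustration under stated assumptions, not the CSC-BiLSTM-CRF implementation: the per-character labels (`E` for a suspected error, `O` otherwise) stand in for the BiLSTM-CRF output, `CORPUS` and `CONFUSION_SET` are tiny hypothetical examples, and the "probability statistics" step is approximated here with an add-one smoothed character bigram model.

```python
from collections import Counter

# Hypothetical toy corpus for estimating character bigram statistics.
CORPUS = ["我们在公园里散步", "今天天气很好", "我在公园看书", "他们在学校学习"]

# Hypothetical confusion set: characters easily mistaken for one another
# (similar pronunciation or shape). Real systems use much larger tables.
CONFUSION_SET = {"圆": ["园", "元", "员"], "在": ["再"], "再": ["在"]}

BOS, EOS = "^", "$"  # sentence boundary markers


def bigram_counts(corpus):
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        padded = BOS + sent + EOS
        unigrams.update(padded)
        bigrams.update(padded[i:i + 2] for i in range(len(padded) - 1))
    return unigrams, bigrams


UNI, BI = bigram_counts(CORPUS)
VOCAB = len(UNI)


def bigram_prob(a, b):
    # Add-one smoothed estimate of P(b | a).
    return (BI[a + b] + 1) / (UNI[a] + VOCAB)


def sentence_prob(sent):
    padded = BOS + sent + EOS
    p = 1.0
    for i in range(len(padded) - 1):
        p *= bigram_prob(padded[i], padded[i + 1])
    return p


def correct(sentence, labels):
    """Replace each character labelled 'E' by its most probable confusion-set candidate."""
    chars = list(sentence)
    for i, label in enumerate(labels):
        if label != "E":
            continue
        # The original character stays in the pool, so it is kept if nothing scores higher.
        candidates = [chars[i]] + CONFUSION_SET.get(chars[i], [])
        # Greedy left to right: earlier corrections are already reflected in `chars`.
        chars[i] = max(
            candidates,
            key=lambda c: sentence_prob("".join(chars[:i] + [c] + chars[i + 1:])),
        )
    return "".join(chars)


print(correct("我在公圆散步", ["O", "O", "O", "E", "O", "O"]))  # prints 我在公园散步
```

Only the detection labels decide where substitution is attempted, which mirrors the two-stage split of detection and correction; positions labelled `O` are never changed.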
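The DAE-Decoder selection step described above, choosing among candidate characters by character similarity and contextual suitability, might be sketched as a weighted score. Everything here is a hypothetical illustration: the `Candidate` type, the linear interpolation with weight `alpha`, and the numeric scores are assumptions. In the described method the contextual scores would come from the BERT MLM, and the similarity from some measure of phonetic or visual closeness that is not specified here.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    char: str
    context_prob: float  # contextual suitability, e.g. an MLM probability (assumed given)
    similarity: float    # similarity to the original character in [0, 1] (assumed given)


def candidate_score(c: Candidate, alpha: float) -> float:
    # Hypothetical linear interpolation of the two selection criteria.
    return alpha * c.context_prob + (1 - alpha) * c.similarity


def select(original: str, candidates: list, alpha: float = 0.5) -> str:
    """Pick the candidate with the highest combined score."""
    pool = list(candidates)
    # Ensure the original character competes too (similarity 1.0 to itself).
    if all(c.char != original for c in pool):
        pool.append(Candidate(original, 0.0, 1.0))
    return max(pool, key=lambda c: candidate_score(c, alpha)).char


# Hypothetical candidates for the 4th character of "我在公圆散步".
cands = [
    Candidate("园", 0.90, 0.8),
    Candidate("圆", 0.01, 1.0),
    Candidate("院", 0.05, 0.6),
    Candidate("场", 0.03, 0.1),
]
print(select("圆", cands))  # prints 园
```

Because the original character always scores 1.0 on similarity, a replacement is made only when the context strongly favours another character; raising `alpha` shifts the balance toward contextual suitability.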