In recent years, grammatical error correction (GEC) has gradually become an important research topic in natural language processing. It aims to convert an ungrammatical sentence into its correct version, and it is widely used in document checking, education, data preprocessing, speech recognition post-processing, and so on. With the growth of computing power, deep learning has come to play an important role in GEC and has greatly improved correction accuracy. At present, mainstream deep models for GEC fall into two types: those based on the sequence-to-sequence architecture and those based on sequence labeling. Although these methods already achieve excellent performance, some problems remain: current sequence-to-sequence models do not directly incorporate syntactic information, and traditional model ensemble and decoding strategies bring only limited improvement. Given these problems, we carry out the following research:

(1) We propose a sequence-to-sequence GEC model based on dependent self-attention. The Transformer, the mainstream sequence-to-sequence GEC model, encodes only an additional position embedding for each word in a sentence and no additional syntactic information; the syntax must instead be learned from the training data. In machine translation, many studies have shown that adding syntactic information to the Transformer improves translation quality. Since GEC is a highly syntax-related task, it is worth adding syntactic information to the correction model as well. However, unlike machine translation, the input sentence in GEC is ungrammatical, so an ordinary syntactic parser produces wrong syntactic information. We therefore propose a dependency parser for ungrammatical sentences to extract valid dependency information, which is then integrated into the Transformer through dependent self-attention. Experiments on the BEA-2019, CoNLL-2014, and JFLEG test sets verify the effectiveness of the proposed method.

(2) We propose a GEC ensemble model built from a sequence labeling model and a sequence-to-sequence model. We use an edit combination strategy to ensemble the two sub-models according to their different performance on different error types: we first use validation sets to measure how the sub-models differ across error types, and during correction each sub-model only corrects the error types it is good at, which improves overall accuracy. In the decoding stage, iterative inference corrects a sentence over multiple rounds, and an R2L rerank strategy accounts for the bidirectional fluency of the output sentences. Experiments on the BEA-2019 and CoNLL-2014 test sets verify the validity of the ensemble and decoding strategies. The final ensemble model achieves a 76.8% F0.5 score on the BEA-2019 test set.
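As a minimal sketch of how dependency information can be injected into Transformer attention, as described in (1): we assume the parser output is given as a 0/1 head-dependent adjacency matrix, and a learned per-head scalar bias is added to the attention logits on dependency arcs. The class and argument names (DependentSelfAttention, dep_adj) are illustrative assumptions, not the thesis's actual implementation.

```python
# A minimal sketch of dependency-biased self-attention, assuming the parser
# output is a 0/1 head-dependent adjacency matrix per sentence.
import math
import torch
import torch.nn as nn

class DependentSelfAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # One learned scalar bias per head, added to the attention logits
        # wherever a dependency arc connects two tokens.
        self.dep_bias = nn.Parameter(torch.zeros(n_heads))

    def forward(self, x: torch.Tensor, dep_adj: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model); dep_adj: (batch, seq, seq), 1.0 on arcs.
        b, t, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape each to (batch, heads, seq, d_head).
        q, k, v = (m.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
                   for m in (q, k, v))
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
        # Bias attention toward syntactically related token pairs.
        scores = scores + self.dep_bias.view(1, -1, 1, 1) * dep_adj.unsqueeze(1)
        attn = scores.softmax(dim=-1)
        y = (attn @ v).transpose(1, 2).reshape(b, t, -1)
        return self.out(y)
```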
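The edit combination strategy in (2) can be sketched as follows, under the assumption that each sub-model emits ERRANT-style edits and that per-type validation performance was measured beforehand. Edit, STRONG_TYPES, and the listed error-type sets are hypothetical placeholders, not results from the thesis.

```python
# A minimal sketch of edit combination: keep each edit only if it comes
# from the sub-model that is stronger on that error type (as measured on
# a validation set), and drop overlapping duplicates.
from typing import NamedTuple

class Edit(NamedTuple):
    start: int          # token span start in the source sentence
    end: int            # token span end (exclusive)
    replacement: str    # corrected text for the span
    error_type: str     # ERRANT-style label, e.g. "R:VERB:SVA", "M:DET"

# Error types each sub-model handled best on validation (assumed values).
STRONG_TYPES = {
    "seq_label": {"R:VERB:SVA", "R:ORTH", "M:DET"},
    "seq2seq": {"R:WO", "M:PREP", "U:NOUN"},
}

def combine_edits(label_edits: list[Edit], s2s_edits: list[Edit]) -> list[Edit]:
    """Route each edit to the sub-model strong on its error type."""
    kept = [e for e in label_edits if e.error_type in STRONG_TYPES["seq_label"]]
    taken = [(e.start, e.end) for e in kept]
    for e in s2s_edits:
        overlaps = any(e.start < b and a < e.end for a, b in taken)
        if e.error_type in STRONG_TYPES["seq2seq"] and not overlaps:
            kept.append(e)
    return sorted(kept)
```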
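The decoding stage of (2) can likewise be sketched: iterative inference reapplies the corrector until the sentence reaches a fixed point, and R2L rerank rescores n-best candidates with a right-to-left model so that bidirectional fluency is considered. correct_fn, r2l_score, and the interpolation weight alpha are assumed interfaces for illustration only.

```python
# A minimal sketch of iterative inference and R2L reranking.
from typing import Callable

def iterative_correct(sentence: str,
                      correct_fn: Callable[[str], str],
                      max_rounds: int = 5) -> str:
    """Run the correction model repeatedly; stop when no edits remain."""
    for _ in range(max_rounds):
        corrected = correct_fn(sentence)
        if corrected == sentence:   # fixed point: no further corrections
            break
        sentence = corrected
    return sentence

def r2l_rerank(candidates: list[tuple[str, float]],
               r2l_score: Callable[[str], float],
               alpha: float = 0.5) -> str:
    """Pick the candidate with the best interpolated L2R/R2L score.
    candidates: (hypothesis, left-to-right model log-probability)."""
    return max(candidates,
               key=lambda c: (1 - alpha) * c[1] + alpha * r2l_score(c[0]))[0]
```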