Automatic Grammatical Error Detection Technology And Application For Chinese Text

Posted on: 2022-02-01 | Degree: Master | Type: Thesis
Country: China | Candidate: H Wang
GTID: 2518306764477374 | Subject: Journalism and Media
Abstract/Summary:
As China's economy grows rapidly and steadily, more and more people with different mother tongues and educational backgrounds are learning Chinese as a second language. Because Chinese is complex, many beginners make grammatical errors while learning, so research on grammatical error detection is of great significance. Research on automatic grammatical error detection for Chinese text is still in its infancy, and the detection performance of existing methods is poor. Based on deep learning, this thesis studies the techniques and applications of automatic grammatical error detection for Chinese text and develops a prototype system that can judge whether a sentence contains errors, identify the error types, and locate the errors. The main work of this thesis is as follows.

First, this thesis proposes BTextCNN-ED, a classification-based grammatical error detection model that treats judging whether a sentence contains errors as a binary text classification task. It uses BERT's Transformer encoder to extract shallow information, syntactic features and deep semantic information from the text, and feeds the resulting word-vector matrix into a TextCNN network for convolution, pooling and prediction. Compared with existing officially published models, BTextCNN-ED raises the Detection-level F1 score on the test set by 0.0169.

Second, this thesis treats text error detection as a sequence labeling problem, designs three sequence labeling error detection networks, and proposes three strategies to improve model performance. The first network, RoC-ED, feeds the text into a RoBERTa layer to obtain hidden state vectors, then passes these vectors through a projection layer and a conditional random field (CRF) layer to predict the label sequence corresponding to the input sequence. The second network, RoIdC-ED, improves on RoC-ED by adding an IDCNN (iterated dilated CNN) layer after the RoBERTa layer to extract features, which speeds up both training and prediction. The third network, RoBmIdC-ED, supplements RoIdC-ED with global text features extracted by a BiLSTM layer and concatenates them with the features extracted by the IDCNN layer, so that both local and global features are taken into account. The first strategy, data augmentation (DA), alleviates the problems of insufficient data and uneven data distribution. The second strategy, dynamic RoBERTa fusion (DF), assigns different weights to the hidden state vectors produced by the different encoder layers of RoBERTa so that the model learns richer information. The third strategy, model ensembling, combines multiple error detection models through a voting mechanism to improve performance. Experiments show that the RoC-ED-DA-DF, RoIdC-ED-DA-DF and RoBmIdC-ED-DA-DF models, which use data augmentation and dynamic RoBERTa fusion, reach Identification-level F1 scores of 0.6186, 0.6095 and 0.6893 on the test set, respectively; RoBmIdC-ED-DA-DF is 0.0157 higher than existing officially published models. The ensemble model using all three strategies reaches a Position-level F1 score of 0.4224 on the test set, 0.0183 higher than existing officially published models.

Finally, a Web-based prototype system for automatic grammatical error detection in Chinese text is developed. The system is built on the grammatical error detection algorithms proposed in this thesis and constructed according to software engineering specifications, and it has some practical value.
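The classification route (BTextCNN-ED) ends in a TextCNN head over the word-vector matrix produced by BERT. The following is a minimal pure-Python sketch of such a head, assuming the per-token vectors are already available; the filter widths, weights and the helper names `conv_max_pool` and `textcnn_score` are hypothetical, and a trained model would learn these parameters (usually with many filters per width) rather than fix them by hand.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]     # one row per token, one column per hidden dimension
Filter = Tuple[Matrix, float]  # (kernel: one weight row per covered token, bias)


def conv_max_pool(tokens: Matrix, kernel: Matrix, bias: float) -> float:
    """Slide one filter over the token rows, apply ReLU, and keep the maximum
    response over all positions (max-over-time pooling)."""
    width = len(kernel)
    responses = []
    for start in range(len(tokens) - width + 1):
        s = bias
        for offset in range(width):
            s += sum(x * w for x, w in zip(tokens[start + offset], kernel[offset]))
        responses.append(max(0.0, s))  # ReLU
    return max(responses) if responses else 0.0


def textcnn_score(tokens: Matrix, filters: List[Filter],
                  out_weights: List[float], out_bias: float) -> float:
    """Probability that the sentence contains an error (binary classification)."""
    pooled = [conv_max_pool(tokens, kernel, bias) for kernel, bias in filters]
    logit = out_bias + sum(f * w for f, w in zip(pooled, out_weights))
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid output


# Hypothetical 3-token input with hidden size 2, standing in for BERT's output.
tokens = [[0.1, 0.2], [0.4, -0.3], [0.0, 0.5]]
filters = [([[1.0, 0.0]], 0.0),                # width-1 filter
           ([[0.5, 0.5], [0.5, 0.5]], 0.0)]    # width-2 filter
p = textcnn_score(tokens, filters, out_weights=[1.0, -1.0], out_bias=0.0)
print(round(p, 4))  # 0.525
```

A sigmoid over one logit is used here for brevity; a two-class softmax is an equivalent choice for the binary decision.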
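Treating detection as sequence labeling means turning each annotated error span into per-character tags and turning predicted tags back into spans. A minimal sketch, assuming CGED-style annotations given as 1-based, inclusive (start, end, type) triples with error types such as R (redundant), M (missing), S (selection) and W (word order); the abstract does not state the tag scheme, so the BIO encoding and the helper names here are assumptions.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end, error_type), 1-based and inclusive


def spans_to_bio(length: int, spans: List[Span]) -> List[str]:
    """Tag each character as O, B-<type> or I-<type>."""
    tags = ["O"] * length
    for start, end, etype in spans:
        tags[start - 1] = f"B-{etype}"
        for i in range(start, end):  # 0-based indices of 1-based positions start+1..end
            tags[i] = f"I-{etype}"
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Recover (start, end, type) triples from a BIO tag sequence."""
    spans: List[Span] = []
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and tag[2:] == etype
        if start is not None and not continues:
            spans.append((start, i, etype))  # i equals the 1-based end of the open span
            start, etype = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, etype = i + 1, tag[2:]  # an orphan I- tag also opens a span
    return spans


tags = spans_to_bio(5, [(1, 2, "S"), (3, 3, "M")])
print(tags)                # ['B-S', 'I-S', 'B-M', 'O', 'O']
print(bio_to_spans(tags))  # [(1, 2, 'S'), (3, 3, 'M')]
```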
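In RoC-ED the CRF layer picks the highest-scoring label sequence given per-character emission scores from the projection layer and a label-transition score matrix. At inference time this is Viterbi decoding. The sketch below shows it in pure Python with made-up scores; the labels, numbers and function name are illustrative only, and training the CRF (its forward-algorithm log-likelihood) is not shown.

```python
from typing import List, Tuple


def viterbi(emissions: List[List[float]], transitions: List[List[float]],
            start: List[float]) -> Tuple[float, List[int]]:
    """Return (best total score, best label indices).

    emissions[t][j]   score of label j at position t
    transitions[i][j] score of moving from label i to label j
    start[j]          score of starting the sequence with label j
    """
    n_labels = len(start)
    score = [start[j] + emissions[0][j] for j in range(n_labels)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_labels):
            # best previous label i for ending at label j at position t
            best_i = max(range(n_labels), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    last = max(range(n_labels), key=lambda j: score[j])
    path = [last]
    for pointers in reversed(backpointers):  # follow back-pointers to the start
        path.append(pointers[path[-1]])
    path.reverse()
    return score[last], path


labels = ["O", "B-S", "I-S"]
start = [0.0, 0.0, -10.0]            # a sequence should not start inside a span
transitions = [[0.0, 0.0, -10.0],    # O -> I-S is heavily penalised
               [0.0, 0.0, 1.0],
               [0.0, 0.0, 1.0]]
emissions = [[1.0, 0.8, 0.0], [0.0, 0.0, 1.5], [1.5, 0.0, 0.0]]
best, path = viterbi(emissions, transitions, start)
print([labels[j] for j in path])  # ['B-S', 'I-S', 'O']
```

Choosing each position's best label independently would give O, I-S, O, an invalid sequence; the transition scores let the decoder prefer the consistent B-S, I-S, O.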
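RoIdC-ED places an iterated dilated CNN (IDCNN) after RoBERTa. Dilated convolutions widen the context each character sees as layers are stacked while keeping a fixed kernel size, and every position is computed in parallel. A minimal single-channel sketch, assuming kernel width 3, zero padding and dilations (1, 1, 2), a setting often seen in IDCNN implementations but not stated in the abstract; for simplicity one hypothetical kernel is reused in every layer, whereas a real block learns separate weights per layer.

```python
from typing import List, Sequence


def dilated_conv1d(xs: List[float], kernel: List[float], dilation: int) -> List[float]:
    """Same-length 1-D convolution whose taps are `dilation` positions apart."""
    half = len(kernel) // 2  # assumes an odd kernel width
    out = []
    for t in range(len(xs)):
        s = 0.0
        for k, w in enumerate(kernel):
            pos = t + (k - half) * dilation
            if 0 <= pos < len(xs):  # zero padding outside the sequence
                s += w * xs[pos]
        out.append(s)
    return out


def idcnn_block(xs: List[float], kernel: List[float],
                dilations: Sequence[int] = (1, 1, 2)) -> List[float]:
    """Apply the kernel with each dilation in turn, with ReLU after every layer."""
    for d in dilations:
        xs = [max(0.0, v) for v in dilated_conv1d(xs, kernel, d)]
    return xs


def receptive_field(kernel_width: int, dilations: Sequence[int]) -> int:
    """Number of input positions that can influence one output position."""
    return 1 + sum((kernel_width - 1) * d for d in dilations)


# Impulse at the centre of a hypothetical 9-character sequence.
impulse = [0.0] * 4 + [1.0] + [0.0] * 4
out = idcnn_block(impulse, kernel=[1.0, 1.0, 1.0])
print(sum(v > 0 for v in out), receptive_field(3, (1, 1, 2)))  # 9 9
```

Iterating the block, for example four times with shared weights, gives `receptive_field(3, (1, 1, 2) * 4) == 33` positions.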
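The abstract does not say how the data are augmented. One common approach, assumed here purely for illustration, is to inject synthetic errors into correct sentences and record the matching annotations. The sketch creates a redundant-character (R) error by duplicating a character and a word-order (W) error by swapping two neighbouring characters; the function names are hypothetical, and real word-order errors usually involve words rather than single characters.

```python
import random
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end, error_type), 1-based and inclusive


def inject_redundant_char(sentence: str, rng: random.Random) -> Tuple[str, List[Span]]:
    """Duplicate one character of a non-empty sentence to create an R error."""
    i = rng.randrange(len(sentence))
    noisy = sentence[: i + 1] + sentence[i] + sentence[i + 1:]
    return noisy, [(i + 2, i + 2, "R")]  # 1-based position of the inserted copy


def swap_adjacent(sentence: str, rng: random.Random) -> Tuple[str, List[Span]]:
    """Swap two different neighbouring characters to create a W error."""
    candidates = [i for i in range(len(sentence) - 1) if sentence[i] != sentence[i + 1]]
    if not candidates:
        return sentence, []  # nothing to swap; keep the sentence unchanged
    i = rng.choice(candidates)
    noisy = sentence[:i] + sentence[i + 1] + sentence[i] + sentence[i + 2:]
    return noisy, [(i + 1, i + 2, "W")]


rng = random.Random(0)  # fixed seed so the augmentation is reproducible
base = "我们学习中文"      # hypothetical correct sentence
print(inject_redundant_char(base, rng))
print(swap_adjacent(base, rng))
```

Sampling error types in chosen proportions is one way such a generator could also rebalance rare error types.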
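Dynamic RoBERTa fusion weights the hidden states of all encoder layers instead of using only the last one. One common formulation, assumed here because the abstract gives no formula, is a softmax over one learnable scalar per layer followed by a weighted sum, optionally scaled by a global factor `gamma`. The function names are hypothetical, and in a real model the layer scores would be trained jointly with the network.

```python
import math
from typing import List

Vector = List[float]


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_layers(layer_states: List[List[Vector]], layer_logits: List[float],
                gamma: float = 1.0) -> List[Vector]:
    """Combine per-layer hidden states into one vector per token.

    layer_states[l][t] is the hidden vector of token t from encoder layer l;
    layer_logits[l] is the (learnable) score of layer l.
    """
    weights = softmax(layer_logits)
    n_tokens, dim = len(layer_states[0]), len(layer_states[0][0])
    fused = [[0.0] * dim for _ in range(n_tokens)]
    for w, states in zip(weights, layer_states):
        for t in range(n_tokens):
            for d in range(dim):
                fused[t][d] += gamma * w * states[t][d]
    return fused


# Two hypothetical layers, one token, hidden size 2.
states = [[[1.0, 0.0]], [[0.0, 1.0]]]
fused = fuse_layers(states, layer_logits=[math.log(3.0), 0.0])  # weights 0.75 and 0.25
```

With equal layer scores the result is simply the mean of the layers; training moves the scores toward the layers that help detection most.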
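The ensemble strategy combines several detectors by voting. A minimal sketch of per-character hard majority voting over predicted tag sequences, assuming all models tag the same characters. Falling back to "O" when no tag wins a strict majority is an assumption chosen here to favour precision; the abstract does not specify the tie rule, and `vote` is a hypothetical name.

```python
from collections import Counter
from typing import List


def vote(predictions: List[List[str]], default: str = "O") -> List[str]:
    """Per-position majority vote across the models' tag sequences."""
    n_models = len(predictions)
    merged = []
    for tags_at_t in zip(*predictions):
        tag, count = Counter(tags_at_t).most_common(1)[0]
        merged.append(tag if count * 2 > n_models else default)
    return merged


runs = [["O", "B-S", "I-S"],
        ["O", "B-S", "O"],
        ["B-R", "B-S", "I-S"]]
print(vote(runs))  # ['O', 'B-S', 'I-S']
```

Character-level voting can leave an I- tag without a preceding B- tag; voting on whole predicted spans is an alternative that avoids this.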
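The results are reported at the Detection, Identification and Position levels. Assuming the CGED shared-task definitions (Detection: whether a sentence is erroneous; Identification: which error types a sentence contains; Position: exact (start, end, type) triples), each level reduces to precision, recall and F1 over a set of predicted items compared with gold items. The sketch shows that arithmetic on hypothetical annotations; the item encodings and function name are illustrative, and the official scorer may differ in details such as how correct sentences are counted.

```python
from typing import Hashable, Set, Tuple


def prf1(predicted: Set[Hashable], gold: Set[Hashable]) -> Tuple[float, float, float]:
    """Precision, recall and F1 of predicted items against gold items."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical (sentence_id, start, end, type) annotations.
gold = {("s1", 3, 4, "S"), ("s1", 7, 7, "M"), ("s2", 1, 2, "W")}
pred = {("s1", 3, 4, "S"), ("s2", 1, 3, "W")}

position = prf1(pred, gold)
identification = prf1({(s, t) for s, _, _, t in pred}, {(s, t) for s, _, _, t in gold})
detection = prf1({s for s, *_ in pred}, {s for s, *_ in gold})
print(round(position[2], 4), round(identification[2], 4), round(detection[2], 4))  # 0.4 0.8 1.0
```

The same prediction scores higher at coarser levels: the wrong end offset on `s2` costs a Position-level hit but not an Identification-level one, which is why Position-level F1 is typically the lowest of the three.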
Keywords/Search Tags:Text grammatical error detection, Text classification, Sequence annotation, Data enhancement, Feature fusion, Text grammatical error detection prototype