
Chinese Grammar Error Analysis Based On Deep Learning And Its System Implementation

Posted on: 2021-05-14
Degree: Master
Type: Thesis
Country: China
Candidate: Z W Li
GTID: 2428330620964042
Subject: Software engineering
Abstract/Summary:
Given foreigners' growing enthusiasm for learning Chinese, a Chinese grammar error analysis tool would greatly help both the teaching and the learning of the language. Mature grammar error analysis tools already exist for English, but comparable tools for Chinese are still lacking. This thesis therefore develops a Chinese grammar error analysis system. The system determines whether a Chinese text segment contains a grammar error (Detection), identifies the types of grammar errors it contains (Identification), and indicates where those errors occur (Position). Building on existing research, the thesis proposes a new approach to modeling Chinese grammar error analysis and develops a simple system that implements the model.

First, the task is formulated as a sequence labeling problem, with LSTM-CRF as the baseline model. On this basis, three improved models are proposed, targeting the model input and the network structure. The first is a Chinese grammar error analysis model based on fused word embeddings: it keeps the LSTM-CRF architecture but represents the model input with fused word embeddings. The second is a model based on Multi-Task Learning: building on the first improved model, it splits the task into a text classification task and a sequence labeling task and uses a multi-task learning mechanism to learn the two jointly. The third is a model based on BERT: building on the second improved model, it replaces the fused word embedding model in the feature representation layer with BERT.

Second, the four models designed in this thesis are compared on three test sets: CGED2016_HSK, CGED2017, and CGED2018. The experiments show that all three improved models raise performance to some extent, and that the BERT-based model performs best, with best F1 values of 0.758, 0.517, and 0.330 at the Detection, Identification, and Position levels, respectively. Compared with the best F1 values of the CGED2017 and CGED2018 champion teams, the best result of this thesis is 0.012 and 0.008 higher, respectively, at the Detection level, and 0.061 higher than the CGED2017 champion team at the Position level. The remaining results are close to the best results of the two champion teams. Since the proposed approach needs less manual intervention and a simpler training procedure than those of the two champion teams while still achieving better results on some metrics, the results indicate that the proposed model is effective.

Finally, a Chinese grammar error analysis system is built on top of the model. Testing of the system's grammar error analysis function shows that the system has good practical application value.
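Treating the task as sequence labeling means turning each annotated error into per-character tags, and turning predicted tags back into error positions. The sketch below assumes CGED-style annotations given as (start, end, type) with 1-based inclusive character offsets, the four CGED error types R (redundant), M (missing), S (selection), and W (word order), and a BIO tag scheme. The abstract does not state the thesis's exact tag scheme, so the BIO choice and the function names `spans_to_bio` and `bio_to_spans` are hypothetical.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int, str]  # (start, end, error_type), 1-based inclusive (assumed)
ERROR_TYPES = {"R", "M", "S", "W"}  # redundant, missing, selection, word order


def spans_to_bio(sentence: str, errors: List[Span]) -> List[str]:
    """Convert error spans into one BIO tag per character.

    Overlapping spans are resolved by letting later spans overwrite
    earlier ones, which is a simplification.
    """
    tags = ["O"] * len(sentence)
    for start, end, etype in errors:
        if etype not in ERROR_TYPES:
            raise ValueError(f"unknown error type: {etype}")
        if not 1 <= start <= end <= len(sentence):
            raise ValueError(f"span out of range: {(start, end)}")
        tags[start - 1] = f"B-{etype}"
        for i in range(start, end):  # 0-based indices of the remaining characters
            tags[i] = f"I-{etype}"
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Inverse of spans_to_bio: recover 1-based (start, end, type) spans."""
    spans: List[Span] = []
    begin: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(tags, start=1):
        # A span opens at B-x, or at an I-x that does not continue the open span.
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != etype):
            if begin is not None:
                spans.append((begin, i - 1, etype))
            begin, etype = i, tag[2:]
        elif tag == "O" and begin is not None:
            spans.append((begin, i - 1, etype))
            begin, etype = None, None
    if begin is not None:
        spans.append((begin, len(tags), etype))
    return spans
```

For example, in the eight-character sentence 我很喜欢吃吃苹果 with a redundant 吃 at position 6, `spans_to_bio(s, [(6, 6, "R")])` tags index 5 as `B-R` and every other character as `O`.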
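The abstract does not say how the fused word embeddings are built. One common option, shown here purely as an assumption, is to segment the sentence into words and give every character the concatenation of its own character vector and the vector of the word containing it, so the LSTM-CRF still labels one position per character while also seeing word-level information. The function name `fuse_char_word`, the zero-vector fallback for unknown entries, and the use of plain lists instead of a tensor library are illustrative choices.

```python
from typing import Dict, List

Vector = List[float]


def fuse_char_word(words: List[str],
                   char_vecs: Dict[str, Vector],
                   word_vecs: Dict[str, Vector],
                   char_dim: int,
                   word_dim: int) -> List[Vector]:
    """Return one fused vector per character: [char embedding ; word embedding].

    `words` is a pre-segmented sentence. Every character inherits the vector
    of the word that contains it; unknown characters or words map to zeros.
    """
    fused: List[Vector] = []
    for word in words:
        w = word_vecs.get(word, [0.0] * word_dim)
        for ch in word:
            c = char_vecs.get(ch, [0.0] * char_dim)
            fused.append(c + w)  # list concatenation builds a new vector
    return fused
```

The output has exactly one vector per character, of length `char_dim + word_dim`, so it can replace a plain character embedding layer without changing the tag sequence.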
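In an LSTM-CRF tagger, the LSTM produces a score for every tag at every character (emission scores), and the CRF layer adds learned tag-to-tag transition scores; prediction selects the tag sequence with the highest total score using Viterbi decoding. The following is a minimal pure-Python sketch of that decoding step, not the thesis's implementation; the argument layout (`emissions`, `transitions`, `start`) is an assumption.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]],
                   start: Sequence[float]) -> List[int]:
    """Return the highest-scoring tag index sequence of a linear-chain CRF.

    emissions[t][j]:   score of tag j at position t (e.g. from the LSTM)
    transitions[i][j]: score of moving from tag i to tag j
    start[j]:          score of starting the sequence with tag j
    """
    n_tags = len(start)
    # score[j] = best total score of any path ending in tag j at the current position
    score = [start[j] + emissions[0][j] for j in range(n_tags)]
    backptrs: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, ptr = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            ptr.append(best_i)
        score = new_score
        backptrs.append(ptr)
    # Follow back-pointers from the best final tag.
    path = [max(range(n_tags), key=lambda j: score[j])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path
```

Assigning a large negative transition score from `O` to any `I-*` tag (and a large negative start score to `I-*`) is one common way to rule out ill-formed BIO sequences during decoding.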
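The multi-task model trains a text classification task and the sequence labeling task together. The abstract does not give the thesis's loss, so the following is an assumption: a common formulation is a weighted sum of the classification cross-entropy and the CRF negative log-likelihood, with a hypothetical weight $\lambda \in [0, 1]$ and a sentence-level label $c$ (for example, whether the segment contains an error):

```latex
\begin{aligned}
\mathcal{L} &= \lambda\,\mathcal{L}_{\mathrm{cls}} + (1-\lambda)\,\mathcal{L}_{\mathrm{seq}}, \\
\mathcal{L}_{\mathrm{cls}} &= -\log p(c \mid x), \\
\mathcal{L}_{\mathrm{seq}} &= -\log p(y \mid x) = -s(x, y) + \log \sum_{y'} \exp s(x, y'), \\
s(x, y) &= \sum_{t=1}^{n} \bigl( A_{y_{t-1}, y_t} + P_{t, y_t} \bigr),
\end{aligned}
```

where $x$ is an input segment of $n$ characters, $y$ its tag sequence, $P_{t,j}$ the emission score of tag $j$ at position $t$ from the shared encoder, $A_{i,j}$ the transition score from tag $i$ to tag $j$, and $y_0$ a special start tag. The sum over $y'$ ranges over all tag sequences and is computed with the forward algorithm rather than by enumeration.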
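The three evaluation levels differ in what counts as a match: a sentence flagged as containing an error (Detection), a (sentence, error type) pair (Identification), or a (sentence, start, end, type) tuple (Position). The sketch below computes F1 = 2PR / (P + R) on those units. It is a simplified reading of CGED-style scoring, not the official CGED scorer, and the dictionary input format and function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end, error_type), 1-based inclusive (assumed)


def f1(tp: int, n_pred: int, n_gold: int) -> float:
    """F1 = 2PR / (P + R), defined as 0 when P + R is 0."""
    p = tp / n_pred if n_pred else 0.0
    r = tp / n_gold if n_gold else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0


def units(data: Dict[str, List[Span]], level: str) -> Set[tuple]:
    """Collect the items that count as one match at the given level."""
    out: Set[tuple] = set()
    for sid, spans in data.items():
        if level == "detection" and spans:
            out.add((sid,))  # sentence is flagged as erroneous
        elif level == "identification":
            out.update((sid, etype) for _, _, etype in spans)
        elif level == "position":
            out.update((sid, s, e, etype) for s, e, etype in spans)
    return out


def three_level_f1(gold: Dict[str, List[Span]],
                   pred: Dict[str, List[Span]]) -> Dict[str, float]:
    """F1 at the Detection, Identification, and Position levels."""
    scores = {}
    for level in ("detection", "identification", "position"):
        g, p = units(gold, level), units(pred, level)
        scores[level] = f1(len(g & p), len(p), len(g))
    return scores
```

Each Position match implies an Identification match for the same sentence, which in turn implies a Detection match, so the levels become progressively stricter.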
Keywords/Search Tags: Chinese grammar error analysis, LSTM-CRF, Fused word embeddings, Multi-Task Learning, BERT