Research On Statistical Analysis And Automatic Recognition Of Grammatical Errors In Modern Chinese

Posted on: 2021-03-29
Degree: Master
Type: Thesis
Country: China
Candidate: M J Zhong
GTID: 2428330602472946
Subject: Engineering
Abstract/Summary:
With the rise of China's international status, more and more foreigners are learning Chinese, and they often make grammatical errors unconsciously when writing. Studying datasets of grammatical errors supports targeted teaching of Chinese as a foreign language, and automatic recognition of grammatical errors helps lighten teachers' workload. This thesis makes a statistical analysis of the words involved in grammatical errors in large-scale error datasets, studies the automatic recognition of grammatical errors, and proposes three different models for experiments. The specific contents are as follows.

(1) Statistical analysis of datasets of grammatical errors in modern Chinese. This thesis analyzes two large-scale grammatical error datasets, Lang-8 and HSK. First, the datasets are preprocessed by removing data that is meaningless for the statistics and retaining the data that contains grammatical errors. Then, different statistical methods are used to count the content words and the function words involved in grammatical errors, and the two sets of results are analyzed separately and then compared. The results show that the words involved in grammatical errors follow regular patterns; frequent examples include de, le, and zai.

(2) Research on automatic recognition of Chinese grammatical errors. Three models are studied: a BiLSTM-CRF model with multiple features, a joint BiLSTM-CRF and CRF model, and a BiLSTM-CRF model based on BERT and an attention mechanism. The multi-feature model combines words, parts of speech, coarse-grained function word usage, fine-grained function word usage, and dependency syntax. The joint model combines the outputs of the BiLSTM-CRF and CRF models to obtain the final prediction. The model based on BERT and attention differs from the standard BERT + BiLSTM-CRF model: BERT produces dynamic word vectors, a BiLSTM encodes these vectors, a BiLSTM decodes the encoded output, the attention mechanism captures the internal information of the sequence at the decoding end, and a CRF constrains the output sequence to give the final prediction.

The models are trained on the CGED 2016 dataset and tested on the CGED 2018 dataset. The experimental results show that the F1 value of the model incorporating function word usage and dependency parsing features increases by 1% on the first subtask, and the F1 value of the joint BiLSTM-CRF and CRF model increases by 1% on the first two subtasks. The BERT-based model performs best, achieving the highest F1 values of the three models on all three subtasks: 0.7482, 0.5015, and 0.2521 respectively. This is an increase of 5 percentage points at the automatic recognition (detection) level, 7 percentage points at the error type level, and 16 percentage points at the position level.
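
The three F1 values above correspond to three evaluation levels: whether a sentence contains an error, which error types it contains, and where each error is located. The abstract does not define these metrics, so the following Python sketch is only an illustration under assumptions: it scores each level as a micro-averaged F1 over sets, represents an error as a hypothetical `(start, end, type)` triple, and uses the type labels R, M, S, and W for illustration. The function names `f1` and `three_level_f1` are invented here and are not taken from the thesis or from any official scoring script.

```python
from typing import Hashable, List, Set, Tuple

# One annotated error: (start offset, end offset, error type).
# The triple layout and the labels used below are assumptions for illustration.
Error = Tuple[int, int, str]


def f1(gold: Set[Hashable], pred: Set[Hashable]) -> float:
    """F1 between a gold set and a predicted set; 0.0 when nothing matches."""
    tp = len(gold & pred)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def three_level_f1(
    gold: List[List[Error]], pred: List[List[Error]]
) -> Tuple[float, float, float]:
    """Return (detection, error-type, position) F1 over parallel sentence lists."""
    det_g, det_p = set(), set()  # sentences flagged as containing an error
    typ_g, typ_p = set(), set()  # (sentence, error type) pairs
    pos_g, pos_p = set(), set()  # (sentence, start, end, error type) tuples
    for i, (g, p) in enumerate(zip(gold, pred)):
        if g:
            det_g.add(i)
        if p:
            det_p.add(i)
        typ_g.update((i, t) for _, _, t in g)
        typ_p.update((i, t) for _, _, t in p)
        pos_g.update((i, s, e, t) for s, e, t in g)
        pos_p.update((i, s, e, t) for s, e, t in p)
    return f1(det_g, det_p), f1(typ_g, typ_p), f1(pos_g, pos_p)


# Toy data: three sentences, the last one error-free in the gold annotation.
gold = [[(1, 1, "R")], [(0, 1, "W"), (3, 3, "M")], []]
pred = [[(1, 1, "R")], [(0, 1, "W"), (4, 4, "M")], [(2, 2, "S")]]
detection, error_type, position = three_level_f1(gold, pred)
print(f"{detection:.4f} {error_type:.4f} {position:.4f}")
```

On this toy data the position-level score is the lowest of the three, because a prediction counts only when the offsets and the type both match. That is the same ordering as the reported values (0.7482, 0.5015, 0.2521), although the toy numbers themselves have no connection to the thesis results.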
Keywords/Search Tags: Grammatical error, Grammatical error recognition, Teaching Chinese as a Foreign Language, BiLSTM, CRF