
Research On Automatic Scoring Of L2 Chinese Composition Based On Fusion Strategy

Posted on: 2022-11-20
Degree: Master
Type: Thesis
Country: China
Candidate: L L Dong
Full Text: PDF
GTID: 2518306752493334
Subject: Automation Technology
Abstract/Summary:
Interest in learning Chinese is growing steadily both in China and abroad. Composition is an important indicator of a learner's Chinese proficiency, and automatic scoring can alleviate the low efficiency of manual scoring and the scoring deviations caused by rater subjectivity. Driven by NLP and deep learning, automatic scoring has become more comprehensive and fine-grained. Scoring a Chinese composition from shallow features such as words and sentences alone cannot fully evaluate its quality; deep semantic information must be considered as well. A fusion model that captures both the shallow and the deep linguistic features of a composition is therefore of great significance for improving the performance of a scoring system.

This paper studies how to capture the shallow and deep semantic information of a composition in a Chinese composition scoring model and thereby improve scoring. First, a baseline model based on multiple regression is built from statistical information about the compositions. Second, because the model built on manually counted shallow features performs poorly, a scoring model based on a deep neural network is proposed to extract features automatically. Finally, because a single model scores poorly, a scoring model based on a fusion strategy is proposed to improve overall scoring performance. The main work of this paper is as follows:

(1) Through a study of the shallow features of L2 Chinese composition, 17 features that reflect the quality of a Chinese composition are put forward. First, 30 shallow features at four levels (characters, words, sentences, and whole texts) are counted, a baseline model based on multiple regression analysis is established, and the goodness of fit of the multiple linear regression model is assessed with RMSE and R^2. Second, Pearson correlation, Spearman correlation, and quadratic weighted kappa (QWK) are used to evaluate the model's automatic scoring results. Finally, experiments show that 17 of the statistical features are strongly related to the scores, and that the scores predicted from these features agree well with manual scoring.

(2) To address the baseline model's need for a manually constructed feature set, a deep neural network is proposed that extracts composition features automatically to improve scoring. First, a CNN learns the vocabulary and local semantic information in the composition, and an LSTM handles long-distance dependencies and learns the sequence information between sentences. Second, because different parts of a composition contribute differently to the score, an attention mechanism is added to give more weight to score-related features. Finally, experiments show that the proposed ACRNN scoring model not only removes the need for time-consuming and laborious manual annotation, but also improves automatic scoring on a small composition dataset, which demonstrates the usefulness and effectiveness of the scoring model.

(3) Because a pre-trained language model can capture the deep semantics of a text, a fusion model is built to learn both the shallow and the deep features of a composition and improve the performance of the scoring model. First, the pre-trained language model learns deep linguistic information such as discourse coherence and semantic differences. Second, the model based on shallow linguistic features is fused by stacking with the model that captures deep semantic information. Finally, experiments show that, compared with the baseline model, the model fusion method not only improves scoring performance but also enhances the interpretability of the results, and achieves better results in the overall scoring of L2 Chinese composition.
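
The abstract names QWK as one of its agreement measures without defining it. As a minimal sketch, assuming the standard definition of quadratic weighted kappa and integer scores on a fixed scale, agreement between human and model scores could be computed as follows. The function name, the score range, and the sample scores are illustrative assumptions and are not taken from the thesis.

```python
from collections import Counter


def quadratic_weighted_kappa(human, model, min_score, max_score):
    """Quadratic weighted kappa between two equal-length lists of integer scores.

    Illustrative helper (not from the thesis). Scores are assumed to lie
    in the closed range [min_score, max_score].
    """
    if len(human) != len(model) or not human:
        raise ValueError("score lists must be non-empty and of equal length")
    n_levels = max_score - min_score + 1
    if n_levels < 2:
        raise ValueError("the score scale needs at least two levels")

    n = len(human)
    observed = Counter(zip(human, model))  # joint counts of (human, model) pairs
    hist_human = Counter(human)            # marginal counts per rater
    hist_model = Counter(model)

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(min_score, max_score + 1):
        for j in range(min_score, max_score + 1):
            # Disagreement penalty grows with the squared score distance.
            weight = (i - j) ** 2 / (n_levels - 1) ** 2
            weighted_observed += weight * observed[(i, j)]
            # Expected count if the two raters were independent.
            weighted_expected += weight * hist_human[i] * hist_model[j] / n

    if weighted_expected == 0:
        # Both raters gave every essay the same single score; returning 1.0
        # here is a convention chosen for this sketch.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical scores on a 1-3 scale.
human_scores = [1, 2, 3]
model_scores = [1, 2, 2]
kappa = quadratic_weighted_kappa(human_scores, model_scores, 1, 3)
print(round(kappa, 4))
```

A value of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate systematic disagreement. Unlike Pearson or Spearman correlation, QWK penalizes a model that ranks essays correctly but is consistently offset from the human scores.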
Keywords/Search Tags: Automatic scoring of Chinese composition, Linear regression, Pre-trained language model, Natural language processing