
Validity Of Automated Essay Scoring

Posted on: 2016-12-09  Degree: Master  Type: Thesis
Country: China  Candidate: J Wang  Full Text: PDF
GTID: 2295330467996260  Subject: English Language and Literature
Abstract/Summary:
In order to assess the validity of the Pigai automated essay scoring system, 150 CET-4 essays and 150 CET-6 essays were randomly selected from the Chinese Learner English Corpus (CLEC) as research samples to locate the differences between Pigai scoring and human scoring. The thesis purports to answer the following questions: (1) To what extent do the scores assigned by Pigai agree with those assigned by expert human raters? (2) What type of essay tends to be misjudged by Pigai? (3) How do the quantitative features of an essay influence automated and human scores?

The procedures of the present study are as follows: (1) 150 CET-4 and 150 CET-6 pre-scored essays, with human scores ranging from 6 to 15 points, were randomly selected from CLEC. All the essays were then submitted to the Pigai platform to obtain automated scores. (2) Excel and SPSS 18.0 were used to compute three indexes: the exact-plus-adjacent agreement, the Pearson correlation coefficient r, and the maximum score difference. (3) The essay errors previously coded in the corpus were checked, classified and quantified. The analytical tools Vocabprofilers, the L2 Syntactic Complexity Analyzer, and Coh-Metrix 3.0 were employed to analyze essay features at the lexical, syntactic and discourse levels. All these quantified features were then treated as independent variables, with the Pigai scores and the human scores as dependent variables. Finally, a multiple regression analysis was carried out to establish regression equations for the human scores and the Pigai scores.

The major findings include: (1) The three indexes show that Pigai is unable to assess CET essays reliably. Although the exact-plus-adjacent agreement rates for the CET-4 and CET-6 essays are moderately high, the correlation study reveals no correlation between Pigai and human scoring for the CET-4 essays and only a weak correlation between the Pigai scores and the human scores for the CET-6 essays. Additionally, the maximum score differences between human and Pigai scoring are quite high, posing great challenges to the validity of Pigai. (2) A paired-sample t test demonstrates that the human scores are significantly higher than the Pigai scores. All the analyses show that Pigai tends to misjudge essays with high human scores. (3) The quantitative features have a greater impact on the Pigai scores. Multiple regression analysis reveals that 11 variables of errors, vocabulary, syntax and discourse have predictive power for the human scores, but that they account for less than 25% of the score variance. Moreover, 13 variables predict the machine scores, accounting for more than 65% of the score variance.

Finally, the thesis points out that: (1) Different rating scales and different sampling for the establishment of AES models may lead to lower agreement rates and correlation coefficients than those reported in foreign studies. (2) Pigai tends to misjudge high-quality essays, which can be explained by an internal weakness of Pigai, namely its inability to read, appreciate and judge an essay, and its incapacity to analyze deep-level syntactic patterns or lexical collocations. (3) The quantitative features of the sample essays have different impacts on the human and machine scores; this may be explained by the fact that machine scoring mainly depends on surface features that are easily computed, and it cannot imitate human scorers in identifying the deep features of an essay.
Keywords/Search Tags: Pigai, validity, CET-4 & CET-6 essays