
Effects of scoring method and rater experience on ESL essay rating processes and outcomes

Posted on: 2009-12-09
Degree: Ph.D.
Type: Thesis
University: University of Toronto (Canada)
Candidate: Barkaoui, Khaled
GTID: 2445390005960158
Subject: Education
Abstract/Summary:
This study examined the effects of scoring method and rater experience on ESL essay rater performance. Each of 31 novice and 29 experienced raters rated 24 essays using a holistic and a multiple-trait scale. Interviews and think-aloud protocols provided data about the participants' decision-making behaviors and the aspects of writing they attended to. Essay scores were analyzed to estimate rater severity and self-consistency, as well as the relationships between the multiple-trait and holistic scores.

The findings indicated that both scoring methods measured the same construct, but the multiple-trait method allowed finer distinctions among examinees in terms of writing ability. Holistic scoring resulted in higher inter-rater reliability, while multiple-trait scoring led to higher rater self-consistency, particularly for novices. Multiple-trait scoring prompted more judgment and self-monitoring strategies, while holistic scoring elicited more interpretation strategies and a greater focus on language. Furthermore, multiple-trait scoring reduced the complexity of the rating task and prompted raters to attend to all the rating criteria in the scale.

Novices exhibited greater intra- and inter-rater variability. Compared with the experienced raters, they tended to refer to the rating scale more frequently, to focus on local aspects of writing more often, and to spend more time interpreting and/or editing text. Experienced raters tended to refer more frequently to criteria other than those in the rubrics, to report more judgment strategies and a greater focus on rhetorical and ideational aspects of writing, and to spend more time reading and assessing the essays as a whole. They were also more efficient, confident, self-consistent, and homogeneous in their ratings than the novices were.

Scoring method seemed to have a greater effect on the severity of experienced raters and on the self-consistency of novices. In addition, multiple-trait scoring focused the novices' attention on the criteria in the scale, led them to organize these criteria coherently, and prompted them to employ more judgment strategies, thus making the rating task manageable and improving their self-consistency. The effects of scoring method on experienced raters' performance were less pronounced.

Overall, these findings suggest that multiple-trait scoring is most appropriate for assessing L2 writing. However, the two scoring methods might be useful for different assessment purposes, contexts, raters, and examinee populations. The thesis also has implications for test validation research.
Keywords/Search Tags: Scoring, Rater, Rating, Essay, Effects, Method, Criteria, Scale