With the pervasiveness of computers and the rapid emergence of new technologies, there is an ongoing shift toward raters marking on screen, either digitally scanned copies of examination scripts or essays that were originally word-processed, rather than the original paper-and-pencil scripts. Before moving from one rating mode to another, however, it is crucial to establish the reliability and validity of the ratings produced in the new mode. While a growing body of empirical studies comparing testing and assessment modes has examined how the mode of assessment might affect candidates' performance, only a narrow literature has considered whether different rating modes affect rater performance. To investigate whether raters perform consistently across rating modes and whether the mode of marking influences rater behavior, the author conducted a study of rater severity and behavior across three rating modes in a continuation writing task. Six raters were invited to rate essays written by college EFL students in China in three rating modes: paper-based marking of paper-and-pencil essays (PBM), online marking of word-processed essays (OM), and onscreen marking of digitally scanned images (OSM). After rating, to explore the raters' thought processes during scoring, the raters were asked to give retrospective explanations of their rating behavior in the scoring sessions. A many-facet Rasch measurement (MFRM) model was then used to analyze the scores obtained in the rating sessions, and the interview recordings were transcribed to examine the raters' behavior across the three rating modes. The results revealed that raters produced reliable and consistent scores across the three rating modes, but not all raters exercised the same level of severity across the modes. Raters were harsher when scoring handwritten essays (PBM) than word-processed essays (OM), although severity in OSM and PBM was comparable. Raters also tended to associate the quality of the writing with the quality of the penmanship, and they tended to find more grammatical errors in PBM and OSM than in OM. Raters preferred rereading scripts on paper but were reluctant to recall scripts on screen; they reported a sense of intimacy in PBM and a sense of rhythm in OM, but a sense of stress in OSM. The findings suggest that OSM and OM can be used in place of PBM without a loss of reliability, but quality-control measures should be tailored to each rating mode to ensure rating reliability.
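For readers unfamiliar with MFRM, the standard rating-scale form of the many-facet Rasch model can be written as below. This is a sketch only. The three facets shown (examinee, rating mode, rater) are an assumption, because the abstract does not state which facets the study's model actually specified, and the symbols are illustrative.

```latex
% Rating-scale many-facet Rasch model (assumed facets: examinee n, rating mode m, rater j)
\log\left(\frac{P_{nmjk}}{P_{nmj(k-1)}}\right) = B_n - M_m - C_j - F_k
```

Here $P_{nmjk}$ is the probability that examinee $n$'s essay, rated in mode $m$ by rater $j$, receives score category $k$. $B_n$ is the examinee's writing ability, $M_m$ is the difficulty of rating mode $m$, $C_j$ is the severity of rater $j$, and $F_k$ is the difficulty of moving from category $k-1$ to category $k$. A larger $C_j$ lowers the log-odds of a higher score, so it marks a harsher rater. Under this kind of model, mode-related severity differences such as those reported above could surface as differences between the $M_m$ estimates, or as rater-by-mode interaction (bias) terms.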