Effects of common-item selection on the accuracy of item response theory test equating with nonequivalent groups

Posted on: 2004-02-23
Degree: Ph.D.
Type: Dissertation
University: Stanford University
Candidate: Michaelides, Michalis Paraschou
Full Text: PDF
GTID: 1468390011967785
Subject: Educational tests & measurements
Abstract/Summary:
Three studies address the accuracy of IRT test equating as it relates to the selection of common items. The choice of items to include as common in a common-item nonequivalent groups design influences the equated scores and their accuracy. The first study showed that the treatment of a few common items that behaved in unexpected ways across administrations, i.e., the outliers flagged by the delta-plot method, could have a substantial influence on equated score summaries. In two of the four assessments analyzed, using four IRT equating methods and two mixed IRT models for test calibration, mean scores, annual gains, and proportions above a cut score differed significantly depending on whether the outlying items were included in the equating. In the second study, the Mantel-Haenszel procedure, widely used to identify differential item functioning (DIF), was proposed as an alternative to the delta-plot method and applied in a test-equating context to flag common items that behaved differentially across cohorts of examinees. The Mantel-Haenszel procedure has the advantage of conditioning on ability when comparing the performance of two groups on an item. Schemes exist for interpreting the effect size of differential performance, and these can inform the decision to retain such items in the common-item pool or to discard them. However, some test-design limitations may preclude the application of this procedure in a test-equating framework. In the third study, the process of selecting common items to embed in more than one test form was treated as random; the error due to the sampling of common items was quantified both by an analytic formula derived using the delta method and by a computational bootstrap procedure. Compared to other sources of sampling and measurement error, the common-item sampling error was small for individual scores but loomed large for group-level score interpretations.
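
As a rough illustration of the Mantel-Haenszel flagging step described in the abstract, the sketch below computes the common odds ratio for one dichotomous common item across strata of a matching score, converts it to the delta-scale effect size (MH D-DIF = -2.35 ln α), and assigns a size category. It is not the dissertation's implementation: the function names, the record layout, the labels "ref" and "focal" for the two cohorts, and the simplified A/B/C cut-offs at 1.0 and 1.5 (which omit the significance tests used in the full ETS rules) are all assumptions made for this example.

```python
import math
from collections import defaultdict


def mantel_haenszel_ddif(records):
    """Return (alpha_mh, mh_d_dif) for one dichotomous item.

    records: iterable of (group, stratum, correct), where group is "ref" or
    "focal" (hypothetical labels for the two cohorts), stratum is a level of
    the matching score, and correct is 0 or 1.
    """
    # Per-stratum 2x2 counts: [ref right, ref wrong, focal right, focal wrong]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for group, stratum, correct in records:
        idx = (0 if group == "ref" else 2) + (0 if correct else 1)
        tables[stratum][idx] += 1

    num = den = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        num += a * d / n  # ref right * focal wrong
        den += b * c / n  # ref wrong * focal right
    if num == 0 or den == 0:
        raise ValueError("odds ratio undefined: a cell pattern is empty")

    alpha = num / den
    # Delta-scale effect size; negative values mean the item is relatively
    # harder for the focal cohort after matching on the stratum.
    return alpha, -2.35 * math.log(alpha)


def classify_effect(mh_d_dif):
    """Simplified size-only categories (assumed cut-offs, no significance test)."""
    size = abs(mh_d_dif)
    if size < 1.0:
        return "A"  # negligible
    if size < 1.5:
        return "B"  # moderate
    return "C"      # large


# Toy data: a single stratum, 40 examinees per cohort.
records = (
    [("ref", 0, 1)] * 30 + [("ref", 0, 0)] * 10
    + [("focal", 0, 1)] * 20 + [("focal", 0, 0)] * 20
)
alpha, d = mantel_haenszel_ddif(records)
print(round(alpha, 3), round(d, 3), classify_effect(d))
```

In an equating setting, an item falling in the largest category would be a candidate for removal from the common-item pool, subject to the test-design limitations the abstract notes.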
Keywords/Search Tags:Common, Test, Equating, Accuracy, IRT