
The effects of content homogeneity and equating method on the accuracy of common-item test equating

Posted on: 1998-04-14
Degree: Ph.D.
Type: Dissertation
University: Michigan State University
Candidate: Yang, Wen-Ling
Full Text: PDF
GTID: 1465390014479280
Subject: Education
Abstract/Summary:
Often in educational testing and measurement, alternate test forms are used to obtain comparable test scores for measuring growth or for ensuring test security. To make valid comparisons between groups and to enhance test fairness, practitioners rely on various equating techniques to equate forms of the same test, so it is important to evaluate the adequacy of these equating methods and the accuracy of their outcomes. In my dissertation, I studied the effects of test characteristics on the accuracy of equating outcomes when different methods were used to equate two forms of a test. Specifically, I wanted to know whether equating accuracy improves with a test made of content-homogeneous items, whether it improves with an anchor test that is content-representative of its total test, and whether such content effects depend on the particular equating method used. My major goal is to improve test results, which often inform critical educational decisions.

The data I analyzed are the test results from a professional in-training examination. The score distribution is negatively skewed because the test was written as a minimum-competency examination; in equating practice, such test outcomes receive less attention than they deserve. The common-item equating design was used because the two groups of examinees taking the different forms were not randomly formed or assigned. I used an item-sampling design to create four tests that differ in the content homogeneity of their items and in the content representativeness of their anchor items. All items in these tests come from one overall content domain but fall into 23 different content areas. Each of the four tests has two forms, and a set of common anchor items is embedded in each form. I applied linear, equipercentile, and two IRT-based equating methods to equate the two forms of each test. By means of the item-sampling designs, I was able to establish two innovative true-score-based criteria for evaluating the accuracy of the equating outcomes from these methods. I also used two other criteria, based on the outcomes of arbitrary equatings, to examine how well equating accuracy is estimated with such criteria. In addition, I elaborated on issues of construct validity and test dimensionality that are relevant to test equating.

Overall, I found that all the equating methods yielded moderately accurate results. They all produced more accurate results when the anchor items were more representative of the total test, or when the items in a test had homogeneous content. Therefore, to improve equating accuracy, I recommend including anchor items that fully reflect the overall test content. I also found that the IRT-based equating outcomes were more accurate than those from the other equating methods; however, the differences are small and thus may not have practical significance. When the degree of equating accuracy is critical to a testing program's decision-making, as in high-stakes examinations, IRT-based equating methods are recommended.
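For readers unfamiliar with the methods named above, the sketch below illustrates the two classical equating functions, linear and equipercentile, in their basic form. The score vectors and the raw score being converted are hypothetical, invented for illustration; the dissertation's common-item design for nonequivalent groups would further adjust for group differences using the anchor-item scores (for example, Tucker or Levine linear methods, or chained equipercentile), a step the abstract does not detail.

```python
import numpy as np

# Hypothetical score vectors; in the common-item design these would come from
# two examinee groups, each taking one form plus the embedded anchor items.
form_x = np.random.default_rng(0).binomial(60, 0.75, size=2000)  # new form
form_y = np.random.default_rng(1).binomial(60, 0.78, size=2000)  # base form

def linear_equate(x, sx, sy):
    # Linear equating: map a Form X score x onto the Form Y scale by
    # matching the means and standard deviations of the two forms.
    return sy.mean() + sy.std() / sx.std() * (x - sx.mean())

def equipercentile_equate(x, sx, sy):
    # Equipercentile equating: map x to the Form Y score that has the
    # same percentile rank as x has on Form X.
    p = np.mean(sx <= x)          # percentile rank of x on Form X
    return np.quantile(sy, p)     # Form Y score at that percentile

print(linear_equate(45, form_x, form_y))
print(equipercentile_equate(45, form_x, form_y))
```

Linear equating assumes the two score distributions differ only in mean and spread, while equipercentile equating matches the full distributions; the dissertation compares both, along with two IRT-based methods, against true-score criteria.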
Keywords/Search Tags: Test, Equating, Accuracy, Content, Forms, Effects, Anchor items