
Comparison of Different Equating Methods and an Application to Link Testlet-Based Tests

Posted on: 2011-04-23
Degree: Ph.D.
Type: Dissertation
University: The Chinese University of Hong Kong (Hong Kong)
Candidate: Zhang, Zhonghua
GTID: 1448390002469051
Subject: Education
Abstract/Summary:
Test equating allows scores from alternative forms that measure the same construct to be compared directly, by using equating procedures to place the scores on a common metric. Three types of equating procedures are commonly used in the literature: the concurrent calibration method, the linking separate calibration methods (e.g., the moment methods and the characteristic curve methods), and the FPC (fixed parameter calibration) method. The first two types have been well developed for the traditional IRT model. FPC has recently received more attention because of its usefulness in constructing item banks and in computerized adaptive testing (CAT). However, few studies have examined the equating accuracy of the FPC method relative to that of the linking separate calibration method and the concurrent calibration method.

The equating methods for the traditional IRT model are not appropriate for linking testlet-based tests, because the local independence assumption of the IRT model does not hold for this type of test. Several measurement models, such as the testlet response model, the bi-factor model, and the Rasch testlet model, have been proposed for calibrating testlet-based tests. However, few equating methods that take into account the additional local dependence among examinees' responses to items within testlets have been developed for linking testlet-based tests.

To better understand the FPC method and to address the lack of equating methods for testlet-based tests, the studies aimed to compare the effectiveness of the three types of equating methods under different linking situations and to develop new equating methods for linking testlet-based tests. In addition to the equating methods themselves, other factors that might affect equating results, including sample size, ability distribution, and the characteristics of common items and testlets, were also considered.
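The Rasch testlet model named above can be made concrete with a short sketch. It is illustrative only and assumes one common parameterization rather than the exact specification used in the dissertation: a partial credit form with adjacent-category logits θ + γ − δ_k, where γ is the examinee's effect for the item's testlet, so that a dichotomous item is simply the one-step case and mixed-format testlets can share one function. The function names, item parameters, and the two-point distribution assumed for γ are hypothetical. The testlet response model and the bi-factor model use different parameterizations and are not shown here.

```python
import math


def rasch_testlet_probs(theta, gamma, steps):
    """Category probabilities for scores 0..len(steps) under an assumed
    partial credit Rasch testlet form with adjacent-category logits
    theta + gamma - delta_k (gamma is the examinee's testlet effect)."""
    log_numerators = [0.0]  # score 0 has log-numerator 0
    for delta in steps:
        log_numerators.append(log_numerators[-1] + theta + gamma - delta)
    top = max(log_numerators)  # subtract the maximum for numerical stability
    expd = [math.exp(x - top) for x in log_numerators]
    total = sum(expd)
    return [e / total for e in expd]


def p_correct(theta, gamma, delta):
    """Dichotomous item: probability of score 1 (the one-step special case)."""
    return rasch_testlet_probs(theta, gamma, [delta])[1]


def expected_score(probs):
    """Expected item score, sum over k of k * P(score = k)."""
    return sum(k * p for k, p in enumerate(probs))


# Hypothetical polytomous item with three steps (scores 0-3), examinee at theta = 0.
steps = [-1.0, 0.0, 1.0]
print(round(expected_score(rasch_testlet_probs(0.0, 0.0, steps)), 3))  # 1.5
print(expected_score(rasch_testlet_probs(0.0, 0.8, steps)) > 1.5)      # True

# Local dependence: two dichotomous items (delta = 0) in the same testlet, with
# gamma assumed to be -1 or +1 with equal probability (a toy testlet SD of 1).
gammas = [-1.0, 1.0]
p_joint = sum(p_correct(0.0, g, 0.0) ** 2 for g in gammas) / len(gammas)
p_each = sum(p_correct(0.0, g, 0.0) for g in gammas) / len(gammas)
print(round(p_joint, 3), round(p_each ** 2, 3))  # 0.303 0.25
```

Because both items share the same γ for a given examinee, the probability of answering both correctly (about 0.303) exceeds the product of the single-item probabilities (0.25) even at a fixed θ. This is the within-testlet dependence that a standard IRT calibration ignores.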
Three simulation studies were conducted to achieve these research purposes.

The first study compared the equating accuracy of the FPC, linking separate calibration, and concurrent calibration methods in equating item parameters under the IRT model across different conditions. The results indicated that the FPC method using BILOG-MG performed as well as the linking separate calibration and concurrent calibration methods when linking equivalent groups. However, the FPC method produced larger equating errors than the other two methods when the ability distributions of the base and target groups were substantially nonequivalent. With other conditions held equal, differences in difficulty between the common-item set and the total test did not substantially affect the equating results of any of the three methods. As expected, both smaller sample sizes and fewer common items led to slightly greater equating errors.

The second study developed an item characteristic curve method and a testlet characteristic curve method for transforming the scale of item parameters under the testlet response model. It then compared the effectiveness of these characteristic curve methods and the concurrent calibration method, under different conditions, in linking item parameters from alternate test forms composed of dichotomously scored testlet-based items. The newly developed item characteristic curve and testlet characteristic curve methods performed as well as, or even better than, the Stocking-Lord test characteristic curve method and the concurrent calibration method. Ignoring local dependence in model calibration substantially increased equating errors.
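The idea of linking separate calibrations by matching characteristic curves under a testlet model can be sketched as follows. This is not the item or testlet characteristic curve method developed in the dissertation; it is a Haebara-style criterion chosen for illustration, under several stated assumptions. The sketch assumes a 2PL testlet response model P = 1 / (1 + exp(−a(θ − b − γ))) and a testlet effect γ ~ N(0, σ²) for the item's testlet, which is integrated out on a crude grid of nodes. It also assumes that target-form parameters move to the base scale as a/A, A·b + B, and A·σ. Mean/sigma (a moment method) supplies the starting values, and a coarse-to-fine grid search stands in for a proper optimizer. All function names and parameter values are hypothetical.

```python
import math
from statistics import mean, pstdev

# Standardized nodes and normal weights for integrating out gamma = sigma * z
# (a crude equally spaced quadrature, used here for illustration only).
Z_NODES = [-4.0 + 0.5 * k for k in range(17)]
_RAW = [math.exp(-0.5 * z * z) for z in Z_NODES]
Z_WEIGHTS = [w / sum(_RAW) for w in _RAW]
THETA_GRID = [-3.0 + 0.5 * k for k in range(13)]


def marginal_icc(theta, a, b, sigma):
    """P(correct | theta) under the assumed 2PL testlet model, averaged over gamma."""
    return sum(
        w / (1.0 + math.exp(-a * (theta - b - sigma * z)))
        for z, w in zip(Z_NODES, Z_WEIGHTS)
    )


def mean_sigma(base_items, target_items):
    """Moment (mean/sigma) start values from the common-item difficulties."""
    b0 = [b for _, b, _ in base_items]
    b1 = [b for _, b, _ in target_items]
    A = pstdev(b0) / pstdev(b1)
    return A, mean(b0) - A * mean(b1)


def icc_loss(A, B, base_items, target_items):
    """Haebara-style loss: squared marginal ICC differences over items and thetas."""
    loss = 0.0
    for (a0, b0, s0), (a1, b1, s1) in zip(base_items, target_items):
        for theta in THETA_GRID:
            p_base = marginal_icc(theta, a0, b0, s0)
            # Target parameters moved onto the base scale: a/A, A*b + B, A*sigma.
            p_star = marginal_icc(theta, a1 / A, A * b1 + B, A * s1)
            loss += (p_base - p_star) ** 2
    return loss


def refine(base_items, target_items, A0, B0, half_width, step):
    """Best (A, B) on a square grid centred at (A0, B0), keeping A positive."""
    n = round(half_width / step)
    grid = [(A0 + i * step, B0 + j * step)
            for i in range(-n, n + 1) for j in range(-n, n + 1)
            if A0 + i * step > 0]
    return min(grid, key=lambda ab: icc_loss(ab[0], ab[1], base_items, target_items))


def link_testlet_items(base_items, target_items):
    """Mean/sigma start, then two rounds of local grid search on the ICC loss."""
    A, B = mean_sigma(base_items, target_items)
    A, B = refine(base_items, target_items, A, B, half_width=0.5, step=0.05)
    return refine(base_items, target_items, A, B, half_width=0.05, step=0.005)


# Hypothetical common items (a, b, testlet SD) from two testlets (SD 0.5 and 0.8);
# the target form is an exact rescaling of the base form with A = 1.2, B = 0.3.
base = [(0.9, -1.0, 0.5), (1.2, 0.2, 0.5), (1.0, 0.8, 0.8), (1.4, 1.5, 0.8)]
A_TRUE, B_TRUE = 1.2, 0.3
target = [(a * A_TRUE, (b - B_TRUE) / A_TRUE, s / A_TRUE) for a, b, s in base]
A_hat, B_hat = link_testlet_items(base, target)
print(round(A_hat, 3), round(B_hat, 3))  # 1.2 0.3
```

Because the target parameters here are an exact rescaling of the base parameters, mean/sigma already recovers A = 1.2 and B = 0.3, and the grid search only confirms them. With estimated parameters the two generally differ. Under the same assumptions, a testlet-level variant could sum the marginal ICCs within each common testlet before squaring the difference.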
In the second study, larger testlet variances for the common testlets also led to greater equating errors.

The last study used the concurrent calibration method under the multidimensional Rasch testlet model to link testlet-based tests whose testlets were composed of dichotomous, polytomous, and mixed-format items. The results demonstrated that concurrent calibration under the Rasch testlet model recovered the underlying item parameters well. Again, equating errors increased substantially when local dependence was ignored in model calibration, and smaller testlet variances for the common testlets led to more accurate equating results.

The results of these studies contribute to a better understanding of the effectiveness of different equating methods, particularly those for linking testlet-based tests. They also help clarify how other factors, such as examinee characteristics and the features of common items and common testlets, influence equating results. Testing practitioners and researchers may draw useful recommendations about the selection of equating methods from these findings. Nevertheless, the findings from these simulation studies should be generalized to operational testing programs with caution.

Keywords: Equating, IRT, Testlet Response Model, Rasch Testlet Model, LSC, Concurrent, FPC...