| Objective: The reliability coefficient for Item Response Theory (hereinafter referred to as IRT reliability) is a global index of measurement precision over the ability distribution of examinees, whereas test information is conditional on a particular level of ability. In this study, goodness of fit and the estimation method of IRT reliability are used as independent variables to determine which estimation method of IRT reliability performs best at each level of goodness of fit. Method: This research adopts a 3 (goodness of fit) × 5 (estimation method of IRT reliability) mixed experimental design, implemented as a computer simulation program written in R. The within-subject factor is the estimation method of IRT reliability, with five levels: total reliability, marginal reliability, theoretical reliability, empirical reliability, and Nicewander reliability. The between-subject factor is goodness of fit, with three levels: “high”, “medium”, and “low”. The absolute value of the reliability estimation error is used as the dependent variable. Results: The main effect of the estimation method of IRT reliability was significant. The main effect of goodness of fit was also significant: the higher the goodness of fit, the smaller the IRT reliability estimation error, and vice versa. Goodness of fit and the estimation method of IRT reliability had a significant interaction effect on IRT reliability estimation error. The estimation error of marginal reliability was the least affected by goodness of fit, whereas the estimation error of Nicewander reliability was the most affected. This indicates that marginal reliability is the most stable method and Nicewander reliability is the most sensitive to goodness of fit. At the “high” level of goodness of fit, total reliability had the smallest estimation error.
Nevertheless, at “medium†level and “low†level of goodness of fit,marginal reliability has the least estimation error.Conclusion:Need to chose appropriate methods toestimate IRT reliability.Whengoodness-of-fit is good, total reliability is best. But, at “medium†level and “lowâ€level of goodness of fit, marginal reliability is the first choice. |