
Issues and challenges in current generalizability theory applications in rated measurement

Posted on: 2015-06-22
Degree: Ph.D.
Type: Dissertation
University: University of Illinois at Urbana-Champaign
Candidate: Lin, Chih-Kai
Full Text: PDF
GTID: 1455390005981955
Subject: Educational tests & measurements
Abstract/Summary:
This dissertation pertains to theoretical and applied investigations of G theory in rated measurement. The introduction (Chapter 1) sketches an overarching theme that situates the separate papers within a thematic unity and provides a brief summary of each paper.

The first paper (Chapter 2) reports findings from a comparison of two analytical methods, under the G-theory framework, designed to analyze the sparse rated data commonly observed in performance-based assessments. The rater method identifies blocks of fully crossed sub-datasets and then estimates variance components as a weighted average across these sub-datasets, whereas the rating method forces a sparse dataset into a fully crossed one by conceptualizing ratings as a random facet and then estimates variance components by the usual crossed-design procedures. The paper compares the estimation precision of the two methods via a Monte Carlo simulation study and an empirical study. Results show that when all raters are expected to be homogeneous in their score variability, either method yields good estimates of the variance components. However, when some raters exhibit more variability in their ratings than others, the rater method yields more precise estimates than the rating method.

The second paper (Chapter 3) is carried out in the context of examining correspondence between English language proficiency (ELP) standards and academic content standards in the US K-12 setting. Such correspondence studies provide information about the extent to which English language learners are expected to encounter academic language use closely associated with academic disciplines, such as mathematics. The paper describes one approach to conducting ELP standards-to-standards correspondence research based on reviewer judgments and also touches on reviewer consistency in judging the cognitive complexity of the target standards. Results suggest a relationship between reviewer consistency in their judgments and the level of specificity of the target standards.

As an extension of the second paper, the third paper (Chapter 4) seeks to advance new applications of G theory in correspondence research and to examine reviewer reliability in relation to the number of reviewers. Ratings of the cognitive complexity of language performance indicators were collected from 28 correspondence studies with over 700 trained reviewers, consisting of content-area experts and English as a second language (ESL) specialists. Under the G-theory framework, reviewer reliability and the standard errors of measurement in their ratings are evaluated with respect to the number of reviewers. Results show that, depending on the particular grades and subject areas, 3-6 reviewers are needed to achieve an acceptable level of reliability and to keep measurement error within a reasonable range.

The fourth paper (Chapter 5) attempts to advance the discussion of nonadditivity in the context of G-theory applications in rated measurement. Nonadditivity occurs when some or all of the main and interaction effects pertaining to the objects of measurement and the measurement facet(s) are significantly correlated. The paper analytically and empirically illustrates the distinction between additive and nonadditive one-facet G-theory models and explores existing statistical procedures for detecting nonadditivity in data. Tukey's single-degree-of-freedom test for nonadditivity is evaluated in terms of Type I error and statistical power. Results show that the test is satisfactory in controlling erroneous identification of nonadditivity (Type I error) and successful in identifying one type of nonadditive interaction (power).

Finally, the conclusion (Chapter 6) discusses some unresolved issues in G-theory applications and ideas for future research. First, issues regarding the use of many-facet Rasch measurement to complement G-theory analysis are discussed. Second, given that a performance test usually involves examinee responses rated on a discrete ordinal scale, accounting for the discrete ordinal nature of measurement variables under the G-theory framework remains an open area of research. Finally, nonadditivity in multi-faceted G-theory models also deserves further research, because most performance tests entail more than one measurement facet, such as those associated with raters and tasks. (Abstract shortened by UMI.)
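To make the reliability-versus-number-of-reviewers question addressed in the third paper concrete, the sketch below works through a standard one-facet (persons-by-raters) D-study computation in Python. The variance components are hypothetical placeholders chosen only for illustration; they are not estimates reported in the dissertation.

# Illustrative D-study sketch for a one-facet (persons x raters) G-theory design.
# The variance components below are made-up placeholders, not values from the
# dissertation; they only show how reliability changes with the number of reviewers.

def g_coefficients(var_p, var_r, var_pr, n_raters):
    """Generalizability (relative) and dependability (absolute) coefficients
    for the mean over n_raters raters in a crossed p x r design."""
    rel_error = var_pr / n_raters               # relative error variance
    abs_error = (var_r + var_pr) / n_raters     # absolute error variance
    g_rel = var_p / (var_p + rel_error)         # E-rho^2 (relative decisions)
    phi = var_p / (var_p + abs_error)           # Phi (absolute decisions)
    return g_rel, phi

# Placeholder components: persons, raters, person-by-rater residual.
var_p, var_r, var_pr = 0.50, 0.05, 0.30

for n in range(1, 7):
    g_rel, phi = g_coefficients(var_p, var_r, var_pr, n)
    print(f"{n} reviewer(s): E-rho^2 = {g_rel:.3f}, Phi = {phi:.3f}")

With these particular placeholder components, the coefficients pass conventional thresholds (for example .80) at around three to five reviewers, which mirrors the kind of D-study reasoning the third paper applies to its own estimates.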
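The fourth paper's evaluation of Tukey's single-degree-of-freedom test can likewise be sketched. The snippet below implements the textbook form of the test for a persons-by-raters table with one rating per cell, applied to simulated data that include a deliberate multiplicative (nonadditive) person-by-rater component; the data-generating values are illustrative assumptions, not the dissertation's simulation design.

import numpy as np
from scipy.stats import f as f_dist

def tukey_nonadditivity(y):
    """Tukey's single-degree-of-freedom test for a 2-D persons x raters array y
    with one observation per cell; returns the F statistic and p-value."""
    a, b = y.shape
    grand = y.mean()
    alpha = y.mean(axis=1) - grand          # person (row) effects
    beta = y.mean(axis=0) - grand           # rater (column) effects

    # Single-degree-of-freedom sum of squares for the multiplicative term
    num = (alpha[:, None] * beta[None, :] * y).sum() ** 2
    ss_nonadd = num / ((alpha ** 2).sum() * (beta ** 2).sum())

    # Interaction (residual) sum of squares, minus the nonadditivity component
    resid = y - y.mean(axis=1, keepdims=True) - y.mean(axis=0, keepdims=True) + grand
    ss_resid = (resid ** 2).sum()
    df_rem = (a - 1) * (b - 1) - 1
    ms_rem = (ss_resid - ss_nonadd) / df_rem

    F = ss_nonadd / ms_rem
    return F, f_dist.sf(F, 1, df_rem)

# Simulated ratings with a multiplicative person-by-rater term (nonadditive).
rng = np.random.default_rng(0)
persons = rng.normal(0, 1, size=(30, 1))
raters = rng.normal(0, 0.5, size=(1, 5))
y = 3 + persons + raters + 0.4 * persons * raters + rng.normal(0, 0.3, (30, 5))

F, p = tukey_nonadditivity(y)
print(f"Tukey F = {F:.2f}, p = {p:.4f}")

A small p-value flags the multiplicative person-by-rater interaction as nonadditivity; running the same function on purely additive data illustrates the Type I error side of the evaluation described above.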
Keywords/Search Tags: Measurement, Rated, Theory, Chapter, Applications, Paper, Issues, Raters