The argument-based approach to validating language tests can be traced back at least to the 1970s and 1980s, when attention began to be drawn to the importance of both verifying positive explanations and falsifying rival hypotheses about test validity. In recent years, the argument-based approach has been widely accepted and used in validation practice. However, there is no consensus on how the arguing should proceed; on the contrary, heated debates can be found in recent publications. Following these applications and debates, two aspects of validity arguments have become increasingly pressing concerns: the logic of the argument and the interpretation of validity.

Firstly, the present thesis analyzes the logic errors of the three most influential argument-based validation frameworks: the Assessment Use Argument (AUA; Bachman, 2005; Bachman & Palmer, 2010), Evidence-Centered Design (ECD; Mislevy et al., 2003), and the Interpretive Argument (IA; Kane, 1990, 1992, 2004). Because all three models claim that their argument structure is the Toulmin structure of argument, a comparative study between these models and the Toulmin model (Toulmin, 2003) is carried out. The results show that all three modified the basic structure of the Toulmin model before applying it to their frameworks, and that the modifications have caused serious logical problems: 1) the reasoning process is an endless loop; 2) the argument is a typical paradox; 3) the claim is in fact a hypothesis. When there is no claim, the model is no longer an argument model; but even if the claim is read as a hypothesis, the model is not a hypothesis-testing model either, because there is no conditional mechanism for deciding whether to accept or reject the hypothesis.

Further analysis shows that the causes of the logic errors are also similar. Owing to a misunderstanding and misuse of the Toulmin Rebuttal, all counterclaims, including counter-explanations and rival hypotheses, are regarded as Toulmin rebuttals.
In fact, the Toulmin Rebuttal refers to "the sorts of exceptional circumstance may in particular cases rebut the presumptions the warrant creates" (Toulmin, 2003, p. 99, emphases added), which is analogous to the significance level (α) in hypothesis testing. By nature, rebuttals are low-probability events that can be, and have to be, ignored; but the modified versions require that rebuttals be either verified or falsified before a claim is made. This is exactly what causes the logical problems.

Secondly, the thesis proposes a new argument model called the Progressive Argument, which not only possesses a logical reasoning mechanism but also incorporates scientific inquiry into rational reasoning. As is often the case in rational reasoning, the data must be simple and the warrant self-evident for the claim to be plausible and easily accepted. But test data are often complex, and hardly any conclusion can be drawn from them without scientific inquiry. In the face of complicated test data, data analysis has to be carried out so that more evidential data can be generated to authorize the logical reasoning process.

To that end, the Progressive Argument embeds two more elements in the basic Toulmin structure: a Conditional to direct the reasoning procedure and an Analysis to carry out data analysis. Before each reasoning step, the Conditional is invoked to decide whether there are sufficient warrants to authorize the step. If the condition is satisfied, the process enters a Toulmin reasoning procedure; if not, it is directed into a data-analysis procedure to generate new evidence. By including an Analysis element, the model gains a recursion mechanism: the justification of a claim may involve a recursive use of the Progressive Argument, and the claim rests on the progression of all the sub-claims of the recursion steps.
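The Conditional/Analysis alternation just described can be sketched in code. The following is a minimal illustrative sketch only: the function names, the evidence representation, and the sufficiency threshold are assumptions made for illustration, not constructs defined in the thesis, and the fully recursive case, in which an Analysis step itself invokes a nested Progressive Argument, is simplified to a loop.

```python
def warrants_sufficient(evidence, needed):
    """The Conditional: is there enough warranted evidence to reason from?"""
    return len(evidence) >= needed

def analyze(raw_items):
    """The Analysis: derive one new piece of evidential data (or nothing)."""
    return f"evidence({raw_items.pop(0)})" if raw_items else None

def progressive_argument(raw_items, evidence, needed=3):
    """Alternate Conditional checks and Analysis steps until a Toulmin
    step (warranted evidence -> claim) is authorized."""
    sub_claims = []
    while not warrants_sufficient(evidence, needed):   # Conditional fails
        new = analyze(raw_items)                       # enter the Analysis procedure
        if new is None:
            return None, sub_claims                    # no claim can be justified
        sub_claims.append(new)                         # each analysis yields a sub-claim
        evidence = evidence + [new]                    # new evidence authorizes reasoning
    # Conditional satisfied: perform the Toulmin reasoning step.
    claim = f"claim based on {len(evidence)} warranted items"
    return claim, sub_claims

claim, subs = progressive_argument(["score data", "item data", "rater data"], [])
print(claim)   # the claim rests on the progression of its sub-claims
print(subs)
```

The point of the sketch is the control flow: reasoning never proceeds until the Conditional is satisfied, and the final claim carries with it the chain of sub-claims produced along the way.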
Hence the name Progressive Argument.

Thirdly, this thesis proposes a construct-centered, stage-based progressive view of test validity, Progressive Validity for short. According to this view, test validity refers to the progression of the validity of all of a test's stages; stage validity is defined as the extent to which the data produced at a stage are an accurate representation of the target construct of the test; and validation is the process of providing evidence to justify claims about stage validity or test validity.

The progressive view stresses that the data produced at every stage should be representative of the target construct and that all stages should be centered on the same construct. That is to say, when collecting data to validate a stage or the test, the evidence must be construct-centered. It also stresses that test validity is founded on stage validity. For a test to be valid, every stage has to be valid in the first place; if one stage is invalid, the whole test is invalid. Validity progression is not like percentage accumulation, however: the validity of a test is no higher than its lowest stage validity. Validity is by nature a matter of degree, but at the same time a stage or test can also be judged either "valid" or "invalid". If the degree is high enough to be acceptable, the stage or test is valid; conversely, if the degree is too low to be acceptable, the stage or test is invalid. By saying that a stage or test is valid or invalid, we do not propose an absolute assertion but rather a qualitative evaluation that expresses our fundamental attitude towards the test.

Another point of critical importance in the progressive view of validity is that validation should not be limited to score interpretation and use. Validity begins to emerge from the onset of test design.
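The two rules of Progressive Validity stated above (every stage must be valid for the test to be valid, and overall validity is capped by the lowest stage validity) amount to a simple computation. In the sketch below, the stage names, the 0-to-1 scale, and the acceptability threshold are all illustrative assumptions, not values from the thesis.

```python
THRESHOLD = 0.7  # assumed acceptability cut-off for a stage to count as "valid"

# Hypothetical stage validities for a hypothetical test.
stage_validity = {
    "design": 0.9,
    "item writing": 0.8,
    "administration": 0.6,   # one weak stage...
    "scoring": 0.85,
}

# Test validity is no higher than the lowest stage validity.
test_validity = min(stage_validity.values())

# The test is valid only if every stage is valid.
is_valid = all(v >= THRESHOLD for v in stage_validity.values())

print(test_validity)  # 0.6
print(is_valid)       # False: one invalid stage makes the whole test invalid
```

The contrast with "percentage accumulation" is visible here: strong stages cannot compensate for a weak one, because the aggregate is a minimum, not an average.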
Before the test is administered, there exists expected validity; after administration, actual validity comes into being. To guarantee that actual validity is desirable, expected validity has to be justified: plausible interpretations need to be achieved, appropriate decisions made, and intended and unintended consequences anticipated.

Fourthly, this thesis advocates, from the perspective of the cognitive processing of discourse information, an information-processing view of language use ability. It stresses that macro-level classifications of language ability components or cognitive processes are not enough; micro-level discourse and semantic analyses play a far more substantial role in language use. To attain a more accurate measure of language use ability, item writers need to consider to what degree candidates can process the information contained in the test with the expected accuracy and speed, and raters need to carry out in-depth analyses of the quality and quantity of the discourse generated by candidates.

Information processing requires a practical solution to quantifying and computing the semantic items in specific discourses. Inspired by systems science, informatics, and cybernetics, a system framework and an ability model of the cognitive processing of discourse information are constructed; under the guidance of Object-Oriented Knowledge Representation Theory, a semantic structure and a semantic unit for computing semantic items are proposed. On this basis, an algorithm for the cognitive quantification of discourse information and an item-writing method called Information-Maximization Item Development (IMID) are also developed.

Fifthly, this thesis includes two application examples to illustrate how to apply IMID and the Progressive Argument at the test development stage. In the first example, four multiple-choice items were created using the IMID method.
Each item has four options, and all the items are based on the same 150-word passage. The second example is an empirical study designed to develop falsification arguments against option guessability, for the purpose of controlling multiple-choice item-writing quality. The example investigates the listening and reading comprehension parts of three NME papers, with a total of 74 items and 259 options. The findings show that more effective measures need to be taken to better control option guessability.

Owing to the wide range of issues covered in this thesis, the present study refrains from digging deeper into the different stages of language testing. Meanwhile, the Progressive Argument model, the information-processing ability model, and the IMID method all await further research and feasibility testing.