Vocabulary evaluation is an indispensable field of second language acquisition research. Lexical richness, which includes lexical diversity, lexical sophistication, lexical density and lexical originality, is an important way to measure L2 learners' use of vocabulary in their spoken or written production. How to classify it and measure it accurately has become a key issue in second language vocabulary assessment in recent decades, and one of the most controversial topics is the measurement of L2 lexical sophistication. Three measures have been used in previous research on L2 lexical sophistication, namely the Lexical Frequency Profile (LFP), P_Lex and Advanced D. Each has been claimed to be the best, yet there is no consensus on which of them actually is. This study therefore compares the three measures in order to identify the best one and to offer insights into the measurement of lexical production. The thesis addresses the following research questions:
1) Of the three measures, which one is the most reliable for measuring L2 lexical sophistication?
2) Of the three measures, which one is the most valid for measuring L2 lexical sophistication?

The reliability of the three measures was examined using two sets of essays written by 50 third-year English majors. For the purpose of comparison, the learners were asked to write two argumentative essays two weeks apart, each completed in class within forty minutes.

The validity of the three measures was investigated in terms of construct validity and concurrent validity. Construct validity was examined through the effects of two factors on each measure: text length and the variety of advanced word types. Descriptive analyses of three essays from the Corpus for English Majors were conducted to provide evidence of construct validity.
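The reliability design above compares each learner's score on the first essay with the same learner's score on the second. A minimal sketch of the two statistics reported for that comparison follows, assuming each measure has already produced one score per essay. The helper names and the five learners' scores are hypothetical illustrations, not the study's data. Pearson's r asks whether learners keep their relative standing across the two essays, while the paired-samples t test asks whether the mean score shifts between them.

```python
import math
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two equal-length lists with at least 3 scores")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def paired_t(x, y):
    """Paired-samples t statistic and degrees of freedom for the differences x - y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical lexical sophistication scores on one measure for the same
# five learners (essay 1 vs essay 2); invented for illustration only.
essay1 = [2.1, 1.8, 2.5, 1.2, 1.9]
essay2 = [1.9, 1.7, 2.2, 1.3, 1.6]

if __name__ == "__main__":
    r = pearson_r(essay1, essay2)
    t, df = paired_t(essay1, essay2)
    print(f"test-retest r = {r:.3f}; paired t({df}) = {t:.3f}")
```

Only the test statistics are computed here. If SciPy is available, `scipy.stats.pearsonr` and `scipy.stats.ttest_rel` return each statistic together with its p-value. Because the two tests answer different questions, a measure can correlate significantly across occasions while its mean still differs between them.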
The scores of the three essays were 18, 13 and 9, representing "advanced", "intermediate" and "low" levels of writing proficiency, respectively. Computer simulations based on these three essays were then run to examine how the variety of advanced types affects the three measures. In the first simulation, the total number of advanced tokens was held constant while the number of advanced word types was reduced. In the second simulation, by contrast, the numbers of both advanced tokens and advanced types declined together in each version.

Eighty argumentative TEM-8 essays were randomly extracted from the Corpus for English Majors to investigate the concurrent validity of the three measures. Essay quality was indicated by the writing score (out of 20), which ranged from 18 down to 8, with most essays scoring 12 or 13. Based on these scores, the essays were divided into two groups: a low writing proficiency group (group 1, 42 students) with scores below 13, and a high writing proficiency group (group 2, 38 students) with scores above 12.

The major findings are as follows.

As for reliability, Pearson correlation analysis shows significant correlations between the two sets of essays in terms of L2 lexical sophistication (LFP: r1 = .327, p1 = .020 < .05; P_Lex: r2 = .308, p2 = .030 < .05; Advanced D: r3 = .441, p3 = .001 < .05). However, a paired-samples t test also reveals a significant difference in mean lexical sophistication scores between the two sets of essays (LFP: t1 = 4.804, p1 < .001; P_Lex: t2 = 8.837, p2 < .001; Advanced D: t3 = -2.742, p3 = .008 < .05).

As regards construct validity, the text-length analysis shows that the minimum text lengths at which the LFP, the P_Lex and the Advanced D become reliable are 200, 120 and 120 words, respectively. However, the Advanced D is considered better than the P_Lex.
This is because, at the low and intermediate levels, the Advanced D stabilizes quickly from a text length of 60 words, whereas the P_Lex stabilizes only from 120 words. In testing the effect of the variety of advanced types, the lexical sophistication scores of the three essays, as measured by the LFP and the P_Lex, remain almost unchanged in the first simulation and decrease or fluctuate in the second. By contrast, the scores measured by the Advanced D decline, to different degrees, in both simulations. This suggests that the Advanced D is superior to the LFP and the P_Lex in capturing the effect of the variety of advanced types on lexical sophistication.

As far as concurrent validity is concerned, each of the three measures shows a positive but weak correlation with writing quality (LFP: r1 = .248, p1 = .027 < .05; P_Lex: r2 = .253, p2 = .024 < .05; Advanced D: r3 = .257, p3 = .021 < .05). A further correlation analysis shows that the lexical sophistication scores of all three measures are significantly related to lexical diversity scores: the correlations are moderate for the LFP and the Advanced D and weak for the P_Lex (LFP: r1 = .332, p1 = .003 < .05; Advanced D: r3 = .340, p3 = .002 < .05; P_Lex: r2 = .236, p2 = .035 < .05).

To sum up, the Advanced D and the P_Lex are better than the LFP at controlling the effect of text length, and the Advanced D is superior to both the LFP and the P_Lex in capturing the effect of the variety of advanced types on L2 lexical sophistication. For these reasons, the Advanced D is judged the best of the three measures. This thesis contributes to the methodology of measuring L2 lexical sophistication and offers practical implications for vocabulary evaluation.
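Why token-based scores can stay flat in the first simulation while a type-sensitive score falls can be illustrated with a toy version of that simulation. This is a sketch under loud assumptions. `ADVANCED` is a hypothetical list of advanced words. `advanced_token_share` is an LFP-style token proportion. `plex_lambda` simplifies P_Lex to the mean number of advanced tokens per full 10-word segment (the Poisson maximum-likelihood estimate, not the P_Lex program's own fitting procedure). `advanced_ttr` is a simple type-sensitive stand-in, not the Advanced D itself. `collapse_advanced_types` keeps every advanced token in place but reuses fewer advanced types, mirroring the first simulation.

```python
from statistics import mean

# Hypothetical advanced-word list and text, for illustration only.
ADVANCED = {"scrutinize", "articulate", "coherent", "ambiguous", "contentious"}
TEXT = ("we must scrutinize the evidence and articulate a coherent view "
        "of this ambiguous and contentious issue before we can act")


def advanced_token_share(tokens, advanced):
    """LFP-style stand-in: share of all tokens that are advanced (token-based)."""
    return sum(t in advanced for t in tokens) / len(tokens)


def plex_lambda(tokens, advanced, seg_len=10):
    """Simplified P_Lex-style lambda: mean advanced tokens per full segment.

    An incomplete final segment is dropped (assumption). The mean is the Poisson
    maximum-likelihood estimate, not the P_Lex program's own fitting procedure.
    """
    segments = [tokens[i:i + seg_len]
                for i in range(0, len(tokens) - seg_len + 1, seg_len)]
    return mean(sum(t in advanced for t in seg) for seg in segments)


def advanced_ttr(tokens, advanced):
    """Type-sensitive stand-in: advanced types / advanced tokens (NOT Advanced D)."""
    adv = [t for t in tokens if t in advanced]
    return len(set(adv)) / len(adv)


def collapse_advanced_types(tokens, advanced, keep):
    """First-simulation sketch: keep every advanced token, reuse only `keep` types."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    order = []  # advanced types in order of first appearance
    for t in tokens:
        if t in advanced and t not in order:
            order.append(t)
    kept = order[:keep]
    out, i = [], 0
    for t in tokens:
        if t in advanced and t not in kept:
            out.append(kept[i % len(kept)])  # rename to a retained advanced type
            i += 1
        else:
            out.append(t)
    return out


if __name__ == "__main__":
    tokens = TEXT.split()
    collapsed = collapse_advanced_types(tokens, ADVANCED, keep=2)
    for label, toks in (("original", tokens), ("simulation 1", collapsed)):
        print(label,
              advanced_token_share(toks, ADVANCED),
              plex_lambda(toks, ADVANCED),
              advanced_ttr(toks, ADVANCED))
```

Because the collapse only renames advanced tokens without moving or removing them, every score that depends on token counts alone is unchanged, and only the type-based ratio drops: here the five advanced tokens end up sharing two types, so the ratio falls from 1.0 to 0.4.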