| Text readability research quantitatively measures how difficult a text is to read; its core is the scientific measurement of the factors that influence textual difficulty. Readability research has important implications for reading theory, language teaching and testing, and the grading of reading materials. Studying the readability of reading texts in the new Chinese Proficiency Test (HSK) and constructing a readability formula has practical value for selecting HSK reading materials, for HSK preparation by learners of Chinese as a second language, and for teaching Chinese as a foreign language. First, this study takes the new HSK level 4-6 reading texts as research materials and builds an HSK reading text corpus containing 174 paragraphs, 348 articles, and 221,100 words. Drawing on existing research on the characteristics of the new HSK reading texts and on the readability of Chinese texts, 34 linguistic feature indicators that affect reading difficulty are selected at three levels: Chinese characters, vocabulary, and syntax. Next, the Spearman correlation coefficient is used to measure the correlation between these 34 indicators and the HSK difficulty level, quantifying the influence of each text feature on the difficulty level. The results show that indicators such as the number of B-level words and the number of few-stroke characters are negatively correlated with text difficulty. Features such as the number of words in the text, the number of super-class words, and the number of nouns have the highest correlation with the difficulty level, and the correlation is positive. Finally, with the probability of a change in HSK difficulty level as the dependent variable and the 12 feature indicators that passed significance testing as independent variables, a multivariate ordinal logistic regression model is used to construct a new readability formula for HSK level 4-6 reading texts: Probability of change in HSK text difficulty = 3.763 + 0.309 × number of words - 0.25 × number of first-class words - 0.205 × number of prepositions + 0.165 × number of second-class words + 0.084 × number of super-class words + 0.066 × single sentences + 0.308 × average sentence length. The validity of the formula was tested on a validation text data set, and its prediction accuracy was 81.42%, indicating that the formula predicts the readability of HSK reading texts fairly accurately. From a practical point of view, correlation analysis can uncover the key features that affect the difficulty of HSK reading texts and highlight the key points for teaching Chinese as a foreign language and for HSK reading test preparation. The readability formula enables automatic identification of a text's difficulty level and provides an important reference for selecting HSK reading test materials and for HSK preparation by learners of Chinese as a second language. |
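
To make the readability formula concrete, the right-hand side reported in the abstract can be evaluated for a single text as in the minimal sketch below. The coefficients are copied from the formula; the feature names (`words`, `first_class_words`, and so on) and the function `readability_score` are hypothetical labels chosen for illustration. The abstract does not state how features are scaled or how the score maps to HSK levels 4-6, so the sketch stops at the linear score and assumes nothing about cut-points.

```python
# Coefficients copied from the formula in the abstract. The dictionary keys
# are hypothetical names introduced for this sketch only.
INTERCEPT = 3.763
COEFFICIENTS = {
    "words": 0.309,
    "first_class_words": -0.25,
    "prepositions": -0.205,
    "second_class_words": 0.165,
    "super_class_words": 0.084,
    "single_sentences": 0.066,
    "avg_sentence_length": 0.308,
}


def readability_score(features):
    """Evaluate the right-hand side of the reported formula for one text.

    `features` maps each key of COEFFICIENTS to a numeric value. How the
    values should be scaled is not given in the abstract (assumption: the
    caller supplies them in whatever units the original study used).
    """
    missing = set(COEFFICIENTS) - set(features)
    if missing:
        raise KeyError(f"missing features: {sorted(missing)}")
    # Intercept plus the weighted sum of the seven features.
    return INTERCEPT + sum(
        COEFFICIENTS[name] * features[name] for name in COEFFICIENTS
    )
```

A caller would pass one dictionary per text, for example `readability_score({"words": w, "first_class_words": f, ...})`, with all seven keys present. Turning the score into a predicted level would require the thresholds of the ordinal logistic model, which the abstract does not report.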