
A Research Based On Decision Tree For Evaluating English Texts Difficulty

Posted on: 2019-10-12 | Degree: Master | Type: Thesis
Country: China | Candidate: Y B Fu
GTID: 2405330548967036 | Subject: Education Technology
Abstract/Summary:
Reading is widely regarded as one of the best ways to improve English proficiency. With the rapid development of Internet technology and educational informatization, more and more English reading materials are available online. However, learners easily get lost when selecting and filtering massive amounts of online reading resources, which makes it difficult to find suitable materials accurately and efficiently. How to provide learners with personalized reading materials that match their ability levels and meet their learning needs has therefore received growing attention in educational technology. To provide learners with reading materials that match their competency levels, the difficulty of those materials must first be evaluated.

This paper focuses on textual factors, the most important factors affecting the difficulty of English reading materials. Research on text difficulty (also known as readability) dates back about a hundred years, yet few methods for measuring it exist. According to the existing literature, the main methods for measuring text difficulty are the level assessment method, the readability formula method and machine learning algorithms. The level assessment method is too subjective. The formula method quantifies readability objectively, but it uses few measurement variables and lacks a rigorous, reasoned modeling process. Machine learning is a comparatively rigorous approach, but it has so far been used little in this area and has produced few concrete results. Among machine learning algorithms, decision trees can produce feasible and effective results on relatively large datasets in a relatively short time, and their results are easy to understand and interpret. This paper therefore proposes a text difficulty evaluation method based on the decision tree classification algorithm, aiming to improve the accuracy and scientific rigor of difficulty assessment for English texts.

The main contents of the paper are as follows. First, the paper introduces the research methods it uses: literature research, mathematical statistics and the decision tree classification algorithm from machine learning. Second, through experiments it selects, from 26 indicators that may affect text difficulty, the eight most influential indicators, which serve as the attributes of the training data set for building the decision tree. The eight indicators are: Total Words, Families, Number of PETS 1, Number of Baseword 1, Number of PETS 2, Average sentence length, Number of PETS 3 and Number of Clauses. The difficulty levels are defined as Junior middle, Junior high, Senior middle, Senior high, College-1 and College-2. Next, 360 texts are selected as the training data set to construct and prune the decision tree; they are drawn from the junior and senior high school textbooks published by People's Education Press, 21st Century College English and New Horizon College English (second edition). Finally, 120 texts from four other textbooks (the junior high school textbooks published by Shandong Education Press, the New Century senior high school English textbooks, College Intensive English and New Horizon College English (third edition)) serve as the test data set for verifying the generated model. The decision tree model built from the training data set correctly classifies 92.50% of the test texts (111 of 120), which demonstrates the validity of the generated model.

The innovations of this paper are:
(1) Compared with the traditional level assessment method, which relies mainly on experts' subjective judgment, the proposed method is more objective and scientific.
(2) The traditional formula method uses few measurement variables and usually quantifies text difficulty through a linear relationship, whereas the decision tree evaluates text difficulty from multiple aspects.
(3) The decision tree classification algorithm had not been used in previous studies that apply machine learning to evaluate the difficulty of English texts. Moreover, the influencing factors in those studies were defined subjectively by experienced experts, whereas the influencing factors in this study are selected by machine learning algorithms.
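The abstract does not say which decision tree variant, split criterion, attribute-selection procedure or pruning method the thesis uses. The sketch below therefore assumes an ID3/C4.5-style tree that chooses binary threshold splits on numeric indicators by information gain. It also ranks candidate indicators by their best single-split information gain, which is only one possible way to pick the "most influential" attributes. All function names, the three feature keys, the simplified three-level labels and every number are hypothetical illustration data, not the thesis's corpus or results. Pruning is approximated here by a depth limit (pre-pruning); the thesis's own pruning step is not reproduced.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(rows, labels, feature):
    """Best binary split 'feature <= t' by information gain; returns (gain, t)."""
    base, n = entropy(labels), len(labels)
    values = sorted({r[feature] for r in rows})
    best_gain, best_t = 0.0, None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # candidate threshold between adjacent distinct values
        left = [y for r, y in zip(rows, labels) if r[feature] <= t]
        right = [y for r, y in zip(rows, labels) if r[feature] > t]
        gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_gain, best_t

def rank_features(rows, labels, features):
    """Rank candidate indicators by their best single-split information gain."""
    return sorted(features, key=lambda f: best_split(rows, labels, f)[0], reverse=True)

def build_tree(rows, labels, features, depth=0, max_depth=3, min_samples=2):
    """Grow a binary tree; max_depth and min_samples act as simple pre-pruning."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or depth >= max_depth or len(labels) < min_samples:
        return majority
    best_gain, best_t, best_f = 0.0, None, None
    for f in features:
        gain, t = best_split(rows, labels, f)
        if gain > best_gain:
            best_gain, best_t, best_f = gain, t, f
    if best_f is None:  # no split reduces entropy: make a leaf
        return majority
    go_left = [r[best_f] <= best_t for r in rows]

    def subset(side):
        return ([r for r, g in zip(rows, go_left) if g == side],
                [y for y, g in zip(labels, go_left) if g == side])

    return {
        "feature": best_f,
        "threshold": best_t,
        "left": build_tree(*subset(True), features, depth + 1, max_depth, min_samples),
        "right": build_tree(*subset(False), features, depth + 1, max_depth, min_samples),
    }

def predict(tree, row):
    """Follow threshold tests from the root down to a leaf label."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree

def accuracy(tree, rows, labels):
    """Fraction of texts whose predicted level matches the reference level."""
    return sum(predict(tree, r) == y for r, y in zip(rows, labels)) / len(labels)

# Hypothetical toy data: feature keys, values and levels are invented for illustration.
FEATURES = ["total_words", "avg_sentence_length", "clauses"]
train_rows = [
    {"total_words": 150, "avg_sentence_length": 8.0, "clauses": 3},
    {"total_words": 180, "avg_sentence_length": 9.5, "clauses": 9},
    {"total_words": 320, "avg_sentence_length": 13.0, "clauses": 4},
    {"total_words": 350, "avg_sentence_length": 14.5, "clauses": 8},
    {"total_words": 600, "avg_sentence_length": 19.0, "clauses": 5},
    {"total_words": 650, "avg_sentence_length": 21.0, "clauses": 7},
]
train_labels = ["Junior", "Junior", "Senior", "Senior", "College", "College"]
test_rows = [
    {"total_words": 200, "avg_sentence_length": 10.0, "clauses": 3},
    {"total_words": 400, "avg_sentence_length": 15.0, "clauses": 6},
    {"total_words": 700, "avg_sentence_length": 22.0, "clauses": 9},
    {"total_words": 240, "avg_sentence_length": 16.0, "clauses": 7},
]
test_labels = ["Junior", "Senior", "College", "Senior"]

tree = build_tree(train_rows, train_labels, FEATURES)
print(rank_features(train_rows, train_labels, FEATURES))
print(tree)
print(accuracy(tree, test_rows, test_labels))  # 0.75 on this toy test set
```

The accuracy function computes the same kind of figure the thesis reports: 92.50% on 120 test texts corresponds to 111 correctly classified texts. Applying the sketch to real data would mean computing all 26 candidate indicators for each text, keeping the top eight, and training on the six difficulty levels. A post-pruning pass would replace the fixed depth limit.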
Keywords/Search Tags:Text Difficulty, Readability, Decision Tree, Influencing Factors, Attribute Selection