
Research On Text Readability Assessment Based On Neural Network Models

Posted on: 2024-05-29
Degree: Master
Type: Thesis
Country: China
Candidate: W B Li
GTID: 2568307070499074
Subject: Software engineering
Abstract/Summary:
Readability assessment analyzes the difficulty of a given text to determine its appropriate reader level. Several studies have demonstrated that easy-to-read texts improve comprehension, enhance reading enjoyment, increase reading speed, and encourage readers to continue reading. This paper uses neural network models to assess the readability of Chinese texts at both the passage and book levels.

For passage-level readability assessment, we propose a novel approach that combines linguistic and neural network features. First, we design rich traditional linguistic features and train the Correlation Explanation (CorEx) topic model in a semi-supervised way, using a vocabulary graded by difficulty level, to obtain difficulty-aware topic features. Second, we use a pre-trained model to obtain neural network features and apply feature projection to extract the linguistic features that are orthogonal to the neural network features. Finally, we fuse the two feature sets to obtain the final passage-level text representation. In addition, we design a novel length-balanced loss to handle the uneven length distribution of readability assessment data. We conducted a series of experiments on three English benchmark datasets and one Chinese textbook dataset. The results show that our model outperforms the traditional model and the pre-trained model. Moreover, our model obtains results comparable to those of human experts in a consistency test.

For book-level readability assessment, we propose a two-stage separated modeling approach with difficulty-aware segment pre-training and multi-view difficulty representation. In the first stage, we split the book data into fixed-length segments and assign each segment a single difficulty label, obtained by applying an unsupervised clustering algorithm to the segment's linguistic features. The labeled segments are then used to fine-tune the pre-trained model, enhancing its ability to represent difficulty knowledge. In the second stage, we use the model trained in the first stage to run inference and extract the difficulty semantic features of the segments, which yields a new representation of the book as a sequence of segment-level difficulty semantic features. Because books are long, they often contain information at several distinct levels of difficulty. We therefore set multiple difficulty representation vectors, each of which captures a different level of difficulty information in the book through cross-attention. Finally, we merge these difficulty representations to generate the final book-level text representation. We assembled a graded children's book dataset comprising popular stories and renowned novels and conducted detailed experiments on it. The results show that our proposed model significantly outperforms traditional machine learning methods and mainstream pre-trained models.

In summary, this paper proposes methods for assessing text readability at both the passage and book levels, and detailed experiments and analyses demonstrate their effectiveness. However, because annotated data is limited, there is still room to improve the performance of text readability assessment. The research outcomes have direct applications in educational settings, helping teachers and parents select reading materials of suitable difficulty for children.
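
The feature-projection step in the passage-level method can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: the linguistic and neural feature vectors are taken to live in the same dimension already (for example after a learned linear layer), the projection is a plain per-sample vector projection, and fusion is done by concatenation. The function names `orthogonal_component` and `fuse` are hypothetical.

```python
import numpy as np


def orthogonal_component(ling, neural, eps=1e-12):
    """Return the part of each linguistic vector orthogonal to its neural vector.

    ling, neural: arrays of shape (batch, dim), assumed to share one space.
    """
    # Per-row projection coefficient: <ling, neural> / <neural, neural>.
    dot = np.sum(ling * neural, axis=1, keepdims=True)
    norm_sq = np.sum(neural * neural, axis=1, keepdims=True)
    parallel = (dot / (norm_sq + eps)) * neural
    # Removing the parallel part leaves only information the neural
    # features do not already carry along their own direction.
    return ling - parallel


def fuse(ling, neural):
    """Concatenate neural features with the orthogonal linguistic part
    (concatenation is an assumed fusion choice, not taken from the thesis)."""
    return np.concatenate([neural, orthogonal_component(ling, neural)], axis=1)


if __name__ == "__main__":
    ling = np.array([[1.0, 1.0]])
    neural = np.array([[1.0, 0.0]])
    print(orthogonal_component(ling, neural))
    print(fuse(ling, neural).shape)
```

The point of the projection in this sketch is that the retained linguistic component has a near-zero dot product with the neural vector, so the fused representation adds complementary rather than redundant information.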
Keywords/Search Tags:Readability assessment, Text classification, Long text representation, Deep learning