Research On Readability Prediction Methods Based On Linear Regression For Chinese Documents

Posted on:2016-11-02

Degree:Master

Type:Thesis

Country:China

Candidate:G Sun

GTID:2308330461956535

Subject:Computer software and theory

Abstract/Summary:

With the development of the Internet, more and more information has become available online. People typically submit a query to a search engine, which returns the most relevant results. However, readers differ in their reading and comprehension abilities. Finding web documents suited to each person's reading ability has therefore become an important problem, and readability prediction plays a central role in solving it. Predicting the readability of documents accurately is thus of real significance.

Readability prediction measures a document's reading difficulty and has applications in many domains, such as language education and information retrieval. To date, readability formulae are the most widely used methods for readability assessment; they are usually built with linear regression models over simple document features. Recent research has employed machine learning methods and designed new, more complex features that draw on achievements in other research areas, such as natural language processing, to improve the performance of readability prediction. These newer methods have shown superiority over the classical readability formulae. However, the mediocre performance of the readability formulae may be due to their limited use of readability features and to the specific training corpus used.

This thesis summarizes and analyzes existing studies on document readability prediction and presents our method: a linear regression model incorporating feature selection for readability prediction of Chinese documents. We then conduct a set of empirical studies to evaluate the effectiveness of this method.

The main contributions of this thesis are summarized as follows:

1. Review existing methods for document readability prediction. First, the issues in document readability prediction are introduced, including the basic concept of document readability and the definitions of readability prediction. Then the existing readability prediction methods are summarized and classified into four categories: traditional readability formulae, methods based on cognitive theory, methods based on language models, and methods based on machine learning techniques. Finally, each of the four categories is introduced in detail.

2. Propose a method (a linear regression model incorporating feature selection) for readability prediction of Chinese documents. First, the motivation for the proposed method is introduced. Then the framework of the method is explained from three aspects: Chinese feature computing, feature selection, and linear regression models. Finally, the design and implementation of the method are presented.

3. Conduct empirical studies to assess the effectiveness of our method. First, two main research questions for evaluating the effectiveness of the method are discussed. Second, the dataset used in the experiments is introduced. Then the experimental design is described, including the experimental settings and the performance evaluation metrics. Finally, the experimental results are analyzed to verify the effectiveness of the method proposed in this thesis.
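
The general idea of combining feature selection with a linear regression model can be sketched as follows. This is only an illustrative sketch and not the thesis's implementation: the abstract does not state which features or which selection technique were used, so the three toy features (average sentence length, ratio of uncommon characters, and an unrelated noise column), the correlation-based filter, and all function names here are assumptions made for the example.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return 0.0 if vx == 0 or vy == 0 else cov / sqrt(vx * vy)

def select_features(X, y, k):
    """Assumed filter-style selection: keep the k columns most correlated with y."""
    cols = range(len(X[0]))
    scores = [abs(pearson([row[j] for row in X], y)) for j in cols]
    ranked = sorted(cols, key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_linear(X, y):
    """Ordinary least squares via the normal equations; returns [intercept, w1, ...]."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve(XtX, Xty)

def predict(w, x):
    """Apply the fitted linear model to one feature vector."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

# Hypothetical toy data: [avg sentence length, uncommon-character ratio, noise].
X = [
    [8, 0.02, 3], [12, 0.05, 1], [15, 0.04, 4],
    [20, 0.10, 1], [24, 0.12, 5], [30, 0.20, 2],
]
# Toy difficulty levels generated as 0.5 + 0.2 * length + 10 * ratio.
y = [2.3, 3.4, 3.9, 5.5, 6.5, 8.5]

selected = select_features(X, y, k=2)
X_selected = [[row[j] for j in selected] for row in X]
weights = fit_linear(X_selected, y)
print("selected feature indices:", selected)
print("predicted level:", predict(weights, [18, 0.08]))
```

Because the toy labels were generated from a linear function of the first two features, the filter discards the noise column and the regression recovers that function. On a real corpus, the feature set, the selection criterion, and the evaluation metrics would all need to be chosen and validated empirically.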

Keywords/Search Tags:

Readability prediction, Chinese documents, Linear regression models, Feature selection, Readability formulae

Related items

1	A Study Of Readability Based Information Retrieval Model
2	Research On Text Readability Assessment Based On Neural Network Models
3	Research And Implementation Of Readability Oriented Word Embedding Technology
4	Icon In Human-computer Interaction Readability
5	Research On The Factors Influencing The Readability Of Micro-financial Reports In The New Media Era
6	Research On The Grading Of Chinese Children's Books
7	Bridging the second digital divide: Readability of news Web sites
8	Research On Text Representation Technologies For Readability Assessment
9	A Study On The Readability Of Science News
10	Research On Simplification Of Automatic Chinese Text Based On Readability Evaluation