
Research On Readability Prediction Methods Based On Linear Regression For Chinese Documents

Posted on: 2016-11-02
Degree: Master
Type: Thesis
Country: China
Candidate: G Sun
GTID: 2308330461956535
Subject: Computer software and theory
Abstract/Summary:
With the development of the Internet, more and more information has become available online. People usually submit a query to a search engine, which presents the most relevant results. However, reading and comprehension abilities vary from person to person. Finding web documents that suit an individual's reading ability has therefore become an important problem, and predicting the readability of documents plays a central role in solving it. Accurate readability prediction is thus of considerable significance.

Readability prediction measures the reading difficulty of a document and has applications in many domains, such as language education and information retrieval. To date, readability formulae are the most widely used methods for readability assessment; they are usually built with linear regression models over simple document features. Recent research has employed machine learning methods and designed new, more complex features, drawing on advances in other research areas such as natural language processing, to improve the performance of readability prediction. These newer methods have shown superiority over the classical readability formulae. However, the mediocre performance of the readability formulae may be due to their limited use of readability features and to the specific training corpora used.

This thesis summarizes and analyzes existing studies on document readability prediction and presents our method: a linear regression model incorporating feature selection for readability prediction of Chinese documents. We then conduct a set of empirical studies to evaluate the effectiveness of this method.

The main contributions of this thesis are summarized as follows:

1. Review existing methods for document readability prediction. First, the issues involved in document readability prediction are introduced, including the basic concept of document readability and the definitions of readability prediction. Then the existing readability prediction methods are summarized and classified into four categories: traditional readability formulae, methods based on cognitive theory, methods based on language models, and methods based on machine learning techniques. Finally, each of the four categories is introduced in detail.

2. Propose a method (a linear regression model incorporating feature selection) for readability prediction of Chinese documents. First, the motivation for the proposed method is introduced. Then the framework of our method is explained from three aspects: Chinese feature computation, feature selection, and linear regression models. Finally, the design and implementation of our method are presented.

3. Conduct empirical studies to assess the effectiveness of our method. First, two main research questions for evaluating the effectiveness of our method are discussed. Second, the dataset used in the experiments is introduced. Then the experimental design is described, including the experimental settings and the performance evaluation metrics. Finally, the experimental results are analyzed to verify the effectiveness of the method proposed in this thesis.
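The idea of combining feature selection with a linear regression model can be illustrated with a minimal sketch. The abstract does not say which selection strategy or which Chinese features the thesis uses, so everything specific here is an assumption made for illustration: greedy forward selection scored on a held-out split, the three feature columns (imagined as average sentence length, average strokes per character, and an irrelevant noise column), the toy data, and the helper names `fit_ols`, `predict`, `mse`, `take`, and `forward_select`.

```python
def fit_ols(rows, targets):
    """Ordinary least squares with an intercept; returns [b0, b1, ...].

    Solves the normal equations by Gaussian elimination with partial
    pivoting. Assumes the features are not perfectly collinear.
    """
    design = [[1.0] + list(r) for r in rows]  # leading 1 for the intercept
    k = len(design[0])
    a = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    b = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, k):
            factor = a[r][col] / a[col][col]
            for c in range(col, k):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    coef = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        tail = sum(a[i][j] * coef[j] for j in range(i + 1, k))
        coef[i] = (b[i] - tail) / a[i][i]
    return coef


def predict(coef, row):
    return coef[0] + sum(c * x for c, x in zip(coef[1:], row))


def mse(coef, rows, targets):
    return sum((predict(coef, r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def take(rows, cols):
    """Keep only the feature columns listed in cols, in that order."""
    return [[r[c] for c in cols] for r in rows]


def forward_select(train_x, train_y, valid_x, valid_y, max_features):
    """Greedy forward selection: add the feature that most lowers validation MSE."""
    selected, best_err = [], float("inf")
    remaining = list(range(len(train_x[0])))
    while remaining and len(selected) < max_features:
        scored = []
        for f in remaining:
            cols = selected + [f]
            coef = fit_ols(take(train_x, cols), train_y)
            scored.append((mse(coef, take(valid_x, cols), valid_y), f))
        err, f = min(scored)
        if err >= best_err:  # no candidate improves the held-out error
            break
        selected.append(f)
        remaining.remove(f)
        best_err = err
    return selected, best_err


# Toy data (hypothetical): columns are [sentence length, strokes per char, noise].
# The difficulty score depends only on the first two columns.
train_x = [[1, 1, 1], [2, 2, -1], [3, 3, 1], [1, 4, -1], [2, 5, -1], [3, 6, 1]]
valid_x = [[1, 2, -1], [3, 5, 1], [2, 1, 1], [2, 6, -1]]
train_y = [1 + 0.5 * a + 2 * b for a, b, _ in train_x]
valid_y = [1 + 0.5 * a + 2 * b for a, b, _ in valid_x]

selected, err = forward_select(train_x, train_y, valid_x, valid_y, max_features=2)
print("selected feature columns:", selected, "validation MSE:", err)
```

Scoring candidates on held-out data rather than on the training data is one way to keep the selection from rewarding features that only fit noise; a real study would more likely use cross-validation and a much larger feature pool than this toy setup.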
Keywords/Search Tags:Readability prediction, Chinese documents, Linear regression models, Feature selection, Readability formulae