Research On Simplification Of Automatic Chinese Text Based On Readability Evaluation

Posted on: 2022-02-17
Degree: Master
Type: Thesis
Country: China
Candidate: S Yang
Full Text: PDF
GTID: 2518306575965679
Subject: Computer Science and Technology
Abstract/Summary:
In the era of big data, the growing volume of news and professional texts makes it harder for non-native speakers and people with cognitive impairments to understand what they read. At the same time, as research on natural language understanding deepens, automatic text simplification also helps improve the performance of natural language processing tasks such as machine translation and information extraction. As a result, automatic text simplification is attracting more scholarly attention. Some methods and techniques have been proposed in this field, but they are far from mature: most research targets English, with relatively little work on other languages; public corpora are lacking; the available methods are limited; and the simplification results are unsatisfactory. Building on existing research on text simplification, this thesis addresses the shortage of research on Chinese text. Using a newly constructed Chinese readability formula, it studies automatic simplification of Chinese text at two levels: vocabulary and syntax. The main contributions are as follows:

1. A new Chinese readability formula is proposed. Readability assessment is key to judging whether a text needs to be simplified and how effective the simplification is. To address this problem, the thesis constructs a readability formula for Chinese text. First, on a selected Chinese corpus, data preprocessing, Chinese feature extraction, collinearity diagnosis, and classification are performed; second, linear regression analysis is carried out on the classified features, and the best-fitting linear regression model is selected as the readability formula; finally, the formula is compared with existing representative readability formulas. The results show that the proposed formula is more effective and can be used to evaluate text readability.

2. A Chinese automatic text simplification method is constructed based on Chinese syntactic and lexical features. First, based on Chinese syntactic features, compound sentences are detected among the sentences to be simplified, and long, difficult sentences are split to achieve syntactic simplification; second, based on Chinese lexical features, abbreviation expansion, replaceable-word identification, and candidate-word selection are performed on the syntactically simplified text; then the candidate words are ranked by word frequency, the high-frequency candidates are used to compute the probability of each substitution sentence for the replaceable words in the text, and the substitution sentence with the highest probability is selected as the simplified result. Evaluation of the simplified text shows that the method is effective for Chinese text simplification: compared with the original text, the readability score improves by 3.68 and the SARI score reaches 36.02, which can serve as a reference for research on Chinese automatic text simplification.

3. A prototype system for Chinese automatic text simplification is constructed. Built on a Chinese corpus and NLP techniques, the prototype identifies and simplifies complex text and ultimately produces sentences that match the reading ability of the target readers. The system can both simplify Chinese text and automatically evaluate the result. It provides an experimental environment for verifying the feasibility and effectiveness of the proposed method, and a reference for later development of a practical system.
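
The lexical substitution step described in the abstract (rank candidate words by frequency, then keep the substitution sentence with the highest probability) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the abstract does not say which probability model is used, so an add-one smoothed bigram model stands in for it, and the function names, the `top_k` cutoff, the toy corpus, and the candidate list are all hypothetical.

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Count unigrams and bigrams over tokenized sentences with boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        tokens = ["<s>"] + sent + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def sentence_logprob(tokens, unigrams, bigrams):
    """Add-one smoothed bigram log-probability (assumed stand-in model)."""
    vocab = len(unigrams)
    padded = ["<s>"] + tokens + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
        for a, b in zip(padded, padded[1:])
    )


def simplify_word(tokens, index, candidates, freq, unigrams, bigrams, top_k=2):
    """Replace tokens[index] with the candidate that yields the most probable sentence."""
    # Step 1: rank candidates by word frequency and keep the top_k (hypothetical cutoff).
    ranked = sorted(candidates, key=lambda w: freq.get(w, 0), reverse=True)[:top_k]

    # Step 2: build one substitution sentence per remaining candidate.
    def substituted(word):
        return tokens[:index] + [word] + tokens[index + 1:]

    # Step 3: keep the substitution sentence with the highest probability.
    best = max(ranked, key=lambda w: sentence_logprob(substituted(w), unigrams, bigrams))
    return substituted(best)


# Toy, pre-segmented corpus (made up for illustration only).
corpus = [
    ["他", "买", "了", "书"],
    ["我", "买", "了", "票"],
    ["她", "买", "了", "菜"],
    ["他", "得到", "了", "奖"],
]
unigrams, bigrams = train_bigram(corpus)
freq = unigrams  # assumption: word frequencies come from the same corpus

sentence = ["他", "购置", "了", "书"]
candidates = ["买", "得到", "采购"]  # hypothetical candidates for "购置"
print(simplify_word(sentence, 1, candidates, freq, unigrams, bigrams))
```

On this toy data the frequency ranking discards "采购" (unseen in the corpus), and the bigram score then prefers "买" over "得到", so the printed sentence is `['他', '买', '了', '书']`. A real system would need word segmentation, a much larger corpus, and a stronger language model.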
Keywords/Search Tags: automatic text simplification, vocabulary simplification, syntax simplification, readability evaluation