Research on Automatic CET-4 Writing Generation

Posted on: 2015-03-03  Degree: Master  Type: Thesis
Country: China  Candidate: H T Xing  Full Text: PDF
GTID: 2298330422491913  Subject: Computer Science and Technology
Abstract/Summary:
In recent years, products based on Natural Language Processing (NLP) technologies, such as Siri, have gradually entered everyday life and have raised people's enthusiasm for NLP. In this research, we explore how existing NLP technologies can be used to generate CET-4 compositions automatically.

The work covers three aspects, summarized as follows.

Firstly, we construct a repository of candidate compositions. We collect the compositions from web portals and search engines, and then build a retrieval system on top of Lucene.

Secondly, we explore techniques for generating a composition. After surveying existing methods, we decide to extract sentences from the repository, which contains a large number of candidate compositions. Guided by this strategy, we present three techniques, based on word frequency, a centroid cluster, and Latent Dirichlet Allocation (LDA) respectively. Given a title, the system outputs a composition of between 120 and 150 words. In our evaluation, each technique produces both good and poor compositions, but the compositions generated by the LDA-based technique are better overall. We also run experiments on different repositories, and the results show that using model compositions as candidates yields a much better composition than using compositions collected from the Internet.

Thirdly, we explore techniques for scoring a composition automatically. The scoring considers four kinds of features: content, grammar, spelling, and consistency. To evaluate content, we use the co-occurrence of N-grams and skip-grams, and the longest common subsequence (LCS), between the composition and the scoring samples. To evaluate grammar and spelling, we use the total numbers of grammatical errors and spelling errors. To evaluate consistency, we consider the words that overlap between sentences and between paragraphs, together with the LSA value of the composition and the different connectives it uses. Finally, we fit a regression model; the correlation coefficient between the human scores and the machine scores is 0.83, which is a good result.
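
The abstract names the techniques but gives no implementation details, so the following Python sketch only illustrates two of the ideas in their simplest form: sentence extraction ranked by word frequency under a word budget, and the three content-overlap features (N-gram co-occurrence, skip-bigrams, LCS) computed against one scoring sample. All function names, the tokenizer, the average-frequency sentence score, and the greedy selection rule are assumptions made for illustration; they are not taken from the thesis.

```python
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    # Assumed tokenizer: lowercase alphabetic words (apostrophes allowed).
    return re.findall(r"[a-z']+", text.lower())


def split_sentences(text):
    # Naive split after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_by_word_frequency(candidates, max_words=150):
    # Score each sentence by the average repository frequency of its words,
    # then greedily keep the best sentences that still fit the word budget.
    sentences = [s for doc in candidates for s in split_sentences(doc)]
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    chosen, total = [], 0
    for sentence in sorted(sentences, key=score, reverse=True):
        length = len(tokenize(sentence))
        if total + length <= max_words:
            chosen.append(sentence)
            total += length
    return chosen


def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_overlap(candidate, reference, n=2):
    # Fraction of the reference's n-grams (with multiplicity) found in the candidate.
    cand = Counter(ngrams(tokenize(candidate), n))
    ref = Counter(ngrams(tokenize(reference), n))
    total = sum(ref.values())
    return sum((cand & ref).values()) / total if total else 0.0


def skip_bigrams(tokens):
    # All in-order word pairs, with any number of words skipped in between.
    return set(combinations(tokens, 2))


def lcs_length(a, b):
    # Classic dynamic-programming longest common subsequence over token lists.
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            table[i][j] = table[i - 1][j - 1] + 1 if x == y else max(table[i - 1][j], table[i][j - 1])
    return table[len(a)][len(b)]
```

In a scoring pipeline of the kind the abstract describes, values such as `ngram_overlap(...)` and `lcs_length(...)` would be computed against each scoring sample and passed, together with the grammar, spelling, and consistency features, to the regression model.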
Keywords/Search Tags: generation of composition, extracting sentence, automatically scoring, NLP technology