Research On Automatic CET-4 Writing Generation

Posted on:2015-03-03

Degree:Master

Type:Thesis

Country:China

Candidate:H T Xing

Full Text:PDF

GTID:2298330422491913

Subject:Computer Science and Technology

Abstract/Summary:

In recent years, products based on Natural Language Processing (NLP) technologies, such as Siri, have gradually entered everyday life and have inspired greater enthusiasm for NLP. In this research, we explore ways to generate CET-4 writing automatically using existing NLP technologies. We conduct the research in three parts, summarized as follows.

First, we construct a repository of candidate compositions. We collect compositions from web portals and search engines, and then build a retrieval system based on Lucene.

Second, we explore techniques for generating a composition. After surveying existing methods, we decide to extract sentences from the repository, which consists of a large number of candidate compositions. Guided by this strategy, we present three techniques, based respectively on word frequency, centroid clustering, and Latent Dirichlet Allocation (LDA). Given a title, the system outputs a composition of between 120 and 150 words. In our evaluation, each technique yields both good and bad results, and the compositions generated by the LDA-based technique are better overall. We also run experiments on different repositories; the results show that a repository of model compositions yields a much better composition than one built from compositions collected from the Internet.

Third, we explore techniques for scoring a composition automatically. The scoring considers four kinds of features: content, grammar, spelling, and consistency. For content, we use the co-occurrence of N-grams, skip-grams, and LCS between the composition and the scoring samples. For grammar and spelling, we use the total number of grammatical errors and spelling errors. For consistency, we consider the word overlap between sentences and between paragraphs, together with the LSA value of the composition and the different connectives it uses. Finally, we fit a regression model; the correlation coefficient between the human score and the machine score is 0.83, which is a good result.
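The word-frequency variant of the sentence-extraction strategy described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis implementation: the stopword list, the title-match bonus, the function names, and the greedy selection are all hypothetical. It also enforces only the upper word limit (150 words by default), not the 120-word lower bound.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption, not from the thesis).
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "are",
             "it", "that", "for", "on", "with", "as", "we", "our"}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_composition(title, candidates, max_words=150):
    """Greedily pick high-scoring sentences from candidate compositions.

    A sentence's score is the mean corpus frequency of its content words,
    plus one point for each distinct title word it contains (assumed bonus).
    """
    # Collect unique sentences, preserving first-seen order.
    sentences = []
    for doc in candidates:
        for sentence in split_sentences(doc):
            if sentence not in sentences:
                sentences.append(sentence)

    # Word frequencies over all unique candidate sentences.
    freq = Counter(w for s in sentences for w in tokenize(s)
                   if w not in STOPWORDS)
    title_words = set(tokenize(title)) - STOPWORDS

    def score(sentence):
        words = [w for w in tokenize(sentence) if w not in STOPWORDS]
        if not words:
            return 0.0
        base = sum(freq[w] for w in words) / len(words)
        bonus = sum(1 for w in set(words) if w in title_words)
        return base + bonus

    # Rank sentence indices by score, then fill the word budget greedily.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen, total = [], 0
    for i in ranked:
        n = len(sentences[i].split())
        if total + n <= max_words:
            chosen.append(i)
            total += n

    # Emit the chosen sentences in their original order.
    return " ".join(sentences[i] for i in sorted(chosen))
```

For example, calling `extract_composition("On Reading Books", candidates, max_words=150)` with a list of candidate composition strings returns a single string of at most 150 words. The centroid-clustering and LDA-based variants would replace only the `score` function in this sketch.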

Keywords/Search Tags:

composition generation, sentence extraction, automatic scoring, NLP technology

Related items

1	The Design And Implementation Of The Chinese Writing Composition Scoring Suggestion System For Senior High School Entrance Examination
2	Research On Automatic Scoring Of L2 Chinese Composition Based On Fusion Strategy
3	Research On Mean Shift Target Tracking Algorithm Based On Window Extracting Automatically
4	Extracting Parallel Sentence From Large Scale Web Data
5	Research On Model And Method Of Automated Essay Scoring
6	Research On Key Techniques Of Automated Essay Scoring
7	Research And Application Of Chinese Composition Scoring Based On Deep Learning
8	Sentence Generation Method In Multi-turn Dialogue System
9	The Research For Cross-media Sentence Generation And Localization
10	Research And Implementation Of Web Service Composition Scoring Method