
Research On Theme Analysis And Generation For Gaokao Composition

Posted on: 2018-12-25
Degree: Master
Type: Thesis
Country: China
Candidate: J H Liu
GTID: 2348330536981908
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of AI technology in recent years, it is tempting to test the limits of machine intelligence. To this end, in 2015 China initiated studies related to the Answering Machine for the National College Entrance Examination Program. Within these studies, how to have a machine compose articles for the essay-writing examination is one of the key questions. To address it, we focus on two aspects of the task: theme analysis and text generation.

Theme analysis means abstracting and generating, from an essay question, a set of topic words that explicitly define the writing task. Using rule matching and keyword extraction, we can handle about 40% of all essay questions. To analyze the remaining questions, we treat the problem as a special text tag recommendation task, which is the key part of theme analysis. Given the particular nature of this task, we propose a hierarchical Deep Neural Networks model. First, we use a GRU or a CNN to learn sentence representations. Then, we feed the sentence vectors into a sentence-level GRU to obtain a document representation. Finally, we use the document representation as the feature input to a logistic regression model, which predicts the confidence level of each text tag candidate. Our experiments show that, with adequate training data, the hierarchical Deep Neural Networks model outperforms other models, improving the F1 value by 8 percentage points.

Although the hierarchical Deep Neural Networks model achieves strong results on the theme analysis task, it requires a large training corpus, which is time-consuming and hard to obtain. Hence, we propose a method that combines the Deep Neural Networks model with Transfer Learning. First, we train the Deep Neural Networks model on the source domain. Then, we use a Transfer Learning method to train on the target domain, using the knowledge learned on the source domain to help learning on the target domain. Our experiments on two data sets show that the model based on Transfer Learning is significantly superior to the supervised learning method. On the Douban data set, the F1 value increased by 7 percentage points, while on the Essay Themes data set, the P@5 value increased by as much as 31.4 percentage points.

For the text generation task, we focus on paragraph-level text generation that matches multiple topics. We want the model to be controlled by multiple topic words and to generate a paragraph that covers these topic words. To do this, we propose a Coverage-based LSTM model. In this model, we maintain a multi-topic coverage vector, which learns the weight of each topic and is sequentially updated during the decoding process. We then feed this vector into the attention network to guide the generator. Moreover, we automatically construct two paragraph-level Chinese essay corpora, containing 305,000 essay paragraphs and 56,621 Zhihu texts. Empirical results show that our approach obtains a much better BLEU score than various baselines. Furthermore, human judgment shows that the Coverage-based LSTM can generate essays that are not only coherent but also closely related to the input topics.
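
To make the tag-recommendation step and the reported metrics concrete, the following is a minimal sketch of how per-tag confidence levels might be turned into a ranked tag list and scored with P@5 and F1. It is not the thesis's implementation: the function names, the example tags, and the scores are hypothetical, and it assumes the standard definitions of P@k (hits among the top k, divided by k) and set-based F1, which the abstract does not spell out.

```python
from typing import Dict, List, Set


def top_k_tags(confidences: Dict[str, float], k: int = 5) -> List[str]:
    """Return the k tag candidates with the highest confidence scores."""
    # Sort by descending score; ties are broken alphabetically so the
    # ranking is deterministic.
    ranked = sorted(confidences.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, _ in ranked[:k]]


def precision_at_k(predicted: List[str], gold: Set[str], k: int = 5) -> float:
    """Fraction of the top-k predicted tags that are in the gold tag set.

    Assumption: the denominator is always k, even if fewer than k tags
    were predicted (the common convention for P@k).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for tag in predicted[:k] if tag in gold)
    return hits / k


def f1_score(predicted: List[str], gold: Set[str]) -> float:
    """Set-based F1 between the predicted tags and the gold tags."""
    predicted_set = set(predicted)
    true_positives = len(predicted_set & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted_set)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical confidence levels for one essay question, standing in for
# the output of the logistic regression layer.
example_confidences = {
    "perseverance": 0.91,
    "dream": 0.84,
    "success": 0.77,
    "youth": 0.55,
    "failure": 0.40,
    "family": 0.12,
    "nature": 0.05,
}
example_gold = {"perseverance", "dream", "youth"}

example_top5 = top_k_tags(example_confidences, k=5)
example_p_at_5 = precision_at_k(example_top5, example_gold, k=5)
example_f1 = f1_score(example_top5, example_gold)
```

In this made-up example three of the five top-ranked tags are in the gold set, so P@5 is 3/5. Because P@5 divides by a fixed k, it can stay low even when every gold tag is recovered, which is worth keeping in mind when comparing it with F1.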
Keywords/Search Tags: Text Tag Recommendation, Deep Neural Networks, Transfer Learning, Text Generation