
Research on Text Generation with Topic Models

Posted on: 2020-07-08    Degree: Master    Type: Thesis
Country: China    Candidate: Y C Li    Full Text: PDF
GTID: 2518306548491034    Subject: Master of Engineering
Abstract/Summary:
Because natural language texts have abstract semantics and diverse content, text generation faces two challenges. On the one hand, natural language is diverse in expression and its contextual content changes dynamically, which requires accurate understanding and online modeling of large-scale text. On the other hand, the content produced by a text generation model is prone to topical inconsistency with its context. To address these problems, I study text generation methods based on topic models, aiming to model the context of a text dynamically with a dynamic topic model, to provide the ability to process large-scale data online, and to guide the generation process with contextual topic consistency. The main contributions of this thesis are as follows:

1. To address the problem of dynamically modeling the semantics of sequences during text generation, I propose a dynamic topic model for large-scale online text, which tackles two problems. First, text generation applications must handle many kinds of dynamic text whose topic distributions change over time; in a dialogue generation scenario, for example, the topic distribution of the context shifts as the conversation proceeds. Traditional LDA-based static topic models require the number of topics to be fixed before training, so they adapt poorly to dynamically changing topic distributions. Second, existing online topic models focus mainly on expressive power on a single machine and are hard to scale to dynamic topic modeling over large-scale text data. I therefore propose a parallel version of the online HDP algorithm: the training task is split into many sub-batch tasks that are distributed across multiple worker nodes, which accelerates the whole training process, while model convergence is guaranteed by a parallel variational inference algorithm (a minimal sketch of this scatter/gather scheme follows). Extensive experiments on several real-world datasets demonstrate the usability and good scalability of the proposed algorithm.
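To make the parallel update concrete, here is a minimal Python sketch of such a scatter/gather step. It is illustrative only: the names (local_step, global_step), the simplified responsibilities, and the hyper-parameters are assumptions rather than the thesis's DistHDP implementation, and a full online HDP would also infer document-level stick-breaking weights.

import numpy as np
from multiprocessing import Pool
from scipy.special import digamma

K, V = 100, 5000                                   # topic truncation, vocabulary size

def local_step(args):
    # Worker: simplified local E-step on one sub-batch of documents.
    lam, docs = args                               # lam: global variational parameters (K x V)
    Elogbeta = digamma(lam) - digamma(lam.sum(axis=1, keepdims=True))
    stats = np.zeros((K, V))
    for doc in docs:                               # doc: {word_id: count}
        for w, c in doc.items():
            # Topic responsibilities for word w (topic-word term only).
            phi = np.exp(Elogbeta[:, w] - Elogbeta[:, w].max())
            phi /= phi.sum()
            stats[:, w] += c * phi
    return stats

def global_step(lam, stats_list, batch_size, corpus_size, rho, eta=0.01):
    # Master: merge worker statistics, take a stochastic natural-gradient step.
    lam_hat = eta + (corpus_size / batch_size) * sum(stats_list)
    return (1.0 - rho) * lam + rho * lam_hat

def train(minibatches, corpus_size, n_workers=8, tau=64.0, kappa=0.7):
    lam = np.random.gamma(1.0, 1.0, (K, V))
    with Pool(n_workers) as pool:
        for t, batch in enumerate(minibatches):
            subs = [batch[i::n_workers] for i in range(n_workers)]       # scatter sub-batches
            stats_list = pool.map(local_step, [(lam, s) for s in subs])  # parallel E-steps
            rho = (t + tau) ** (-kappa)            # decaying step size for convergence
            lam = global_step(lam, stats_list, len(batch), corpus_size, rho)
    return lam

Because the merged statistics equal those a single worker would have computed over the whole mini-batch, the global update matches the sequential online update, which is the intuition behind the convergence guarantee.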
2. During text generation, the generated content and the input data often disagree on the contextual semantic topic. I design and implement a text generation method based on a conditional variational autoencoder (CVAE) that integrates a topic model. The method captures the contextual topic information of the input data and obtains the topic embedding of the text; the condition of the CVAE is then set to the topic code of the text data, which adds a topic constraint to the generation process (a sketch of the conditioning mechanism appears at the end of this abstract). Three sets of experiments show that this method tends to generate text that achieves higher BLEU scores, is more fluent, and is longer.

To sum up, for the two research topics above, this thesis proposes a dynamic topic model for large-scale online text (DistHDP) and a topic-aware text generation method (TransCVAE), and validates both models on multiple datasets. Experiments show that DistHDP trains faster than the single-threaded version of the algorithm, especially on large-scale datasets; when DistHDP is scaled to 30 threads, its parallel efficiency remains above 70%, demonstrating good scalability. TransCVAE achieves its highest BLEU scores on the entailment datasets, and it tends to generate outputs about 50% longer than those of the Transformer model.
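As an illustration of the topic-conditioning idea behind TransCVAE, here is a minimal PyTorch sketch. The names and sizes are assumptions, and simple feed-forward layers stand in for the thesis's Transformer encoder and decoder; the point is only where the topic embedding t enters the model: it conditions the prior, the recognition network, and the decoder, and the KL term keeps the latent code close to the topic-dependent prior.

import torch
import torch.nn as nn

class TopicCVAE(nn.Module):
    # Hypothetical sketch: a CVAE whose condition is a topic embedding.
    def __init__(self, x_dim=512, t_dim=100, z_dim=64, h_dim=256):
        super().__init__()
        # Recognition network q(z | x, t)
        self.enc = nn.Sequential(nn.Linear(x_dim + t_dim, h_dim), nn.Tanh())
        self.mu_q = nn.Linear(h_dim, z_dim)
        self.logvar_q = nn.Linear(h_dim, z_dim)
        # Topic-conditioned prior network p(z | t)
        self.pri = nn.Sequential(nn.Linear(t_dim, h_dim), nn.Tanh())
        self.mu_p = nn.Linear(h_dim, z_dim)
        self.logvar_p = nn.Linear(h_dim, z_dim)
        # Decoder consumes [z; t], so the topic steers generation
        self.dec = nn.Sequential(nn.Linear(z_dim + t_dim, h_dim), nn.Tanh(),
                                 nn.Linear(h_dim, x_dim))

    def forward(self, x, t):
        # x: sentence representation; t: topic embedding of its context
        h_q = self.enc(torch.cat([x, t], dim=-1))
        mu_q, logvar_q = self.mu_q(h_q), self.logvar_q(h_q)
        h_p = self.pri(t)
        mu_p, logvar_p = self.mu_p(h_p), self.logvar_p(h_p)
        z = mu_q + torch.randn_like(mu_q) * (0.5 * logvar_q).exp()   # reparameterization
        recon = self.dec(torch.cat([z, t], dim=-1))
        # KL(q(z|x,t) || p(z|t)): penalizes drifting from the topic prior
        kl = 0.5 * (logvar_p - logvar_q - 1.0
                    + (logvar_q.exp() + (mu_q - mu_p) ** 2) / logvar_p.exp()).sum(-1)
        return recon, kl.mean()

In the full model the reconstruction would drive a sequence decoder over tokens; the sketch isolates only how the topic code constrains each component.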
Keywords/Search Tags: text generation, conditional variational autoencoder, dynamic topic model, parallel algorithm