
Research On Text Generation Technology Combining Topic Information

Posted on: 2022-05-03
Degree: Master
Type: Thesis
Country: China
Candidate: F Q Zhou
Full Text: PDF
GTID: 2518306509984639
Subject: Computer Science and Technology
Abstract/Summary:
At present, in an era of information explosion, people constantly create and receive all kinds of textual information. Generating text by computer can, to a certain extent, relieve people of mechanical and repetitive writing, improve writing efficiency, and provide assistance or inspiration for creative work. The content of a text is usually organized around topics; if the content is loose and lacks a clear topic, its readability decreases. However, much current research on text generation pays little attention to modeling topic information. Hence, we focus on combining text generation with topic information.

Firstly, we explore the topic representation ability of the topic model. Specifically, we introduce word vectors and a temperature mechanism into LDA, a typical topic model, converting the topic-word distribution into a mixed distribution, and then train the topic model jointly with Gibbs sampling and a tailored loss function. The experimental results show that the resulting topic model improves intra-topic consistency and inter-topic diversity to a certain extent, and that the topic information also improves performance on downstream tasks.

Secondly, we integrate a topic model into an abstractive (generative) text summarization framework, addressing the problem of generating the focus of the trial from case documents. By feeding the case documents into a neural topic model, we obtain the weight matrices of the topic-word distributions and the vectors of the document-topic distributions, and then combine them with a Transformer-based abstractive summarization model through linear transformations. Experiments on datasets covering three crimes (fraud, theft, and intentional injury) show that the results combined with topic information are generally better than those of an ordinary generative model on various evaluation metrics, and significantly better than extractive text summarization models and some unsupervised
algorithms or heuristic approaches.

Finally, we propose TPoet, a topic-enhanced model for the generation of Jintishi (Chinese regulated verse). By training the topic model on a poetry corpus, the resulting topic information is introduced into a Transformer-based autoregressive language model, via topic identifiers in the input layer and a topic attention mechanism in the Transformer computing blocks. We also design input identifiers for format, position, rhyme, tone, etc., so that TPoet can explicitly learn the constraints of poetry generation. Experimental results on a Jintishi dataset show that the poems generated by TPoet are superior to those of current state-of-the-art models and systems on multiple evaluation metrics, such as topic consistency, format accuracy, and rhyme accuracy.

These strategies for integrating topic information into text generation tasks can serve as references for other research. Our work also has practical application scenarios: generating the focus of the trial can assist the staff of procuratorial organs with pre-trial planning, and Chinese poetry generation has application value in literary creation, education, entertainment, and so on.
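The mixed topic-word distribution mentioned in the first contribution can be illustrated with a minimal sketch: a count-based LDA distribution is blended with an embedding-based term whose sharpness is controlled by a temperature parameter. The function name, the mixing weight `alpha`, and the exact mixing form are illustrative assumptions, not the thesis's precise formulation.

```python
import numpy as np

def mixed_topic_word_distribution(phi, topic_vecs, word_vecs, tau=0.5, alpha=0.5):
    """Blend an LDA topic-word distribution with a word-embedding
    similarity component sharpened by a temperature parameter.

    phi        : (K, V) topic-word counts or probabilities from Gibbs sampling
    topic_vecs : (K, D) topic embeddings (hypothetical; one common choice)
    word_vecs  : (V, D) pre-trained word vectors
    tau        : temperature; lower values sharpen the embedding component
    alpha      : mixing weight between the two components (assumed form)
    """
    # Embedding component: softmax over topic-word similarities,
    # with temperature tau controlling how peaked the distribution is.
    logits = topic_vecs @ word_vecs.T / tau              # (K, V)
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    emb_dist = np.exp(logits)
    emb_dist /= emb_dist.sum(axis=1, keepdims=True)

    # Count component: row-normalized LDA topic-word distribution.
    lda_dist = phi / phi.sum(axis=1, keepdims=True)

    # Mixed distribution used in place of the plain multinomial.
    return alpha * lda_dist + (1 - alpha) * emb_dist
```

Because each component is row-stochastic, the convex combination is again a valid probability distribution over the vocabulary for every topic, so it can drop in wherever the original topic-word multinomial is used during sampling.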
Keywords/Search Tags: Text Generation, Topic Model, Focus of the Trial, Poetry Generation