Research on Key Technologies of Text Generation in Social Media

Posted on: 2023-07-09
Degree: Doctor
Type: Dissertation
Country: China
Candidate: F Q Lin
Full Text: PDF
GTID: 1528307169977309
Subject: Cyberspace security
Abstract/Summary:
With the development and popularization of the Internet, social networks have gradually become the primary medium of information dissemination and play an essential part in daily life. As the main carrier of online communication, massive volumes of social text are generated by users, in which individual thoughts are expressed, gathered, and brought into collision. These texts shape a collective will that in turn acts on society, becoming an essential factor affecting users' thoughts, behaviors, and consciousness. A series of intelligent applications whose core competencies are natural language understanding and generation, such as writing robots, intelligent assistants, and customer service, have emerged in response, drawing wide attention to text generation research from academia and industry. Text generation aims to produce, on demand, controllable text along multiple dimensions: semantic topic, style attributes, and diverse forms of expression. It is a leading direction for advancing machine understanding of human language and has substantial research significance. Additionally, text generation techniques commonly serve as the core testbed for a wide range of practical applications, such as news reporting, dialogue generation, and question answering. Recently, improvements in hardware computing power and the rapid iteration of deep learning technology have driven remarkable progress in the field. Despite this progress, many issues and challenges in text generation for social media remain to be addressed. This dissertation focuses on text generation, starting from four elements (topic constraints, style attributes, diversity, and nonverbal symbols), and accordingly carries out research on four tasks: topic-to-essay generation, text style transfer, diverse response generation, and emoji insertion. The main contributions and innovations are as follows.

The topic-to-essay generation task focuses on whether a text generation system can produce text that conforms to the constraints of given topics; it aims to generate novel, diverse, and topic-consistent essays from multiple topic words. Although existing sequence-to-sequence generation models provide a feasible solution, their performance is unsatisfactory. Because of the large semantic gap between the source topic sequence and the target text, the generated essays are uninformative and suffer from poor semantic consistency. To this end, this dissertation proposes a knowledge-enhanced topic-to-essay generation approach that integrates external knowledge to enrich the source semantic information, thereby improving the semantic diversity of the generated essay. In addition, to address topic redundancy or deficiency in the attention mechanism, we apply a coverage loss function that promotes a balanced distribution of the multiple topics in the generated text. Experimental results show that our method generates novel, diverse, and topic-consistent essays and brings considerable performance improvement.

The text style transfer task focuses on whether a text generation system can generate text with diverse style attributes; it aims to alter the style attributes of a text while retaining its original semantic content, thus improving the style diversity of the generated content. The commonly used latent style representation is insufficient to capture diverse style expressions, so the transferred text has poor contextual consistency and contains only generic expressions that match the target style. Therefore, we propose a memory-enhanced unsupervised style transfer method, which constructs a memory module to extract, cluster, and learn fine-grained content and style representations from a non-parallel corpus. To address the deviation caused by noisy samples or outliers when learning style representations, we further introduce a style representation calibration method that models inter-style relationships. Experimental results on three benchmark datasets, covering sentiment and normality attributes, show that the proposed method outperforms competitive baselines in style transfer accuracy and content preservation. Moreover, the transferred sentences contain diverse stylistic phrases that are consistent with the context.

The diverse response generation task focuses on whether a text generation system can generate informative and diverse interactive text; it aims to generate diverse responses from multi-modal contextual input. Current response generation models tend to produce generic and uninformative output. To address this issue, we propose a multi-view meta-learning approach for diverse response generation, which customizes generation models for different scenarios through the multi-task setting of meta-learning, thereby encouraging generation diversity. To model multi-modal data, we design a multi-view meta-learning algorithm that fuses information from different modalities, where the model attends both to the common information shared by all modalities and to the unique information of each modality. Experimental results on two multi-modal open-domain dialogue datasets validate the effectiveness of our method, which achieves the best performance in terms of both quality and diversity.

The emoji insertion task focuses on modeling the use of nonverbal symbols in multi-form text; it aims to make generated sentences more vivid and interactive by adding appropriate emojis. Most current studies on modeling emoji usage focus on the emoji prediction task, which examines only the one-to-one matching between textual content and an emoji and ignores the issues of multiple emojis and emoji positions. Additionally, sentiment factors are not taken into account when predicting emojis. To address these issues, we propose a sentiment-aware emoji insertion task and construct a corpus for it, named Multi Emoji. Furthermore, we propose two exploratory frameworks for inserting emojis into text, i.e., a sequential tagging paradigm and a two-stage pipeline paradigm. Both paradigms achieve considerable improvement on the Multi Emoji dataset for emoji insertion and on the Semeval-EN dataset for emoji prediction.

In summary, this dissertation studies key technologies of text generation for social media from four aspects: topic-to-essay generation, unsupervised text style transfer, multi-modal diverse response generation, and sentiment-aware emoji insertion. We propose a solution for each of the four settings, and each effectively improves performance on the corresponding task. In the future, we plan to explore new paradigms for applying emerging technologies, such as pre-trained language models and prompt learning, to text generation for social media, and to promote the further development of related research.
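
As an illustration of the coverage loss mentioned for topic-to-essay generation, the following is a minimal sketch only. The abstract does not give the exact formulation, so the sketch assumes the widely used form in which each decoding step is penalized by the overlap between its attention weights and the attention already accumulated on each topic word. The function name `coverage_loss` and the toy attention matrices are hypothetical and chosen for illustration.

```python
from typing import Sequence


def coverage_loss(attention_steps: Sequence[Sequence[float]]) -> float:
    """Mean over decoding steps of sum_i min(a_i^t, c_i^t).

    attention_steps[t][i] is the attention weight on topic word i at step t.
    c^t is the attention accumulated on each topic word before step t.
    Assumed formulation; the dissertation's own loss may differ.
    """
    if not attention_steps:
        return 0.0
    num_topics = len(attention_steps[0])
    coverage = [0.0] * num_topics  # nothing has been attended to yet
    total = 0.0
    for attn in attention_steps:
        if len(attn) != num_topics:
            raise ValueError("every step must cover the same topic words")
        # Penalize attention mass placed on topics that are already covered.
        total += sum(min(a, c) for a, c in zip(attn, coverage))
        # Update the running coverage with this step's attention.
        coverage = [c + a for c, a in zip(coverage, attn)]
    return total / len(attention_steps)


# Toy example with three topic words and three decoding steps.
balanced = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
redundant = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]

print(coverage_loss(balanced))   # each topic attended once
print(coverage_loss(redundant))  # the first topic attended three times
```

Under this assumed form, the balanced pattern incurs a loss of 0, while the redundant pattern incurs 2/3, so minimizing the term discourages the decoder from attending to the same topic word repeatedly. It penalizes redundancy directly; neglected topics are only addressed indirectly, because attention mass shifted away from over-used topics must fall on the remaining ones.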
Keywords/Search Tags: Natural Language Generation, Topic-to-Essay Generation, Text Style Transfer, Response Generation, Emoji Insertion, Attention Mechanism, Knowledge Enhancement, Memory Network