
Research On Text Style Transfer Based On Seq2seq Framework

Posted on: 2023-08-05    Degree: Master    Type: Thesis
Country: China    Candidate: Z N Yang    Full Text: PDF
GTID: 2568306836469424    Subject: Computer Science and Technology
Abstract/Summary:
With the advent of the big data era, textual data has become one of the most accessible types of data on the Internet. Natural language processing (NLP) aims to build artificial intelligence systems that can understand and even generate human text. As an important branch of NLP, text style transfer (TST) aims to transform the style of a text while preserving its content. Text style transfer is applied in many real-world scenarios, such as writing tools, dialogue systems, automatic poetry generation, and toxic text rephrasing. Although many previous works have explored TST in depth, existing methods still face the following challenges: 1) an extreme lack of parallel corpora; 2) a lack of multilingual training data (i.e., data in languages other than English); 3) existing TST methods can hardly disentangle the content and stylistic features of texts. To address these challenges, this work carries out a systematic study of TST and proposes the following methods.

To alleviate the lack of parallel and multilingual training data for TST, a sentence-level parallel corpus of ancient-modern Chinese is collected and constructed. On this dataset, several Seq2Seq-based models are applied for style transfer. To further improve the quality of the generated texts, the pre-trained model UniLM is used for Seq2Seq training. The quality of text style transfer is evaluated with both human and automatic metrics, and the experimental results under all metrics show that our method produces higher-quality transferred text.

To address the problem that existing TST methods cannot well disentangle the content and style of texts, a Seq2Seq model based on contrastive learning is proposed. Contrastive learning leverages unlabeled data to learn the underlying features of the data. First, the training corpus is augmented with back-translation; second, the encoders learn to distinguish the content and style of the texts via contrastive learning; third, a Seq2Seq model with content-style dual encoders is combined with a style classifier to guide text style transfer. Experimental results show that our method achieves better performance on both human and automatic metrics and generates higher-quality style-transferred texts.
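The first contribution trains Seq2Seq models on the collected ancient-modern Chinese parallel corpus. Below is a minimal sketch of such Seq2Seq training in PyTorch; the character-level vocabulary size, model dimensions, and toy batch are illustrative placeholders, and the thesis itself fine-tunes the pre-trained UniLM model rather than training a small Transformer from scratch as shown here.

```python
# Minimal Seq2Seq sketch for ancient-to-modern Chinese style transfer.
# Vocabulary size, dimensions, and the random batch are placeholders;
# the thesis fine-tunes UniLM instead of this from-scratch Transformer.
import torch
import torch.nn as nn

PAD, BOS, EOS = 0, 1, 2
VOCAB_SIZE = 8000        # placeholder character-level vocabulary size
D_MODEL = 256

class Seq2SeqTransfer(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, D_MODEL, padding_idx=PAD)
        self.transformer = nn.Transformer(
            d_model=D_MODEL, nhead=4,
            num_encoder_layers=3, num_decoder_layers=3,
            batch_first=True)
        self.out = nn.Linear(D_MODEL, VOCAB_SIZE)

    def forward(self, src_ids, tgt_ids):
        # src_ids: ancient-Chinese token ids, tgt_ids: modern-Chinese token ids.
        # Padding masks are omitted in this toy example for brevity.
        tgt_mask = self.transformer.generate_square_subsequent_mask(tgt_ids.size(1))
        hidden = self.transformer(
            self.embed(src_ids), self.embed(tgt_ids), tgt_mask=tgt_mask)
        return self.out(hidden)                       # (batch, tgt_len, vocab)

model = Seq2SeqTransfer()
criterion = nn.CrossEntropyLoss(ignore_index=PAD)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# One toy training step; random ids stand in for a parallel sentence pair.
src = torch.randint(3, VOCAB_SIZE, (2, 12))           # ancient-style source
tgt = torch.randint(3, VOCAB_SIZE, (2, 10))           # modern-style reference
optimizer.zero_grad()
logits = model(src, tgt[:, :-1])                      # teacher forcing
loss = criterion(logits.reshape(-1, VOCAB_SIZE), tgt[:, 1:].reshape(-1))
loss.backward()
optimizer.step()
```

At inference time, the transferred sentence would be produced by greedy or beam-search decoding starting from the BOS token, which is omitted above for brevity.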
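The second contribution combines content-style dual encoders, contrastive learning on back-translated data, and a style classifier. The sketch below shows one plausible way such an objective could be wired together; the InfoNCE formulation, the GRU encoders, and the choice of back-translated paraphrases as content-space positives are assumptions for illustration, not the exact losses used in the thesis.

```python
# Hedged sketch of disentangling content and style with dual encoders.
# Encoder architecture, temperature, and positive-pair construction are
# illustrative assumptions rather than the thesis's exact formulation.
import torch
import torch.nn as nn
import torch.nn.functional as F

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE loss: each anchor should match the positive at its own index
    and repel the other in-batch examples."""
    anchors = F.normalize(anchors, dim=-1)
    positives = F.normalize(positives, dim=-1)
    logits = anchors @ positives.t() / temperature     # (B, B) similarities
    labels = torch.arange(anchors.size(0))             # diagonal = positives
    return F.cross_entropy(logits, labels)

class DualEncoderTST(nn.Module):
    """Content encoder + style encoder; the style embedding feeds a classifier."""
    def __init__(self, vocab=8000, dim=256, n_styles=2):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim, padding_idx=0)
        self.content_enc = nn.GRU(dim, dim, batch_first=True)
        self.style_enc = nn.GRU(dim, dim, batch_first=True)
        self.style_clf = nn.Linear(dim, n_styles)

    def encode(self, ids):
        x = self.embed(ids)
        _, c = self.content_enc(x)          # content representation
        _, s = self.style_enc(x)            # style representation
        return c.squeeze(0), s.squeeze(0)

model = DualEncoderTST()
x = torch.randint(1, 8000, (4, 16))          # a batch of sentences
x_bt = torch.randint(1, 8000, (4, 16))       # their back-translated paraphrases
style_labels = torch.tensor([0, 0, 1, 1])    # e.g. 0 = source style, 1 = target style

c, s = model.encode(x)
c_bt, _ = model.encode(x_bt)

# Content space: a sentence and its back-translation share content -> positives.
loss_content = info_nce(c, c_bt)
# Style space: the classifier pulls style embeddings toward the style labels,
# so stylistic features are captured by the style encoder, not the content one.
loss_style = F.cross_entropy(model.style_clf(s), style_labels)
loss = loss_content + loss_style
loss.backward()
```

In the full model, the two encoders would feed a Seq2Seq decoder so that generation is conditioned on the content representation of the input and the desired style, with the style classifier guiding the transfer direction.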
Keywords/Search Tags:Natural Language Processing, Text Style Transfer, Seq2Seq Framework, Pretrained Language Models, Contrastive Learning