
Research On Text Style Transfer Based On Seq2seq Framework

Posted on: 2023-08-05    Degree: Master    Type: Thesis
Country: China    Candidate: Z N Yang    Full Text: PDF
GTID: 2568306836469424    Subject: Computer Science and Technology
Abstract/Summary:
With the advent of the big data era, textual data has become one of the most accessible types of data on the Internet. Natural language processing (NLP) aims to build artificial intelligence systems that can understand and even generate human text. As an important branch of NLP, text style transfer (TST) aims to transform the style of a text while preserving its content. Text style transfer is applied in many real-world scenarios, such as writing tools, dialogue systems, automatic poetry generation, and toxic text rephrasing. Although many previous works have explored TST in depth, existing methods still face the following challenges: 1) an extreme lack of parallel corpora; 2) a lack of multilingual training data (i.e., data in languages other than English); 3) existing TST methods can hardly disentangle the content and stylistic features of texts. To address these challenges, this work carries out a systematic study of TST and proposes the following methods.

To alleviate the lack of parallel and multilingual training data for TST, a sentence-level parallel corpus of ancient-modern Chinese is collected and constructed. On this dataset, several Seq2Seq-based models are applied for style transfer. To further improve the quality of the generated texts, the pre-trained model UniLM is used for Seq2Seq training. The quality of text style transfer is evaluated with both human and automatic metrics, and the experimental results under all metrics show that our method produces higher-quality transferred text.

To address the problem that existing TST methods cannot well disentangle the content and style of texts, a Seq2Seq model based on contrastive learning is proposed. Contrastive learning leverages unlabeled data to learn the underlying features of the data. First, the training corpus is augmented with back-translation; second, the encoders learn to distinguish the content and style of the texts via contrastive learning; third, a Seq2Seq model with content-style dual encoders is combined with a style classifier to guide text style transfer. Experimental results show that our method achieves better performance on both human and automatic metrics and generates higher-quality style-transferred texts.
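The first contribution trains Seq2Seq models on the collected ancient-modern Chinese parallel corpus. Below is a minimal sketch of such Seq2Seq training in PyTorch; the character-level vocabulary size, model dimensions, and toy batch are illustrative placeholders, and the thesis itself fine-tunes the pre-trained UniLM model rather than training a small Transformer from scratch as shown here.

```python
# Minimal Seq2Seq sketch for ancient-to-modern Chinese style transfer.
# Vocabulary size, dimensions, and the random batch are placeholders;
# the thesis fine-tunes UniLM instead of this from-scratch Transformer.
import torch
import torch.nn as nn

PAD, BOS, EOS = 0, 1, 2
VOCAB_SIZE = 8000        # placeholder character-level vocabulary size
D_MODEL = 256

class Seq2SeqTransfer(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, D_MODEL, padding_idx=PAD)
        self.transformer = nn.Transformer(
            d_model=D_MODEL, nhead=4,
            num_encoder_layers=3, num_decoder_layers=3,
            batch_first=True)
        self.out = nn.Linear(D_MODEL, VOCAB_SIZE)

    def forward(self, src_ids, tgt_ids):
        # src_ids: ancient-Chinese token ids, tgt_ids: modern-Chinese token ids.
        # Padding masks are omitted in this toy example for brevity.
        tgt_mask = self.transformer.generate_square_subsequent_mask(tgt_ids.size(1))
        hidden = self.transformer(
            self.embed(src_ids), self.embed(tgt_ids), tgt_mask=tgt_mask)
        return self.out(hidden)                       # (batch, tgt_len, vocab)

model = Seq2SeqTransfer()
criterion = nn.CrossEntropyLoss(ignore_index=PAD)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# One toy training step; random ids stand in for a parallel sentence pair.
src = torch.randint(3, VOCAB_SIZE, (2, 12))           # ancient-style source
tgt = torch.randint(3, VOCAB_SIZE, (2, 10))           # modern-style reference
optimizer.zero_grad()
logits = model(src, tgt[:, :-1])                      # teacher forcing
loss = criterion(logits.reshape(-1, VOCAB_SIZE), tgt[:, 1:].reshape(-1))
loss.backward()
optimizer.step()
```

At inference time, the transferred sentence would be produced by greedy or beam-search decoding starting from the BOS token, which is omitted above for brevity.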
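The second contribution combines content-style dual encoders, contrastive learning on back-translated data, and a style classifier. The sketch below shows one plausible way such an objective could be wired together; the InfoNCE formulation, the GRU encoders, and the choice of back-translated paraphrases as content-space positives are assumptions for illustration, not the exact losses used in the thesis.

```python
# Hedged sketch of disentangling content and style with dual encoders.
# Encoder architecture, temperature, and positive-pair construction are
# illustrative assumptions rather than the thesis's exact formulation.
import torch
import torch.nn as nn
import torch.nn.functional as F

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE loss: each anchor should match the positive at its own index
    and repel the other in-batch examples."""
    anchors = F.normalize(anchors, dim=-1)
    positives = F.normalize(positives, dim=-1)
    logits = anchors @ positives.t() / temperature     # (B, B) similarities
    labels = torch.arange(anchors.size(0))             # diagonal = positives
    return F.cross_entropy(logits, labels)

class DualEncoderTST(nn.Module):
    """Content encoder + style encoder; the style embedding feeds a classifier."""
    def __init__(self, vocab=8000, dim=256, n_styles=2):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim, padding_idx=0)
        self.content_enc = nn.GRU(dim, dim, batch_first=True)
        self.style_enc = nn.GRU(dim, dim, batch_first=True)
        self.style_clf = nn.Linear(dim, n_styles)

    def encode(self, ids):
        x = self.embed(ids)
        _, c = self.content_enc(x)          # content representation
        _, s = self.style_enc(x)            # style representation
        return c.squeeze(0), s.squeeze(0)

model = DualEncoderTST()
x = torch.randint(1, 8000, (4, 16))          # a batch of sentences
x_bt = torch.randint(1, 8000, (4, 16))       # their back-translated paraphrases
style_labels = torch.tensor([0, 0, 1, 1])    # e.g. 0 = source style, 1 = target style

c, s = model.encode(x)
c_bt, _ = model.encode(x_bt)

# Content space: a sentence and its back-translation share content -> positives.
loss_content = info_nce(c, c_bt)
# Style space: the classifier pulls style embeddings toward the style labels,
# so stylistic features are captured by the style encoder, not the content one.
loss_style = F.cross_entropy(model.style_clf(s), style_labels)
loss = loss_content + loss_style
loss.backward()
```

In the full model, the two encoders would feed a Seq2Seq decoder so that generation is conditioned on the content representation of the input and the desired style, with the style classifier guiding the transfer direction.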
Keywords/Search Tags:Natural Language Processing, Text Style Transfer, Seq2Seq Framework, Pretrained Language Models, Contrastive Learning