Research On Text Style Transfer Based On Delete-Retrieve-Generate Framework

Posted on: 2023-04-12
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Fei
Full Text: PDF
GTID: 2568306836476614
Subject: Computer technology
Abstract/Summary:
With the continuous development of information technology, a large amount of text data has appeared on Internet platforms such as e-commerce websites and social media, and its volume is growing exponentially. In recent years, Text Style Transfer (TST) has attracted extensive research interest and has become one of the most popular topics in natural language processing. The task aims to change a specific style of a text (e.g., sentiment, tense, or gender) through editing or generation while preserving the text content. TST research faces several challenges: 1) parallel corpora are lacking, so robust sequence-to-sequence models, which rely heavily on parallel training data, cannot be adopted; 2) content and style are often entangled in text, so their representations are difficult to separate in the latent space; 3) because TST is a relatively new research topic, there are currently no unified evaluation metrics. To address these issues, this thesis carries out the following work.

First, this thesis systematically reviews existing TST methods. According to whether the training data are parallel, existing methods are divided into three categories: supervised-learning-based methods, unsupervised-learning-based methods, and semi-supervised-learning-based methods. Among them, unsupervised-learning-based methods are the mainstream and can be further divided into implicit and explicit methods. Implicit methods aim to learn and distinguish latent representations of text content and text style, and include disentanglement strategies, back-translation strategies, pseudo-parallel corpus strategies, etc. Explicit methods aim to separate text content and text style in the text itself, and include deletion strategies based on word frequency, on the attention mechanism, and on a combination of the two.

Second, to better separate text content from text style, this thesis proposes an attribute word deletion strategy based on probability variation, built on a delete-retrieve-generate (DRG) TST framework. First, attribute words are located and deleted according to the change in the sentence-style probability predicted by a style classifier, leaving the text content. Then, sentences similar to the text content are retrieved from the target corpus by calculating their cosine distance. Finally, the target attribute words are extracted from the retrieved sentences and combined with the text content to generate the target sentence. The experimental results show that the method outperforms the baselines in content preservation and also performs well on transfer accuracy and text fluency.

Third, to guide the model toward more fluent target sentences, this thesis combines the DRG framework with reinforcement learning, introducing a fluency reward alongside the existing transfer accuracy reward and content preservation reward. The method consists of three modules: a neutralization module, a generation module, and a reward module. The neutralization module explicitly filters out attribute words to extract the text content; the generation module combines the target attribute words with the text content to generate the target sentence; and the reward module further optimizes the neutralization and generation modules. The experimental results show that the method achieves higher fluency than the baselines while balancing transfer accuracy and content preservation well.
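The delete and retrieve steps described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the lexicon-based `p_positive` and `p_negative` functions are hypothetical stand-ins for a trained style classifier, the `threshold` value is an arbitrary assumption, cosine similarity is computed over simple bag-of-words counts, and the generation step is omitted.

```python
import math
from collections import Counter

# Hypothetical toy lexicons standing in for a trained style classifier.
POSITIVE = {"great", "delicious", "friendly"}
NEGATIVE = {"terrible", "bland", "rude"}


def p_positive(tokens):
    """Toy classifier: probability that the sentence is positive."""
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return 1.0 / (1.0 + math.exp(-2.0 * score))


def p_negative(tokens):
    """Toy classifier: probability that the sentence is negative."""
    return 1.0 - p_positive(tokens)


def delete_attribute_words(tokens, style_prob, threshold=0.2):
    """Split tokens into (content, attribute words).

    A token counts as an attribute word when removing it lowers the
    predicted style probability by more than `threshold` (assumed value).
    """
    base = style_prob(tokens)
    content, attributes = [], []
    for i, tok in enumerate(tokens):
        reduced = tokens[:i] + tokens[i + 1:]
        if base - style_prob(reduced) > threshold:
            attributes.append(tok)
        else:
            content.append(tok)
    return content, attributes


def cosine_similarity(a, b):
    """Cosine similarity between bag-of-words count vectors."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def retrieve(content, target_corpus, target_prob):
    """Return the target sentence whose content is closest to `content`."""
    return max(
        target_corpus,
        key=lambda s: cosine_similarity(
            content, delete_attribute_words(s, target_prob)[0]
        ),
    )


source = "the food was terrible".split()
corpus = ["the food was delicious".split(), "the staff was friendly".split()]

content, removed = delete_attribute_words(source, p_negative)
match = retrieve(content, corpus, p_positive)
target_words = delete_attribute_words(match, p_positive)[1]
```

In a full DRG pipeline, `content` and `target_words` would then be passed to a generator that produces the target sentence; the sketch stops before that step.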
Keywords/Search Tags: Text Style Transfer, Natural Language Processing, Style Classifier, Reinforcement Learning