
Paraphrasing for News Titles in the Financial Field Based on Neural Networks

Posted on: 2022-02-20
Degree: Master
Type: Thesis
Country: China
Candidate: J L Yang
GTID: 2518306479493914
Subject: Software Engineering

Abstract/Summary:
Paraphrasing is an important research area in natural language processing: it aims to express the meaning of a text using different wording. Together with automatic question answering, text summarization, and machine translation, it is regarded as one of the four core tasks for measuring machine understanding of natural language. Paraphrasing is also widely used to support scenarios such as automatic question answering, textual entailment, and machine translation. Current mainstream paraphrase methods are based mainly on neural networks and the sequence-to-sequence (Seq2Seq) architecture. Although these methods perform well in the general domain, they fall short in specialized domains such as finance: financial text involves professional terminology and entity information, so general-domain knowledge cannot be transferred directly, and performance does not meet industrial precision requirements. Research on paraphrasing in financial scenarios has begun; such work can help investment researchers screen high-value news events and run secondary searches for related news. However, no paraphrase generation model has yet been proposed for news headlines in the Chinese financial domain. To incorporate Chinese financial-domain knowledge and improve paraphrase quality for Chinese financial headlines, this dissertation proposes an improved neural network-based paraphrase generation model. The core contributions of this dissertation are as follows:

1. A financial news-headline paraphrase dataset was constructed and used to verify the experimental conclusions. Drawing on the paraphrase generation task and real industrial scenarios, this dissertation describes how the dataset was built and validates its experimental conclusions on it. The dataset was expanded with a variety of data augmentation techniques, and the pre-trained language model was improved and fine-tuned using named entity recognition results, yielding an annotated dataset with stronger generalization ability.

2. A new neural network-based paraphrase generation model was proposed, improving the model's capability. To address the key technical challenges of the financial domain, this dissertation proposes a paraphrase method based on the Seq2Seq architecture. A pre-trained language model serves as the feature extractor on the encoder side, an LSTM serves as the text generator on the decoder side, and a context attention layer is added on the encoder and decoder sides to strengthen the model's ability to capture contextual information.

3. Several optimizations were proposed and applied to key problems in the model. To eliminate the influence of Chinese word segmentation, mixed-granularity embeddings are introduced. To address the out-of-vocabulary problem, an attention mechanism is introduced. To address the poor readability of generated paraphrases, a pre-trained language model is introduced to integrate domain knowledge. To address the poor diversity of paraphrases produced by existing model frameworks, a beam search generation strategy is introduced.
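The abstract says several data augmentation techniques were used to expand the headline dataset but does not name them. As one illustrative possibility (an assumption, not the dissertation's confirmed method), the following sketch generates headline variants by synonym replacement while leaving protected named entities untouched, which matters in finance because changing a company name changes the fact being reported. The function `synonym_variants`, the synonym table, and the headline are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical synonym table; a real system would use a curated financial thesaurus.
SYNONYMS: Dict[str, List[str]] = {
    "ACME Bank": ["ACME Ltd"],  # present only to show that protected entities are skipped
    "rise": ["climb", "gain"],
    "sharply": ["steeply"],
}


def synonym_variants(
    tokens: List[str],
    synonyms: Dict[str, List[str]],
    protected: Set[str],
) -> List[List[str]]:
    """Return every headline obtained by swapping exactly one non-protected token for a synonym."""
    variants: List[List[str]] = []
    for i, tok in enumerate(tokens):
        if tok in protected:
            continue  # never rewrite named entities such as company names
        for syn in synonyms.get(tok, []):
            # Build a new token list so the original headline is not mutated.
            variants.append(tokens[:i] + [syn] + tokens[i + 1:])
    return variants
```

For the tokenized headline `["ACME Bank", "shares", "rise", "sharply"]` with `{"ACME Bank"}` protected, this yields three variants (`climb`, `gain`, `steeply`), each keeping the entity intact. Generating one substitution per variant keeps each augmented pair close to the source, at the cost of fewer, less varied samples than random multi-token replacement.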
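The model described in contribution 2 adds a context attention layer so that the LSTM decoder can look back at the encoder's outputs at each step. The abstract does not specify the scoring function, so the following minimal sketch assumes simple dot-product attention: the current decoder state is scored against each encoder state, the scores are normalized with a softmax, and the context vector is the weighted sum of encoder states. The function names are hypothetical, and plain Python lists stand in for tensors.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot_attention(decoder_state: Vector, encoder_states: List[Vector]) -> Tuple[List[float], Vector]:
    """Return (attention weights, context vector) for one decoding step."""
    # Score each encoder position by its dot product with the decoder state.
    scores = [sum(d * h for d, h in zip(decoder_state, enc)) for enc in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # Context vector: attention-weighted sum of the encoder states, dimension by dimension.
    context = [sum(w * enc[k] for w, enc in zip(weights, encoder_states)) for k in range(dim)]
    return weights, context
```

In a full encoder-decoder model, the context vector would typically be combined with the decoder's hidden state (for example by concatenation) before predicting the next token; how the dissertation combines them is not stated in the abstract.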
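Contribution 3 introduces mixed-granularity embeddings to eliminate the influence of Chinese word segmentation. The abstract does not define the granularities; a common reading in Chinese NLP, assumed here, is that each character is represented by its own character embedding concatenated with the embedding of the word that contains it. That way a character still carries information when its word is missing from the vocabulary. The embedding tables and `mixed_granularity_embed` are hypothetical one-dimensional toys; 央行 means "central bank" and 降息 means "rate cut".

```python
from typing import Dict, List

Vector = List[float]

# Hypothetical toy embedding tables (dimension 1 each, for readability).
CHAR_EMB: Dict[str, Vector] = {"央": [0.1], "行": [0.2], "降": [0.3], "息": [0.4]}
WORD_EMB: Dict[str, Vector] = {"央行": [1.0]}  # "降息" deliberately left out (unknown word)


def mixed_granularity_embed(
    words: List[str],
    char_emb: Dict[str, Vector],
    word_emb: Dict[str, Vector],
    char_dim: int,
    word_dim: int,
) -> List[Vector]:
    """Return one vector per character: [character embedding | embedding of its enclosing word]."""
    vectors: List[Vector] = []
    for word in words:
        # Unknown words fall back to a zero vector; the character part is still informative.
        word_vec = word_emb.get(word, [0.0] * word_dim)
        for ch in word:
            char_vec = char_emb.get(ch, [0.0] * char_dim)
            vectors.append(char_vec + word_vec)  # list concatenation
    return vectors
```

Calling `mixed_granularity_embed(["央行", "降息"], CHAR_EMB, WORD_EMB, 1, 1)` gives four character-level vectors, where the two characters of the unknown word 降息 keep their character values and receive zeros in the word slot. Concatenation is only one design choice; summing or gating the two granularities are common alternatives.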
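Contribution 3 also adopts a beam search generation strategy. The following minimal sketch shows standard beam search over summed log probabilities: at each step it keeps the `beam_size` best partial sequences and sets aside any that end with the end-of-sequence token. The `step` callable is a hypothetical stand-in for the LSTM decoder's next-token distribution, `TOY_MODEL` is an invented lookup table, and length normalization is omitted for simplicity.

```python
import math
from typing import Callable, Dict, List, Tuple

Hypothesis = Tuple[Tuple[str, ...], float]  # (token sequence, summed log probability)

# Hypothetical next-token distributions keyed by prefix (all probabilities positive).
TOY_MODEL: Dict[Tuple[str, ...], Dict[str, float]] = {
    (): {"A": 0.6, "B": 0.4},
    ("A",): {"C": 0.55, "D": 0.45},
    ("B",): {"C": 0.9, "D": 0.1},
    ("A", "C"): {"</s>": 1.0},
    ("A", "D"): {"</s>": 1.0},
    ("B", "C"): {"</s>": 1.0},
    ("B", "D"): {"</s>": 1.0},
}


def beam_search(
    step: Callable[[Tuple[str, ...]], Dict[str, float]],
    beam_size: int,
    max_len: int,
    eos: str = "</s>",
) -> List[Hypothesis]:
    beams: List[Hypothesis] = [((), 0.0)]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates: List[Hypothesis] = []
        for prefix, score in beams:
            for tok, prob in step(prefix).items():
                candidates.append((prefix + (tok,), score + math.log(prob)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        # Keep the top beam_size; completed sequences leave the beam.
        beams = []
        for seq, score in candidates[:beam_size]:
            if seq[-1] == eos:
                finished.append((seq, score))
            else:
                beams.append((seq, score))
        if not beams:
            break
    finished.extend(beams)  # keep unfinished hypotheses if max_len was reached
    return sorted(finished, key=lambda c: c[1], reverse=True)
```

With `beam_search(lambda p: TOY_MODEL[p], beam_size=2, max_len=5)`, greedy decoding would commit to "A" (0.6) and end at probability 0.33, whereas a beam of width 2 finds "B C" with probability 0.36. Because the function returns a ranked list, the top few hypotheses can also serve as multiple candidate paraphrases.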
Keywords: text paraphrase, paraphrase generation, attention mechanism, pre-trained language model, data augmentation