Research On Fine-grained Chinese Paraphrase Extraction Technology Based On Deep Learning

Posted on: 2020-07-16
Degree: Master
Type: Thesis
Country: China
Candidate: X Yan
Full Text: PDF
GTID: 2428330590994383
Subject: Computer Science and Technology
Abstract/Summary:
Paraphrase research began early, both in China and abroad, and is an important area of natural language processing. By text granularity, paraphrase tasks can be divided into the lexical level, the phrase level and the sentence level; by task type, they can be divided into extraction, identification and generation. This thesis studies fine-grained (lexical and phrase level) Chinese paraphrase extraction based on deep learning, with the goal of obtaining high-quality paraphrase resources. Paraphrasing is a foundational task in natural language processing, and the quality of paraphrase resources directly affects many downstream tasks: information retrieval, question answering and machine translation, for example, can all use paraphrase resources for data augmentation and thereby improve their results. The motivation for a deep learning approach is that traditional methods have increasingly hit a bottleneck on paraphrase tasks, while deep learning is now widely adopted; this work introduces deep learning methods into Chinese paraphrase extraction in the hope of obtaining better paraphrase resources.

The research consists of three parts: first, lexical Chinese paraphrase extraction; second, Chinese phrase segmentation; and third, phrasal Chinese paraphrase extraction.

For lexical Chinese paraphrase extraction, we propose a pivoting-based method for extracting candidate Chinese paraphrases. Drawing on rich online English dictionary resources, it yields a large number of candidate paraphrases. We also propose a multi-model fusion discriminative model with a negative sampling mechanism, which filters the candidate Chinese lexical paraphrases. A manual evaluation of randomly sampled entries from the final Chinese lexical paraphrase resource shows that the proposed extraction method outperforms other Chinese lexical paraphrase extraction methods.

For Chinese phrase segmentation, we propose a 2*BiLSTM+BiLSTM+CRF sequence labeling model, trained and tested on the CTB8.0 corpus. The model is used mainly to segment Chinese monolingual corpora into phrases. Experimental comparison shows that it outperforms other models on Chinese phrase segmentation.

For phrasal Chinese paraphrase extraction, we segment phrases with our own model and, after rule-based filtering, obtain about 1.03 million high-quality Chinese phrases. Based on comparative experiments, we propose using a BattRAE model to learn vector representations of Chinese phrases. For each phrase, cosine similarity is computed and the 40 semantically closest phrases are taken as candidate phrase paraphrases, ranked by semantic similarity. Finally, erroneous or low-quality candidates are filtered out using translation data and rules. Experimental comparison shows that the proposed Chinese phrase paraphrase extraction method outperforms other current models.
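The pivoting step for lexical candidates can be illustrated with a minimal sketch. It assumes the common reading of pivoting, in which two Chinese words become a candidate pair when they share at least one English translation; the abstract does not spell out the exact procedure, so the function name `pivot_candidates`, the dictionary format and the toy entries below are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def pivot_candidates(zh_to_en):
    """Return candidate paraphrase pairs with the number of shared pivots.

    zh_to_en maps a Chinese word to a list of English translations
    (a hypothetical format; the thesis does not specify one).
    """
    # Invert the dictionary: English pivot -> Chinese words translated to it.
    en_to_zh = defaultdict(set)
    for zh, translations in zh_to_en.items():
        for en in translations:
            en_to_zh[en].add(zh)

    # Every pair of Chinese words under the same pivot is a candidate.
    # The count records how many distinct pivots support the pair.
    support = defaultdict(int)
    for zh_words in en_to_zh.values():
        for a, b in combinations(zh_words, 2):
            support[frozenset((a, b))] += 1
    return dict(support)


# Toy dictionary entries, invented for illustration only.
toy_dict = {
    "高兴": ["happy", "glad"],
    "快乐": ["happy", "joyful"],
    "开心": ["happy", "glad"],
    "银行": ["bank"],
}
candidates = pivot_candidates(toy_dict)
```

Pairs supported by more pivots are plausibly stronger candidates, but pivoting alone also produces noise from polysemous English words, which is why the abstract follows it with a discriminative filtering model.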
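The candidate selection for phrases (cosine similarity, keeping the closest phrases and ranking them) can be sketched as follows. This is only an illustration under stated assumptions: in the thesis the vectors come from a BattRAE model, whereas here they are small hand-written toy vectors, and the names `cosine` and `top_k_candidates` are hypothetical. The default `k=40` mirrors the figure given in the abstract.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k_candidates(phrase_vectors, query, k=40):
    """Rank all other phrases by cosine similarity to `query`, keep the top k."""
    query_vec = phrase_vectors[query]
    scored = [
        (phrase, cosine(query_vec, vec))
        for phrase, vec in phrase_vectors.items()
        if phrase != query  # a phrase is not its own paraphrase
    ]
    # Highest similarity first, so the list is already sorted for output.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional vectors, invented for illustration only.
toy_vectors = {
    "提高 效率": [0.9, 0.1, 0.0],
    "提升 效率": [0.8, 0.2, 0.0],
    "降低 成本": [0.1, 0.9, 0.1],
    "吃 晚饭": [0.0, 0.1, 0.9],
}
ranked = top_k_candidates(toy_vectors, "提高 效率", k=2)
```

With about 1.03 million phrases, an exhaustive comparison like this one would be slow; a real system would more likely use batched matrix operations or an approximate nearest-neighbor index, which the abstract does not discuss.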
Keywords/Search Tags: natural language processing, Chinese paraphrase extraction, fine-grained text, phrase division, deep learning