
Chinese Multi-turn Utterance Rewriting Approach Based On Target Mention Text Extraction

Posted on: 2022-02-01
Degree: Master
Type: Thesis
Country: China
Candidate: J J Xu
GTID: 2518306569981809
Subject: Software engineering
Abstract/Summary:
In multi-turn human-machine dialogue, user utterances often contain coreference or omit information, which makes it difficult for the dialogue system to identify the user's intention accurately. To address this, researchers have proposed using a sequence-to-sequence model that draws on the historical utterances to rewrite the current utterance into a complete and unambiguous one. Such a model in fact performs two tasks: 1) extracting text from the historical utterances that can complete the semantics of the current utterance; 2) using that text to help generate the rewritten utterance. An analysis of existing datasets in this project found that only a small part of the text in the historical utterances affects the rewriting of the current utterance. Therefore, when the historical utterances are long, a sequence-to-sequence model alone may fail to capture their key information, and rewriting quality declines.

To solve this problem, this project defines the continuous text in the historical utterances that affects the rewriting of the current utterance as the target mention text, and proposes an utterance rewriting method based on target mention text extraction. The main contributions are as follows: 1) The project first uses an extractive machine reading comprehension model to extract the target mention text, and then feeds the target mention text, the historical utterances, and the current utterance together into the sequence-to-sequence model to generate the rewritten utterance. Explicitly taking the target mention text as input highlights its importance. 2) Because the historical utterances, the current utterance, and the target mention text differ in importance for the rewriting task, the utterance rewriting model not only generates a word probability distribution over each type of text separately but also uses an attention mechanism to weight these distributions. This has two benefits. First, the model can assign an appropriate weight to the target mention text through the attention mechanism, which makes it more robust and keeps it from generating a poor utterance when the extracted target mention text is inaccurate. Second, if the user's current utterance is already semantically complete and needs no rewriting, the model should give higher weight to the current utterance, so that its text is more likely to be selected as output during generation.

To verify the effectiveness of the model, experiments were carried out on two public Chinese datasets. The results show that the proposed model captures useful text information from the historical utterances and outperforms the compared models.
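
The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract does not say how spans are scored or how the attention weights are computed, so the sketch assumes (a) the reading comprehension model outputs one start score and one end score per history token, and (b) the attention mechanism reduces to a softmax over one scalar score per text source. The function names `best_span` and `mix_distributions`, the toy tokens, and all numbers are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def best_span(start_scores, end_scores, max_len=10):
    """Pick the contiguous span (i, j), i <= j, with the highest
    start_scores[i] + end_scores[j], limited to max_len tokens.
    Assumption: this is the usual extractive-MRC decoding rule."""
    best, best_score = (0, 0), float("-inf")
    for i, start in enumerate(start_scores):
        for j in range(i, min(i + max_len, len(end_scores))):
            score = start + end_scores[j]
            if score > best_score:
                best, best_score = (i, j), score
    return best


def mix_distributions(source_dists, source_scores):
    """Combine per-source word distributions into one distribution.
    source_dists:  {source name: {token: probability}}
    source_scores: {source name: unnormalised attention score}
    Returns the mixed distribution and the normalised source weights."""
    names = list(source_dists)
    weights = softmax([source_scores[name] for name in names])
    mixed = {}
    for name, weight in zip(names, weights):
        for token, prob in source_dists[name].items():
            mixed[token] = mixed.get(token, 0.0) + weight * prob
    return mixed, dict(zip(names, weights))


# Stage 1: extract the target mention text from the history (toy scores).
history = ["how", "is", "the", "weather", "in", "Beijing", "today"]
start_scores = [0.1, 0.0, 0.2, 0.3, 0.4, 2.5, 0.1]
end_scores = [0.0, 0.1, 0.0, 0.2, 0.1, 2.8, 0.3]
start, end = best_span(start_scores, end_scores)
mention = history[start:end + 1]

# Stage 2: one decoding step, mixing the three per-source distributions.
source_dists = {
    "history": {"weather": 0.5, "Beijing": 0.3, "today": 0.2},
    "current": {"what": 0.4, "about": 0.3, "tomorrow": 0.3},
    "mention": {"Beijing": 1.0},
}
source_scores = {"history": 0.2, "current": 0.5, "mention": 1.5}
mixed, weights = mix_distributions(source_dists, source_scores)
```

Because the source weights sum to one and each per-source distribution sums to one, the mixed result is again a valid probability distribution. In this sketch, lowering the score of `"mention"` shifts probability mass back to the history and current utterance, which mirrors the robustness argument in the abstract. Raising the score of `"current"` favours copying the current utterance unchanged.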
Keywords: coreference resolution, utterance rewriting, machine reading comprehension