Research On The Outlier Dialogues In The Construction Of Multi-turn Conversations Corpus

Posted on: 2019-07-23
Degree: Master
Type: Thesis
Country: China
Candidate: G D Zheng
GTID: 2428330566996844
Subject: Computer technology

Abstract/Summary:
In recent years, computer technology has developed rapidly, and many fields have begun to focus on artificial intelligence, leading a new wave of progress in AI. Dialogue systems are an important application of artificial intelligence and have attracted wide attention from both industry and academia. However, building a multi-turn dialogue system requires a high-quality, large-scale spoken dialogue corpus to train its key components, such as the model that interprets the semantics of spoken language. Spoken dialogue corpora for dialogue systems are generally collected and processed from community forums (such as Tieba and Weibo). Although community forums offer a large number of conversations, these conversations often contain spam such as game and shopping advertisements, and sensitive content such as verbal abuse must also be removed. Analyzing outlier dialogues is therefore an important part of constructing a multi-turn dialogue corpus. This thesis approaches the problem through the topic of each dialogue: it identifies the sentences that deviate from the topic of a multi-turn conversation and uses the results to construct a high-quality, clean dialogue corpus.

Forum texts are usually short, carry little content, and are often novel in wording, so traditional text-processing methods perform poorly on them. To address these problems, this thesis applies deep learning, currently a popular technology, to process community dialogue data. The main research contents are as follows:

(1) We first determine whether a forum conversation contains a sentence that deviates from its topic, and then identify the outlier sentences in the conversations that do. In addition, we use topic segmentation to divide multi-turn conversations into segments, each related to a single subtopic.
(2) To determine whether a short-text conversation contains an off-topic sentence, we trained a Hierarchical Gated Recurrent Unit network (HGRU) and a Hierarchical Gated Recurrent Unit Convolutional Neural Network (HGRU-CNN) on artificially constructed dialogue training data, and then tested them on a small manually labeled data set.
(3) For dialogues found to contain off-topic content, we propose a Topic-Document Match-Gated Network and an attention-based Topic-Document Network that examine each sentence of the dialogue, locate the sentences that deviate from the topic, and remove them.
(4) We use topic segmentation to divide conversations: an end-to-end neural network segments the dialogue text so that each resulting short text relates to a single subtopic. The model is trained on the TDT2 corpus and then transferred to forum dialogues.

Experimental results on a constructed validation set are in line with expectations.
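The hierarchical idea behind item (2) can be illustrated with a small pure-Python sketch. A word-level GRU encodes each sentence into a vector. A sentence-level GRU then encodes the sequence of sentence vectors into a dialogue vector, and a logistic output gives the probability that the dialogue contains an off-topic sentence. This is a forward pass only, with randomly initialized (untrained) weights, bias terms omitted in the GRU cells, and hypothetical class and parameter names (`HierarchicalGRUClassifier`, `word_hidden`, `sent_hidden`). It is not the thesis's implementation, and the HGRU-CNN variant is not shown.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    # Multiply matrix m (a list of rows) by vector v.
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class GRUCell:
    """Minimal GRU cell without bias terms (update gate z, reset gate r)."""

    def __init__(self, input_dim, hidden_dim, rng):
        def init(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden_dim = hidden_dim
        self.Wz, self.Uz = init(hidden_dim, input_dim), init(hidden_dim, hidden_dim)
        self.Wr, self.Ur = init(hidden_dim, input_dim), init(hidden_dim, hidden_dim)
        self.Wh, self.Uh = init(hidden_dim, input_dim), init(hidden_dim, hidden_dim)

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.Wz, x), matvec(self.Uz, h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.Wr, x), matvec(self.Ur, h))]
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b) for a, b in zip(matvec(self.Wh, x), matvec(self.Uh, reset_h))]
        # The new state interpolates between the previous state and the candidate.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

    def encode(self, sequence):
        # Run the cell over the sequence; the final state summarizes it.
        h = [0.0] * self.hidden_dim
        for x in sequence:
            h = self.step(x, h)
        return h


class HierarchicalGRUClassifier:
    """Hypothetical two-level GRU encoder with a logistic output layer."""

    def __init__(self, emb_dim, word_hidden, sent_hidden, seed=0):
        rng = random.Random(seed)
        self.word_gru = GRUCell(emb_dim, word_hidden, rng)      # words -> sentence vector
        self.sent_gru = GRUCell(word_hidden, sent_hidden, rng)  # sentences -> dialogue vector
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(sent_hidden)]
        self.b_out = 0.0

    def predict_proba(self, dialogue):
        """dialogue: list of sentences, each a list of word vectors."""
        sentence_vecs = [self.word_gru.encode(sentence) for sentence in dialogue]
        dialogue_vec = self.sent_gru.encode(sentence_vecs)
        logit = sum(w * h for w, h in zip(self.w_out, dialogue_vec)) + self.b_out
        return sigmoid(logit)  # untrained estimate of P(contains an off-topic sentence)


# Usage with random stand-in word embeddings (a real system would use trained ones).
rng = random.Random(42)
dialogue = [[[rng.uniform(-1, 1) for _ in range(8)] for _ in range(n_words)]
            for n_words in (5, 3, 4)]
model = HierarchicalGRUClassifier(emb_dim=8, word_hidden=16, sent_hidden=16)
p = model.predict_proba(dialogue)
print(f"P(off-topic) = {p:.3f}")
```

In practice the weights would be learned from labeled dialogues, such as the artificially constructed training data described in item (2). With untrained weights the output is close to 0.5 and carries no meaning.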
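Item (3) uses attention to relate a topic representation to each sentence. As a minimal sketch under simplifying assumptions, the example below scores each sentence vector against a topic vector with scaled dot-product attention. It flags sentences whose attention weight falls below a fraction (`ratio`) of the uniform weight 1/n and then drops them. The topic vector, the toy 3-dimensional sentence vectors, the `ratio` threshold and all function names are hypothetical. The thesis's Topic-Document Match-Gated Network and attention-based Topic-Document Network are trained models whose internals are not described in this abstract.

```python
import math


def softmax(scores):
    # Subtract the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(topic_vec, sentence_vecs):
    # Scaled dot-product attention with the topic vector as the query.
    scale = math.sqrt(len(topic_vec))
    scores = [sum(t * s for t, s in zip(topic_vec, vec)) / scale for vec in sentence_vecs]
    return softmax(scores)


def locate_off_topic(topic_vec, sentence_vecs, ratio=0.8):
    # Flag sentences whose weight is below `ratio` times the uniform weight 1/n.
    weights = attention_weights(topic_vec, sentence_vecs)
    cutoff = ratio / len(sentence_vecs)
    return [i for i, w in enumerate(weights) if w < cutoff]


def remove_off_topic(sentences, off_topic_idx):
    # Keep every sentence that was not flagged, preserving order.
    drop = set(off_topic_idx)
    return [s for i, s in enumerate(sentences) if i not in drop]


# Toy example: hypothetical 3-d vectors; the third sentence points away from the topic.
sentences = [
    "Which phone has the best camera?",
    "I like the night mode on mine.",
    "Cheap game coins, message me!",
    "Its photos in low light are great.",
]
topic_vec = [1.0, 0.0, 0.0]
sentence_vecs = [[0.9, 0.1, 0.0], [0.8, 0.0, 0.2], [0.0, 0.0, 1.0], [1.0, 0.1, 0.0]]

flagged = locate_off_topic(topic_vec, sentence_vecs)
cleaned = remove_off_topic(sentences, flagged)
print(flagged)  # [2]
print(cleaned)
```

A fixed relative threshold keeps the sketch simple. A trained model would instead learn how to map matching scores to an off-topic decision.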
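Neural topic segmenters commonly predict, for each sentence, whether a subtopic boundary follows it. Suppose such per-sentence boundary probabilities are available from a trained model. The last step of item (4), turning those predictions into subtopic segments, could then look like the sketch below. It does not include the end-to-end network itself, and the function name, the input format and the 0.5 threshold are assumptions.

```python
def segment_by_boundaries(sentences, boundary_probs, threshold=0.5):
    """Group sentences into subtopic segments.

    boundary_probs[i] is assumed to be a model's probability that a subtopic
    boundary follows sentences[i]. Any trailing sentences form the last segment.
    """
    if len(sentences) != len(boundary_probs):
        raise ValueError("expected one boundary probability per sentence")
    segments, current = [], []
    for sentence, p in zip(sentences, boundary_probs):
        current.append(sentence)
        if p >= threshold:  # close the current segment after this sentence
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments


# Hypothetical model output for a five-sentence forum thread.
thread = ["s1", "s2", "s3", "s4", "s5"]
probs = [0.1, 0.8, 0.2, 0.3, 0.9]
segments = segment_by_boundaries(thread, probs)
print(segments)  # [['s1', 's2'], ['s3', 's4', 's5']]
```

Each resulting segment can then be passed on as a separate short text related to one subtopic.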
Keywords: Short Text, Deep Learning, Topic Segmentation, Attention Mechanism, Transfer Learning