Research On Visual Dialog Technology Integrating Dialog History

Posted on: 2021-05-17  Degree: Master  Type: Thesis
Country: China  Candidate: T H Yang  Full Text: PDF
GTID: 2428330602994384  Subject: Control Science and Engineering
Abstract/Summary:
In recent years, deep learning has achieved success in computer vision and natural language processing, driving performance improvements in tasks such as visual analysis and language understanding. Researchers have begun to focus on multimodal tasks that combine vision and language, e.g., image captioning, visual question answering, and visual dialog. Such multimodal tasks rely not only on accurate analysis of visual content but also on accurate understanding of natural language. As a multimodal task, visual dialog requires the computer to hold a meaningful dialog with humans in natural, conversational language about visual content. Specifically, given an image, a dialog history, and a follow-up question about the image, the task is to answer the question. Visual dialog is more complicated than other multimodal tasks. Recently, visual dialog has become a research hotspot because of its wide application prospects in chatbots, intelligent customer service, and assistance for the blind.

Existing visual dialog methods mainly follow the "encoder-decoder" framework. In each round of dialog, the multimodal encoder encodes the visual and linguistic inputs into feature vectors, and the decoder then reasons out the answer for that round. However, existing methods do not fully consider the correlation between images, dialog history, and follow-up questions, and thus lack the ability to represent the collaborative information among them. Moreover, existing methods use only the correct dialog history to generate answers and neglect the implicit impact of incorrect dialog history on the answers. As a result, they are insensitive to dialog history and cannot effectively perform context reasoning based on it. To fully exploit the historical information in visual dialog and improve performance on the visual dialog task, the main research work of this thesis is as follows:

1) To effectively encode the images, dialog history, and follow-up questions, this thesis proposes a History-Aware Co-Attention Network (HACAN) for visual dialog. The model applies the co-attention mechanism, which computes the features of each input while taking the influence of the others into account. HACAN therefore achieves feature interaction between the images, dialog history, and follow-up questions, and fully represents both the unique information of each and the collaborative information among them.

2) To improve the context-based reasoning ability in visual dialog, this thesis proposes a training strategy named History-Advantage Sequence Training (HAST). HAST intentionally inserts wrong answers into the dialog history and evaluates the answers obtained from the original dialog history and from the tampered dialog history, respectively. The difference between the evaluation results is taken as the historical advantage, which quantifies the impact of dialog history on answers, reflects the correlation between the manipulated dialog history and the follow-up questions, and forces the model to learn the logical information within the dialog history. By introducing the historical advantage into the gradient computation, which is back-propagated to the History-Aware Co-Attention Network, the model trained with HAST acquires the ability of context-based reasoning.

3) To verify the effectiveness of the proposed methods, this thesis conducts a series of ablation studies. Experiments show that the History-Aware Co-Attention Network and the History-Advantage Sequence Training strategy can effectively take advantage of the historical information of the dialog and improve the accuracy of the visual dialog task. On the three mainstream visual dialog datasets, the proposed methods also outperform existing visual dialog methods.
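The historical advantage described for HAST can be illustrated with a minimal sketch. The abstract does not give the thesis's actual model, evaluation metric, or loss, so everything named here is a hypothetical stand-in: `toy_model`, `exact_match`, `tamper_history`, `history_advantage`, and the REINFORCE-style weighting in `advantage_weighted_loss` are assumptions chosen only to show the idea of scoring answers under the original and the tampered history and using the difference as a training signal.

```python
import math

def tamper_history(history, round_index, wrong_answer):
    """Return a copy of the history with one round's answer replaced by a wrong one."""
    tampered = list(history)
    question, _ = tampered[round_index]
    tampered[round_index] = (question, wrong_answer)
    return tampered

def history_advantage(model_fn, eval_fn, image, history, tampered, question, truth):
    """Score of the answer from the original history minus the score from the tampered one."""
    answer_original = model_fn(image, history, question)
    answer_tampered = model_fn(image, tampered, question)
    return eval_fn(answer_original, truth) - eval_fn(answer_tampered, truth)

def advantage_weighted_loss(advantage, log_prob):
    """Assumed policy-gradient-style loss: the advantage scales the log-likelihood term."""
    return -advantage * log_prob

# Hypothetical stand-ins for the real network and metric.
def toy_model(image, history, question):
    # Answers by copying the most recent answer in the history.
    return history[-1][1]

def exact_match(answer, truth):
    return 1.0 if answer == truth else 0.0

history = [("what color is the cat?", "black")]
tampered = tamper_history(history, 0, "white")
advantage = history_advantage(
    toy_model, exact_match, None, history, tampered,
    "is the sofa the same color?", "black",
)
loss = advantage_weighted_loss(advantage, math.log(0.5))
```

In this toy setting the advantage is large because the answer depends entirely on the history; a model that ignored the history would produce the same answer in both cases and receive an advantage of zero, which is the insensitivity the training strategy is meant to penalize.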
Keywords/Search Tags:Visual Dialog, Neural Network, Attention Mechanism, Training Strategy