Algorithm Research Of Visual Dialog Related Problems Based On Visual Anaphora Resolution

Posted on:2024-06-01

Degree:Master

Type:Thesis

Country:China

Candidate:J Zhao

GTID:2568307079976969

Subject:Electronic information

Abstract/Summary:

With the continued development of artificial intelligence, natural language processing and computer vision, two of its core technical fields, have made remarkable progress. Visual and text data are growing explosively across many domains. How to make visual data (images, videos) and text data interact effectively, and how to extract, filter, and infer useful information from them, is an important challenge in artificial intelligence. In response to this challenge, researchers have proposed many cross-modal tasks, such as image captioning, visual question answering, and visual dialogue. Among them, the visual dialogue task aims to accurately answer a sequence of questions about visual content. The key to the task is to understand the semantics of the question, locate the correct target in the image, and then infer the correct answer. However, the dialogue history in visual dialogue contains a large number of pronouns, and a dialogue model may fail to determine the target entity a pronoun refers to, which leads to biased answers. To address this problem of unclear reference, this thesis carries out the following work:

1. To address the visual reference resolution problem that is ubiquitous in visual dialogue, a visual dialogue model based on double soft constraints is proposed.
2. Based on the linguistic knowledge that the antecedent of a pronoun can only be a noun or a noun phrase, the first soft constraint is proposed, which introduces learnable part-of-speech tags and a part-of-speech tag prediction loss.
3. Based on the observation that the referent of a pronoun in dialogue usually occurs in nearby dialogue turns, a second soft constraint is proposed, which applies sinusoidal position encoding to sentences in order to strengthen local interaction between sentences.

To verify the effectiveness of the proposed visual dialogue method based on double soft constraints, extensive experiments were conducted on three datasets, VisDial v0.9, VisDial v1.0, and GuessWhat?!, including quantitative experiments, qualitative experiments, and ablation experiments. The experimental results show that the method based on double soft constraints achieves better results than previous methods. It can effectively resolve the entities referred to by pronouns and improve the answer accuracy of visual dialogue models.
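
The second soft constraint relies on sinusoidal position encoding of sentences. As a minimal sketch, assuming the standard Transformer-style formulation (the abstract does not specify the exact variant used in the thesis), each dialogue round index can be mapped to a fixed vector, and the dot product between two such vectors depends only on the distance between the rounds. The function names `sinusoidal_encoding` and `dot`, the dimension 16, and the ten-round dialogue are illustrative assumptions, not details from the thesis.

```python
import math


def sinusoidal_encoding(position, dim):
    """Standard Transformer-style sinusoidal encoding for one position.

    Assumption: the thesis may use a different variant; this is only
    the widely used textbook formulation.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    vec = []
    for i in range(0, dim, 2):
        # Each (sin, cos) pair shares one frequency; frequencies decrease
        # geometrically as the dimension index grows.
        angle = position / (10000 ** (i / dim))
        vec.append(math.sin(angle))
        vec.append(math.cos(angle))
    return vec


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical dialogue with 10 rounds, one encoding per round (sentence).
DIM = 16
rounds = [sinusoidal_encoding(t, DIM) for t in range(10)]

# Similarity of round 5 to an adjacent round and to a distant round.
# The dot product equals the sum of cos(distance * frequency) over all
# frequencies, so it is a function of the distance between rounds only.
sim_near = dot(rounds[5], rounds[4])
sim_far = dot(rounds[5], rounds[0])
```

In this small example the adjacent round scores higher than the distant one, which is the kind of locality bias the abstract describes. The dot product is not strictly monotonic in distance in general, and how the thesis injects this signal into the model (added to sentence features, used as an attention bias, or otherwise) is not stated in the abstract.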

Keywords/Search Tags:

Visual Dialog, Anaphora Resolution, Soft Constraints, Part of Speech tags, Sinusoidal Position Encoding

Related items

1	Research On Visual Dialog Technology Based On Visual Coreference Resolution
2	Research On Speech Bandwidth Encoding Based On Nonlinear Mapping Of Sinusoidal Models
3	Text Comprehension Based On Resolution Of Anaphora
4	The Anaphora Resolution Research Based On Frame Semantic Annotation
5	Research And Realize On Pronominal Anaphora Resolution System In Chinese Text
6	Research And Implementation Of Anaphora Resolution
7	Research On Visual Dialog Technology Integrating Dialog History
8	Research And Application Of Question Generation Technology For Visual Dialog
9	Research On Anaphora Resolution In Uyghur Personal Pronouns
10	Research On Enhanced Word Embedding Learning Model With Fusion Of Part-of-Speech And Position Information