Research On Visual Question Answering Method For Bias Mitigation

Posted on:2024-04-09

Degree:Master

Type:Thesis

Country:China

Candidate:P J Li

GTID:2568307136488024

Subject:Signal and Information Processing

Abstract/Summary:

Visual Question Answering (VQA) is a crucial task in the field of artificial intelligence, with broad applications in human-computer interaction, assisting individuals with disabilities in accessing real-world information, visual retrieval, and other areas. However, current VQA models are often affected by biases: they tend to rely too heavily on the relationship between question and answer to predict results and neglect the semantics of the image and text. This limitation restricts the performance of VQA models, so reducing the impact of biases on VQA models has become an active research direction. Current research mainly focuses on mitigating the language prior problem caused by biases, in which the model relies too heavily on the grammar and structure of the text. However, existing methods still face two challenges: (1) differences in data distribution make it difficult for the model to focus on tail data; (2) the model overfits to answer classes that have a large number of samples. In response to these problems, this thesis proposes a bias-mitigating VQA method, with the following main contributions:

(1) To address the inability of VQA models to focus on tail data, this thesis proposes a de-biasing VQA model based on a category weighting strategy. The model considers the impact of biases from two perspectives: the distribution of question types and the overall distribution of answers. Specifically, the category weighting strategy module reshapes the loss of each question type so that the model learns every question type equally and pays more attention to tail question types. The decoupled training module re-trains the classifier weights after model training to reduce the impact of weight imbalance and improve the model's attention to tail answer classes. Experimental results on the VQA-CP v2, VQA-CP v1, and VQA v2 datasets show that the proposed model effectively improves accuracy on tail data.

(2) To address the problem of the model overfitting to answer classes that have a large number of samples, this thesis proposes a locally bias-mitigating VQA model. The model uses question types to identify overfitted labels and generates punitive labels for this type of data. The model is then regularized with the punitive labels, which alleviates its failure to truly understand the semantics of text and image caused by overfitting to answer-class labels with many samples. Experimental results on the VQA-CP v2, VQA-CP v1, and VQA v2 datasets show that the proposed model effectively mitigates overfitting to answer classes with many samples and significantly improves accuracy.
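
The abstract does not give the formula used by the category weighting strategy, so the following is only a minimal sketch of the general idea of reshaping the loss per question type. It assumes inverse-frequency weights and a single-label cross-entropy loss (VQA models often use soft multi-label targets instead); the names `question_type_weights` and `weighted_cross_entropy` and the toy data are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter


def question_type_weights(question_types):
    """Assumed scheme: inverse-frequency weight per question type.

    Every question type receives the same total weight, and the
    weights average to 1 over all samples.
    """
    counts = Counter(question_types)
    n_types = len(counts)
    total = len(question_types)
    return {qtype: total / (n_types * count) for qtype, count in counts.items()}


def weighted_cross_entropy(probs, targets, question_types, weights):
    """Mean cross-entropy, each sample scaled by its question type's weight."""
    losses = [
        -weights[qtype] * math.log(p[target])
        for p, target, qtype in zip(probs, targets, question_types)
    ]
    return sum(losses) / len(losses)


# Toy batch: three head-type questions ("yes/no") and one tail-type ("count").
batch_types = ["yes/no", "yes/no", "yes/no", "count"]
weights = question_type_weights(batch_types)

# Hypothetical predicted distributions over two answers, and target indices.
batch_probs = [[0.5, 0.5]] * 4
batch_targets = [0, 1, 0, 1]
loss = weighted_cross_entropy(batch_probs, batch_targets, batch_types, weights)
```

In this toy batch the single "count" question receives three times the weight of each "yes/no" question, so both question types contribute equally to the total loss. Decoupled classifier re-training and punitive-label regularization are not covered by this sketch.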

Keywords/Search Tags:

Visual Question Answering, Bias mitigation, Long-tailed distributions, Robustness

Related items

1	Research On Priors Mitigation And Multimodal Reasoning For Visual Question Answering System
2	Research On Affective Visual Question Answering
3	Research On Visual Question Answering With Suppressing Biased Samples
4	Research Of Visual Question Answering Based On Cross-media Multimodal Representation Learning
5	Research On Methods Of Visual Question Answering Based On Adaptive Multimodal Feature Fusion
6	Research On Language Bias Of Visual Question Answering Model
7	Research On Visual Question Answering Based On Multi-Channel CNN-LSTM
8	Research On Visual Question Answering With Deep Metric Learning
9	Research On Question Answering Systems For Visual Content And Emotion Perception
10	Research On Key Technologies Of Visual Question Answering Based On Metric Learnin