Research On Visual Question Answering Method For Bias Mitigation

Posted on: 2024-04-09 | Degree: Master | Type: Thesis
Country: China | Candidate: P J Li | Full Text: PDF
GTID: 2568307136488024 | Subject: Signal and Information Processing
Abstract/Summary:
Visual Question Answering (VQA) is a crucial task in the field of artificial intelligence. It has broad applications in human-computer interaction, in helping individuals with disabilities access real-world information, in visual retrieval, and in other areas. However, current VQA models are often affected by biases: they tend to rely too heavily on the statistical relationship between questions and answers when predicting results, and neglect the semantics of the image and the text. This limits the performance of VQA models, so reducing the impact of biases has become an active research direction. Current research mainly focuses on mitigating the language prior problem caused by biases, in which the model relies too heavily on the grammar and structure of the question text. Existing methods still face two challenges: (1) differences in data distribution make it difficult for the model to focus on tail data; (2) the model overfits to answer classes that have a large number of samples. In response to these problems, this paper proposes bias-mitigating VQA methods, with the following main contributions.

(1) To address the inability of VQA models to focus on tail data, this paper proposes a de-biasing VQA model based on a category weighting strategy. The model considers the impact of biases from two perspectives: the distribution of question types and the overall distribution of answers. Specifically, the category weighting strategy module reshapes the loss of each question type so that the model learns every question type equally and pays more attention to tail question types. The decoupled training module re-trains the classifier weights after the model has been trained, which reduces the impact of weight imbalance and improves the model's attention to tail answer classes. Experimental results on the VQA-CP v2, VQA-CP v1, and VQA v2 datasets show that the proposed model effectively improves accuracy on tail data.

(2) To address the model's overfitting to answer classes that have a large number of samples, this paper proposes a locally bias-mitigating VQA model. The model uses question types to identify the labels it overfits to and generates punitive labels for the corresponding data. The model is then regularized with these punitive labels, which alleviates its failure to truly understand the semantics of the text and image caused by overfitting to answer classes with a large number of samples. Experimental results on the VQA-CP v2, VQA-CP v1, and VQA v2 datasets show that the proposed model effectively mitigates this overfitting and significantly improves accuracy.
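The abstract does not give the exact form of the category weighting strategy. As a rough illustration of the general idea of reshaping the loss per question type, the sketch below uses plain inverse-frequency weighting, which is an assumption and not necessarily the thesis's formulation. The function names `question_type_weights` and `weighted_loss` are hypothetical.

```python
import math
from collections import Counter


def question_type_weights(question_types):
    """Assumed scheme: inverse-frequency weight per question type.

    With n samples and k distinct types, a type seen c times gets
    weight n / (k * c), so every type contributes the same total
    weight (n / k) and the weights sum to n over all samples.
    """
    counts = Counter(question_types)
    n = len(question_types)
    k = len(counts)
    return {qtype: n / (k * c) for qtype, c in counts.items()}


def cross_entropy(probs, target):
    """Negative log-likelihood of the target class for one sample."""
    return -math.log(max(probs[target], 1e-12))


def weighted_loss(batch_probs, targets, question_types, weights):
    """Mean cross-entropy, with each sample scaled by its question-type weight."""
    total = sum(
        weights[qtype] * cross_entropy(probs, target)
        for probs, target, qtype in zip(batch_probs, targets, question_types)
    )
    return total / len(targets)


# Toy batch: three head-type questions and one tail-type question.
types = ["yes/no", "yes/no", "yes/no", "count"]
weights = question_type_weights(types)
probs = [[0.5, 0.5]] * 4   # predicted answer distribution per sample
targets = [0, 1, 0, 1]     # ground-truth answer index per sample
loss = weighted_loss(probs, targets, types, weights)
```

In this toy batch the single "count" question receives three times the weight of each "yes/no" question, so both question types contribute equally to the loss regardless of how often they occur.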
Keywords/Search Tags:Visual Question Answering, Bias mitigation, Long-tailed distributions, Robustness