
Research On Visual Question Answering With Suppressing Biased Samples

Posted on: 2023-03-11
Degree: Master
Type: Thesis
Country: China
Candidate: Y N L Ou
GTID: 2568306794481804
Subject: Electrical engineering

Abstract/Summary:
With the rapid development of computer vision and natural language processing in recent years, research on their derivative tasks and cross-tasks has become a hotspot. Visual Question Answering (VQA) is a typical example: it aims to predict an answer from a given image and a corresponding natural language question. However, most existing VQA models suffer from the language bias problem, meaning that they tend to fit question-answer pairs that appeared during training. When the answer distributions of the training split and the test split differ, model performance drops sharply, which makes these models difficult to apply in real-world scenarios. The task poses two key challenges. First, the VQA model must fully understand the content of the question and the image and narrow the gap between the two modalities. Second, it must overcome language bias even when the training samples themselves contain bias, and thus predict a debiased answer grounded in the image.

Previous ensemble-based methods focus on combining multiple modular components to create a new anti-bias model, sacrificing performance on the biased dataset. In addition, existing data-balanced methods introduce new biases when generating training data, so the language bias problem is not fundamentally addressed. To address these problems, this thesis proposes a Suppressing Biased Samples (SBS) model to overcome language bias. SBS consists of two collaborative parts: a data classifier module and a bias penalty module. The data classifier module uses a representation of language bias and similarity in the semantic space to divide the training samples into biased samples and unbiased samples. The bias penalty module forces the model to learn unbiased features by blocking the back-propagation of biased samples to the question representation space, and uses a designed similarity loss function to dynamically adjust the resulting loss so as to reduce the effect of biased samples.

This thesis conducts experiments on the VQA, VQA-CP, and VQA-CE datasets and compares the results with current benchmark models through specific examples and statistical analysis. The results show that the proposed SBS model alleviates the influence of language bias and lets the model infer answers after eliminating bias, without harming its ability to answer questions. Unlike data-balanced methods, suppressing biased samples does not introduce new biases. These findings provide an instance-level solution to language bias in visual question answering and further promote the development of this task.
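The two SBS modules described above can be illustrated with a small sketch. This is not the thesis's implementation: the abstract gives no formulas, so the cosine similarity measure, the `threshold` value, the `1 - similarity` weighting, and every name below (`split_and_weight`, `bias_pred`, `target`, `loss`) are assumptions chosen only for illustration. The sketch treats a sample as biased when a question-only answer distribution already matches the ground-truth answer, then scales down that sample's loss and flags it so that its gradient would not reach the question representation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def split_and_weight(samples, threshold=0.8):
    """Hypothetical stand-in for the data classifier and bias penalty modules.

    Each sample is a dict with:
      'bias_pred': answer distribution predicted from the question alone (assumed input)
      'target':    ground-truth answer distribution
      'loss':      the sample's unweighted training loss
    """
    results = []
    for s in samples:
        sim = cosine(s["bias_pred"], s["target"])
        biased = sim >= threshold            # data classifier: biased vs. unbiased
        weight = 1.0 - sim if biased else 1.0  # bias penalty: shrink loss of biased samples
        results.append({
            "biased": biased,
            "weighted_loss": weight * s["loss"],
            # Biased samples would not update the question representation;
            # in an autodiff framework this flag would decide whether to detach it.
            "propagate_to_question": not biased,
        })
    return results


samples = [
    # The question alone already yields the correct answer: treated as biased.
    {"bias_pred": [1.0, 0.0], "target": [1.0, 0.0], "loss": 2.0},
    # The question-only guess is wrong, so the image is needed: treated as unbiased.
    {"bias_pred": [1.0, 0.0], "target": [0.0, 1.0], "loss": 2.0},
]
for r in split_and_weight(samples):
    print(r)
```

In this toy setup the first sample contributes no loss and is flagged as blocked from the question branch, while the second keeps its full loss. A real model would compute these quantities per batch on tensors rather than on Python lists.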
Keywords/Search Tags: Visual Question Answering, Multimodal, Language bias, Sample classification, Bias penalty