As a data carrier commonly used in daily life and work, charts can convey rich information, such as the content, proportions, and trends of data, in the form of images. In recent years, deep learning-based algorithms have continuously emerged for the chart question answering task and have achieved good results, but some problems and limitations remain. This dissertation is therefore devoted to research on chart question answering algorithms. Specifically, it divides the task into different subtask modules and proposes solutions according to the characteristics and research status of each module. Building on the work on each module, it then proposes new algorithm frameworks to solve the chart question answering task. The research work of this dissertation can be summarized as follows:

(1) In view of the fact that most existing work relies on large-scale pretrained networks, and recognizing that both low-level and high-level image features are indispensable in scene tasks, this dissertation proposes an image encoder algorithm based on the deconvolution operation to obtain richer chart features. Specifically, the new image encoder applies the deconvolution operation to fuse low-level and high-level image features, which improves the accuracy of the Relation Network model on two open-source datasets by about 5%. Moreover, it can also be applied in other methods.

(2) To address the huge number of relation features and the feature redundancy in the Relation Network, this dissertation proposes an affinity-driven relation pairing mechanism and, by combining it with the deconvolution-based image encoder, an affinity-driven relation network for chart question answering. It is more effective than many existing algorithms: its accuracy on the DVQA dataset is more than 6% higher than that of LEAF-Net, the best-performing model among the compared algorithms. Moreover, thanks to the affinity-driven relation pairing mechanism, the number of relation features is reduced by nearly half.

(3) In view of the problem that current attention-based methods cannot make full use of the correlation between multi-modal information, this dissertation proposes a multi-modal fusion reasoning network based on the Transformer framework. Its core idea is to capture the relation between each text word and each image feature in every iteration. The performance of this new method surpasses other attention-based algorithms: its accuracy is about 2% and 4% higher than that of LEAF-Net on the two datasets, respectively, and it even surpasses the affinity-driven relation network by about 1% on the FigureQA dataset.
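The deconvolution-based fusion in (1) can be illustrated with a minimal NumPy sketch: a transposed convolution ("deconvolution") upsamples a coarse high-level feature map to the spatial resolution of a low-level map, after which the two are concatenated channel-wise. All shapes, kernel sizes, and the random weights are illustrative, not the dissertation's actual configuration.

```python
import numpy as np

def deconv2d(x, w, stride=2):
    """Transposed convolution ("deconvolution") that upsamples x.

    x: (C_in, H, W) high-level feature map
    w: (C_in, C_out, k, k) kernel
    returns: (C_out, (H-1)*stride + k, (W-1)*stride + k)
    """
    c_in, h, win = x.shape
    _, c_out, k, _ = w.shape
    out = np.zeros((c_out, (h - 1) * stride + k, (win - 1) * stride + k))
    for i in range(h):
        for j in range(win):
            for ci in range(c_in):
                # each input pixel "paints" a k x k patch into the output
                out[:, i*stride:i*stride+k, j*stride:j*stride+k] += x[ci, i, j] * w[ci]
    return out

rng = np.random.default_rng(0)
low = rng.normal(size=(16, 8, 8))           # low-level features (fine detail)
high = rng.normal(size=(32, 4, 4))          # high-level features (semantics)
w = rng.normal(size=(32, 16, 2, 2)) * 0.1   # deconv kernel (illustrative)

up = deconv2d(high, w, stride=2)            # (16, 8, 8): spatially aligned now
fused = np.concatenate([low, up], axis=0)   # (32, 8, 8): channel-wise fusion
```

The fused map carries both the fine spatial detail of the low-level features and the semantic content of the high-level ones, which is the property the encoder exploits.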
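The pairing mechanism in (2) can be sketched in the same spirit: instead of forming relation features for all N x N object pairs as a plain Relation Network does, score every pair with an affinity function and keep only the top-scoring fraction. The dot-product affinity and the 50% keep ratio below are assumptions for illustration, not the dissertation's exact design.

```python
import numpy as np

def affinity_pairs(objects, keep_ratio=0.5):
    """Keep only the most affine object pairs instead of all N*N pairs.

    objects: (N, d) per-region feature vectors
    returns: (M, 2*d) relation features for the top-scoring pairs,
             where M = keep_ratio * N * N
    """
    n, d = objects.shape
    # affinity score for every ordered pair (dot-product similarity here,
    # as a stand-in for a learned affinity function)
    aff = objects @ objects.T                  # (N, N)
    m = int(keep_ratio * n * n)
    top = np.argsort(aff.ravel())[-m:]         # indices of the m best pairs
    rows, cols = np.unravel_index(top, (n, n))
    # concatenate the two member features to form each relation feature
    return np.concatenate([objects[rows], objects[cols]], axis=1)

rng = np.random.default_rng(1)
objs = rng.normal(size=(10, 8))
rels = affinity_pairs(objs)    # 50 relation features instead of 100
```

Halving the number of pairs before the relation module directly halves the relation features it must process, which is the reduction the abstract reports.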
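The core idea of (3), recomputing word-region correlations at every iteration, amounts to a cross-modal attention step in which each text word attends over all image features. A minimal single-head sketch, with all dimensions illustrative:

```python
import numpy as np

def cross_attention(words, regions):
    """One cross-modal attention step: every text word attends to every
    image feature, so word-region correlations are refreshed per iteration.

    words:   (T, d) text-word embeddings (queries)
    regions: (R, d) image-region features (keys and values)
    returns: (T, d) image-conditioned word representations
    """
    d = words.shape[1]
    scores = words @ regions.T / np.sqrt(d)        # (T, R) scaled similarities
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)        # softmax over regions
    return attn @ regions                          # weighted sum of regions

rng = np.random.default_rng(2)
words = rng.normal(size=(5, 16))       # e.g. a 5-word question
regions = rng.normal(size=(9, 16))     # e.g. 9 chart regions
out = cross_attention(words, regions)  # (5, 16)
```

Stacking such steps inside a Transformer block, as the framework in (3) does, lets later iterations refine the word-region correlations established by earlier ones.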