As a data carrier commonly used in daily life and work, charts can convey rich information, such as the content, proportions, and trends of data, in the form of images. In recent years, deep learning-based algorithms have continuously emerged for the chart question answering task and have achieved good results, but some problems and limitations remain. This dissertation is therefore devoted to research on chart question answering algorithms. Specifically, it divides the task into different subtask modules and proposes solutions according to the characteristics and research status of each module. Building on the work on each module, it then proposes new algorithm frameworks to solve the chart question answering task. The research work of this dissertation can be summarized as follows:

(1) In view of the fact that most existing work relies on large-scale pretrained networks, and recognizing that both low-level and high-level image features are indispensable in scene tasks, this dissertation proposes an image encoder algorithm based on the deconvolution operation to obtain richer chart features. Specifically, the new image encoder applies the deconvolution operation to fuse low-level and high-level image features, which improves the accuracy of the Relation Network model on two open-source datasets by about 5%. Moreover, it can also be applied in other methods.

(2) To address the huge number of relation features and the feature redundancy in the Relation Network, this dissertation proposes an affinity-driven relation pairing mechanism and, by combining it with the deconvolution-based image encoder, an affinity-driven relation network for chart question answering. It is more effective than many existing algorithms: its accuracy on the DVQA dataset is more than 6% higher than that of LEAF-Net, the best-performing model among the compared algorithms. Moreover, thanks to the affinity-driven relation pairing mechanism, the number of relation features is reduced by nearly half.

(3) In view of the problem that current attention-based methods cannot make full use of the correlation between multi-modal information, this dissertation proposes a multi-modal fusion reasoning network based on the Transformer framework. Its core idea is to capture the relation between each text word and each image feature in every iteration. The performance of this new method surpasses other attention-based algorithms: its accuracy is about 2% and 4% higher than that of LEAF-Net on the two datasets, respectively, and it even surpasses the affinity-driven relation network by about 1% on the FigureQA dataset.
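The deconvolution-based fusion in (1) can be illustrated with a minimal NumPy sketch: a transposed convolution ("deconvolution") upsamples a coarse high-level feature map to the spatial resolution of a low-level map, after which the two are concatenated channel-wise. All shapes, kernel sizes, and the random weights are illustrative, not the dissertation's actual configuration.

```python
import numpy as np

def deconv2d(x, w, stride=2):
    """Transposed convolution ("deconvolution") that upsamples x.

    x: (C_in, H, W) high-level feature map
    w: (C_in, C_out, k, k) kernel
    returns: (C_out, (H-1)*stride + k, (W-1)*stride + k)
    """
    c_in, h, win = x.shape
    _, c_out, k, _ = w.shape
    out = np.zeros((c_out, (h - 1) * stride + k, (win - 1) * stride + k))
    for i in range(h):
        for j in range(win):
            for ci in range(c_in):
                # each input pixel "paints" a k x k patch into the output
                out[:, i*stride:i*stride+k, j*stride:j*stride+k] += x[ci, i, j] * w[ci]
    return out

rng = np.random.default_rng(0)
low = rng.normal(size=(16, 8, 8))           # low-level features (fine detail)
high = rng.normal(size=(32, 4, 4))          # high-level features (semantics)
w = rng.normal(size=(32, 16, 2, 2)) * 0.1   # deconv kernel (illustrative)

up = deconv2d(high, w, stride=2)            # (16, 8, 8): spatially aligned now
fused = np.concatenate([low, up], axis=0)   # (32, 8, 8): channel-wise fusion
```

The fused map carries both the fine spatial detail of the low-level features and the semantic content of the high-level ones, which is the property the encoder exploits.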
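The pairing mechanism in (2) can be sketched in the same spirit: instead of forming relation features for all N x N object pairs as a plain Relation Network does, score every pair with an affinity function and keep only the top-scoring fraction. The dot-product affinity and the 50% keep ratio below are assumptions for illustration, not the dissertation's exact design.

```python
import numpy as np

def affinity_pairs(objects, keep_ratio=0.5):
    """Keep only the most affine object pairs instead of all N*N pairs.

    objects: (N, d) per-region feature vectors
    returns: (M, 2*d) relation features for the top-scoring pairs,
             where M = keep_ratio * N * N
    """
    n, d = objects.shape
    # affinity score for every ordered pair (dot-product similarity here,
    # as a stand-in for a learned affinity function)
    aff = objects @ objects.T                  # (N, N)
    m = int(keep_ratio * n * n)
    top = np.argsort(aff.ravel())[-m:]         # indices of the m best pairs
    rows, cols = np.unravel_index(top, (n, n))
    # concatenate the two member features to form each relation feature
    return np.concatenate([objects[rows], objects[cols]], axis=1)

rng = np.random.default_rng(1)
objs = rng.normal(size=(10, 8))
rels = affinity_pairs(objs)    # 50 relation features instead of 100
```

Halving the number of pairs before the relation module directly halves the relation features it must process, which is the reduction the abstract reports.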
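The core idea of (3), recomputing word-region correlations at every iteration, amounts to a cross-modal attention step in which each text word attends over all image features. A minimal single-head sketch, with all dimensions illustrative:

```python
import numpy as np

def cross_attention(words, regions):
    """One cross-modal attention step: every text word attends to every
    image feature, so word-region correlations are refreshed per iteration.

    words:   (T, d) text-word embeddings (queries)
    regions: (R, d) image-region features (keys and values)
    returns: (T, d) image-conditioned word representations
    """
    d = words.shape[1]
    scores = words @ regions.T / np.sqrt(d)        # (T, R) scaled similarities
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)        # softmax over regions
    return attn @ regions                          # weighted sum of regions

rng = np.random.default_rng(2)
words = rng.normal(size=(5, 16))       # e.g. a 5-word question
regions = rng.normal(size=(9, 16))     # e.g. 9 chart regions
out = cross_attention(words, regions)  # (5, 16)
```

Stacking such steps inside a Transformer block, as the framework in (3) does, lets later iterations refine the word-region correlations established by earlier ones.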