The Fine-Grained Visual Explanation Generative Model Based On Multimodal Fusion

Posted on:2019-09-17

Degree:Master

Type:Thesis

Country:China

Candidate:Z X Gao

GTID:2428330545971459

Subject:Computer Science and Technology

Abstract/Summary:

In recent years, fine-grained object classification algorithms based on deep learning have achieved great success in computer vision. However, a bare prediction result is not enough for understanding an artificial intelligence system. A key requirement for interacting with such a system is that it can explain why the visual system reached a given conclusion and can point to the corresponding visual evidence. It is therefore necessary to extract discriminative features, explain the internal reasons behind the system's predictions, and build a visual interpretation model. This thesis studies the following two aspects.

1. We take the California Institute of Technology's fine-grained bird database as the research object and use the descriptive text annotated for the birds in the database. We study a multi-modal fine-grained recognition algorithm based on Q-learning and a compact bilinear multi-modal information fusion method. This unifies object classification and visual explanation, so the model can simultaneously generate natural-language explanations that discriminate well between classes, and it avoids excessive dependence of the visual explanation generation model on semantic labels.

2. We make use of image spatial information and present a visualization network model called G-CAM. During the generation of the visual interpretation, it produces a heat map corresponding to the predicted result. The heat map reveals how information is used in decision-making.

The research addresses two main problems: (1) carrying out category prediction and explanation for fine-grained objects synchronously; (2) identifying the internal visual properties of the prediction process and visualizing them. The work was tested on the public California Institute of Technology bird database. The experimental results show that the generated visual explanation sentences perform well in semantic expression, which supports the advantages of the proposed model.
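The abstract names compact bilinear fusion of the visual and text modalities but does not give its details. The following is a minimal sketch of the general technique, assuming the common Count Sketch formulation: each feature vector is projected with a random hash and sign, and the two sketches are combined by circular convolution, which equals the Count Sketch of their outer product. The function names, the output dimension `d`, and the seeds are illustrative assumptions and are not taken from the thesis. Practical implementations usually compute the convolution with an FFT; the direct loop here keeps the example dependency-free.

```python
import random


def make_sketch_params(n, d, seed):
    """Draw a random hash bucket in [0, d) and a random sign for each of n inputs."""
    rng = random.Random(seed)
    h = [rng.randrange(d) for _ in range(n)]
    s = [rng.choice((-1, 1)) for _ in range(n)]
    return h, s


def count_sketch(x, h, s, d):
    """Project x into d dimensions: out[h[i]] += s[i] * x[i]."""
    out = [0.0] * d
    for i, v in enumerate(x):
        out[h[i]] += s[i] * v
    return out


def circular_convolution(a, b):
    """Direct O(d^2) circular convolution of two equal-length vectors."""
    d = len(a)
    out = [0.0] * d
    for i in range(d):
        for j in range(d):
            out[(i + j) % d] += a[i] * b[j]
    return out


def compact_bilinear_fuse(visual, text, d=16, seed=0):
    """Fuse two feature vectors into one d-dimensional vector.

    Assumption: independent hash/sign pairs per modality, derived from
    `seed` and `seed + 1`. The result approximates the bilinear (outer
    product) interaction without materializing it.
    """
    h1, s1 = make_sketch_params(len(visual), d, seed)
    h2, s2 = make_sketch_params(len(text), d, seed + 1)
    return circular_convolution(
        count_sketch(visual, h1, s1, d),
        count_sketch(text, h2, s2, d),
    )


if __name__ == "__main__":
    visual_feat = [0.5, -1.0, 2.0, 0.0, 1.5]  # hypothetical image feature
    text_feat = [1.0, 0.25, -0.5]             # hypothetical text feature
    print(compact_bilinear_fuse(visual_feat, text_feat, d=8))
```

The fused vector has a fixed length `d` regardless of the input sizes, which is what makes the method compact: the full outer product of the two modalities would have `len(visual) * len(text)` entries.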

Keywords/Search Tags:

multi-modal fusion, compact bilinear information fusion, fine-grained object classification, visual interpretation, visual heat map

Related items

1	Fine-grained Visual Classification Based On Convolutional Neural Network
2	Research On Information Fusion-based Fine-grained Image Classification Method
3	Multi-layer Weight-Aware Bilinear Pooling And Attention Mechanism For Fine-Grained Image Classification
4	Research And Application On Fine-Grained Image Classification Based On Bilinear Model
5	Research On Fine-Grained Visual Classification Based On Compact Vision Transformer
6	Fine-grained Visual Classification Via Weakly Supervised Information
7	Research On Visual Object Tracking Based On Bilinear Fusion
8	Fine-grained Image Classification Based On Convolutional Neural Network
9	Research On Fine-grained Image Classification Based On Deep Residual Network
10	Research On Fine-grained Image Classification Algorithm Based On Multi Convolution Neural Network Fusion