
The Fine-Grained Visual Explanation Generative Model Based On Multimodal Fusion

Posted on: 2019-09-17 | Degree: Master | Type: Thesis
Country: China | Candidate: Z X Gao | Full Text: PDF
GTID: 2428330545971459 | Subject: Computer Science and Technology
Abstract/Summary:
In recent years, fine-grained object classification algorithms based on deep learning have achieved great success in computer vision. However, prediction results alone are not enough to understand an artificial intelligence system. A key factor in understanding and interacting with such a system is its ability to explain why the visual system produced a given output and to provide the corresponding visual evidence. It is therefore necessary to extract discriminative features, explain the internal reasons behind the system's predictions, and build a visual interpretation model. This thesis studies the following two aspects.

1. Taking the California Institute of Technology's fine-grained bird database as the research object, together with the descriptive text annotations of the birds in the database, we study a multi-modal fine-grained recognition algorithm based on Q-learning and a multi-modal compact bilinear information fusion method. This integrates object classification and visual interpretation into a single model, which can simultaneously generate natural-language explanations that discriminate well between classes while avoiding excessive dependence of the visual interpretation generation model on semantic labels.

2. Using the spatial information of the image, we present a visualization network model called G-CAM. During the generation of the visual interpretation, it produces a heat map corresponding to the predicted result, revealing which information the decision relied on.

The research addresses two main problems: (1) performing category prediction and explanation generation for fine-grained objects simultaneously; (2) identifying the internal visual attributes used in the prediction process and visualizing them. The work was tested on the public California Institute of Technology bird database. The experimental results show that the generated visual interpretation statements perform well in semantic expression, which supports the effectiveness of the proposed model.
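
For readers unfamiliar with compact bilinear fusion, the following is a minimal sketch of the general technique, not the implementation used in the thesis. It assumes the common formulation in which each modality's feature vector is projected with a Count Sketch and the two projections are combined by circular convolution, which equals the Count Sketch of their full outer product. All function names, vector sizes, and the random seed are illustrative assumptions. The convolution is written out directly for clarity; practical implementations usually compute it with an FFT.

```python
import random


def make_hash(n, d, rng):
    """Draw Count Sketch parameters: a bucket index and a sign for each of n inputs."""
    h = [rng.randrange(d) for _ in range(n)]
    s = [rng.choice((-1, 1)) for _ in range(n)]
    return h, s


def count_sketch(v, h, s, d):
    """Project v into d dimensions by adding signed entries into hashed buckets."""
    out = [0.0] * d
    for i, x in enumerate(v):
        out[h[i]] += s[i] * x
    return out


def circular_convolution(a, b):
    """Circular convolution of two equal-length vectors (O(d^2) for readability)."""
    d = len(a)
    out = [0.0] * d
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[(i + j) % d] += ai * bj
    return out


def compact_bilinear_fuse(image_feat, text_feat, hashes, d):
    """Fuse two feature vectors into one d-dimensional vector."""
    (h1, s1), (h2, s2) = hashes
    return circular_convolution(
        count_sketch(image_feat, h1, s1, d),
        count_sketch(text_feat, h2, s2, d),
    )


def sketch_of_outer_product(x, y, hashes, d):
    """Reference for checking only: Count Sketch of the explicit outer product."""
    (h1, s1), (h2, s2) = hashes
    out = [0.0] * d
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            out[(h1[i] + h2[j]) % d] += s1[i] * s2[j] * xi * yj
    return out


# Hypothetical toy features standing in for image and text embeddings.
rng = random.Random(0)
image_feat = [rng.gauss(0, 1) for _ in range(6)]
text_feat = [rng.gauss(0, 1) for _ in range(4)]
d = 8
hashes = (make_hash(6, d, rng), make_hash(4, d, rng))
fused = compact_bilinear_fuse(image_feat, text_feat, hashes, d)
```

The fused vector has a fixed length d regardless of the input sizes, so the 6 x 4 outer product is never stored; this is what makes the bilinear interaction "compact".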
Keywords/Search Tags: multi-modal fusion, compact bilinear information fusion, fine-grained object classification, visual interpretation, visual heat map