
Semantic Network Based Image Information Representation And Visual Reasoning

Posted on: 2020-03-28
Degree: Master
Type: Thesis
Country: China
Candidate: X B Ni
Full Text: PDF
GTID: 2428330596975183
Subject: Control Science and Engineering
Abstract/Summary:
In cognitive science, using deep learning to enable machines to perform certain intelligent behaviors as humans do is an important research direction, and visual reasoning is one of its key technologies. However, existing visual reasoning models mainly take redundant visual features as input. Compared with visual features, a semantic representation of an image carries more explicit, structured, high-level semantic information and makes the reasoning process more interpretable. Building on existing visual reasoning research, this thesis therefore proposes combining image semantic representation with natural language understanding to solve visual reasoning problems. The work is divided into the following parts:

1) To address the shortcomings of existing research, a visual reasoning framework based on a semantic network is designed. The framework consists of two parts: a semantic representation extraction module and a question understanding module. The semantic representation extraction module detects the entities in the image, extracts the attributes of each entity, and finally constructs a semantic network describing the scene. The question understanding module uses recurrent neural network techniques to map natural language questions into a vector space or another representation form. The framework is highly transparent and can be adapted quickly to changing requirements.

2) The performance and characteristics of semantic representations in visual reasoning tasks are analyzed on an end-to-end baseline model. Replacing the visual features of the image with unprocessed semantic representations slightly improves the model's performance. The semantic representation is then reorganized by borrowing the Gram matrix idea from visual style transfer. This simple operation further improves the model on both relational and non-relational questions, which verifies the validity and plasticity of semantic representation in visual reasoning tasks.

3) Focusing on the visual reasoning dataset CLEVR, the construction of the semantic representation extraction module and the question understanding module is discussed in detail. For the semantic representation extraction module, a method for object detection with few training samples is proposed, together with a generic attribute extractor built from a residual network and a pyramid max-pooling layer. For the question understanding module, two variants are built: one based on question embedding and one based on machine translation. The proposed visual reasoning framework reaches a highest test accuracy of 96.14% on the CLEVR dataset, which exceeds the existing baseline and human level and is close to the current leading research results, while offering a transparency and low coupling that existing models lack.

Based on existing deep learning technology, this thesis sets out a clear research approach to solving visual reasoning tasks with a semantic representation of the image, along with the model construction method and a reasonable organization of each part. This makes the visual reasoning process more transparent and yields results close to the state of the art. The work also shows that introducing high-level, abstract forms of knowledge representation into deep learning tasks is feasible and effective, which lays a theoretical and experimental foundation for introducing different levels of knowledge representation in deep learning.
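
The abstract does not give the data structure behind the semantic network, so the following is only a minimal sketch of the idea in part 1): detected entities with attributes become nodes, spatial relations become edges, and a CLEVR-style counting question is answered by querying the network instead of raw visual features. The class names, the `left_of` relation, the coordinates, and the example scene are all assumptions made for illustration and are not taken from the thesis.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """One detected object and its extracted attributes (hypothetical layout)."""
    name: str
    attributes: dict   # e.g. {"color": "red", "shape": "cube"}
    position: tuple    # (x, y) image coordinates, assumed for this sketch


@dataclass
class SemanticNetwork:
    """Nodes are entities; edges are (subject, relation, object) triples."""
    entities: list = field(default_factory=list)
    relations: list = field(default_factory=list)

    def add_entity(self, entity):
        self.entities.append(entity)

    def build_spatial_relations(self):
        # Derive "left_of" edges by comparing x coordinates (an assumption).
        self.relations = [
            (a.name, "left_of", b.name)
            for a in self.entities
            for b in self.entities
            if a is not b and a.position[0] < b.position[0]
        ]

    def select(self, **attrs):
        # Return the entities whose attributes match every given key/value.
        return [
            e for e in self.entities
            if all(e.attributes.get(k) == v for k, v in attrs.items())
        ]

    def related(self, relation, target_name):
        # Return the entities that stand in `relation` to the target entity.
        names = {s for s, r, o in self.relations
                 if r == relation and o == target_name}
        return [e for e in self.entities if e.name in names]


def build_example_scene():
    """Build a three-object scene by hand, standing in for detector output."""
    net = SemanticNetwork()
    net.add_entity(Entity("obj0", {"color": "red", "shape": "cube"}, (10, 40)))
    net.add_entity(Entity("obj1", {"color": "blue", "shape": "sphere"}, (55, 30)))
    net.add_entity(Entity("obj2", {"color": "red", "shape": "cylinder"}, (80, 60)))
    net.build_spatial_relations()
    return net


def count_left_of(net, **attrs):
    """Answer: how many objects are left of the unique object matching attrs?"""
    targets = net.select(**attrs)
    if len(targets) != 1:
        raise ValueError("question must refer to exactly one object")
    return len(net.related("left_of", targets[0].name))


if __name__ == "__main__":
    scene = build_example_scene()
    print(count_left_of(scene, shape="cylinder"))
```

Because every intermediate step (selection, relation lookup, counting) operates on named entities and explicit edges, each one can be inspected directly, which is one way to read the transparency the abstract attributes to the framework.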
Keywords/Search Tags: visual reasoning, visual question answering, semantic net, semantic representation