
Research On Visual Relationship Detection Method With Feature Enhancement And Fusion

Posted on: 2024-05-21    Degree: Master    Type: Thesis
Country: China    Candidate: J T Li    Full Text: PDF
GTID: 2568307115463944    Subject: Computer Science and Technology
Abstract/Summary:
Visual relationship detection refers to recognizing the entities in an image, predicting the interactive relationships between them, and expressing each relationship structurally as a triple of the form <subject, predicate, object>. However, traditional visual relationship detection models suffer from problems in their feature representation and feature fusion modules. On the one hand, visual feature representation methods usually ignore finer-grained information in the feature map, resulting in a loss of detail, while spatial feature representation methods ignore the importance of entities' relative positions and, in some special cases, cannot produce a unique vector representation, which degrades the model's classification performance. In the feature fusion stage, traditional methods ignore the relative importance of different features, leading to poor fusion results. On the other hand, traditional spatial feature representation methods set the location information of entity pairs manually, introducing subjective error, and current automated methods for acquiring the spatial features of entity pairs do not consider the pair's relative position information. In feature fusion, traditional methods also ignore the contextual correlation between features, which hurts the model's classification accuracy. This thesis studies the above issues; the main research contents are as follows:

(1) A visual relationship detection model that enhances pixel-level features and incorporates weighted fusion is proposed. The model first optimizes the representation of visual and spatial features. Specifically, fine-grained information blocks capture pixel-level contextual detail in the feature map, improving the model's classification performance, and coordinate encoding encodes the absolute and relative positions of the entity pair's bounding boxes to generate a unique and accurate spatial feature vector. Second, in the feature fusion stage, the model adopts a feature weighting method that accounts for the differing importance of semantic, visual, and spatial features, yielding a more complete fusion vector.

(2) A feature fusion visual relationship detection model based on contextual information transfer is proposed. For spatial feature representation, the model adopts a method based on a hard attention mechanism, which automatically extracts and represents the spatial features of entity pairs while considering both local and global spatial location information. In addition, the model adopts a feature fusion method based on contextual information transfer: a message passing mechanism effectively fuses context-related information across different features, making more comprehensive use of multiple features.

In summary, this thesis focuses on obtaining richer feature representations from raw image data to strengthen the model's representational capability, and on better fusing the multiple features of entity pairs. Two models are therefore proposed, and extensive experiments are conducted on a mainstream visual relationship detection dataset. The experimental results show that both proposed models achieve better performance.
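The abstract does not give the exact formula for the coordinate encoding of entity-pair bounding boxes; the following is a minimal sketch of one common normalized scheme that captures both the absolute positions of the two boxes and the relative offset and scale of the object with respect to the subject (the function name and feature layout are illustrative, not the thesis's actual design):

```python
def spatial_feature(subj, obj, img_w, img_h):
    """Encode a subject/object bounding-box pair as a spatial feature vector.

    Boxes are (x1, y1, x2, y2) in pixels. Returns 8 absolute features
    (positions and sizes normalized by image size) followed by 4 relative
    features (object offset and scale w.r.t. the subject box).
    """
    sx1, sy1, sx2, sy2 = subj
    ox1, oy1, ox2, oy2 = obj
    sw, sh = sx2 - sx1, sy2 - sy1  # subject width/height
    ow, oh = ox2 - ox1, oy2 - oy1  # object width/height
    # Absolute positions and sizes, normalized by image dimensions.
    abs_feats = [sx1 / img_w, sy1 / img_h, sw / img_w, sh / img_h,
                 ox1 / img_w, oy1 / img_h, ow / img_w, oh / img_h]
    # Relative offset and scale of the object w.r.t. the subject box,
    # so the representation depends on the pair's relative placement.
    rel_feats = [(ox1 - sx1) / sw, (oy1 - sy1) / sh, ow / sw, oh / sh]
    return abs_feats + rel_feats
```

Because the relative terms are computed from one ordered (subject, object) pair, each pair yields a single, unambiguous vector, which is the property the abstract highlights.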
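The feature weighting method described for the first model can be illustrated with a small sketch: each of the three feature streams (semantic, visual, spatial) gets an importance weight from a softmax over per-stream logits, and the fused vector is their weighted sum. In the thesis the logits would be learned parameters; here they are passed in explicitly, and the function signature is an assumption for illustration only:

```python
import math

def weighted_fusion(semantic, visual, spatial, logits):
    """Fuse three equal-length feature vectors with softmax importance weights.

    `logits` holds one scalar per stream (semantic, visual, spatial);
    softmax turns them into weights that sum to 1, so streams the model
    deems more important contribute more to the fused vector.
    """
    exps = [math.exp(l) for l in logits]
    z = sum(exps)
    w = [e / z for e in exps]  # importance weights, sum to 1
    return [w[0] * s + w[1] * v + w[2] * p
            for s, v, p in zip(semantic, visual, spatial)]
```

With equal logits the three streams contribute equally; training would push the logits apart so that, for example, spatial features dominate for position-dependent predicates.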
Keywords/Search Tags:Visual relationship detection, Feature representation, Feature fusion, Fine-grained information block, Message passing mechanism