
Visual Relationship Detection Method Based On Hierarchical Feature Fusion And Meta-learning

Posted on: 2024-01-20
Degree: Master
Type: Thesis
Country: China
Candidate: R Guo
GTID: 2568307064996659
Subject: Engineering
Abstract/Summary:
Over the past few decades, computer vision has made remarkable progress, and research goals have gradually shifted from recognizing a single object to recognizing multiple objects and understanding the relationships between them. The visual relationship detection task was introduced in response to this shift; it aims to identify the relationships between objects in an image. There are two broad ways to improve the performance of visual relationship detection models. The first is to obtain richer and more accurate information by increasing the amount of training data, incorporating additional prior knowledge, and designing more sophisticated feature extraction models. The second is to make better use of the information already obtained by constructing more effective model structures and methods. This thesis takes the second approach and proposes a hierarchical feature fusion method that improves how the available information is used and thereby improves model performance. Data imbalance is a further significant challenge in this field. Some relationship categories have far more samples than others, so models become biased towards the large categories and neglect the small ones, which harms generalization and increases the risk of overfitting. To address this, the thesis proposes a meta-learning approach: a relation classifier is trained with meta-learning and integrated with the model to improve prediction on relationship categories with few samples, as well as overall performance. The main work and contributions are as follows.

(1) This thesis proposes a model design that uses hierarchical feature fusion to integrate the spatial, visual, and semantic features extracted by object detection algorithms. Features are fused level by level, and the level at which each feature enters the fusion controls how much of its original information is preserved. In addition, an object loss function is introduced to supervise the fusion: during training, the ground-truth categories of the subject and the object are used as targets so that the fused features represent object information more effectively. Comparative experiments examine the level at which each feature enters the fusion and the effect of the object loss function on model performance. The results show that the proposed object loss function effectively improves model performance, and the hierarchical feature fusion model performs favorably in tests on multiple datasets.

(2) This thesis presents a meta-learning approach to the poor performance of visual relationship detection on relationship categories with few samples, a problem caused by data imbalance. The method recasts the problem as a few-shot learning task for these categories. Specifically, a meta-learning module called the Meta-Relation Classifier (MRC) is constructed from the upper-layer network structure of the original model and trained with meta-learning to achieve high accuracy when predicting few-sample categories. The MRC can be incorporated directly into any visual relationship detection model without designing a new network structure. It is therefore best regarded as a model-agnostic optimization method for visual relationship detection rather than as a visual relationship detection model in its own right. Experimental results show that the method significantly improves the original model's classification accuracy on few-sample categories and thereby improves overall performance on the visual relationship detection task.
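As an illustration of what recasting imbalanced relationship categories as a few-shot task can look like, the following minimal Python sketch samples one N-way K-shot training episode from labeled relationship examples. The abstract does not state which meta-learning algorithm or episode layout the thesis uses, so the function name `sample_episode` and the parameters `n_way`, `k_shot`, and `q_query` are hypothetical, and this generic episodic sampling is an assumption rather than the author's method.

```python
import random
from collections import defaultdict


def sample_episode(samples, n_way, k_shot, q_query, rng):
    """Sample one few-shot episode (hypothetical helper, not from the thesis).

    samples: list of (feature, relation_label) pairs.
    Returns (support, query), each a list of (feature, relation_label).
    """
    # Group features by relationship category.
    by_label = defaultdict(list)
    for feature, label in samples:
        by_label[label].append(feature)

    # A category can take part only if it can fill both support and query sets.
    eligible = sorted(
        label for label, feats in by_label.items()
        if len(feats) >= k_shot + q_query
    )
    if len(eligible) < n_way:
        raise ValueError("not enough categories with k_shot + q_query samples")

    # Every eligible category is equally likely to be chosen, so frequent
    # categories do not dominate an episode the way they dominate the raw data.
    classes = rng.sample(eligible, n_way)

    support, query = [], []
    for label in classes:
        # Draw disjoint support and query examples for this category.
        picked = rng.sample(by_label[label], k_shot + q_query)
        support += [(f, label) for f in picked[:k_shot]]
        query += [(f, label) for f in picked[k_shot:]]
    return support, query
```

In a sketch like this, a classifier would be adapted on the support set and evaluated on the query set of each episode; how the MRC itself is trained and merged with the base model is not described in the abstract.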
Keywords/Search Tags: visual relationship detection, feature fusion, few-shot learning, meta-learning