As a crucial task in video understanding, constructing social relationship networks not only uncovers latent semantic knowledge in video content but also helps AI better understand human behaviors and emotions in videos. However, many studies on mining interpersonal relationships still focus on static images, paying little attention to temporal knowledge and other important modalities. Although humans can easily identify or infer social relationships between characters from comprehensive cues such as appearance, interactions, dialogue, clothing style, and background, automatically capturing these relationships remains challenging for AI: it requires effectively modeling the spatiotemporal structure and semantic information of videos, integrating multiple feature cues, and building scalable social network construction models. To this end, this article studies and implements the following.

Firstly, we propose a multi-cue social network construction method based on multi-teacher knowledge distillation (McSRE) to extract social relationships from videos of unconstrained scenes. This method uses multiple teacher models and feature-based distillation to mine multiple cues from videos. We then design a scheme that combines multi-cue features with temporal features and construct an attention-based temporal cue graph (ATCG). Under this scheme, the knowledge of the multiple teacher models is transferred to several simple student models for model compression. Experiments on the ViSR and MovieGraphs datasets show that the McSRE model, despite its compressed size, achieves results close to or even surpassing state-of-the-art (SOTA) methods.

Secondly, we propose a multi-cue video social relationship network construction method based on feature aggregation (RCRV) to aggregate the meaningful contextual features that are important for identifying social relationships. We design a novel global-local VLAD (GL-VLAD) module that uses convolutions at different scales to obtain different receptive fields and extract both global and local information from video features. In addition, we propose a Multimodal Fusion Graph (MFG) that attends to knowledge from different modalities and can represent general features of multimodal video scenes.

Thirdly, building on the big data analysis platform (BDAP), we design and develop the basic functions and a video relationship generation module, giving the platform the ability to process, analyze, and visualize video data so that users can intuitively experience video social network construction.
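To make the feature-based multi-teacher distillation in McSRE more concrete, the following is a minimal PyTorch sketch, assuming one frozen teacher per cue (e.g., face, scene) and one lightweight student per cue; the module names, dimensions, and loss weighting are illustrative assumptions, not details taken from the method itself.

```python
# Sketch of feature-based multi-teacher distillation: each student matches
# its teacher's cue features (MSE) while also learning the relation labels.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CueStudent(nn.Module):
    """Small student network that mimics one teacher's cue features."""
    def __init__(self, in_dim=512, feat_dim=256, num_relations=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
        self.classifier = nn.Linear(feat_dim, num_relations)

    def forward(self, x):
        feats = self.encoder(x)
        return feats, self.classifier(feats)

def distillation_step(students, teachers, inputs, labels, alpha=0.5):
    """Sum a relation-classification loss and a feature-matching loss per cue."""
    loss = 0.0
    for cue, student in students.items():
        with torch.no_grad():                               # teachers stay frozen
            t_feats = teachers[cue](inputs[cue])
        s_feats, logits = student(inputs[cue])
        loss = loss + F.cross_entropy(logits, labels)       # task loss
        loss = loss + alpha * F.mse_loss(s_feats, t_feats)  # feature distillation
    return loss

# Toy usage: two cues, a batch of 4 clips, 8 relationship classes (all assumed).
cues = ["face", "scene"]
teachers = {c: nn.Linear(512, 256).eval() for c in cues}
students = {c: CueStudent() for c in cues}
inputs = {c: torch.randn(4, 512) for c in cues}
loss = distillation_step(students, teachers, inputs, torch.randint(0, 8, (4,)))
loss.backward()
```

Because only the compact students are kept at inference time, this pattern is what allows the compressed model to retain much of the teachers' cue knowledge.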
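The attention-based temporal cue graph can likewise be illustrated with a small sketch. Here cue features at each time step are treated as graph nodes and attention decides how much each node aggregates from its temporal neighbors; the window-based adjacency, layer wiring, and all sizes are assumptions made for illustration only.

```python
# One attention-based temporal graph layer over per-step cue features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalCueGraphLayer(nn.Module):
    def __init__(self, dim=256, window=2):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.window = window                     # temporal neighborhood radius

    def forward(self, x):                        # x: (B, T, D) cue features
        B, T, D = x.shape
        scores = self.q(x) @ self.k(x).transpose(1, 2) / D ** 0.5  # (B, T, T)
        # Mask node pairs that are not temporal neighbors in the graph.
        idx = torch.arange(T, device=x.device)
        not_neighbor = (idx[None, :] - idx[:, None]).abs() > self.window
        scores = scores.masked_fill(not_neighbor, float("-inf"))
        attn = F.softmax(scores, dim=-1)         # edge weights per node
        return x + attn @ self.v(x)              # residual message passing

layer = TemporalCueGraphLayer()
out = layer(torch.randn(2, 10, 256))             # 2 videos, 10 time steps each
```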
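Finally, the global-local idea behind GL-VLAD can be sketched as parallel 1-D convolutions with different kernel sizes (hence different receptive fields over the frame sequence) followed by a NetVLAD-style aggregation layer; the cluster count, kernel sizes, and exact wiring below are assumptions, not the module's actual specification.

```python
# Multi-scale temporal convolutions + VLAD-style pooling of frame features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NetVLAD(nn.Module):
    """Soft-assignment VLAD pooling over a set of local descriptors."""
    def __init__(self, dim, num_clusters=16):
        super().__init__()
        self.assign = nn.Linear(dim, num_clusters)
        self.centroids = nn.Parameter(torch.randn(num_clusters, dim))

    def forward(self, x):                          # x: (B, N, D)
        a = F.softmax(self.assign(x), dim=-1)      # (B, N, K) soft assignments
        residuals = x.unsqueeze(2) - self.centroids         # (B, N, K, D)
        vlad = (a.unsqueeze(-1) * residuals).sum(dim=1)     # (B, K, D)
        return F.normalize(vlad.flatten(1), dim=-1)

class GlobalLocalVLAD(nn.Module):
    def __init__(self, dim=256, kernel_sizes=(1, 3, 5)):
        super().__init__()
        # Small kernels capture local detail; larger ones capture global context.
        self.branches = nn.ModuleList(
            nn.Conv1d(dim, dim, k, padding=k // 2) for k in kernel_sizes
        )
        self.vlad = NetVLAD(dim * len(kernel_sizes))

    def forward(self, x):                          # x: (B, T, D) frame features
        xt = x.transpose(1, 2)                     # (B, D, T) for Conv1d
        multi = torch.cat([b(xt) for b in self.branches], dim=1)
        return self.vlad(multi.transpose(1, 2))    # aggregate over time

module = GlobalLocalVLAD()
desc = module(torch.randn(2, 32, 256))             # one descriptor per video
```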