Research And Implementation Of Multimodal Character Social Relationship Recognition Algorithm Based On Video

Posted on: 2023-01-12

Degree: Master

Type: Thesis

Country: China

Candidate: Z H Liu

GTID: 2558306914463704

Subject: Computer Science and Technology

Abstract/Summary:

The rapid development of network and multimedia technology has caused the volume of multimedia data such as text, images, and videos to grow quickly. Automatically extracting social relationships from video data has considerable social and commercial value in fields such as multimedia content understanding, knowledge graph construction, character tracking, and character behavior and emotion analysis. Social relationship recognition is an active topic in multimedia research and has attracted extensive attention from academia and industry, with notable results in inferring social relationships from images and videos. Most existing research, however, is based on still images, and these methods struggle with the changing spatio-temporal information and the multimodal information in video data. Extracting and inferring people's social relationships in video therefore poses new challenges. This thesis carries out the following research and implementation work.

First, a new task, multiple relationship extraction in video (MREV), is proposed to identify the relationships between multiple character pairs in a video. Based on existing data sets, a video multiple relationship (VMR) data set is constructed, and subtitles are added to the ViSR data set to promote multimodal research on video. To address this task, a vision-text fusion (VTF) framework is proposed that jointly models visual and textual information and mines rich multimodal cues. Comparative experiments and ablation studies on the VMR and ViSR data sets demonstrate the effectiveness of the VTF framework.

Second, an end-to-end knowledge aggregation network (KAN) for video social relationship recognition is proposed. It uses a branch architecture consisting of a main branch for relationship recognition and an auxiliary branch for human body analysis, scene recognition, and text classification. RKG is used to construct an effective context graph. The result is an end-to-end trainable framework in which the branch tasks are learned jointly, so that the model can compute context knowledge efficiently. The effectiveness of the KAN model is verified on the VMR data set.

Third, building on the existing big data analysis platform (BDAP), a data mining module is designed and developed that gives the platform the ability to process, analyze, and visualize video data. The module extends the existing data analysis layer of BDAP and makes it easier for users to analyze and compute social relationship recognition for characters in large-scale video.
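The joint learning of a main branch and auxiliary tasks described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the weighted-sum form of the objective, the `aux_weight` hyperparameter, the function names, and the toy scores are all assumptions made for illustration, since the abstract does not state how the branch losses are combined.

```python
import math

def softmax_cross_entropy(logits, target):
    """Cross-entropy of one example given raw class scores and the true class index."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]

def joint_loss(main, auxiliary, aux_weight=0.5):
    """Main-branch loss plus aux_weight times each auxiliary-task loss.

    `main` and every value in `auxiliary` are (logits, target) pairs.
    `aux_weight` is a hypothetical hyperparameter, not taken from the thesis.
    """
    total = softmax_cross_entropy(*main)
    for logits, target in auxiliary.values():
        total += aux_weight * softmax_cross_entropy(logits, target)
    return total

# Toy scores for a single clip (made-up numbers, for illustration only).
main_branch = ([2.0, 0.5, -1.0], 0)       # relationship classes
aux_branches = {
    "body": ([0.4, -0.2], 0),             # human body analysis
    "scene": ([0.1, 1.2], 1),             # scene recognition
    "text": ([0.3, 0.3, 0.3], 2),         # text classification
}
loss = joint_loss(main_branch, aux_branches)
print(round(loss, 4))
```

Because a single scalar combines all branch losses, one backward pass in an automatic-differentiation framework would update the shared parameters for every task at once, which is the usual sense in which such a model is trained end to end.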

Keywords/Search Tags:

Social Relation Recognition, Multi-modal study, Knowledge distillation, Video understanding

Related items

1	Research And Implementation Of Video-Oriented Multi-Cue Social Relationship Network Construction Algorithm
2	Research And Implementation Of Video Social Relationship Knowledge Graph Construction System
3	Social Relationship Understanding In Visual Content
4	Research On Video Object Detection Based On Multimodality And Knowledge Distillation
5	A Relation Extraction Algorithm In Multi-modal Knowledge Graph
6	Knowledge Distillation For Speech-assisted Lip Reading
7	Research On Knowledge Transfer Based Implicit Discourse Relation Recognition
8	Research And Implementation Of Person Recognition Method Based On Video Data
9	Human-object Interaction And Video Understanding Under Complex Scenarios
10	Multi-level And Multi-modal Named Entity Recognition