
Research And Implementation Of Multimodal Character Social Relationship Recognition Algorithm Based On Video

Posted on: 2023-01-12
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Liu
GTID: 2558306914463704
Subject: Computer Science and Technology
Abstract/Summary:
The rapid development of network and multimedia technology has caused the volume of multimedia data such as text, images, and videos to grow rapidly. Automatically extracting social relationships from video data has considerable social and commercial value in multimedia content understanding, knowledge graph construction, character tracking, and the analysis of character behavior and emotion. Social relationship recognition is an active topic in the multimedia field and has attracted wide attention from academia and industry, with notable results in inferring social relationships from images and videos. Most existing research, however, is based on still images, and such methods struggle with the changing spatio-temporal information and the multimodal information in video data. Extracting and inferring social relationships between people in video therefore poses new challenges. This thesis carries out the following research and implementation work:

Firstly, a new task, "multiple relationship extraction in video" (MREV), is proposed to identify the relationships between multiple character pairs in a video. In addition, a video multiple relationship (VMR) data set is constructed on the basis of existing data sets, and subtitles are added to the ViSR data set to promote research on multimodality in video. To address the above problems, a vision-text fusion (VTF) framework is proposed that jointly models visual and textual information and mines rich multimodal clues. Comparative experiments and ablation studies on the VMR and ViSR data sets demonstrate the effectiveness of the VTF framework.

Secondly, an end-to-end knowledge aggregation network (KAN) for video social relationship recognition is proposed. It uses a branch architecture consisting of a main branch for relationship recognition and an auxiliary branch for human body analysis, scene recognition, and text classification. RKG is used to construct an effective context graph. The result is an end-to-end trainable framework in which all branch tasks are learned jointly, so that the model can compute context knowledge efficiently. The effectiveness of the KAN model is verified on the VMR data set.

Thirdly, building on the existing big data analysis platform (BDAP), a data mining module is designed and developed that gives the platform the ability to process, analyze, and visualize video data. The module extends the existing data analysis layer of BDAP and makes it easier for users to analyze and compute social relationships between characters in large-scale video.
Keywords/Search Tags:Social Relation Recognition, Multi-modal study, Knowledge distillation, Video understanding