
Character Relationship Understanding Based On Feature Correlation

Posted on: 2022-01-09
Degree: Master
Type: Thesis
Country: China
Candidate: J Jiang
Full Text: PDF
GTID: 2518306557968269
Subject: Software engineering

Abstract/Summary:
Understanding the social relationships of the characters in a video is a meaningful and difficult task in the field of computer vision. The main difficulty lies in using deep-learning algorithms such as convolutional neural networks to integrate the contextual cues in the video with the global characteristics of the video content in order to infer the characters' social relationships. Most current research extracts the semantics of person relations from still images; for another important medium, video, further exploration is needed.

This thesis takes the ability to effectively understand the relationships between the characters in a given video as its ultimate research goal. It first proposes a video object detection method based on enhanced semantics, which extracts video key frames for enhanced feature extraction to complete object detection; it then designs a spatial-temporal feature extraction method based on a residual network to extract global features of the video content; finally, it designs a video character relationship recognition method based on multi-scale analysis to complete the understanding of character relationships in video. The innovations of this thesis are mainly reflected in the following three aspects:

(1) A density clustering method is used to extract the key frames of the video, and a memory storage module stores the intermediate features generated while detecting previous frames, so that key frames can use the cached information to extract enhanced features; a classification algorithm then performs classification and regression on the enhanced features to obtain the object detection results. Object detection experiments were conducted on the ImageNet VID dataset, and the results show that detection accuracy using enhanced features is improved compared with some end-to-end object detection models.

(2) The 3D filter is decomposed into a temporal and spatial (2D+1D) form, and the designed (2D+1D) residual block is combined with an hourglass structure with depthwise convolution at the end of the residual path to form a new 3D residual network for spatial-temporal feature extraction. Feature extraction experiments are conducted on the ActivityNet and VISR datasets, and the results show that the hourglass structure increases the precision of feature extraction.

(3) The candidate boxes from object detection and pose estimation are used to construct multiple graphs, which are convolved with a pyramid convolutional network. The spatial-temporal features extracted from the video are fused with the local features obtained from the multi-graph convolution. Experiments on the VISR dataset show that the video character relationship recognition method based on multi-scale analysis designed in this thesis is suitable for video and achieves high accuracy.
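The key-frame selection in aspect (1) can be sketched as density clustering over per-frame features. This is a minimal illustration, not the thesis's implementation: the feature vectors, the `eps`/`min_pts` thresholds, and the one-hop cluster expansion (a simplification of full density clustering) are all assumptions for the sketch; each cluster's first frame stands in for its key frame.

```python
# Sketch: key-frame selection by density clustering over per-frame features.
# Frames whose neighborhoods are dense (>= min_pts within eps) seed clusters;
# one representative frame per cluster is kept as a key frame.

def l2(a, b):
    # Euclidean distance between two frame-feature vectors
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def density_cluster_keyframes(features, eps=0.5, min_pts=2):
    n = len(features)
    labels = [-1] * n          # -1 = unassigned / sparse frame
    cluster = 0
    for i in range(n):
        if labels[i] != -1:
            continue
        neighbors = [j for j in range(n) if l2(features[i], features[j]) <= eps]
        if len(neighbors) < min_pts:
            continue           # too sparse to seed a cluster
        for j in neighbors:    # simple one-hop expansion (not full DBSCAN)
            if labels[j] == -1:
                labels[j] = cluster
        cluster += 1
    # take the first frame of each cluster as its key frame
    keyframes = [labels.index(c) for c in range(cluster)]
    return keyframes, labels
```

In the full pipeline, the features cached by the memory module for the non-key frames would then be reused to enhance the key frames' features before classification and regression.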
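The motivation for the (2D+1D) decomposition in aspect (2) can be seen from a simple parameter count: a full t×k×k 3D filter bank is replaced by a 1×k×k spatial convolution followed by a t×1×1 temporal convolution. The channel sizes below are illustrative, not taken from the thesis.

```python
# Sketch: parameter counts for a full 3D convolution versus its (2D+1D)
# factorization, ignoring biases. Channel widths are illustrative.

def params_3d(c_in, c_out, t, k):
    # one bank of c_out filters of shape t x k x k over c_in channels
    return c_in * c_out * t * k * k

def params_2d_plus_1d(c_in, c_mid, c_out, t, k):
    spatial  = c_in * c_mid * 1 * k * k   # 1 x k x k spatial conv
    temporal = c_mid * c_out * t * 1 * 1  # t x 1 x 1 temporal conv
    return spatial + temporal

full     = params_3d(64, 64, 3, 3)             # 3x3x3 filters
factored = params_2d_plus_1d(64, 64, 64, 3, 3)
print(full, factored)  # the factored form uses fewer parameters
```

The saving grows with the kernel sizes, and the intermediate channel width `c_mid` gives a knob for trading capacity against cost; the thesis's residual block places such factored filters on the residual path.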
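The multi-graph fusion in aspect (3) can be sketched as one shared graph-convolution step, H = Â·X·W, applied to each graph (e.g. one built from detection boxes, one from pose joints), with the per-graph outputs averaged and the pooled local feature concatenated with the global spatial-temporal feature. Everything here is a toy stand-in: the real model uses learned pyramid convolution layers, and the matrices, the averaging fusion, and the mean pooling are assumptions for illustration.

```python
# Sketch: one shared graph-convolution step applied to several graphs,
# followed by averaging and concatenation with a global video feature.

def matmul(a, b):
    # plain dense matrix multiply over nested lists
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def row_normalize(a):
    # row-stochastic normalization of an adjacency matrix
    return [[v / max(sum(row), 1e-9) for v in row] for row in a]

def graph_conv(adj, feats, weight):
    # H = A_norm @ X @ W
    return matmul(matmul(row_normalize(adj), feats), weight)

def fuse(graphs, feats, weight, global_feat):
    outs = [graph_conv(a, feats, weight) for a in graphs]
    n, d = len(outs[0]), len(outs[0][0])
    # average the per-graph node features, then mean-pool over nodes
    avg = [[sum(o[i][j] for o in outs) / len(outs) for j in range(d)] for i in range(n)]
    pooled = [sum(avg[i][j] for i in range(n)) / n for j in range(d)]
    # concatenate the pooled local feature with the global feature
    return pooled + global_feat
```

The concatenated vector would then feed a classifier over the character-relationship categories.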
Keywords/Search Tags: Deep learning, Character relationship understanding, Pyramid convolutional network, Multiple graphs, Feature extraction