Relation Learning In Video Summarization And Target Re-identification

Posted on: 2021-07-03
Degree: Master
Type: Thesis
Country: China
Candidate: X F He
Full Text: PDF
GTID: 2518306503972229
Subject: Computer technology
Abstract/Summary:
With the development of deep learning, computer vision has made great breakthroughs in recent years in areas such as object and face recognition, object detection, and object segmentation. However, relation learning is generally lacking in computer vision: for example, the relations between multiple objects in object detection, the relations between humans and multiple objects in human-object interaction detection, and the relations among frames in video analysis tasks. For several specific computer vision tasks, relation learning can significantly improve performance. This thesis applies relation learning to two such tasks, video summarization and target re-identification, and demonstrates its importance in both.

Video summarization aims to shorten a video while preserving the storyline of the original. With the popularity of video websites and short-video apps such as Bilibili and TikTok, a large amount of video data is generated on the Internet every day. This imposes a huge cost on server storage, network bandwidth, and manual video processing, so video summarization has received great attention in recent years. However, two problems remain in this field: (1) the cost of labeling video data is huge; (2) it is unclear how to learn the relationships among video frames efficiently. These two problems can be addressed by unsupervised learning and relation learning, respectively.

Target re-identification is one of the core technologies of the "sky eye" surveillance systems common in science fiction movies. Its purpose is to find a specified person or object across multiple cameras, or across different perspectives of the same camera. It was among the first computer vision technologies to see practical use and is widely deployed in security cameras. Target re-identification algorithms based on feature learning and metric learning have made great progress in recent years, and their accuracy can meet the needs of real-life security systems. However, these methods consider only the matching between the query image and a single camera in a single perspective, and cannot exploit the relations among multiple cameras. As a result, they are likely to produce incorrect matches when images with different IDs are similar enough. Relation learning can be used to solve this problem.

Although video summarization and target re-identification are different problems in computer vision, they share a common question: how to design efficient relation learning modules according to the characteristics of the problem. This thesis proposes two deep learning algorithms from the perspective of relation learning, applied to video summarization and target re-identification respectively. The main contributions are as follows:

· We propose a novel yet simple unsupervised video summarization algorithm. The algorithm uses a conditional Generative Adversarial Network (GAN) to realize unsupervised learning, and is the first in the video summarization area to use a frame-level multi-head self-attention mechanism, which learns the long-range temporal dependencies along the whole video sequence. In addition, we design a conditional feature selector to guide the GAN model to focus on the more important regions of the entire video.

· We propose a novel Link Feature Learning (LFL) framework based on a deep graph convolution network (GCN). The gallery-to-gallery relationships are exploited to establish informative edges in the GCN module and avoid dense, meaningless connections. We also propose an effective and efficient hard gallery sampler that obtains high recall for positive samples while keeping the graph size reasonable, which also reduces the effect of class imbalance and avoids high computational complexity. Additionally, we show that our method is a universal and flexible module.

· Extensive experimental results show that our video summarization algorithm and target re-identification algorithm achieve state-of-the-art performance on various public datasets, which further demonstrates that relation learning can significantly improve the performance of video summarization and target re-identification.
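
As an illustration of the frame-level multi-head self-attention mentioned in the first contribution, the following is a minimal sketch of generic scaled dot-product self-attention over a sequence of frame features. It is not the thesis's implementation: the function names, the feature dimensions, and the random matrices standing in for learned projection weights are all assumptions made for this example. It shows only how every frame can attend to every other frame, which is what allows long-range temporal dependencies to be captured.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Random stand-in for a learned weight matrix (assumption, untrained)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def multi_head_self_attention(frames, num_heads, rng):
    """Apply multi-head self-attention to T frame features of size D.

    Returns the T x D attended features and, per head, the T x T
    attention weights (row i = how much frame i attends to each frame).
    """
    t, d = len(frames), len(frames[0])
    assert d % num_heads == 0, "feature size must be divisible by num_heads"
    d_head = d // num_heads
    head_outputs, head_weights = [], []
    for _ in range(num_heads):
        # Hypothetical per-head projections for queries, keys, and values.
        w_q = random_matrix(d, d_head, rng)
        w_k = random_matrix(d, d_head, rng)
        w_v = random_matrix(d, d_head, rng)
        q, k, v = matmul(frames, w_q), matmul(frames, w_k), matmul(frames, w_v)
        k_t = [list(col) for col in zip(*k)]
        scores = matmul(q, k_t)  # T x T pairwise frame similarities
        weights = [softmax([s / math.sqrt(d_head) for s in row]) for row in scores]
        head_outputs.append(matmul(weights, v))
        head_weights.append(weights)
    # Concatenate the heads for each frame, then mix them with an output projection.
    concat = [sum((head[i] for head in head_outputs), []) for i in range(t)]
    w_o = random_matrix(d, d, rng)
    return matmul(concat, w_o), head_weights


rng = random.Random(0)
frames = random_matrix(6, 8, rng)  # 6 frames, 8-dimensional features (toy sizes)
out, weights = multi_head_self_attention(frames, num_heads=2, rng=rng)
```

Each row of a head's weight matrix sums to 1, so a frame's output is a weighted average over all frames in the video, regardless of how far apart they are in time.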
Keywords/Search Tags: relation learning, video summarization, target re-identification, generative adversarial network, graph convolution network