
Research On Salient Person Detection Algorithm In Video Based On Deep Learning

Posted on: 2022-01-02
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Tu
Full Text: PDF
GTID: 2518306731486864
Subject: Electrical engineering
Abstract/Summary:
The detection of salient persons in videos of complex scenes is an emerging research direction in information technology. With the rapid development of computer science and artificial intelligence, it has received extensive attention. Applications such as substation maintenance, monitoring of staff conduct in power grid business halls, identity verification of passengers entering railway stations, and autonomous robot walking need to identify the salient persons in a scene quickly and accurately, so that those persons can subsequently be tracked and segmented. In downstream tasks such as action recognition, resources can then be allocated preferentially to the salient persons, so that they are used as effectively as possible. Most existing video salient person detection networks are based on deep learning and mainly use optical flow or LSTM to fuse spatiotemporal features between video frames. Existing detection algorithms suffer from high demands on datasets, an imbalance between detection accuracy and detection efficiency, and low effectiveness in detecting salient persons.

This thesis uses high-definition videos as its research material. The videos cover many scenes, such as inspections, exchanges, speeches, meetings and visits by domestic and foreign leaders, indoor and outdoor military combat command, and outdoor assaults. Taking the salient person in the video as the research object, the thesis determines the salient person through theoretical and experimental research, and aims to improve the effectiveness, accuracy and generalization ability of salient person detection.

First, to locate the people in the video, this thesis proposes SF-MEGA, a video person detection algorithm that combines fast and slow networks with memory-enhanced global-local aggregation. It detects the people in the video and excludes interference from all other objects. The method combines the global semantic information and the local location information of the video, and uses a long-range memory module to make full use of the temporal information of the whole video. In addition, the method imitates the persistence of human vision: the network combines a fast network and a slow network in a fixed ratio, where the fast network extracts the main features of the current frame and the slow network extracts its detailed features. This reduces the redundant information in the video frames and greatly improves the detection speed of the network. In comparative experiments, SF-MEGA and other object detection methods are evaluated qualitatively and quantitatively on the dataset proposed in this thesis, which verifies the effectiveness of SF-MEGA for the research objects of this thesis.

Second, an object detection algorithm alone can only determine the positions of the persons in the video. It cannot extract the relationships between persons or the characteristics that distinguish the salient person from the others, so it cannot determine which person is salient. This thesis therefore detects the salient person with a video salient person detection network based on residual connections and a GCNet enhancement module. The proposed saliency detection method comprises three modules: a saliency optimization network with residual connections, a recurrent enhancement module based on GCNet, and an optical flow-guided pseudo-label generation module. First, consecutive video frames and the labeled frames are fed into the saliency optimization network with residual connections, which extracts the spatial features of each image and produces a saliency detection result. Then a DB-ConvGRU module is added to the network to strengthen the spatiotemporal correlation of the feature representation and to extract spatiotemporally consistent information from the video frames. In addition, the GCNet self-attention module is added to the DB-ConvGRU network; this attention mechanism improves the effectiveness of spatiotemporal feature prediction and lets the network adaptively learn the important saliency information in the video frames. Finally, the optical flow-guided pseudo-label generation module obtains pseudo-labels from sparsely labeled video frames, which lowers the requirements on the training dataset and reduces the labeling effort.

The proposed method is tested on the VSPD video dataset. Its mean absolute error (MAE) is 12.50%, 18.33% and 7.53% lower than that of BASNet, U-2-Net and F3Net, respectively, and its maximum F-measure (F_β^max) is 5.69%, 4.11% and 4.12% higher, respectively. The test results show that the proposed method improves the accuracy of video saliency detection and is robust.

Finally, according to the characteristics of the two algorithms above, this thesis fuses the features they produce to detect the salient person in the video. The experimental results show that the proposed multi-feature fusion algorithm detects the salient person effectively, and that it completes the detection task quickly and accurately in complex scenes.
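
The two evaluation measures quoted above, the mean (average) absolute error and the maximum F-measure, are conventionally computed for saliency maps roughly as in the following sketch. The abstract does not state how the thesis computes them, so everything here is an assumption for illustration: the function names are hypothetical, saliency maps are taken as flattened lists of values in [0, 1], the ground truth is a flattened binary mask, the weight β² = 0.3 follows the common saliency-benchmark convention, and the maximum is taken over 256 evenly spaced thresholds.

```python
from typing import Sequence

BETA_SQ = 0.3  # assumed weight; conventional in saliency benchmarks, not stated in the abstract


def mean_absolute_error(pred: Sequence[float], gt: Sequence[int]) -> float:
    """Average per-pixel |prediction - ground truth|; lower is better."""
    if len(pred) != len(gt) or len(pred) == 0:
        raise ValueError("pred and gt must be non-empty and of equal length")
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(pred)


def f_measure(pred: Sequence[float], gt: Sequence[int],
              threshold: float, beta_sq: float = BETA_SQ) -> float:
    """Weighted F-measure of the map binarized at `threshold`."""
    true_pos = sum(1 for p, g in zip(pred, gt) if p >= threshold and g == 1)
    if true_pos == 0:
        return 0.0  # also covers empty predictions and empty ground truth
    predicted_pos = sum(1 for p in pred if p >= threshold)
    actual_pos = sum(1 for g in gt if g == 1)
    precision = true_pos / predicted_pos
    recall = true_pos / actual_pos
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


def max_f_measure(pred: Sequence[float], gt: Sequence[int], steps: int = 256) -> float:
    """Best F-measure over `steps` evenly spaced thresholds in [0, 1]."""
    return max(f_measure(pred, gt, t / (steps - 1)) for t in range(steps))


if __name__ == "__main__":
    # Toy 4-pixel example (made-up values, not data from the thesis).
    prediction = [0.9, 0.8, 0.2, 0.1]
    ground_truth = [1, 1, 0, 0]
    print("MAE:", mean_absolute_error(prediction, ground_truth))
    print("max F-measure:", max_f_measure(prediction, ground_truth))
```

Under these assumptions, a lower MAE means the predicted map is closer to the mask pixel by pixel, while a higher maximum F-measure means some binarization threshold separates the salient person from the background well. The percentage differences reported in the abstract are relative comparisons of such scores between models.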
Keywords/Search Tags: Object detection, Saliency detection, Salient person, Deep learning, Optical flow