
Research On Some Problems Of Video Action Detection Based On Deep Learning

Posted on: 2022-12-21  Degree: Doctor  Type: Dissertation
Country: China  Candidate: H N Qiu  Full Text: PDF
GTID: 1488306773983739  Subject: Automation Technology
Abstract/Summary:
With the development of 5G communication technology, the mobile Internet has entered a new era. Video has become the primary carrier of information traffic, replacing pictures, text, and sound as the most popular media form on the Internet. Deep learning based computer vision algorithms, represented by the convolutional neural network (CNN), have made breakthroughs in recent years. This provides new ideas for video analysis and opens more possibilities for its application in real-life scenarios. Video Action Detection (VAD) is the task of locating specific actions in a video. It is the basis for video structuring, video information retrieval, anomalous event detection, and other applications, and has therefore received much attention from both academia and industry. However, most existing video action detection algorithms borrow the Generation and Evaluation (G&E) detection framework from object detection. In the G&E framework, the VAD algorithm first generates a set of candidate proposals and then identifies the proposals that contain complete actions. Because video differs from images, the G&E framework exposes several shortcomings in practical video action detection: the position adjustment of candidate proposals performs poorly during training; many high-quality candidate proposals are wasted during the evaluation phase; pre-defined proposal anchors are hard to adapt to actions of various durations; and the actionness evaluation of candidate proposals is not accurate enough. This dissertation studies video action detection from several perspectives. It starts by optimizing existing G&E-based video action detection algorithms, then goes beyond the G&E framework to explore video action detection without candidate proposals. Finally, it focuses on the core problem of video action detection: the actionness evaluation of video. The main contributions are:

(1) To address the low effectiveness of the G&E framework in adjusting candidate proposals' positions during training, we propose an approach that combines coarse-grained and fine-grained iterations for position adjustment. First, we globally evaluate the action in each candidate proposal together with its contextual regions from a high-level view and adjust its position on a large scale. Then, we fine-tune the proposal boundaries at the frame level by focusing on the details of individual video frames. The experimental results show that this design effectively improves video action detection performance.

(2) Not all high-quality candidate proposals generated by the G&E detection framework take part in the final evaluation, which costs detection performance. We propose a method that trains the ranking of candidate proposals and uses the relative quality relationship between proposals to improve detection accuracy. To this end, we build a Siamese network to rank the candidate proposals. We also propose a Group-Level ranking optimization method to address the limited throughput of the traditional Siamese network during the ranking training stage. Our ranking method is compatible with existing G&E detection frameworks and brings stable performance improvements for video action detection.

(3) The pre-defined proposals of G&E-based action detection frameworks are difficult to match to actions of various durations in real detection tasks. We therefore think outside the G&E framework and propose a proposal-free video action detection method based on a linking mechanism. We build graph models on the video clips to fuse temporal information and use an action-background recognizer, trained with different data balancing strategies, to link video clips containing the same action into action detection results. To verify the method, we conducted experiments on our own dataset, Ego-Deliver, which was collected from the takeout delivery industry, and on the widely used academic datasets THUMOS-14 and ActivityNet 1.3. The experimental results show that our proposal-free action detection method is competitive with state-of-the-art G&E-based action detection frameworks and generates more concise and clear detection results.

(4) To address the insufficient actionness evaluation of candidate proposals, we propose visual feature converters with mid-feature cache repositories. We convert the complex raw visual features into clear explicit visual features and then train the action recognition network with a self-supervised method to distinguish action from background across various scenes and representations. The feature converter thus gives the converted explicit visual features more robust actionness properties. We evaluate our method on two video action detection subtasks, both of which urgently need better actionness evaluation. First, we use the converted explicit visual features in place of the raw visual features as the input to an online video action detection framework based on the multi-head self-attention (MHSA) module, improving its accuracy when detecting the action in the current frame. Second, we use the cached implicit feature repository as the voting pool for weakly supervised video action detection. Similar features and their labels help the detection framework get through the cold-start phase of training and generate better candidate proposals. The experimental results show that our approach performs better in actionness evaluation.
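
To make the linking idea in contribution (3) concrete, the following is a minimal sketch of one way per-clip predictions could be joined into action segments without candidate proposals. It is not the dissertation's implementation: the graph-based temporal fusion and the trained action-background recognizer are omitted, and the function name `link_clips`, the fixed `clip_seconds` duration, and the label strings are assumptions made only for illustration.

```python
from typing import List, Tuple


def link_clips(
    labels: List[str],
    clip_seconds: float,
    background: str = "background",
) -> List[Tuple[str, float, float]]:
    """Merge runs of consecutive clips that share one action label.

    `labels` holds one predicted label per fixed-length clip (hypothetical
    input; in practice it would come from an action-background recognizer).
    Returns (label, start_second, end_second) tuples; background is skipped.
    """
    segments: List[Tuple[str, float, float]] = []
    run_start = 0
    for i in range(1, len(labels) + 1):
        # A run ends at the end of the list or when the label changes.
        if i == len(labels) or labels[i] != labels[run_start]:
            if labels[run_start] != background:
                segments.append(
                    (labels[run_start], run_start * clip_seconds, i * clip_seconds)
                )
            run_start = i
    return segments


if __name__ == "__main__":
    # Hypothetical predictions for seven 2-second clips.
    predicted = ["background", "run", "run", "background", "jump", "jump", "jump"]
    for label, start, end in link_clips(predicted, clip_seconds=2.0):
        print(f"{label}: {start:.1f}s to {end:.1f}s")
```

Because each segment grows clip by clip, its length is not tied to a pre-defined anchor, which is the property the abstract attributes to the proposal-free approach.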
Keywords/Search Tags: Video Action Detection, Online Action Detection, Weakly Supervised Action Detection, Egocentric Video Analysis Dataset