Research On Video Object Detection And Tracking And Action Recognition

Posted on: 2021-08-13
Degree: Master
Type: Thesis
Country: China
Candidate: Z Li
Full Text: PDF
GTID: 2518306308973849
Subject: Electronics and Communications Engineering
Abstract/Summary:
With the development of big data and computer technology, deep learning has become the mainstream technology in the field of artificial intelligence. Deep neural networks can learn features with strong expressive and generalization ability, which allows deep learning to outperform traditional algorithms in many fields, especially natural language processing and computer vision, where its results far exceed those of earlier techniques. Research indicates that vision, which captures a large amount of information, is the most important of the five human senses. Because of the nature of their tasks, object detection, object tracking and action recognition have become active research directions in computer vision. Thanks to deep learning, these three technologies have developed rapidly in recent years and are gradually being applied in various fields, where they face increasingly difficult challenges. Research on object detection, object tracking and action recognition is therefore of great practical significance.

Based on these three technologies, this thesis designs and implements a real-time perception fusion algorithm framework on GPU that integrates object detection, object tracking and action recognition. The framework is composed of four parts: an object detection module, an object tracking module, a pose estimation module and an action recognition module. Because the modules are combined in series or in parallel, each one must meet high demands on performance (precision and speed).

For object detection, based on a comparison of lightweight networks, a super real-time (faster than real-time) object detection algorithm, YOLOv3-MobileNet1.0, is designed on the basis of MobileNetV1 and YOLOv3, and it achieves good results on public datasets. For object tracking, a super real-time, long-term target tracking algorithm is designed and proposed, based on a frame-skipping detection strategy and the integration of SORT and SiamRPN++; SiamRPN++ is accelerated with MobileNet-series networks, and the algorithm achieves good performance in practical application. For pose estimation, this thesis uses PixelShuffle operations as the upsampling layer and, following the idea of AlphaPose, designs and proposes a pose estimation network called Fastpose-MobileNet1.0, which surpasses other pose estimation algorithms of the same order of magnitude on public datasets while also running at super real-time speed. For action recognition, this thesis considers that human skeleton motion features contain less noise and require less computation than image features. Based on spatial-temporal GCN and skeleton reconstruction, an action recognition network called SR-STGCN is designed; its accuracy on the NTU RGB+D dataset is 3% higher than that of ST-GCN.

In summary, each improved or newly designed module of the proposed perception fusion algorithm framework achieves good results on public datasets, and the framework as a whole achieves good results in practical application, which indicates that it is of value for both scientific research and engineering.
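The frame-skipping detection strategy mentioned for the tracking module can be illustrated with a minimal sketch. The abstract does not give the detection interval, the switching rule, or the way SORT and SiamRPN++ are combined, so everything here is an assumption for illustration: the function name `run_frame_skipping`, the `detect_every` parameter, the rule of re-detecting when no boxes remain, and the stub detector and tracker that stand in for the real networks.

```python
from typing import Callable, Iterable, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def run_frame_skipping(
    frames: Iterable[object],
    detect: Callable[[object], List[Box]],
    track: Callable[[object, List[Box]], List[Box]],
    detect_every: int = 5,  # hypothetical interval, not taken from the thesis
) -> List[Tuple[str, List[Box]]]:
    """Run the (expensive) detector on every `detect_every`-th frame and the
    (cheaper) tracker on the frames in between.

    Returns one (source, boxes) pair per frame, where source is "detect" or "track".
    """
    if detect_every < 1:
        raise ValueError("detect_every must be >= 1")
    results: List[Tuple[str, List[Box]]] = []
    boxes: List[Box] = []
    for index, frame in enumerate(frames):
        # Assumption: also fall back to detection when there is nothing to track.
        if index % detect_every == 0 or not boxes:
            boxes = list(detect(frame))
            source = "detect"
        else:
            boxes = list(track(frame, boxes))
            source = "track"
        results.append((source, boxes))
    return results


def stub_detect(frame: object) -> List[Box]:
    """Hypothetical stand-in for a detector; always reports one fixed box."""
    return [(0.0, 0.0, 10.0, 10.0)]


def stub_track(frame: object, boxes: List[Box]) -> List[Box]:
    """Hypothetical stand-in for a tracker; shifts each box one pixel to the right."""
    return [(x1 + 1.0, y1, x2 + 1.0, y2) for (x1, y1, x2, y2) in boxes]


if __name__ == "__main__":
    for source, boxes in run_frame_skipping(range(7), stub_detect, stub_track, detect_every=3):
        print(source, boxes)
```

With `detect_every=3`, the detector runs on frames 0, 3 and 6 and the tracker propagates boxes on the frames in between, which is where the speed gain of such a scheme would come from; a real system would also need to associate new detections with existing track identities.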
Keywords/Search Tags: object detection, object tracking, pose estimation, action recognition