Key Technology And Application Of Visual Object Detection And Recognition Based On Deep Learning

Posted on: 2020-03-23
Degree: Master
Type: Thesis
Country: China
Candidate: K J Zhang
GTID: 2428330575955063
Subject: Computer technology
Abstract/Summary:
With the rapid development of mobile internet technology, visual data such as images and videos is growing exponentially. It is therefore important to detect every object that may carry semantic information in massive collections of images and videos. In this thesis, we study the detection and recognition of visual objects and their applications from three aspects: image object detection, video object detection, and video multi-person pose estimation. Image object detection must detect and recognize instances of specified categories and is one of the basic techniques underlying video object detection. With the increasing popularity of video applications, video object detection has attracted growing research attention and powers many vision tasks such as pose estimation. In recent years, video multi-person pose estimation has become one of the key applications of video object detection. It relies on the candidate bounding boxes predicted by an object detection network, and its accuracy depends largely on the quality of those predictions. The main work of this thesis is as follows.

Firstly, to address the multiple detections and class imbalance encountered in one-stage image object detection, we propose a novel image object detection method based on dynamical spatial constraints. We use dynamical spatial constraints to resolve multiple detections and two parallel classification networks to resolve class imbalance. The dynamical spatial constraints select the default boxes with high confidence scores from all positive default boxes while suppressing the adjacent positive boxes. The experimental results show that our method improves not only the accuracy but also, significantly, the running speed.

Secondly, to address the frame degeneration encountered in video object detection, we propose a self-adjusting partial feature aggregation method. Because both cameras and objects move, the speed of objects varies across a video, so we dynamically adjust the number of frames used for feature aggregation according to the moving speed of the objects. Moreover, different objects in the same video frame move at different speeds, so a feature aggregation mask is generated from the predicted temporal consistency of the features to achieve partial feature aggregation.

Finally, to address the frame degeneration and hard keypoints encountered in video multi-person pose estimation, we propose a video multi-person pose estimation method based on feature aggregation. It consists of one object detection network and one human pose estimation network. Frame degeneration may cause the localization or recognition of a person to fail, so we use feature aggregation to improve the predictions of the object detection network. To improve the localization and recognition of hard keypoints, we add two-stage hourglass modules to the end of the encoder-decoder module and apply hard keypoint mining.
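
The abstract does not say how hard keypoint mining is computed. As a rough illustration only, the sketch below assumes one common formulation: compute a per-keypoint heatmap loss, then average only the `top_k` largest losses so that training concentrates on the hardest keypoints. The function names, the use of mean squared error, the flattened-heatmap representation, and the `top_k` parameter are all assumptions made for this example, not details taken from the thesis.

```python
from typing import List, Sequence


def per_keypoint_mse(pred: Sequence[Sequence[float]],
                     target: Sequence[Sequence[float]]) -> List[float]:
    """Mean squared error of each keypoint's flattened heatmap (assumed loss)."""
    losses = []
    for p_map, t_map in zip(pred, target):
        sq_err = [(p - t) ** 2 for p, t in zip(p_map, t_map)]
        losses.append(sum(sq_err) / len(sq_err))
    return losses


def hard_keypoint_loss(pred: Sequence[Sequence[float]],
                       target: Sequence[Sequence[float]],
                       top_k: int) -> float:
    """Average the loss over only the top_k hardest (highest-loss) keypoints."""
    losses = per_keypoint_mse(pred, target)
    if not 1 <= top_k <= len(losses):
        raise ValueError("top_k must be between 1 and the number of keypoints")
    # Sort descending so the hardest keypoints come first, then keep top_k.
    hardest = sorted(losses, reverse=True)[:top_k]
    return sum(hardest) / top_k


# Toy example: three keypoints, each with a two-pixel "heatmap".
pred = [[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]]
target = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
loss = hard_keypoint_loss(pred, target, top_k=2)
```

In this toy case the easy first keypoint, whose loss is zero, is excluded from the average, so it contributes nothing to the training signal.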
Keywords/Search Tags: object detection, multi-person pose estimation, frame degeneration, dynamical spatial constraints, feature aggregation, hard keypoint mining