Key Technology And Application Of Visual Object Detection And Recognition Based On Deep Learning

Posted on:2020-03-23

Degree:Master

Type:Thesis

Country:China

Candidate:K J Zhang

GTID:2428330575955063

Subject:Computer technology

Abstract/Summary:

With the rapid development of mobile internet technology, visual data such as images and videos are growing exponentially. Detecting every object that may carry semantic information in massive collections of images or videos is therefore an important task. This thesis studies the detection and recognition of visual objects and their applications from three aspects: image object detection, video object detection, and video multi-person pose estimation. Image object detection locates and recognizes instances of specified categories and is one of the basic techniques underlying video object detection. With the increasing popularity of video applications, video object detection has attracted growing research attention and powers many vision tasks such as pose estimation. In recent years, video multi-person pose estimation has become one of the key applications of video object detection. It relies on the candidate bounding boxes predicted by an object detection network, so its accuracy depends largely on the quality of those predictions.

The main work of this thesis is as follows.

Firstly, to address the multiple detections and class imbalance encountered by one-stage image object detection, we propose a novel image object detection method based on dynamical spatial constraints. Dynamical spatial constraints resolve multiple detections, while two parallel classification networks resolve class imbalance. The dynamical spatial constraints select the default boxes with high confidence scores from all positive default boxes and suppress the adjacent positive boxes. The experimental results show that our method not only improves the accuracy but also significantly improves the running speed.

Secondly, to address the frame degeneration encountered by video object detection, we propose a self-adjusting partial feature aggregation method. Because both cameras and objects move, the moving speed of objects varies across a video, so we dynamically adjust the number of frames used for feature aggregation according to the moving speed of the objects. Moreover, different objects in the same video frame move at different speeds, so a feature aggregation mask is generated from the predicted temporal consistency of the features to achieve partial feature aggregation.

Finally, to address the frame degeneration and hard keypoints encountered by video multi-person pose estimation, we propose a video multi-person pose estimation method based on feature aggregation. It consists of one object detection network and one human pose estimation network. Frame degeneration may cause a person to be mislocated or missed entirely, so we use feature aggregation to improve the predictions of the object detection network. To improve the localization and recognition of hard keypoints, we add two-stage hourglass modules to the end of the encoder-decoder module and apply hard keypoint mining.
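The abstract does not specify how hard keypoint mining is computed. A common formulation, assumed here purely for illustration, computes one loss per keypoint heatmap and averages only the `top_k` largest, so that training focuses on the keypoints that are currently predicted worst. The function names, the flat-list heatmap representation, and the `top_k` parameter below are hypothetical and are not taken from the thesis.

```python
from typing import List, Sequence


def per_keypoint_mse(pred: Sequence[Sequence[float]],
                     target: Sequence[Sequence[float]]) -> List[float]:
    """Return the mean squared error of each keypoint heatmap.

    Each heatmap is given as a flat sequence of floats (assumed layout).
    """
    losses = []
    for p_map, t_map in zip(pred, target):
        squared = [(p - t) ** 2 for p, t in zip(p_map, t_map)]
        losses.append(sum(squared) / len(squared))
    return losses


def hard_keypoint_loss(pred: Sequence[Sequence[float]],
                       target: Sequence[Sequence[float]],
                       top_k: int) -> float:
    """Average the losses of the top_k hardest (highest-loss) keypoints."""
    if top_k <= 0:
        raise ValueError("top_k must be positive")
    losses = per_keypoint_mse(pred, target)
    # Keep only the keypoints with the largest loss; easy ones are ignored.
    hardest = sorted(losses, reverse=True)[:top_k]
    return sum(hardest) / len(hardest)


# Toy example: three keypoints, two-pixel heatmaps (hypothetical values).
pred = [[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]]
target = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
loss = hard_keypoint_loss(pred, target, top_k=2)
```

In this toy case the per-keypoint losses are 0.0, 1.0, and 0.25, so with `top_k=2` only the last two contribute to the final loss. A real training setup would apply the same selection to tensors inside a deep learning framework.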

Keywords/Search Tags:

object detection, multi-person pose estimation, frame degeneration, dynamical spatial constraints, feature aggregation, hard keypoint mining

Related items

1	Human Joint Multi-view Fusion And Human Pose Estimation
2	Single-Frame Based Multi-Person Absolute 3D Pose Estimation
3	The Monitoring System Of Operation Platform Based On Union Pose Estimation
4	Research On Multi-person Pose Estimation Based On Convolutional Neural Network
5	The Research On Object 6D Pose Estimation Method Based On RGB Image For Open Scenes
6	Research On Multi-person 2D Human Pose Estimation Algorithm
7	Research And Application Of Real-Time 2D Multi-Person Pose Estimation Algorithm Based On Embedded Devices
8	Multi-person Pose Estimation Based On Convolutional Neural Network
9	Research On Low-resolution Human Pose Estimation Methods
10	Research And Implementation On Multi-Person Activity Recognition Technology Based On Pose Estimation