Font Size: a A A

Scene Understanding With Convolutional Neural Networks

Posted on:2016-12-19Degree:MasterType:Thesis
Country:ChinaCandidate:Y H WuFull Text:PDF
GTID:2308330503956367Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Scene understanding is one of the most important problems in Computer Vision,which can be widely applied in robot navigation, driver assistant systems, environmental monitoring system, content-based image retrieval etc. It is also a fundamental theoretical problem in computer vision which aims to teach machines to understand images like human beings, and it has still not been solved yet. One of the difficulties of scene understanding is to generate translation-invariant, rotation-invariant, scale-invariant semantic features. Inspired by visual perception systems, Convolutional Neural Networks(CNNs)can learn distinguished features in supervised learning from plenty of samples and has achieved great success in some problems of Computer Vision, Speech Recognition and etc. But As feature extractors, to choose appropriate models and parameters of Convolutional Neural Networks is time consuming. Besides, although Convolutional Neural Networks do have translation invariance in local regions, they are not robust in scale invariance. And the features learnt especially in higher layers can not be explained well.We used Convolutional Neural Networks as feature extractors to address the problem of scene understanding, especially the targets of specific traffic sign detection and scene labeling. We did some researches about Convolutional Neural Networks with multi-scale information. In specific tasks, we did some researches about the choices of models and training parameters of CNNs.In realtime object detection problems, we need to get bounding boxes instantly. Besides, Convolutional Neural Networks demand high precisions about the locations and scales of bounding boxes. In traffic sigh detection, we transformed RGB images to grayscale images with SVM, followed by Convolutional Neural Networks with fixed filters applied in different scales of images to get bounding boxes. We used multi-stage Convolutional Neural Networks to recognize traffic sign classes. In German Traffic Sign Detection Benchmarks, we ranked 2nd place in Class “Mandatory”, and ranked 3rd place in Class “Danger”.In scene labeling, we used multi-scale Convolutional Neural Networks to label each pixel. Multi-scale Convolutional Neural Networks utilized different scales of context information. After coarse labeling of Convolutional Neural Networks, a Fully-connected Conditional Random Field corrected some labeling mistakes. We obtained 79% pixel average accuracy with a speed of 2 seconds per image in Stanford Background Data.
Keywords/Search Tags:convolutional neural networks, scene understanding, traffic sign detection, scene labeling
PDF Full Text Request
Related items