
Scene Understanding Based On Multi-scale Deep Convolutional Network

Posted on: 2016-10-28
Degree: Master
Type: Thesis
Country: China
Candidate: D Y Li
Full Text: PDF
GTID: 2348330503494247
Subject: Control Engineering

Abstract/Summary:
Scene understanding is one of the most important research targets in the field of computer vision, and machine learning is the approach most commonly applied to it. Such a pattern recognition system generally comprises two components: feature extraction and classification. In this paper, we adopt deep learning as the architecture for feature extraction. After training, the network extracts features from patches of the full image, and a feature-concatenation step then produces the feature vector of the full image. We adopt an SVM as the classifier because it is widely used and performs excellently.

This paper focuses on the study of image features. Deep learning, as a popular method of feature learning and feature extraction, can be used to extract generalized features from an image. We adopt a CNN with an SPP layer as our structure, inserting a spatial-pyramid-pooling layer between the last convolutional layer and the first fully connected layer. Unlike a traditional CNN, which accepts only fixed-size images, this network accepts images of arbitrary size and aspect ratio as input. To some extent, this reduces information loss and improves classification accuracy when a full image is fed in. Experiments on the SUN dataset show that: 1) the network with the SPP layer accepts images of arbitrary size, so multi-size images can be used during training, which improves the expressive power of the network; 2) although the added layer increases the number of parameters, it speeds up both network training and testing; 3) to obtain features of small patches from a full image, we map the feature maps of the entire image onto every patch cropped from it, so convolution needs to be computed only once per image, which saves time.

This paper adopts a feature-aggregation method named VLAD, a simplified version of the Fisher vector (FV).
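The key property of the SPP layer described above is that it turns a feature map of any height and width into a fixed-length vector, by max-pooling over a fixed grid of bins at several pyramid levels. A minimal numpy sketch follows; the pyramid levels (1×1, 2×2, 4×4) are illustrative assumptions, as the thesis does not state its exact bin configuration:

```python
import numpy as np

def spp_pool(fmap, levels=(1, 2, 4)):
    """Spatial pyramid pooling over a (C, H, W) feature map.

    For each pyramid level n, the map is divided into an n x n grid of
    bins and each bin is max-pooled per channel.  The pooled values are
    concatenated, so the output length is C * sum(n*n for n in levels),
    independent of the input's H and W -- which is what lets the network
    accept images of arbitrary size before the fully connected layers.
    """
    C, H, W = fmap.shape
    out = []
    for n in levels:
        # Integer bin boundaries; max(..., +1) keeps bins non-empty
        # even when H or W is smaller than n.
        hs = np.linspace(0, H, n + 1).astype(int)
        ws = np.linspace(0, W, n + 1).astype(int)
        for i in range(n):
            for j in range(n):
                h0, h1 = hs[i], max(hs[i + 1], hs[i] + 1)
                w0, w1 = ws[j], max(ws[j + 1], ws[j] + 1)
                out.append(fmap[:, h0:h1, w0:w1].max(axis=(1, 2)))
    return np.concatenate(out)
```

With levels (1, 2, 4) and C channels, the output always has C * 21 entries, whether the input map is 13×9 or 7×20, so the fully connected layer that follows sees a fixed input dimension.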
This method concatenates the feature vectors extracted from patches, which retains more image information while avoiding the excessive dimensionality that direct aggregation of features would produce. Convolution also retains more of the image's spatial information. Compared with SPM, which incorporates loose spatial information into BoF vectors, we consider that more orderless features may be a better choice. Experiments show that feature vectors extracted this way are more robust in the presence of geometric deformations; since scene images are highly variable, more robust features benefit the classification results.

Deep learning has broad room for development in image representation and feature concatenation. The method used in this paper achieves appealing results on many datasets. Future work will focus on improving the efficiency of the algorithm, exploring different architectures, and extending the architecture to other application areas.
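VLAD, as used above, aggregates a set of local patch descriptors against a learned codebook by accumulating residuals to the nearest center. A minimal numpy sketch is shown below; the signed-square-root and L2 normalization steps follow common VLAD/FV practice and are assumptions, not necessarily the thesis's exact configuration:

```python
import numpy as np

def vlad_encode(descs, centers):
    """VLAD encoding of local descriptors.

    descs:   (N, D) array of local patch features.
    centers: (K, D) codebook (e.g. k-means centers).

    Each descriptor is assigned to its nearest center; the residuals
    (descriptor - center) are summed per center, flattened into one
    (K*D,) vector, then power- and L2-normalized.
    """
    K, D = centers.shape
    # Squared distances (N, K) and hard nearest-center assignment.
    d2 = ((descs[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    assign = d2.argmin(axis=1)
    v = np.zeros((K, D))
    for k in range(K):
        sel = descs[assign == k]
        if len(sel):
            v[k] = (sel - centers[k]).sum(axis=0)
    v = v.ravel()
    # Signed square root ("power") normalization, then L2.
    v = np.sign(v) * np.sqrt(np.abs(v))
    n = np.linalg.norm(v)
    return v / n if n > 0 else v
```

The resulting fixed-length vector (K·D entries regardless of how many patches the image yields) is what would then be fed to the SVM classifier.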
Keywords/Search Tags:Deep learning, convolutional network, Spatial Pyramid, Scene understanding