
Multi-Label Scene Classification Using Convolutional Neural Network

Posted on: 2016-05-05
Degree: Master
Type: Thesis
Country: China
Candidate: Z Chen
Full Text: PDF
GTID: 2308330461486308
Subject: Computer Science and Technology
Abstract/Summary:
Multi-label classification is more complex than traditional classification. In traditional classification problems, each instance has exactly one class label; in multi-label classification, an instance can belong to more than one class simultaneously. Multi-label classification has been widely applied in text classification, image recognition, medical diagnosis, and other areas. Multi-label scene classification is an application of multi-label classification to image recognition. Convolutional Neural Networks (CNNs) are frequently used for image recognition, and in recent years they have also become the most common model for deep-model-based image understanding. Current research on CNNs concentrates mainly on deep models, and the trend is toward more layers and more complicated networks. However, not every task needs a complicated network: in some constrained fields, such as scene classification, a simple network is enough. When the training data set is small, a shallow network is almost the only choice. In addition, some recently developed techniques, such as the ReLU activation function and dropout, perform well in deep models, but whether they are suited to shallow networks is still unsettled.

In this paper we use a CNN for multi-label scene classification. Unlike conventional CNN training, we first train the convolution kernels with unsupervised methods, including techniques specific to small data sets of low-dimensional images. We then use the trained CNN to extract image features, use logistic regression as the classifier, and evaluate the classification results with multiple metrics.

We use an unsupervised autoencoder to train the convolution kernels. An autoencoder is a three-layer fully connected neural network that can be trained without labels, so the training is unsupervised.
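As a rough illustration of the three-layer autoencoder just described, the following NumPy sketch trains a network to reconstruct its own input without any labels. The sigmoid hidden layer, linear output layer, squared-error loss, plain batch gradient descent, and all names and hyperparameters are assumptions made for illustration; they are not details taken from the thesis.

```python
import numpy as np

def train_autoencoder(X, n_hidden, lr=0.1, epochs=300, seed=0):
    """Train an input -> hidden -> output autoencoder on unlabeled data.

    X has shape (n_samples, n_features). Returns the learned parameters
    and the mean squared reconstruction error recorded at every epoch.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        # Forward pass: sigmoid encoder, linear decoder (assumed choices).
        H = 1.0 / (1.0 + np.exp(-(X @ W1 + b1)))
        R = H @ W2 + b2
        err = R - X
        losses.append(float(np.mean(err ** 2)))
        # Backward pass for the squared reconstruction error.
        dR = err / n
        dW2 = H.T @ dR
        db2 = dR.sum(axis=0)
        dZ = (dR @ W2.T) * H * (1.0 - H)
        dW1 = X.T @ dZ
        db1 = dZ.sum(axis=0)
        # Gradient descent update.
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return (W1, b1, W2, b2), losses
```

After training, the columns of `W1` play the role of learned filters; in a setup like the one described here, they would be reshaped into patch-sized convolution kernels.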
The aim of the autoencoder is to learn better representations of the input data so that a better classifier can be trained. To learn better feature representations, we preprocess the input data with ZCA whitening to eliminate the correlations between neighboring pixels. We force the network to have many more hidden units than input units and restrict the number of active hidden units during training, so that the network learns sparse representations of the input.

After the convolution kernels have been trained, a logistic regression classifier is trained to classify the images. The architecture in this paper consists of a convolution layer, a subsampling layer, and a logistic regression layer. The subsampling layer reduces the dimension of the features, which also reduces the complexity of the model. Together, the convolution layer and the subsampling layer give the model a degree of translation invariance. We also try the ReLU activation function and dropout on this architecture and analyze how well these techniques apply to shallow networks.

Since each instance can have multiple labels, we train one classifier per label. At prediction time, the predicted label set is the combination of the outputs of all these classifiers. To evaluate the experimental results, we first use precision and then use metrics suited to multi-label classification: Hamming loss, one-error, coverage, ranking loss, and average precision. The experimental results show that the proposed architecture achieves good results for multi-label scene classification.
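Several of the multi-label metrics named above can be sketched in a few lines of NumPy. The definitions below follow the common textbook forms of Hamming loss, one-error, coverage, and ranking loss; the function names, the 0.5 decision threshold in the usage example, and the toy data are assumptions for illustration and are not taken from the thesis. Average precision is omitted for brevity.

```python
import numpy as np

def hamming_loss(Y_true, Y_pred):
    """Fraction of (instance, label) slots that are predicted wrongly."""
    return float(np.mean(Y_true != Y_pred))

def one_error(Y_true, scores):
    """Fraction of instances whose top-scored label is not relevant."""
    top = np.argmax(scores, axis=1)
    return float(np.mean(Y_true[np.arange(len(Y_true)), top] == 0))

def coverage(Y_true, scores):
    """Average number of steps down the ranking needed to cover all
    relevant labels. Assumes every instance has at least one relevant label."""
    order = np.argsort(-scores, axis=1)
    ranks = np.argsort(order, axis=1) + 1  # rank 1 = highest score
    worst = np.max(np.where(Y_true == 1, ranks, 0), axis=1)
    return float(np.mean(worst - 1))

def ranking_loss(Y_true, scores):
    """Average fraction of (relevant, irrelevant) label pairs in which the
    irrelevant label scores at least as high as the relevant one."""
    per_instance = []
    for y, s in zip(Y_true, scores):
        pos, neg = s[y == 1], s[y == 0]
        if len(pos) == 0 or len(neg) == 0:
            continue  # no pairs to compare for this instance
        per_instance.append(np.mean(pos[:, None] <= neg[None, :]))
    return float(np.mean(per_instance))

# Toy example: 2 instances, 3 labels, one score per label from each
# per-label classifier (hypothetical values).
Y_true = np.array([[1, 0, 1], [0, 1, 0]])
scores = np.array([[0.9, 0.2, 0.4], [0.6, 0.3, 0.1]])
Y_pred = (scores >= 0.5).astype(int)  # assumed threshold of 0.5
```

For all four metrics, lower values are better. On this toy data the functions return 0.5 (Hamming loss), 0.5 (one-error), 1.0 (coverage), and 0.25 (ranking loss).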
Keywords/Search Tags: Multi-label classification, Scene classification, Convolutional neural network