Research On Scene Classification Of Mid-Level Features

Posted on:2018-06-13

Degree:Master

Type:Thesis

Country:China

Candidate:L K Yu

Full Text:PDF

GTID:2428330542487830

Subject:Communication and Information System

Abstract/Summary:

Scene classification is a classical research topic in computer vision, and its results apply directly to large-scale scene classification. This thesis focuses on scene classification based on mid-level features. Compared with low-level features, mid-level features are more robust and more likely to reach the semantic level. Compared with high-level features, mid-level features require no labeling and can be learned independently in an unsupervised manner, which saves manual effort. The existing bag-of-words model based on mid-level features consists of five steps: image segmentation; image patch feature extraction; visual word dictionary learning; pooling; and support vector machine classification. To address the problems in each part of this model, the specific research contents are as follows.

First, Convolutional Neural Network (CNN) deep features are adopted as the feature extraction method for mid-level features. A pre-trained CNN is used as the feature extractor and tested with sliding windows of various sizes. With features extracted at an appropriate window size, pooled with Spatial Pyramid Matching (SPM), and classified with a Support Vector Machine (SVM), this simple method raises the classification accuracy on the MIT indoor scene dataset to 75.86%, exceeding the accuracy of algorithms from recent years. This verifies the great potential of CNN deep features as mid-level features, and CNN deep features are used in the following chapters.

Second, object proposal is adopted for image segmentation, and a new K-Means clustering method is proposed. Compared with sliding windows, the image patches segmented by object proposal are more semantically meaningful. Traditionally, object proposal is used to extract thousands of patches for object detection, whereas this thesis keeps only the patches with the strongest objectness response. In dictionary learning, the traditional K-Means clustering method cannot cope with large-scale scene classification. Therefore, representative clusters are first selected according to a threshold, and discriminative clusters are then selected according to a distance based on linear discriminant analysis; the clusters that pass both filters become the words of the dictionary. With CNN features extracted from the object proposal patches, the final result is 76.49%.

Third, Apriori pattern mining is adopted for dictionary learning, and a new pooling method is proposed. CNN deep features have a concentrated response, which makes it convenient to generate the transaction set and allows them to be integrated seamlessly with pattern mining for dictionary learning. Because the patterns generated by the mining model are somewhat random, they are merged by ensemble merging, following the idea of "detection instead of classification". For pooling, sequential max-N pooling is proposed for the first time: it makes full use of the ordering of patches by objectness that object proposal provides, and max pooling is carried out within each group. It improves accuracy more than SPM does. In this chapter the final result is 78.28%, which is the best result in this thesis.

In summary, several improvements to the modules of the traditional Bag-of-Words (BOW) model based on mid-level features are proposed: an image segmentation method, in which patches are segmented with object proposal; two dictionary learning methods for CNN deep features, K-Means clustering and Apriori pattern mining; and a pooling method, sequential max-N pooling, designed for object proposal. The combination of object proposal and sequential max-N pooling performs very well and can be adapted to other dictionary learning methods.
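The abstract gives no formula for sequential max-N pooling, so the following is only a minimal sketch of one plausible reading: patches are ranked by their object proposal score, split into consecutive groups of N, and max-pooled within each group, with the group results concatenated. The function name, argument names, and the grouping rule are assumptions made for illustration, not the thesis's actual implementation.

```python
def sequential_max_n_pooling(responses, objectness, n):
    """Illustrative sketch only (assumed reading of the abstract).

    responses  : list of per-patch response vectors, one value per dictionary word
    objectness : list of per-patch object proposal scores (hypothetical input)
    n          : number of consecutive ranked patches per group
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    # Rank patches from strongest to weakest objectness (stable sort).
    order = sorted(range(len(responses)), key=lambda i: -objectness[i])
    ranked = [responses[i] for i in order]
    pooled = []
    for start in range(0, len(ranked), n):
        group = ranked[start:start + n]
        # Max pooling per dictionary word within this group of patches.
        pooled.extend(max(column) for column in zip(*group))
    return pooled


# Four patches, two dictionary words, groups of two patches each.
responses = [[1, 0], [0, 2], [3, 1], [0, 5]]
objectness = [0.9, 0.1, 0.5, 0.7]
image_vector = sequential_max_n_pooling(responses, objectness, n=2)
```

Under this reading, the pooled vector has one block per group, so the position of a block reflects how object-like its patches were, in the way SPM blocks reflect spatial position.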

Keywords/Search Tags:

BOW, Mid-level features, CNN deep features, K-Means, pattern mining, pooling

Related items

1	Extracting High-level Multimodal Features
2	Research And Application Of Crowd Counting In Static Images Based On Deep Convolution Features
3	Medical Image Retrieval Based On Low Level Features And Semantic Features
4	Deep Learning Based Speech Emotion Recognition By Fusing Acoustic Features And Transcriptions Clues
5	Low-level Features Based Image Quality Assessment Methods
6	Research On Methods Of Behavior Recognition Using Feature Fusion
7	Research On Multi-view Face Detection Method In The Wild
8	Salient Objects Detection In Natural Images
9	Study On Named Entity Recognition Based On Deep Learning
10	Research On Object Tracking Merged With Handcraft And Deep Features