Font Size: a A A

Research Of Scene Classification Based On Convolutional Neural Networks

Posted on:2019-05-28Degree:MasterType:Thesis
Country:ChinaCandidate:C R ZhanFull Text:PDF
GTID:2428330566998593Subject:Computer technology
Abstract/Summary:PDF Full Text Request
Different from object classification task which only focus on the local part in an image,scene classification is a classification method based on the overall image information.Scene is the generalization of the comprehensive information such as objects and backgrounds in an image.Scene classification is widely used in the fields of intelligent monitoring and image retrieval,and also can provide auxiliary knowledge in the fields of action recognition and so on.Most scenes are composed of complex object distribution and the differences are small between some scene categories which makes the scene classification task becomes one of the difficulties in the field of image classification.Traditional feature encoding method and CNN-based methods are two main approaches for scene classification.Traditional methods,such as SIFT are generally used to extract local features,and then global features are obtained by encoding the local ones.However,these methods can not extract good features that can represent the local area for a given image,and the encoded global features are not distinguishable enough between complex scenes.So,they can only be used for the cases with fewer categories.CNN has stronger feature extraction ability and better classification results.It can distinguish complex scenes,and suit for large-scale scene classification problems.However,when using CNN to extract features or classify the whole image,only the global categories representing scene categories are used which ignore the local information in the scene.This makes it could not achieve good classification results.In order to solve these difficulties,we apply and optimize the scene classification model by the fusion of CNN and traditional feature encoding method.Local semantic features which are obtained after local random sampled patches on image are classified by the object classification network.Then the features are encoded by Fisher Vector.However,when constructing the scene classification model based on neural network to extract local features,the local random sampling method may loose important information and cause deviation.Using object classification network as the feature extractor makes the extracted features lack of the ability for background modeling.The corresponding semantics codebook can be redundancy and noisy during classification.In order to solve these difficulties,we first design a uniform local sampling method.It is more reasonable to collect the local information from the scene image compared with the random sampling method.In order to improve the background modeling ability of the model,we design a local feature extraction CNN and trained on the public scene data set.By coorperating with the object classification network,both foreground and the background local features are extracted which are conbined with the corresponding high-level semantics to build the semantic features.An algorithm for screening semantic categories to reconstruct the semantic codebooks is proposed.It can reduce feature redundancy and increase the distance between the categories of the encoded features.We also combine the local and global features.The proposed model improved the accuracy on MIT Indoor and SUN data sets by 2.8% and 3.6% reaching accuracy of 83.80% and 70.10% respectively,with the accuracy of the state-of-the-art.
Keywords/Search Tags:scene classification, cnn, feature encoding, fisher vector
PDF Full Text Request
Related items