
Research On Context Feature Description Model In Scene Classification

Posted on: 2020-02-22 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: J Shi | Full Text: PDF
GTID: 1368330611453186 | Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
With the rapid development of the Internet, the network provides users with rich information resources, but it also brings unprecedented difficulty to the sorting and classification of massive image collections. To this end, a variety of image classification technologies have been developed. Scene image classification is an important branch of image classification research. It is the process of converting the various kinds of information that represent scene attributes into specific feature descriptions, so that images can be labeled automatically by constructing appropriate feature expressions. Scene classification is widely applied in machine vision fields such as image analysis, image sorting, video summarization, and robot navigation.

Scene images carry multiple kinds of complex information. There are complex implicit relationships among the diverse objects in a scene, and between the objects and their surrounding environment; these reflect many important semantic relations, the so-called context relationships. However, such relationships are often hidden, variable, and difficult to describe. The image features usable for scene classification can be roughly divided into low-level visual features, middle-level semantic features, high-level semantic features, and deep learning features. How to obtain context relationships that fully express scene semantics from these features is one of the key factors in improving the accuracy of scene classification. Therefore, aiming at the difficult problems in scene classification, this thesis constructs several reasonable and effective context feature description models built closely around the context relationships of scene images, and significantly improves the accuracy of scene classification algorithms. The main work and innovations of this thesis are as follows:

(1) Establishment of a time-frequency context feature description model of the scene. If only spatial (time-domain) features are used, complex visual dictionaries cannot be fully constructed and tasks such as high-level modeling cannot be performed effectively. Exploiting the fact that the time (spatial) domain describes scene structure while the different frequency subbands describe scene properties from overview to detail, a time-frequency context feature description model is established that fully considers the interaction of different object details in scene images. Using the contextual semantic co-occurrence relationships between adjacent pixels and image blocks, the frequency-domain spatial context information of the scene's different scale and detail components is obtained by wavelet transform. A Multi-scale Texture Descriptor (MSTD) is generated by combining it with the DLBP (Different Local Binary Pattern) feature to construct the time-frequency context feature description model. The model fully considers detailed texture features and spatial scale information, enhances the discriminability of single-layer features, and further reveals the deep interconnections between image contents. The model's structure is relatively simple, and it is robust to illumination and rotation. With this model, the classification accuracy on outdoor scenes is above 84%, which effectively improves the accuracy of scene classification.

(2) Establishment of a saliency context feature description model of the scene. Scene images usually contain a large number and variety of objects, the interrelationships among objects and between objects and the environment are complex, and viewing angles differ; as a result, scene contents are difficult to identify, scenes of the same category vary greatly, and scenes of different categories can be highly similar. Aiming at these problems, this thesis establishes a saliency context feature description model based on the fact that the saliency of core information plays a key role in expressing the contents of scene images. By preferentially detecting the context saliency information of images, the multi-scale spatial context relationships of scene contents are fully reflected, and the shortcoming of splitting objects in scenes that are then hard to recombine is overcome. At the same time, multi-scale, multi-direction context visual information is described via the Gabor transform. Experimental results on standard scene image datasets show that the proposed model effectively overcomes the influence of shooting angle and scale, and improves the ability to describe the relationships among objects in scenes.

(3) Establishment of a global and local context feature description model of the scene. To address the complexity and diversity of scenes themselves, a global and local context feature description model is established. The model follows the principle that relationships between objects are described by the global features of scene images while object details are described by local features, and it fully accounts for relative position changes, occlusion, and background clutter. On the basis of detecting context-sensitive areas, the enhanced local features and the global features are weighted and fused to generate multi-scale spatial-frequency fusion features, from which the context feature description model is constructed. The model effectively avoids the increase in algorithmic complexity caused by accurately segmenting objects. At the same time, the visually sensitive area detection algorithm captures more image context information, so that the background area around objects can also assist classification and discrimination. In addition, in view of the neglect of spatial information in the traditional Bag-of-Visual-Words model, the traditional visual words are improved into context visual words, which effectively reduces polysemy. Comparative test results on standard scene image datasets show that the proposed model can distinguish different scenes containing similar objects, overcomes the influence of occlusion and background clutter, and has strong applicability.

(4) Establishment of a context deep learning feature description model of the scene. Owing to the complexity and diversity of scene contents, explicit features are often difficult to generalize. Although features extracted by deep learning generalize better, depth features obtained by purely data-driven training are often insufficient to convey the core contents of scenes. Therefore, to obtain the connotative generalization features of scene images, this thesis establishes a combined data-driven and feature-driven deep learning network training mechanism that incorporates the context relationships among objects and between objects and the environment, and constructs a context deep learning feature description model. A deep convolutional neural network pre-trained on the large-scale scene image dataset Places is used to extract multi-layer deep convolutional features of scene images under the different drive modes, describing the high-level abstract context semantics of scene images. By combining the data-driven and feature-driven modes, the model effectively improves classification accuracy, especially for complex indoor scenes. Test results on standard scene image datasets show that the classification results of the proposed model are superior to many state-of-the-art approaches.
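To make the time-frequency idea of model (1) concrete, the following is a minimal NumPy sketch, not the thesis's actual MSTD/DLBP: a one-level Haar wavelet transform supplies frequency subbands, a basic 8-neighbour LBP histogram supplies texture, and concatenating the histograms of the image and its subbands yields a single time-frequency descriptor. All function names and the choice of plain LBP are illustrative assumptions.

```python
import numpy as np

def haar_dwt2(img):
    """One-level 2-D Haar wavelet transform of an even-sized grayscale image:
    returns the approximation (LL) and detail (LH, HL, HH) subbands."""
    a = (img[0::2, :] + img[1::2, :]) / 2.0   # row averages
    d = (img[0::2, :] - img[1::2, :]) / 2.0   # row differences
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return ll, lh, hl, hh

def lbp_histogram(img, bins=256):
    """Basic 8-neighbour local binary pattern histogram (a stand-in for the
    thesis's DLBP): each interior pixel gets an 8-bit neighbourhood code."""
    c = img[1:-1, 1:-1]
    code = np.zeros_like(c, dtype=np.uint8)
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(shifts):
        nb = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        code |= (nb >= c).astype(np.uint8) << bit
    hist = np.bincount(code.ravel(), minlength=bins).astype(float)
    return hist / hist.sum()

def time_frequency_descriptor(img):
    """Concatenate LBP histograms of the image and its wavelet subbands, so
    one vector mixes spatial texture with multi-scale frequency detail."""
    img = img.astype(float)
    parts = [lbp_histogram(img)]
    parts += [lbp_histogram(b) for b in haar_dwt2(img)]
    return np.concatenate(parts)   # 5 * 256 = 1280 dimensions
```

The resulting 1280-dimensional vector could be fed to any standard classifier; the real model additionally encodes scale and rotation-robust co-occurrence structure that this sketch omits.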
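The multi-scale, multi-direction Gabor description used in model (2) can be sketched as follows. This is an illustrative assumption, not the thesis's pipeline: a small bank of real Gabor kernels at several orientations and wavelengths is applied via FFT convolution, and the mean absolute response per filter forms a crude context feature vector (the saliency-detection stage is omitted).

```python
import numpy as np

def gabor_kernel(ksize, sigma, theta, lam):
    """Real part of a Gabor kernel at orientation theta and wavelength lam."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    return (np.exp(-(x**2 + y**2) / (2 * sigma**2))
            * np.cos(2 * np.pi * xr / lam))

def gabor_context_features(img, thetas=(0, np.pi/4, np.pi/2, 3*np.pi/4),
                           lams=(4.0, 8.0)):
    """Mean absolute Gabor response per (wavelength, orientation) pair:
    a minimal multi-scale, multi-direction descriptor."""
    feats = []
    fimg = np.fft.fft2(img)
    for lam in lams:
        for theta in thetas:
            k = gabor_kernel(11, sigma=lam / 2, theta=theta, lam=lam)
            # frequency-domain convolution (circular boundary, same size)
            resp = np.fft.ifft2(fimg * np.fft.fft2(k, s=img.shape)).real
            feats.append(np.abs(resp).mean())
    return np.array(feats)   # 2 scales x 4 orientations = 8 values
```

In a fuller version, responses would be pooled only inside the detected salient regions, so that the descriptor concentrates on the scene's core content.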
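Two ingredients of model (3) admit short sketches: weighted fusion of a global descriptor with pooled local descriptors, and "context visual words" that pair each visual word with its spatial neighbour instead of counting words in isolation. The weights, pooling scheme, and pair encoding below are assumptions for illustration, not the thesis's exact formulation.

```python
import numpy as np

def fuse_global_local(global_feat, local_feats, w_global=0.6, w_local=0.4):
    """Weighted-fusion sketch: L2-normalise the global descriptor and the
    average-pooled local descriptors separately, then concatenate with
    weights so neither modality dominates by scale."""
    g = global_feat / (np.linalg.norm(global_feat) + 1e-12)
    l = local_feats.mean(axis=0)          # pool local descriptors
    l = l / (np.linalg.norm(l) + 1e-12)
    return np.concatenate([w_global * g, w_local * l])

def context_bovw(words, k):
    """'Context visual word' histogram: count pairs (word_i, word_{i+1}) of
    spatially adjacent assignments over a k-word vocabulary, rather than
    single-word occurrences, to retain neighbourhood information."""
    pairs = words[:-1] * k + words[1:]    # encode each adjacent pair as one id
    hist = np.bincount(pairs, minlength=k * k).astype(float)
    return hist / max(hist.sum(), 1.0)
```

The pair histogram has k^2 bins, so the same word occurring in different neighbourhoods falls into different bins, which is one simple way to reduce the polysemy of a bare Bag-of-Visual-Words representation.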
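For model (4), the thesis extracts multi-layer convolutional features from a Places-pretrained CNN under two drive modes. The network itself is out of scope here, so the sketch below only shows the two generic steps downstream of it, under stated assumptions: each layer's feature map (shape C, H, W, e.g. obtained from a Places-pretrained backbone) is global-average-pooled and concatenated, and the data-driven and feature-driven descriptors are blended by weighted concatenation. The pooling and blending choices are illustrative, not the thesis's exact mechanism.

```python
import numpy as np

def multilayer_descriptor(feature_maps):
    """Global-average-pool each convolutional feature map (C, H, W) and
    concatenate the pooled channel vectors into one multi-layer descriptor."""
    pooled = [fm.mean(axis=(1, 2)) for fm in feature_maps]
    return np.concatenate(pooled)

def combine_driven_features(data_driven, feature_driven, alpha=0.5):
    """Blend the descriptors produced under the data-driven and
    feature-driven training modes after per-vector L2 normalisation."""
    d = data_driven / (np.linalg.norm(data_driven) + 1e-12)
    f = feature_driven / (np.linalg.norm(feature_driven) + 1e-12)
    return np.concatenate([alpha * d, (1 - alpha) * f])
```

In practice the `feature_maps` would come from intermediate layers of the pretrained network (for instance via forward hooks in a deep learning framework); the combined vector then trains the final scene classifier.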
Keywords/Search Tags: scene image classification, context relationship, multi-scale information, texture feature, visually sensitive information, feature description, convolutional neural network