Font Size: a A A

Scene Understanding Based On Convolutional Neural Network

Posted on:2018-11-24Degree:MasterType:Thesis
Country:ChinaCandidate:F YangFull Text:PDF
GTID:2348330563452207Subject:Control Science and Engineering
Abstract/Summary:PDF Full Text Request
With the development of science and technology,and the improvement of human intelligent living standards,more and more intelligent service robots have been applied to people's daily life.To serve people,robots must first have the competence of environment cognition and understanding.Vision is the main sensor for robots to obtain external information.Therefore,vision-based scene understanding has an important effect on intelligent competence of robots.In Recent years,scene understanding has attracted more and more attention,and many impressive visual systems have been proposed.However,most of them are based on hand-crafted features,which cannot express hidden information and lead to lacking semantic meaning.Consequently,it is difficult to precisely perform other intelligent tasks for robots.With the development of deep learning,especially the great success of Convolution Neural Network(CNN)in image recognition,CNN is widely applied in many other tasks of computer vision.CNN imitates human visual mechanism and has a more comprehensive representation for images,which is suitable for practical application.However,there are also limits to scene understanding for CNN,for example,global activation of CNN is one dimensional feature vector containing high-level semantic information but ignores mid-level information and the details of the objects.In addition,global activation of CNN reduces the invariance to geometric transformations due to characteristic of pooling.In this thesis,we investigate CNN based scene understanding from two aspects: scene recognition and scene parsing.The main contributions of this thesis are as follows:(1)A novel scene recognition method based on CNN combined with mid-level semantic parts is proposed.First,mid-level local semantic parts are discovered,and their mid-level features are also extracted.Then,the mid-level features and the CNN features are combined under the CNN framework.Finally,the SVM is applied to classify scenes.Experimental results over MIT 67 indoor and UIUC 8-sports dataset show that our approach achieves better performance than other counterparts.However,we also find that the proposed method can identify simple scene images well,but is incompetent in complex situation.(2)To increase the invariance of the geometric transformation to complex scene images and obtain comprehensive image representation,a scene recognition method based on multi-channel and multi-scale orderless pooling using CNN is proposed.First,the CNN is used to extract convolutional features and fully-connected features of multi-scale patches.Then,the convolutional features and fully-connected features are aggregated to represent the whole image.Finally,a linear SVM is used to categorize scenes.Experimental results on SUN397 and MIT 67 indoor datasets demonstrate that the proposed approach achieves better results than other counterparts in term of accuracy.In addition,we also combined the ImageNet CNN(object-centric network)and the Places CNN network(scene-centric network)to improve the performance.Experimental results have proved the effectiveness of our method.(3)A scene labeling method based on encoder-decoder pyramid pooling structure combining superpixels is proposed to cope with the problems of ambiguity of objects' edge in the image and the uncertainty of the small objects' segmentation.First,the encoding-decoding network is employed to extract features.Then,the extracted features are pooled to integrate the global and local information,and a two-layer neural network classifier is trained,to enhance the segmentation for the small objects and increase the spatial context information.Next,an image is segmented into superpixels to make the outline of objects clear.Finally,each pixel of the image is classified using the two-layer neural network.Experimental results over three datasets show that our method can get better performance of segmentation compared with the traditional encoder-decoder structure.(4)A demo system of software for scene understanding is designed.To valuate the efficiency of the proposed methods,a variety of datasets and the sequences collected by a camera mounted on the robot are used for experiments.The demo system mainly includes two modules: scene recognition module and scene understanding module.And each module contains training,testing,and visualization parts.The developed system satisfies with the function of scene understanding visualization.
Keywords/Search Tags:scene understanding, convolution neural network, mid-level semantic part, multi-scale orderless pooling, pyramid pooling
PDF Full Text Request
Related items