
Research On Indoor Scene Segmentation Algorithm Based On Fully Convolutional Neural Networks

Posted on: 2020-04-05
Degree: Master
Type: Thesis
Country: China
Candidate: L Huang
Full Text: PDF
GTID: 2428330596479257
Subject: Microelectronics and Solid State Electronics
Abstract/Summary:
Vision is one of the most important channels through which humans acquire information. A visual prosthesis implants electrodes into the body of a blind person and stimulates the optic nerve so that the wearer perceives phosphenes. These percepts, however, convey only coarse features, with low resolution and poor linearity, and the wearer can sometimes hardly identify the object that a phosphene pattern represents. Before the electrodes are stimulated, image segmentation is therefore needed to present the approximate position and outline of each object, which is of real significance in helping blind users recognize familiar objects clearly. In view of these application characteristics, this thesis proposes a fast image segmentation method based on a convolutional neural network for segmenting the indoor scenes of a visual prosthesis.

To meet the real-time image-processing demands of a visual prosthesis, the FFCN network structure proposed in this thesis is adapted from the AlexNet classification network. On the ImageNet dataset, AlexNet reduced the top-5 error rate to 16.4%, a large improvement over the 26.2% of the second-place entry. AlexNet uses convolutional layers to extract deep feature information, adds overlapping pooling to reduce the number of parameters to be learned, and adopts the ReLU activation function to overcome the gradient vanishing that the Sigmoid function suffers in deeper networks. Compared with other networks, it is lightweight and fast to train.

First, the FFCN (Fast Fully Convolutional Network) for indoor-scene image segmentation is constructed; it consists of five convolutional layers and one deconvolution layer. Inter-layer fusion based on element-wise addition (Add) outperforms concatenation (Concat) in reducing the number of parameters the network must compute, and the scale fusion avoids the loss of image feature information caused by successive convolutions.

To verify the effectiveness of the network, a dataset of basic items that blind people can touch in an indoor environment is created. It is divided into categories covering 664 items, such as beds, seats, lamps, televisions, cupboards, cups, and people (hereinafter the XAUT dataset). The category of each item is annotated with a grayscale value in the original image, and a color table then maps the grayscale image to a pseudo-color map that serves as the semantic label. The XAUT dataset is used to train the FFCN under the Caffe framework, exploiting the deep features and scale fusion of the convolutional network to obtain an indoor-scene segmentation model adapted to visual prostheses for the blind. For comparison, the traditional models FCN16s, FCN32s, FCN8s, and FCN8s-at-once are fine-tuned and trained on the same dataset to obtain corresponding indoor-scene segmentation models.
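The thesis implements FFCN under Caffe, and this abstract does not give exact layer settings, so the following PyTorch-style sketch is only an illustration of the stated structure (five convolutional layers, one deconvolution layer, Add-based scale fusion); every channel width, kernel size, and stride is an assumption.

# Illustrative sketch only: the thesis builds FFCN in Caffe; this abstract
# names the structure but not the layer sizes, so every channel width,
# kernel size, and stride below is an assumption.
import torch
import torch.nn as nn

class FFCNSketch(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        # Five convolutional layers (widths assumed).
        self.conv1 = nn.Sequential(nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        self.conv2 = nn.Sequential(nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        self.conv3 = nn.Sequential(nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(inplace=True))
        self.conv4 = nn.Sequential(nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(inplace=True))
        self.conv5 = nn.Conv2d(128, num_classes, 1)   # 1x1 conv to per-class scores
        self.side = nn.Conv2d(128, num_classes, 1)    # side branch scored for fusion
        # One deconvolution (transposed convolution) restores full resolution.
        self.deconv = nn.ConvTranspose2d(num_classes, num_classes, 8, stride=4, padding=2)

    def forward(self, x):
        f2 = self.conv2(self.conv1(x))                # 1/4-resolution features
        f5 = self.conv5(self.conv4(self.conv3(f2)))
        # Add fusion: an element-wise sum keeps the channel count unchanged,
        # whereas Concat would widen the tensor and every layer consuming it.
        fused = f5 + self.side(f2)
        return self.deconv(fused)                     # per-pixel class scores

scores = FFCNSketch(num_classes=8)(torch.randn(1, 3, 256, 256))
print(scores.shape)                                   # torch.Size([1, 8, 256, 256])

Under these assumptions, the Add fusion keeps downstream layer widths fixed, which is the parameter saving the abstract attributes to Add over Concat.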
Comparative experiments are conducted on an AMAX server running Ubuntu 16.04. Training each model takes 13 hours; a model snapshot is saved every 4,000 iterations, and the models are tested at 4,000, 12,000, 36,000, and 80,000 iterations. Every network reaches a pixel recognition accuracy above 85% and a Mean IU above 60%. The FCN8s-at-once network achieves the highest Mean IU, 70.4%, but its segmentation speed is only one-fifth that of the FFCN. With little difference in the other indicators, the FFCN reaches an average segmentation speed of 40 fps.

The FFCN can effectively use multi-layer convolution to extract image information while avoiding the influence of low-level cues such as brightness, color, and texture, and its scale-fusion technique avoids the loss of image feature information during convolution and pooling. Compared with the other FCN networks, the FFCN is faster and improves the real-time performance of image preprocessing.
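As a reference for how the two reported figures, pixel recognition accuracy and Mean IU, are conventionally computed, the sketch below follows the standard FCN-style definitions based on a class confusion matrix; the helper names are hypothetical and this is not the thesis's own evaluation code.

# Standard segmentation metrics from a confusion matrix (usual FCN
# definitions); function names are hypothetical, not from the thesis.
import numpy as np

def confusion_matrix(pred, gt, num_classes):
    # Rows index the ground-truth class, columns the predicted class.
    mask = (gt >= 0) & (gt < num_classes)
    return np.bincount(num_classes * gt[mask] + pred[mask],
                       minlength=num_classes ** 2).reshape(num_classes, num_classes)

def pixel_accuracy(cm):
    # Fraction of pixels whose predicted class matches the label.
    return np.diag(cm).sum() / cm.sum()

def mean_iu(cm):
    # Per-class IU = TP / (TP + FP + FN), averaged over all classes.
    tp = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    return (tp / np.maximum(union, 1)).mean()

pred = np.random.randint(0, 8, (256, 256))
gt = np.random.randint(0, 8, (256, 256))
cm = confusion_matrix(pred.ravel(), gt.ravel(), 8)
print(pixel_accuracy(cm), mean_iu(cm))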
Keywords/Search Tags:indoor environment, visual prosthesis, semantic segmentation, convolution neural network, deep learning