
Semantic-based Scene Understanding

Posted on: 2020-01-11
Degree: Master
Type: Thesis
Country: China
Candidate: J Q Zhang
Full Text: PDF
GTID: 2518306215954669
Subject: Mechanical and electrical engineering
Abstract/Summary:
With the spread of low-cost, compact vision sensors, there is growing demand for computers that can understand their surroundings. Scene understanding has significant research value in many everyday application areas, such as robot navigation, autonomous driving, security, and medical care. One of its most important research directions is image semantic segmentation, which not only combines image recognition and image detection information but also draws on other high-level semantic scene information. At present, deep-learning-based semantic segmentation can still produce inaccurate or discontinuous results. To address this problem, this paper studies a semantic segmentation model based on generative adversarial networks (GANs), an important network architecture proposed in recent years. A GAN consists of a generator network and a discriminator network that are trained simultaneously in an adversarial game, so that the generator produces progressively better results.

Based on this, this paper proposes a generative adversarial network for semantic segmentation. The generator is a deep convolutional semantic segmentation network made up of five modules, each consisting of a 3x3 convolutional layer, a pooling layer, and an activation function. It takes an RGB image as input and outputs a semantic segmentation probability map. The label map and the generated probability map are randomly fed to the discriminator network. The discriminator, a deep convolutional neural network, learns the feature differences between the generated probability map and the label map and guides the optimization of the generator toward probability maps that are closer to the label map. By adjusting settings such as the loss function and the learning rate, the network is brought to its optimal solution. The proposed network retains the end-to-end training of traditional neural networks and reduces dependence on hand-designed loss functions.

Scene understanding requires not only analyzing single images but also understanding and recognizing video content. The human body is the main object of study in video. Current human action recognition methods rely on the spatio-temporal information of the video without considering the scene in which the action takes place, so they may produce action predictions that do not fit the scene. This paper therefore proposes a dual-stream recognition network structure based on scene understanding. Common actions are grouped into 11 scene categories, such as soccer field, basketball court, indoor, gym, lake, and outdoor. A convolutional neural network extracts the scene information, which is added as auxiliary information to the human action recognition network to improve its accuracy. The scene recognition network is trained first, and its trained parameters are then used as initial parameters for joint training with the human action recognition network. The optimal recognition structure is determined by analyzing different proportions of the scene recognition network and the human action recognition network.

The semantic segmentation GAN is trained and tested on large datasets including Pascal VOC 2012 and Cityscapes, with data augmentation such as random scaling and cropping. The results show that the proposed method effectively reduces inaccurate segmentation and related problems: compared with the typical semantic segmentation models FCN and DeepLab, mean IoU increases by 6.7% and 3.6%, respectively. The human action recognition model is trained on individual video frames from public action datasets including UCF50 and UCF101 and compared with action recognition methods such as C3D and Two-Stream, over which accuracy improves by 5% and 3%, respectively. This paper also studies adding the same scene information to several typical recognition network structures. The results are better than those of the original networks, which shows that the proposed method effectively improves recognition accuracy.
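The segmentation results above are reported as mean IoU (mean intersection over union). The abstract does not say how the thesis computed it, so the following is only a minimal sketch of the standard definition: per-class IoU = TP / (TP + FP + FN), averaged over the classes that appear in either map. The function name `mean_iou` and the toy label maps are illustrative assumptions. Dataset-specific details, such as ignoring the void label in Pascal VOC, are not handled.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes present in the prediction or the ground truth."""
    # confusion[t][p] counts pixels whose true class is t and predicted class is p
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            confusion[t][p] += 1

    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fp = sum(confusion[t][c] for t in range(num_classes)) - tp
        fn = sum(confusion[c]) - tp
        union = tp + fp + fn
        if union > 0:  # skip classes absent from both maps
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy 2x3 label maps with classes 0 and 1 (illustrative data, not from the thesis)
target = [[0, 0, 1],
          [0, 1, 1]]
pred = [[0, 1, 1],
        [0, 1, 1]]
score = mean_iou(pred, target, num_classes=2)
```

In this toy case class 0 has IoU 2/3 and class 1 has IoU 3/4, so `score` is their average, 17/24.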
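The abstract says the recognition structure is chosen by analyzing different proportions of the scene and action networks, but it does not give the fusion rule. One common way to combine two streams is a weighted average of their class probabilities, sketched below as an assumption and not as the thesis's method. The sketch further assumes that both streams already output scores over the same action classes. The names `fuse_streams` and `scene_weight` and the logits are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_streams(action_logits, scene_logits, scene_weight):
    """Weighted average of per-class probabilities from the two streams."""
    if not 0.0 <= scene_weight <= 1.0:
        raise ValueError("scene_weight must lie in [0, 1]")
    action_probs = softmax(action_logits)
    scene_probs = softmax(scene_logits)
    return [(1.0 - scene_weight) * a + scene_weight * s
            for a, s in zip(action_probs, scene_probs)]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


# Hypothetical logits over three action classes (not data from the thesis).
# The action stream narrowly prefers class 0; the scene stream favors class 1.
action_logits = [2.0, 1.9, 0.1]
scene_logits = [0.2, 2.5, 0.1]

fused = fuse_streams(action_logits, scene_logits, scene_weight=0.3)
predicted = argmax(fused)
action_only = argmax(fuse_streams(action_logits, scene_logits, scene_weight=0.0))
```

With these made-up numbers the action stream alone picks class 0, while a scene weight of 0.3 moves the fused prediction to class 1. This illustrates how scene context can overturn an action prediction that does not fit the scene. Sweeping `scene_weight` on a validation set would be one way to compare proportions.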
Keywords/Search Tags: generative adversarial semantic segmentation model, scene understanding, dual-stream network structure, video analysis