Visual Semantic Representation For Visual Segmentation

Posted on: 2020-01-13
Degree: Master
Type: Thesis
Country: China
Candidate: X Chen
GTID: 2518306518463124
Subject: Computer technology
Abstract/Summary:
Semantic segmentation is a fundamental task in computer vision. As computer hardware has improved, especially GPUs for numerical computing, deep learning has continued to evolve. The emergence of fully convolutional networks has enabled rapid progress in deep-learning-based semantic segmentation, while segmentation of video data has received less attention. Compared with image data, video data carries an additional temporal dimension, and how to use this information effectively remains a challenge for researchers. Furthermore, the ability to predict the future is highly useful for intelligent decision systems. For example, intelligent driving systems and robots need to perceive their surroundings and make decisions according to the current situation, and semantic segmentation plays an important role in this. Research on semantic segmentation prediction is still in its infancy, and how to better model spatiotemporal relationships requires considerable further study.

For the semantic segmentation problem, this thesis aims to improve segmentation performance on the current frame by modeling the temporal and spatial features between adjacent frames to obtain better segmentation features. First, it proposes a temporal modeling module in which temporal and spatial characteristics are modeled through designed masks and gated activation operations. Second, during decoding and generation of the segmentation map, historical information is fused through a feature fusion scheme that decays over time, yielding a better feature representation for segmentation.

For the semantic segmentation prediction task, two problems are addressed with separate solutions: missing small objects and the prediction offset of moving objects. First, for small objects that go missing in prediction, this thesis proposes an inter-frame attention mechanism that uses the context information of the previous frame to supplement the semantic details of the next frame, improving the semantic expression of non-salient objects. Second, for the offset of moving objects during prediction, this thesis proposes incorporating deformable convolutions into convolutional long short-term memory (ConvLSTM) networks to strengthen the modeling of position changes, which standard convolutions lack.
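
The time-decayed fusion of historical features mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual formulation, so everything here is an assumption made for illustration: the function name `decayed_fusion`, the exponential form of the decay, the `decay` parameter, and the normalisation of the weights. It is not the thesis's implementation.

```python
import numpy as np


def decayed_fusion(features, decay=0.5):
    """Fuse per-frame feature maps with weights that shrink with frame age.

    features: list of arrays of identical shape (C, H, W), oldest first;
              features[-1] belongs to the current frame.
    decay:    hypothetical factor in (0, 1]; a frame k steps in the past
              receives an unnormalised weight of decay**k.
    """
    if not features:
        raise ValueError("need at least one feature map")
    # Age 0 is the current frame, age T-1 is the oldest frame.
    ages = np.arange(len(features) - 1, -1, -1)
    weights = decay ** ages.astype(np.float64)
    weights /= weights.sum()  # normalise so the weights sum to 1
    stacked = np.stack(features, axis=0)  # shape (T, C, H, W)
    # Weighted sum over the time axis gives a fused (C, H, W) map.
    return np.tensordot(weights, stacked, axes=(0, 0))
```

With `decay=1.0` this reduces to a plain average over the frames, and smaller values of `decay` make the fused map depend more on the current frame.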
Keywords/Search Tags:Deep Learning, Computer Vision, Video Semantic Segmentation, Feature Representation