Font Size: a A A

Scene Content Parsing For Visual Image Data

Posted on:2019-11-07Degree:DoctorType:Dissertation
Country:ChinaCandidate:Z Y JiangFull Text:PDF
GTID:1368330596456536Subject:Signal and Information Processing
Abstract/Summary:PDF Full Text Request
With the development of sensing technology and the needs of human social life,visual sensors have been widely utilized in many fields,such as automatic factory,traffic surveillance,security and so on.However,due to the limitations of human resources,a large number of image data produced by visual sensing devices cannot be timely and accurately processed,which seriously restricts its applications in the real scene.Therefore,how to effectively analyze scene contents from the visual data obtained by visual sensors is a hot topic in the field of computer vision.Based on machine learning and pattern recognition theories,the task of scene content parsing for visual image data is inferring the location of semantic targets,such as road,pedestrians and vehicles,in images or video sequences obtained from the visual sensing device.The related theories and methods of this area are of great value in academic research and industrial applications.In the academic field,scene analysis can provide task oriented guidance for the low-level perception theory,and also provide semantic parsing results for high-level tasks,such as behavior analysis and event detection,and the problem of semantic gap can also be indirectly avoided.In the industrial field,scene content analysis can provide algorithm support for car assisted driving,traffic surveillance.In recent years,with the development of computer vision and machine learning,scene content analysis has achieved great progress.However,some problems still exist: 1)the problem of efficient representation of visual image data;2)accurate modeling of semantic object relations;3)the problem of robust decision making for scene content analysis.In view of the above problems,this dissertation studies the theory and method of scene content analysis under visual image data from four aspects.The main research contents and contributions are listed as follows:1)Local scene content parsing based on online edge structure learning.For the fact that the appearance of semantic object can change a lot as the camera moving and previous works do not focus on updating the model.This dissertation uses the online model learning framework to make the model adapt to the change of the scene.For the traditional classification task,the structure information is less used to model the sample relationship,and the structure classification model is used in this dissertation.The training samples can play important role for the performance of classification model.In this dissertation,we choose difficult samples to update discriminant model,which effectively improves the discriminant performance of the model.2)Local scene content parsing based on regional multilevel probabilistic analysis.In view of the noise influence of the original data and the manifold structure distribution of the data,the Laplacian sparse subspace method is introduced to compute the mid-level feature.And the multi-scale processing strategy is introduced to optimize the semantic object region with the feature of the scale information.On this basis,the multi-layer features,including the low-level feature and the mid-level feature,are modeled in the Bayesian framework in this dissertation,which can effectively improve the analytic ability of the model for complex scenes.3)Global scene content parsing based on context analysis and hard-sample enhancements.Aiming at the problem of modeling context relations between semantic objects,this dissertation builds a pyramid multi-level context model based on the conditional random fields for its ability of modelling the local contexts.At the same time,structural analysis methods are used to fuse the analytic results of different scales.Different sizes of semantic objects lead to hard sample learning problems caused by unbalanced training samples.This dissertation weights the loss function in the model training process according to the sample characteristics,which effectively alleviates the problem of low accuracy of hard samples detection.4)Global scene content parsing based on contour and adaptive network structure.There are great differences in the attributes of semantic objects.Performing the same inferring procedures on different semantic objects will lead to the problem of model degradation.A convolutional neural network model based on adaptive depth structure is proposed to ease the problem of large precision differences between different semantic objects.And it has effectively improved the accuracy of scene content parsing.For the ambiguity in the edge region of semantic objects in scene parsing tasks,a convolutional neural network based on contour perception has been proposed,which has effectively improved the parsing accuracy of edge regions.
Keywords/Search Tags:Visual Information, Scene Parsing, Road Detection, Semantic Segmentation, Deep Learning
PDF Full Text Request
Related items