Research On Scene Semantic Understanding And Its Key Technologies In Complex Environment

Posted on: 2021-03-03    Degree: Doctor    Type: Dissertation
Country: China    Candidate: D Wang
GTID: 1488306464981949    Subject: Mechanical engineering
Abstract/Summary:
Scene semantic understanding is an important means by which computers perceive the real world, and it is the basis of, and the key to, high-level vision tasks. It is therefore of great importance in fields such as autonomous driving, intelligent robotics, augmented reality, intelligent transportation, and remote sensing and mapping. However, real scenes contain many complex objects and are affected by factors such as occlusion and illumination. These factors degrade semantic understanding methods based on 2D images and limit their accuracy. With the development of sensor technology, lidar has become an effective and widely used tool for acquiring 3D data. Analysis methods based on 3D data have therefore become key to understanding real scenes and are of significant research value.

This dissertation studies key technologies for the semantic understanding of complex scenes, including object recognition and semantic segmentation. It combines the color and texture information of 2D images with 3D geometric information. On that basis, it investigates 2D image semantic segmentation, feature extraction from point cloud data, and multimodal data fusion, and it applies the resulting semantic information to object pose estimation. The main contributions are as follows:

(1) Existing interactive segmentation algorithms are sensitive to the number and positions of the initial seed points. To address this, an interactive segmentation algorithm based on a multi-layer non-parametric model is proposed, designed to make efficient use of image context information. First, a multi-layer non-parametric model is built to compute the data term of the energy function. Second, a label consistency constraint between each pixel and its corresponding region is added to the smoothness term. This constraint can be regarded as a higher-order potential of the pixel. To balance performance and efficiency, the interaction between layers is reduced and each layer is computed independently. Experiments verify the advantages of the proposed method in accuracy and efficiency.

(2) To improve the accuracy and robustness of image semantic understanding, and to address weak semantics and large variation in target scale, an image segmentation network based on multi-path connections is proposed. The encoder adopts a pyramid structure in its initial stage so that the model preserves spatial information well. Each path in the encoder carries features at a different level, so low-level features rich in spatial information can guide the refinement of high-level features. A multi-scale feature extraction module is also proposed to handle scale variation. Experimental results show that the method has strong feature learning ability and captures target objects at different scales, giving satisfactory results without post-processing.

(3) To handle the disorder and irregularity of 3D point clouds, a point cloud recognition and segmentation method based on a spatial point correlation network is proposed. A new spatial correlation path is designed that considers both spatial information and point correlations. It preserves high-dimensional features and thereby captures fine-detail information of the point cloud. The method does not need to search for center points and their neighborhoods, which reduces algorithmic complexity. A simple and efficient network built on this path then combines point features, fine-detail features and global features to describe features at different levels. Experimental results show that the method is effective and offers better feature representation ability.

(4) To further improve the accuracy of 3D scene understanding and to reduce the influence of object occlusion, truncation and other factors, a semantic understanding network that fuses RGB and point cloud information is proposed. First, a novel lightweight feature refinement network is proposed for extracting features from 2D images. Second, a 3D scene semantic segmentation framework based on 2D image features, geometric structure and global context information is proposed. It uses a heterogeneous network to combine image and point cloud information effectively. This addresses the problem that results obtained from point cloud data alone are not fine enough. Experimental results show that the method is effective for understanding complex scenes.

(5) An efficient 6-DoF pose estimation method based on multimodal feature fusion is proposed, which applies semantic understanding information to the 6-DoF pose estimation of target objects. Semantic information is incorporated into the pose estimation network, and the object's 6-DoF pose is estimated from the whole to the part. To take full advantage of both depth and RGB images, an attention-based method for fusing appearance features and geometric features is proposed. It performs adaptive fusion based on a spatial attention mechanism to improve the expressive power of the features. Finally, experiments verify the effectiveness and robustness of the method.
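The abstract describes the interactive segmentation energy only in outline: a data term, a smoothness term, and a pixel-to-region label consistency constraint that acts as a higher-order potential. The sketch below shows how such an energy can be evaluated for one labelling. It makes several assumptions that the abstract does not specify:

- The per-pixel costs (`unary`) are taken as given inputs. The dissertation's multi-layer non-parametric model is not reproduced.
- The smoothness term is a plain Potts penalty.
- The region term is a truncated-linear penalty in the spirit of the robust P^n Potts model.
- The function name and the parameters `lam`, `mu`, `gamma_max` and `q` are hypothetical.

```python
from collections import Counter


def segmentation_energy(labels, unary, edges, regions,
                        lam=1.0, mu=1.0, gamma_max=2.0, q=2.0):
    """Energy of a labelling: data term + pairwise smoothness + region consistency.

    labels:  labels[p] is the label assigned to pixel p
    unary:   unary[p][l] is the cost of giving pixel p label l (assumed input;
             stands in for the dissertation's non-parametric data term)
    edges:   (a, b) index pairs of neighbouring pixels
    regions: lists of pixel indices, e.g. superpixels (hypothetical input)
    """
    # Data term: sum of per-pixel label costs.
    data = sum(unary[p][labels[p]] for p in range(len(labels)))
    # Pairwise Potts term: unit cost for each neighbouring pair with different labels.
    smooth = sum(1.0 for a, b in edges if labels[a] != labels[b])
    # Higher-order label-consistency term (robust P^n-style, an assumption):
    # count pixels that disagree with the region's majority label, scale the
    # count linearly, and truncate the penalty at gamma_max.
    consistency = 0.0
    for region in regions:
        counts = Counter(labels[p] for p in region)
        disagree = len(region) - max(counts.values())
        consistency += min(disagree * gamma_max / q, gamma_max)
    return data + lam * smooth + mu * consistency


# Usage: four pixels in a row, two labels, two two-pixel regions.
unary = [[0.1, 0.9], [0.2, 0.8], [0.7, 0.3], [0.6, 0.4]]
edges = [(0, 1), (1, 2), (2, 3)]
print(segmentation_energy([0, 0, 1, 1], unary, edges, [[0, 1], [2, 3]]))
```

The truncation caps the cost of a region that is inconsistent with its pixels, so a wrongly drawn region cannot dominate the energy. This is the usual reason for choosing a robust rather than a strict higher-order term.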
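The abstract names the disorder of point clouds as a core difficulty and mentions combining per-point features with global features. It does not give the design of the spatial point correlation path. The sketch below therefore substitutes a standard, simpler technique in the style of PointNet, which the abstract does not claim to use:

- A shared per-point layer is applied to every point independently.
- A channel-wise max pooling step produces the global feature. Max pooling is a symmetric function, so the global feature does not depend on point order.
- The global feature is concatenated onto each point's own feature.

The function name and the toy `weights` matrix are hypothetical.

```python
def point_features_with_global_context(points, weights):
    """Per-point features concatenated with an order-invariant global feature.

    points:  list of (x, y, z) coordinates
    weights: one 3-vector per output channel (a hypothetical shared linear layer)
    """
    # Shared per-point layer: the same linear map + ReLU is applied to every point.
    per_point = [[max(0.0, sum(c * w for c, w in zip(p, row))) for row in weights]
                 for p in points]
    # Symmetric aggregation: a channel-wise max over points ignores point order.
    global_feat = [max(f[k] for f in per_point) for k in range(len(weights))]
    # Concatenate the global feature onto each point's local feature.
    return [f + global_feat for f in per_point], global_feat


# Usage: shuffling the input points leaves the global feature unchanged.
pts = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 1.0]]
feats, g = point_features_with_global_context(pts, w)
print(g)
```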
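For the pose estimation method, the abstract says that appearance (RGB) and geometric (depth) features are fused adaptively with a spatial attention mechanism, but it gives no details. The sketch below is one minimal form of such fusion. At each location, each modality gets a score, a softmax over the two scores gives a per-location weight, and the fused feature is the weighted sum. The following are assumptions, not the dissertation's network:

- Both modalities share the same feature length.
- The scoring vectors `w_app` and `w_geo` stand in for learned parameters.
- The name `attention_fuse` is hypothetical.

```python
import math


def attention_fuse(appearance, geometry, w_app, w_geo):
    """Per-location attention fusion of two equally sized feature lists.

    appearance, geometry: lists of feature vectors, one per spatial location
    w_app, w_geo:         hypothetical scoring vectors (stand-ins for learned weights)
    Returns the fused features and the appearance weight at each location.
    """
    fused, weights = [], []
    for a, g in zip(appearance, geometry):
        # Score each modality at this location.
        s_a = sum(x * w for x, w in zip(a, w_app))
        s_g = sum(x * w for x, w in zip(g, w_geo))
        # Two-way softmax (subtracting the max keeps exp() numerically stable).
        m = max(s_a, s_g)
        e_a, e_g = math.exp(s_a - m), math.exp(s_g - m)
        alpha = e_a / (e_a + e_g)
        weights.append(alpha)
        # Convex combination: alpha for appearance, (1 - alpha) for geometry.
        fused.append([alpha * x + (1.0 - alpha) * y for x, y in zip(a, g)])
    return fused, weights


# Usage: with zero scoring vectors both modalities get equal weight.
fused, alphas = attention_fuse([[1.0, 0.0]], [[0.0, 1.0]], [0.0, 0.0], [0.0, 0.0])
print(fused, alphas)
```

Because the weight is computed separately at every location, the fusion can lean on geometry where the RGB appearance is unreliable and on appearance elsewhere. This is the motivation the abstract gives for adaptive fusion.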
Keywords/Search Tags: Scene understanding, image segmentation, object recognition, deep learning, multimodal feature fusion