Font Size: a A A

Research On Utilization Of Multi-level Semantic Features In Mono-SLAM

Posted on:2023-03-30Degree:DoctorType:Dissertation
Country:ChinaCandidate:Y Q BaoFull Text:PDF
GTID:1528306809496224Subject:Electronic Science and Technology
Abstract/Summary:PDF Full Text Request
Features of different depths in the CNN have different levels of semantic information and other characteristics.The first layers mainly focus on low-level features,such as angle,color,maxima,and edge,while the final layers mainly focus on high-level semantic information,such as target regions,contours,and categories.This thesis focuses on the monocular SLAM system based on multi-level semantic features,which integrate multi-level semantic features into submodules of SLAM,aiming to fully leverage the characteristics of multi-level semantic features to improve the localization accuracy,robustness,and ability of scene understanding for the monocular SLAM system.The research contents of this thesis are summarized as follows:(1)Monocular localization and dense semantic map construction method based on seman-tic planes.The road scenario in urban environments has strong structural characteristics,for example,buildings are distributed along both sides of the road,the flat road and the surface of most buildings can be approximated as planes.Normally,the road plane and the building plane are perpendicular to each other and they generally form the visible boundary of the cam-era.Furthermore,like other artificial environments,the surface of urban environments can be approximated as numerous small planes.With the help of semantic labels,this thesis integrates the above-mentioned structural regularities into LDSO and proposes a monocular localization and dense semantic map construction method based on semantic planes.The experimental re-sults on the KITTI odometry dataset demonstrate that the proposed method can improve the localization accuracy and the ability of scene understanding while maintaining the robustness of the sparse direct method compared to the feature method.(2)A visual odometry based on the direct alignment of semantic probabilities.There is an inherent problem with direct methods—the non-convexity of the grayscale image.Direct methods rely on gradient search to reduce the photometric error in the optimization to esti-mate camera poses and the depth information of point cloud in the map.In gradient search,the grayscale value of pixels needs to be used.However,the grayscale image is a strong non-convex function in most cases and the convexity of the grayscale image can only hold in a small region.The direct method works stably only when the relative motion of successive frames is small,as the non-convexity of the grayscale image makes the photometric error may be stuck at the sub-optimal local minima in the optimization.In contrast,semantic probabilities ignore the details in the image,mainly reforming on the boundary between objects of different semantic categories.Thus,semantic probabilities can maintain the convexity in a larger region than the image grayscale.On account of the better convexity of semantic probabilities,this thesis inte-grates the direct alignment method of semantic probabilities into LDSO and proposes a visual odometry method based on the direct alignment of semantic probabilities.The experimental re-sults on the KITTI odometry dataset illustrate that the proposed method significantly improves the localization accuracy compared to LDSO,and the proposed method can achieve higher or similar localization accuracy in all sequences compared to ORB-SLAM2.(3)A loop detection method based on concatenated features of the semantic gradients.Traditional loop detection methods struggle in the adaption of the severe environment varia-tions and viewpoint variations for long-term tasks,while the loop detection method based on deep features can overcome severe environment variations.However,most deep feature based methods take the whole layer of the neural network as the image descriptor,which loses the ro-bustness to viewpoint variations.In addition,most of these works are based on features of final layers in the CNN,which have lower feature dimensions in the image height and width direc-tions.It can not provide enough resources for local feature extraction and spatial verification,which can enhance the robustness to viewpoint variations.The loop detection problem under both severe environment variations and severe viewpoint variations remains unsolved.On the strength of the repeatability of semantic gradients,the robustness of local features to viewpoint variations,the characteristics of features in different depths of CNN,and the structural regulari-ties of urban environments,this thesis proposes a loop detection method based on concatenated features of the semantic gradients.Experimental results on the Oxford Robotcar dataset and the Synthia dataset show that the method achieves the overall best closed-loop detection accuracy compared to current state-of-the-art methods(LoST-X,Net VLAD,Patch Net VLAD),especially in the combination of strong environmental variations and 180-degree viewpoint variations on the Oxford Robotcar dataset.Moreover,the proposed method also achieves better storage effi-ciency and computational efficiency.
Keywords/Search Tags:SLAM, Semantic Segmentation, Loop Detection, Visual Place Recognition, Scene Understanding
PDF Full Text Request
Related items