Autonomous driving is currently one of the most widely studied research topics. Visual object detection and tracking are essential components of the environmental perception system of autonomous vehicles, and their performance largely determines the stability and safety of the vehicle. Compared with LiDAR-based methods, camera-based methods have lower hardware cost and stronger anti-interference ability. Compared with binocular vision-based methods, monocular vision-based methods further reduce hardware cost and save computational resources. Furthermore, when one of the cameras in a binocular vision system malfunctions, the remaining camera must work on its own to preserve the environmental perception ability of the vehicle. Research on monocular vision-based multi-dimensional object detection and tracking for autonomous driving is therefore of considerable significance and practical value. In this dissertation, the problems of existing methods are analyzed and discussed in depth, and corresponding solutions and novel methods are proposed.

Given a series of input images, object detection aims to obtain the desired geometric properties (category, position, dimensions, orientation, etc.) of certain objects. Compared with 2D object detection, 3D object detection additionally recovers the 3D spatial properties of objects, as autonomous driving requires. This dissertation therefore focuses on 3D object detection. Object tracking aims to predict and associate a single object or multiple objects across consecutive frames. In the visual perception system of an autonomous vehicle, object detection and tracking can be viewed as two successive stages of a single system. Proceeding from single objects to multiple objects and from 2D to 3D, this dissertation progressively studies single object tracking, 2D multi-object tracking, 3D object detection, and
3D multi-object tracking.

Despite the numerous methods proposed by previous researchers, this field is still at an early stage of development, and many problems remain under-explored. Specifically, current generic single object trackers mostly rely solely on convolutional features extracted by neural networks and neglect low-level visual features. As a result, trackers are easily confused by distractors of the same category and by distracting background regions, which leads to drift. Previous works on 2D multi-object tracking model the motion pattern of each tracklet individually without exploring the relationships among different tracklets, so tracklets are often inactivated at improper moments. For monocular 3D object detection, neural networks lack dedicated designs that account for the different types of geometric information to be predicted. In particular, network designs that cope with the ill-posedness of depth prediction in monocular settings have not been sufficiently investigated, which leaves room to improve detection performance. For monocular 3D multi-object tracking, current methods lack data association strategies tailored to the characteristics of monocular vision, as well as effective criteria for evaluating the performance of monocular 3D multi-object tracking methods.

To address the issues analyzed above, the research contents of this dissertation are summarized as follows.

Concerning the limitations of existing single object tracking methods, a dual-channel single object tracking method that integrates color and convolutional features is proposed. On the one hand, a semantic response map is obtained from the convolutional features extracted by a Siamese neural network. On the other hand, the foreground probability is calculated using Bayes' rule and the
color histogram features extracted from both the foreground and search regions. To fuse the information deduced by the two channels, an effective region cropping strategy is proposed. Because drastic illumination changes between frames can render the color tracking channel ineffective, a dual-channel integration strategy with a circuit breaker is proposed. To prevent the tracking results from being disturbed by noise points with high response values in the integrated response map, a post-processing denoising method based on Gaussian kernel smoothing is proposed.

Concerning the limitations of existing 2D multi-object tracking methods, a 2D multi-object tracking method based on a multi-tracklet joint hibernation mechanism is proposed. A fully-connected discrete conditional random field (CRF) models the mutual relationships between different tracklets in order to reasonably decide when a tracklet should be inactivated. Because the number of tracklets varies across frames, confidence score filtering and dummy node complementation strategies are proposed to fix the number of CRF nodes. Dedicated unary and pairwise feature functions are designed around the characteristics of 2D multi-object tracking applications, so that changes in tracking information under different circumstances can be handled effectively.

Concerning the limitations of existing monocular 3D object detection methods, a monocular 3D object detection method based on sequential feature association and depth hints is proposed. On the one hand, the predicted quantities are divided into groups according to their prediction difficulty, and a convolutional Gated Recurrent Unit (ConvGRU) sequentially associates the features of the groups in an easy-to-hard order, so that geometric information that is hard to predict can be hinted by information that is easier to predict. On the other hand, to ease depth prediction in monocular settings, a depth hint module is
designed to learn the depth distribution of object centroids with respect to different 2D heights. The learned depth hint vector is then used to augment the input features of the regression head for depth prediction.

Concerning the limitations of existing monocular 3D multi-object tracking methods, a monocular 3D multi-object tracking method and an evaluation method based on depth-aware association are proposed. To cancel out the ego-motion of the autonomous vehicle, a global coordinate matching strategy is proposed. To ease hyperparameter tuning and enable long-term tracking, a hibernation state is introduced into the tracklet management scheme. To address frequent identity switches of distant objects caused by inaccurate monocular depth estimation, a depth-aware data association strategy is proposed. Furthermore, the deficiencies of current evaluation criteria are analyzed, and a refined criterion is proposed accordingly.

For the above research contents, a series of qualitative and quantitative experiments are conducted to validate the contributions and findings of this dissertation, and the experimental results are discussed and analyzed in depth.
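The color channel of the single object tracker estimates a per-pixel foreground probability from color histograms using Bayes' rule. The following is a minimal NumPy sketch of that idea, not the dissertation's implementation. It assumes joint RGB histograms with 16 bins per channel and a rectangular foreground box lying inside the search region. With histogram counts H, P(fg | b) = P(b | fg) P(fg) / P(b) reduces to H_fg(b) / H_search(b). The function names, the bin count, and the fallback prior are hypothetical.

```python
import numpy as np


def color_bin_indices(image: np.ndarray, bins_per_channel: int = 16) -> np.ndarray:
    """Map each pixel of an HxWx3 uint8 image to a joint RGB histogram bin index."""
    q = (image.astype(np.int64) * bins_per_channel) // 256  # per-channel bin in [0, bins)
    return (q[..., 0] * bins_per_channel + q[..., 1]) * bins_per_channel + q[..., 2]


def foreground_probability(search: np.ndarray, fg_box, bins_per_channel: int = 16,
                           prior: float = 0.5) -> np.ndarray:
    """Per-pixel P(foreground | color bin) over the search region.

    fg_box = (x0, y0, x1, y1) in search-region pixels, end-exclusive (assumed
    to lie inside the search region).
    """
    idx = color_bin_indices(search, bins_per_channel)
    n_bins = bins_per_channel ** 3
    x0, y0, x1, y1 = fg_box
    hist_fg = np.bincount(idx[y0:y1, x0:x1].ravel(), minlength=n_bins).astype(float)
    hist_search = np.bincount(idx.ravel(), minlength=n_bins).astype(float)
    # Bayes' rule with histogram estimates:
    # P(fg | b) = P(b | fg) P(fg) / P(b) ~= H_fg(b) / H_search(b)
    prob_per_bin = np.divide(hist_fg, hist_search,
                             out=np.full(n_bins, prior), where=hist_search > 0)
    return prob_per_bin[idx]  # look up the probability of each pixel's bin
```

Colors that occur only inside the box receive probability 1, colors that occur only in the surroundings receive 0, and shared colors fall in between. This is why the color channel can separate a target from same-category distractors that differ in color.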
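The dual-channel integration with a circuit breaker and the Gaussian-kernel denoising step can be sketched as follows. This illustration rests on several assumptions and is not the dissertation's method. The fixed fusion weight, the mean-brightness criterion that trips the circuit breaker, its threshold, and all function names are hypothetical, and the region cropping strategy is omitted. Smoothing is implemented as a separable convolution in plain NumPy, and the response map is assumed to be larger than the kernel.

```python
import numpy as np


def gaussian_kernel1d(sigma: float) -> np.ndarray:
    """Normalized 1D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(3.0 * sigma + 0.5))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()


def gaussian_smooth(response: np.ndarray, sigma: float = 1.5) -> np.ndarray:
    """Separable Gaussian smoothing with zero padding at the borders."""
    k = gaussian_kernel1d(sigma)
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, response)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, rows)


def fuse_channels(semantic, color_prob, prev_gray, cur_gray,
                  color_weight=0.3, max_brightness_jump=40.0):
    """Weighted fusion; the hypothetical circuit breaker drops the color channel
    when the mean brightness changes sharply between consecutive frames."""
    jump = abs(float(np.mean(cur_gray)) - float(np.mean(prev_gray)))
    w = 0.0 if jump > max_brightness_jump else color_weight
    return (1.0 - w) * semantic + w * color_prob


def locate_target(semantic, color_prob, prev_gray, cur_gray, sigma=1.5):
    """Fuse both channels, denoise by smoothing, and return the (row, col) peak."""
    fused = fuse_channels(semantic, color_prob, prev_gray, cur_gray)
    smoothed = gaussian_smooth(fused, sigma)
    row, col = np.unravel_index(np.argmax(smoothed), smoothed.shape)
    return int(row), int(col)
```

An isolated high-valued pixel is spread over its neighborhood and loses most of its height, while a broad response peak keeps most of its height. As a result, the argmax after smoothing lands on the target rather than on a noise point.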
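The joint hibernation mechanism first fixes the number of CRF nodes and then labels all tracklets jointly. The sketch below shows both steps under stated assumptions: it keeps the most confident tracklets when there are too many, pads with dummy nodes when there are too few, and finds the MAP labeling of a small fully-connected binary CRF by exhaustive enumeration. The dissertation's inference procedure and its unary and pairwise feature functions are not reproduced here. The energies are simply passed in, enumeration is only feasible for a handful of nodes, and all names are hypothetical.

```python
import itertools

import numpy as np


def fix_node_count(confidences, n_nodes):
    """Keep the n_nodes most confident tracklets; mark the remaining slots as dummies."""
    order = np.argsort(-np.asarray(confidences, dtype=float), kind="stable")
    kept = order[:n_nodes]
    dummy = np.zeros(n_nodes, dtype=bool)
    dummy[len(kept):] = True
    return kept, dummy


def crf_map(unary, pairwise, dummy):
    """Exact MAP labeling of a small fully-connected binary CRF by enumeration.

    unary: (N, 2) energies; pairwise: (N, N, 2, 2) energies, only i < j used.
    Label 1 = keep active, 0 = hibernate. Dummy nodes are clamped to 0 and add no energy.
    """
    n = unary.shape[0]
    free = [i for i in range(n) if not dummy[i]]
    best_labels, best_energy = np.zeros(n, dtype=int), np.inf
    for assignment in itertools.product((0, 1), repeat=len(free)):
        labels = np.zeros(n, dtype=int)
        for node, lab in zip(free, assignment):
            labels[node] = lab
        energy = sum(unary[i, labels[i]] for i in free)
        energy += sum(pairwise[i, j, labels[i], labels[j]]
                      for a, i in enumerate(free) for j in free[a + 1:])
        if energy < best_energy:
            best_labels, best_energy = labels, energy
    return best_labels, float(best_energy)
```

In the usage below, tracklet 1 prefers hibernation on its own. However, a pairwise term couples it to the confidently active tracklet 0, so the joint optimum keeps both active. Judging each tracklet in isolation cannot capture this kind of inter-tracklet decision.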
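The easy-to-hard sequential feature association can be illustrated with a ConvGRU cell whose hidden state carries information from easier prediction groups to harder ones. This is a minimal NumPy sketch assuming the standard ConvGRU gating (update gate, reset gate, candidate state), bias-free 3x3 convolutions, and randomly initialized weights. The grouping, channel sizes, and names are hypothetical, and the dissertation's network is not reproduced.

```python
import numpy as np


def conv2d_same(x, w):
    """'Same' 2D cross-correlation. x: (C_in, H, W); w: (C_out, C_in, k, k), k odd."""
    c_out, _, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h, wd = x.shape[1:]
    out = np.zeros((c_out, h, wd))
    for i in range(k):
        for j in range(k):
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return out


def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


class ConvGRUCell:
    """Bias-free ConvGRU cell with random weights (illustration only)."""

    def __init__(self, in_ch, hid_ch, k=3, seed=0):
        rng = np.random.default_rng(seed)
        shape = (hid_ch, in_ch + hid_ch, k, k)
        self.w_z, self.w_r, self.w_n = (0.1 * rng.standard_normal(shape) for _ in range(3))

    def __call__(self, x, h):
        xh = np.concatenate([x, h], axis=0)
        z = sigmoid(conv2d_same(xh, self.w_z))  # update gate
        r = sigmoid(conv2d_same(xh, self.w_r))  # reset gate
        n = np.tanh(conv2d_same(np.concatenate([x, r * h], axis=0), self.w_n))  # candidate
        return (1.0 - z) * h + z * n


def sequential_association(group_features, cell, hid_ch):
    """group_features: list of (C, H, W) maps ordered from easy to hard to predict."""
    h = np.zeros((hid_ch,) + group_features[0].shape[1:])
    outputs = []
    for x in group_features:
        h = cell(x, h)
        outputs.append(h)  # feature passed to this group's regression head
    return outputs
```

The recurrence is causal. The feature for a hard group (for example, depth) depends on all easier groups processed before it, whereas the features of easier groups are unaffected by harder ones.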
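The depth hint module learns how object centroid depth is distributed for each range of 2D heights. As a non-learned stand-in for intuition only, the sketch below builds the same kind of distribution directly from statistics: one normalized depth histogram per 2D-height bin. It returns the row for a query height as a "depth hint vector" and appends it to a feature vector. The bin edges, the synthetic pinhole data (h = f·H/z with an assumed f = 700 px and H = 1.5 m), and all names are hypothetical. In the dissertation, the distribution is learned by the network.

```python
import numpy as np


def build_depth_hint_table(heights_px, depths_m, height_edges, depth_edges):
    """Row r: normalized histogram of centroid depths for objects whose 2D height is in bin r."""
    h_idx = np.digitize(heights_px, height_edges) - 1
    d_idx = np.digitize(depths_m, depth_edges) - 1
    n_h, n_d = len(height_edges) - 1, len(depth_edges) - 1
    valid = (h_idx >= 0) & (h_idx < n_h) & (d_idx >= 0) & (d_idx < n_d)
    table = np.zeros((n_h, n_d))
    np.add.at(table, (h_idx[valid], d_idx[valid]), 1.0)
    totals = table.sum(axis=1, keepdims=True)
    # Height bins without any objects fall back to a uniform depth distribution.
    return np.divide(table, totals, out=np.full_like(table, 1.0 / n_d), where=totals > 0)


def depth_hint(table, height_px, height_edges):
    """Return the depth distribution for the height bin containing height_px."""
    row = np.digitize(height_px, height_edges) - 1
    return table[int(np.clip(row, 0, table.shape[0] - 1))]


def augment_regression_input(features, hint):
    """Append the hint vector to the depth regression head's input feature vector."""
    return np.concatenate([np.asarray(features, dtype=float), hint])
```

Because 2D height is roughly inversely proportional to depth, each height bin concentrates its probability mass on a narrow depth range. This gives the regression head a prior that partly offsets the ill-posedness of monocular depth estimation.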
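Global coordinate matching and depth-aware association can be sketched together. Detections are first moved from the camera frame into a world frame using the ego pose, which is assumed to be available from odometry as a rotation R and translation t. They are then matched to tracklets with a distance whose tolerance along the viewing ray grows with depth, so that large depth errors on distant objects do not cause identity switches. The specific cost form (lateral and longitudinal terms with a depth-proportional longitudinal sigma), its parameters, and the greedy matcher are hypothetical choices made for illustration. A Hungarian solver such as SciPy's `linear_sum_assignment` could replace the greedy step.

```python
import numpy as np


def cam_to_world(points_cam, R, t):
    """points_cam: (N, 3) camera-frame points; R (3, 3), t (3,): camera-to-world pose."""
    return points_cam @ R.T + t


def depth_aware_cost(track_pos, det_pos, cam_center, sigma_lat=0.5, sigma_rel=0.1):
    """Normalized distance; the tolerance along the viewing ray grows with depth."""
    ray = det_pos - cam_center
    depth = max(float(np.linalg.norm(ray)), 1e-6)
    u = ray / depth  # unit viewing direction
    diff = track_pos - det_pos
    d_long = float(diff @ u)  # error component along the ray (depth error)
    d_lat = float(np.linalg.norm(diff - d_long * u))  # error perpendicular to the ray
    sigma_long = sigma_lat + sigma_rel * depth
    return float(np.hypot(d_long / sigma_long, d_lat / sigma_lat))


def greedy_associate(tracks, dets, cam_center, gate=3.0):
    """Match lowest-cost (track, detection) pairs first; reject pairs above the gate."""
    cost = np.full((len(tracks), len(dets)), np.inf)
    for i, trk in enumerate(tracks):
        for j, det in enumerate(dets):
            cost[i, j] = depth_aware_cost(trk, det, cam_center)
    matches, used_t, used_d = [], set(), set()
    for flat in np.argsort(cost, axis=None):
        i, j = map(int, np.unravel_index(flat, cost.shape))
        if cost[i, j] > gate:
            break
        if i in used_t or j in used_d:
            continue
        matches.append((i, j))
        used_t.add(i)
        used_d.add(j)
    return matches
```

In the usage below, the ego vehicle moves 2 m forward between frames. After the world-frame transform, static objects keep their coordinates. A 2.5 m depth error on an object about 30 m away still yields a normalized cost of about 0.7, so that object keeps its identity.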