| Remote driving uses a broadband wireless network to separate the driver from the vehicle in time and space, and it has broad application potential in scenarios that require human-vehicle separation, such as automated driving assistance, emergency rescue, transportation of dangerous goods, and operation in harsh environments. However, remote driving depends on video signals transmitted over a communication network, which raises two problems. On the one hand, network video is limited by the parameters of the capture device, such as field of view, angle, and depth of field. On the other hand, the images and video presented to the remote driver provide different visual stimuli. Both reduce the driver's ability to identify safety risks and pose challenges to safe remote driving. To address these problems, this work carries out targeted research on generic technologies from two aspects, assisted detection of dangerous scenes and driver attention detection, which effectively improves the safety of remote driving. For the assisted detection of dangerous scenes, this paper focuses on visual saliency detection in mobile environments and multi-scale image saliency detection in computer vision, and proposes the following innovations: (1) To model the influence of viewing factors on visual attention in mobile environments, cell phone sensor readings of acceleration, viewing ratio, and external illumination were collected in four different environments, together with the distribution of eye gaze points on the cell phone screen under these four conditions. These data are then used to build and train a multimodal deep neural network, yielding a visual saliency detection method that reflects the influence of environmental factors on visual attention through the learning and interaction of environmental-factor features and visual-stimulus features. (2) To solve the generalization problem of feature learning caused by inconsistent probabilities of occurrence of regions of interest of
different sizes, a cross-scale inference method is proposed. Separate saliency models are trained on image samples of various scales, and each model is applied for inference on images of the same scale as its training samples. In addition, models trained on large-scale samples perform inference on small-scale images, and models trained on small-scale samples perform inference on large-scale images. The results of these inferences are then fused to detect regions of interest with different probabilities of occurrence. For driver attention detection, this paper focuses on attention-based line-of-sight detection and head pose detection, and feeds the head pose detection results into line-of-sight detection to improve its robustness. The innovations are as follows: (1) To address the insufficient ability of deep learning models for gaze detection to learn diverse features, a method for computing second-order autocorrelation information in deep learning features is proposed, and a new bilinear pooling attention mechanism is built on it. The visualized feature maps show that the bilinear pooling-based attention mechanism helps to extract attentional and diverse features related to gaze detection in face images. (2) To improve the ability of the head pose detection model to learn pose features, the bilinear pooling-based attention mechanism is used for the modal interaction between the RGB image and the depth image. This is based on the observation that head pose is affected by the asymmetry of head appearance. The features in the "cross" neighborhood of each point are used as the deep learning features to perceive the asymmetry of head appearance. The visualized feature maps show that the asymmetry-aware attention mechanism based on bilinear pooling helps to extract attentional and diverse features related to head pose. |
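
The abstract does not give the exact formulation of the second-order autocorrelation or of the attention built on it. As a minimal sketch only, the code below assumes that the autocorrelation is the channel-by-channel product of a feature map with itself, averaged over spatial positions (the usual form of bilinear pooling), and that attention weights are obtained by a softmax over the row sums of that matrix. The function names and the softmax step are illustrative assumptions, not the method described above.

```python
import math


def second_order_autocorrelation(features):
    """Return the C x C matrix G with G[i][j] = mean over positions of f_i * f_j.

    `features` is a list of C channels, each a flat list of N spatial responses.
    Assumption: this averaged self-product stands in for the "second-order
    autocorrelation information" mentioned in the abstract.
    """
    n = len(features[0])
    return [
        [sum(a * b for a, b in zip(fi, fj)) / n for fj in features]
        for fi in features
    ]


def channel_attention(features):
    """Hypothetical attention: softmax over the row sums of G.

    A channel that correlates strongly with the others (and with itself)
    receives a larger weight. This scoring rule is an assumption.
    """
    g = second_order_autocorrelation(features)
    scores = [sum(row) for row in g]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def apply_attention(features, weights):
    """Scale every channel by its attention weight."""
    return [[w * v for v in channel] for channel, w in zip(features, weights)]


# Toy feature map: 3 channels, 4 spatial positions (values are made up).
features = [
    [1.0, 0.0, 1.0, 0.0],
    [0.5, 0.5, 0.5, 0.5],
    [0.0, 0.0, 0.0, 1.0],
]
gram = second_order_autocorrelation(features)
weights = channel_attention(features)
attended = apply_attention(features, weights)
```

In a real network the same computation would run on learned convolutional features, and for the head pose case the two inputs to the product could come from the RGB and depth branches rather than from a single feature map; that pairing is likewise an assumption here.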