With the development of virtual reality (VR) and metaverse technology, research on panoramic vision has become a hot topic in both academia and industry in recent years. Detecting the salient information in panoramic vision is a basic task underlying the use of panoramic images and panoramic videos in many fields, and its performance directly affects subsequent panoramic vision applications. This thesis divides panoramic visual salient information detection into three sub-tasks: panoramic image saliency detection, panoramic image salient object detection, and panoramic video salient object detection. These three sub-tasks are interrelated and increase step by step in research depth and difficulty. Around them, this thesis designs several new deep-learning-based modules for detecting salient information in panoramic vision and, on this basis, proposes three new methods. Extensive experiments and in-depth analysis demonstrate the effectiveness and superiority of the proposed methods. The specific research content of this thesis is as follows.

(1) To address the low detection accuracy, slow model convergence, and heavy computation of current panoramic image saliency detection methods, this thesis proposes a panoramic image saliency detection model based on a robust vision transformer and multi-attention. The model uses spherical convolution to extract multi-scale features of panoramic images, covering both shallow detail information and deep semantic information, while reducing the distortion introduced by equirectangular projection. The robust vision transformer module extracts the salient information contained in the feature maps at four scales, and convolutional embedding reduces the resolution of the feature maps and enhances the robustness of the model. The multi-attention module selectively fuses multi-dimensional attention according to the relationship between spatial attention and channel attention, improving the feature extraction ability of the intermediate layers. Finally, the multi-level features are fused step by step to produce the panoramic image saliency map, and a latitude-weighted loss function makes the model converge faster. Experiments on two public datasets show that, owing to the robust vision transformer module and the multi-attention module, the proposed model outperforms six other state-of-the-art methods and further improves the accuracy of panoramic image saliency detection.

(2) To address the high complexity, poor generalization, low detection accuracy, and loss of detail information of current salient object detection methods for panoramic images, this thesis proposes an attention-based multi-branch, multi-level network. The network uses a multi-branch structure so that information at different scales is complementary and lost detail is compensated. The backbone network extracts five levels of features; attention is computed on the deep features from both the spatial and the channel perspective to process semantic information. The shallow features undergo multi-scale feature extraction, after which feature mining is performed on high-confidence and low-confidence regions to further refine the salient features and process detail information. A detail-map-supervised loss function supervises the details of salient objects in panoramic images over a wider detail region to improve detection. In addition, this thesis merges existing datasets and adds noise to them to make them more challenging. Extensive experiments on two conventional datasets and two noisy datasets show, both subjectively and objectively, that the proposed method outperforms 14 state-of-the-art methods and further improves the accuracy of salient object detection in panoramic images.

(3) Salient object detection in panoramic video still faces many challenges. Focusing on three important factors, namely detection accuracy, model complexity, and generalization performance, this thesis proposes a dual-modal hybrid-stream network for salient object detection in panoramic video. A spatial stream and an optical-flow stream extract salient object features simultaneously, and the two streams are fused to detect salient objects in panoramic video. Inter-layer attention computes the attention relationships between features at different levels to improve the accuracy of the salient object features in the spatial stream. The spatial-stream and temporal-stream features at each level are fused, and inter-layer weights of the hybrid stream are computed to make the fusion of salient object features across its levels more efficient. Under these inter-layer weights, dual-modal attention is computed to improve the detection accuracy of the model. Extensive experiments on two public datasets show, both subjectively and objectively, that the overall performance of the proposed method exceeds that of seven state-of-the-art methods and further improves the performance of salient object detection in panoramic video.
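The spherical convolution used in method (1) is only named in the abstract. One common way to make a convolution distortion-aware on an equirectangular image is to lay the kernel out on the plane tangent to the sphere and project it back with the inverse gnomonic projection, as in SphereNet-style spherical convolution. The minimal sketch below assumes that approach; the function name `spherical_kernel_locations` and the one-pixel grid spacing are illustrative assumptions, not details taken from the thesis.

```python
import math


def spherical_kernel_locations(lat, lon, height, width, kernel_size=3):
    """Project a kernel_size x kernel_size tangent-plane grid onto the sphere.

    Hypothetical helper (assumption: SphereNet-style gnomonic sampling).
    lat, lon: kernel centre in radians (lat in [-pi/2, pi/2], lon in [-pi, pi)).
    height, width: size of the equirectangular image; one grid step equals one
    pixel's angular size (pi / height vertically, 2 * pi / width horizontally).
    Returns rows ordered north to south, each a list of (lat, lon) tuples.
    """
    step_lat = math.pi / height
    step_lon = 2.0 * math.pi / width
    half = kernel_size // 2
    grid = []
    for j in range(-half, half + 1):          # j < 0 is the row north of the centre
        row = []
        for i in range(-half, half + 1):      # i > 0 is the column east of the centre
            x = math.tan(i * step_lon)        # tangent-plane coordinates
            y = math.tan(-j * step_lat)       # positive y points north
            rho = math.hypot(x, y)
            if rho == 0.0:                    # the kernel centre maps to itself
                row.append((lat, lon))
                continue
            c = math.atan(rho)                # angular distance from the centre
            s = math.cos(c) * math.sin(lat) + y * math.sin(c) * math.cos(lat) / rho
            new_lat = math.asin(max(-1.0, min(1.0, s)))  # clamp rounding error
            new_lon = lon + math.atan2(
                x * math.sin(c),
                rho * math.cos(lat) * math.cos(c) - y * math.sin(lat) * math.sin(c))
            new_lon = (new_lon + math.pi) % (2.0 * math.pi) - math.pi  # wrap to [-pi, pi)
            row.append((new_lat, new_lon))
        grid.append(row)
    return grid
```

At the equator the kernel is a regular one-pixel grid, while near the poles its horizontal taps spread over a much wider range of longitudes (roughly 45 degrees at 89 degrees latitude on a 360 x 180 grid), which is how such a kernel counteracts polar stretching. A layer would then bilinearly sample the feature map at these locations.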
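The abstract of method (1) does not give the form of the latitude-weighted loss. As an assumption for illustration, the sketch below weights each row of an equirectangular saliency map by the cosine of its latitude (the same row weighting WS-PSNR uses for equirectangular images) and applies it to a mean squared error; the thesis may use a different base loss or weighting.

```python
import math


def latitude_weights(height):
    """cos(latitude) at the centre of each row of an equirectangular image."""
    return [math.cos(math.pi * (0.5 - (r + 0.5) / height)) for r in range(height)]


def latitude_weighted_mse(pred, target):
    """Squared error averaged with per-row cos(latitude) weights.

    pred and target are equal-sized 2-D lists (rows x columns) of saliency values.
    Rows near the poles, which equirectangular projection over-samples, count less.
    """
    weights = latitude_weights(len(target))
    weighted_sum = 0.0
    weight_total = 0.0
    for w, pred_row, target_row in zip(weights, pred, target):
        for p, t in zip(pred_row, target_row):
            weighted_sum += w * (p - t) ** 2
            weight_total += w
    return weighted_sum / weight_total
```

Normalising by the total weight keeps the loss on the same scale as an unweighted MSE, so an error of the same size costs more near the equator than near the poles.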
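The multi-attention module in method (1) selectively fuses spatial and channel attention. Its exact design is not given in the abstract, so the sketch below is a parameter-free, CBAM-like approximation: channel attention is the sigmoid of each channel's spatial mean, spatial attention is the sigmoid of the cross-channel mean at each pixel, and a hypothetical scalar `gate` (learned in a real network) blends the two before re-weighting the features.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def multi_attention(features, gate=0.5):
    """Re-weight a C x H x W feature map with fused channel and spatial attention.

    features: nested lists indexed as features[c][y][x].
    gate: hypothetical fusion weight in [0, 1]; 1.0 keeps only channel
          attention, 0.0 keeps only spatial attention.
    """
    num_c = len(features)
    height, width = len(features[0]), len(features[0][0])
    # Channel attention: squash the spatial mean of each channel.
    channel_att = [sigmoid(sum(sum(row) for row in ch) / (height * width))
                   for ch in features]
    # Spatial attention: squash the cross-channel mean at each pixel.
    spatial_att = [[sigmoid(sum(features[c][y][x] for c in range(num_c)) / num_c)
                    for x in range(width)] for y in range(height)]
    # Selective fusion: blend the two attention maps, then scale the input.
    return [[[features[c][y][x]
              * (gate * channel_att[c] + (1.0 - gate) * spatial_att[y][x])
              for x in range(width)] for y in range(height)] for c in range(num_c)]
```

In a trained network both attention branches would normally contain learnable layers (for example a small MLP for channel attention and a convolution for spatial attention); they are omitted here to keep the fusion step visible.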
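The detail-map-supervised loss in method (2) supervises object details over a wider region. Assuming the detail map is a band around the ground-truth object boundary, it could be built as sketched below; the function name `detail_map` and the `radius` parameter that controls the width of the band are hypothetical. Columns wrap around because the left and right edges of a panorama meet.

```python
def detail_map(mask, radius=2):
    """Mark pixels within `radius` (Chebyshev distance) of a salient-object boundary.

    mask: 2-D list of 0/1 ground-truth labels for an equirectangular image.
    A pixel is a boundary pixel if any 4-neighbour has a different label;
    horizontal neighbours wrap around the panorama seam.
    """
    height, width = len(mask), len(mask[0])
    boundary = []
    for y in range(height):
        for x in range(width):
            neighbours = [(y, (x - 1) % width), (y, (x + 1) % width)]
            if y > 0:
                neighbours.append((y - 1, x))
            if y < height - 1:
                neighbours.append((y + 1, x))
            if any(mask[ny][nx] != mask[y][x] for ny, nx in neighbours):
                boundary.append((y, x))
    # Dilate the boundary into a band; rows clip at the poles, columns wrap.
    detail = [[0] * width for _ in range(height)]
    for by, bx in boundary:
        for y in range(max(0, by - radius), min(height, by + radius + 1)):
            for x in range(bx - radius, bx + radius + 1):
                detail[y][x % width] = 1
    return detail
```

A loss restricted to, or up-weighted on, the pixels where this map is 1 would then concentrate supervision on object details; a larger `radius` corresponds to a wider detail region.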
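Method (2) performs feature mining on high-confidence and low-confidence regions. A minimal sketch, assuming these regions are obtained by thresholding an intermediate saliency probability map: pixels above `high` or below `low` are treated as confidently foreground or background, and the rest form the low-confidence region whose features receive further refinement. The thresholds and function names are hypothetical.

```python
def confidence_regions(prob, low=0.3, high=0.7):
    """Split a saliency probability map into three 0/1 region masks.

    Returns (confident_fg, confident_bg, low_confidence); every pixel
    belongs to exactly one mask. low and high are hypothetical thresholds.
    """
    confident_fg = [[1 if p >= high else 0 for p in row] for row in prob]
    confident_bg = [[1 if p <= low else 0 for p in row] for row in prob]
    low_confidence = [[1 if low < p < high else 0 for p in row] for row in prob]
    return confident_fg, confident_bg, low_confidence


def mine_features(features, region):
    """Keep only the feature values inside a region mask (element-wise product)."""
    return [[f * m for f, m in zip(f_row, m_row)]
            for f_row, m_row in zip(features, region)]
```

For example, masking a feature map with the low-confidence region isolates the ambiguous pixels, typically near object boundaries, so a refinement branch can focus on them.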
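Method (3) fuses spatial-stream and temporal-stream features at each level and weights the levels with inter-layer weights. The sketch below assumes the simplest form of this idea: one hypothetical learnable logit per level, turned into inter-layer weights with a softmax, and a hypothetical `modality_gate` that mixes the two streams within each level. Features are flattened lists and are assumed to be resized to a common resolution already; the real network presumably computes its weights from the features themselves.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hybrid_fusion(spatial_levels, temporal_levels, level_logits, modality_gate=0.5):
    """Fuse per-level spatial (RGB) and temporal (optical-flow) features.

    spatial_levels, temporal_levels: one flat feature list per level, same length.
    level_logits: one hypothetical scalar per level; softmax gives inter-layer weights.
    modality_gate: hypothetical share of the spatial stream in each level's mix.
    """
    weights = softmax(level_logits)
    fused = [0.0] * len(spatial_levels[0])
    for w, s_feat, t_feat in zip(weights, spatial_levels, temporal_levels):
        for i, (s, t) in enumerate(zip(s_feat, t_feat)):
            fused[i] += w * (modality_gate * s + (1.0 - modality_gate) * t)
    return fused
```

Because the softmax weights sum to one, raising one level's logit shifts emphasis towards that level without changing the overall scale of the fused features.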