Stereoscopic images can provide viewers with rich depth information and an immersive visual experience. However, due to limitations of existing stereoscopic display technology, viewers often experience visual discomfort symptoms when viewing stereoscopic images, such as eye fatigue, physical discomfort, difficulty concentrating, nausea, and vomiting. This negative viewing experience not only seriously affects viewers' visual health but also impedes the promotion and widespread adoption of stereoscopic multimedia technology. To protect viewers' visual health, it is important to assess how much discomfort they perceive when viewing stereoscopic images. However, subjective evaluation of stereoscopic image discomfort is challenging: it requires specialized equipment and controlled experimental conditions, and it is time-consuming and labor-intensive. It is therefore important to develop objective prediction models that measure stereoscopic image discomfort accurately. Existing objective prediction methods rely mainly on hand-crafted features, but the human binocular vision mechanism is highly complex, which makes it difficult to capture effective discomfort features through manual feature design. Deep learning is now widely used in image and video processing and has achieved significant advances, yet few deep learning-based models for predicting stereoscopic image discomfort have been proposed. Against this background, this thesis focuses on deep learning-based models for predicting stereoscopic image discomfort. The main contents of this work are as follows:

(1) An end-to-end multi-level interactive network (MLI-Net) is proposed for stereoscopic image discomfort prediction, based on binocular summation and difference information together with an attention mechanism. The model first extracts low-, middle-, and high-level addition and subtraction feature maps from the left and right view images. These feature maps are concatenated to simulate the interaction mechanism of the human visual system and are refined by a channel-spatial attention module. The attention-weighted feature maps are then fed into the corresponding sub-networks. MLI-Net requires no disparity map and takes only the left and right views as input, which avoids the influence of disparity estimation errors on prediction performance and reduces the model's time complexity. The network is fully end-to-end and uses two fully connected layers as a nonlinear regression function to map the feature vector to the discomfort score of the stereoscopic image. MLI-Net performs well on the IEEE-SA stereoscopic image dataset, with a Pearson linear correlation coefficient of 0.9162, a Spearman rank correlation coefficient of 0.8551, and a root mean square error of 0.3394.
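To make the binocular interaction step concrete, the sketch below (in PyTorch) shows one plausible realization: summation and difference feature maps from the two views are concatenated and refined with a CBAM-style channel-spatial attention module. The attention design, channel widths, and feature map sizes are illustrative assumptions, not the exact MLI-Net configuration.

```python
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    """CBAM-style channel + spatial attention (an illustrative stand-in for
    the channel-spatial attention module described above)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        # Channel attention: squeeze the spatial dims by average and max pooling.
        avg = self.channel_mlp(x.mean(dim=(2, 3)))
        mx = self.channel_mlp(x.amax(dim=(2, 3)))
        ca = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        x = x * ca
        # Spatial attention: squeeze the channel dim by average and max.
        sa_in = torch.cat([x.mean(dim=1, keepdim=True),
                           x.amax(dim=1, keepdim=True)], dim=1)
        sa = torch.sigmoid(self.spatial_conv(sa_in))
        return x * sa


class BinocularInteraction(nn.Module):
    """Fuse left/right feature maps via addition (summation channel) and
    subtraction (difference channel), then refine the result with attention."""
    def __init__(self, channels):
        super().__init__()
        self.attention = ChannelSpatialAttention(2 * channels)

    def forward(self, feat_left, feat_right):
        add_map = feat_left + feat_right   # binocular summation information
        sub_map = feat_left - feat_right   # binocular difference information
        fused = torch.cat([add_map, sub_map], dim=1)
        return self.attention(fused)


if __name__ == "__main__":
    # Toy check with random "low-level" feature maps from both views.
    left = torch.randn(2, 64, 56, 56)
    right = torch.randn(2, 64, 56, 56)
    fused = BinocularInteraction(64)(left, right)
    print(fused.shape)  # torch.Size([2, 128, 56, 56])
```

In the full model, one such fusion block per feature level would feed the corresponding sub-network, and the resulting feature vector would pass through the two fully connected regression layers to produce the discomfort score.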
(2) A stereoscopic image discomfort prediction model named CSP-Net is proposed, based on the cyclopean image hypothesis and visual saliency. First, an RGB cyclopean image is constructed with binocular rivalry and disparity taken into account. Next, depth information is used to compute a 3D saliency map, which guides the extraction of cyclopean image patches from salient regions and enlarges the number of dataset samples. Finally, CSP-Net takes these cyclopean image patches as input to predict the discomfort score of the stereoscopic image, and a group-weighted attention module is incorporated into CSP-Net to further refine the features. The effectiveness of CSP-Net is validated through extensive experiments: the model achieves good performance on the IEEE-SA stereoscopic image dataset, with a Pearson linear correlation coefficient of 0.9095, a Spearman rank correlation coefficient of 0.8512, and a root mean square error of 0.3408.
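The sketch below illustrates the cyclopean image construction and saliency-guided patch extraction in simplified form: it blends the left view with the disparity-shifted right view using local variance as a rough proxy for binocular rivalry weighting, then crops patches around the most salient locations. The rivalry weighting, disparity handling, and greedy patch selection here are simplifying assumptions and do not reproduce the exact CSP-Net pipeline.

```python
import numpy as np

def build_cyclopean(left, right, disparity):
    """Fuse left/right RGB views into a cyclopean image.

    Each right-view row is shifted by the (precomputed) disparity before
    blending; blending weights come from local variance, a simple stand-in
    for the rivalry-aware weighting described in the abstract.
    """
    h, w, _ = left.shape
    right_shifted = np.zeros_like(right)
    cols = np.arange(w)
    for y in range(h):
        src = np.clip(cols - disparity[y].astype(int), 0, w - 1)
        right_shifted[y] = right[y, src]

    def local_energy(img, k=7):
        gray = img.mean(axis=2)
        pad = np.pad(gray, k // 2, mode="edge")
        win = np.lib.stride_tricks.sliding_window_view(pad, (k, k))
        return win.var(axis=(2, 3))

    e_l = local_energy(left) + 1e-6
    e_r = local_energy(right_shifted) + 1e-6
    w_l = (e_l / (e_l + e_r))[..., None]
    return w_l * left + (1.0 - w_l) * right_shifted


def sample_salient_patches(cyclopean, saliency, patch=32, n_patches=16):
    """Crop patches centred on the most salient locations (greedy top-k)."""
    h, w, _ = cyclopean.shape
    order = np.argsort(saliency, axis=None)[::-1]
    patches, taken = [], []
    for idx in order:
        y, x = divmod(int(idx), w)
        if y < patch // 2 or y > h - patch // 2 or x < patch // 2 or x > w - patch // 2:
            continue
        if any(abs(y - ty) < patch and abs(x - tx) < patch for ty, tx in taken):
            continue  # skip heavily overlapping patches
        patches.append(cyclopean[y - patch // 2:y + patch // 2,
                                 x - patch // 2:x + patch // 2])
        taken.append((y, x))
        if len(patches) == n_patches:
            break
    return np.stack(patches)
```

In the actual pipeline, each extracted patch would be scored by the CSP-Net backbone with its group-weighted attention module, and the per-patch predictions aggregated into the image-level discomfort score.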