The Research On Quality Assessment Of Stereo Visual Signals

Posted on: 2023-01-29
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J B Yan
GTID: 1528306791493064
Subject: Management Science and Engineering
Abstract/Summary:
With the rapid development of stereo imaging techniques, stereo visual signals have become essential multimedia data in our daily life and have brought considerable changes to it. One of the most common stereo visual signals is the stereoscopic image, widely adopted in 3D games, 3D movies, etc. Since the view of a stereoscopic image is fixed, i.e., its parallax is equal to a particular value, stereoscopic images cannot provide other viewing angles for users. At the same time, to meet users’ increasing demand for a high quality of experience (QoE), virtual view synthesis techniques have been proposed successively. These techniques can generate images from arbitrary perspectives, and their outputs include synthesized images and videos, which provide a more immersive experience for users. One type of synthesized video is the free-viewpoint video (FVV), which allows users to freely change the viewing angle. Compared with other stereo visual signals, FVVs offer a better interactive experience, and they have gradually been applied in sports, TV shows, and other applications. Therefore, synthesized images and FVVs have also become important stereo visual signals.

Although stereo vision technology plays an essential role in multimedia processing and improves the quality of life, it also brings many problems to information management in multimedia processing. For example, the quality of the massive number of stereo visual signals varies greatly, so it is necessary to predict their quality accurately and to screen out the low-quality ones. This operation can effectively save storage space and improve multimedia resource utilization. In addition, users' pursuit of a high-quality immersive experience has always been the original impetus for the development of stereo vision technology in multimedia processing, and it is also the objective of both the academic and industrial communities. Visual quality assessment of stereo visual signals aims to predict the quality of stereo visual signals accurately. It is one of the most critical ways to pre-process the massive number of stereo visual signals. Meanwhile, it provides a direct optimization goal for stereo visual signal processing algorithms and stereo vision systems. Therefore, quality assessment of stereo visual signals is necessary for information management in multimedia processing and has considerable research and application value. This thesis is dedicated to the quality assessment of three types of stereo visual signals: stereoscopic images, synthesized images, and FVVs. The main content is as follows.

(1) Since most existing stereoscopic image quality assessment (SIQA) methods only fuse high-level features, this thesis proposes a SIQA model with Multi-Level Feature Fusion, MLFF for short. MLFF includes a weight-sharing feature extraction module, a feature fusion module, and a quality regression module. Specifically, inspired by the multi-layer visual perception mechanism of the human visual system, MLFF extracts low-level, middle-level, and high-level features of the left and right views through a weight-sharing deep convolutional neural network. Then, MLFF aggregates the features at each level separately in light of binocular properties. After that, two convolutional layers are used to integrate the multi-level features. The deeply fused features are then fed into fully connected layers, producing the predicted quality score. We conduct experiments on two widely used public SIQA databases, and the results demonstrate the superiority of the proposed MLFF.

(2) Because the public SIQA databases contain limited data, comparing the performance of different SIQA models on them is unreliable, as the risk of over-fitting is high. To address this problem, this thesis conducts a comprehensive study of SIQA based on semi-supervised learning. Specifically, we first construct a large-scale SIQA database with image-level coarse labels and view-level pseudo labels, which are then used as weak supervision signals. Then, we conduct a comprehensive study on SIQA by retraining the existing SIQA models on the proposed database, which allows us to compare different SIQA models more fairly. Besides, we also investigate the influence of network architecture, input size, and auxiliary supervision signal on the performance of the tested SIQA models. We test the well-trained SIQA models on the public SIQA databases. The experimental results demonstrate the necessity of the proposed SIQA database and yield a multi-dimensional comparison of different SIQA models.

(3) The distortions introduced into synthesized images by depth-image-based rendering (DIBR) are non-uniformly distributed, which is a challenging problem in image quality assessment (IQA). This thesis proposes an IQA method for synthesized images that considers local variation perception and global change modeling (LVGC). Specifically, LVGC computes the Gaussian derivatives of the input image and obtains a local Taylor series expansion, which is used to represent the local structure information. Furthermore, the uniform local binary pattern (ULBP) operator is used to encode the structure maps (i.e., the Gaussian derivatives), and the output of this operation is the primary structure features. Then, local structure magnitude information is used to weight the primary structure features, and the outcome is the final structure features. Meanwhile, LVGC extracts chromatic features, including hue and angle features, to capture local variation; the final chromatic features are obtained in a similar manner. Besides, LVGC uses naturalness, including luminance naturalness and structure naturalness, to measure the global change of synthesized images. Finally, LVGC combines local variation and global change to measure the quality of synthesized images. Experimental results on three public benchmark databases demonstrate the effectiveness of our method in estimating the visual quality of synthesized views. An ablation study also demonstrates the complementarity of the local and global features.

(4) Since the content of the current FVV QoE database is relatively simple and the amount of data is small, we conduct an extensive study of FVV QoE from objective and subjective perspectives. Considering that there are only two application scenarios, i.e., Chinese Basketball Association (CBA) and Variety Show, we propose a diverse data collection scheme for the limited scenarios and construct the largest FVV QoE database to date, called Youku-FVV, from two real complex scenarios by incorporating both internal and external influential factors, which correspond to FVV generation and playback. Youku-FVV originates from videos captured by dozens of real cameras arranged annularly, and these videos are used to generate virtual viewpoints. The real views and virtual views together make up the FVVs. To collect subjective ratings efficiently, we propose a coarse-to-fine subjective scoring method. This method includes two stages: the first screens out the certain samples, i.e., those on which the participants are highly likely to give consistent ratings; the second collects ratings for the remaining uncertain samples. Based on the subjective data, we analyze in depth the influence of depth information and crowdedness on the QoE of FVVs. Besides, we make an initial attempt to train an efficient FVV QoE prediction model using this database, in which several sparse frame sampling strategies are validated. Extensive experiments demonstrate that the proposed model can predict the QoE of FVVs using only a subset of the frames.
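To make the idea of sparse frame sampling in contribution (4) concrete, the following is a minimal sketch of two common strategies: picking the centre frame of each of k equal segments, and picking one random frame per segment. The abstract does not say which strategies the thesis validates, so the function names `uniform_sample` and `segment_random_sample` and both strategies are illustrative assumptions, not the thesis's method.

```python
import random


def uniform_sample(num_frames, k):
    """Return k frame indices, one at the centre of each of k equal segments.

    Hypothetical helper: the strategies actually used in the thesis are not
    specified in the abstract.
    """
    if num_frames <= 0 or k <= 0:
        return []
    k = min(k, num_frames)  # cannot sample more frames than exist
    return [int((i + 0.5) * num_frames / k) for i in range(k)]


def segment_random_sample(num_frames, k, seed=None):
    """Return k frame indices, one drawn at random from each of k segments."""
    if num_frames <= 0 or k <= 0:
        return []
    k = min(k, num_frames)
    rng = random.Random(seed)
    # Segment i covers the half-open range [bounds[i], bounds[i + 1]).
    bounds = [i * num_frames // k for i in range(k + 1)]
    return [rng.randrange(bounds[i], bounds[i + 1]) for i in range(k)]


print(uniform_sample(100, 4))  # [12, 37, 62, 87]
print(segment_random_sample(100, 4, seed=0))
```

Only the sampled frames would then be passed to the QoE prediction model, which is what keeps inference cost low relative to processing every frame.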
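The structure-feature step of contribution (3), encoding a structure map with ULBP and weighting the codes by local structure magnitude, can be sketched as follows. This is a sketch under stated assumptions: it uses the rotation-invariant uniform LBP variant with 8 neighbours at radius 1, takes the structure map (standing in for a Gaussian-derivative map) and the magnitude map as precomputed inputs, and uses the hypothetical names `ulbp_label` and `weighted_ulbp_histogram`. The exact LBP variant and weighting used by LVGC are not given in the abstract.

```python
# Offsets of the 8 neighbours in circular (clockwise) order, radius 1.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]


def transitions(bits):
    """Count 0/1 changes around the circular neighbour pattern."""
    n = len(bits)
    return sum(bits[i] != bits[(i + 1) % n] for i in range(n))


def ulbp_label(bits):
    """Rotation-invariant uniform label.

    Patterns with at most two transitions are 'uniform' and are labelled by
    their number of ones (0..8); all other patterns share the label 9.
    """
    return sum(bits) if transitions(bits) <= 2 else len(bits) + 1


def weighted_ulbp_histogram(structure_map, magnitude_map):
    """Histogram of ULBP labels, each pixel weighted by its magnitude.

    Both inputs are lists of lists with the same shape; border pixels are
    skipped because they lack a full neighbourhood.
    """
    rows, cols = len(structure_map), len(structure_map[0])
    hist = [0.0] * 10
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            centre = structure_map[r][c]
            bits = [1 if structure_map[r + dr][c + dc] >= centre else 0
                    for dr, dc in NEIGHBOURS]
            hist[ulbp_label(bits)] += magnitude_map[r][c]
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist
```

Weighting by magnitude means that pixels with strong local structure contribute more to the feature vector than flat regions, which is one plausible reading of "local structure magnitude information is used to weight the primary structure features".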
Keywords/Search Tags:Stereo visual signal, Human visual system, Quality assessment, Synthesized image, Free viewpoint video