
Study On The Generalization Of No-reference Video Quality Assessment

Posted on: 2023-09-02
Degree: Doctor
Type: Dissertation
Country: China
Candidate: P F Chen
Full Text: PDF
GTID: 1528306788474724
Subject: Information and Communication Engineering
Abstract/Summary:
With the rapid development of video streaming technology, video services have become an indispensable part of people's daily lives. Before reaching end users, video data typically pass through capture, compression, transmission and reception, each of which can introduce distortion and degrade perceptual quality. Accurately predicting and optimizing the quality of these processed videos is therefore a priority for ensuring a good viewing experience. Because subjective quality evaluation by human raters is both time-consuming and labor-intensive, it is impractical in real applications. To strike a good compromise between prediction performance and computational complexity, no-reference video quality assessment methods, which require no reference video, have become a research focus in both industry and academia in recent years. Deep learning has produced numerous end-to-end no-reference video quality prediction models that have pushed the boundaries of prediction performance on existing public datasets. In the vast majority of practical scenarios, however, the distribution of the test videos differs considerably from that of the training videos, a situation known in machine learning as domain shift. Given the diversity of video content and distortion types, these learning-based quality prediction models tend to suffer significant performance drops when applied directly to other test scenarios. This generalization problem may hamper the further application and development of no-reference video quality assessment.

This dissertation concentrates on the generalization of video quality assessment metrics by studying two problems: how to effectively transfer a trained quality prediction model, and how to construct an effective pre-trained model for network initialization that enhances transferability. The first problem is addressed, according to the amount of supervision available from the target domain during training, by weakly-supervised domain adaptation, unsupervised domain adaptation and multi-source domain generalization, respectively. The second problem is addressed by generating training samples in a self-supervised manner. The main research contents cover the following four aspects:

1) Continuous-time user-perceived quality-of-experience (QoE) labels for distorted videos are difficult to obtain in practical applications. By treating the readily available retrospective video QoE labels from the target domain as weak supervision signals during domain adaptation, this dissertation proposes a weakly-supervised domain-adaptive continuous-time video QoE prediction method, dubbed DA-QoE. Users' perception of video QoE is naturally continuous and time-varying, and continuous-time QoE is closely related to retrospective QoE. The proposed method therefore constructs a multi-task learning framework that predicts continuous-time and retrospective video QoE at the same time, so that the weak labels provide effective guidance during domain adaptation. Experimental results on several video QoE databases demonstrate that the proposed method achieves state-of-the-art prediction performance.

2) When quality labels for target-domain videos are hard to obtain, the intrinsic subjectivity of the video quality assessment task can lead to negative adaptation. To this end, this dissertation proposes an unsupervised curriculum domain adaptation method for no-reference video quality assessment (UCDA). First, the easy and difficult tasks of the curriculum are constructed according to the domain adaptation results between the source and target domains, based on which the target samples are divided into confident and uncertain subdomains. The rationale is that solving the easy task first allows useful properties of the target domain to be inferred. The prediction performance on the difficult task is then improved through adaptation between the two subdomains, guided by the inferred properties. Extensive cross-domain quality prediction experiments demonstrate its superiority over existing video-based domain adaptation methods.

3) For the setting in which only video data from multiple source domains are available for training a generalizable quality prediction model, this dissertation presents an unsupervised domain-generalizable video quality evaluation method based on knowledge ensemble, named DEEK, which exploits the domain-specific knowledge of each source domain to improve generalization to unseen domains. Building on multiple expert models, each trained to specialize in a particular source domain, the method trains an ensemble prediction model in a contrastive ensemble learning manner so that it generalizes effectively across diverse domains. By dynamically assigning a weight to each source-domain expert when predicting the quality of unseen target samples, DEEK shows clear advantages over existing models that rely on a mixed-domain training strategy.

4) Directly using an ImageNet pre-trained model for network initialization limits the transferability of the trained model, since image classification and quality assessment are divergent tasks. To address this limitation, this dissertation presents a contrastive self-supervised pre-training method for video quality assessment (CSPT). Starting from a large amount of unlabeled video data, the method constructs positive and negative pairs for contrastive learning through a distortion enhancement strategy. In addition, an auxiliary distortion prediction task provides stronger surrogate supervision signals for model training. Experiments show that the prediction performance and generalization ability of existing learning-based models improve significantly when the ImageNet pre-trained model is replaced with the proposed CSPT pre-trained one.
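
The contrastive learning step described for CSPT can be illustrated with a small sketch. The abstract does not state which loss the dissertation uses, so the InfoNCE objective below is an assumption chosen because it is a common contrastive loss; the function names (`cosine`, `info_nce`), the temperature value, and the toy feature vectors are all hypothetical and are not taken from the dissertation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor (assumed loss, not confirmed by the source).

    The loss is the negative log-probability of picking the positive
    among the positive and all negatives, with similarities scaled by
    the temperature.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[0]


# Toy 2-D features standing in for encoder outputs of video clips.
anchor = [1.0, 0.0]
positive = [0.9, 0.1]                    # hypothetical matching view
negatives = [[0.0, 1.0], [-1.0, 0.2]]    # hypothetical non-matching views

loss_good = info_nce(anchor, positive, negatives)
loss_bad = info_nce(anchor, [0.0, 1.0], negatives)  # positive far from anchor
```

In this sketch the loss is small when the anchor is close to its positive and far from the negatives, and grows when the positive drifts away. How positive and negative pairs are actually built from distorted clips is defined by the dissertation's distortion enhancement strategy and is not reproduced here.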
Keywords/Search Tags: quality assessment, model generalization, domain adaptation, domain generalization, pre-trained model