
No-Reference Video Quality Assessment Based On Spatial-Temporal Feature Learning

Posted on: 2022-01-12
Degree: Master
Type: Thesis
Country: China
Candidate: F. Gao
GTID: 2518306605971909
Subject: Signal and Information Processing
Abstract/Summary:
In the information age, video accounts for more than 80% of Internet data and is used in consumer applications, education, healthcare, security, and many other fields. However, video is inevitably distorted and degraded during acquisition, compression, transmission, storage, and playback. These degradations impair the viewing experience and can even hinder the extraction of semantic content. A reasonable and reliable video quality assessment (VQA) method that can guide the optimization of video processing and analysis therefore has significant practical value, and VQA has become an active research topic in computer vision. This thesis addresses three shortcomings of existing methods: insufficient consideration of human visual characteristics, poor performance on high-frame-rate videos, and the limited amount of available training data. It builds no-reference VQA models that produce accurate objective quality predictions. The main research contents are summarized as follows:

(1) A no-reference VQA method based on visual persistence and visual memory. Existing methods usually either dilute the influence of temporal information on video quality or conflate the temporal and spatial representations of the video. As a result, they cannot accurately simulate how humans perceive visual quality, and their prediction accuracy is limited. This thesis therefore proposes a quality perception network and a visual memory network, designed around the persistence and memory characteristics of human vision. The quality perception network decomposes the conventional three-dimensional convolution into a temporal convolution and a spatial convolution. It extracts short-term spatial-temporal distortion features between frames that match the persistence of human vision, which overcomes the difficulty prior methods have in jointly processing the spatial and temporal dimensions of distorted video and reduces information redundancy. The visual memory network uses a gated recurrent unit (GRU) to efficiently model human visual memory and improves assessment accuracy on long frame sequences. Experimental results show that the method achieves high accuracy across a variety of distortion types and is highly consistent with human visual perception.

(2) A no-reference VQA method based on multi-scale spatial-temporal analysis. Existing methods do not consider how frame rate affects motion blur, temporal aliasing, and similar artifacts, so their performance degrades on videos with different frame rates. This method designs a local distortion feature encoder network and multiple spatial-temporal downsampling branches to extract multi-scale local distortion features of the distorted video along the spatial and temporal dimensions. A spatial-temporal attention pooling module then selectively integrates these local features to simulate human perception of motion distortion. Experimental results show that the method maintains excellent accuracy on videos of various specifications, including different frame rates.

(3) A no-reference VQA method based on meta-transfer learning. To cope with the limited amount of available video data, existing methods usually fine-tune a pre-trained network. However, because these pre-trained models were not trained for quality assessment, the features they extract correlate only weakly with distortion, which leads to poor assessment performance. This method instead follows the idea of meta-transfer learning: it freezes the model parameters learned on an image quality assessment (IQA) database, transfers them to the VQA task, and attaches scaling factors to them, thereby transferring the prior knowledge learned in IQA to VQA. Experimental results show that the method not only performs well on videos with a single distortion type but also adapts quickly to new distortion conditions.
Keywords/Search Tags: no-reference, video quality assessment, human visual characteristics, spatial-temporal multi-scale, meta-transfer learning
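Method (1) decomposes a three-dimensional convolution into a spatial and a temporal convolution. The thesis record does not give channel counts, kernel sizes, or the ordering of the two steps, so the following is only a minimal single-channel sketch under those assumptions; every name in it is hypothetical. In the purely linear case, a per-frame spatial convolution followed by a temporal convolution equals one 3D convolution whose kernel is the outer product of the two kernels, while using kt + kh*kw weights instead of kt*kh*kw. In a trained network, multiple channels and a nonlinearity between the two steps make the factorized form more than a reparametrization.

```python
# Sketch: a 3D convolution factorized into a spatial and a temporal convolution.
# Single channel, 'valid' padding, cross-correlation (the deep-learning convention).
# Videos are nested lists indexed [t][y][x]; all names are illustrative.


def conv3d(video, kernel):
    """3D cross-correlation of a video with a kt x kh x kw kernel."""
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    T, H, W = len(video), len(video[0]), len(video[0][0])
    return [[[sum(video[t + a][y + b][x + c] * kernel[a][b][c]
                  for a in range(kt) for b in range(kh) for c in range(kw))
              for x in range(W - kw + 1)]
             for y in range(H - kh + 1)]
            for t in range(T - kt + 1)]


def spatial_conv(video, k2d):
    """2D convolution applied to each frame independently (a 1 x kh x kw kernel)."""
    return conv3d(video, [k2d])


def temporal_conv(video, k1d):
    """1D convolution along time at each pixel (a kt x 1 x 1 kernel)."""
    return conv3d(video, [[[w]] for w in k1d])


def outer(k1d, k2d):
    """The 3D kernel equal to the outer product of a temporal and a spatial kernel."""
    return [[[wt * ws for ws in row] for row in k2d] for wt in k1d]


# Toy 4-frame, 5x5 video with integer pixels, so the equality check is exact.
video = [[[(3 * t + 2 * y + x) % 7 for x in range(5)] for y in range(5)] for t in range(4)]
k_t = [1, 0, -1]                          # temporal difference: change across frames
k_s = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]  # Laplacian: local spatial detail

factorized = temporal_conv(spatial_conv(video, k_s), k_t)
full = conv3d(video, outer(k_t, k_s))
print(factorized == full)  # True
n_spatial = len(k_s) * len(k_s[0])
print(len(k_t) + n_spatial, len(k_t) * n_spatial)  # 12 27
```

The temporal kernel here is a simple frame difference, so the factorized output responds to short-term changes between neighbouring frames, which is the kind of inter-frame behaviour the abstract associates with visual persistence.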
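The visual memory network in (1) uses a gated recurrent unit (GRU) over the sequence of frame-level features. The record does not say how the hidden states are turned into a quality score, so this minimal sketch assumes a linear readout per frame followed by averaging. It uses random weights, omits biases, and follows the original GRU formulation of Cho et al.; the names `GRUCell`, `predict_quality`, and `readout` are hypothetical. The update gate z controls how much of the previous hidden state, the "memory", survives each step.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class GRUCell:
    """Minimal GRU cell (Cho et al. form, biases omitted) with random weights."""

    def __init__(self, in_dim, hid_dim, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.hid_dim = hid_dim
        # Input (W*) and recurrent (U*) weights for update gate z, reset gate r,
        # and candidate state n.
        self.Wz, self.Uz = mat(hid_dim, in_dim), mat(hid_dim, hid_dim)
        self.Wr, self.Ur = mat(hid_dim, in_dim), mat(hid_dim, hid_dim)
        self.Wn, self.Un = mat(hid_dim, in_dim), mat(hid_dim, hid_dim)

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.Wz, x), matvec(self.Uz, h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.Wr, x), matvec(self.Ur, h))]
        rh = [ri * hi for ri, hi in zip(r, h)]  # reset gate filters the old memory
        n = [math.tanh(a + b) for a, b in zip(matvec(self.Wn, x), matvec(self.Un, rh))]
        # z decides how much of the old state h is kept versus replaced by n.
        return [zi * hi + (1.0 - zi) * ni for zi, hi, ni in zip(z, h, n)]


def predict_quality(frame_features, cell, readout):
    """Run the GRU over per-frame features and average a linear per-frame score."""
    h = [0.0] * cell.hid_dim
    states = []
    for x in frame_features:
        h = cell.step(x, h)
        states.append(h)
    scores = [sum(w * v for w, v in zip(readout, s)) for s in states]
    return sum(scores) / len(scores), states


rng = random.Random(1)
frames = [[rng.random() for _ in range(3)] for _ in range(10)]  # 10 frames, 3-D features
cell = GRUCell(in_dim=3, hid_dim=4)
score, states = predict_quality(frames, cell, readout=[0.25] * 4)
print(round(score, 4), len(states))
```

Because each new state is a convex combination of the previous state and a tanh candidate, the hidden state stays bounded while carrying information from earlier frames forward.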
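Method (2) combines several spatial-temporal downsampling branches with attention pooling of local distortion features. The record does not describe the encoder or how attention weights are computed. This sketch is therefore a stand-in built on stated assumptions. It keeps only the temporal axis, uses one branch per hypothetical stride, and computes attention logits as `beta` times a per-segment motion magnitude, where in the thesis these weights would be learned. The point it illustrates is that attention pooling lets a short, high-motion segment with poor quality pull the overall score down more than plain averaging would.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def downsample(seq, stride):
    """Average non-overlapping windows of `stride` steps along time."""
    return [sum(seq[i:i + stride]) / len(seq[i:i + stride])
            for i in range(0, len(seq), stride)]


def attention_pool(scores, motion, beta):
    """Pool local quality scores with weights softmax(beta * motion)."""
    weights = softmax([beta * m for m in motion])
    return sum(a * q for a, q in zip(weights, scores))


def multi_scale_quality(scores, motion, strides=(1, 2, 4), beta=3.0):
    """One branch per temporal stride; the branch outputs are averaged."""
    outputs = [attention_pool(downsample(scores, s), downsample(motion, s), beta)
               for s in strides]
    return sum(outputs) / len(outputs)


# Hypothetical local quality scores (higher is better) and motion magnitudes:
# the last two segments are fast-moving and visibly distorted.
scores = [0.9] * 6 + [0.3, 0.3]
motion = [0.1] * 6 + [1.0, 1.0]
print(sum(scores) / len(scores))            # plain mean pooling
print(multi_scale_quality(scores, motion))  # attention-pooled, lower than the mean
```

With `beta=0.0` the attention weights become uniform and a single stride-1 branch reduces to mean pooling, which is a useful sanity check on the pooling code.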
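Method (3) freezes parameters learned on an IQA database and attaches scaling factors to them for the VQA task. The sketch below shows only that transfer step: a frozen linear layer whose weights are never updated, with one learnable scale per output channel fitted by gradient descent on a small synthetic dataset. The meta-training loop over tasks is not shown. The weight values, the data, and the names `adapt_scales` and `W_iqa` are all hypothetical stand-ins, not the thesis's model.

```python
import random


def frozen_features(W, x):
    """Features from the frozen (IQA-pretrained) layer: one value per row of W."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def predict(scales, W, x):
    # Each frozen feature channel is multiplied by its own learnable scale.
    return sum(s * f for s, f in zip(scales, frozen_features(W, x)))


def mse(scales, W, data):
    return sum((predict(scales, W, x) - y) ** 2 for x, y in data) / len(data)


def adapt_scales(W, data, lr=0.5, steps=200):
    """Gradient descent on the scaling factors only; W is never modified."""
    scales = [1.0] * len(W)  # start from the unmodified pretrained layer
    for _ in range(steps):
        grad = [0.0] * len(scales)
        for x, y in data:
            feats = frozen_features(W, x)
            err = sum(s * f for s, f in zip(scales, feats)) - y
            for j, f in enumerate(feats):
                grad[j] += 2.0 * err * f / len(data)
        scales = [s - lr * g for s, g in zip(scales, grad)]
    return scales


W_iqa = [[0.5, -0.2, 0.1], [0.3, 0.4, -0.6]]  # hypothetical "IQA-pretrained" weights
rng = random.Random(0)
xs = [[rng.random() for _ in range(3)] for _ in range(20)]
true_scales = [1.5, 0.5]  # synthetic target task, for illustration only
data = [(x, predict(true_scales, W_iqa, x)) for x in xs]

W_before = [row[:] for row in W_iqa]
scales = adapt_scales(W_iqa, data)
print(mse([1.0, 1.0], W_iqa, data), mse(scales, W_iqa, data))
```

Only two numbers are trained here, while the prior knowledge encoded in `W_iqa` is reused unchanged. This is why adapting scaling factors needs far less target-domain data than fine-tuning every weight.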