
Study On Video Quality Assessment Method Based On Deep Learning

Posted on: 2024-02-03  Degree: Master  Type: Thesis
Country: China  Candidate: X Li  Full Text: PDF
GTID: 2568307115992889  Subject: Control Science and Engineering
Abstract/Summary:
With the emergence of online video sharing platforms, the way people communicate is gradually changing, and video traffic now occupies an increasingly large share of network throughput. The quality of videos on the web varies widely owing to differences in shooters' skill, the resolution of capture devices, and the constraints of the shooting environment and transmission conditions. User-generated videos are rich in content but carry multiple superimposed distortions, which makes evaluating their quality challenging. This paper proposes corresponding improvements for two such problems, as follows:

(1) To address inadequate extraction of video content information, this paper proposes a video quality assessment method that fuses two deep learning networks. The method has three parts. First, the ResNet50 network, with its cascaded convolutional layers and small convolutional kernels, extracts deep local information from the video content, while the parallel convolutional branches and large kernels of the InceptionV3 network capture global information under different receptive fields; combining the two yields richer video information. Next, a Bidirectional Gated Recurrent Unit (Bi-GRU) captures spatiotemporal information. An exponential-function fitting scheme is then introduced to build a hybrid fitting strategy that improves the predictive component of the temporal-memory model, and this is combined with a Gaussian distribution model to predict the final video quality score. Finally, extensive experiments are conducted on the KoNViD-1k and LIVE-VQC databases. The model achieves the best overall performance in prediction monotonicity (Spearman rank-order correlation coefficient, SROCC) and prediction accuracy (Pearson linear correlation coefficient, PLCC), reaching 0.7786 and 0.7759, respectively, which demonstrates the superiority of the model.

(2) To address inadequate utilization of the extracted video information, this paper proposes a Transformer-based video quality assessment method, again in three parts. First, for video content feature extraction, ShuffleNet is used; its pointwise group convolution and channel shuffle operations strengthen the connections between video frames, and a feature extraction strategy that fuses different convolutional layers preserves the completeness of the extracted content. Second, for spatiotemporal feature extraction, the decoder layer of the Transformer network is introduced to match the subjective evaluation process more closely; its multi-head self-attention mechanism hierarchically processes the features extracted by ShuffleNet, and feeding the result into a Bidirectional Gated Recurrent Unit (Bi-GRU) yields more representative spatiotemporal features. Then, in the video quality score prediction part, a mixed-influence strategy combining a temporal memory factor and a visual lag factor is constructed to reduce the influence of artificial differentiation. Finally, extensive experiments are conducted on the KoNViD-1k and LIVE-VQC databases; the overall SROCC and PLCC reach 0.8103 and 0.82, respectively, further improving model performance.
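Both methods are evaluated with SROCC (prediction monotonicity) and PLCC (prediction accuracy) against subjective quality scores. As an illustration only (this is not the thesis code), the two metrics can be computed in plain Python; SROCC is simply the Pearson correlation of the rank-transformed values, with tied values assigned their average rank:

```python
import math

def pearson(x, y):
    # Pearson linear correlation coefficient (PLCC): prediction accuracy.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    # Spearman rank-order correlation (SROCC): prediction monotonicity.
    # Rank each sequence (average rank for ties), then apply Pearson.
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        i = 0
        while i < len(v):
            j = i
            # extend j over a block of tied values
            while j + 1 < len(v) and v[order[j + 1]] == v[order[i]]:
                j += 1
            avg = (i + j) / 2 + 1  # average 1-based rank of the tied block
            for k in range(i, j + 1):
                r[order[k]] = avg
            i = j + 1
        return r
    return pearson(ranks(x), ranks(y))
```

Because SROCC depends only on ranks, any strictly monotone prediction scores 1.0 even when the relationship is nonlinear, which is why the thesis reports both metrics together.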
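The multi-head self-attention at the core of the Transformer decoder layer projects the frame-level features into several heads, attends within each head, and concatenates the results. A minimal NumPy sketch is given below; the dimensions and weight names are illustrative assumptions, not taken from the thesis, and a full decoder layer would also include cross-attention and a feed-forward sub-layer:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, wq, wk, wv, wo, n_heads):
    # x: (seq_len, d_model) frame features, e.g. one row per video frame.
    seq, d = x.shape
    dh = d // n_heads  # per-head dimension
    # Project and split into heads: (n_heads, seq_len, dh)
    q = (x @ wq).reshape(seq, n_heads, dh).transpose(1, 0, 2)
    k = (x @ wk).reshape(seq, n_heads, dh).transpose(1, 0, 2)
    v = (x @ wv).reshape(seq, n_heads, dh).transpose(1, 0, 2)
    # Scaled dot-product attention per head: (n_heads, seq_len, seq_len)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dh)
    out = softmax(scores) @ v
    # Concatenate heads back to (seq_len, d_model) and project out.
    out = out.transpose(1, 0, 2).reshape(seq, d)
    return out @ wo
```

In the method described above, the output of this attention stage would then be fed into the Bi-GRU to produce the spatiotemporal features used for score prediction.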
Keywords/Search Tags: video quality assessment, feature fusion, deep learning, multi-head self-attention mechanism