Image and video background replacement is widely used in short-video production, high-definition video conferencing, and similar scenarios. AI-based background replacement algorithms lower the barrier to video creation. The quality of a background-replaced video directly affects the audience’s perception, yet it varies considerably with the skill of the creator and the performance of the replacement algorithm. Assessing the quality of images and videos after background replacement is therefore of guiding significance to both industry and academia. AI-assisted background replacement of images and videos generally consists of two stages: foreground segmentation and background fusion. Image and video quality assessment has been studied extensively, but background replacement poses specific challenges: because of the attention characteristics of human vision, viewers are more sensitive to quality degradation of salient objects and to edge burrs and jitter, and the overall harmony of the result after replacement must also be considered. Based on these visual characteristics, this paper studies a no-reference, multi-dimensional quality assessment scheme suited to background replacement video.

The main work and innovations of this paper are as follows:

(1) To address the ways in which background replacement video differs from ordinary video, and the shortcomings of existing research on video quality assessment, this paper proposes a multi-dimensional background replacement video quality assessment model that evaluates such video along four dimensions: objective quality, aesthetic quality, segmentation edge accuracy, and motion consistency. This addresses the problem that existing research on video quality assessment
focuses on a single dimension and cannot fully match the characteristics of human vision. Within the model, the objective quality dimension evaluates objective quality as affected by sharpness and noise; the aesthetic quality dimension predicts subjective aesthetic perception; the segmentation edge accuracy dimension uses confidence modeling to evaluate the edge confidence of each video frame and the inter-frame stability of the segmented edges; and the motion consistency dimension evaluates whether the motion characteristics of the foreground and background are consistent after fusion, based on their motion states, the difference in motion blur, and the change in motion amplitude. This paper also constructs a subjective assessment dataset for background replacement to verify the effectiveness of the multi-dimensional model.

(2) To address the insufficient subjective accuracy of existing objective quality assessment methods and their poor fit to human visual characteristics, this paper designs a video frame fidelity assessment model based on a spatio-temporal attention mechanism. Building on existing deep-model-based objective quality assessment research, the method introduces saliency branches in the temporal and spatial domains to improve the performance of the quality assessment model in video background replacement scenarios.

(3) Existing methods cannot measure the aesthetic loss introduced by the background replacement process, such as poor fusion harmony, contrast distortion, and visible traces of artificial synthesis. To address this, an aesthetic quality assessment model for background replacement video based on multi-source feature fusion is designed for the aesthetic quality dimension of the multi-dimensional model. Hand-crafted features are designed and extracted from three perspectives, including composition quality and
fusion harmony, and are fused with deep features to jointly evaluate the aesthetic quality of the background replacement video. The model not only achieves the best results on our self-built background replacement video dataset, but also generalizes well to other aesthetic datasets.
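
The multi-source feature fusion described in (3) can be illustrated with a minimal sketch. It assumes the simplest possible form of fusion: hand-crafted features are concatenated with deep features, and the result is scored by a linear head. The abstract does not specify the fusion or regression method, so every function name, feature value, and weight below is hypothetical.

```python
from typing import List, Sequence


def fuse_features(handcrafted: Sequence[float], deep: Sequence[float]) -> List[float]:
    """Early fusion by concatenation (an assumption, not the paper's stated method)."""
    return [*handcrafted, *deep]


def frame_aesthetic_score(features: Sequence[float],
                          weights: Sequence[float],
                          bias: float = 0.0) -> float:
    """Score one frame with a linear head over the fused feature vector."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return bias + sum(f * w for f, w in zip(features, weights))


def video_aesthetic_score(frame_scores: Sequence[float]) -> float:
    """Pool per-frame scores into a video-level score by simple averaging."""
    if not frame_scores:
        raise ValueError("at least one frame score is required")
    return sum(frame_scores) / len(frame_scores)


# Hypothetical inputs: two hand-crafted features (e.g. composition quality and
# fusion harmony) and one deep feature per frame; the weights are placeholders.
fused = fuse_features([0.5, 0.25], [1.0])
score = frame_aesthetic_score(fused, weights=[2.0, 4.0, 1.0], bias=0.5)
```

In practice the weights would be learned from subjective scores, and a nonlinear regressor could replace the linear head; the sketch only shows where the hand-crafted and deep features meet.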