| With the rapid development of the Internet, video is becoming increasingly common in everyday life. Adding harmonious music to videos is gradually becoming an artistically challenging task. Manually selecting music takes considerable time and effort, and deep learning-based approaches can provide a more efficient solution. In practice, the task of scoring a long video can be reduced to adding different background music to each of several consecutive short video clips. This paper therefore takes short videos as the vehicle for an in-depth study of cross-modal video-music retrieval. The main contributions of this paper are as follows. (1) We first elaborate theoretically on the characteristics of the video-music retrieval task, which requires comprehensive consideration of both content and emotional information, and then propose a dual-path solution. In the content path, we use an encoder-decoder structure as the shared content representation space to obtain content information; in the emotional path, we obtain emotional information through emotional keyframes, channel attention mechanisms, and other emotion-oriented schemes. On the basis of effective feature extraction, the dual-path network DPVM uses a fusion shared space to obtain more effective fused features. Compared with the classic dual-tower network EMVGAN, DPVM improves Recall@1 by 9.02%. (2) To address the problem of network heterogeneity in cross-modal retrieval tasks, we propose a heterogeneity-weakening scheme, which eliminates heterogeneous network information through data assimilation processing and feature extraction. The proposed low-heterogeneity network LHCT is based on the Transformer structure and uses an emotion-adaptive encoding scheme to obtain shared features. Ablation experiments indicate that, compared with the retrieval network MVPt, which retains heterogeneous network information, LHCT improves Recall@1 by 5.99%. (3) Combining
the characteristics of the dual-path and heterogeneity-weakening schemes, we apply the heterogeneity-weakening scheme to the dual-path network and propose a low-heterogeneity dual-path retrieval network, DPCE. By eliminating heterogeneous information through the low-heterogeneity network, pure emotional information can be obtained, and content and emotional information can then be fused through the dual-path network. On objective evaluation metrics, DPCE reaches a Recall@1 of 15.83% and a Recall@25 of 63.34%, which are 4.60% and 9.86% higher than those of the advanced network MVPt, respectively. In addition, subjective experimental results indicate that the low-heterogeneity dual-path retrieval network achieves the goal of retrieving harmonious background music for videos. |
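
The abstract reports results as Recall@1 and Recall@25 without defining the metric. As a minimal sketch, assuming the common retrieval definition (the fraction of video queries whose ground-truth music track appears among the top K retrieved tracks) and assuming one ground-truth track per video, Recall@K might be computed as follows. The function name `recall_at_k`, the similarity-matrix layout, and the toy scores are illustrative assumptions and are not taken from the paper.

```python
from typing import Sequence


def recall_at_k(similarity: Sequence[Sequence[float]], k: int) -> float:
    """Fraction of video queries whose matching track is ranked in the top k.

    Assumption (hypothetical layout): similarity[i][j] scores video i against
    music track j, and the ground-truth track for video i is track i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Rank track indices from highest to lowest similarity for this video.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(similarity)


# Toy scores for 4 videos and 4 tracks (made-up values for illustration).
sim = [
    [0.9, 0.1, 0.3, 0.2],  # video 0: its own track is ranked first
    [0.8, 0.5, 0.1, 0.2],  # video 1: its own track is ranked second
    [0.2, 0.7, 0.6, 0.1],  # video 2: its own track is ranked second
    [0.4, 0.3, 0.2, 0.1],  # video 3: its own track is ranked last
]

print(recall_at_k(sim, 1))  # 0.25
print(recall_at_k(sim, 2))  # 0.75
```

Under this definition, a larger K can only keep or raise the score, which is consistent with the reported Recall@25 being higher than Recall@1.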