In recent years, video content piracy and infringement have become increasingly severe, causing significant harm to the rights of original creators and undermining the healthy development of the video creation industry. Consequently, video copyright protection has become a focal point. In addition, with the explosive growth of internet video, near-duplicate content occupies substantial storage and bandwidth resources, making the filtering of such near-duplicate videos an essential requirement. To address these challenges, near-duplicate video segment retrieval technology has emerged. Its objective is to search a video database for copied and transformed video segments, and it plays a crucial role in video copyright protection, search, filtering, and related applications.

This thesis divides near-duplicate video segment retrieval into three stages: first, extracting frame-level features; second, searching the video database for near-duplicate videos of the query video as candidate videos; and third, localizing the copied segments within the candidate videos.

Efficiently and accurately retrieving near-duplicate videos, and then pinpointing the temporal boundaries of copied segments within large-scale video libraries, poses three major challenges. (1) New types of video tampering and transformation continually emerge, including edits to video frames and temporal variations between frames, which demands strong robustness of video features. (2) Searching for target videos requires traversing the entire video library and measuring the similarity between the query video and all reference videos, so balancing retrieval performance and efficiency is difficult. (3) The diversity of scenes and the complexity of transformations lead to significant spatio-temporal variations between videos, making it difficult for existing localization methods to generalize across scenarios. To address these challenges, this thesis makes the following contributions.

First, this thesis proposes a near-duplicate video retrieval method based on a bag of visual words. Frame-level features are extracted with self-supervised contrastive learning, and their robustness is ensured through extensive data augmentation. Using the bag of visual words, semantically similar features within a video are assigned to the same cell, and the video-level similarity is obtained by aggregating cell-level similarities measured with joint spatio-temporal information. Combined with a dual index, this method measures similarity accurately and significantly improves retrieval speed. On two comprehensive datasets, its retrieval performance and speed exceed those of existing methods, and the average query response time is on the order of seconds on a million-scale short-video database. An illustrative sketch of the bag-of-visual-words similarity computation is given below.

Second, this thesis proposes a video copy localization method based on the bag of visual words. Using the bag of visual words, the similarity matrix of a video pair is partitioned into multiple sub-similarity matrices, each constructed from frame-level features that are approximately aligned in time. Copy patterns are detected and copied sub-segments are localized through a greedy strategy, and these sub-segments are then aggregated into complete copied segments. On MIX, a copy localization dataset mixed from different scenarios, this method achieves the best performance. A simplified sketch of the greedy localization step is also given below.
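The following is a minimal sketch of the bag-of-visual-words retrieval idea described above: frame features are assigned to visual words (cells) via a codebook, an inverted index maps each word to the reference frames it contains, and video-level scores are obtained by aggregating cell-level frame similarities with a simple temporal-consistency weight. The codebook, the plain inverted file, and the temporal weight are illustrative assumptions; the thesis's dual index and joint spatio-temporal measure are not reproduced here.

```python
# Illustrative sketch only, not the thesis's implementation.
# Assumes frame features and codebook centroids are L2-normalized vectors.
import numpy as np
from collections import defaultdict

def assign_words(frames: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Assign each frame feature (T, D) to its nearest visual word in the codebook (K, D)."""
    return np.argmax(frames @ codebook.T, axis=1)

def build_index(videos: dict, codebook: np.ndarray) -> dict:
    """Inverted index: visual word -> list of (video_id, frame_index, feature)."""
    index = defaultdict(list)
    for vid, frames in videos.items():
        for t, w in enumerate(assign_words(frames, codebook)):
            index[w].append((vid, t, frames[t]))
    return index

def video_similarity(query: np.ndarray, index: dict, codebook: np.ndarray) -> dict:
    """Aggregate cell-level frame similarities into video-level scores."""
    scores = defaultdict(float)
    for qt, w in enumerate(assign_words(query, codebook)):
        for vid, rt, feat in index[w]:
            sim = float(query[qt] @ feat)          # frame-level similarity within the shared cell
            temporal = 1.0 / (1.0 + abs(qt - rt))  # crude stand-in for spatio-temporal weighting
            scores[vid] += sim * temporal
    return dict(scores)
```

In practice the codebook would be learned offline (for example by clustering frame features extracted with the contrastive model), and the inverted index would be built once over the entire reference library so that only cells touched by the query are visited at search time.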
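Below is a similarly hedged sketch of greedy copied-segment localization on a frame-level similarity matrix. For simplicity it operates on the full similarity matrix rather than the BoVW-partitioned sub-similarity matrices used in the thesis; the threshold, minimum run length, and diagonal-run heuristic are illustrative choices.

```python
# Illustrative sketch only: greedy extraction of temporally aligned high-similarity runs.
import numpy as np

def localize_copies(q: np.ndarray, r: np.ndarray, thresh: float = 0.8, min_len: int = 3):
    """Return (q_start, q_end, r_start, r_end) candidates from L2-normalized frame features."""
    sim = q @ r.T                       # (Tq, Tr) frame-level similarity matrix
    hits = sim >= thresh                # binarize: likely copied frame pairs
    visited = np.zeros_like(hits, dtype=bool)
    Tq, Tr = hits.shape
    segments = []
    for i in range(Tq):
        for j in range(Tr):
            if hits[i, j] and not visited[i, j]:
                # Greedily extend along the diagonal while consecutive frames keep matching.
                k = 0
                while i + k < Tq and j + k < Tr and hits[i + k, j + k]:
                    visited[i + k, j + k] = True
                    k += 1
                if k >= min_len:
                    segments.append((i, i + k - 1, j, j + k - 1))
    return segments
```

Each returned run corresponds to a copied sub-segment; in the thesis such sub-segments are further aggregated into complete copied segments spanning the candidate video.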