Query-oriented Micro-video Summarization

Posted on: 2024-09-06
Degree: Master
Type: Thesis
Country: China
Candidate: M C Jia
Full Text: PDF
GTID: 2568306920951679
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, micro-videos have been widely used in information dissemination, entertainment, leisure, and other fields, and their importance and value are increasingly recognized. Compared with long articles and pictures, micro-videos have stronger visual impact and appeal, so they can attract viewers’ attention in a short time. Meanwhile, the production cost of micro-videos is relatively low, which makes them easier to produce and disseminate on a large scale and has led to explosive growth in their number. Faced with this volume of content, viewers tend to use the built-in search engine of a micro-video platform to find the micro-videos they need.

To improve retrieval effectiveness, the query-oriented micro-video summarization task aims to generate a concise sentence with two attributes: (a) it summarizes the main semantics of the micro-video, and (b) it takes a form similar to search queries, which facilitates retrieval. Despite its considerable application value in retrieval, this direction has barely been explored. Previous summarization work mostly focuses on content summarization for traditional long videos. Directly adapting these methods tends to yield unsatisfactory results because of the unique characteristics of micro-videos and queries: semantic gaps between modalities, diverse entities and scenes within a limited duration, and varied queries with distinct expressions.

To address these characteristics, this thesis proposes a query-oriented micro-video summarization model, dubbed QMS. It uses an encoder-decoder-based transformer architecture as its backbone. The multi-modal signals are passed through two modality-specific encoders to obtain their representations, followed by an entity recognition module that identifies and highlights critical entity information. For optimization, this thesis first develops a novel strategy for sampling the informative query from the diverse query set. Then, to handle the large semantic gaps between modalities, it dynamically assigns different confidence coefficients during optimization according to the semantic distance between each modality and the target.

To obtain pairs of micro-videos and summaries for supervised learning and evaluation, this thesis collects user search logs from the search engine of the Kwai micro-video platform and builds a dataset called QMV-Kwai, which contains 60,096 micro-videos and 331,414 search queries. Extensive experiments on this dataset demonstrate that the QMS model significantly outperforms several state-of-the-art baseline methods on multiple generation evaluation metrics. Beyond evaluating summary generation, this thesis also validates the effect of the generated summaries on micro-video retrieval performance: the summaries produced by the QMS model improve the accuracy of the micro-video retrieval task, which has a positive impact on related research and practice in the micro-video field.
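
The abstract does not give the formula behind the confidence coefficients, so the following Python sketch is only one plausible reading of the idea, not the QMS implementation. It assumes that "semantic distance" is measured by cosine similarity between a modality embedding and a target (query) embedding, and that the coefficients are obtained with a softmax over those similarities. The function names, the `temperature` parameter, and the toy embeddings are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def confidence_coefficients(modality_embeddings, target_embedding, temperature=1.0):
    """Assumed scheme: softmax over modality-to-target cosine similarities.

    A modality that is semantically closer to the target gets a larger
    coefficient. This is an illustrative choice, not the thesis's formula.
    """
    names = list(modality_embeddings)
    scaled = [
        cosine_similarity(modality_embeddings[name], target_embedding) / temperature
        for name in names
    ]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return {name: e / total for name, e in zip(names, exps)}


def weighted_loss(per_modality_losses, coefficients):
    """Combine per-modality losses using the confidence coefficients."""
    return sum(coefficients[name] * loss for name, loss in per_modality_losses.items())


# Toy example: the visual embedding is aligned with the target, the text one is not.
weights = confidence_coefficients(
    {"visual": [1.0, 0.0], "text": [0.0, 1.0]},
    target_embedding=[1.0, 0.0],
)
loss = weighted_loss({"visual": 2.0, "text": 4.0}, weights)
```

In this toy case the visual modality receives the larger coefficient, so its loss contributes more to the combined objective. Lowering the hypothetical `temperature` would sharpen that preference.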
Keywords/Search Tags: Multimodal Summarization, Query Suggestion, Micro-video Retrieval, Dataset