
Research On Multi-view Video Recommendation Approaches Based On Multimodal Content Analysis

Posted on: 2016-01-12    Degree: Doctor    Type: Dissertation
Country: China    Candidate: W Qu    Full Text: PDF
GTID: 1318330542489739    Subject: Computer software and theory
Abstract/Summary:
With the growth of computer processing power and the rapid development of social media sharing platforms, multimedia has become a major means for people to record information and communicate in everyday life. Video, as one kind of multimedia data, records both the images and sounds of real life and therefore has wide applications in security surveillance, commercial production, medical diagnosis, entertainment, and so on. However, the explosive growth in the number of videos makes it difficult for users to find potentially valuable knowledge in video databases. It is therefore imperative to develop applications that support efficient and effective video management, recommendation, and browsing. Video recommender systems recommend relevant items from massive data according to users' interests and can effectively alleviate the information overload problem; combining multimedia data analysis with recommender systems has recently become a hot research topic.

Traditional video recommendation methods analyze only the metadata (text) or the visual content (images) of videos, i.e., the content of a single modality. However, videos contain multiple modalities, including text, image, and audio, and the relationships among these modalities are ignored by most existing methods. Moreover, most recommender systems present their results as a list of videos, ignoring the temporal information within each video, and cannot provide multi-angle recommendations such as summarization recommendation and cross-modal recommendation. From the perspective of multimodal content analysis, this dissertation explores several video recommendation problems. The main contributions are as follows:

(1) To measure the similarity between videos, an intermediate-semantics-based distance metric learning method is proposed that simultaneously annotates videos and learns a distance metric. Going beyond single labels, multimodal labels are proposed to describe the multimodal semantics of videos. A multiple kernel learning framework is adopted to combine multiple semantic concepts, mapping low-level features into a uniform intermediate concept space, and the distance metric is then learned in that concept space. This resolves the inability of traditional distance measures to capture semantic similarity between videos. Experiments on a real-world dataset show that the proposed method measures the semantic similarity of videos more effectively than traditional methods (a minimal sketch of this two-stage idea is given below).

(2) To address two major problems in video recommendation, a multimodal content analysis and representation model and a novel semi-supervised reinforcement video recommendation method are proposed. Multimedia data related to the videos are collected, their multimodal content is analyzed, and a uniform representation of the multiple modalities is built, which avoids the time-consuming and often infeasible analysis of raw video content. A co-training framework is then leveraged to enrich user profiles when users have rated only a few videos, solving the profile learning problem for casual users (see the co-training sketch below). Experiments on a standard dataset show the effectiveness of the proposed method.
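The abstract gives no implementation details for contribution (1); the following Python sketch is only a rough illustration of the two stages, with all data and names invented, and with per-modality kernel classifiers standing in for the multiple kernel learning step described above:

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Toy data: two modalities and K semantic concepts with multimodal labels.
n, K = 200, 5
X_visual = rng.normal(size=(n, 32))
X_audio = rng.normal(size=(n, 16))
concept_labels = rng.integers(0, 2, size=(n, K))

# Stage 1: one kernel classifier per (modality, concept); stacking the
# decision scores stands in for the uniform intermediate concept space.
def concept_scores(X, Y, kernel):
    cols = []
    for k in range(Y.shape[1]):
        clf = SVC(kernel=kernel).fit(X, Y[:, k])
        cols.append(clf.decision_function(X))
    return np.column_stack(cols)

Z = np.hstack([concept_scores(X_visual, concept_labels, "rbf"),
               concept_scores(X_audio, concept_labels, "linear")])

# Stage 2: learn a diagonal Mahalanobis-style metric in the concept
# space, upweighting dimensions that separate toy video categories.
y = rng.integers(0, 3, size=n)

def fisher_weights(Z, y):
    overall, d = Z.mean(0), Z.shape[1]
    between, within = np.zeros(d), np.zeros(d)
    for c in np.unique(y):
        Zc = Z[y == c]
        between += len(Zc) * (Zc.mean(0) - overall) ** 2
        within += ((Zc - Zc.mean(0)) ** 2).sum(0)
    return between / (within + 1e-9)

w = fisher_weights(Z, y)

def semantic_distance(a, b):
    return np.sqrt(np.sum(w * (a - b) ** 2))

print("distance(video 0, video 1):", semantic_distance(Z[0], Z[1]))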
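Similarly, the co-training step of contribution (2) might look roughly like the sketch below, in which two hypothetical views of the same videos (text metadata and visual features) alternately pseudo-label the most confidently predicted unrated video for a user. This is an assumption-laden illustration, not the dissertation's actual pipeline:

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n_videos = 300
view_text = rng.normal(size=(n_videos, 20))    # e.g. metadata features
view_visual = rng.normal(size=(n_videos, 30))  # e.g. key-frame features

# A casual user has rated only ten videos (1 = liked, 0 = disliked);
# a median split guarantees both classes appear in the toy labels.
vals = view_text[:10, 0] + view_visual[:10, 0]
ratings = {i: int(vals[i] > np.median(vals)) for i in range(10)}

views = [view_text, view_visual]
for _ in range(5):                             # a few co-training rounds
    for v in range(2):
        idx = sorted(ratings)
        clf = LogisticRegression().fit(views[v][idx],
                                       [ratings[i] for i in idx])
        unlabeled = [i for i in range(n_videos) if i not in ratings]
        proba = clf.predict_proba(views[v][unlabeled])[:, 1]
        best = int(np.abs(proba - 0.5).argmax())   # most confident video
        # Hand this pseudo-label over; the other view trains on it next.
        ratings[unlabeled[best]] = int(proba[best] > 0.5)

print(f"profile enriched from 10 to {len(ratings)} (pseudo-)rated videos")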
(3) To tackle the efficiency problem in video reviewing, a novel personalized video summary recommendation method is proposed that generates a video abstraction according to the user's personal interests. The emotions of and interactions between characters are mined, and an IE-RoleNet is proposed to describe the extracted relationships. The content of a video is further transformed into a string over the IE-RoleNet, and a sequence mining method analyzes its substrings to generate the final summarization. According to the needs of different users, both structure-level personalized summarization and role-centered summarization are provided (a toy sketch of the string mining idea follows the abstract). Summaries extracted from real-world videos and a user case study show that the proposed method captures the semantic content of videos effectively.

(4) To provide multimodal recommendation results, a novel cross-modal recommendation method is proposed in which text and audio data are recommended according to user-video information. A uniform feature representation built with multimodal deep learning models the relationships between modalities and provides the foundation for cross-modal retrieval. Storylines and music are then recommended to users based on their profiles in the video domain and on the uniform multimodal representation (a cross-modal sketch also follows the abstract). Experiments on a standard dataset validate that the proposed method implements multimodal recommendation effectively.

This dissertation focuses on multi-view video recommendation based on multimodal content analysis. It explores multimodal feature extraction and representation methods, as well as multimodal video recommendation methods, at three levels: feature fusion, result fusion, and method fusion. Key problems in video metric learning, video recommendation, video summary recommendation, and cross-modal recommendation are analyzed and solved. The research has theoretical value for building and implementing a multi-view video recommendation system.
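For contribution (3), the mapping from the character relationships to a string and the subsequent substring mining could, in the simplest possible form, look like this toy sketch. The scene symbols and data are invented, and the actual IE-RoleNet construction is not shown:

from collections import Counter

# Each scene -> a (role pair, emotion) symbol, e.g. "A-B:joy"; the
# whole video becomes a string of such symbols.
scenes = ["A-B:joy", "A-C:anger", "A-B:joy", "B-C:fear",
          "A-B:joy", "A-C:anger", "B-C:fear", "A-B:sad"]

def frequent_substrings(seq, length):
    """Count every substring of the given length in the scene string."""
    return Counter(tuple(seq[i:i + length])
                   for i in range(len(seq) - length + 1))

# Structure-level summary: keep occurrences of the most frequent
# length-2 pattern as the summary's backbone.
pattern, count = frequent_substrings(scenes, 2).most_common(1)[0]
summary = [i for i in range(len(scenes) - 1)
           if tuple(scenes[i:i + 2]) == pattern]
print(f"pattern {pattern} occurs {count}x; summary starts at scenes {summary}")

# Role-centered variant: keep only scenes involving a chosen character.
role_summary = [i for i, s in enumerate(scenes) if "B" in s.split(":")[0]]
print("scenes centered on role B:", role_summary)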
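Finally, for contribution (4), a shared cross-modal space can be illustrated with classical canonical correlation analysis as a stand-in for the dissertation's multimodal deep network; all features below are synthetic and the names are hypothetical:

import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(2)
n = 200
shared = rng.normal(size=(n, 4))               # latent content both modalities share
video_feats = shared @ rng.normal(size=(4, 40)) + 0.1 * rng.normal(size=(n, 40))
music_feats = shared @ rng.normal(size=(4, 25)) + 0.1 * rng.normal(size=(n, 25))

# Project paired video and music features into one uniform space.
cca = CCA(n_components=4).fit(video_feats, music_feats)
video_z, music_z = cca.transform(video_feats, music_feats)

# A user profile is the mean of liked videos in the shared space;
# recommend the music tracks closest to it by cosine similarity.
profile = video_z[:10].mean(axis=0)
sims = (music_z @ profile) / (
    np.linalg.norm(music_z, axis=1) * np.linalg.norm(profile) + 1e-12)
print("top cross-modal picks:", np.argsort(-sims)[:5])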
Keywords/Search Tags: data mining, recommender system, video analysis, multimedia processing, multimodal data