Multi-feature Representation Learning And Its Application To Multimedia Data Prediction

Posted on: 2018-10-30
Degree: Doctor
Type: Dissertation
Country: China
Candidate: P G Jing
Full Text: PDF
GTID: 1368330596497234
Subject: Signal and Information Processing
Abstract/Summary:
The rapid development of the mobile Internet and of hardware acquisition equipment is driving explosive growth in multimedia data. Multimedia data have become rich, diversified and complicated, and are no longer limited to a single medium but increasingly take cross-media form. With the advent of the era of Artificial Intelligence 2.0, multimedia analysis has gradually shifted toward demand-driven intelligent multimedia analysis, and multimedia prediction plays an important role in many real-world applications. Based on a thorough analysis of current domestic and international research, this thesis takes high-order time series, single-media data and cross-media data as its research subjects and studies multimedia feature representation in depth. The main contents and innovations of this thesis are summarized as follows:

1. We propose two novel models, Multilinear Orthogonal Autoregressive (MOAR) and Multilinear Constrained Autoregressive (MCAR), for high-order time series prediction. MOAR is designed to preserve as much information of the original data as possible under an orthogonal constraint. MCAR is an enhanced method developed by adding an inverse decomposition error term as the constraint. In both models, we project the original tensor into subspaces spanned by basis matrices to discover the intrinsic temporal structure of the original tensor. To better preserve the temporal smoothness between consecutive slices of the tensor, the projection matrices are jointly learned by introducing an autoregressive (AR) model.

2. We propose a joint low-rank and sparse regression (JLRSR) framework for image memorability prediction. JLRSR jointly learns two things: a low-rank projection matrix, which decomposes the original data into a component part and an error part, and a regression coefficient vector for image memorability prediction. The projection matrix and the regression coefficients are bound by a sparse constraint to make the approach sufficiently invariant to the training samples. Moreover, a graph regularization term is constructed to improve generalization performance and prevent overfitting.

3. We propose a novel framework called Multi-view Transfer Learning from External Sources (MTLES) to predict image memorability. In this framework, we simultaneously leverage different types of visual feature sets and multiple types of predefined image attributes derived from external sources. Specifically, to enhance the representation ability of visual features, we construct connections between visual feature sets and high-level image attributes by transferring attribute knowledge from external sources. MTLES integrates weak learning through external sources, transfer learning, and a multi-view consistency loss over different types of feature sets into a joint framework. To solve this joint optimization problem, we further develop an alternating iterative algorithm.

4. We address popularity prediction of micro-videos by presenting a novel low-rank multi-view embedding learning framework, named transductive low-rank multi-view regression (TLRMVR). It boosts the performance of micro-video popularity prediction by jointly considering the intrinsic representations of the source and target samples. In particular, TLRMVR integrates low-rank multi-view embedding and regression analysis into a unified framework such that the lowest-rank representation shared by all views not only captures the global structure of all views, but also reflects the regression requirements. The framework is formulated as a regression model that seeks a set of view-specific projection matrices with low-rank constraints to map multi-view features into a common subspace. In addition, a multi-graph regularization term is constructed to improve the generalization capability and further enhance the robustness of the proposed algorithm.
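
As a rough illustration of the autoregressive idea mentioned in contribution 1, the sketch below fits a single AR(1) coefficient shared by all entries of consecutive slices and uses it to predict the next slice. It is not the MOAR or MCAR algorithm: it omits the tensor projection, the orthogonal constraint and the inverse decomposition error term. The names fit_ar1 and predict_next, the AR order of 1 and the toy data are assumptions made purely for illustration.

```python
from typing import List

Matrix = List[List[float]]


def fit_ar1(slices: List[Matrix]) -> float:
    """Least-squares estimate of a in X_t ~ a * X_{t-1}.

    Assumption: one scalar coefficient is shared by every entry of every
    slice. This is a simplification chosen for the sketch, not the
    thesis's formulation.
    """
    num = 0.0  # accumulates sum of prev * curr over all entries and steps
    den = 0.0  # accumulates sum of prev ** 2 over all entries and steps
    for prev, curr in zip(slices[:-1], slices[1:]):
        for row_prev, row_curr in zip(prev, curr):
            for p, c in zip(row_prev, row_curr):
                num += p * c
                den += p * p
    if den == 0.0:
        raise ValueError("cannot fit AR(1): all earlier slices are zero")
    return num / den


def predict_next(slices: List[Matrix], a: float) -> Matrix:
    """One-step-ahead prediction: scale the most recent slice by a."""
    return [[a * value for value in row] for row in slices[-1]]


if __name__ == "__main__":
    # Toy series (hypothetical data): each 2x2 slice is half the previous one.
    series = [
        [[8.0, 4.0], [2.0, 6.0]],
        [[4.0, 2.0], [1.0, 3.0]],
        [[2.0, 1.0], [0.5, 1.5]],
    ]
    a_hat = fit_ar1(series)
    print("estimated coefficient:", a_hat)
    print("predicted next slice:", predict_next(series, a_hat))
```

A shared scalar coefficient keeps the example small; a higher AR order or per-mode coefficient matrices would be closer in spirit to a multilinear model, but the abstract does not specify those details.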
Keywords/Search Tags: Multimedia data prediction, Feature representation, Time series, Tensor decomposition, Image memorability, Micro-video popularity