
Research On Near-duplicate Video Retrieval And Cross-domain Sentiment Classification Based On Embedding Learning

Posted on: 2018-03-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y B Hao
GTID: 1318330542461939
Subject: Signal and Information Processing
Abstract/Summary:
With the rapid development of Internet technology and intelligent hardware devices, various kinds of multimedia data are shared online. Videos and texts, as two important carriers of media information, are two main concerns in the field of multimedia information processing. Embedding learning (EL), which aims to generate low-dimensional vector representations of data by exploring and exploiting their inherent characteristics, is widely used in many data processing fields such as multimedia data storage, retrieval and classification. This thesis focuses on the development of near-duplicate video retrieval (NDVR) and cross-domain sentiment classification (CDSC) systems, and proposes several novel embedding-based representation methods for videos and texts that mine their content information and generate the corresponding representations.

NDVR focuses on how to retrieve videos that are almost identical or similar to a query video; its key task is the accurate extraction and representation of video content. It has been a significant research topic in multimedia because of its high impact on applications such as video search, recommendation and copyright protection. In addition to accurate retrieval, the exponential growth of online videos places heavy demands on the efficiency and scalability of existing systems. CDSC is the task of adapting a sentiment classifier trained on a source domain to a target domain without requiring any labeled data from the target domain; it focuses on how to reduce the mismatch between the word distributions of different domains and how to generate accurate and compact representations for text data. To address these issues, and taking the properties of video and text data into account, we propose four embedding-based data representation algorithms: three hashing algorithms for NDVR and one embedding algorithm for CDSC. The main contributions of the thesis are summarized as follows:

1. Stochastic Multi-view Hashing (SMVH)
To improve both the retrieval accuracy and the speed of NDVR, this thesis proposes SMVH, which converts multiple types of keyframe features, enhanced by auxiliary information such as video-keyframe associations and ground-truth relevance, into binary hash code strings. Reliable mapping functions are learned by maximizing a mixture of the generalized retrieval precision and recall scores. A composite KL divergence measure is used to approximate these retrieval scores; it stochastically aligns the neighborhood structures of the original feature space and the relaxed hash code space. Experiments against various classical and state-of-the-art NDVR systems show that the proposed method is more effective and efficient.

2. t-Distributed Unsupervised Stochastic Multi-view Hashing (t-USMVH) and its Deep Hashing Extension
To improve the robustness of unsupervised learning, a novel unsupervised hashing algorithm, referred to as t-USMVH, is proposed to support NDVR. t-USMVH combines multiple types of feature representations and fuses them effectively by examining both a continuous relevance score, based on a Gaussian estimation over pairwise distances, and a discrete neighbor score, based on the cardinality of reciprocal neighbors. Hash functions are learned by minimizing the KL divergence between the two sets of probabilities computed in the original feature space and in the relaxed hash code space, respectively. To reduce sensitivity to scale changes when mapping objects that are far apart from each other, a Student t-distribution is used to estimate the similarity between the relaxed hash code vectors of keyframes. This preserves the desired unsupervised similarity structure more accurately in the hash code space. In addition, to address the challenges of unsupervised deep learning and to facilitate large-scale retrieval, this work extends t-USMVH to unsupervised deep hashing, referred to as t-UDH. By adapting the corresponding optimization objective and constructing the hash mapping function with a deep neural network, we develop a robust unsupervised training strategy for a deep hashing network.

3. Embedding Learning for Cross-domain Sentiment Classification
To facilitate CDSC, we propose a text embedding algorithm that maps domain words and documents into a unified embedding space. The algorithm uses pivots to align the source and target domains, constructs three probabilistic similarity matching models to learn reliable mapping functions, and generates the final embeddings for words and documents in both domains. The pivots reduce the mismatch between the word distributions of the source and target domains, while the three probabilistic similarity matching models preserve, in the learned embedding space, the neighborhood structures of the text data constructed from the original feature space. Experimental results verify the efficiency and effectiveness of the proposed algorithm.
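Hashing speeds up NDVR at query time because comparing two binary codes reduces to an XOR followed by a bit count, which is much cheaper than comparing real-valued keyframe features. The following Python sketch illustrates that generic mechanism, not SMVH itself. It assumes each keyframe has already been mapped to a relaxed real-valued code and quantizes it by sign thresholding (an assumption; the abstract does not describe the quantization step). The function names `binarize`, `hamming` and `rank_by_hamming` and the toy codes are hypothetical.

```python
from typing import List, Sequence, Tuple


def binarize(relaxed: Sequence[float]) -> int:
    """Quantize a relaxed (real-valued) code into an int used as a bit string.

    Assumption: bit i is set when relaxed[i] > 0 (sign thresholding).
    """
    code = 0
    for i, value in enumerate(relaxed):
        if value > 0:
            code |= 1 << i
    return code


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two codes (XOR, then count set bits)."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query: int, database: List[int]) -> List[Tuple[int, int]]:
    """Return (index, distance) pairs sorted by ascending Hamming distance."""
    scored = [(idx, hamming(query, code)) for idx, code in enumerate(database)]
    # sorted() is stable, so ties keep their database order.
    return sorted(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Toy relaxed codes for three database keyframes and one query keyframe.
    db = [binarize(v) for v in ([0.9, -0.2, 0.4, -0.7],
                                [0.8, -0.1, -0.3, -0.6],
                                [-0.5, 0.6, -0.2, 0.9])]
    q = binarize([0.7, -0.4, 0.5, -0.2])
    print(rank_by_hamming(q, db))  # prints [(0, 0), (1, 1), (2, 4)]
```

Python's arbitrary-precision `int` keeps the sketch short; a large-scale system would more typically store codes in packed fixed-width arrays and rely on a hardware popcount instruction.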
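The discrete neighbor score in t-USMVH is built on reciprocal neighbors: two keyframes are reciprocal neighbors when each appears in the other's k-nearest-neighbor list. The sketch below computes reciprocal-neighbor sets for a single feature view under Euclidean distance. The abstract does not specify how the cardinality of these sets becomes a score, so `neighbor_score` (number of shared reciprocal neighbors, plus one if the pair is itself reciprocal) is a hypothetical choice for illustration only, as are all function names.

```python
import math
from typing import List, Set

Vector = List[float]


def knn(points: List[Vector], k: int) -> List[Set[int]]:
    """Indices of the k nearest neighbors of each point, excluding the point itself."""
    result = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        # Break distance ties by index so the result is deterministic.
        others.sort(key=lambda j: (math.dist(p, points[j]), j))
        result.append(set(others[:k]))
    return result


def reciprocal_neighbors(points: List[Vector], k: int) -> List[Set[int]]:
    """Keep j as a neighbor of i only if i is also among j's k nearest neighbors."""
    nn = knn(points, k)
    return [{j for j in nn[i] if i in nn[j]} for i in range(len(points))]


def neighbor_score(recip: List[Set[int]], i: int, j: int) -> int:
    """Hypothetical discrete score: shared reciprocal neighbors, +1 if i and j are reciprocal."""
    return len(recip[i] & recip[j]) + int(j in recip[i])


if __name__ == "__main__":
    # Toy 2-D keyframe features: a group of three and a group of two.
    pts = [[0.0, 0.0], [0.0, 1.0], [1.2, 0.0], [10.0, 10.0], [10.0, 11.0]]
    recip = reciprocal_neighbors(pts, k=2)
    print(recip)
    print(neighbor_score(recip, 0, 1), neighbor_score(recip, 0, 3))
```

In this toy data, point 2 is among the two nearest neighbors of point 3, but not the other way round, so the reciprocity check drops that one-sided link and point 3 keeps only point 4. This filtering of asymmetric links is what makes reciprocal neighbors a stricter relevance signal than plain k-nearest neighbors.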
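The core idea behind t-USMVH, as described in the abstract, is to match two probability distributions over keyframe pairs: one from a Gaussian kernel on distances in the original feature space, and one from a Student t-kernel on relaxed hash codes, with the KL divergence between them as the loss. The following t-SNE-style sketch shows that idea on a single feature view. The simplifications are assumptions, not the thesis's formulation: one global Gaussian bandwidth `sigma`, a t-distribution with one degree of freedom, relaxed codes optimized directly by gradient descent instead of through learned hash functions, and no multi-view fusion or reciprocal-neighbor term. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def sq_dist(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def gaussian_joint(features: Matrix, sigma: float) -> Matrix:
    """p_ij from a Gaussian kernel on feature distances, normalized over all i != j.

    Assumption: a single global bandwidth `sigma`.
    """
    n = len(features)
    p = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                p[i][j] = math.exp(-sq_dist(features[i], features[j]) / (2.0 * sigma ** 2))
    total = sum(map(sum, p))
    return [[v / total for v in row] for row in p]


def student_t_joint(codes: Matrix) -> Tuple[Matrix, Matrix]:
    """q_ij from a Student t-kernel (one degree of freedom) on relaxed codes.

    Returns (q, w), where w_ij = 1 / (1 + ||y_i - y_j||^2) is the unnormalized kernel.
    """
    n = len(codes)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                w[i][j] = 1.0 / (1.0 + sq_dist(codes[i], codes[j]))
    total = sum(map(sum, w))
    return [[v / total for v in row] for row in w], w


def kl_divergence(p: Matrix, q: Matrix) -> float:
    """KL(P || Q) over off-diagonal pairs; zero-probability terms contribute nothing."""
    n = len(p)
    return sum(p[i][j] * math.log(p[i][j] / q[i][j])
               for i in range(n) for j in range(n) if i != j and p[i][j] > 0.0)


def gradient_step(p: Matrix, codes: Matrix, lr: float) -> Matrix:
    """One gradient-descent step on KL(P || Q) with respect to the relaxed codes."""
    q, w = student_t_joint(codes)
    n, dim = len(codes), len(codes[0])
    updated = []
    for i in range(n):
        grad = [0.0] * dim
        for j in range(n):
            if i == j:
                continue
            coeff = 4.0 * (p[i][j] - q[i][j]) * w[i][j]
            for d in range(dim):
                grad[d] += coeff * (codes[i][d] - codes[j][d])
        updated.append([c - lr * g for c, g in zip(codes[i], grad)])
    return updated


if __name__ == "__main__":
    # Two tight clusters of 2-D keyframe features (toy data).
    features = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]
    # Small initial relaxed codes for 3-bit hashing.
    codes = [[0.1, -0.2, 0.05], [-0.1, 0.2, 0.0], [0.2, 0.1, -0.1], [-0.2, -0.1, 0.1]]
    p = gaussian_joint(features, sigma=1.0)
    print("KL before:", kl_divergence(p, student_t_joint(codes)[0]))
    for _ in range(100):
        codes = gradient_step(p, codes, lr=0.1)
    print("KL after: ", kl_divergence(p, student_t_joint(codes)[0]))
    # Quantize by sign (assumption) to obtain binary hash codes.
    print([[int(v > 0) for v in code] for code in codes])
```

The update uses the standard gradient for symmetric joint distributions with a one-degree-of-freedom t-kernel, ∂KL/∂y_i = 4 Σ_j (p_ij − q_ij)(1 + ‖y_i − y_j‖²)⁻¹ (y_i − y_j). The heavy tail of the t-kernel lets dissimilar keyframes sit far apart in code space without a large penalty, which matches the scale-robustness motivation stated above.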
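The abstract does not say how pivots are chosen. A widely used criterion, from structural correspondence learning (SCL), selects words that occur frequently in both domains and are strongly associated with the source-domain sentiment labels. The following sketch implements that SCL-style selection, using mutual information as the association measure. It is a stand-in for illustration, not necessarily the pivot selection used in the thesis, and the function names, thresholds and toy reviews are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence, Set

Doc = Set[str]


def doc_freq(docs: Sequence[Doc]) -> Counter:
    """Number of documents that contain each word."""
    counts: Counter = Counter()
    for doc in docs:
        counts.update(doc)
    return counts


def mutual_information(word: str, docs: Sequence[Doc], labels: Sequence[Hashable]) -> float:
    """I(W; Y) between presence of `word` and the document label (natural log)."""
    n = len(docs)
    joint = Counter((word in doc, y) for doc, y in zip(docs, labels))
    presence = Counter(word in doc for doc in docs)
    label_count = Counter(labels)
    mi = 0.0
    for (w, y), c in joint.items():
        p_wy = c / n
        mi += p_wy * math.log(p_wy / ((presence[w] / n) * (label_count[y] / n)))
    return mi


def select_pivots(source_docs: Sequence[Doc], source_labels: Sequence[Hashable],
                  target_docs: Sequence[Doc], min_df: int = 1, k: int = 10) -> List[str]:
    """Pick up to k words frequent in both domains, ranked by MI with source labels."""
    src_df, tgt_df = doc_freq(source_docs), doc_freq(target_docs)
    candidates = [w for w in src_df if src_df[w] >= min_df and tgt_df[w] >= min_df]
    # Descending MI; ties broken alphabetically so the output is deterministic.
    ranked = sorted(candidates,
                    key=lambda w: (-mutual_information(w, source_docs, source_labels), w))
    return ranked[:k]


if __name__ == "__main__":
    # Labeled source domain (book reviews; 1 = positive, 0 = negative).
    books = [{"great", "read", "loved", "plot"}, {"boring", "plot", "waste"},
             {"great", "story", "loved"}, {"poor", "boring", "waste"}]
    labels = [1, 0, 1, 0]
    # Unlabeled target domain (kitchen product reviews).
    kitchen = [{"great", "blender", "loved"}, {"waste", "broken", "boring"},
               {"great", "knife", "sharp"}, {"poor", "waste", "pan"}]
    print(select_pivots(books, labels, kitchen, k=4))  # ['boring', 'great', 'loved', 'waste']
```

Domain-specific words such as "plot" or "blender" never become pivots because they occur in only one domain, while shared sentiment words do. In SCL the pivots are then used to learn a shared feature space; the thesis instead feeds its pivots into three probabilistic similarity matching models, whose details the abstract does not give.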
Keywords/Search Tags: Embedding learning, video retrieval, hashing, multi-view learning, deep neural network, domain adaptation, sentiment classification