
Multi-modal Learning Based On Single-modal And Multi-modal Data

Posted on: 2021-03-11
Degree: Master
Type: Thesis
Country: China
Candidate: K T Wang
Full Text: PDF
GTID: 2428330647451059
Subject: Computer Science and Technology
Abstract/Summary:
Daily life is full of many kinds of data, such as language, text, images, and sound. Each such form of existence is a modality, and multi-modality refers to the combination of two or more modalities. Multi-modal learning studies the consistency and complementarity of modalities and realizes the conversion and exchange of information between them. In recent years, with the great improvement of big-data technology and the rapid development of deep learning, multi-modal learning has advanced further. Deep multi-modal learning is a natural outcome of this development: it inherits the tasks and goals of traditional multi-modal learning and uses deep learning techniques to push the field forward, achieving remarkable results. Most current multi-modal methods can extract useful information from the modalities to improve algorithm performance, but many problems remain unsolved, such as modal insufficiency, the fact that most real-world data are unlabeled, and how to apply multi-modal learning to single-modal problems. In this thesis, we address these problems and carry out the following research on multi-modal learning based on single-modal and multi-modal data in different scenarios:

1. A comprehensive multi-modal learning framework. Existing methods do not make full use of the consistency and complementarity of the modalities, and real data are mostly unlabeled. Most current multi-modal learning methods consider only the consistency or only the complementarity of the modalities; since real data are complicated, considering just one of the two is unreasonable and degrades performance. Moreover, in practical tasks most data are unlabeled, yet few existing multi-modal learning methods can effectively exploit unlabeled data to improve performance. To solve these problems, we propose a novel comprehensive multi-modal learning framework that strikes a balance between modal consistency and complementarity. First, we use an instance-level attention mechanism to weight the modal information of each instance and obtain its sufficiency. Then we design a novel regularization metric to measure the complementarity of the modalities. Finally, we achieve the consistency of the modalities on unlabeled data by using robust consistency metrics. Experiments show that this model achieves good results on real data.

2. Single-modality construction based on consistency and complementarity. In practice, most data exist in single-modal form, which is an obstacle to applying multi-modal learning techniques. Although some multi-modal learning methods can handle single-modal data, for example by constructing multi-modal data from single-modal data, they often ignore the principle of modal complementarity during construction and overemphasize the principle of modal consistency. To address this, we study short text matching and propose a dual heterogeneous network with enhanced local interaction. The model consists of two modalities: a location modality that describes local interactions and a global semantic understanding modality that extracts global semantic information. Two heterogeneous networks extract the two modal features, implementing the principle of modal complementarity; an attention mechanism then combines the two modalities' information into a comprehensive representation, satisfying the modal consistency principle. Finally, we design a low-level interaction function and a high-level interaction function, and use an LSTM to improve the performance of the location modality and the global semantic understanding modality, respectively. This greatly improves the model's accuracy on short text matching.
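The instance-level attention weighting described above can be illustrated with a minimal sketch. This is not the thesis's actual implementation; the scoring vector `score_w` stands in for learned parameters, and each modality is reduced to a single feature vector per instance. The sketch only shows the core idea: score each modality's features, normalize the scores with a softmax, and form a weighted sum as the fused representation.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array of scores.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_fuse(modal_feats, score_w):
    """Instance-level attention fusion (illustrative sketch).

    modal_feats: list of (d,) feature vectors, one per modality
    score_w:     (d,) scoring vector, a stand-in for learned
                 attention parameters
    Returns the fused (d,) representation and the per-modality weights.
    """
    # One scalar score per modality for this instance.
    scores = np.array([f @ score_w for f in modal_feats])
    # Normalize scores into attention weights that sum to 1.
    weights = softmax(scores)
    # Weighted sum of modality features = fused representation.
    fused = sum(w * f for w, f in zip(weights, modal_feats))
    return fused, weights

# Two toy modality features for one instance.
f_text  = np.array([1.0, 0.0])
f_image = np.array([0.0, 1.0])
fused, weights = attention_fuse([f_text, f_image], np.array([1.0, 1.0]))
```

In a trained model the weights would differ per instance, letting the framework rely more on whichever modality is sufficient for that instance, while the regularization and consistency metrics act on the same per-modality features.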
Keywords/Search Tags:Artificial Intelligence, Machine Learning, Multi-Modal Learning, Consistency and Complementarity, Short Text Matching