
Multi-Modal Learning Techniques For Open Environment

Posted on: 2020-01-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Yang
GTID: 1368330605950424
Subject: Computer Science and Technology
Abstract/Summary:
Multi-modal learning is one of the most important research fields in data mining and machine learning. Compared with single-modal learning, multi-modal learning aims to process and correlate information from multiple modalities, and effective multi-modal methods can obtain richer semantic representations, thereby improving both single-modal and multi-modal ensemble performance. Most traditional multi-modal methods rely on the assumptions that each single modality carries relatively sufficient information and that the information among modalities is consistent. In practical applications, however, multi-modal features usually cannot meet these assumptions, especially in the open environment, where they are affected by factors such as feature noise and incompleteness. The main challenges are: 1) the collection costs of different modalities are inconsistent; 2) the representations of different modalities are inconsistent; 3) the information content of different modalities is inconsistent. In summary, there exist "inconsistent modal costs", "inconsistent modal representations", and "inconsistent modal strengths". This thesis explores these three challenging issues and proposes a series of comprehensive multi-modal learning methods, whose effectiveness is validated on public and real-world data sets. The main research works are as follows:

1. Introduce a serialized modal extraction method that accounts for inconsistent modal costs. In the open environment, the collection costs of different modalities differ. Traditional multi-modal methods require the full set of modal information during the training and testing phases, which ignores the problem of modality collection costs. Based on the idea of matching modal information requirements to classification separability, this thesis proposes a novel end-to-end deep serialized modal extraction and classification decision method, DMP (Discriminative Modal Pursuit). DMP studies adaptive serialized modal extraction and converts the modal extraction problem into parallelized label prediction and modal selection strategy problems. The main idea is to use the prediction confidence of the currently acquired modalities as the criterion for selecting the next modality, while applying a cost-minimization strategy to reduce the overall collection cost.

2. Propose an incomplete-modality clustering and classification method that accounts for inconsistent modal data. In the open environment it is difficult to guarantee the completeness of multi-modal data: factors such as privacy protection and packet loss during collection cause modal incompleteness, resulting in inconsistent data among modalities. Aiming at this problem, this thesis proposes a semi-supervised multi-modal clustering and classification method, SLIM (Semi-supervised Learning with Incomplete Modalities). SLIM utilizes labeled and unlabeled instances with different modalities to obtain a latent consistent prediction representation, and uses this representation to complete the similarity matrix constructed from each modality's features; in this way, a separate classifier for each modality and cluster assignments for the unlabeled instances are obtained simultaneously. Furthermore, considering that SLIM fails to make effective use of the incomplete modal data itself, this thesis proposes a kernel-based semi-supervised multi-modal clustering and classification method, SLIM-K (Kernel SLIM), in which the modal similarity matrix is replaced by a kernel matrix representation that is brought into classifier learning as an optimization variable.

3. Present a complex multi-modal multi-instance multi-label disambiguation learning method that accounts for inconsistent modal data. Besides modal incompleteness, unclear correspondence among modalities is another key factor leading to inconsistent data: traditional modal correspondences require human annotation, which incurs huge labeling overhead and relationship noise. In response, this thesis instead uses consistent bag-level representations of the different modalities, which sidesteps the unclear instance-level relationships, and proposes an end-to-end deep multi-modal multi-instance multi-label disambiguation network, M3DN (Multi-modal Multi-instance Multi-label Deep Network). M3DN uses a multi-instance processing layer to obtain consistent bag-level predictions for the different modalities, while modeling the correlation among labels based on optimal transport theory. In addition, this thesis further extends the supervised M3DN to the semi-supervised setting as M3DNS (Semi-Supervised M3DN), which treats the per-modality prediction probabilities of unlabeled data as soft supervision so that the modalities constrain each other; performance is thereby further improved in the semi-supervised setting.

4. Introduce a strong-modality model reuse learning method that accounts for inconsistent modal information. In the open environment, multi-modal data is usually affected by noise, inherent defects, etc., so different modalities carry different amounts of information, and the predictions of different modalities on the same example may be inconsistent; that is, there exist strong and weak modalities. Building a high-performance model for a weak modality requires more labeled data than for a strong modality, which increases the instance collection and labeling cost. Some methods have therefore been proposed that use strong modalities to assist the training of weak-modality models, but they require the full strong-modality feature representation during the training phase. Taking into account factors such as privacy protection and testing overhead, in practice we usually only obtain a pre-trained strong-modality model. Aiming at this problem, this thesis proposes the FMR (Fixed Model Reuse) method from the perspective of strong-modality model reuse, which comprehensively utilizes the strong-modality model and the label information to implicitly assist weak-modality learning. As a result, during the training phase the auxiliary information of the strong-modality model yields a more effective learner from limited weak-modality data.

5. Present a dynamic weighted multi-modal learning method that accounts for the inconsistency of multi-modal strength relations. Partitioning modalities into strong and weak usually requires prior domain knowledge, and the strong/weak relationships among modalities are also dynamic across instances. Aiming at this problem, this thesis proposes a dynamic weighted multi-modal learning method, CMML (Comprehensive Multi-Modal Learning). On the one hand, CMML uses additional attention networks to adaptively learn weights for the different modalities of each instance and uses the dynamically obtained weights for weighted prediction; on the other hand, it introduces divergence and robust consistency measures based on the modal predictions. Consequently, the proposed method can effectively combine multi-modal information and improve both single-modal and ensemble results.
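To make the serialized-extraction idea of contribution 1 concrete, the following is a minimal sketch of a DMP-style acquisition loop. The function name, the cheapest-first acquisition order, and the fixed confidence threshold are illustrative assumptions for this sketch; in the thesis the selection policy is learned end to end rather than hand-coded.

```python
import numpy as np

def sequential_modal_prediction(views, predictors, costs, confidence_threshold=0.9):
    """Acquire modalities one at a time, cheapest first (assumed policy), and
    stop as soon as the current prediction is confident enough."""
    order = np.argsort(costs)                 # cheaper modalities first
    acquired, total_cost, probs = [], 0.0, None
    for m in order:
        acquired.append(m)
        total_cost += costs[m]
        # each predictor maps the views acquired so far to class probabilities
        probs = predictors[m]([views[i] for i in acquired])
        if probs.max() >= confidence_threshold:
            break                             # confident enough: skip costlier modalities
    return int(np.argmax(probs)), total_cost
```

When the cheap modality is already discriminative, the loop terminates early and the costlier modalities are never collected, which is the cost-saving behaviour the method targets.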
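The similarity-completion step described in contribution 2 can also be sketched briefly. Here a shared consensus prediction representation fills the unobserved entries of one modality's similarity matrix; the function and the inner-product completion rule are simplifying assumptions, not SLIM's actual optimization.

```python
import numpy as np

def complete_similarity(S, observed, consensus):
    """Fill the unobserved entries of a per-modality similarity matrix with
    inner products of a shared consensus prediction representation."""
    filled = consensus @ consensus.T          # consensus-based similarity estimate
    return np.where(observed, S, filled)      # keep observed entries, fill the rest
```

Instances sharing the same consensus assignment receive high filled similarity, so each modality's completed matrix agrees with the cross-modal consensus.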
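Finally, the dynamic weighting of contribution 5 can be illustrated with a toy fusion step: instance-specific attention scores are turned into modality weights, and a divergence-style term measures how far each modality strays from the fused prediction. In CMML the scores come from a learned attention network; here they are taken as given, and the squared-divergence measure is an illustrative stand-in for the thesis's consistency measures.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                           # numerically stable softmax
    e = np.exp(z)
    return e / e.sum()

def attention_weighted_fusion(modal_preds, attention_scores):
    """Fuse per-modality class probabilities with instance-specific attention
    weights and report a divergence-style consistency penalty."""
    weights = softmax(attention_scores)       # one weight per modality
    fused = weights @ modal_preds             # weighted ensemble prediction
    # mean squared divergence of each modality's prediction from the fused one
    divergence = np.mean(np.sum((modal_preds - fused) ** 2, axis=1))
    return fused, weights, divergence
```

Because the weights are computed per instance, a modality that is strong for one example and weak for another is up- or down-weighted dynamically, which is the behaviour the method exploits.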
Keywords/Search Tags:Data mining, Multi-Modal learning, Open Environment, Inconsistent Modal Costs, Inconsistent Modal Representations, Inconsistent Modal Information