
Application For Homologous And Heterogeneous Multimodal Data Based On Multiple Deep Learning Blocks

Posted on: 2020-07-28
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Wu
GTID: 2428330575454994
Subject: Computer Science and Technology
Abstract/Summary:
With the development of technology, data collection and application have become complex and diversified. Multi-modal data is collected from multiple sources or described from multiple perspectives. Data obtained from different sources is called heterogeneous multi-modal data; data from a single source that can be described from multiple perspectives is called homologous multi-modal data. Research on how to extract features from multi-modal data and use them is called multi-modal learning. Recently, with the improvement of computer performance, deep learning technology has matured. To extract discriminative features from multi-modal data, many researchers have combined multiple deep learning blocks into multi-modal methods tailored to different data characteristics and application scenarios. However, some problems remain, such as the inconsistency of data modalities, the complex representation of data, and the lack of information. To address these problems, we study them from the following aspects:

1. Modal Inconsistency of Heterogeneous Multimodal Data. Most existing methods exploit modal consistency to reduce the complexity of the learning problem. Modal consistency refers to the consistency of content between the different modal instances of the same object, which requires modal completeness. However, due to data collection failures, deficiencies, and data privacy, multi-modal data is often incomplete. Moreover, even complete instances may contain inconsistent anomalies, that is, inconsistent characteristics or inconsistency across all modalities. These problems jointly lead to the inconsistency problem. Therefore, we propose a deep robust multi-modal network (DRUMN) based on deep energy blocks. Using deep auto-encoder blocks, we address modal incompleteness by maximizing the consistency among the homogeneous multi-modalities. We then adopt an adaptive weight estimation method based on deep energy blocks to eliminate the inconsistent anomalies. As a result, DRUMN can extract discriminative feature representations for each modality despite the insufficiency caused by incompleteness and inconsistent anomalies.

2. Complex Representation of Heterogeneous Multimodal Data. Previous methods assume that heterogeneous multi-modal data is consistent at the instance level, while in real applications the raw data is disordered, i.e., one article consists of a variable number of inconsistent text and image instances. Existing methods are not well suited to such problems. We propose a novel Multi-modal Multi-instance Multi-label Deep Network (M3DN), which learns label prediction and exploits label correlation simultaneously based on optimal transport.

3. Information Lack of Homologous Multimodal Data. Heterogeneous multi-modal data has multiple sources, which means more information is available for mining. In contrast, homologous multi-modal data carries little information because it comes from a single source. To address this lack of information, we can mine the data from multiple perspectives, which is the foundation of homologous multi-modal learning. Homologous multi-modal learning provides a better way to extract data features, and it can be used to improve model performance in different application scenarios. For example, in short text matching the data consists only of short text pairs, while previous text matching algorithms are mostly based on a single perspective, so model performance is limited by the lack of information. We therefore mine the information in text data from multiple perspectives and levels based on homologous multi-modal learning. We propose a dual heterogeneous network with enhanced local interaction (ELOI), which learns text matching from both local and global perspectives, considering direct and enhanced interaction simultaneously.
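The optimal-transport idea mentioned for M3DN in item 2 can be illustrated with a small, self-contained sketch. It is not the thesis's implementation: the Sinkhorn iteration used here is a standard solver for entropy-regularised optimal transport, chosen as an assumption, and the function name `sinkhorn`, the label set, and the cost values are all hypothetical. The sketch shows how a ground-cost matrix between labels lets a loss take label correlation into account: predicting a label that is close to the true one costs less than predicting an unrelated one.

```python
import math


def sinkhorn(a, b, cost, reg=0.1, n_iters=200):
    """Entropy-regularised optimal transport between two histograms.

    a, b  : lists of non-negative weights, each summing to 1
    cost  : cost[i][j] is the ground cost of moving mass from bin i to bin j
    reg   : entropic regularisation strength (hypothetical default)
    Returns the transport plan and its total transport cost.
    """
    n, m = len(a), len(b)
    # Gibbs kernel derived from the ground cost.
    K = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    total = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return plan, total


# Hypothetical label set and ground costs: "cat" and "dog" are treated as
# related labels (low cost), while "car" is unrelated to both (high cost).
labels = ["cat", "dog", "car"]
cost = [
    [0.0, 0.2, 1.0],
    [0.2, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]

predicted = [0.7, 0.2, 0.1]   # model output, mostly "cat"
truth_dog = [0.0, 1.0, 0.0]   # true label is "dog"
truth_car = [0.0, 0.0, 1.0]   # true label is "car"

_, loss_dog = sinkhorn(predicted, truth_dog, cost)
_, loss_car = sinkhorn(predicted, truth_car, cost)
print("OT loss when the truth is dog:", loss_dog)
print("OT loss when the truth is car:", loss_car)
```

With these assumed costs, the same prediction is penalised less when the true label is "dog" than when it is "car", which is the kind of label-correlation awareness a transport-based loss can provide. In a learned model the cost matrix could itself be estimated from data, but that step is outside this sketch.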
Keywords/Search Tags: Machine Learning, Deep Learning, Multi-Modal Learning, Robust, Anomaly Detection, Multi-Instance Multi-Label, Short Text Matching