
Research On Deep Multimodal Fusion Techniques And Time Series Analysis Algorithms

Posted on: 2021-04-27 | Degree: Master | Type: Thesis
Country: China | Candidate: S D Dai | Full Text: PDF
GTID: 2428330629452635 | Subject: Communication and Information System

Abstract/Summary:
The way people perceive and interact with their surroundings is multimodal, built on the senses of vision, hearing, touch, smell, and taste. For artificial intelligence to understand the world around us better, it must therefore interpret multimodal signals. In the era of artificial intelligence 2.0, an important open question is how to draw on the mechanisms of the human brain to process multimedia data that are heterogeneous in structure yet related in semantics. This thesis focuses on multimodal data fusion with deep neural networks and on cross-modal representation.

In the narrow sense, "multimodal" refers to the different human senses, for example sight and the corresponding image data, or hearing and the corresponding acoustic data. In the general sense, it refers to data collected through multiple methods. Cross-media intelligence, driven by multimodal machine learning, faces two main problems: the semantic gap and the heterogeneity gap. The semantic gap arises from the difference between the computer representation of an image and the semantic concepts understood by human beings, while the heterogeneity gap concerns the differences between the representations of different modalities, such as vision and speech.

The main work of this thesis is as follows:

(1) The R-DCCA method is proposed to address the overfitting problem and the heterogeneity gap in multimodal deep learning. Because traditional feature extraction methods rely heavily on a priori knowledge, deep neural networks are used to extract features instead of manual feature engineering. DCCA applies the deep non-linear mappings of these networks to project multimodal datasets from sample space into feature space (the correlation objective is sketched in the first code example below). Since deep networks are prone to overfitting, which weakens the representation ability of the model, R-DCCA introduces random links on top of the deep networks and combines them through ensemble methods, improving the generalization ability of the network (a schematic random-link readout follows the abstract). The proposed method performs well in terms of generalization.

(2) The GBDT-KF algorithm is proposed to handle noise in time series, using extra features to achieve higher robustness. Raw data usually contain noise and interference, and deep networks easily overfit to them, so filtering the noise out of the series improves the fitting ability of the algorithm. A Kalman filter is used to smooth the data, improving precision and limiting overfitting, and a sliding-window variant (C-GBDT) is proposed to save training time. The GBDT algorithm then performs data fusion at the decision level, and the combined method achieves better generalization with lower training cost (a minimal pipeline is sketched in the last code example below).

(3) The proposed methods are evaluated on multimodal datasets: the multimodal sentiment analysis dataset MOSI and a dataset of mobile base-station server logs. The experiments run on an Intel® Xeon E4 processor under Ubuntu 16.04 LTS, with the algorithms implemented in Python 3.7. The results show that the proposed R-DCCA and GBDT-KF methods fit the requirements of multimodal data processing well: both achieve good generalization performance and accomplish the task of multimodal representation learning.
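As background for (1): DCCA trains the two modality networks so that the canonical correlation between their output features is maximized. The following numpy sketch computes that correlation objective for given network outputs. It is a generic illustration of the standard DCCA objective, not the thesis code; the regularization constant reg and the matrix shapes are assumptions.

```python
import numpy as np

def cca_correlation(H1, H2, reg=1e-4):
    """Total canonical correlation between two views.

    H1, H2: (n_samples, d) outputs of the two modality networks.
    reg: small ridge term for numerical stability (an assumed value).
    """
    n = H1.shape[0]
    H1c = H1 - H1.mean(axis=0)                      # center each view
    H2c = H2 - H2.mean(axis=0)
    S11 = H1c.T @ H1c / (n - 1) + reg * np.eye(H1.shape[1])
    S22 = H2c.T @ H2c / (n - 1) + reg * np.eye(H2.shape[1])
    S12 = H1c.T @ H2c / (n - 1)
    # Whiten both views, then take singular values of the coupling matrix;
    # the singular values are the canonical correlations.
    K1 = np.linalg.inv(np.linalg.cholesky(S11))
    K2 = np.linalg.inv(np.linalg.cholesky(S22))
    T = K1 @ S12 @ K2.T
    return np.linalg.svd(T, compute_uv=False).sum()
```

In DCCA this quantity (or its trace-norm variant) is differentiated with respect to the network outputs and back-propagated, so both networks learn mappings that pull the two modalities into a correlated feature space.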
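The "random link" component of R-DCCA is only described at a high level in the abstract; the keywords point to RVFL, so the following is a minimal sketch of a standard random vector functional-link readout, with a fixed random hidden layer, direct input-to-output links, and a closed-form ridge solution. The class name, hidden size, and ridge strength are illustrative assumptions, not details from the thesis.

```python
import numpy as np

class RVFL:
    """Random vector functional-link readout: the hidden weights stay
    random and untrained; only the output weights are fitted."""

    def __init__(self, n_hidden=256, ridge=1e-2, seed=0):
        self.n_hidden, self.ridge = n_hidden, ridge
        self.rng = np.random.default_rng(seed)

    def _features(self, X):
        H = np.tanh(X @ self.W + self.b)   # random, untrained hidden layer
        return np.hstack([X, H])           # direct links concatenated with H

    def fit(self, X, Y):
        d = X.shape[1]
        self.W = self.rng.standard_normal((d, self.n_hidden)) / np.sqrt(d)
        self.b = self.rng.standard_normal(self.n_hidden)
        F = self._features(X)
        # Ridge-regularized least squares for the only trained weights.
        self.beta = np.linalg.solve(F.T @ F + self.ridge * np.eye(F.shape[1]),
                                    F.T @ Y)
        return self

    def predict(self, X):
        return self._features(X) @ self.beta
```

Because only the output weights are solved for, several such readouts with different random seeds are cheap to train and ensemble, which matches the abstract's claim of improved generalization.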
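For (2), a minimal sketch of the GBDT-KF idea under stated assumptions: a scalar random-walk Kalman filter denoises the series, fixed-width sliding windows over the smoothed series form the features, and scikit-learn's GradientBoostingRegressor stands in for the GBDT stage. The noise levels, window width, GBDT hyperparameters, and the synthetic data are all illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def kalman_smooth(z, q=1e-4, r=1e-1):
    """Scalar random-walk Kalman filter used purely as a denoiser.
    q, r: assumed process/measurement noise variances."""
    x, p = z[0], 1.0
    out = np.empty_like(z, dtype=float)
    for t, zt in enumerate(z):
        p += q                      # predict step (random-walk state model)
        k = p / (p + r)             # Kalman gain
        x += k * (zt - x)           # update with the new measurement
        p *= (1 - k)
        out[t] = x
    return out

def sliding_windows(series, width):
    """Turn a 1-D series into (window, next-value) supervised pairs."""
    X = np.lib.stride_tricks.sliding_window_view(series[:-1], width)
    y = series[width:]
    return X, y

# Smooth first, then fit a GBDT on windows of the smoothed series.
noisy = np.sin(np.linspace(0, 20, 500)) \
        + 0.3 * np.random.default_rng(0).standard_normal(500)
smooth = kalman_smooth(noisy)
X, y = sliding_windows(smooth, width=16)
model = GradientBoostingRegressor(n_estimators=200, max_depth=3).fit(X, y)
pred = model.predict(X[-1:])        # one-step-ahead forecast
```

The sliding window bounds how much history each training example carries, which is consistent with the abstract's claim that the windowed variant saves training time.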
Keywords/Search Tags:multimodal machine learning, multimodal data fusion, time series analysis, DCCA, RVFL, GBDT, Kalman Filter