
Research On Deep Multimodal Fusion Techniques And Time Series Analysis Algorithms

Posted on: 2021-04-27 | Degree: Master | Type: Thesis
Country: China | Candidate: S D Dai | Full Text: PDF
GTID: 2428330629452635 | Subject: Communication and Information System

Abstract/Summary:
The way people perceive and interact with their surroundings is multimodal, built on the senses of vision, hearing, touch, smell, and taste. For artificial intelligence to understand the world around us better, it must therefore interpret multimodal signals. In the era of artificial intelligence 2.0, an important open question is how to draw on the mechanisms of the human brain to process multimedia data that are heterogeneous in structure yet related in semantics. This thesis focuses on multimodal data fusion with deep neural networks and on cross-modal representation.

In the narrow sense, "multimodal" refers to the different human senses, for example sight and the corresponding image data, or hearing and the corresponding acoustic data. In the general sense, it refers to data collected through multiple methods. Cross-media intelligence, driven by multimodal machine learning, faces two main problems: the semantic gap and the heterogeneity gap. The semantic gap arises from the difference between the computer representation of an image and the semantic concepts understood by human beings, while the heterogeneity gap concerns the differences between the representations of different modalities, such as vision and speech.

The main work of this thesis is as follows:

(1) The R-DCCA method is proposed to address the overfitting problem and the heterogeneity gap in multimodal deep learning. Because traditional feature extraction methods rely heavily on a priori knowledge, deep neural networks are used to extract features instead of manual feature engineering. DCCA applies the deep non-linear mappings of these networks to project multimodal datasets from sample space into feature space (the correlation objective is sketched in the first code example below). Since deep networks are prone to overfitting, which weakens the representation ability of the model, R-DCCA introduces random links on top of the deep networks and combines them through ensemble methods, improving the generalization ability of the network (a schematic random-link readout follows the abstract). The proposed method performs well in terms of generalization.

(2) The GBDT-KF algorithm is proposed to handle noise in time series, using extra features to achieve higher robustness. Raw data usually contain noise and interference, and deep networks easily overfit to them, so filtering the noise out of the series improves the fitting ability of the algorithm. A Kalman filter is used to smooth the data, improving precision and limiting overfitting, and a sliding-window variant (C-GBDT) is proposed to save training time. The GBDT algorithm then performs data fusion at the decision level, and the combined method achieves better generalization with lower training cost (a minimal pipeline is sketched in the last code example below).

(3) The proposed methods are evaluated on multimodal datasets: the multimodal sentiment analysis dataset MOSI and a dataset of mobile base-station server logs. The experiments run on an Intel® Xeon E4 processor under Ubuntu 16.04 LTS, with the algorithms implemented in Python 3.7. The results show that the proposed R-DCCA and GBDT-KF methods fit the requirements of multimodal data processing well: both achieve good generalization performance and accomplish the task of multimodal representation learning.
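As background for (1): DCCA trains the two modality networks so that the canonical correlation between their output features is maximized. The following numpy sketch computes that correlation objective for given network outputs. It is a generic illustration of the standard DCCA objective, not the thesis code; the regularization constant reg and the matrix shapes are assumptions.

```python
import numpy as np

def cca_correlation(H1, H2, reg=1e-4):
    """Total canonical correlation between two views.

    H1, H2: (n_samples, d) outputs of the two modality networks.
    reg: small ridge term for numerical stability (an assumed value).
    """
    n = H1.shape[0]
    H1c = H1 - H1.mean(axis=0)                      # center each view
    H2c = H2 - H2.mean(axis=0)
    S11 = H1c.T @ H1c / (n - 1) + reg * np.eye(H1.shape[1])
    S22 = H2c.T @ H2c / (n - 1) + reg * np.eye(H2.shape[1])
    S12 = H1c.T @ H2c / (n - 1)
    # Whiten both views, then take singular values of the coupling matrix;
    # the singular values are the canonical correlations.
    K1 = np.linalg.inv(np.linalg.cholesky(S11))
    K2 = np.linalg.inv(np.linalg.cholesky(S22))
    T = K1 @ S12 @ K2.T
    return np.linalg.svd(T, compute_uv=False).sum()
```

In DCCA this quantity (or its trace-norm variant) is differentiated with respect to the network outputs and back-propagated, so both networks learn mappings that pull the two modalities into a correlated feature space.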
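The "random link" component of R-DCCA is only described at a high level in the abstract; the keywords point to RVFL, so the following is a minimal sketch of a standard random vector functional-link readout, with a fixed random hidden layer, direct input-to-output links, and a closed-form ridge solution. The class name, hidden size, and ridge strength are illustrative assumptions, not details from the thesis.

```python
import numpy as np

class RVFL:
    """Random vector functional-link readout: the hidden weights stay
    random and untrained; only the output weights are fitted."""

    def __init__(self, n_hidden=256, ridge=1e-2, seed=0):
        self.n_hidden, self.ridge = n_hidden, ridge
        self.rng = np.random.default_rng(seed)

    def _features(self, X):
        H = np.tanh(X @ self.W + self.b)   # random, untrained hidden layer
        return np.hstack([X, H])           # direct links concatenated with H

    def fit(self, X, Y):
        d = X.shape[1]
        self.W = self.rng.standard_normal((d, self.n_hidden)) / np.sqrt(d)
        self.b = self.rng.standard_normal(self.n_hidden)
        F = self._features(X)
        # Ridge-regularized least squares for the only trained weights.
        self.beta = np.linalg.solve(F.T @ F + self.ridge * np.eye(F.shape[1]),
                                    F.T @ Y)
        return self

    def predict(self, X):
        return self._features(X) @ self.beta
```

Because only the output weights are solved for, several such readouts with different random seeds are cheap to train and ensemble, which matches the abstract's claim of improved generalization.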
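For (2), a minimal sketch of the GBDT-KF idea under stated assumptions: a scalar random-walk Kalman filter denoises the series, fixed-width sliding windows over the smoothed series form the features, and scikit-learn's GradientBoostingRegressor stands in for the GBDT stage. The noise levels, window width, GBDT hyperparameters, and the synthetic data are all illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def kalman_smooth(z, q=1e-4, r=1e-1):
    """Scalar random-walk Kalman filter used purely as a denoiser.
    q, r: assumed process/measurement noise variances."""
    x, p = z[0], 1.0
    out = np.empty_like(z, dtype=float)
    for t, zt in enumerate(z):
        p += q                      # predict step (random-walk state model)
        k = p / (p + r)             # Kalman gain
        x += k * (zt - x)           # update with the new measurement
        p *= (1 - k)
        out[t] = x
    return out

def sliding_windows(series, width):
    """Turn a 1-D series into (window, next-value) supervised pairs."""
    X = np.lib.stride_tricks.sliding_window_view(series[:-1], width)
    y = series[width:]
    return X, y

# Smooth first, then fit a GBDT on windows of the smoothed series.
noisy = np.sin(np.linspace(0, 20, 500)) \
        + 0.3 * np.random.default_rng(0).standard_normal(500)
smooth = kalman_smooth(noisy)
X, y = sliding_windows(smooth, width=16)
model = GradientBoostingRegressor(n_estimators=200, max_depth=3).fit(X, y)
pred = model.predict(X[-1:])        # one-step-ahead forecast
```

The sliding window bounds how much history each training example carries, which is consistent with the abstract's claim that the windowed variant saves training time.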
Keywords/Search Tags:multimodal machine learning, multimodal data fusion, time series analysis, DCCA, RVFL, GBDT, Kalman Filter