
Application Research Of Multimodal Learning Based On Deep Spectral Kernel Network

Posted on: 2023-09-12
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Li
Full Text: PDF
GTID: 2568307061450284
Subject: Cyberspace security
Abstract/Summary:
With the development of the Internet, personalized social platforms have gradually come to influence daily life, and the management of online information content has drawn wide attention from Party and government organs and from enterprises. Media data of all types, such as audio, text, images, and video, has grown explosively. Classical unimodal data analysis techniques are not sufficient to analyze such multimodal data effectively, and the needs of cyber security strategy have opened the era of multimodal analysis technology, so multimodal learning has attracted the attention of researchers. Multimodal emotion recognition and crossmodal retrieval are two important topics in multimodal learning, with broad application prospects in fields such as network public opinion monitoring and network information management.

In emotion recognition, it is necessary to capture the interactive information among modalities; in unaligned multimodal data, the different sampling frequencies of the modalities give rise to temporal characteristics. In crossmodal retrieval, bridging the heterogeneity of multimodal data and effectively measuring the similarity of samples are the basis of accurate and comprehensive retrieval results, and crossmodal subspace learning is one of the challenges. Mining the temporal interaction information among modalities, representing arbitrary high-order derivatives, and constructing the subspace are urgent problems to be solved in the field of multimodal learning.

The deep spectral kernel network (DSKN) is a new deep learning architecture. By combining spectral kernels with deep learning, it reveals input-dependent dynamic characteristics and long-range global characteristics, and achieves better performance than existing kernel methods and deep neural networks. DSKN brings new ideas for solving the multimodal learning problems above, so its application to multimodal learning is of great research value. The main contributions of this thesis are as follows:

(1) An emotion recognition model for unaligned multimodal data based on DSKN is proposed. The model fully exploits the time-series characteristics within and among modalities, and establishes an effective semantic alignment mechanism among them. On the one hand, the deep spectral kernel network is embedded into the attention mechanism, and a deep kernel crossmodal attention and a deep kernel crossmodal Transformer are constructed to address the fact that existing models ignore the characteristics of temporal interaction. On the other hand, in the semantic alignment stage, a text-centered semantic alignment strategy between text and image and between text and audio is constructed to reduce the information redundancy and computational complexity of the semantic alignment operation in existing models. Comparative experiments on the emotion recognition dataset verify that the model outperforms existing models, and ablation experiments verify the advantages of the deep spectral kernel network for unaligned multimodal emotion recognition.

(2) A crossmodal retrieval model based on DSKN is proposed. The model explores the data distribution of the modalities and the supervision information from labels, and constructs an effective crossmodal subspace mapping method. First, a maximum mean discrepancy (MMD) loss is explicitly constructed by embedding the deep kernel; this overcomes the defect that the loss function can only measure low-order or local statistics, and constrains the distributions of the different modalities to be consistent. Then, the supervision information of labels is introduced to construct a semantic structure discrimination loss, which constrains the feature mapping of samples. Finally, in order to improve the quality of the retrieval vectors and preserve the inherent characteristics of each modality, the mutual information between the input and the output of each sub-network is maximized. Comparative experiments on the crossmodal retrieval dataset verify the improvement in retrieval performance, and ablation experiments verify the rationality of the crossmodal subspace construction.
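
As a rough illustration of the distribution-consistency idea in contribution (2), the sketch below computes a biased estimate of the squared maximum mean discrepancy between two sets of feature vectors. It is only a minimal sketch under stated assumptions: a fixed Gaussian kernel stands in for the thesis's deep spectral kernel, whose exact form is not given in this abstract, and the names gaussian_kernel, mmd_squared, image_feats, and the text_feats_* arrays are hypothetical, with random data in place of real modality features.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=2.0):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b.

    Assumption: this fixed kernel is a placeholder for a learned deep kernel.
    """
    # Pairwise squared Euclidean distances via the expansion |a|^2 + |b|^2 - 2ab.
    sq_dists = (
        np.sum(a * a, axis=1)[:, None]
        + np.sum(b * b, axis=1)[None, :]
        - 2.0 * (a @ b.T)
    )
    # Clip tiny negative values caused by floating-point round-off.
    sq_dists = np.maximum(sq_dists, 0.0)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def mmd_squared(x, y, kernel=gaussian_kernel):
    """Biased estimate of squared MMD: E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)]."""
    k_xx = kernel(x, x)
    k_yy = kernel(y, y)
    k_xy = kernel(x, y)
    return float(k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean())


# Hypothetical embeddings of two modalities in a shared 8-dimensional subspace.
rng = np.random.default_rng(0)
image_feats = rng.normal(0.0, 1.0, size=(200, 8))
text_feats_same = rng.normal(0.0, 1.0, size=(200, 8))     # same distribution
text_feats_shifted = rng.normal(1.5, 1.0, size=(200, 8))  # shifted distribution

mmd_same = mmd_squared(image_feats, text_feats_same)
mmd_shifted = mmd_squared(image_feats, text_feats_shifted)
print(f"MMD^2, matching distributions: {mmd_same:.4f}")
print(f"MMD^2, shifted distributions:  {mmd_shifted:.4f}")
```

Used as a training loss, a quantity of this kind would be minimized so that the mapped features of different modalities follow similar distributions. With a characteristic kernel such as the Gaussian, the discrepancy compares the distributions as a whole rather than only their means or covariances, which is the property the abstract contrasts with losses limited to low-order or local statistics.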
Keywords/Search Tags: Deep Spectral Kernel Network, Multimodal Learning, Cyber Science