
A Study Of Depression Recognition Model Based On Multimodal Fusion Few-shot Learning

Posted on: 2024-08-16  Degree: Master  Type: Thesis
Country: China  Candidate: Y J Xia  Full Text: PDF
GTID: 2544307121983759  Subject: Computer application technology
Abstract/Summary:
With the continued development of information technology, depression recognition has become a key research topic. Depression analysis and recognition based on a single modality is now relatively mature, but as the available information grows richer, a single modality can hardly provide a global representation. Depression should therefore be analyzed from multidimensional information, and recognition based on multiple modalities has become the mainstream research direction. To address the prediction accuracy and training complexity problems of existing depression recognition methods, this thesis improves depression recognition in the following two directions.

(1) Existing depression recognition methods based on single modalities such as voice, text, and video do not consider the interaction of information in the spatio-temporal dimension or the fusion of multimodal information. This thesis proposes a fusion recognition method that uses the information interactions among three modalities to generate a global fusion representation. First, drawing on transfer learning, a multivariate regression model is used to pre-train the multimodal fusion weight parameters, which addresses the training difficulties in fusion. Then, the consistency of the time steps across the three modalities is exploited for dimension capture to achieve modal alignment, so that the influence of the time step on the modal information is fully utilized. For the overall fusion, a hybrid fusion model is used, and a multilayer Highway network extracts high-level features from each modality separately. Finally, to fully characterize the global multimodal fusion information, multimodal fusion is performed by a BiLSTM, which produces the prediction results. In experiments on the Distress Analysis Interview Corpus Wizard-of-Oz (DAIC-WOZ) dataset, the results show that the recognition accuracy of the three modalities after the above operations is 80% higher than the baseline level.

(2) To address the accuracy degradation caused by incomplete training on few-shot datasets in the multimodal fusion recognition method above, this thesis further proposes a multimodal depression recognition framework based on few-shot learning. First, each final multimodal global representation is treated as an initial sample node of the graph neural network used in the few-shot learning approach, and the training and test sets are divided into support and query sets, respectively, following the N-way K-shot classification strategy for few-shot datasets. Next, the few-shot node classification method GCN updates the graph representation after each iteration, and the graph convolution operation passes the support set node features to the query set nodes, which completes the learning of the query set node labels and yields the recognition results. Finally, the multimodal fusion module is embedded into the few-shot learning module to identify depression. This study not only solves the multimodal fusion problem but also effectively improves the generalization performance of few-shot learning. The method achieves relatively good accuracy on DAIC-WOZ, a publicly available depression dataset, and the final recognition results far exceed the baseline level, with a 13% improvement in precision compared to the multimodal depression recognition model without the few-shot learning module. This demonstrates the strong generalizability of the model for few-shot medical data processing.
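
The support/query split and the passing of support information to query nodes described in (2) can be illustrated with a small sketch. It is not the thesis implementation: the function names `sample_episode` and `propagate`, the toy two-dimensional features, the Gaussian similarity used as edge weights, and the single untrained propagation step (a GCN layer with its learned weight matrix left out) are all assumptions made for illustration.

```python
import math
import random
from collections import defaultdict


def sample_episode(labels, n_way, k_shot, n_query, rng):
    """Draw one N-way K-shot episode: class list, support indices, query indices."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    classes = rng.sample(sorted(by_class), n_way)
    support, query = [], []
    for c in classes:
        picked = rng.sample(by_class[c], k_shot + n_query)
        support.extend(picked[:k_shot])   # labelled nodes
        query.extend(picked[k_shot:])     # nodes whose labels are to be inferred
    return classes, support, query


def propagate(features, labels, classes, support, query):
    """One row-normalized propagation step from all graph nodes to each query node."""
    nodes = support + query
    uniform = [1.0 / len(classes)] * len(classes)
    # Label signal: one-hot for support nodes, uniform (unknown) for query nodes.
    signal = [[1.0 if labels[i] == c else 0.0 for c in classes] for i in support]
    signal += [uniform for _ in query]
    preds = {}
    for q in query:
        # Edge weights from feature similarity (assumed Gaussian kernel, self-loop included).
        weights = [
            math.exp(-sum((a - b) ** 2 for a, b in zip(features[q], features[j])))
            for j in nodes
        ]
        total = sum(weights)
        scores = [
            sum(w * s[k] for w, s in zip(weights, signal)) / total
            for k in range(len(classes))
        ]
        preds[q] = classes[scores.index(max(scores))]
    return preds


# Toy stand-ins for fused multimodal representations (hypothetical values).
features = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [0.0, 0.3],
            [5.0, 5.1], [5.2, 5.0], [5.1, 4.9], [4.9, 5.2]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

rng = random.Random(0)
classes, support, query = sample_episode(labels, n_way=2, k_shot=2, n_query=2, rng=rng)
preds = propagate(features, labels, classes, support, query)
```

In a trained model the propagation would be followed by a learned linear map and nonlinearity, and repeated over several layers; the sketch keeps only the step in which query nodes aggregate label information from similar support nodes.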
Keywords/Search Tags:Multimodal fusion, Few-shot learning, Depression recognition