
Research On Aided Diagnosis Of Depression Based On Audio And Text Dual-Modal Recognition

Posted on: 2023-05-22
Degree: Master
Type: Thesis
Country: China
Candidate: Z L Wang
Full Text: PDF
GTID: 2544307172457614
Subject: Circuits and Systems
Abstract/Summary:
Depression is a highly harmful mental disease, and its incidence has risen in recent years. Many patients suffer greatly from it, and early psychological intervention combined with drug treatment is the most effective remedy. However, depression has no clear physiological indicator for diagnosis, so research has focused on finding feasible and effective recognition technologies to assist diagnosis. Work in the speech field shows that speech is feasible for depression recognition, and it offers small data volume, low invasiveness, and easy deployment, making it suitable for assisting doctors with preliminary diagnosis. To address the limitations of single-modality audio recognition, this thesis proposes a dual-modal depression recognition model that combines audio information with text (semantic) information, making full use of their complementarity to obtain better model performance, and builds a dual-task classification-and-regression model to provide richer information for depression recognition systems.

First, to address the limited representation ability of a single feature group, this thesis extracts both global and local features for each modality: for audio, high-level statistical features carrying global information and Mel-spectrogram features carrying local information; for text, sentence-level features carrying global information and word-level features carrying local information, both obtained from a pre-trained BERT-base model. Then, because the local audio and text features are high-dimensional and temporal, a Bi-directional Long Short-Term Memory (Bi-LSTM) network is introduced to further extract context information. Single-modality tests show that fusing multiple feature groups yields better model performance.

Finally, to address the weak correlation among multi-modal features, this thesis proposes a dual-modal local attention mechanism that automatically aligns and attends to the temporal features extracted by the deep Bi-LSTM network, improving the correlation between modalities. At the same time, a feature-level fusion strategy is adopted to keep the model from becoming overly complicated, yielding a dual-modal depression recognition model with hierarchical attention feature fusion. Experimental results show good performance: an F1 score of 0.69 on the classification task, and a mean absolute error of 4.89 and a root mean square error of 5.99 on the regression task.

Synthesizing these key points of audio-text dual-modal depression recognition, this thesis also designs a dual-modal depression recognition system for practical application scenarios. The system embeds the proposed dual-modal dual-task model and runs on Linux, with three main functional areas: the interviewee area, the information management area, and the back-end management area. The system saves the interviewee's audio, converts it into text through a background service, and, after processing, uses both as the dual-modal input. The validity of the model and the applicability of the system are verified by testing with audio from the test set and from external input.
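The dual-modal local attention and feature-level fusion steps described above can be sketched in plain NumPy. This is a minimal illustrative sketch, not the thesis's actual implementation: the feature dimensions, scaled-dot-product form of the attention, mean pooling, and the linear classification/regression heads are all assumptions introduced here for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(audio_feats, text_feats):
    """Align text time steps to each audio time step (assumed form of the
    dual-modal local attention). audio_feats: (T_a, d); text_feats: (T_t, d).
    Returns a text representation aligned to the audio timeline, shape (T_a, d)."""
    d = audio_feats.shape[1]
    scores = audio_feats @ text_feats.T / np.sqrt(d)   # (T_a, T_t) similarity
    weights = softmax(scores, axis=-1)                 # each row sums to 1
    return weights @ text_feats                        # attended text features

def fuse_and_predict(audio_feats, text_feats, w_cls, w_reg):
    """Feature-level fusion followed by a dual-task head (both heads are
    hypothetical linear layers for illustration)."""
    aligned_text = cross_modal_attention(audio_feats, text_feats)
    # Feature-level fusion: pool each modality over time, then concatenate.
    fused = np.concatenate([audio_feats.mean(axis=0), aligned_text.mean(axis=0)])
    cls_logit = fused @ w_cls   # classification task: depressed vs. not
    reg_score = fused @ w_reg   # regression task: severity score
    return cls_logit, reg_score
```

In a trained model the Bi-LSTM outputs would play the role of `audio_feats` and `text_feats`, and the linear heads would be learned jointly on the classification and regression losses.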
Keywords/Search Tags:Depression, Speech analysis, Dual-modal fusion, Depression recognition system