| Commonly used screening methods for cardiovascular diseases currently include electrocardiogram (ECG), cardiac auscultation, echocardiography, and X-ray. Different types of medical data provide different perspectives. For example, an electrocardiogram can be used to detect abnormalities in heart rhythm and electrical activity, while a phonocardiogram (PCG) obtained through an electronic stethoscope can be used to detect abnormalities in heart valves and blood flow. However, most current automatic detection methods for cardiovascular diseases target only a single type of medical data and cannot comprehensively assess the status and severity of cardiovascular diseases. Therefore, how to effectively integrate different types and modalities of medical data for accurate cardiovascular disease detection is currently a research focus. In particular, cardiovascular disease detection based on ECG and heart sound signals has become a hot research topic because it is non-invasive, low-cost, and capable of real-time operation. However, research on integrating the two is still limited. In this study, we investigate multimodal data fusion methods for cardiovascular disease detection based on ECG and heart sound signals, and the main findings are as follows: (1) Existing multimodal data fusion methods often require a separate feature extraction model to be designed for each type of data, which lacks flexibility. Therefore, in this thesis, we design a feature extraction model that can adapt to both ECG and heart sound signals based on their common characteristics. Traditional single-scale feature extraction methods analyze signals from only one specific perspective and cannot fully describe their diversity and complexity; to address this limitation, we propose a deep dual-scale residual network from a multiscale perspective. Through the dual-scale feature aggregation module in this network, the features of the signals are decomposed and
feature extraction is performed at different scales; the results are then combined to generate more accurate and richer feature representations. Traditional neural network models cannot dynamically adjust the weight parameters in the model for different input data. To address this issue, we propose a squeeze-and-excitation attention convolutional network in this thesis, in which the squeeze-and-excitation attention mechanism adaptively adjusts the importance of the feature maps of each channel, thereby improving the feature extraction and classification performance of the model. (2) Existing multimodal data fusion methods generally adopt direct concatenation, but features fused from different types of data may be redundant or complementary, so this approach may result in overfitting or underutilization of the information provided by the data. To address these issues, this thesis proposes a recursive feature elimination model to reduce the redundancy between features and increase the complementarity among them. To tackle the difficulty that existing multimodal feature selection methods have in capturing complex dependencies, this thesis also proposes a temporal pattern attention network. This model combines bidirectional long short-term memory networks with a temporal pattern attention mechanism, which can effectively capture complex relationships between different modalities and time steps while adaptively assigning weights to each data modality, enabling efficient feature selection and classification of the fused electrocardiogram and phonocardiogram signals. Experimental results demonstrate that the performance of the model is significantly improved after feature fusion, with an AUC value surpassing that of any single-modality data. Moreover, compared with the temporal pattern attention network, the recursive feature elimination model achieves better classification performance, with an AUC value of 0.962. |
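
The squeeze-and-excitation mechanism mentioned in the abstract can be illustrated with a minimal sketch. This is not the network proposed in the thesis. It shows only the generic squeeze, excitation, and scale steps on a one-dimensional feature map stored as a list of channels. The function name `squeeze_excite`, the weight matrices `w1` and `w2`, the reduction ratio, and the random untrained weights are all assumptions made for illustration.

```python
import math
import random


def squeeze_excite(feature_maps, w1, w2):
    """Reweight the channels of a 1-D feature map (channels x time steps).

    Hypothetical helper for illustration; w1 and w2 stand in for the
    learned weights of the two fully connected layers.
    """
    # Squeeze: global average pooling over time gives one scalar per channel.
    z = [sum(ch) / len(ch) for ch in feature_maps]
    # Excitation, step 1: bottleneck fully connected layer followed by ReLU.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    # Excitation, step 2: expand back to one gate per channel; the sigmoid
    # keeps every gate strictly between 0 and 1.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every sample of a channel by that channel's gate.
    scaled = [[g * x for x in ch] for g, ch in zip(gates, feature_maps)]
    return scaled, gates


# Toy input: 4 channels, 8 time steps, reduction ratio 2 (all assumed values).
rng = random.Random(0)
channels, steps, reduction = 4, 8, 2
features = [[rng.uniform(-1, 1) for _ in range(steps)] for _ in range(channels)]
w1 = [[rng.uniform(-1, 1) for _ in range(channels)]
      for _ in range(channels // reduction)]
w2 = [[rng.uniform(-1, 1) for _ in range(channels // reduction)]
      for _ in range(channels)]

scaled, gates = squeeze_excite(features, w1, w2)
print([round(g, 3) for g in gates])
```

Because the gates are computed from the input itself, two different signals passed through the same weights receive different channel weightings, which is the input-dependent adjustment the abstract describes. In a trained network, `w1` and `w2` would be learned jointly with the convolutional layers rather than drawn at random.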