| IgA nephropathy (immunoglobulin A nephropathy, IgAN) is the most common primary glomerulonephritis worldwide. Up to 30% of patients may progress to end-stage kidney disease, at which point kidney function can no longer sustain life and the only treatment options are dialysis or kidney transplantation. Research has shown that IgA nephropathy arises from multiple mechanisms and is influenced by factors such as familial inheritance, autoimmunity, and the environment. Predicting disease progression in IgA nephropathy patients from clinical, genetic, and other information is therefore of great significance for diagnosis and treatment. However, current research mainly predicts the progression of IgA nephropathy with machine learning models based on clinical features alone. In addition, patient samples are often imbalanced across classes and medical datasets are small, both of which make model training challenging. This thesis aims to analyze and evaluate disease progression in IgA nephropathy patients comprehensively from multiple dimensions. It constructs the largest known dataset of IgA nephropathy patients in China and proposes interdisciplinary methods for predicting disease progression from both single-modal and multi-modal data, focusing on the following three aspects. First, the thesis predicts the probability of renal deterioration in IgA nephropathy patients from clinical features using the XGBoost algorithm, ranks the importance of the clinical features input to the model, and explains the influence of each feature on the predictions using SHAP. Experimental results show that the XGBoost model has better classification ability and good interpretability, and that the explanations of the model’s predictions are generally consistent with clinical diagnostic experience. Second, the thesis considers both clinical and genetic features, constructs a
multimodal dataset, introduces self-attention and cross-attention mechanisms to capture correlations within each modality and between modalities, and proposes an attention-based model for cross-modal fusion of clinical and genetic features to predict patient outcomes in IgA nephropathy. Experimental results verify the effectiveness of the cross-modal feature fusion model. Third, to address class imbalance, the thesis applies random undersampling, data augmentation, and progressive balancing sampling to improve the multi-modal IgA nephropathy diagnosis model, enhancing its ability to recognize tail-class samples and improving its generalization and robustness. Compared with training on the original multi-modal dataset, the three methods improve the model’s classification accuracy by 22.58%, 14.52%, and 9.68%, respectively. |
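
The abstract names progressive balancing sampling but does not give its schedule. The sketch below assumes one common formulation, in which per-class sampling probabilities are interpolated linearly from instance-balanced (proportional to class size) at the first epoch to class-balanced (uniform over classes) at the last. The function names `sampling_probs` and `sample_epoch`, the toy 90/10 label split, and the epoch count are all hypothetical and chosen only for illustration; they are not taken from the thesis.

```python
import random
from collections import Counter


def sampling_probs(class_counts, epoch, total_epochs):
    """Per-class sampling probabilities at a given epoch.

    Assumed schedule: linear interpolation from instance-balanced sampling
    (epoch 0) to class-balanced sampling (final epoch).
    """
    n_total = sum(class_counts.values())
    n_classes = len(class_counts)
    # Fraction of training completed, in [0, 1].
    t = epoch / max(total_epochs - 1, 1)
    return {
        c: (1 - t) * n / n_total + t / n_classes
        for c, n in class_counts.items()
    }


def sample_epoch(labels, epoch, total_epochs, rng):
    """Draw len(labels) sample indices (with replacement) for one epoch."""
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    counts = {c: len(idx) for c, idx in by_class.items()}
    probs = sampling_probs(counts, epoch, total_epochs)
    classes = list(probs)
    weights = [probs[c] for c in classes]
    # First pick a class for each draw, then a random sample within it.
    picked = rng.choices(classes, weights=weights, k=len(labels))
    return [rng.choice(by_class[c]) for c in picked]


# Hypothetical toy labels: 90 head-class samples (0), 10 tail-class samples (1).
labels = [0] * 90 + [1] * 10
rng = random.Random(0)
first = sampling_probs(Counter(labels), epoch=0, total_epochs=5)
last = sampling_probs(Counter(labels), epoch=4, total_epochs=5)
epoch_indices = sample_epoch(labels, epoch=4, total_epochs=5, rng=rng)
```

Under this assumed schedule, early epochs see the data roughly as it is distributed, while later epochs draw tail-class samples as often as head-class samples. Random undersampling, by contrast, would discard head-class samples once, before training begins.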