| Early detection of gastric cancer is of great significance for reducing mortality and improving patients’ quality of life. In recent years, many studies have sought to predict an individual’s disease risk from electronic health record (EHR) data. However, few of these studies effectively integrate different types of clinical data, so the potential value of combining them remains unexploited. Related work is also scattered, and few studies focus on risk prediction for gastric cancer. To address these problems, this paper proposes a fusion model for risk prediction of gastric cancer and precancerous diseases based on multi-type data. The main work of this paper includes the following aspects: (1) EHR data of patients with gastric diseases are collected from the hospital information systems and inspection information systems of multiple medical institutions and integrated. Data cleaning and preprocessing are applied to improve the quality of the integrated data and make it suitable for deep neural network models, forming a risk prediction data set for gastric cancer and precancerous diseases. (2) A fusion model for gastric cancer risk prediction based on multi-type data is designed. A pre-trained language model and an autoencoder are used to extract features from the admission record text and the laboratory test data, respectively, in the patient’s EHR data, and the extracted features are fused for disease risk prediction. To prevent the low-dimensional features from being overwhelmed, separate pre-training and dimensional augmentation strategies are also used. Experimental results show that the prediction accuracy of the model reaches 0.949337, which is better than that of existing models. (3) To address the problem that common pre-trained language models cannot fully extract the implicit information in text, a multi-scale text analysis method is proposed. A self-attention mechanism is used to fuse the outputs of different Transformer modules, and on this basis a multi-type gastric precancerous disease prediction model incorporating multi-scale text analysis is designed. The prediction accuracy of this model for various precancerous diseases of gastric cancer reaches 0.824182. In comparison with several other models, this method shows certain advantages, and comparative experiments on different text fusion strategies also demonstrate its effectiveness. (4) To improve the practicability of the proposed model and help it reach clinical practice more quickly, a gastric cancer risk management system based on multi-type data is designed, which automatically reports a patient’s disease risk from the patient’s EHR data. It also manages patients’ disease risks and doctors’ orders, providing a unified platform for the prevention and treatment of gastric cancer for doctors and patients. |
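
The fusion steps named in the abstract can be illustrated with a small, dependency-free sketch. It is not the paper's implementation. It assumes that the outputs of different Transformer modules are fixed-size vectors, that self-attention uses identity projections in place of learned query, key, and value matrices, and that "dimensional augmentation" means lifting the low-dimensional laboratory code to the text dimension before concatenation. All names, sizes, and random values below are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_fuse(layer_vectors):
    """Fuse per-layer text vectors with scaled dot-product self-attention.

    Assumption: identity projections stand in for learned Q/K/V matrices.
    Returns the mean of the attended vectors and the attention matrix.
    """
    dim = len(layer_vectors[0])
    scale = math.sqrt(dim)
    attn = []
    for q in layer_vectors:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in layer_vectors]
        attn.append(softmax(scores))
    attended = [
        [sum(w * v[i] for w, v in zip(row, layer_vectors)) for i in range(dim)]
        for row in attn
    ]
    n = len(attended)
    fused = [sum(vec[i] for vec in attended) / n for i in range(dim)]
    return fused, attn


def augment(features, weights):
    """Lift low-dimensional features to len(weights) dimensions (linear map + tanh)."""
    return [math.tanh(sum(w * x for w, x in zip(row, features))) for row in weights]


def fuse_features(text_vec, lab_vec):
    """Concatenate the two modality vectors into one classifier input."""
    return list(text_vec) + list(lab_vec)


rng = random.Random(0)
TEXT_DIM, LAB_CODE_DIM, LAB_UP_DIM, N_LAYERS = 8, 3, 8, 4

# Hypothetical stand-ins for the outputs of four Transformer layers.
layer_outputs = [[rng.uniform(-1, 1) for _ in range(TEXT_DIM)] for _ in range(N_LAYERS)]
# Hypothetical stand-in for an autoencoder's latent code of laboratory tests.
lab_code = [rng.uniform(-1, 1) for _ in range(LAB_CODE_DIM)]
# Random weights standing in for a trained up-projection layer.
up_weights = [[rng.uniform(-1, 1) for _ in range(LAB_CODE_DIM)] for _ in range(LAB_UP_DIM)]

text_vec, attention = self_attention_fuse(layer_outputs)
lab_vec = augment(lab_code, up_weights)
joint = fuse_features(text_vec, lab_vec)
print(len(joint))
```

Lifting the laboratory code to the same width as the text vector before concatenation is one plausible way to keep the smaller modality from being dominated. In a real model the up-projection and the attention projections would be trained rather than fixed.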