| With the rapid development of artificial intelligence and big data technology, data resources are becoming increasingly abundant, and making good use of them is becoming increasingly critical. Medical data about individuals often involve major interests, so their owners cannot exchange them freely, which hinders the development of artificial intelligence in the medical field. As a result, the biomedical and health field faces a data island problem caused by restrictions on data sharing. Federated learning is an answer to this problem. During federated learning, the participating communication nodes must align their data. However, it is often difficult to choose an ID field for human-related data, and the choice of ID usually has an important impact on the results of federated learning. A personal digital ID suited to federated learning is therefore of great significance. In view of this, this paper sets out to find short tandem repeats in gene data that can serve as identification marks, constructs short tandem repeat data sets from the MSDB and NCBI databases, and proposes a deep-learning-based method for recognizing short tandem repeats. Two formulations are considered: treating detection as a one-dimensional object detection task, and treating it as an anomaly detection task on time series. Combining representation and dimensionality reduction methods such as sequential encoding, one-hot encoding, PCA, and autoencoders, we run experiments with CNN and LSTM models and compare them with traditional short tandem repeat detection software. One-hot encoding combined with a CNN model proved to be the best method for detecting short tandem repeats, achieving 95% recognition accuracy on gene data from different species. To verify that the CNN can detect short tandem repeats in gene sequences of different lengths, we changed the sequence length and repeated the experiments; the CNN still achieved 96% recognition accuracy, which shows that the method is feasible. We then propose a method for forming a personal gene digital ID from short tandem repeat sites, test it on Homo sapiens data, and obtain correct results. We combine this ID with the RSA algorithm and propose a key pair generation algorithm that produces a recoverable private key. Key pairs generated by this algorithm address the problem that, in a network communication environment, a user who loses the private key can no longer have their identity verified and may ultimately suffer property loss. Finally, we apply the personal gene digital ID to federated learning as the patient ID. We use the stroke prediction data set from the Kaggle website as experimental data. After analyzing the data, we oversample with the SMOTE algorithm and use horizontal and vertical federated learning to predict stroke in patients. We then use LR, random forest, ANN, and LightGBM as comparison experiments and assess the results with a variety of evaluation metrics, which demonstrates the feasibility of federated learning for disease prediction. |
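
To make two terms from the abstract concrete, the following minimal Python sketch shows what a one-hot encoding of a DNA sequence looks like and what a short tandem repeat is. It is an illustration under stated assumptions, not the method of this paper: the function names `one_hot` and `find_strs`, the base order `ACGT`, and the thresholds (repeat units of 2 to 6 bases, at least 3 copies) are all hypothetical choices, and the regular-expression scan is a naive baseline rather than the CNN model or the traditional detection software mentioned above.

```python
import re

BASES = "ACGT"  # assumed column order for the one-hot vectors


def one_hot(seq):
    """Encode each base as a 4-element indicator vector.

    Bases outside ACGT (for example N) become an all-zero row.
    """
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def find_strs(seq, min_unit=2, max_unit=6, min_repeats=3):
    """Naively scan for short tandem repeats with a backreference regex.

    Returns a list of (start, end, unit, repeats) tuples, where `unit`
    is the repeated motif and [start, end) is the span of the whole run.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        repeats = (m.end() - m.start()) // len(unit)
        hits.append((m.start(), m.end(), unit, repeats))
    return hits


# Toy sequence (made up for illustration): the motif GATA repeated 4 times.
toy = "CCTG" + "GATA" * 4 + "TTC"
print(find_strs(toy))   # [(4, 20, 'GATA', 4)]
print(one_hot("ACGN"))  # [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0]]
```

A model such as a CNN would take the matrix produced by `one_hot` as input; the regex scan only serves to show the kind of region such a model is expected to locate.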