
The Research And Development Of Privacy-Preserving Genomic Data Prediction Platform

Posted on: 2022-07-10; Degree: Master; Type: Thesis
Country: China; Candidate: Q Zhao; Full Text: PDF
GTID: 2494306347473034; Subject: Computer technology
Abstract/Summary:
As the next generation of diagnostic technology, precision medicine has attracted increasing attention worldwide. By combining genomics with other cutting-edge medical technologies, precision medicine can quickly identify the cause of a disease, accurately classify its state and progression, and ultimately deliver personalized treatment for specific patients. Genomics research is indispensable to the development of precision medicine. In recent years, advances in machine learning have also helped researchers analyze and interpret genomic data to predict diseases, which accelerates the delivery of precision medical services to consumers. However, genomic data is highly sensitive: once a data source is leaked or abused, it is likely to harm the data providers. Moreover, some studies have shown that an adversary can use certain attacks to recover private attributes of the original training data from a machine learning model. How to combine machine learning with genomic data for legitimate scientific research and application while protecting data security has therefore become a problem that all countries need to solve. Against this background, we developed a privacy-preserving genomic data prediction platform based on secret sharing and Intel SGX technology:

(1) This paper proposes a privacy-preserving framework based on secret sharing and knowledge distillation. The framework adopts a dual-server architecture. The machine learning model and the users' private inputs are uploaded to the servers in the form of secret shares. The servers perform secure two-party ML inference on the shares of the models and inputs. The shares of the inference result are reconstructed on the client, and knowledge transfer is then completed using hard voting. The framework ensures the privacy of both the users' inputs and the trained models, while providing reliable prediction accuracy and reasonable computational efficiency.

(2) Combined with Intel SGX technology, the knowledge distillation process is moved into a trusted execution environment. A blind matrix is used to protect the intermediate results exchanged between the servers, which prevents privacy leaks during their interaction and improves the security of the framework in (1). To prevent a malicious adversary from obtaining the prediction results directly and then attacking the prediction model, a confidence masking technique is used to protect the prediction vector. The new framework achieves a good balance between practicality and privacy.

(3) A disease prediction platform based on genomic expression data is developed and deployed in a real environment. While preserving privacy, the platform can predict the association between genomic expression data and disease.
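
The dual-server idea in (1) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: it assumes two-party additive secret sharing over the ring of integers modulo 2^32, a trusted dealer that hands out Beaver multiplication triples, a single linear score as the "model", and a plain majority vote over teacher labels as the hard-voting step. All names (`MOD`, `share`, `secure_mul`, `secure_dot`, `hard_vote`) are hypothetical, and both servers are simulated in one process for brevity.

```python
import secrets
from collections import Counter

MOD = 2 ** 32  # ring size for additive shares (an assumption, not from the thesis)


def share(x):
    """Split integer x into two additive shares modulo MOD."""
    s0 = secrets.randbelow(MOD)
    s1 = (x - s0) % MOD
    return s0, s1


def reconstruct(s0, s1):
    """Recombine two additive shares into the ring element they encode."""
    return (s0 + s1) % MOD


def to_signed(v):
    """Map a ring element back to a signed integer."""
    return v - MOD if v >= MOD // 2 else v


def beaver_triple():
    """Hypothetical trusted dealer: shares of random a, b and of c = a * b."""
    a = secrets.randbelow(MOD)
    b = secrets.randbelow(MOD)
    return share(a), share(b), share(a * b % MOD)


def secure_mul(x_sh, y_sh):
    """Multiply two shared values; only the masked d = x - a and e = y - b are opened."""
    a_sh, b_sh, c_sh = beaver_triple()
    d = reconstruct((x_sh[0] - a_sh[0]) % MOD, (x_sh[1] - a_sh[1]) % MOD)
    e = reconstruct((y_sh[0] - b_sh[0]) % MOD, (y_sh[1] - b_sh[1]) % MOD)
    # x * y = c + d * b + e * a + d * e; the public d * e term is added by server 0 only.
    z0 = (c_sh[0] + d * b_sh[0] + e * a_sh[0] + d * e) % MOD
    z1 = (c_sh[1] + d * b_sh[1] + e * a_sh[1]) % MOD
    return z0, z1


def secure_dot(w_shs, x_shs):
    """Shared dot product: one secure multiplication per coordinate, summed locally."""
    z0 = z1 = 0
    for w_sh, x_sh in zip(w_shs, x_shs):
        p0, p1 = secure_mul(w_sh, x_sh)
        z0 = (z0 + p0) % MOD
        z1 = (z1 + p1) % MOD
    return z0, z1


def hard_vote(labels):
    """Return the most frequent label among the teacher predictions."""
    return Counter(labels).most_common(1)[0][0]


if __name__ == "__main__":
    weights = [3, -2, 5]   # model owner's private linear weights (toy values)
    features = [4, 7, 1]   # user's private input (toy values)
    w_shs = [share(w) for w in weights]
    x_shs = [share(x) for x in features]
    score = to_signed(reconstruct(*secure_dot(w_shs, x_shs)))  # client-side reconstruction
    print(score, hard_vote([1, 0, 1]))
```

In this sketch each server would hold only one element of every share pair, so neither sees the weights or the input in the clear; the client alone recombines the two output shares. A real deployment would additionally need fixed-point encoding for real-valued models and a network channel between the servers, both of which are omitted here.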
Keywords/Search Tags: genomic data, privacy-preserving machine learning, secret sharing, Intel SGX