Distributed Gene Sequence Similarity Calculation Based On Secure Multiparty Computation

Posted on:2017-01-02

Degree:Master

Type:Thesis

Country:China

Candidate:L C Wang

Full Text:PDF

GTID:2308330485980617

Subject:Computer system architecture

Abstract/Summary:

PDF Full Text Request

Genomic data sharing and analysis provides a key way of understanding biological mechanisms, carrying out high level medical diagnosis and treatments, facilitating clinical healthcare data reuse, and accelerating scientific discoveries. As one of important similarity measurements, Edit distance is widely used in human genome studies. However, because of the sensitive personal information contained in genetic data, broad dissemination without adequate protection would cause harmful effects on participants. Confronting the challenge of privacy-preserving sequence sharing and analysis over genome datasets stored in different institutions around the world, this paper proposed a distributed genome sequence similarity calculation model based on Secure Multiparty Computation(SMC) and implemented it with the latest technology advances in Goldreich-Micali-Wigderson(GMW) protocol. The main contributions of this work consist of following three points:(1) Design and implementation of the distributed secure sequence analysis model using GMW circuit. This paper introduced and contrasted current SMC privacy protection protocols to conclude that GMW has some advantages over other protocols after adopting optimization techniques like Oblivious Transfer(OT) extension and OT pre-computation. Then, we designed a distributed secure genome sequence comparison model independent of any third-party platform, which meets the requirements of realistic scenario. Furthermore, it was implemented with logic circuit based on GMW protocol.(2) Genome data preprocessing and results integration. To increase the efficiency without loss the worthy of application, an approximate Edit distance method was leveraged to measure the similarity of gene sequences. To enable the parties to share part of the heavy computation loads of the model locally, we align the test datasets to a public reference sequence and further optimizing the distance method. To minimize the circuitry complexity as much as possible, we utilize the hash algorithm DJB2 to encode genome sequences as binary string. To reduce memory consumption and network bandwidth, we propose the idea of data segmentation. Therefore, the final results should be a synthesis of secure computation outputs conducted from all data blocks.(3) Experiments and discussion. We designed experiments for specific application scenarios as well as representative parameters, and summarized the performance of the prototype in terms of efficiency, number of gates(e.g. circuit complexity), Error Rate, and network bandwidth. What’s more, we compared experiment results with other reported work to further demonstrate the advantages of our model. Compared with the existing solutions, the distributed secure sequence analysis model not only supports sequence comparison over any number of parties against a semi-host adversary corrupting setting without leaking any party’s genetic information, but also only take 8s to get the distance of two sequences that respectively belong to two parties with 5000 gene locus on each sequence, and 320 s to compute the distance in a 3-party setup with 5 sequence in each party.

Keywords/Search Tags:

genome sequence Edit distance, privacy-preserving, data sharing, secure multiparty computation, distributed computation

PDF Full Text Request

Related items

1	Research On Privacy-preserving Technologies In Distributed Environmen And Their Applications
2	Research On Efficient CRT-Based Secure Multiparty Computation Scheme
3	Privacy Preserving Decision Tree Prediction Scheme Based On Secure Multiparty Computation
4	A Number Of Secure Multi-party Computation Problems
5	Outsourcing Computation Of Privacy Preserving Anomaly Detection Algorithm Based On Secure Multiparty Computation
6	Design And Implementation Of Unified Framework Of Clustering Algorithmns Based On Secure Multiparty Computation
7	Research On Key Technologies Of Privacy-preserving Data Mining On The Cloud
8	Research On Secure Multiparty Computation In The Application Of Internet Of Things
9	Research On Secure Multiparty Computation Of Set-Related Problems
10	Research On Key Technologies Of Secure Data Processing Based On Privacy-preserving Computation