Privacy-Preserving SVM Model For Distributed Medical Data Analysis

Posted on: 2019-05-17
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Mohammed Zain Omer Yousif Moha
GTID: 1318330569487573
Subject: Computer Science and Technology
Abstract/Summary:
To build reliable prediction models and recognize useful patterns, aggregating datasets maintained by different sources, such as healthcare providers, has become increasingly prevalent. Most distributed data mining algorithms can efficiently manage and mine a complete dataset drawn from distributed sources. However, doing so may reveal critical information about individual records, which raises privacy concerns and discourages participants from sharing data. In addition, mining incomplete datasets usually produces poor results.

Privacy-Preserving Distributed Data Mining (PPDDM) techniques address these privacy issues by working without access to the real data, so that the final results do not disclose the information behind them. Recently, many state-of-the-art PPDDM methods based on randomization and anonymization have been developed. These techniques use data distortion to mask the original record values and data transformation to construct a set of anonymous records. In addition, a variety of cryptographic methods can be used for communication among participants whose data are vertically or horizontally partitioned, so that secure computation can be achieved without revealing sensitive information.

Classification is an essential data mining task that aims to discover knowledge and classify new instances. The Support Vector Machine (SVM) is regarded as one of the essential algorithms for a wide range of classification problems. In this work, a new protocol for preserving privacy in a distributed SVM classification model is proposed. Based on the Gram matrix, the proposed protocol constructs a global SVM classifier over vertically partitioned data from multiple participants. To aggregate the distributed data securely at a third party, the protocol uses the properties of the Paillier cryptosystem to calculate the inner products between data points, build the global SVM model, and classify new patients' data. A privacy-preserving protocol for a distributed SVM model over vertically partitioned data with imputation of missing data is also proposed. This protocol uses multiple imputation to handle missing values before the distributed datasets are aggregated.

In addition, a new framework for maintaining privacy in distributed data mining over horizontally partitioned data with imputation of missing data is proposed. The framework contains three layers: (1) the bottom layer handles missing values in each participant's local data by using multiple imputation; (2) the middle layer preserves the privacy of participants by applying the properties of the Paillier cryptosystem to calculate the inner products between data points that make up the Gram matrix; (3) the top layer is concerned with building a global SVM model at a semi-honest third party and applying the model to classify new patients' data.

The performance of the proposed framework on distributed and centralized data was evaluated using an accuracy metric. The results showed that the accuracy of the distributed SVM model is the same as when the data is centralized. The distributed SVM model achieved better results when missing data were imputed than when records with missing values were omitted. The framework also achieved a better processing time over distributed data than over centralized data. Besides demonstrating efficiency in solving the problem of cooperative privacy preservation among participants who jointly build a non-linear classification model, the framework also yields significant improvements in the performance of the resulting classifiers.
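The idea of aggregating a Gram matrix under Paillier encryption can be illustrated with a minimal sketch. It is not the dissertation's protocol, whose message flow the abstract does not specify. It assumes vertically partitioned data with a linear kernel, in which case the global Gram matrix is the sum of the participants' local Gram matrices, and it uses a toy pure-Python Paillier implementation with small, well-known primes that is unsuitable for real use. All function names, the integer feature values, and the choice of who holds the private key are hypothetical.

```python
import math
import random

def keygen(p, q):
    """Toy Paillier key pair from two primes (illustrative only, not secure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt integer m; negative values are represented modulo n."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    # With g = n + 1, g^m mod n^2 equals 1 + m*n mod n^2.
    return (1 + (m % n) * n) % n2 * pow(r, n, n2) % n2

def decrypt(n, priv, c):
    """Decrypt and map the result back to a signed integer."""
    lam, mu = priv
    m = (pow(c, lam, n * n) - 1) // n * mu % n
    return m - n if m > n // 2 else m

def add_encrypted(n, c1, c2):
    """Homomorphic addition: multiplying ciphertexts adds the plaintexts."""
    return c1 * c2 % (n * n)

def local_gram(rows):
    """Inner products between all pairs of records over local features only."""
    return [[sum(a * b for a, b in zip(u, v)) for v in rows] for u in rows]

def secure_gram(parts, n, priv):
    """Sum encrypted local Gram matrices; only the totals are decrypted."""
    size = len(parts[0])
    total = [[encrypt(n, 0) for _ in range(size)] for _ in range(size)]
    for rows in parts:  # each participant encrypts its own local Gram matrix
        gram = local_gram(rows)
        for i in range(size):
            for j in range(size):
                total[i][j] = add_encrypted(n, total[i][j], encrypt(n, gram[i][j]))
    # Assumption: the key holder sees only the aggregated ciphertexts.
    return [[decrypt(n, priv, c) for c in row] for row in total]

# Hypothetical data: three patients, features split between two participants.
part_a = [[1, 2], [3, -1], [0, 4]]   # participant A holds two features
part_b = [[5], [-2], [1]]            # participant B holds one feature

n, priv = keygen(2**31 - 1, 2**61 - 1)  # two Mersenne primes, toy key size
secure = secure_gram([part_a, part_b], n, priv)
central = local_gram([a + b for a, b in zip(part_a, part_b)])
```

In this sketch `secure` matches `central`, the Gram matrix computed on the joined records, which is consistent with the abstract's report that the distributed model reaches the same accuracy as the centralized one. Real-valued features would need a fixed-point encoding before encryption, and an SVM trained on a precomputed kernel could consume the resulting matrix.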
Keywords/Search Tags: Distributed Data Mining, Privacy Preserving, Homomorphic Encryption, Gram Matrix, Data Imputation