Privacy Preserving Feature Selection In Distributed Environment

Posted on:2014-01-19

Degree:Master

Type:Thesis

Country:China

Candidate:W Q Wan

Full Text:PDF

GTID:2248330395483984

Subject:Computer application technology

Abstract/Summary:

With the development of network technology and the improvement of computing power and storage capacity, the size of datasets is growing rapidly. Data mining is necessary to obtain valuable information from this data, and feature selection is one of the most important and frequently used techniques in data preprocessing for data mining. It reduces the number of features and removes irrelevant, redundant, or noisy data, which brings immediate benefits for applications: it speeds up the data mining algorithm and improves mining performance, such as predictive accuracy and result comprehensibility.

Privacy preservation is very important in data mining and has received great attention as data mining has become widely used. How to select features effectively while preserving privacy is therefore a hot topic. However, most feature selection methods do not address privacy for sensitive data such as medical and financial records, which may lead to serious information security problems in data mining and pattern recognition. In addition, the data from various applications may be stored at multiple sites, and distributed computing technology has emerged to mine such large, distributed data. The purpose of this work is to develop a privacy-preserving distributed feature selection algorithm that preserves the privacy of both features and data.

To preserve the privacy of features, a privacy-preserving feature selection algorithm based on PCA (Principal Component Analysis) and SVM-RFE is proposed, which combines the two techniques and optimizes the evaluation criterion with three methods. The simulation results indicate that the algorithm performs well: while selecting the important features, it minimizes the total amount of information contained in the selected feature subset.

To preserve the privacy of data, we present a new privacy-preserving distributed feature selection algorithm under the MapReduce framework that combines three statistics (the Gini index, misclassification, and entropy) with differential privacy. A theoretical analysis of the privacy guarantee is also presented. The simulation results on datasets from the UCI repository and on synthetic data indicate that, while selecting the important features, the algorithm can preserve private information to a certain extent at a lower time cost than its centralized counterpart.
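The abstract does not give the details of the distributed algorithm, so the following is only a minimal sketch of the general idea, not the thesis's method. It assumes categorical features with a publicly known value domain, the Laplace mechanism applied to the aggregated (feature value, class label) counts, an even split of the privacy budget across features, and the Gini index as the scoring statistic (entropy or misclassification error could be substituted). All function and parameter names are hypothetical.

```python
import random
from collections import Counter


def map_counts(records, n_features):
    """Map step (assumed): one site counts (feature value, label) pairs per feature."""
    counts = [Counter() for _ in range(n_features)]
    for features, label in records:
        for j in range(n_features):
            counts[j][(features[j], label)] += 1
    return counts


def reduce_counts(per_site_counts, n_features):
    """Reduce step (assumed): sum the per-site contingency tables."""
    total = [Counter() for _ in range(n_features)]
    for site in per_site_counts:
        for j in range(n_features):
            total[j].update(site[j])
    return total


def laplace(scale, rng):
    """Laplace(0, scale) sample: difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_table(table, values, labels, epsilon, rng):
    """Add Laplace(1/epsilon) noise to every cell, including empty ones,
    then clamp negative counts to zero."""
    return {
        (v, c): max(0.0, table[(v, c)] + laplace(1.0 / epsilon, rng))
        for v in values
        for c in labels
    }


def gini_split(table, values, labels):
    """Weighted Gini impurity of the labels after splitting on one feature.
    Lower values mean the feature separates the classes better."""
    n = sum(table[(v, c)] for v in values for c in labels)
    if n == 0:
        return 0.0
    score = 0.0
    for v in values:
        n_v = sum(table[(v, c)] for c in labels)
        if n_v > 0:
            impurity = 1.0 - sum((table[(v, c)] / n_v) ** 2 for c in labels)
            score += (n_v / n) * impurity
    return score


def rank_features(sites, n_features, values, labels, epsilon, seed=None):
    """Rank feature indices from most to least informative using noisy counts.
    Assumption: the budget epsilon is split evenly across the features."""
    rng = random.Random(seed)
    total = reduce_counts([map_counts(s, n_features) for s in sites], n_features)
    eps_per_feature = epsilon / n_features
    scores = [
        gini_split(noisy_table(total[j], values, labels, eps_per_feature, rng),
                   values, labels)
        for j in range(n_features)
    ]
    return sorted(range(n_features), key=lambda j: scores[j])


# Two hypothetical sites; feature 0 matches the label, feature 1 is uninformative.
site_a = [((0, 0), 0), ((1, 0), 1), ((0, 1), 0), ((1, 1), 1)]
site_b = list(site_a)
ranking = rank_features([site_a, site_b], 2, [0, 1], [0, 1], epsilon=1000.0, seed=0)
```

In this sketch only aggregated, noised counts are used for scoring, so a smaller `epsilon` gives stronger privacy at the cost of noisier rankings. A real deployment would also need to decide where the noise is added (at each site or at the reducer), which the abstract does not specify.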

Keywords/Search Tags:

Privacy preserving, Feature selection, Distribution, Differential privacy, Principal component analysis

Related items

1	Feature Selection Algorithm Based On Privacy Preserving
2	Ensemble Feature Selection Based On Privacy Preserving
3	Differential Privacy Based Principal Component Analysis Algorithm Design
4	The Application Research Of Differential Privacy Data Release On The Precision Poverty Alleviation Big Data Platform
5	Research On Principal Component Analysis Algorithm Under Differential Privacy
6	Research On Privacy Preserving Publishing Of Big Location Data Based On Differential Privacy
7	Adaptive Differential Privacy And Its Applications
8	Trajectory Privacy Preserving Based On Statistical Differential Privacy
9	Research On Principal Component Algorithm Of Difference Privacy Based On Covariance Matrix
10	Preserving User Privacy For Large-Scale Personalized Online Video Service