Feature Selection Algorithm Based On Privacy Preserving

Posted on: 2015-08-21
Degree: Master
Type: Thesis
Country: China
Candidate: K Chen
Full Text: PDF
GTID: 2298330467477063
Subject: Computer application technology
Abstract/Summary:
In recent years, data mining has emerged as a very active research area. The field studies how knowledge or patterns can be extracted from large data stores. However, data with extremely high dimensionality exposes existing learning methods to the curse of dimensionality. Feature selection is one of the most common techniques used to overcome it: it aims to choose the best subset of the original features according to a given evaluation criterion. Data mining also has a downside, because it can leak private information while mining knowledge. How these two contrasting goals, mining new knowledge and protecting private information, can be reconciled is the focus of this research.

In this thesis, we propose two new privacy preserving feature selection algorithms, one to preserve the privacy of the data and one to preserve the privacy of the features. To preserve the privacy of the data, we combine the Gini index with differential privacy and propose a privacy preserving feature selection algorithm based on differential privacy. To deal with large-scale datasets, the algorithm is implemented on the MapReduce framework. Simulation results and theoretical analysis indicate that the proposed algorithm selects important features while preserving private information in less time than it needs in a centralized environment.

To preserve the privacy of the features, we combine PCA (Principal Component Analysis) with unsupervised feature selection using feature similarity and optimize the evaluation criterion of the feature similarity measure, which yields a new privacy preserving unsupervised feature selection algorithm. Simulation results show that the algorithm can reduce the total amount of information carried by the selected feature subset while guaranteeing classification accuracy.
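
The abstract does not give the details of the differentially private algorithm, so the following is only a minimal sketch of one common way to combine the Gini index with differential privacy, not the thesis's actual method. It assumes categorical features, a publicly known value domain for every feature and for the class label, the Laplace mechanism applied to the per-cell counts, and an even split of the privacy budget across features. The function names (`laplace_noise`, `noisy_gini`, `select_features`) and all parameters are hypothetical.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential draws with mean `scale`
    # follows a Laplace distribution with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_gini(feature, labels, values, classes, epsilon, rng):
    """Weighted Gini impurity of the labels after splitting on one
    categorical feature, computed from Laplace-perturbed counts.
    `values` and `classes` are assumed to be public domains."""
    counts = Counter(zip(feature, labels))
    # Adding or removing one record changes exactly one cell by 1, so
    # noise of scale 1/epsilon on every cell gives epsilon-DP counts.
    noisy = {
        (v, c): max(0.0, counts[(v, c)] + laplace_noise(1.0 / epsilon, rng))
        for v in values
        for c in classes
    }
    total = sum(noisy.values())
    if total == 0.0:
        return 1.0  # no usable counts: rank this feature last
    score = 0.0
    for v in values:
        n_v = sum(noisy[(v, c)] for c in classes)
        if n_v > 0.0:
            impurity = 1.0 - sum((noisy[(v, c)] / n_v) ** 2 for c in classes)
            score += (n_v / total) * impurity
    return score


def select_features(columns, labels, domains, classes, k, epsilon, seed=None):
    """Return the indices of the k features with the lowest noisy Gini
    impurity. The budget is split evenly (sequential composition);
    ranking the noisy scores is post-processing and costs no budget."""
    rng = random.Random(seed)
    eps_each = epsilon / len(columns)
    scores = [
        noisy_gini(col, labels, dom, classes, eps_each, rng)
        for col, dom in zip(columns, domains)
    ]
    return sorted(range(len(columns)), key=lambda j: scores[j])[:k]
```

Because the per-cell counts are additive, they could in principle be accumulated per data partition and summed before the noise is added, which is one way such a computation might map onto a MapReduce job. Whether the thesis distributes the work this way is not stated in the abstract.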
Keywords/Search Tags: Feature selection, Privacy preserving, Differential privacy, MapReduce, Principal component analysis, Unsupervised