Privacy Model With Machine Learning Technique Toward Obtaining Optimal Utility

Posted on:2018-05-24

Degree:Master

Type:Thesis

Country:China

Candidate:Geoffrey Eustace Mtui

Full Text:PDF

GTID:2348330536981833

Subject:Computer Science and Technology

Abstract/Summary:

Privacy concerns are growing as technological advances cause more personal data to be shared across organizations, devices, and the Internet of Things (IoT). Both privacy and utility are put at risk, because the utility of such datasets diminishes even when confidentiality is achieved. This raises the problem of maintaining data privacy while retaining as much utility as possible in a large dataset.

In this research, we investigate a privacy model that uses the flash sort algorithm during k-anonymization together with the C4.5 classifier, a machine learning technique. The approach seeks to preserve data privacy while maintaining optimal utility of the dataset. The first step of the methodology applies a strong privacy-preserving technique to the larger statistical dataset (Adult), with 30,162 records and attributes: the flash sort algorithm is used to k-anonymize the dataset, with the selected optimal privacy level set to k = 2. This is followed by C4.5 classification, which aims to attain as much utility as possible by classifying the dataset. We further investigate the approach under two other conditions: with the statistical dataset reduced to half its size (15,081 records), and with a reduced number of attributes (5 attributes) in the same dataset.

The findings show that the approach maintains the accuracy of the data: compared with the results of other researchers, the proposed methodology provides better results, with 90.77% utility. The utility loss is comparatively small, 0.5%, on the half-size dataset and comparatively large, 2.28%, when the number of attributes is reduced, whereas on the larger dataset the loss is 1.24% relative to the original un-anonymized dataset. Although the approach increases accuracy compared with other researchers' work, it does not deliver the maximum result expected on the larger dataset. The findings show that the approach works well when the size of the dataset is reduced and yields the lowest utility when the number of attributes is reduced. The study predicts that the same approach, with an altered privacy technique and different types of classifiers, would yield better results in the future, especially when working with a larger dataset.
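The idea of sorting records and then forming groups of at least k records can be illustrated with a minimal sketch. It is not the thesis's implementation: Python's built-in sort stands in for flash sort, a single numeric quasi-identifier (an age column) is assumed, and the function names `k_anonymize_by_sorting` and `is_k_anonymous` as well as the sample values are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def k_anonymize_by_sorting(values: List[int], k: int) -> List[Tuple[int, int]]:
    """Generalize each value to a (low, high) range shared by at least k records.

    Assumption: one numeric quasi-identifier; the built-in sort is used here
    in place of flash sort, which the abstract names but does not detail.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(values) < k:
        raise ValueError("need at least k records")

    # Sort record indices by their quasi-identifier value.
    order = sorted(range(len(values)), key=lambda i: values[i])

    # Cut the sorted order into consecutive groups of k records.
    groups = [order[i:i + k] for i in range(0, len(order), k)]

    # A short trailing group is merged into the previous one,
    # so that no group ends up with fewer than k records.
    if len(groups) > 1 and len(groups[-1]) < k:
        groups[-2].extend(groups.pop())

    # Replace every value with the range spanned by its group.
    result: List[Tuple[int, int]] = [(0, 0)] * len(values)
    for group in groups:
        low, high = values[group[0]], values[group[-1]]
        for i in group:
            result[i] = (low, high)
    return result


def is_k_anonymous(generalized: List[Tuple[int, int]], k: int) -> bool:
    """Check that every generalized value occurs at least k times."""
    return all(count >= k for count in Counter(generalized).values())


# Hypothetical ages, anonymized with k = 2 as in the abstract.
ages = [39, 50, 38, 53, 28]
anonymized = k_anonymize_by_sorting(ages, k=2)
print(anonymized)
print(is_k_anonymous(anonymized, k=2))
```

With k = 2, every generalized range is shared by at least two records, so no record can be singled out by this attribute alone. A classifier such as C4.5 would then be trained on the generalized data to measure how much utility remains.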

Keywords/Search Tags:

Privacy, Machine learning, K-anonymity, Flash algorithm, C4.5 classifier

Related items

1	Research On Privacy-preserving Data Publishing Algorithms Based On Different Anonymity Requests
2	Research On Privacy-Preserving Data Mining Based On K-anonymity Algorithm
3	Research On Anonymity Models And Algorithms For Resisting The Attack Of Sub-trajectory
4	Studies On Classifiers Based On Decision Boundaries From The Perspective Of Dividing Data Space
5	Research On K-anonymity Algorithm And Anonymous Technology In Privacy Protection
6	Study On Privacy Protection Algorithm Based On K-Anonymity
7	Research On Anonymity Models And Algorithms Of Trajectory Privacy Preservation To Resist Location Linkage Attack
8	Research On Privacy Protection Algorithm Based On (?, K)-Anonymity
9	Research On Privacy Protection Based On K-anonymity
10	Research On Anonymity Models And Algorithms For Privacy-Preservation Data Publishing