
Privacy Model With Machine Learning Technique Toward Obtaining Optimal Utility

Posted on: 2018-05-24
Degree: Master
Type: Thesis
Country: China
Candidate: Geoffrey Eustace Mtui
Full Text: PDF
GTID: 2348330536981833
Subject: Computer Science and Technology
Abstract/Summary:
Concern for privacy is growing as technological progress causes more personal data to be shared across organizations, devices, and the Internet of Things (IoT). This puts both privacy and utility at risk, because the utility of such datasets diminishes even when confidentiality is achieved. A new problem therefore arises: maintaining data privacy while retaining as much utility as possible in a large dataset. In this research, we investigate a privacy model that uses the flash sort algorithm during k-anonymization together with the C4.5 classifier, a machine learning technique; the approach seeks to preserve data privacy while maintaining optimal utility of the dataset. The first step of the methodology applies a strong privacy-preserving technique to a large statistical dataset (Adult) with 30,162 records and their attributes: the flash sort algorithm is used to k-anonymize the dataset, with the selected optimal privacy level set to k = 2. This is followed by C4.5 classification, which aims to attain as much utility as possible by classifying the dataset. To investigate the effect of the approach under other conditions, we also reduced the Adult dataset to half its size (15,081 records) and re-applied the same approach with a reduced number of attributes (5 attributes). The findings show that the approach maintains the accuracy of the data: compared with other researchers' results, our proposed methodology performs better, with 90.77% utility. Furthermore, we obtain a comparatively small utility loss of 0.5% on the half-size dataset and a larger loss of 2.28% when the number of attributes is reduced, whereas on the full-size dataset we obtain a loss of 1.24% relative to the original un-anonymized dataset. Although the approach improves accuracy over other researchers' results, it does not deliver the maximum result expected on the larger dataset. The findings show that the approach works well when the size of the dataset is reduced and yields the lowest utility when the number of attributes is reduced. The study predicts that applying the same approach with a modified privacy technique and different types of classifiers would yield better results in future work, especially on larger datasets.
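The sort-then-group idea behind the k-anonymization step can be illustrated with a minimal sketch. This is not the thesis implementation. It assumes a single numeric quasi-identifier (a hypothetical "age" field), uses Python's built-in sort as a stand-in for flash sort (only the resulting order matters here), and generalizes each group of at least k consecutive records to a value range. The function names and the toy records are hypothetical and are not taken from the Adult dataset.

```python
from collections import Counter


def k_anonymize_by_sorting(records, qi, k):
    """Generalize one numeric quasi-identifier so every value is shared by >= k records."""
    if len(records) < k:
        raise ValueError("need at least k records")
    # Assumption: the built-in sort stands in for flash sort; both yield the same order.
    ordered = sorted(records, key=lambda r: r[qi])
    n = len(ordered)
    out = []
    start = 0
    while start < n:
        end = start + k
        # Fold a tail shorter than k into the current group so no group is too small.
        if n - end < k:
            end = n
        group = ordered[start:end]
        label = f"{group[0][qi]}-{group[-1][qi]}"  # range covering the group
        out.extend({**r, qi: label} for r in group)
        start = end
    return out


def is_k_anonymous(records, qis, k):
    """Check that every combination of quasi-identifier values occurs at least k times."""
    counts = Counter(tuple(r[q] for q in qis) for r in records)
    return all(c >= k for c in counts.values())


# Toy records (hypothetical, for illustration only).
rows = [
    {"age": 25, "income": "<=50K"},
    {"age": 39, "income": ">50K"},
    {"age": 41, "income": "<=50K"},
    {"age": 52, "income": ">50K"},
    {"age": 28, "income": "<=50K"},
]
anonymized = k_anonymize_by_sorting(rows, "age", k=2)
print(is_k_anonymous(anonymized, ["age"], k=2))
```

With k = 2, as in the abstract, each generalized age range in the output covers at least two records. The classification step is not shown here; in the described pipeline the anonymized records would then be passed to a C4.5 classifier and the accuracy compared against the un-anonymized data.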
Keywords/Search Tags: Privacy, Machine learning, K-anonymity, Flash algorithm, C4.5 classifier