Bridging Distinct Domains in Privacy Related Learning Problems

Posted on:2018-07-12

Degree:Ph.D.

Type:Thesis

University:Drexel University

Candidate:Overdorf, Rebekah

GTID:2448390002497131

Subject:Computer Science

Abstract/Summary:

The development of efficient and effective machine learning methods has prompted a surge of research into their application, in areas ranging from spam filtering to recommender systems. Blindly applying machine learning tools to learning problems in privacy and security, however, often fails to produce the desired results. Privacy and security problems differ from conventional learning settings: adversaries are ordinarily present, and training data with reliable ground truth is frequently difficult to obtain. The problem is exacerbated when the data used to test a method differ from the real-world data the model is built for. This thesis addresses three learning problems in privacy and security, each of which involves data from different domains that must be taken into account.

In authorship attribution, we tackle the cross-domain case, in which the training and testing data are written in different contexts and media. Research in this area has been limited to texts written in the same domain, an assumption that often cannot be made in real-world settings. We explore cross-domain attribution across three domains: blogs, Twitter feeds, and Reddit comments.

Research in website fingerprinting focuses on a single domain, the incoming and outgoing packets on a network, to determine which webpage a user is visiting. In addition to this domain, we study the websites themselves and develop methods that successfully determine which website-level features make a site more or less susceptible to this type of attack.

Similarly, most research on the economies and structure of cybercriminal forums focuses only on the domain of private messages. While some research has investigated what can be learned from public interactions on these forums, no prior work has tried to bridge the two domains. We present a method to predict which public threads are likely to trigger private interactions.

Keywords/Search Tags:

Domains, Machine learning, Privacy

Related items

1	Research And Implementation On Privacy Protection Technology For Training Samples In Machine Learning
2	Privacy Model With Machine Learning Technique Toward Obtaining Optimal Utility
3	Research On Privacy Protection In Machine Learning
4	Outsourced Machine Learning With Privacy Protection
5	Research And Implementation Of Malicious Domains Detection Technology Based On Big Data Analysis
6	Research On Privacy Protection Based On Linear Regression Machine Learning Algorithm
7	Research On Key Technologies Of Privacy-and Integrity-preserving Machine Learning
8	Research And Implementation Of Android Application Privacy Rating Technology
9	On Extreme Learning Machine For Preserving Privacies
10	Research On Data Privacy-Preserving In Machine Learning