
Bridging Distinct Domains in Privacy Related Learning Problem

Posted on: 2018-07-12
Degree: Ph.D
Type: Thesis
University: Drexel University
Candidate: Overdorf, Rebekah
Full Text: PDF
GTID: 2448390002497131
Subject: Computer Science
Abstract/Summary:
The development of efficient and effective machine learning methods has prompted a surge of research on their application, ranging from spam filtering to recommender systems. Blindly applying machine learning tools to learning problems in privacy and security, however, often fails to produce the desired results. These problems differ from typical applications in that adversaries are ordinarily present and training data with reliable ground truth is frequently difficult to obtain. The problem is exacerbated by the fact that the data used to test a method may differ from the real-world data that the model is built for. This thesis addresses three learning problems in privacy and security, each of which involves data from different domains that must be considered.

In authorship attribution, we tackle the cross-domain case, in which the training data and testing data are written in different contexts and mediums. Research in this area has been limited to texts written in the same domain, an assumption that can rarely be made in real-world settings. We explore cross-domain attribution in three such domains: blogs, Twitter feeds, and Reddit comments.

Research in website fingerprinting focuses on a single domain, the incoming and outgoing packets on a network, to determine which webpage a user is visiting. In addition to this domain, we focus on the websites themselves and develop methods that successfully determine which website-level features cause a site to be more or less susceptible to this type of attack.

Similarly, most research on the economies and structure of cybercriminal forums focuses only on the domain of private messages. While some research has investigated what can be learned from the public interactions on these forums, no work has tried to bridge these domains. We present a method to predict which public threads are likely to trigger private interactions.
Keywords/Search Tags: Domains, Machine learning, Privacy