Authorship Attribution in the Enron Email Corpus

Posted on: 2011-08-18
Degree: M.S.
Type: Thesis
University: University of Maryland, Baltimore County
Candidate: Corbin, Michael
Full Text: PDF
GTID: 2468390011471160
Subject: Information Technology
Authorship attribution is the task of determining the author of a document by analyzing its contents. Many techniques have been developed for this purpose, and over the past few decades computers have aided their development by making it possible to process large amounts of data quickly. Until recently, most studies focused on corpora composed of books, plays, and papers. I applied methods that have proven to work well on such corpora to the Enron email corpus, focusing on function words and n-grams to determine the authorship of emails. I then selected the best-performing method by evaluating how well each method clustered a user's emails together and how well it separated the emails of different users.
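As a rough illustration of the two feature types named in the abstract, the sketch below builds a function-word frequency profile and a character n-gram profile for a text, and compares two profiles with cosine similarity. It is a minimal sketch and not the thesis's implementation: the ten-word function-word list, the choice of character (rather than word) n-grams, the default n = 3, and the use of cosine similarity are all assumptions made here for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny function-word list (an assumption for
# illustration; the thesis's actual list is not given in the abstract).
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is", "for", "it"]


def function_word_profile(text):
    """Return the relative frequency of each function word among all word tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def char_ngram_profile(text, n=3):
    """Return counts of overlapping character n-grams after whitespace normalization."""
    s = " ".join(text.lower().split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (dict-like objects)."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Under these assumptions, emails by the same user would be expected to score a higher average similarity to one another than to emails by other users, which is one simple way to read the clustering and separation criteria described above.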
Keywords/Search Tags: Enron email