The provenance hierarchy of computer programs

Posted on: 2012-02-23
Degree: Ph.D.
Type: Thesis
University: The University of Wisconsin - Madison
Candidate: Rosenblum, Nathan E
Full Text: PDF
GTID: 2450390011952338
Subject: Computer Science
Abstract/Summary:
Where did this binary come from? How was it compiled? What language did the programmer choose? Who wrote this code? These questions rarely occur to most computer users, but for analysts working in forensics, reverse engineering, and software theft, they are of paramount importance. The provenance of a program binary (the specific process through which an idea is transformed into executable code) can provide valuable insight, yet it is in the very domains where such information would be most useful that it is least likely to be available.

The thesis of this dissertation is that characteristics of a program's provenance are inherently preserved during translation from source code to an executable form. We model program provenance as a hierarchy, and show that it is possible to recover details of a program's path through this hierarchy by combining evidence extracted from the program with models derived from large binary code data sets using machine learning techniques. In addition, we show that recoverable provenance characteristics extend beyond the tool chain used to produce a binary; we demonstrate that programmers can be identified based solely on characteristics of executable code, and introduce techniques to cluster programs according to stylistic (as opposed to functional) similarity.
Keywords/Search Tags: Program, Code, Provenance, Hierarchy