Transformer specification language: A system for generating analyzers and its applications

Posted on:2012-09-19

Degree:Ph.D

Type:Dissertation

University:The University of Wisconsin - Madison

Candidate:Lim, Junghee

Full Text:PDF

GTID:1468390011459191

Subject:Computer Science

Abstract/Summary:

As computers have become a pivotal component of daily lives, computer safety, reliability, and security issues have become enormously important. A considerable amount of recent research in program analysis and software engineering has been carried out on techniques and tools for finding software bugs and security vulnerabilities, and on checking computer-safety properties. Most of this research has focused on analyzing source code. Recently, machine-code analysis has begun to receive great attention both because source code is often unavailable and because there can be mismatches in various ways between source code and the machine code generated from the source code.;The tools and techniques for analyzing machine code are, in principle, language-independent. However, their implementations are often tied to one specific instruction set. Retargeting them to another instruction set can be an expensive and error-prone process. This dissertation describes a system that I developed, called TSL (for "Transformer Specification Language") that provides a systematic solution to the problem of creating retargetable tools for analyzing machine-code. The TSL system is a meta-tool, or tool generator, that automatically creates different abstract interpreters for machine-code instruction sets. The system addresses the problem of supporting multiple instruction sets by providing a YACC-like mechanism for creating key components of machine-code analyzers. The TSL system takes a single, unified description of the concrete operational semantics of an instruction set, which is specified in TSL, a strongly typed, first-order functional language, and automatically creates implementations of different abstract interpreters for the given instruction set.;TSL provides a fixed set of base-types and operators, as well as map-types with map-access and (applicative) map-update operations. The TSL compiler generates a common intermediate representation that allows the meanings of the input-language constructs to be redefined by supplying alternative interpretations of the base-types, map-types, and the operations on them ("semantic reinterpretation"). Because all the abstract operations are defined at the meta-level , a semantic reinterpretation is independent of any given instruction set defined in TSL. Therefore, each implementation of an analysis component's driver serves as the unchanging driver for use in different instantiations of the analysis component for different instruction sets. The TSL language becomes the specification language for retargeting that analysis component to different instruction sets.;As an application of the TSL system, we developed a novel way of applying semantic reinterpretation to automatically create symbolic-analysis primitives for symbolic evaluation, weakest-liberal precondition, and symbolic composition. Furthermore, using the TSL system, as well as the TSL-generated symbolic-analysis primitives, we developed a machine-code verification tool, called MCVETO, and a concolic-execution-based program-exploration tool, called BCE. (1) MCVETO addresses a large number of issues that arise when developing model-checking tools for machine code, for which standard techniques used in source-code model-checking tools would be unsound if applied to machine code. (2) What distinguishes the work on BCE is that it makes use of control-dependence information to make program exploration goal-directed toward a given set of targets.

Keywords/Search Tags:

System, Specification language, TSL, Instruction set, Machine code, Source code

Related items

1	Design And Implementation Of General Instruction Set Description Language
2	Research And Implementation Of A C Language Source Code Static Detection Tool
3	Source Code Based Suspicious Code And Bad Programming Practice Detecting
4	The Design And Implementation Of C++ Source Code Vulnerability Static Scanning System
5	Source Certification System And Structure,
6	Instruction Set Design For Encryption Application Specific Instruction Set Processor
7	Programming Logic Modeling And Application Of Object-Oriented Source Code
8	Research On The Technology Of Similarity Comparison Between Source Code And Binary Code Based On Intermediate Representation
9	Automatic Source Code Analysis And Conversion Of C Language IC ATE Application Programs
10	Programming language evolution and source code rejuvenation