Download Formalizing the SSA-based Compiler for Verified Advanced Program Transformations PDF

TitleFormalizing the SSA-based Compiler for Verified Advanced Program Transformations
Author
LanguageEnglish
File Size1.4 MB
Total Pages140
Table of Contents
                            University of Pennsylvania
ScholarlyCommons
	1-1-2013
Formalizing the SSA-based Compiler for Verified Advanced Program Transformations
	Jianzhou Zhao
		Recommended Citation
	Formalizing the SSA-based Compiler for Verified Advanced Program Transformations
		Abstract
		Degree Type
		Degree Name
		Graduate Group
		First Advisor
		Subject Categories
Abstract
Contents
List of Tables
List of Figures
Introduction
Background
	Program Refinement
	Static Single Assignment
	LLVM
	The Simple SSA Language—Vminus
Mechanized Verification of Computing Dominators
	The Specification of Computing Dominators
		Dominance
		Specification
		Instantiations
	The Allen-Cocke Algorithm
		DFS: PO-numbering
		Kildall's algorithm
		The AC algorithm
	Extension: the Cooper-Harvey-Kennedy Algorithm
		Correctness
	Constructing Dominator Trees
	Dominance Frontier
	Performance Evaluation
The Semantics of Vminus
	Dynamic Semantics
	Dominance Analysis
	Static Semantics
Proof Techniques for SSA
	Safety of Vminus
	Generalizing Safety to Other SSA Invariants
	The Correctness of SSA-based Transformations
The formalism of the LLVM IR
	The Syntax
	The Static Semantics
	A Memory Model for the LLVM IR
		Rationale
		LLVM memory commands
		The byte-oriented representation
		The LLVM flattened values and memory accesses
	Operational Semantics
		Nondeterminism in the LLVM operational semantics
		Nondeterministic operational semantics of the SSA form
		Partiality, preservation, and progress
		Deterministic refinements
	Extracting an Interpreter
Verified SoftBound
	Formalizing SoftBound for the LLVM IR
	Extracted Verified Implementation of SoftBound
Verified SSA Construction for LLVM
	The mem2reg Optimization Pass
	The vmem2reg Algorithm
	Correctness of vmem2reg
		Preserving promotability
		Preserving well-formedness
		Program refinement
		The correctness of vmem2reg
	Extraction and Performance Evaluation
	Optimized vmem2reg
		O1 Level—Pipeline fusion
		The Correctness of vmem2reg-O1
		O2 Level—Minimal -nodes Placement
		The Correctness of vmem2reg-O2
The Coq Development
	Definitions
	Proofs
	OCaml Bindings and Coq Extraction
Related Work
Conclusions and Future Work
Bibliography
                        
Document Text Contents
Page 1

University of Pennsylvania
ScholarlyCommons

Publicly Accessible Penn Dissertations

1-1-2013

Formalizing the SSA-based Compiler for Verified
Advanced Program Transformations
Jianzhou Zhao
University of Pennsylvania, [email protected]

Follow this and additional works at: http://repository.upenn.edu/edissertations

Part of the Computer Sciences Commons

This paper is posted at ScholarlyCommons. http://repository.upenn.edu/edissertations/825
For more information, please contact [email protected]

Recommended Citation
Zhao, Jianzhou, "Formalizing the SSA-based Compiler for Verified Advanced Program Transformations" (2013). Publicly Accessible
Penn Dissertations. 825.
http://repository.upenn.edu/edissertations/825

http://repository.upenn.edu?utm_source=repository.upenn.edu%2Fedissertations%2F825&utm_medium=PDF&utm_campaign=PDFCoverPages
http://repository.upenn.edu/edissertations?utm_source=repository.upenn.edu%2Fedissertations%2F825&utm_medium=PDF&utm_campaign=PDFCoverPages
http://repository.upenn.edu/edissertations?utm_source=repository.upenn.edu%2Fedissertations%2F825&utm_medium=PDF&utm_campaign=PDFCoverPages
http://network.bepress.com/hgg/discipline/142?utm_source=repository.upenn.edu%2Fedissertations%2F825&utm_medium=PDF&utm_campaign=PDFCoverPages
http://repository.upenn.edu/edissertations/825?utm_source=repository.upenn.edu%2Fedissertations%2F825&utm_medium=PDF&utm_campaign=PDFCoverPages
http://repository.upenn.edu/edissertations/825
mailto:[email protected]

Page 2

��������������������
������������������
��� �������������������
��������������

Abstract
Compilers are not always correct due to the complexity of language semantics and transformation algorithms,
the trade-offs between compilation speed and verifiability,etc.The bugs of compilers can undermine the
source-level verification efforts (such as type systems, static analysis, and formal proofs) and produce target
programs with different meaning from source programs. Researchers have used mechanized proof tools to
implement verified compilers that are guaranteed to preserve program semantics and proved to be more
robust than ad-hoc non-verified compilers.

The goal of the dissertation is to make a step towards verifying an industrial strength modern compiler--
LLVM, which has a typed, SSA-based, and general-purpose intermediate representation, therefore allowing
more advanced program transformations than existing approaches. The dissertation formally defines the
sequential semantics of the LLVM intermediate representation with its type system, SSA properties, memory
model, and operational semantics. To design and reason about program transformations in the LLVM IR, we
provide tools for interacting with the LLVM infrastructure and metatheory for SSA properties, memory
safety, dynamic semantics, and control-flow-graphs. Based on the tools and metatheory, the dissertation
implements verified and extractable applications for LLVM that include an interpreter for the LLVM IR, a
transformation for enforcing memory safety, translation validators for local optimizations, and verified SSA
construction transformation.

This dissertation shows that formal models of SSA-based compiler intermediate representations can be used
to verify low-level program transformations, thereby enabling the construction of high-assurance compiler
passes.

Degree Type
Dissertation

Degree Name
Doctor of Philosophy (PhD)

Graduate Group
Computer and Information Science

First Advisor
Steve Zdancewic

Subject Categories
Computer Sciences

This dissertation is available at ScholarlyCommons: http://repository.upenn.edu/edissertations/825

http://repository.upenn.edu/edissertations/825?utm_source=repository.upenn.edu%2Fedissertations%2F825&utm_medium=PDF&utm_campaign=PDFCoverPages

Page 70

Some of this partiality is related to well-formedness of the SSA program. For example,

evalND(g,∆,%x) is undefined if %x is not bound in ∆. These kinds of errors are ruled out by the

static well-formedness constraints imposed by the LLVM IR (Section 6.2).

In other cases, we have chosen to use partiality in the operational semantics to model certain

failure modes for which the LLVM specification says that the behavior of the program is undefined.

These include: (1) attempting to free memory via a pointer not returned from malloc or that has

already been deallocated, (2) allocating a negative amount of memory, (3) calling load or store on

a pointer with bad alignment or a deallocated address, (4) trying to call a non-function pointer, or

(5) trying to execute the unreachable command. We model these events by stuck states because

they correspond to fatal errors that will occur in any reasonable realization of the LLVM IR by

translation to a target platform. Each of these errors is precisely characterized by a predicate over

the machine state (e.g., BadFree(config,S)), and the “allowed” stuck states are defined to be the

disjunction of these predicates:

Stuck(config,S) = BadFree(config,S)

_ BadLoad(config,S)

_ . . .

_ Unreachable(config,S)

To see that the well-formedness properties of the static semantics rule out all but these known

error configurations, we prove the usual preservation and progress theorems for the LLVMND se-

mantics.

Theorem 18 (Preservation for LLVMND). If (config, S) is well formed and config ‘ S� S0, then

(config, S0) is well formed.

Here, well-formedness includes the static scoping, typing properties, and SSA invariants from

Section 6.2 for the LLVM code, but also requires that the local mappings ∆ present in all frames of

the call stack must be inhabited—each binding contains at least one value v—and that each defined

variable that dominates the current continuation is in ∆’s domain.

That defined variables dominate their uses in the current continuation follows Lemma 11 with

considering the context of the full LLVM IR. To show that the ∆ bindings are inhabited after the step,

we prove that (1) non-undef values V are singletons; (2) undefined values from constants typundef

59

Page 71

contain all possible values of first class types typ; (3) undefined values from loading uninitialized

memory or incompatible physical data contain at least paddings indicating errors; (4) evaluation of

non-deterministic values by evalbopND returns non-empty sets of values given non-empty inputs.

Theorem 19 (Progress for LLVMND). If the pair (config, S) is well formed, then either S has

terminated successfully or Stuck(config,S) or there exists S’ such that config ‘ S� S0.

This theorem holds because in a well-formed machine state, evalND always returns a non-empty

value set V; moreover jump targets and internal functions are always present.

6.4.4 Deterministic refinements

Although the LLVMND semantics is useful for reasoning about the validity of LLVM program

transformations, Vellvm provides a LLVMD, a deterministic, small-step refinement, along with two

large-step operational semantics LLVM�DFn and LLVM

DB.

These different deterministic semantics are useful for several reasons: (1) they provide the basis

for testing LLVM programs with a concrete implementation of memory (see the discussion about

Vellvm’s extracted interpreter in the next Section), (2) proving that LLVMD is an instance of the

LLVMND and relating the small-step rules to the large-step ones provides validation of all of the

semantics (i.e., we found bugs in Vellvm by formalizing multiple semantics and trying to prove

that they are related), and (3) the small- and large-step semantics have different applications when

reasoning about LLVM program transformations.

Unlike LLVMND, the frames for these semantics map identifiers to single values, not sets,

and the operational rules call deterministic variants of the nondeterministic counterparts (e.g., eval

instead of evalND). To resolve the nondeterminism from undef and faulty memory operations, these

semantics fix a concrete interpretation as follows:

� undef is treated as a zeroinitializer

� Reading uninitialized memory returns zeroinitializer

These choices yield unrealistic behaviors compared to what one might expect from running

a LLVM program against a C-style runtime system, but the cases where this semantics differs

correspond to unsafe programs. There are still many programs, namely those compiled to LLVM

60

Page 139

[62] V. C. Sreedhar and G. R. Gao. A Linear Time Algorithm for Placing φ-nodes. In POPL ’95:

Proceedings of the 22nd ACM SIGACT-SIGPLAN Symposium on Principles of Programming

Languages, 1995.

[63] M. Stepp, R. Tate, and S. Lerner. Equality-Based Translation Validator for LLVM. In CAV

’11: Proceedings of the 23rd International Conference on Computer Aided Verification, 2011.

[64] Z. T. Sudipta Kundu and S. Lerner. Proving Optimizations Correct Using Parameterized

Program Equivalence. In PLDI ’09: Proceedings of the ACM SIGPLAN 2009 Conference

on Programming Language Design and Implementation, 2009.

[65] D. Syme. Reasoning with the Formal Definition of Standard ML in HOL. In Sixth Interna-

tional Workshop on Higher Order Logic Theorem Proving and its Applications, 1993.

[66] Z. Tatlock and S. Lerner. Bringing Extensibility to Verified Compilers. In PLDI ’10: Pro-

ceedings of the ACM SIGPLAN 2010 Conference on Programming Language Design and

Implementation, 2010.

[67] J.-B. Tristan, P. Govereau, and G. Morrisett. Evaluating Value-graph Translation Validation for

LLVM. In PLDI ’11: Proceedings of the ACM SIGPLAN 2011 Conference on Programming

Language Design and Implementation, 2011.

[68] J.-B. Tristan and X. Leroy. Formal Verification of Translation Validators: a Case Study on

Instruction Scheduling Optimizations. In POPL ’08: Proceedings of the 35th Annual ACM

SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2008.

[69] J.-B. Tristan and X. Leroy. Verified Validation of Lazy Code Motion. In PLDI ’09: Pro-

ceedings of the ACM SIGPLAN 2009 Conference on Programming Language Design and

Implementation, 2009.

[70] J. B. Tristan and X. Leroy. A Simple, Verified Validator for Software Pipelining. In POPL

’10: Proceedings of the 37th Annual ACM SIGPLAN-SIGACT Symposium on Principles of

Programming Languages, 2010.

[71] M. J. Wolfe. High Performance Compilers for Parallel Computing. Addison-Wesley Longman

Publishing Co., Inc., Boston, MA, USA, 1995.

128

Page 140

[72] B. Yakobowski. Étude sémantique d’un langage intermédiaire de type Static Single Assign-

ment. Rapport de dea (Master’s thesis), ENS Cachan and INRIA Rocquencourt, Sept. 2004.

[73] X. Yang, Y. Chen, E. Eide, and J. Regehr. Finding and understanding bugs in C compilers. In

PLDI ’11: Proceedings of the ACM SIGPLAN 2011 Conference on Programming Language

Design and Implementation, 2011.

[74] A. Zaks and A. Pnueli. Program Analysis for Compiler Validation. In PASTE ’08: Proceedings

of the 8th ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and

Engineering, 2008.

[75] J. Zhao, S. Nagarakatte, M. M. K. Martin, and S. Zdancewic. Formalizing the LLVM Interme-

diate Representation for Verified Program Transformations. In POPL ’12: Proceedings of the

38th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages,

2012.

[76] J. Zhao, S. Nagarakatte, M. M. K. Martin, and S. Zdancewic. Formal verification of SSA-based

optimizations for LLVM. In PLDI ’13: Proceedings of the ACM SIGPLAN 2012 Conference

on Programming Language Design and Implementation, 2013.

[77] J. Zhao and S. Zdancewic. Mechanized Verification of Computing Dominators for Formalizing

Compilers. In CPP ’12: The Second International Conference on Certified Programs and

Proofs, 2012.

[78] L. Zhao, G. Li, B. De Sutter, and J. Regehr. ARMor: Fully Verified Software Fault Isolation. In

EMSOFT ’11: Proceedings of the 9th ACM International Conference on Embedded Software,

2011.

129

Similer Documents