From d316b4a2eef9214e76367e9bf811f81b78baf2f1 Mon Sep 17 00:00:00 2001 From: Aaron Gutierrez Date: Wed, 13 May 2020 22:20:26 -0700 Subject: [PATCH] Add compiler L6 report --- 15411/index.html | 615 +++++++++++++++++++++++++++++++++++++++++++++++ index.html | 2 + pub.py | 16 +- 3 files changed, 632 insertions(+), 1 deletion(-) create mode 100644 15411/index.html diff --git a/15411/index.html b/15411/index.html new file mode 100644 index 0000000..45085bc --- /dev/null +++ b/15411/index.html @@ -0,0 +1,615 @@ + + + + + + C0 to Clac: Targeting a Stack-based Virtual Machine + + + + +
+

C0 to Clac: Targeting a Stack-based Virtual Machine

+

Aaron Gutierrez & Shyam Raghavan

+

15-411 Final Project

+

Fall 2015

+
+
+

Introduction

+

As current 15-122 TAs, we felt as though an interesting and difficult project +would be to retarget the L4 compiler to a minimal adaptation of Clac. Through +the implementation of this retargeted compiler, we hoped to learn the following: +how to retarget a register-based language to a stack-based virtual machine +language; how to minimize the instruction set needed for a normally very large +register-based instruction set to a very small stack-based instruction set; and +how to develop conventions for a very small language that mimic behaviors of a +very large language.

+ +

In order to learn more about Clac and the process of compiling to a +stack-based language, we researched information about Forth and the development +philosophy behind Forth. This helped us determine what modifications were +available to make Clac more targetable.

+ +

We also researched methods for considering the optimizations of stack-based +virtual machines and for register allocation on the stack. While we decided that +we would not focus on optimizations and would instead ensure correctness, we did +implement a simple register allocator to better utilize the operand stack.

+ +

Retargeting the register-based x86-64 assembly language into a stack-based +virtual machine language allowed us to delve into the specifics of the +implementation of compilers that target stack-based virtual machines (like the +Java Virtual Machine). One of the traditional advantages of a stack-based +virtual machine language is that the instruction set (and therefore the number +of bytes needed to represent each instruction) is very small - a disadvantage +(that we clearly rediscovered) is that because the instructions are less +specific, more instructions (and the total number of bytes needed to represent +them) are needed to achieve the same functionality. Finally, we also needed to +develop conventions on function calling, function flow, and memory - because we +were retargeting from x86-64 assembly, most of the conventions we developed were +inspired by it.

+ +

The development of this retargeted compiler, while not immediately useful, +helped us learn skills necessary for engineering software for a rapidly-moving, +ever-changing target, required us to be very careful in the engineering and the +maintenance of conventions for a language and virtual machine, and forced us to +consider tradeoffs between performance and correctness with the short amount of +time and resources we had to complete the compiler.

+ +

Project

+

For this project, we chose to retarget C0 code to a minimal adaptation of the +interpreted language based off of Clac used in 15-122 Principles of Imperative +Computation called Clac.

+ +

In order to accomplish this, we must minimally alter the Clac runtime to +allow for heap-like memory access and allocation. We must also alter the +compiler to produce a stack-based internal representation and a back-end +Clac-based representation. In sum, this means augmenting the internal +representation and the back-end of the compiler.

+ +

We aimed to keep the clac language as close to the original as possible. The +majority of the work in this project is derived from the difficulty in adapting +an internal representation designed to target x86-64 assembly to output clac. +While we could have improved performance and optimizations, we chose to instead +focus on correctness and developed with the explicit belief that correctness was +paramount over performance given our time constraints and the resources +available.

+ +

Implementation

+

Modifications to the Claculator

+

The Clac language needed to be extended in order to successfully compile all +(or at least a significant portion) of the features needed by the L4 language. +We started with a working implementation of the Claculator from the fall +2015 version of the 15-122 assignment, which has user-defined tokens. We +were able to implement almost all of the control flow needed using just +user-defined tokens, with one exception, noted below.

+ +

Our first modification adds a mutable, global array of values to complement +the operand stack. To use the memory array, the following instructions were +added:

+ +
+
get:
+
n get will take the value stored at index n and place it at the top of the operand +stack
+ +
put:
+
v n put will place the value v at index n +in the memory array
+ +
alloc:
+
n alloc allocates a contiguous block of memory for +n values and places the index of the +first location on the top of the operand stack
+
+ +

These instructions were designed to be in the spirit of the original Clac +language. The get and put +tokens are similar in behavior to the ! and +@ words, respectively, in the Forth language upon +which Clac is based. The alloc token serves an +analogous purpose to the same keyword in L4.

+ +

Additionally, the remaining comparison operators, +<=, >=, =, != were added. These behave as +expected. Bitwise operators were added to support bitwise arithmetic. Bitwise +and (&), or (|), and +exclusive or (^), along with arithmetic shifts +(<< and >>), +behave as they do in C0 and L4.

+ +

A couple other modifications were made to support programs that do not return +a value. The abort token causes the Claculator to +raise a SIGABRT signal, used for assertions. +c0_mem_error raises a +SIGUSR2 signal, used to indicate memory errors in the +source program, rather than the Claculator.

+ +

With the above extensions, we are able to compile almost all programs in the +L4 language, with one large exception. Clac’s only control flow tokens are +if and skip. In order to +handle programs that contain loops, the loop guard and loop body become separate +functions. As a result, without further modification, we could not compile +programs that return inside the body of a loop.

+ +

One of the features of the calculator that we aimed to preserve is the +simplicity of the interpreter loop. Though the introduction of user-defined +tokens and the call stack complicates things slightly, for the most part it +consists of a queue where each token is considered exactly once. If we were to +add backwards skips, the interpreter would become significantly more involved. +So in order to support returning from the body of a loop, we made hack that aims +to leave the interpreter loop simple.

+ +

To overcome that shortcoming we introduced a special +return token. For most functions, +return causes the rest of the instruction queue to be +skipped, and the stack is left unmodified. But for functions whose names start +with ‘.’ return causes the current function +and the calling function to return. If the calling function’s name also +starts with ‘.’, it’s calling function also returns recursively. We are +therefore allowed to use functions like basic blocks and keep the interpreter +loop relatively simple.

+ +

In order to better match the behavior of a C0 program, the Claculator was +modified to not print a prompt or the operand stack when reading from a file, +and to return the top value on the stack, if it exists, when exiting.

+ +

Modifications to Our Compiler

+

Because Clac programs are quite significantly different than x86-64 programs, +we had to add a completely new back end to our compiler. We are able to use the +same intermediate tree representation for both the x86-64 and the Clac back end. +From that point, we add two new steps to generate Clac code.

+ +

Our first translation targets a high-level stack-based form that contains +variables, branches, and unconditional jumps. This serves mostly as a +translation from infix to reverse polish form. An example program in this form +can be seen in Figure 1.

+ +
+
Figure 1: Example program after different stages
+ +
+
int foo(int a) {
+  int b = 2 * a;
+  if (a < 0) {
+    b = 0;
+  }
+
+  return b;
+}
+
+int main() {
+  return foo(5);
+}
+
(a) L4 Source
+
+ +
+
foo(a) {
+  %t0Q  <--  arg
+  %t1Q  <--  (2 * %t0Q)
+  if (((%t0Q < 0) == 1))
+  goto .3
+  goto .4
+  .3
+  %t1Q  <--  0
+  goto .5
+  .4
+  .5
+  return %t1Q
+}
+
+main() {
+  return foo(5)
+}
+
(b) Intermediate Tree
+
+ +
+
foo:
+  storevar 0
+  push 2
+  loadvar 0
+  mul
+  storevar 1
+  loadvar 0
+  push 0
+  <
+  push 1
+  ==
+  if {
+    push 0
+    storevar 1
+  } else {
+     
+  }
+  loadvar 1
+  return
+
+main:
+  push 5
+  foo
+  return
+
(c) High-level Stack Form
+
+
+ + +

From this high-level language, then translate to Clac. Because Clac doesn’t +specify how the memory array should be used, we emulate the x86_64 stack +discipline. All function arguments are passed on the stack, and temporary +variables are assigned from the highest-index down. Our Clac implementation +creates a 50,000 element memory array, so we start from index 49,999. To +simulate stack frames, we use memory address 0 as a pseudo-stack pointer. For +example, the nth local variable for a +function is access by:

+ +

0 get n + get

+ +

The value at address 0 is decremented the appropriate amount at the start of +each function, and increment before every instance of +return.

+ +

Conditional branches like

+ +

… exp if stms else stms …

+ +

are expanded to

+ +

… exp if .1 1 skip .2 …; +: .1 stms ; +: .2 stms ;

+ +

Loops such as

+ +

… .1 exp if stms goto .1 else …

+ +

become

+ +

… .1 …; +: .1 exp if .2 1 skip .3 ; +: .2 stms .1 ; +: .3 ;

+ +

Translation of arithmetic and stack manipulation expressions are mostly +trivial. The final output of our sample program can be seen in Figure 2.

+ +
+
: _c0_foo 0 get 3 - 0 put 0 get 0 + put 2 0 get 0 + get * 0 get 1 + put 0 get 0 + get 0 < 1 = if .6 1 skip .7 0 get 1 + get 0 get 3 + 0 put return ;
+: .6 0 0 get 1 + put ;
+: .7 ;
+: _c0_main 0 get 0 - 0 put 5 _c0_foo 0 get 0 + 0 put return ;
+
+49999 0 put _c0_main
+
Figure 2: Example program compiled to Clac
+
+ +

At the end of our Clac source file, we see our minimal runtime, which sets +the stack pointer to the largest index, and calls the function +_c0_main.

+ +

Memory operations were very straight forward to implement. Unlike when +implementing L4 for x86-64, pointers are the same size as integers, because +pointers are just indexes into the memory array. Dereferencing a pointer p is simply +p get get, and computing array and struct offsets are +greatly simplified by the fact that all small types have size 1.

+ +

In order to support array bound checking, the size of the array needs to be +stored in memory. We employ the same strategy as before, storing the size at the +index before the first array element. Bounds checks and null checks are +implemented exactly as before, adding an if statement before all accesses and +conditionally evaluating the new c0_mem_error token in +Clac if the access is invalid. NULL is represented by +the value 0.

+ +

Register Allocation

+

In the most straightforward translation, the operand stack is rarely +employed. To increase utilization of the stack, we implemented Koopman’s +algorithm for register allocation on stack machines. The algorithm as +implemented is described below.

+ +
+

Koopman’s Algorithm, as implemented

+
    +
  1. Compute the set of temps that is used at some point after each line + within basic blocks. This analysis is greatly simplified as the only Clac + instruction that uses a value is get.

  2. + +
  3. For each temp that is used within a block, when it is first written to, + additionally push a copy of the value onto the top of the operand + stack.

  4. + +
  5. For subsequent reads, replace the get + instruction with a pick instruction that moves the + corresponding value to the top of the stack

  6. +
+
+ +

For simple programs, our register allocator works correctly. However, our +implementation is not correct for all inputs, and time did not permit more +effort to correct our implementation. As such, register allocation is disabled +by default, but can be enabled with the –clac-alloc +flag.

+ +

Because of the nature of the stack, values are not always in the same +location relative to the top. As such, care must be taken to keep track of where +values are located.

+ +
+
Figure 3: Simple program with and without register + allocation
+
+
int main() {
+  int t0 = 15;
+  int t1 = 411;
+
+  return t0 * 1000 + t1;
+}
+
(a) L4 Source
+
+ +
+
0 get 3 - 0 put
+15 0 get 0 + put
+411 0 get 1 + put
+0 get 0 + get
+1000
+*
+0 get 1 + get
++
+0 get 3 + 0 put
+return
+
(b) Main method without register allocation
+
+ +
+
0 get 3 - 0 put
+15 1 pick 0 get 0 + put
+411 1 pick 0 get 1 + put
+2 pick
+1000
+*
+2 pick
++
+0 get 3 + 0 put
+return
+
(c) Main method with register allocation
+
+
+ +

In Figure 3, we see a sample program with and without register allocation. +Notice that we still place values in the memory array even though they are no +longer used. An implementation that recognizes dead +put instructions is outside the scope of our project, +but would be interesting to study further.

+ +

Testing Methodology

+

To test our implementation, we reused most of the tests from Lab 5. Any test +that used library functions were deleted, as we did not port the full runtime to +Clac, but all other features of L4 were implemented.

+ +

To run tests against our implementation, we modified the existing testing +framework to pass the necessary flag for our compiler to target Clac. To run the +code and produce output, it then calls our modified Claculator with the compiled +Clac code. The Claculator is linked with a library that raises the correct +signals to test for memory errors, SIGABRT is +generated using the C0 assert function, and otherwise +the value returned from the main method is return by the main method of the +Claculator. Thus, we should see exactly the same effects from running our Clac +code using the Claculator and the same code compiled natively for x86-64. As +such, the testing framework took very little modification.

+ +

For speed, the Claculator is compiled with the unsafe C0 runtime using the +-O2 flag for gcc. Even so, execution time is +significantly higher than running the native executable. For instance, the gcd +benchmark runs around 500 times slower than our native compiler output with all +optimizations disabled. We could have chosen to increase the timeouts for +testing, but chose not to so that the grading time was still sane.

+ +

Analysis

+ +

Quantitative Analysis

+

We explicitly made the decision to not focus on optimizations or performance +- this decision was made with the reasoning that (a) the virtual machine +implementation of Clac has no way of being possibly as fast as the hardware +instruction set given by x86-64 assembly, precisely because the virtual machine +implementation of Clac is compiled into x86-64 assembly, (b) stack-based +machines are empirically slower than register-based machines because of the +larger number of instructions, and (c) with the limited amount of time and +resources we had available, we felt as though we needed to focus on either +correctness or performance, and chose correctness to be paramount.

+ +

While we did make this decision, it is important to note that, as mentioned +earlier, the gcd benchmark runs about 500 times slower +on the Clac virtual machine than on the native compiler output with all +optimizations disabled. In addition, running on a system with two quad-core +Intel X5560 CPUs running at 2.8Ghz and the default timeouts in the autograder, +208 tests timed out, as opposed to the regression tests, in which 72 tests timed +out.

+ +

Again, because this was not the focus of this project, we chose to disregard +performance issues and focus on improving correctness throughout our time +working on the compiler. In the future, to improve the performance of our +Clac-targeted compiler, we could improve the stack-frame addressing system - +rather than executing 5 instructions (0 get i + get) +to get the ith stack-frame variable, +we could either alter the instruction set to include a stack pointer or improve +the ability of the compiler to decide what needs to be stored on the operand +stack and/or on the stack frame. We could also reduce the amount of dead code +created by the compiler - while function arguments are evaluated from left to +right in C0, the way we were able to implement this was by evaluating them from +left to right, dropping them off the operand stack, and then evaluating them +from right to left in preparation of calling the function (because the calling +convention developed expected the arguments in left-to-right order). In order to +reduce this dead code, we could either alter the calling convention to expect +the arguments in right-to-left order (and then, instead of dropping arguments +upon left-to-right evaluation, we leave it on the operand stack) or alter the +specification that arguments are evaluated in left-to-right order.

+ +

Regardless of the improvements we can make to the Clac runtime or the +compiled code, we know that we will never be able to improve the performance of +the Clac virtual machine to be faster than that of x86-64 assembly. Because the +only test cases we fail are those that particularly stress the compiler or the +runtime, we feel as though the performance is robust enough for demonstration to +curious 15-122 students that wonder why they’re asked to add function +definitions to a language like Clac.

+ +

On that point, with the single addition of the memory array, a large number +of C0 programs can be successfully compiled to clac. Provided that all functions +only return at the end, the addition of return to clac +is avoided. With under 20 simple lines of additional code, a 15-122 student +could have a way to run non-trivial programs in their own clac implementation +for testing and exploration purposes.

+ +

Qualitative Analysis

+

In terms of qualitative analysis, it appears that Clac code is in fact +(although precedent tells us that this is not the case) more concise than x86-64 +assembly. This is probably because of the discrepancy between Clac being a +virtual machine language and x86-64 assembly being a hardware instruction +set.

+ +

The hardest areas to implement involved the design and fleshing out of the +conventions having to do with control flow and calling functions. The +difficulties presented themselves because there were no existing conventions, +and many possibilities either allowed for ambiguities or did not ascribe to C0 +conventions. For example, while some solutions for things like +if-else control flow did ascribe to C0 semantics, they +left room for ambiguities in the manner in which code that returned from a +branch was supposed to execute. This particular problem was resolved with the +introduction of a return token, but decisions like +these forced us to carefully develop a set of guidelines and rules to follow +when designing and writing the compiler. While making these convention decisions +were not easy, the decisions that were made allowed the runtime for Clac to be +(a) minimally altered and (b) very easy to invoke.

+ +

Conclusion

+

In conclusion, while the retargeting of a C0 compiler to Clac virtual machine +code is not the most practically useful, we’ve learned quite a few skills that +will have very practical applications both in academia and in the software +industry.

+ +

By completing this compiler, we were able to develop our own calling +convention and control flow convention. We also developed a code generation step +from an intermediate tree representation to a stack-based intermediate +representation. All of these tasks required us to be very deliberate about +documentation, to consider tradeoffs between performance and correctness with a +limited amount of resources, and develop a piece of software for a +rapidly-changing target (like from x86-64 assembly to Clac virtual machine +code).

+ +

Finally, it has been an amazing experience and a really fun one building the +largest single piece of code we’ve seen from start to finish. We’ve enjoyed the +class immensely, and we’d like to thank Rob and the course staff for making it +such an enjoyable experience.

+ + diff --git a/index.html b/index.html index efdd388..f6d5580 100644 --- a/index.html +++ b/index.html @@ -54,6 +54,8 @@
  • I extended the Intel SPMD Program Compiler (ISPC) to support polymorphic datatypes in my 15-418 Final Project
  • +
  • I targeted a stack-based language in our + 15-411 Final Project
  • I spent several semesters as a teaching assistant for 15-122, 15-292, and diff --git a/pub.py b/pub.py index fd71d82..56b9c60 100755 --- a/pub.py +++ b/pub.py @@ -134,6 +134,9 @@ def upload_img(): for f in files: upload_file('img/{}'.format(f)) +def upload_15411(): + upload_file('15411/index.html') + def upload_15418(): upload_file('15418/index.html') upload_file('15418/style.css') @@ -148,7 +151,16 @@ if __name__ == '__main__': parser = argparse.ArgumentParser(description='Publish www.aarongutierrez.com') pub_help = 'What to publish' - pub_choices = ['all', 'root', 'img', 'bench-local', 'bench', '15418', 'campaign'] + pub_choices = [ + 'all', + 'root', + 'img', + 'bench-local', + 'bench', + '15411', + '15418', + 'campaign', + ] parser.add_argument('pub', choices=pub_choices, help=pub_help) args = parser.parse_args() @@ -161,6 +173,8 @@ if __name__ == '__main__': make_bench() if args.pub == 'bench' or args.pub == 'all': upload_bench() + if args.pub == '15411' or args.pub == 'all': + upload_15411() if args.pub == '15418' or args.pub == 'all': upload_15418() if args.pub == 'campaign' or args.pub == 'all':