diff --git a/15411/index.html b/15411/index.html new file mode 100644 index 0000000..45085bc --- /dev/null +++ b/15411/index.html @@ -0,0 +1,615 @@ + + +
+ + +Aaron Gutierrez & Shyam Raghavan
+15-411 Final Project
+Fall 2015
+As current 15-122 TAs, we felt as though an interesting and difficult project +would be to retarget the L4 compiler to a minimal adaptation of Clac. Through +the implementation of this retargeted compiler, we hoped to learn the following: +how to retarget a register-based language to a stack-based virtual machine +language; how to minimize the instruction set needed for a normally very large +register-based instruction set to a very small stack-based instruction set; and +how to develop conventions for a very small language that mimic behaviors of a +very large language.
+ +In order to learn more about Clac and the process of compiling to a +stack-based language, we researched information about Forth and the development +philosophy behind Forth. This helped us determine what modifications were +available to make Clac more targetable.
+ +We also researched methods for considering the optimizations of stack-based +virtual machines and for register allocation on the stack. While we decided that +we would not focus on optimizations and would instead ensure correctness, we did +implement a simple register allocator to better utilize the operand stack.
+ +Retargeting the register-based x86-64 assembly language into a stack-based +virtual machine language allowed us to delve into the specifics of the +implementation of compilers that target stack-based virtual machines (like the +Java Virtual Machine). One of the traditional advantages of a stack-based +virtual machine language is that the instruction set (and therefore the number +of bytes needed to represent each instruction) is very small - a disadvantage +(that we clearly rediscovered) is that because the instructions are less +specific, more instructions (and the total number of bytes needed to represent +them) are needed to achieve the same functionality. Finally, we also needed to +develop conventions on function calling, function flow, and memory - because we +were retargeting from x86-64 assembly, most of the conventions we developed were +inspired by it.
+ +The development of this retargeted compiler, while not immediately useful, +helped us learn skills necessary for engineering software for a rapidly-moving, +ever-changing target, required us to be very careful in the engineering and the +maintenance of conventions for a language and virtual machine, and forced us to +consider tradeoffs between performance and correctness with the short amount of +time and resources we had to complete the compiler.
+ +For this project, we chose to retarget C0 code to a minimal adaptation of the +interpreted language based off of Clac used in 15-122 Principles of Imperative +Computation called Clac.
+ +In order to accomplish this, we must minimally alter the Clac runtime to +allow for heap-like memory access and allocation. We must also alter the +compiler to produce a stack-based internal representation and a back-end +Clac-based representation. In sum, this means augmenting the internal +representation and the back-end of the compiler.
+ +We aimed to keep the clac language as close to the original as possible. The +majority of the work in this project is derived from the difficulty in adapting +an internal representation designed to target x86-64 assembly to output clac. +While we could have improved performance and optimizations, we chose to instead +focus on correctness and developed with the explicit belief that correctness was +paramount over performance given our time constraints and the resources +available.
+ +The Clac language needed to be extended in order to successfully compile all +(or at least a significant portion) of the features needed by the L4 language. +We started with a working implementation of the Claculator from the fall +2015 version of the 15-122 assignment, which has user-defined tokens. We +were able to implement almost all of the control flow needed using just +user-defined tokens, with one exception, noted below.
+ +Our first modification adds a mutable, global array of values to complement +the operand stack. To use the memory array, the following instructions were +added:
+ +n get will take the value stored at index n and place it at the top of the operand
+stack v n put will place the value v at index n
+in the memory array n alloc allocates a contiguous block of memory for
+n values and places the index of the
+first location on the top of the operand stackThese instructions were designed to be in the spirit of the original Clac
+language. The get and put
+tokens are similar in behavior to the ! and
+@ words, respectively, in the Forth language upon
+which Clac is based. The alloc token serves an
+analogous purpose to the same keyword in L4.
Additionally, the remaining comparison operators,
+<=, >=, =, != were added. These behave as
+expected. Bitwise operators were added to support bitwise arithmetic. Bitwise
+and (&), or (|), and
+exclusive or (^), along with arithmetic shifts
+(<< and >>),
+behave as they do in C0 and L4.
A couple other modifications were made to support programs that do not return
+a value. The abort token causes the Claculator to
+raise a SIGABRT signal, used for assertions.
+c0_mem_error raises a
+SIGUSR2 signal, used to indicate memory errors in the
+source program, rather than the Claculator.
With the above extensions, we are able to compile almost all programs in the
+L4 language, with one large exception. Clac’s only control flow tokens are
+if and skip. In order to
+handle programs that contain loops, the loop guard and loop body become separate
+functions. As a result, without further modification, we could not compile
+programs that return inside the body of a loop.
One of the features of the calculator that we aimed to preserve is the +simplicity of the interpreter loop. Though the introduction of user-defined +tokens and the call stack complicates things slightly, for the most part it +consists of a queue where each token is considered exactly once. If we were to +add backwards skips, the interpreter would become significantly more involved. +So in order to support returning from the body of a loop, we made hack that aims +to leave the interpreter loop simple.
+ +To overcome that shortcoming we introduced a special
+return token. For most functions,
+return causes the rest of the instruction queue to be
+skipped, and the stack is left unmodified. But for functions whose names start
+with ‘.’ return causes the current function
+and the calling function to return. If the calling function’s name also
+starts with ‘.’, it’s calling function also returns recursively. We are
+therefore allowed to use functions like basic blocks and keep the interpreter
+loop relatively simple.
In order to better match the behavior of a C0 program, the Claculator was +modified to not print a prompt or the operand stack when reading from a file, +and to return the top value on the stack, if it exists, when exiting.
+ +Because Clac programs are quite significantly different than x86-64 programs, +we had to add a completely new back end to our compiler. We are able to use the +same intermediate tree representation for both the x86-64 and the Clac back end. +From that point, we add two new steps to generate Clac code.
+ +Our first translation targets a high-level stack-based form that contains +variables, branches, and unconditional jumps. This serves mostly as a +translation from infix to reverse polish form. An example program in this form +can be seen in Figure 1.
+ +int foo(int a) {
+ int b = 2 * a;
+ if (a < 0) {
+ b = 0;
+ }
+
+ return b;
+}
+
+int main() {
+ return foo(5);
+}foo(a) {
+ %t0Q <-- arg
+ %t1Q <-- (2 * %t0Q)
+ if (((%t0Q < 0) == 1))
+ goto .3
+ goto .4
+ .3
+ %t1Q <-- 0
+ goto .5
+ .4
+ .5
+ return %t1Q
+}
+
+main() {
+ return foo(5)
+}foo:
+ storevar 0
+ push 2
+ loadvar 0
+ mul
+ storevar 1
+ loadvar 0
+ push 0
+ <
+ push 1
+ ==
+ if {
+ push 0
+ storevar 1
+ } else {
+
+ }
+ loadvar 1
+ return
+
+main:
+ push 5
+ foo
+ return
+ From this high-level language, then translate to Clac. Because Clac doesn’t +specify how the memory array should be used, we emulate the x86_64 stack +discipline. All function arguments are passed on the stack, and temporary +variables are assigned from the highest-index down. Our Clac implementation +creates a 50,000 element memory array, so we start from index 49,999. To +simulate stack frames, we use memory address 0 as a pseudo-stack pointer. For +example, the nth local variable for a +function is access by:
+ +0 get n + get
The value at address 0 is decremented the appropriate amount at the start of
+each function, and increment before every instance of
+return.
Conditional branches like
+ +… exp if stms else stms …
are expanded to
+ +… exp if .1 1 skip .2 …;
+: .1 stms ;
+: .2 stms ;
Loops such as
+ +… .1 exp if stms goto .1 else …
become
+ +… .1 …;
+: .1 exp if .2 1 skip .3 ;
+: .2 stms .1 ;
+: .3 ;
Translation of arithmetic and stack manipulation expressions are mostly +trivial. The final output of our sample program can be seen in Figure 2.
+ +: _c0_foo 0 get 3 - 0 put 0 get 0 + put 2 0 get 0 + get * 0 get 1 + put 0 get 0 + get 0 < 1 = if .6 1 skip .7 0 get 1 + get 0 get 3 + 0 put return ;
+: .6 0 0 get 1 + put ;
+: .7 ;
+: _c0_main 0 get 0 - 0 put 5 _c0_foo 0 get 0 + 0 put return ;
+
+49999 0 put _c0_main
+At the end of our Clac source file, we see our minimal runtime, which sets
+the stack pointer to the largest index, and calls the function
+_c0_main.
Memory operations were very straight forward to implement. Unlike when
+implementing L4 for x86-64, pointers are the same size as integers, because
+pointers are just indexes into the memory array. Dereferencing a pointer p is simply
+p get get, and computing array and struct offsets are
+greatly simplified by the fact that all small types have size 1.
In order to support array bound checking, the size of the array needs to be
+stored in memory. We employ the same strategy as before, storing the size at the
+index before the first array element. Bounds checks and null checks are
+implemented exactly as before, adding an if statement before all accesses and
+conditionally evaluating the new c0_mem_error token in
+Clac if the access is invalid. NULL is represented by
+the value 0.
In the most straightforward translation, the operand stack is rarely +employed. To increase utilization of the stack, we implemented Koopman’s +algorithm for register allocation on stack machines. The algorithm as +implemented is described below.
+ +Compute the set of temps that is used at some point after each line
+ within basic blocks. This analysis is greatly simplified as the only Clac
+ instruction that uses a value is get.
For each temp that is used within a block, when it is first written to, + additionally push a copy of the value onto the top of the operand + stack.
For subsequent reads, replace the get
+ instruction with a pick instruction that moves the
+ corresponding value to the top of the stack
For simple programs, our register allocator works correctly. However, our
+implementation is not correct for all inputs, and time did not permit more
+effort to correct our implementation. As such, register allocation is disabled
+by default, but can be enabled with the –clac-alloc
+flag.
Because of the nature of the stack, values are not always in the same +location relative to the top. As such, care must be taken to keep track of where +values are located.
+ +0 get 3 - 0 put
+15 0 get 0 + put
+411 0 get 1 + put
+0 get 0 + get
+1000
+*
+0 get 1 + get
++
+0 get 3 + 0 put
+return
+0 get 3 - 0 put
+15 1 pick 0 get 0 + put
+411 1 pick 0 get 1 + put
+2 pick
+1000
+*
+2 pick
++
+0 get 3 + 0 put
+return
+In Figure 3, we see a sample program with and without register allocation.
+Notice that we still place values in the memory array even though they are no
+longer used. An implementation that recognizes dead
+put instructions is outside the scope of our project,
+but would be interesting to study further.
To test our implementation, we reused most of the tests from Lab 5. Any test +that used library functions were deleted, as we did not port the full runtime to +Clac, but all other features of L4 were implemented.
+ +To run tests against our implementation, we modified the existing testing
+framework to pass the necessary flag for our compiler to target Clac. To run the
+code and produce output, it then calls our modified Claculator with the compiled
+Clac code. The Claculator is linked with a library that raises the correct
+signals to test for memory errors, SIGABRT is
+generated using the C0 assert function, and otherwise
+the value returned from the main method is return by the main method of the
+Claculator. Thus, we should see exactly the same effects from running our Clac
+code using the Claculator and the same code compiled natively for x86-64. As
+such, the testing framework took very little modification.
For speed, the Claculator is compiled with the unsafe C0 runtime using the
+-O2 flag for gcc. Even so, execution time is
+significantly higher than running the native executable. For instance, the gcd
+benchmark runs around 500 times slower than our native compiler output with all
+optimizations disabled. We could have chosen to increase the timeouts for
+testing, but chose not to so that the grading time was still sane.
We explicitly made the decision to not focus on optimizations or performance +- this decision was made with the reasoning that (a) the virtual machine +implementation of Clac has no way of being possibly as fast as the hardware +instruction set given by x86-64 assembly, precisely because the virtual machine +implementation of Clac is compiled into x86-64 assembly, (b) stack-based +machines are empirically slower than register-based machines because of the +larger number of instructions, and (c) with the limited amount of time and +resources we had available, we felt as though we needed to focus on either +correctness or performance, and chose correctness to be paramount.
+ +While we did make this decision, it is important to note that, as mentioned
+earlier, the gcd benchmark runs about 500 times slower
+on the Clac virtual machine than on the native compiler output with all
+optimizations disabled. In addition, running on a system with two quad-core
+Intel X5560 CPUs running at 2.8Ghz and the default timeouts in the autograder,
+208 tests timed out, as opposed to the regression tests, in which 72 tests timed
+out.
Again, because this was not the focus of this project, we chose to disregard
+performance issues and focus on improving correctness throughout our time
+working on the compiler. In the future, to improve the performance of our
+Clac-targeted compiler, we could improve the stack-frame addressing system -
+rather than executing 5 instructions (0 get i + get)
+to get the ith stack-frame variable,
+we could either alter the instruction set to include a stack pointer or improve
+the ability of the compiler to decide what needs to be stored on the operand
+stack and/or on the stack frame. We could also reduce the amount of dead code
+created by the compiler - while function arguments are evaluated from left to
+right in C0, the way we were able to implement this was by evaluating them from
+left to right, dropping them off the operand stack, and then evaluating them
+from right to left in preparation of calling the function (because the calling
+convention developed expected the arguments in left-to-right order). In order to
+reduce this dead code, we could either alter the calling convention to expect
+the arguments in right-to-left order (and then, instead of dropping arguments
+upon left-to-right evaluation, we leave it on the operand stack) or alter the
+specification that arguments are evaluated in left-to-right order.
Regardless of the improvements we can make to the Clac runtime or the +compiled code, we know that we will never be able to improve the performance of +the Clac virtual machine to be faster than that of x86-64 assembly. Because the +only test cases we fail are those that particularly stress the compiler or the +runtime, we feel as though the performance is robust enough for demonstration to +curious 15-122 students that wonder why they’re asked to add function +definitions to a language like Clac.
+ +On that point, with the single addition of the memory array, a large number
+of C0 programs can be successfully compiled to clac. Provided that all functions
+only return at the end, the addition of return to clac
+is avoided. With under 20 simple lines of additional code, a 15-122 student
+could have a way to run non-trivial programs in their own clac implementation
+for testing and exploration purposes.
In terms of qualitative analysis, it appears that Clac code is in fact +(although precedent tells us that this is not the case) more concise than x86-64 +assembly. This is probably because of the discrepancy between Clac being a +virtual machine language and x86-64 assembly being a hardware instruction +set.
+ +The hardest areas to implement involved the design and fleshing out of the
+conventions having to do with control flow and calling functions. The
+difficulties presented themselves because there were no existing conventions,
+and many possibilities either allowed for ambiguities or did not ascribe to C0
+conventions. For example, while some solutions for things like
+if-else control flow did ascribe to C0 semantics, they
+left room for ambiguities in the manner in which code that returned from a
+branch was supposed to execute. This particular problem was resolved with the
+introduction of a return token, but decisions like
+these forced us to carefully develop a set of guidelines and rules to follow
+when designing and writing the compiler. While making these convention decisions
+were not easy, the decisions that were made allowed the runtime for Clac to be
+(a) minimally altered and (b) very easy to invoke.
In conclusion, while the retargeting of a C0 compiler to Clac virtual machine +code is not the most practically useful, we’ve learned quite a few skills that +will have very practical applications both in academia and in the software +industry.
+ +By completing this compiler, we were able to develop our own calling +convention and control flow convention. We also developed a code generation step +from an intermediate tree representation to a stack-based intermediate +representation. All of these tasks required us to be very deliberate about +documentation, to consider tradeoffs between performance and correctness with a +limited amount of resources, and develop a piece of software for a +rapidly-changing target (like from x86-64 assembly to Clac virtual machine +code).
+ +Finally, it has been an amazing experience and a really fun one building the +largest single piece of code we’ve seen from start to finish. We’ve enjoyed the +class immensely, and we’d like to thank Rob and the course staff for making it +such an enjoyable experience.
+ + diff --git a/index.html b/index.html index efdd388..f6d5580 100644 --- a/index.html +++ b/index.html @@ -54,6 +54,8 @@