CS143
14-01 Intermediate language¶
What is an intermediate language?¶
The intermediate language is a language between the source and the target.
Why bother to introduce an intermediate language?¶
Because it provides an intermediate level of abstraction. It has more detail than the source: for example, if we want to optimize register usage, most source languages have no notion of registers at the source level, so there is no way even to express the kinds of optimizations you might want to do with registers. It also has fewer details than the target: the intermediate language sits a little above the level of the instruction set of any particular machine, and therefore it is easier to retarget the intermediate code to lots of different kinds of machines, because it doesn't carry all the grubby details of one particular machine.
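As a concrete illustration (the temporaries t1 and t2 and the exact notation are made up; real compilers differ), a single source statement might be broken down into a sequence of simple three-address operations, each of which maps closely onto machine instructions:
int example(int b, int c, int d) {
    /* Source-level statement: */
    int a = b + c * d;
    return a;
}
/* A possible three-address intermediate form for the statement above
   (t1 and t2 are compiler-introduced temporaries; exact syntax varies):
     t1 := c * d
     t2 := b + t1
     a  := t2                                                            */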
14-02 Optimization Overview¶
When should we perform optimizations?¶
In fact, we can perform them on the AST; a big advantage of that is that it's machine-independent. But it turns out the AST is at too high a level for a lot of the optimizations we want to do, because those optimizations depend on lower-level details of the machine. Another possibility would be to perform optimizations directly on the assembly language, but that is machine-dependent and we would have to reimplement the optimizations for each kind of architecture. So the intermediate language is an ideal choice.
What is the purpose of optimization?¶
The purpose of optimization is to improve a program's resource utilization, such as execution time, code size, network messages sent, and so on. For languages like C and Cool, there are three granularities of optimization. The first is local optimizations, which occur within a single basic block. Then there are what are called global optimizations. This is really misnamed, because it is not global across the entire program; it's global across an entire function. So global optimizations apply to a single function, across all the basic blocks of that function. And finally, there are inter-procedural optimizations. These work across method boundaries: they take multiple functions and move things around to try to optimize the collection of functions as a whole.
In practice, often a conscious decision is made not to implement the fanciest optimization known, why?¶
First, some optimizations are hard to implement from a software-engineering standpoint. Second, some optimizations are costly in compilation time; even though compilation happens offline and is not part of running the program, the programmer still has to wait while the optimizing compiler does its work. Third, some of these optimizations have a low payoff: they might only improve the program by a small amount. And unfortunately, many fancy optimizations are all three!
So the goal of optimization is to achieve the maximum benefit for the minimum cost.
14-03 Local Optimizations¶
Algebraic Simplification¶
Some statements can be deleted.¶
x := x + 0;
x := x * 1;
Some statements can be simplified.¶
x := x * 0; => x := 0;
y := y ** 2; => y := y * y;
z := z * 8; => z := z << 3;
p := p * 15; => t := p << 4; p := t - p;
For the ** operator, what will probably wind up in our generated code is a call to some built-in math library routine, which introduces function-call overhead and some kind of general loop. So in the special case where we know the exponent is 2, it's much more efficient to replace the call to the exponentiation routine with an explicit multiply.
Another example is z := z * 8. If we have a multiplication by a power of 2, we can replace it with a left bit shift. In fact, it doesn't even have to be a power of two: a multiplication by some other number can be replaced by some combination of shifts and subtractions, as in the p * 15 example above. I want to point out that these transformations will not necessarily result in any kind of speedup, because on modern machines multiplication is just as fast as any other single instruction.
constant folding¶
Operations on constants can be computed at compile time.
x := 2 + 2; => x := 4;
if 2 > 0 jump L => jump L;
There is one situation that you should be aware of in which constant folding can be very dangerous. It illustrates some of the subtleties of program optimization and programming language semantics. Consider the scenario where we have two machines, X and Y. The compiler runs on machine X, and it produces code for machine Y, where the code will run. This is called a cross-compiler (think, for example, of compiling code for an embedded system). The problem comes when X and Y are different architectures. Say we have the instruction a := 1.5 + 3.7, and we would like to constant fold that down to a := 5.2. The problem is that if we simply execute this as a floating-point operation on X, the roundoff and floating-point semantics may be slightly different from Y's, so on Y the program might have computed something like a := 5.19: there might be a small difference in the floating-point result.
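To make the mechanism concrete, here is a minimal sketch (in C, with a made-up AST node type, not any particular compiler's representation) of how constant folding of an integer operation might look; the comment records the cross-compilation caveat just discussed:
#include <stddef.h>

/* Hypothetical AST node for this sketch. */
typedef enum { ADD, MUL, CONST, OTHER } Kind;
typedef struct Node { Kind kind; long value; struct Node *left, *right; } Node;

/* If both operands are integer constants, replace the operation by its result.
   Caveat: for floating point, a cross-compiler must fold using the target's
   floating-point semantics, not the host's, or the result may differ slightly. */
void fold(Node *n) {
    if ((n->kind == ADD || n->kind == MUL) &&
        n->left != NULL && n->right != NULL &&
        n->left->kind == CONST && n->right->kind == CONST) {
        n->value = (n->kind == ADD) ? n->left->value + n->right->value
                                    : n->left->value * n->right->value;
        n->kind = CONST;
        n->left = n->right = NULL;
    }
}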
Eliminate unreachable basic blocks¶
An unreachable basic block is one that is not the target of any jump and cannot be reached by falling through from the previous block. Removing it makes the code smaller and can also make it run faster, because of cache effects: it increases spatial locality.
Why would unreachable basic blocks occur?¶
There are several ways in which unreachable code can arise. The most common cause is that the code is parameterized with flags so that some parts are only compiled and used in certain situations. In C, it would be fairly typical to see some code that looks like this:
#define DEBUG 0
if(DEBUG){ ...}
Another case where unreachable code comes up is with libraries. Very frequently, programs are written against generic libraries, and a program might only use a very small part of the interface. The library might supply hundreds of methods to cover all the situations, but your program might only use three of them, and the rest of the methods could potentially be removed from the final binary to make the code smaller.
And finally, another way that unreachable basic blocks occur is as the result of other optimizations. As we will see, optimizations frequently lead to more optimizations: other rearrangements made by the compiler can leave behind basic blocks that are no longer reachable and can be removed.
Some optimizations are simplified if each register occurs only once on the left-hand side of an assignment.¶
If each register is assigned at most once (single assignment form), some of these optimizations are easier to state.
Common subexpression elimination¶
In single assignment form, an assignment x := ... is the first use of x in the block. Therefore, when two assignments have the same right-hand side, they compute the same value.
x := y + z;
...
w := y + z;
can be replaced by:
x := y + z;
...
w := x;
copy propagation¶
If w := x appears in a block, replace subsequent uses of w with uses of x.
b := z + y;
a := b;
x := 2 * a;
can be replaced by:
b := z + y;
a := b;
x := 2 * b;
This optimization is only useful in conjunction with other optimizations, such as constant folding and dead code elimination.
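As a small worked example (the variable names are made up), here is how copy propagation sets up constant folding and dead code elimination; by itself the propagation changes nothing, but the combination shrinks the block considerably:
void chain_example(void) {
    int a, x, y, t;
    a = 5;
    x = 2 * a;      /* propagate a = 5, then fold:  x = 10   */
    y = x + 6;      /* propagate x = 10, then fold: y = 16   */
    t = x * y;      /* propagate and fold again:    t = 160  */
    /* If a, x, and y are dead after this point, dead code elimination can
       delete their assignments, leaving only t = 160. */
    (void)t;
}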
Summary¶
By itself, each of the optimizations presented doesn't necessarily make the program run faster at all. They don't make the program run slower either, but on their own they may not actually improve the program. Typically, though, the optimizations interact, so performing one optimization will enable another. The way to think about an optimizing compiler is that it has a big bag of tricks (program transformations); when faced with a program to optimize, it rummages around in its bag looking for an optimization that applies to some part of the code. If it finds one, it performs that optimization, then looks again to see whether another optimization now applies, and it keeps doing this until it reaches a point where none of the optimizations it knows can be applied.
14-04 Peephole Optimizations¶
Let's see a variation on local optimization that applies directly to assembly code, called peephole optimization. The peephole is a short (usually contiguous) sequence of instructions; what the optimizer does is replace that sequence with another, equivalent but faster, one.
One example is:
move $a $b
move $b $a
can be replaced by the following code, provided move $b $a is not the target of a jump:
move $a $b
Another example is:
addiu $a $a i;
addiu $a $a j;
which can be replaced by:
addiu $a $a i+j;
Many basic block optimizations can also be cast as peephole optimizations:
addiu $a $b 0 => move $a $b
move $a $a => (deleted)
Peephole optimizations must be applied repeatedly for maximum effect. In fact, program optimization is grossly misnamed, I think program improvement is a more appropriate term because compilers will just improve the program as much as they can, and there is no guarantee that the best code will be produced.
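Here is a minimal sketch in C of what such a pass might look like, using a made-up instruction representation and just the two move rules shown above; the outer loop applies the rules repeatedly until nothing changes, as described above. A real optimizer would have a much larger table of patterns.
#include <string.h>

/* Toy instruction representation for this sketch. */
typedef enum { MOVE, ADDIU } Op;
typedef struct {
    Op   op;
    char dst[8], src[8];
    int  imm;          /* immediate operand for ADDIU                    */
    int  jump_target;  /* nonzero if some jump targets this instruction  */
} Instr;

/* Delete instruction i from code[0..n-1] and return the new length. */
static int delete_at(Instr *code, int n, int i) {
    memmove(&code[i], &code[i + 1], (size_t)(n - i - 1) * sizeof(Instr));
    return n - 1;
}

/* Apply two peephole rules until nothing changes:
 *   move $a $a                =>  (deleted)
 *   move $a $b ; move $b $a   =>  move $a $b   (if the second move is not a jump target)
 * Returns the new instruction count. */
int peephole(Instr *code, int n) {
    int changed = 1;
    while (changed) {
        changed = 0;
        for (int i = 0; i < n; i++) {
            if (code[i].op == MOVE && strcmp(code[i].dst, code[i].src) == 0) {
                n = delete_at(code, n, i);
                changed = 1;
                break;                      /* restart the scan after a change */
            }
            if (i + 1 < n && code[i].op == MOVE && code[i + 1].op == MOVE &&
                !code[i + 1].jump_target &&
                strcmp(code[i].dst, code[i + 1].src) == 0 &&
                strcmp(code[i].src, code[i + 1].dst) == 0) {
                n = delete_at(code, n, i + 1);
                changed = 1;
                break;
            }
        }
    }
    return n;
}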
16-01 Register Allocation¶
Overview¶
Intermediate code can use an unlimited number of temporaries. This simplifies optimization, because optimizations don't have to worry about staying within some fixed number of registers, but it does complicate the final translation into assembly code, because we might be using too many temporaries, and this is actually a problem in practice. It's common for intermediate code to use more temporaries than there are physical registers on the machine. The problem, then, is to rewrite the intermediate code to use no more temporaries than there are physical registers, and we are going to do that by assigning multiple temporaries to each register, so we'll have a many-to-one mapping from temporaries to registers. How can we actually make a single register hold multiple values? Well, the trick is that it's fine for a register to hold several temporaries, as long as it only has to hold one value at a time.
Let's consider this program:
a := c + d;
e := a + b;
f := e - 1;
Here, I'm assuming that a and e are not used anywhere else, so it turns out that a, e, and f could all actually live in the same register; that's assuming that a and e are dead after their uses. What will that look like? Well, let's allocate them all to a particular register r1, and let's assign c, d, and b to their own individual registers, and the code would look like this:
r1 := r2 + r3;
r1 := r1 + r4;
r1 := r1 - 1;
And so now notice this is just a translation of the code above into registers, but there is a many-to-one mapping of names above to register names below.
History¶
Register allocation is an old problem. In fact, it was first recognized way back in the 1950s in the original Fortran project, but originally register allocation was done with a fairly crude algorithm, and it was very quickly noticed that this was a bottleneck in the quality of code generation: limitations in the register allocator had a really significant effect on the overall quality of the code that compilers could produce. Then about 30 years later, around 1980, a breakthrough occurred when a group of researchers at IBM discovered a register allocation scheme based on graph coloring. The great thing about this scheme is that it's pretty simple and easy to explain; it's global, meaning it takes advantage of information from the entire control flow graph at the same time; and it also happens to work well in practice.
register interference graph (RIG)¶
And here's the basic principle that underlies modern register allocation algorithms. If I have two temporaries t1 and t2, I want to know when they can share a register. They're allowed to share a register if they are not live at the same time.
Let's take a look at a control flow graph. We know that in order to do register allocation, at least in this way, we're going to need liveness information. We're going to construct an undirected graph in which there is a node for each temporary, so each variable has a node in the graph, and there is an edge between two temporaries if they are live simultaneously at some point in the program. This is the graph (RIG) constructed from the code and the liveness analysis above; it's easy to read off from the graph what the constraints are.
So, for example, b and c cannot be in the same register, because b and c are connected by an edge, which means they're live simultaneously at some point in the program and so they have to live in different registers. On the other hand, there is no edge between b and d, and therefore it's possible that b and d could be allocated to the same register.
A great thing about the register interference graph (RIG) is that it extracts exactly the information needed to characterize a legal register assignment; it gives us a representation of all the possible legal register assignments. Now, we haven't actually gotten a register assignment out of the register interference graph yet, but the first step is to characterize the problem in a precise way. The other thing that is good about it is that it gives a global view of the register requirements, meaning it takes into account information from every part of the control flow graph, which helps us make good global decisions about which values are most important to keep in registers. And finally, notice that once the RIG is constructed, the register allocation algorithm is architecture-independent.
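To make the construction concrete, here is a small sketch in C of how the interference edges could be computed from liveness sets; the representation (temporaries numbered 0..NTEMPS-1, one bit set of live temporaries per program point) is an assumption made for the sketch, not part of the lecture:
#include <stdbool.h>

#define NTEMPS 32   /* assumed number of temporaries, numbered 0 .. NTEMPS-1 */

/* rig[a][b] is true when temporaries a and b interfere,
   i.e. they are live at the same program point. */
bool rig[NTEMPS][NTEMPS];

/* live[p] is a bit set: bit t is 1 if temporary t is live at program point p. */
void build_rig(const unsigned long live[], int npoints) {
    for (int p = 0; p < npoints; p++)
        for (int a = 0; a < NTEMPS; a++)
            for (int b = a + 1; b < NTEMPS; b++)
                if ((live[p] >> a & 1UL) && (live[p] >> b & 1UL))
                    rig[a][b] = rig[b][a] = true;
}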
16-02 Graph Coloring¶
A graph coloring is an assignment of colors to nodes such that nodes connected by an edge have different colors, and a graph is k-colorable if it has a coloring that uses k or fewer colors. In our problem, the colors correspond to registers, so what we want to do is assign colors, or registers, to the graph nodes. And we're going to let k, the maximum number of colors we're allowed to use, be the number of machine registers.
Algorithm¶
- Step 1: Simplify the RIG
  - Pick a node t with fewer than k neighbors in the RIG
  - Eliminate t and its edges from the RIG, and push t on a stack
  - If the resulting graph is k-colorable, then so is the original graph
- Step 2: Assign colors to nodes on the stack (a small sketch in C follows this list)
  - Start with the last node added
  - At each step, pick a color different from those assigned to already-colored neighbors
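Here is a small sketch in C of the two steps, operating on an adjacency-matrix RIG like the one built in the previous section; N and K are assumed constants, and the sketch simply reports failure when every remaining node has K or more neighbors (which is where spilling, discussed next, comes in):
#include <stdbool.h>

#define N 32   /* number of temporaries (nodes in the RIG)  */
#define K 4    /* number of machine registers (colors)      */

/* Try to K-color the interference graph with the simplify-and-select heuristic.
   On success, fills color[t] with a register number 0..K-1 and returns true;
   returns false if it gets stuck. */
bool color_rig(bool rig[N][N], int color[N]) {
    bool removed[N] = { false };
    int  stack[N], top = 0;

    /* Step 1: repeatedly remove a node with fewer than K remaining neighbors. */
    for (int round = 0; round < N; round++) {
        int pick = -1;
        for (int t = 0; t < N && pick < 0; t++) {
            if (removed[t]) continue;
            int degree = 0;
            for (int u = 0; u < N; u++)
                if (!removed[u] && rig[t][u]) degree++;
            if (degree < K) pick = t;
        }
        if (pick < 0) return false;   /* stuck: a spill candidate must be chosen */
        removed[pick] = true;
        stack[top++] = pick;          /* remember the removal order */
    }

    /* Step 2: pop nodes (last removed first) and give each a color
       different from its already-colored neighbors. */
    while (top > 0) {
        int t = stack[--top];
        bool used[K] = { false };
        for (int u = 0; u < N; u++)
            if (!removed[u] && rig[t][u]) used[color[u]] = true;
        for (int c = 0; c < K; c++)
            if (!used[c]) { color[t] = c; break; }
        removed[t] = false;           /* t is now colored and back in the graph */
    }
    return true;
}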
16-03 Spilling¶
In this lecture, we are going to continue our discussion of register allocation, and this time we are going to talk about what happens when we can't successfully color the graph, in which case we have to do something known as spilling. The graph coloring algorithm that we discussed in the previous lecture doesn't always succeed in coloring an arbitrary graph. It may well get stuck and not be able to find a coloring. In that case, the only conclusion we can reach is that we can't hold all the values in registers. Those temporary values have to live somewhere, so where should they live? Well, they're going to have to live in memory; that's the only other kind of storage that we have. So we're going to pick some values and spill them into memory. The picture to have in your mind is a bucket: it can hold a fixed amount of stuff (the registers), and when it gets too full, some of the stuff spills over and ends up someplace else. Now, when does graph coloring get stuck? Well, the only situation in which we won't be able to make progress is when all the remaining nodes have k or more neighbors. Suppose the machine we want to use only has three registers, so instead of finding a 4-coloring of the example graph, we need to find a 3-coloring. Let's think about how to find a 3-coloring of this graph. If we apply the heuristic, we'll remove A from the graph, but then we're going to get stuck, because once we take A and its edges out of the graph, every node left has at least three neighbors. So there's no node we can delete from the graph and be guaranteed to find a coloring with the heuristic from the previous lecture. In this situation, what we're going to do is pick a node as a candidate for spilling. This is a node that we think we may have to assign to a memory location rather than to a register.
Let's assume for the sake of this example that we pick f and we're going to spill f. Now we have to try to assign a color to f, and we could get lucky: even though f had three or more neighbors when we removed it from the graph, it could be that when we go to construct the coloring for the subgraph, those neighbors don't actually use all of the registers. This is called optimistic coloring. So we pick a candidate for spilling and we try to color the subgraph. Once we have a coloring for the subgraph, we see whether we got lucky and are able to assign a register to f, in which case we can just go ahead and color the rest of the graph as if nothing had happened.
So in this case let's take a look at what happens. We're going to add f back into the graph, and in this case optimistic coloring will not work: f had three or more neighbors, and after we color the sub-graph, it turns out that those neighbors are using all k colors. So there is no register left over for f, and we're going to have to actually spill it and store it in memory. If optimistic coloring fails, as it does in this example, then we spill f. What we're going to do is allocate a memory location for f; typically that means we'll allocate a position in the current stack frame. Let's call this address fa, for the address of f. Then we're going to modify the control flow graph, changing the code that uses f. Before each operation that reads f, we insert a load that loads the current value of f from that address into a temporary name. And similarly, after each operation that writes f, we insert a store that saves the current value of f into its location in memory. So, here is the original code from which we constructed the register interference graph; notice that there are a few references to f in here, and we just highlight them.
So, here we had the use of f, the read of f, in this statement, and now we precede it with a load. Notice that I've given it a new name, f1; that's because the different uses of f in the control flow graph don't all have to share the same temporary name, and it's actually a good idea to separate them, so each distinct use of f gets its own name. So here we load the value of f and then it gets used in the statement. Here we have a write to f, so we store the current value under a different name, f2. And finally, for the third use of f there's another load, which is then used in the computation of b. Okay. So, that is the systematic way to modify the code to keep f in storage. And now we have to recompute the liveness of f. What happens there? Well, here is the original liveness information from which we computed the register interference graph.
And now notice that f is gone. We no longer use f in the program, so we can delete all the places where f was live, and now we have the three new names, f1, f2, and f3. We have to add in their liveness information for the new program with the inserted statements. Where we have a load, the loaded value of f is live only from the load until its use in the next statement; where we have a write of the current value of f, it is live only until the store; and here there is another load of the current value of f, which is live until the use in the next statement. Okay. So notice that f used to be live in many places in the code. Now, not only are the different versions of f live in fewer places, we've also distinguished them. Separating the different uses of f means each version has its own node and its own set of interferences in the graph, and they won't share them with the other uses of f, which also reduces the number of edges in the graph.
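Since the lecture's control-flow-graph figure isn't reproduced in these notes, here is a minimal made-up illustration of the transformation, written as C assignments standing in for the intermediate code; fa stands for the stack slot allocated for f, and f1 and f2 are the fresh names:
int spill_example(int a, int b) {
    int fa;            /* memory (stack) location reserved for the spilled f */
    int f1, f2, x;

    f2 = a + b;        /* original definition of f ...                       */
    fa = f2;           /* ... followed by the inserted store                 */

    /* other code runs here; f does not need to occupy a register */

    f1 = fa;           /* inserted load before a use of f                    */
    x  = f1 * 2;       /* original use of f                                  */
    return x;
}
Each of the new names is live for only an instruction or two, which is exactly the live-range reduction discussed below.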
To summarize the example above: once we have decided that we are actually going to spill a temporary f, we change the program by adding loads and stores, and now we have a different program, which changes the register allocation problem. We have to recompute the liveness information, rebuild the register interference graph, and then try again to color that graph. Now, it turns out that the new liveness information is almost the same as it was before. All the temporary names other than f are not much affected by the new statements that are added. And f itself has changed fairly dramatically: the old name f is no longer used, so its liveness information goes away, and we've also split f into, in this case, three different temporaries, one for each of the different uses of f in the control flow graph. And notice that each of these new versions of f is live in a very, very small area.
For a load instruction, the new temporary is live only between the load and the next instruction where it's used; similarly, for a store, it is live only between the preceding instruction, the one that created the value, and the store itself. The effect is to greatly reduce the live range of the spilled variable. Because the live range of f is reduced by spilling, it has fewer interferences in the new program than it did in the old program. What that means, in particular, is that in the rebuilt register interference graph f will have fewer neighbors: some of the neighbors that it had before have gone away because it's live in fewer places.
So if we look at the new register interference graph, we can see that the different versions of f (remember that f has been split into three temporaries in this graph) only interfere with d and c, whereas before f had several other neighbors in the graph. And now, in fact, this new graph is three-colorable.
Of course, it might be the case that we can't just spill one name. We might have to spill several different temporaries before a coloring is found. And the tricky part is deciding what to spill; this is the hard decision that has to be made during register allocation. Now, any choice is correct: it's only a question of performance, so some choices of spilling will lead to better code than others, but any choice of spilling results in a correct program. There are heuristics that people use to pick which temporaries to spill, and here are three of the most popular ones. One is to spill the temporaries that have the most conflicts. The reasoning is that spilling such a temporary will most affect the number of interferences in the graph, hopefully removing enough edges that the graph becomes colorable with the number of registers we have. Another possibility is to spill temporaries that have few definitions and uses. Here the idea is that, since they're not used very much, the number of loads and stores we have to add is relatively small, so the actual cost of spilling them, in terms of additional instructions that will be executed, is relatively small. And another one, actually the one that I think all the compilers implement, is to avoid spilling in inner loops. If you have a choice between spilling a variable that's used within the innermost loop of the program and one that is used someplace else, it's preferable to spill the one that is not used in the innermost loop, because again, that will result in fewer loads and stores being executed. You really want to avoid adding additional instructions to your inner loop.
16-04 Managing Caches¶
In this lecture, we're going to talk about another very important resource, the cache, and what compilers can and can't do to manage them. Modern computer systems have quite elaborate memory hierarchies.
So the bottom line is that it's very important for high performance to manage these resources properly, in particular the registers and the cache, if you want your program to perform well. Compilers have become very good at managing registers, and in fact I think today most people would agree that for almost all programs, compilers do a better job at managing registers than programmers can. So it's very worthwhile to leave the job of assigning registers to the compiler. However, compilers are not good at managing caches. If programmers want to get good cache performance, they have to understand the behavior of the caches on the machine, understand what their program is doing, understand a little bit about what the compiler is capable of doing, and then still write the program in a way that is going to be cache-friendly. So it's still very much an open question how much a compiler can do to improve cache performance, although there are a few things that we've found compilers can do reliably. To see one of those things, let's take a look at this example loop. We have an outer loop on j and an inner loop on i, and in each iteration of the inner loop we're reading an element of the vector b, performing some computation, and storing the result into the i-th element of the vector a. Now, as it turns out, this particular program has really, really terrible cache performance; it's going to behave very badly. So let's think about what's going to happen.
for (j = 1; j < 10; j++)
    for (i = 1; i < 100000; i++)
        a[i] *= b[i];
The important thing here is that on every iteration of the inner loop we're referring to fresh data, and if these data values are large enough to take up an entire cache line, then each iteration of the loop is going to be a cache miss for both elements, and we won't get any benefit from the cache. This loop will run at the speed of main memory, not at the speed of the cache. The other thing that's important here is that the inner loop bound is very large; I picked it to be very large to suggest that it's much larger than the size of the cache. As we get towards the end of the inner loop, we will have filled up the whole cache with values from a and b, and it will start evicting values that are already in the cache. If the size of these vectors is, say, twice the size of the cache, then by the time we complete an entire execution of the inner loop, what's in the cache is the second half of the a and b arrays, not the first half. So when we go back around and execute another iteration of the outer loop, what's in the cache is again not the data that we're referencing: when we begin the inner loop the second time, the cache holds the high-numbered elements of the a and b vectors but not the low-numbered elements, and so these references are all misses again.
A better version is below.
for (i = 1; i < 100000; i++)
    for (j = 1; j < 10; j++)
        a[i] *= b[i];
Compilers can perform this simple optimization. It is called loop interchange: you just switch the order of the loops. Not many compilers actually implement it, though, because in general it's not easy to decide whether you can reverse the order of the loops without changing the result.
17-01 Automatic Memory Management¶
In this lecture, we're going to start our discussion of garbage collection, or automatic memory management. To set the stage, let's first talk about the problem that we're trying to solve. If one has to manage memory manually, meaning you have to do all the allocation and deallocation explicitly yourself, that is a hard way to program, and it leads to certain kinds of bugs that are very difficult to eliminate. These days you see this primarily in C and C++ programs, the main languages in use that have manual memory management, and the kinds of storage bugs you can get are things like forgetting to free unused memory (a memory leak), dereferencing dangling pointers, and unintentionally overwriting parts of a data structure. I want to emphasize that these kinds of bugs are often some of the very last bugs to be found in complex systems. They often persist into production, sometimes for a very long time after the code is in production use. And why is that? The reason is that storage bugs typically have effects that are far away in time and space from their source. How can that happen? Well, think about some object in memory with a few fields, and somewhere in the program there is a pointer to this particular object. Now I free it, because I am doing my own memory management, but I forget that I had this pointer. So now the storage has been freed, it's no longer really valid memory, but the pointer to it still exists. Then when I come along and allocate something else, the allocator might hand out the same piece of memory, so it might now hold a different kind of object, possibly of a different type; the memory might be used for something completely different, and that will probably cause my program to crash. This is a very, very old problem; it's been studied since at least the 1950s, and it was first thought about carefully in Lisp. There are well-known techniques for completely automatic memory management, so you don't have to manage memory yourself, but they only became mainstream in the 1990s, with the popularity of Java.
The basic strategy in automatic memory management is pretty simple. When an object is created, when we allocate a new object, the runtime system will find some unused space for that object and just allocate it. So whenever you say new of some class name in Cool, some previously unused memory is automatically allocated by the system for that object. If you keep doing this over and over again, after a while you're going to run out of space: eventually there is no more unused space left for additional objects. At that point you have to do something; you have to reclaim some of the space in order to allocate more objects. The observation that garbage collection systems rely upon is that some of the space being used is probably occupied by objects that will never be used again. Some of these objects are not going to be referred to again by the program, and if we can figure out which objects those are, then we can deallocate them and reuse the space for new objects. So the big question is: how can we know that an object will never be used again? Most of the garbage collection techniques out there today rely on the observation that a program can only use the objects that it can find.
We're going to say that an object x is reachable if and only if one of the following two things is true: either a register contains a pointer to x, or another reachable object contains a pointer to x. In other words, the reachable objects are exactly those you can find by starting at the registers and following pointers as far as you can; all the other objects can never be used.
Reachability is an approximation.
x := new A;
y := new B;
x := y;
if alwaysTrue() then x := new A else x.foo() fi
So let's take a look at another example that illustrates some interesting aspects of reachability and its use in automatic memory management. What does this example do? The first thing it does is allocate an A object on the heap and assign it to the variable x, so x is a pointer to that object. Then it allocates a B object, and y will point to that object. Then it assigns the value of y to x. And then we're going to execute this conditional, and notice what this conditional is going to do: the predicate will always be true, so it will never take the false branch. All it's ever going to do is take the true branch, and there it immediately overwrites x, so x winds up pointing at some other new object; it doesn't matter what it is. Now, let's say that at this point is where we try to do a garbage collection; for some reason this is where the program stops and tries to collect unused memory. What can it collect? We can see that the first A object is unreachable at that point, so it can be collected. Now, what about the second object? Well, it is clearly reachable: it's reachable through x at that point, and it also happens to be reachable through y. So it's not garbage and it's not going to be collected. But notice that the value of x is always going to be overwritten. The compiler doesn't know that this branch is always going to be true, so it doesn't realize that the value x has at this point won't ever be used again; that value is immediately overwritten every time we take the conditional. So in fact the B object will never be used again, even though it is reachable. What this tells you is that reachability is an approximation, and by that I mean it's an approximation of the set of objects that will never be used again. What we're really interested in when we do garbage collection is collecting objects that will never be used in the future execution of the program, because that space is wasted and could be put to some better use, and reachability approximates that. If an object is unreachable, it definitely won't be used again; however, just because an object is reachable is not a guarantee that it will be used again.
So now let's talk about how we do garbage collection in Cool.
So Cool has a fairly simple runtime structure. It uses an accumulator, which of course points to an object, and that object may point to other objects, and so on, so we have to trace all the objects reachable from the accumulator. But we also have to worry about the stack pointer, because there's also stuff reachable from the stack. Each stack frame may contain pointers, for example the method parameters that are stored on the stack, and each stack frame may also contain non-pointers, things like the return address. So if I think about the layout of each activation record, there is some mix of pointers and non-pointers, and we have to know the layout of the frame. But if we do know the layout, and of course the compiler decides on the layout so it naturally does know it, we can find all the pointers in the frame. Essentially, the compiler has to keep a record of the layout of each kind of activation record it builds for each method. In Cool, we start tracing from the accumulator and the stack, and these are called the roots.
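As a rough sketch (all of the type and function names here are made up, not the actual Cool runtime interface), the record the compiler keeps could be a per-method pointer map, and the collector would walk the stack using it to find the root pointers:
/* Per-activation-record layout information emitted by the compiler (hypothetical). */
typedef struct {
    int        frame_words;       /* size of the activation record, in words   */
    int        npointers;         /* how many slots in the frame hold pointers */
    const int *pointer_offsets;   /* word offsets of those pointer slots       */
} FrameMap;

/* Walk the stack from the current frame to its base, handing the accumulator
   and every pointer slot in every frame to the collector as a root. */
void collect_roots(void **fp, void **stack_base,
                   const FrameMap *(*map_for_frame)(void **),
                   void (*add_root)(void **slot),
                   void **accumulator) {
    add_root(accumulator);
    while (fp < stack_base) {
        const FrameMap *m = map_for_frame(fp);
        for (int i = 0; i < m->npointers; i++)
            add_root(&fp[m->pointer_offsets[i]]);
        fp += m->frame_words;     /* move to the caller's frame */
    }
}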
17-03 Stop-and-Copy¶
In this lecture, we are going to look at the second garbage collection technique, stop and copy. In stop-and-copy garbage collection, memory is organized into two areas. The old space is used for allocation and the new space is reserved for the garbage collector.
How does it work?¶
In fact, there are some more advanced techniques that allow the program to use more than half of the space, but a fairly significant fraction of the space has to be reserved for the GC. There's a heap pointer in the old space, and everything to the left of the heap pointer is currently in use. Allocating an object just advances the heap pointer, marching through the old space. When the old space is full, the program stops, and the GC copies all the reachable objects from the old space to the new space. After that, we simply swap the roles of the old and new space, and then the program resumes.
how to implement the traversal of the object graph without using any extra space¶
We're going to partition the new space into three contiguous regions. There's the empty region, where we're allocating new objects, and there's an allocation pointer that points to the beginning of that region. Immediately to the left of that region are the objects that have already been copied but not scanned. What does that mean? Well, it means that the object has been copied, but we haven't yet looked at the pointers inside the object to see where they go. To the left of that are the objects that have been copied and scanned: objects that have been copied over and whose pointers have all been processed. The area between the scan pointer and the allocation pointer is the work queue.
Algorithm¶
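The notes don't reproduce the lecture's pseudocode, so here is a minimal sketch in C of the copying traversal, assuming (unrealistically) that every object is a fixed-size record of pointer fields; a real collector would use per-class size and layout information from the compiler:
#include <stddef.h>
#include <string.h>

#define MAX_FIELDS   2      /* assumed: every object has this many pointer fields */
#define HEAP_OBJECTS 1024   /* assumed size of each semispace, in objects         */

typedef struct Obj {
    struct Obj *forward;               /* forwarding pointer; NULL until copied   */
    struct Obj *field[MAX_FIELDS];     /* pointer fields (NULL where unused)      */
} Obj;

static Obj    new_space[HEAP_OBJECTS]; /* the space reserved for the collector    */
static size_t alloc_ptr;               /* allocation pointer into new_space       */
static size_t scan_ptr;                /* scan pointer into new_space             */

/* Copy o into new space if it hasn't been copied yet, leave a forwarding
   pointer in the old copy, and return the new address. */
static Obj *forward(Obj *o) {
    if (o == NULL) return NULL;
    if (o->forward == NULL) {
        Obj *copy = &new_space[alloc_ptr++];   /* bump the allocation pointer     */
        memcpy(copy, o, sizeof(Obj));
        copy->forward = NULL;
        o->forward = copy;                     /* forwarding pointer in old space */
    }
    return o->forward;
}

/* roots[] holds the root pointers (accumulator, stack slots, ...). */
void stop_and_copy(Obj *roots[], int nroots) {
    scan_ptr = alloc_ptr = 0;
    for (int i = 0; i < nroots; i++)           /* copy what the roots point at    */
        roots[i] = forward(roots[i]);
    while (scan_ptr < alloc_ptr) {             /* work queue: copied, not scanned */
        Obj *o = &new_space[scan_ptr++];
        for (int f = 0; f < MAX_FIELDS; f++)   /* fix up each pointer field       */
            o->field[f] = forward(o->field[f]);
    }
    /* At this point old and new space swap roles and the program resumes. */
}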
Example¶
forwarding pointer: a pointer stored in the old copy of an object that points to its new copy.
(The lecture walks through the copying process in six heap snapshots; the figures are not reproduced here.)
Advantages¶
- Very simple and fast for allocation. (just increment the heap pointer)
- Collection is relatively cheap (mark-and-sweep: O(memory), stop-and-copy: O(reachable objects))
Disadvantage¶
- Not suitable for C/C++, because pointers are part of the language semantics: the collector moves objects, and it cannot reliably find and update every pointer in a C/C++ program.