Friday, 18 May 2018 - Björn Gustavsson
This blog post continues the exploration of Core Erlang by
looking at some optimizations done by the sys_core_fold
compiler pass. The Core Erlang language was introduced in
the previous blog post.
To prepare the examples in this blog post I used two commands.
$ erlc +time +dcore core_fold_example.erl
Compiling "core_fold_example"
parse_module : 0.000 s 9.4 kB
transform_module : 0.000 s 9.4 kB
lint_module : 0.005 s 9.4 kB
expand_records : 0.000 s 9.4 kB
core : 0.000 s 59.3 kB
listing : 0.003 s 59.3 kB
The dcore
option produces the file core_fold_example.core
containing a listing of the Core Erlang code produced by the core
parse (implemented by the module v3_core
).
$ erlc +time +dcopt core_fold_example.erl
Compiling "core_fold_example"
parse_module : 0.000 s 9.4 kB
transform_module : 0.000 s 9.4 kB
lint_module : 0.002 s 9.4 kB
expand_records : 0.000 s 9.4 kB
core : 0.000 s 59.3 kB
sys_core_fold : 0.000 s 25.3 kB
core_transforms : 0.000 s 25.3 kB
listing : 0.002 s 25.3 kB
The dcopt
option produces the file core_fold_example.copt
containing a listing of the Core Erlang code as it looks
after optimization by the sys_core_fold
pass.
As was mentioned in my first blog post about the compiler,
compile:options()
will print most of the hidden options for
the compiler.
The most basic optimization done by sys_core_fold
is constant propagation.
Consider this Erlang function:
a() ->
A = 42,
{ok,A}.
It can be translated to Core Erlang like this:
'a'/0 =
fun () ->
let <A> = 42
in {'ok',A}
The variable A
is bound to a constant (as opposed to an expression such
as function call). We can replace all occurrences of the variable A
with
the constant value 42
and eliminate the let
:
'a'/0 =
fun () ->
{'ok',42}
case
expressionsActually, the first version of a/0
that I showed was already
slightly optimized by me.
Here is the actual Core Erlang code (only slightly edited to remove annotations and unnecessary line breaks):
'a'/0 =
fun () ->
case <> of
<> when 'true' ->
let <A> = 42
in {'ok',A}
<> when 'true' ->
primop 'match_fail'({'function_clause'})
end
The let
has been wrapped in a useless outer case
. The
case
would serve some purpose if there had been some function
arguments, but why complicate the code generator if sys_core_fold
is
perfectly capable of simplifying this code?
sys_core_fold
will simplify the code in several steps.
First it will look at each clause. If a clause can't possibly
be executed (for example, it its guard is false
) it will be
dropped. If a clause will always match, all clauses following
the clause will be dropped.
In this case, the first clause will always match, because the
pattern is a list of no variables that can't fail to match, and
the guard is true
. Thus the second clause is unreachable and
is dropped:
'a'/0 =
fun () ->
case <> of
<> when 'true' ->
let <A> = 42
in {'ok',A}
end
The next step is to see if there is only one clause remaining.
If it is, the body of the clause can be kept and the case
eliminated:
'a'/0 =
fun () ->
let <A> = 42
in {'ok',A}
Let's see how a more complicated function can be optimized following the steps just described. Consider this Erlang function:
aa() ->
case {a,tuple} of
[List] -> List;
{A,B} -> {tuple,A,B};
_ -> something_else
end.
Translated to Core Erlang code (with the outer case
and
annotations removed) it will look this:
'aa'/0 =
fun () ->
case {'a','tuple'} of
<[List|[]]> when 'true' ->
List
<{A,B}> when 'true' ->
{'tuple',A,B}
<_@c1> when 'true' ->
'something_else'
<_@c0> when 'true' ->
primop 'match_fail'({'case_clause',_@c0})
end
Let's go through the clauses one by one:
The first clause will only match a list with exactly one element.
The case
expression is a tuple, so the first clause can't
possibly match. It will be dropped.
The second clause will match a tuple with (any) two elements. The case expression is a tuple with two elements, so this clause will always match.
There is no need to look at the remaining clauses, since the second clause will always match. The remaining clauses are dropped.
We now have:
'aa'/0 =
fun () ->
case {'a','tuple'} of
<{A,B}> when 'true' ->
{'tuple',A,B}
end
This is a case
with just one clause, so we can keep
the body of the clause and remove the case
. But there is
a problem if we do that naively:
'aa'/0 =
fun () ->
{'tuple',A,B}
The variables A
and B
are used, but they don't have
any values bound to them. We must use a let
to bind
the variables before they can be used:
'aa'/0 =
fun () ->
let <A,B> = <'a','tuple'>
in {'tuple',A,B}
Propagating constants, the final code is:
'aa'/0 =
fun () ->
{'tuple','a','tuple'}
Here is an example of a common pattern of matching several expressions in parallel:
b(A, B) ->
case {A,B} of
{true,false} -> ok;
{false,true} -> not_ok;
{_,_} -> error
end.
The unoptimized Core Erlang code looks like this:
'b'/2 =
fun (_@c1,_@c0) ->
case <_@c1,_@c0> of
<A,B> when 'true' ->
case {A,B} of
<{'true','false'}> when 'true' ->
'ok'
<{'false','true'}> when 'true' ->
'not_ok'
<{_@c5,_@c6}> when 'true' ->
'error'
<_@c2> when 'true' ->
primop 'match_fail'({'case_clause',_@c2})
end
end
The case
expression is {A,B}
. When executing the case
a tuple will built, and then almost immediately discarded.
That is wasteful. Therefore sys_core_fold
rewrites the
code to eliminate the tuple building:
'b'/2 =
fun (_@c1,_@c0) ->
case <_@c1,_@c0> of
<'true','false'> when 'true' ->
'ok'
<'false','true'> when 'true' ->
'not_ok'
<_@c5,_@c6> when 'true' ->
'error'
end
Here a value list is used instead of a tuple. (See previous blog post for several examples of value lists.)
Another common pattern where tuples are built and immediately discarded is shown in this example:
c(X) ->
{A,B} = case X of
a1 -> {10,1};
b2 -> {20,2};
_ -> {100,42}
end,
A+B.
The unoptimized Core Erlang code looks like this:
'c'/1 =
fun (_@c0) ->
case _@c0 of
<X> when 'true' ->
let <_@c2> =
case X of
<'a1'> when 'true' ->
{10,1}
<'b2'> when 'true' ->
{20,2}
<_@c5> when 'true' ->
{100,42}
<_@c1> when 'true' ->
primop 'match_fail'({'case_clause',_@c1})
end
in
case _@c2 of
<{A,B}> when 'true' ->
call 'erlang':'+'(A, B)
<_@c3> when 'true' ->
primop 'match_fail'({'badmatch',_@c3})
end
<_@c4> when 'true' ->
primop 'match_fail'({'function_clause',_@c4})
end
Here a tuple is built and assigned to _@c2
. It is then matched
in a case
.
First the code is optimized like this to eliminate the tuple building
in each clause of the first case
:
'c'/1 =
fun (_@c0) ->
let <_@f4,_@f5> =
case _@c0 of
<'a1'> when 'true' ->
<10,1>
<'b2'> when 'true' ->
<20,2>
<_@c5> when 'true' ->
<100,42>
end
in
let <_@c2> = {_@f4,_@f5}
in
case _@c2 of
<{A,B}> when 'true' ->
call 'erlang':'+'(A, B)
<_@c3> when 'true' ->
primop 'match_fail'({'badmatch',_@c3})
end
end
Applying all of the optimizations previously described, the remaining tuple building and matching can be eliminated:
'c'/1 =
fun (_@c0) ->
let <_@f4,_@f5> =
case _@c0 of
<'a1'> when 'true' ->
<10,1>
<'b2'> when 'true' ->
<20,2>
<_@c5> when 'true' ->
<100,42>
end
in
call 'erlang':'+'(_@f4, _@f5)
That was a quick look at some of the optimizations done by
sys_core_fold
.
Some of the optimizations are very simple. The power of the
sys_core_fold
pass comes from the combination of optimizations. One
optimization gives opportunities for other optimizations, as could be
seen in the examples.
Why is the optimization pass called sys_core_fold
?
A hint can be found in the title of this Wikipedia article: Constant folding.