NEWS

Core Erlang Optimizations

Friday, 18 May 2018 - Björn Gustavsson

This blog post continues the exploration of Core Erlang by looking at some optimizations done by the sys_core_fold compiler pass. The Core Erlang language was introduced in the previous blog post.

To prepare the examples in this blog post I used two commands.

$ erlc +time +dcore core_fold_example.erl
Compiling "core_fold_example"
 parse_module                  :      0.000 s       9.4 kB
 transform_module              :      0.000 s       9.4 kB
 lint_module                   :      0.005 s       9.4 kB
 expand_records                :      0.000 s       9.4 kB
 core                          :      0.000 s      59.3 kB
 listing                       :      0.003 s      59.3 kB

The dcore option produces the file core_fold_example.core containing a listing of the Core Erlang code produced by the core parse (implemented by the module v3_core).

$ erlc +time +dcopt core_fold_example.erl
Compiling "core_fold_example"
 parse_module                  :      0.000 s       9.4 kB
 transform_module              :      0.000 s       9.4 kB
 lint_module                   :      0.002 s       9.4 kB
 expand_records                :      0.000 s       9.4 kB
 core                          :      0.000 s      59.3 kB
 sys_core_fold                 :      0.000 s      25.3 kB
 core_transforms               :      0.000 s      25.3 kB
 listing                       :      0.002 s      25.3 kB

The dcopt option produces the file core_fold_example.copt containing a listing of the Core Erlang code as it looks after optimization by the sys_core_fold pass.

As was mentioned in my first blog post about the compiler, compile:options() will print most of the hidden options for the compiler.

The most basic optimization

The most basic optimization done by sys_core_fold is constant propagation.

Consider this Erlang function:

a() ->
    A = 42,
    {ok,A}.

It can be translated to Core Erlang like this:

'a'/0 =
    fun () ->
       let <A> = 42
       in {'ok',A}

The variable A is bound to a constant (as opposed to an expression such as function call). We can replace all occurrences of the variable A with the constant value 42 and eliminate the let:

'a'/0 =
    fun () ->
	{'ok',42}

Optimizing case expressions

Actually, the first version of a/0 that I showed was already slightly optimized by me.

Here is the actual Core Erlang code (only slightly edited to remove annotations and unnecessary line breaks):

'a'/0 =
    fun () ->
        case <> of
	  <> when 'true' ->
	      let <A> = 42
	      in {'ok',A}
	  <> when 'true' ->
		primop 'match_fail'({'function_clause'})
	end

The let has been wrapped in a useless outer case. The case would serve some purpose if there had been some function arguments, but why complicate the code generator if sys_core_fold is perfectly capable of simplifying this code?

sys_core_fold will simplify the code in several steps.

First it will look at each clause. If a clause can't possibly be executed (for example, it its guard is false) it will be dropped. If a clause will always match, all clauses following the clause will be dropped.

In this case, the first clause will always match, because the pattern is a list of no variables that can't fail to match, and the guard is true. Thus the second clause is unreachable and is dropped:

'a'/0 =
    fun () ->
        case <> of
	  <> when 'true' ->
	      let <A> = 42
	      in {'ok',A}
	end

The next step is to see if there is only one clause remaining. If it is, the body of the clause can be kept and the case eliminated:

'a'/0 =
    fun () ->
       let <A> = 42
       in {'ok',A}

Another case example

Let's see how a more complicated function can be optimized following the steps just described. Consider this Erlang function:

aa() ->
    case {a,tuple} of
	[List] -> List;
	{A,B} ->  {tuple,A,B};
	_ ->      something_else
    end.

Translated to Core Erlang code (with the outer case and annotations removed) it will look this:

'aa'/0 =
    fun () ->
      case {'a','tuple'} of
	<[List|[]]> when 'true' ->
	    List
	<{A,B}> when 'true' ->
	    {'tuple',A,B}
	<_@c1> when 'true' ->
	    'something_else'
	<_@c0> when 'true' ->
	    primop 'match_fail'({'case_clause',_@c0})
      end

Let's go through the clauses one by one:

  • The first clause will only match a list with exactly one element. The case expression is a tuple, so the first clause can't possibly match. It will be dropped.

  • The second clause will match a tuple with (any) two elements. The case expression is a tuple with two elements, so this clause will always match.

  • There is no need to look at the remaining clauses, since the second clause will always match. The remaining clauses are dropped.

We now have:

'aa'/0 =
    fun () ->
      case {'a','tuple'} of
	<{A,B}> when 'true' ->
	    {'tuple',A,B}
      end

This is a case with just one clause, so we can keep the body of the clause and remove the case. But there is a problem if we do that naively:

'aa'/0 =
    fun () ->
       {'tuple',A,B}

The variables A and B are used, but they don't have any values bound to them. We must use a let to bind the variables before they can be used:

'aa'/0 =
    fun () ->
      let <A,B> = <'a','tuple'>
      in {'tuple',A,B}

Propagating constants, the final code is:

'aa'/0 =
    fun () ->
	{'tuple','a','tuple'}

Avoiding tuple building

Here is an example of a common pattern of matching several expressions in parallel:

b(A, B) ->
    case {A,B} of
	{true,false} -> ok;
	{false,true} -> not_ok;
	{_,_} -> error
    end.

The unoptimized Core Erlang code looks like this:

'b'/2 =
    fun (_@c1,_@c0) ->
	case <_@c1,_@c0> of
	  <A,B> when 'true' ->
	      case {A,B} of
		<{'true','false'}> when 'true' ->
		    'ok'
		<{'false','true'}> when 'true' ->
		    'not_ok'
		<{_@c5,_@c6}> when 'true' ->
		    'error'
		<_@c2> when 'true' ->
		      primop 'match_fail'({'case_clause',_@c2})
	      end
	end

The case expression is {A,B}. When executing the case a tuple will built, and then almost immediately discarded. That is wasteful. Therefore sys_core_fold rewrites the code to eliminate the tuple building:

'b'/2 =
    fun (_@c1,_@c0) ->
	case <_@c1,_@c0> of
	  <'true','false'> when 'true' ->
	      'ok'
	  <'false','true'> when 'true' ->
	      'not_ok'
	  <_@c5,_@c6> when 'true' ->
	      'error'
	end

Here a value list is used instead of a tuple. (See previous blog post for several examples of value lists.)

Another common pattern where tuples are built and immediately discarded is shown in this example:

c(X) ->
    {A,B} = case X of
		a1 -> {10,1};
		b2 -> {20,2};
		_ ->  {100,42}
	    end,
    A+B.

The unoptimized Core Erlang code looks like this:

'c'/1 =
    fun (_@c0) ->
	case _@c0 of
	  <X> when 'true' ->
	      let <_@c2> =
		  case X of
		    <'a1'> when 'true' ->
			{10,1}
		    <'b2'> when 'true' ->
			{20,2}
		    <_@c5> when 'true' ->
			{100,42}
		    <_@c1> when 'true' ->
			  primop 'match_fail'({'case_clause',_@c1})
		  end
	      in
		  case _@c2 of
		    <{A,B}> when 'true' ->
			call 'erlang':'+'(A, B)
		    <_@c3> when 'true' ->
			  primop 'match_fail'({'badmatch',_@c3})
		  end
	  <_@c4> when 'true' ->
		  primop 'match_fail'({'function_clause',_@c4})
	end

Here a tuple is built and assigned to _@c2. It is then matched in a case.

First the code is optimized like this to eliminate the tuple building in each clause of the first case:

'c'/1 =
    fun (_@c0) ->
	let <_@f4,_@f5> =
	    case _@c0 of
	      <'a1'> when 'true' ->
		  <10,1>
	      <'b2'> when 'true' ->
		  <20,2>
	      <_@c5> when 'true' ->
		  <100,42>
	    end
	in
            let <_@c2> = {_@f4,_@f5}
            in
	          case _@c2 of
		    <{A,B}> when 'true' ->
			call 'erlang':'+'(A, B)
		    <_@c3> when 'true' ->
			  primop 'match_fail'({'badmatch',_@c3})
		  end
	end

Applying all of the optimizations previously described, the remaining tuple building and matching can be eliminated:

'c'/1 =
    fun (_@c0) ->
	let <_@f4,_@f5> =
	    case _@c0 of
	      <'a1'> when 'true' ->
		  <10,1>
	      <'b2'> when 'true' ->
		  <20,2>
	      <_@c5> when 'true' ->
		  <100,42>
	    end
	in
	    call 'erlang':'+'(_@f4, _@f5)

Conclusion

That was a quick look at some of the optimizations done by sys_core_fold.

Some of the optimizations are very simple. The power of the sys_core_fold pass comes from the combination of optimizations. One optimization gives opportunities for other optimizations, as could be seen in the examples.

Points to Ponder

Why is the optimization pass called sys_core_fold?

A hint can be found in the title of this Wikipedia article: Constant folding.