Author: Björn Gustavsson <bjorn(at)erlang(dot)org>
Status: Accepted/23.0 Proposal is to be implemented in OTP release 23.0
Type: Standards Track
Created: 28-Jan-2020
Erlang-Version: 23
Post-History: 28-Jan-2020

EEP 52: Allow key and size expressions in map and binary matching

Abstract

This EEP proposes an extension to matching of binaries to allow the size of a segment to be a guard expression and to the matching of maps to allow the key to be a guard expression.

Specification

We propose that in binary matching the size of a binary segment can be a guard expression. Here is an example:

example1(<<Size:8,Payload:((Size-1)*8)/binary,Rest/binary>>) ->
   {Payload,Rest}.

The same expressions as in guards are allowed, except that the old-style type tests (such a list/1 or tuple/1) are not allowed. Unless the expression consists of a single number or single variable is must be enclosed in parentheses. Any variables used in the expression must have been previously bound, or become bound in the same binary pattern as the expression. That is, the following example is illegal:

illegal_example2(N, <<X:N,T/binary>>) ->
    {X,T}.

A binary pattern will fail to match, if size expression in any of its segments does not evaluate successfully or evaluates to a non-integer value. For example:

example3(<<X:(1/0)>>) -> X;
example3(<<X:not_integer>>) -> X;
example3(_) -> no_match.

The first clause will not match because the evaluation of 1/0 fails. The second clause will not match because the size evaluates to an atom.

In the current map matching syntax, the keys in a map pattern must be a single value or a literal. That leads to unnatural code if the keys in a map are complex terms. For example:

example4(M, X) ->
    Key = {tag,X},
    #{Key := Value} = M,
    Value.

We propose that the key in a map pattern can be a guard expression. That will allow the previous example to be written like this:

example5(M, X) ->
    #{{tag,X} := Value} = M,
    Value.

All variables used in a key expression must be previously bound. Thus, the following example is illegal:

illegal_example6(Key, #{Key := Value}) -> Value.

Motivation

The current limitations of map keys are surprising. A literal tuple such as {a,b} is allowed as a key, while a tuple with a variable such as {a,Var} is not.

In binary matching, it has always been possible to multiply a matched out number by a small constant using the unit: modifier. The proposed extension makes is possible in more circumstances to match both header and payload in the same binary pattern.

Rationale

Why allow guard expressions?

We did consider only allowing term construction and expressions using arithmetic operators. There are two reasons we went with guard expression instead:

  • It is easy to explain and understand exactly what expression are allowed as segment sizes and as map keys, because the same type of expressions are allowed in guards.

  • When calculating the size of binary segment, a subset of guard BIFs could conceivable be useful in practice. For example: ceiling/1, round/1, byte_size/1, bit_size/1, and map_get/2. We did not want an arbitrary list of allowed BIFs in size expressions, so the only logical thing is to allow full guard expressions.

Why are not absurd size expressions compilation errors?

Size expressions that obviously never evaluates to an integer will not cause a compilation error (but may cause a warning). For example:

example6(Bin, V) ->
    <<X:(is_list(V))>> = Bin,
    X.

The reason is that rules for what is a legal Erlang program should be simple and unambigous, to help both people and tools that generate Erlang programs.

Why are parentheses required around non-trivial size expressions?

For the same reason that they are required when constructing binaries, namely that language grammar would be ambiguous without the parentheses, since binary patterns use the characters :, /, and - with a different meaning than in the rest of the language.

Backwards Compatibility

Using the extended expressions segment size and map keys would cause a compilation error in OTP 22 and previous releases. Therefore, no existing source code can be affected.

However, there are changes to the semantics of Core Erlang that may make it necessary to update languages compilers or tools that generate Core Erlang code.

There are two major changes:

  • A binary pattern in Core Erlang is no longer allowed to bind and use a variable in the same binary pattern.

  • To fully support binary matching in receives, receives must be lowered to more primitive operations.

Binary matching in Core Erlang

In Erlang, a variable can be bound in a binary pattern and used later in the same pattern as the size of a segment:

foo(<<Sz:16,X:Sz>>) -> X.

In OTP 22 and previous releases, the translation to Core Erlang is straightforward:

'foo'/1 =
    fun (_0) ->
        case _0 of
          <#{#<Sz>(16,1,'integer',['unsigned'|['big']]),
             #<X>(Sz,1,'integer',['unsigned'|['big']])}#> when 'true' ->
              X
          <_1> when 'true' ->
              %% Raise function_clause exception.
              .
              .
              .
        end

While the translation is straightforward, all Core Erlang passes would need to handle binding and using a variable in the same scope. That would become even more complicated if we were to allow expressions as segment sizes.

In OTP 23, all variables used in a segment size expression must be already bound in the enclosing environment. The previous example must be rewritten like this using nested cases:

'foo'/1 =
    fun (_0) ->
          case _0 of
              <#{#<Sz>(16,1,'integer',['unsigned'|['big']]),
               #<_2>('all',1,'binary',['unsigned'|['big']])}#> when 'true' ->
                  case _2 of
                     <#{#<X>(Sz,1,'integer',['unsigned'|['big']])}#> when 'true' ->
                         X
                     <_3> when 'true' ->
                         %% Raise function_clause exception.
                         .
                         .
                         .
                    end
               <_4> when 'true' ->
                    %% Raise function_clause exception.
                    .
                    .
                    .
              end

However, as can be seen from the example, the code for raising the function_clause exception has been duplicated. The code duplication is no big deal in this simple example, but it would be in a function where the binary matching clause was followed by many other clauses. To avoid the code duplication, we must use letrec with the letrec_goto annotation:

'foo'/1 =
    fun (_0) ->
        ( letrec
              'label^0'/0 =
                  fun () ->
                        case _0 of
                          <_1> when 'true' ->
                                %% Raise function_clause exception.
                                .
                                .
                                .
                        end
          in  case _0 of
                <#{#<Sz>(16,1,'integer',['unsigned'|['big']]),
                   #<_2>('all',1,'binary',['unsigned'|['big']])}#> when 'true' ->
                    case _2 of
                      <#{#<X>(Sz,1,'integer',['unsigned'|['big']])}#> when 'true' ->
                          X
                      <_3> when 'true' ->
                            apply 'label^0'/0()
                    end
                <_4> when 'true' ->
                      apply 'label^0'/0()
              end
          -| ['letrec_goto'] )

When a letrec has been given the annotation letrec_goto, it will be specially translated. The apply operations will be translated to a goto instead of a call to a local function.

Translating receive to Core Erlang

Consider this example:

bar(Timeout) ->
    receive
        {tag,Msg} -> Msg
    after
        Timeout ->
            no_message
    end.

In a OTP 22 and earlier, the translation to Core Erlang was straightforward:

'bar'/1 =
    fun (Timeout) ->
        receive
          <{'tag',Msg}> when 'true' ->
              Msg
        after Timeout ->
          'no_message'

In order to fully support binary matching in OTP 23, a receive in Erlang has now been lowered to more primitive operations in Core Erlang:

'foo'/1 =
    fun (Timeout) ->
        ( letrec
              'recv$^0'/0 =
                  fun () ->
                      let <PeekSucceeded,Message> =
                          primop 'recv_peek_message'()
                      in  case PeekSucceeded of
                            <'true'> when 'true' ->
                                case Message of
                                  <{'tag',Msg}> when 'true' ->
                                      do  primop 'remove_message'()
                                          Msg
                                  <Other> when 'true' ->
                                      do  primop 'recv_next'()
                                            apply 'recv$^0'/0()
                                end
                            <'false'> when 'true' ->
                                let <TimedOut> =
                                    primop 'recv_wait_timeout'(Timeout)
                                in  case TimedOut of
                                      <'true'> when 'true' ->
                                          do  primop 'timeout'()
                                              'no_message'
                                      <'false'> when 'true' ->
                                          apply 'recv$^0'/0()
                                    end
                          end
          in  apply 'recv$^0'/0()
          -| ['letrec_goto'] )

When compiling from Core Erlang code in OTP 23, the compiler will accept Core Erlang code that uses the receive construct and automatically lower it to the more primitive operations. That is, for the example above, the Core Erlang translation from OTP 22 will be accepted as input to the compiler in OTP 23.

Here is another example where the Core Erlang code from OTP 22 will not be accepted. Here is the Erlang code:

foobar() ->
    receive
        <<Sz:16,X:Sz>> -> X
    end.

In OTP 22, this would be translated to Core Erlang code like this:

'foobar'/0 =
    fun () ->
        receive
          <#{#<Sz>(16,1,'integer',['unsigned'|['big']]),
             #<X>(Sz,1,'integer',['unsigned'|['big']])}#> when 'true' ->
              X
        after 'infinity' ->
          'true'

That translation will not be accepted by the compiler in OTP 23. The receive must be lowered to more primitive operations, and the binary matching must be rewritten using nested cases:

'foobar'/0 =
    fun () ->
        ( letrec
              'recv$^0'/0 =
                  fun () ->
                      let <_5,_0> =
                          primop 'recv_peek_message'()
                      in  case _5 of
                            <'true'> when 'true' ->
                                ( letrec
                                      'label^0'/0 =
                                          fun () ->
                                                do  primop 'recv_next'()
                                                    apply 'recv$^0'/0()
                                  in  case _0 of
                                        <#{#<Sz>(16,1,'integer',['unsigned'|['big']]),
                                           #<_1>('all',1,'binary',['unsigned'|['big']])}#> when 'true' ->
                                            case _1 of
                                              <#{#<X>(Sz,1,'integer',['unsigned'|['big']])}#> when 'true' ->
                                                  do  primop 'remove_message'()
                                                      X
                                              <_2> when 'true' ->
                                                    apply 'label^0'/0()
                                            end
                                        <_3> when 'true' ->
                                              apply 'label^0'/0()
                                      end
                                  -| ['letrec_goto'] )
                            <'false'> when 'true' ->
                                  let <_4> =
                                      primop 'recv_wait_timeout'
                                          ('infinity')
                                  in  case _4 of
                                        <'true'> when 'true' ->
                                            do  primop 'timeout'()
                                                'true'
                                        <'false'> when 'true' ->
                                            apply 'recv$^0'/0()
                                      end
                          end
          in apply 'recv$^0'/0() )
          -| ['letrec_goto']

Implementation

The implementation can be found in PR #2521.

Copyright

This document has been placed in the public domain.