Author: Björn Gustavsson <bjorn(at)erlang(dot)org>
Status: Draft
Type: Standards Track
Created: 14-Sep-2020
Erlang-Version: 24
Post-History:

EEP 54: Provide more information about errors

Abstract

This EEP proposes a mechanism for reporting more human-readable information about what went wrong when a BIF raises an exception. The same mechanism can be used by libraries or applications to provide more detailed error messages.

Specification

In OTP 23 and earlier, the shell prints a terse message when a call to a built-in function (BIF) fails:

1> element(a,b).
** exception error: bad argument
     in function  element/2
        called as element(a,b)

The bad argument message informs us that one or more of the arguments to the call were incorrect in some way (in this example, both arguments have wrong types).

We propose a mechanism that enables the shell to print more helpful error messages. Here is how the message would be printed with the reference implementation of this EEP:

1> element(a, b).
** exception error: bad argument
     in function  element/2
        called as element(a,b)
        *** argument 1: not an integer
        *** argument 2: not a tuple

Note that the exact formatting and phrasing of the messages is an implementation detail outside the scope of this EEP. What will be specified here are the APIs and conventions that make these messages possible.

Proposals in this EEP

An extension of the format of the call-stack back trace (stacktrace) format to indicate that there exists extended error information for that call, and a convention for how extended error information can be provided.
A new erlang:error/3 BIF to allow libraries and applications to raise an exception with extended error information in the stacktrace.
New functions erl_error:format_exception/3 and erl_error:format_exception/4 to allow libraries and applications to format stacktraces in the same style as the shell.

Extending the stacktrace

The stack back-trace (stacktrace) is currently a list of tuples. For the purpose of this EEP we are only interested in the first entry in the stacktrace. It looks like {Module,Function,Arguments,ExtraInfo}, where ExtraInfo is a list of two-tuples. As an indication that extended error info is available, we propose adding an {error_info,ErrorInfoMap} tuple to ExtraInfo in the first element in the stacktrace.

The map ErrorInfoMap contains further information about the error. This should contain at least the key reason. The corresponding value could contain additional information about the error. When there is no additional information avaiable, the value is a dummy value such as none or [].

To obtain more information about the error, the format_error/4 function in module Module can be called to provide additional information about the error.

The arguments for format_error/4 are Function, Arguments, the exception reason (usually badarg for BIF calls), and ErrorInfoMap.

Thus, if a call to element/2 fails with a badarg exception and the first entry in the stacktrace is:

{erlang,element,[1,no_tuple],[{error_info,ErrorInfoMap}]}

the following call will provide provide additional information about the error:

erlang:format_error(element, [1,no_tuple], badarg, ErrorInfoMap)

The format_error/4 function should return a map. For each argument that was in error, there should be a map element with the argument number as the key (that is, 1 for the first argument, 2 for the second, and so on) and a unicode:chardata() term as the value.

As an example:

erlang:format_error(element, [1,no_tuple], badarg, ErrorInfoMap)

could return:

#{2 => <<"not a tuple">>}

And:

erlang:format_error(element, [0, b], badarg, ErrorInfo)

could return:

#{1 => <<"out of range">>, 2 => <<"not a tuple">>}

Note that the ErrorInfoMap term is only to be used by Module:format_error/4. It is not to be matched by code in other modules. The particular value for the key reason in the map ErrorInfoMap for a particular error could change at any time. Also note that Module:format_error/4 may choose not to use the reason key at all. For example, the reference implementation of erlang:format_error/4 only examines the arguments for element/2.

The value for the key reason term will typically have a meaningful value when an error occurs in a BIF that depends on the internal state in the runtime system (such as register/2 or the ETS BIFs), or for BIFs with complex arguments (such as system_flag/2) that would make it tedious and error prone to figure out which argument was in error.

Here is one way that format_error/4 for the erlang module could be implemented:

format_error(F, As, ExceptionReason, ErrorInfoMap) ->
    Reason = maps:get(reason, ErrorInfoMap, none),
    do_format_error(F, As, ExceptionReason, Reason).

do_format_error(_, _, system_limit, _) ->
    %% The explanation for system_limit is clear enough, so we don't
    %% need any detailed explanations for the arguments.
    #{};
do_format_error(F, As, _, Reason) ->
    do_format_error(F, As, Reason).

do_format_error(element, [Index, Tuple], _) ->
    Arg1 = if
               not is_integer(Index) ->
                   <<"not an integer">>;
               Index =< 0; Index > tuple_size(Tuple) ->
                   <<"out of range">>;
               true ->
                   []
           end,
    Arg2 = if
               not is_tuple(Tuple) -> <<"not a tuple">>;
               true -> []
           end,
    PotentialErrors = [{1, Arg1}, {2, Arg2}],
    maps:from_list([{ArgNum, Err} ||
                       {ArgNum, Err} <- PotentialErrors,
                       Err =/= []]);

do_format_error(list_to_atom, _, _) ->
    #{1 => <<"not a flat list of characters">>};

do_format_error(register, [Name,PidOrPort], Reason) ->
    [Arg1, Arg2] =
    case Reason of
        registered_name ->
            [[],<<"this process or port already has a name">>];
        notalive ->
            [[],<<"the pid does not refer to an existing process">>];
        _ ->
            Errors =
                [if
                     Name =:= undefined -> <<"'undefined' is not a valid name">>;
                     is_atom(Name) -> [];
                     true -> <<"not an atom">>
                 end,
                 if
                     is_pid(PidOrPort) -> [];
                     is_port(PidOrPort) -> [];
                     true -> <<"not a pid or a port">>
                 end],
            case Errors of
                [[],[]] ->
                    [<<"name is in use">>];
                [_,_] ->
                    Errors
            end,
    PotentialErrors = [{1, Arg1}, {2, Arg2}],
    maps:from_list([{ArgNum, Err} ||
                       {ArgNum, Err} <- PotentialErrors,
                       Err =/= []]);
      .
      .
      .

do_format_error(_, _, _) ->
    #{}.

The register/2 BIF will provide specific error reasons for two of the possible failure reasons. If the reason is not one of the two, format_error/4 will figure out the other reasons based on the arguments.

Supplying extended error information using `erlang:error/3`

A library or application can raise an error exception with extended error information by calling erlang:error(Reason, Arguments, Options). Reason should be the error reason (for example badarg), Arguments should be arguments for the calling function, and Options should be [{error_info,ErrorInfoMap}], where ErrorInfoMap with at least the key reason.

The module that raises the exception should export a format_error/4 function that behaves as described in the previous section.

Formatting stacktraces

To make it possible for applications and libraries to format stacktraces in the same style as the shell, the functions erl_error:format_exception/3 and erl_error:format_exception/4 are provided. Here is an example how erl_error:format_exception/3 can be used:

try
    .
    .
    .
catch
    C:R:Stk ->
        Message = erl_error:format_exception(C, R, Stk),
        io:format(LogFile, "~ts\n", [Message])
end.

The erl_error:format_exception/4 function is similar but has a fourth option argument to support customizing the message. See the documentation in the reference implementation for details.

Possible future extensions

Since the error_info tuple in the stacktrace contains a map, more data could be added to the map. For example, there could be module and function keys to allow pointing out a format_error/4 function with an arbitrary name. That could be useful if the compiler were to generate extended error information for badmatch or function_clause errors.

Since the return value of format_error/4 is a map, additional keys in the map could be assigned a meaning in the future.

For example, the value for the key hint could be a longer message that gives more context or provides concrete advice on how to investigate or avoid the error.

Additional examples

Let's look at some examples using ETS:

1> T = ets:new(table, []).
#Ref<0.2290824696.4161404930.5168>
2> ets:update_counter(T, k, 1).
** exception error: bad argument
     in function  ets:update_counter/3
        called as ets:update_counter(#Ref<0.2290824696.4161404930.5168>,k,1)
        *** argument 2: not a key that exists in the table

Note that when an error occurs while evaluating an expression entered in the shell, the evaluator process terminates and any ETS tables created by that process are deleted. Thus, calling update_counter a second time with the same arguments results in a different message:

3> ets:update_counter(T, k, 1).
** exception error: bad argument
     in function  ets:update_counter/3
        called as ets:update_counter(#Ref<0.2290824696.4161404930.5168>,k,1)
        *** argument 1: the table identifier does not refer to an existing ETS table

Starting over, creating a new ETS table:

4> f(T), T = ets:new(table, []).
#Ref<0.2290824696.4161404930.5205>
5> ets:insert(T, {k,a,0}).
true
6> ets:update_counter(T, k, 1).
** exception error: bad argument
     in function  ets:update_counter/3
        called as ets:update_counter(#Ref<0.2290824696.4161404930.5205>,k,1)
        *** argument 3: the value in the given position in the object is not an integer
7> ets:update_counter(T, k, bad).
** exception error: bad argument
     in function  ets:update_counter/3
        called as ets:update_counter(#Ref<0.2290824696.4161404930.5205>,k,bad)
        *** argument 1: the table identifier does not refer to an existing ETS table
        *** argument 3: not a valid update operation

Motivation

When a call to a BIF fails with the reason badarg it is not always obvious even to an experienced developer exactly which argument was "bad" and in which way. For a newcomer, having to figure out what a badarg means is another stumbling block standing in the way of mastering a new language.

Even for an experienced developer, figuring out the reason for a badarg exception for some BIFs is hard or impossible. For example, the documentation for ets:update_counter/4 at the time of writing lists 8 situations in which ets:update_counter/4 will fail. That number is too low. Missing from list are, for example, reasons such as the ETS table having been deleted or having insufficient access right.

Rationale

Why not change `badarg` to something more informational?

An alternative way to provide more information about errors would be to introduce additional exception reasons. For example, the call:

element(a, b)

could raise the exception:

{badarg,[{1,not_integer},{2,not_tuple}]}

That change could break code that expects that BIFs should raise a badarg exception. It is less likely that existing code would match the fourth entry in the stacktrace.

A related reason is the amount of work needed to revise the error handling code for all built-in functions. Implementing building of Erlang terms in C is tedious and error prone. There would always be a risk that bugs in that code would crash the runtime system when an error occurred. The test suite would have to be extremely thorough to ensure that all bugs were found, because error handling code is typically infrequently executed.

Why can't the stacktrace contain the complete error reason?

We did consider modifying the implementation of all BIFs so that they would produce complete error information in the stacktrace when they failed. However, as mentioned earlier, building Erlang terms in C is tedious and error prone.

With the approach we have taken to let Erlang code do most of the analysis of the error reason, there is a much lower risk that error handling would crash the application or runtime system.

Why are the the reasons in the `ErrorInfoMap` undocumented?

The reason in the ErrorInfoMap is not meant to be used for programmatically figure out why an error occurred, but only to be used by Module:format_error/4 to produce a human-readable message.

Also, for many BIFs the reason will not have a meaningful value, as the Module:format/4 function will produce the messages based solely on the name of the BIF and its arguments.

Backwards Compatibility

All exceptions from BIFs will now have a ExtraInfo element (called Location in the documentation for OTP 23) in the call-stack back trace (stacktrace) that includes an error_info tuple. In previous releases the ExtraInfo element would be an empty list for a failed BIF call.

Applications that explicitly do matching on the stacktrace and do assumptions of the layout of the ExtraInfo element (for example, assuming that Location is either an empty list or a list of file and line tuples in a specific order) may need modifications. Note that such assumptions have never been safe and that the documentation for error handling strongly discourages developers to rely on stacktrace entries for purposes other than debugging.

Implementation

The reference implementation includes extended error information for most BIFs implemented in C in the erlang and ets modules. It can be found in PR #2849.

Copyright

This document has been placed in the public domain.