Author: Fred Hebert <mononcqc(at)ferd(dot)ca>
Status: Draft
Type: Standards Track
Created: 31-Aug-2018
Post-History:
This EEP adds a contextual <~ operator to begin ... end expressions,
which allows them to be used for value-based error handling, based on
standard {ok, term()} | {error, term()} return value types.
This lets begin ... end become a control flow construct that replaces
or simplifies deeply-nested case ... end expressions, and prevents
using exceptions for control flow.
This document has been placed in the public domain.
The current syntax for a begin ... end
expression is:
begin
Exprs
end
The expression does not have a restricted scope, and is mostly used to
group multiple distinct expressions as a single block. We propose a new
type of expression (denoted UnwrapExprs), only valid within a
begin ... end
expression:
begin
Exprs | UnwrapExprs
end
UnwrapExprs
are defined as having the following form:
Pattern <~ Expr
This definition means that UnwrapExprs
are only allowed at the
top-level of begin ... end
expressions.
The <~ operator takes the value returned by Expr and inspects it.
If the value is a tuple of the form {ok, Val}
, it unwraps Val
from
the tuple, and matches it against Pattern
.
If the pattern matches, all variables from Pattern
are bound in the
local environment, and the full value {ok, Val}
is returned by the
UnwrapExpr
. If the value does not match, a {badunwrap, Val}
error
is raised.
A special case exists when Pattern is the match-all variable (_): on
top of considering the unwrapping successful when the value returned by
Expr is {ok, term()}, it also considers the atom ok to be valid.
If the value is a tuple of the form {error, Reason}
, then the entire
begin ... end
expression is short-circuited and returns {error,
Reason}
. The variables that were bound in there remain bound, the
rest are undefined.
The compiler should warn about any variable that is used after the
begin ... end
expression that was bound in or after the first
UnwrapExpr
encountered within the block.
If the value returned does not match any of {ok | error, term()}
as a
type, a {badunwrap, Val}
error is raised.
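To make these rules concrete, a single Pattern <~ Expr step behaves
roughly like the following hand-written Erlang (an illustrative sketch
only; in reality the {error, Reason} branch short-circuits the whole
enclosing begin ... end expression rather than just this step):

    case Expr of
        {ok, Val} ->
            case Val of
                Pattern -> {ok, Val};  % variables bound in Pattern stay bound
                _ -> erlang:error({badunwrap, Val})
            end;
        {error, Reason} ->
            {error, Reason};  % short-circuits the enclosing begin ... end
        Other ->
            erlang:error({badunwrap, Other})
    end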
Given the structure described here, the final expression may look like:
begin
Foo = bar(),
X <~ id({ok, 5}),
[H|T] <~ id({ok, [1,2,3]}),
...
end
Do note that to allow easier pattern matching and more intuitive usage,
the <~ operator should have lower precedence than =, such that:
begin
X = [H|T] <~ exp()
end
is a valid UnwrapExpr equivalent to the non-infix form '<~'('='(X, [H|T]),
exp()), since reversing the priorities would give '='('<~'(X, [H|T]),
exp()), which would create an UnwrapExpr out of context and be invalid.
Erlang has some of the most flexible error handling available across a large number of programming languages. The language supports:

- exceptions in three classes (throw, error, exit)
- catch Exp
- try ... [of ...] catch ... [after ...] end
- process links, exit/2, and trap_exit
- conventional return values such as {ok, Val} | {error, Term}, {ok, Val} | false, or ok | {error, Val}
So why should we look to add more? There are various reasons for this, including trying to reduce deeply nested conditional expressions, cleaning up some messy patterns found in the wild, providing a better separation of concerns when implementing functions, and encouraging more standard and idiomatic interfaces.
One common pattern that can be seen in Erlang is deep nesting of case
... end
expressions, to check complex conditionals.
Take, for example, the following code from Mnesia:
commit_write(OpaqueData) ->
B = OpaqueData,
case disk_log:sync(B#backup.file_desc) of
ok ->
case disk_log:close(B#backup.file_desc) of
ok ->
case file:rename(B#backup.tmp_file, B#backup.file) of
ok ->
{ok, B#backup.file};
{error, Reason} ->
{error, Reason}
end;
{error, Reason} ->
{error, Reason}
end;
{error, Reason} ->
{error, Reason}
end.
The code is nested to the extent that shorter aliases must be introduced
for variables (OpaqueData
renamed to B
), and half of the code just
transparently returns the exact values each function was given.
By comparison, the same code could be written as follows with the new construct:
commit_write(OpaqueData) ->
begin
_ <~ disk_log:sync(OpaqueData#backup.file_desc),
_ <~ disk_log:close(OpaqueData#backup.file_desc),
_ <~ file:rename(OpaqueData#backup.tmp_file, OpaqueData#backup.file),
{ok, OpaqueData#backup.file}
end.
The semantics of this call are entirely identical, except that it is now much easier to focus on the flow of individual operations.
Frequent ways in which people work with sequences of failable operations include folds over lists of functions, and abusing list comprehensions. Both patterns have heavy weaknesses that makes them less than ideal.
Folds over lists of functions use patterns such as those defined in posts from the mailing list:
pre_check(Action, User, Context, ExternalThingy) ->
Checks =
[fun check_request/1,
fun check_permission/1,
fun check_dispatch_target/1,
fun check_condition/1],
Args = {Action, User, Context, ExternalThingy},
Harness =
fun
(Check, ok) -> Check(Args);
(_, Error) -> Error
end,
case lists:foldl(Harness, ok, Checks) of
ok -> dispatch(Action, User, Context);
Error -> Error
end.
This code requires declaring the functions one by one, ensuring the entire context is carried from function to function. Since there is no shared scope between functions, all functions must operate on all arguments.
By comparison, the same code could be implemented with the new construct as:
pre_check(Action, User, Context, ExternalThingy) ->
begin
_ <~ check_request(Context, User),
_ <~ check_permissions(Action, User),
_ <~ check_dispatch_target(ExternalThingy),
_ <~ check_condition(Action, Context),
dispatch(Action, User, Context)
end.
And if there was a need for derived state between any two steps, it would be easy to weave it in:
pre_check(Action, User, Context, ExternalThingy) ->
begin
_ <~ check_request(Context, User),
_ <~ check_permissions(Action, User),
_ <~ check_dispatch_target(ExternalThingy),
DispatchData <~ dispatch_target(ExternalThingy),
_ <~ check_condition(Action, Context),
dispatch(Action, User, Context)
end.
The list comprehension hack, by comparison, is a bit more rare. In fact, it is mostly theoretical. Some things that hint at how it could work can be found in Diameter test cases or the PropEr plugin for Rebar3.
Its overall form uses generators in list comprehensions to tunnel a happy path:
[Res] =
[f(Z) || {ok, W} <- [b()],
{ok, X} <- [c(W)],
{ok, Y} <- [d(X)],
Z <- [e(Y)]],
Res.
This form doesn't see too much usage since it is fairly obtuse and I suspect most people have either been reasonable enough not to use it, or did not think about it. Obviously the new form would be cleaner:
begin
W <~ b(),
X <~ c(W),
Y <~ d(X),
Z = e(Y),
f(Z)
end
which, on top of that, has the benefit of returning an error value if one is found.
This form is not necessarily obvious at a first glance. To better
expose it, let's take a look at some functions defined in the
release_handler
module in
OTP:
write_releases_m(Dir, NewReleases, Masters) ->
RelFile = filename:join(Dir, "RELEASES"),
Backup = filename:join(Dir, "RELEASES.backup"),
Change = filename:join(Dir, "RELEASES.change"),
ensure_RELEASES_exists(Masters, RelFile),
case at_all_masters(Masters, ?MODULE, do_copy_files,
[RelFile, [Backup, Change]]) of
ok ->
case at_all_masters(Masters, ?MODULE, do_write_release,
[Dir, "RELEASES.change", NewReleases]) of
ok ->
case at_all_masters(Masters, file, rename,
[Change, RelFile]) of
ok ->
remove_files(all, [Backup, Change], Masters),
ok;
{error, {Master, R}} ->
takewhile(Master, Masters, file, rename,
[Backup, RelFile]),
remove_files(all, [Backup, Change], Masters),
throw({error, {Master, R, move_releases}})
end;
{error, {Master, R}} ->
remove_files(all, [Backup, Change], Masters),
throw({error, {Master, R, update_releases}})
end;
{error, {Master, R}} ->
remove_files(Master, [Backup, Change], Masters),
throw({error, {Master, R, backup_releases}})
end.
At a glance, it is very difficult to clean up this code: there are 3 multi-node operations (backing up, updating, and moving release data), each of which relies on the previous one to succeed.
You'll also notice that each error requires special handling, reverting or removing specific operations on success or on failure. This is not a simple question of tunnelling values in and out of a narrow scope.
Another thing to note is that this module, as a whole (and not just the
snippet presented here) uses throw
expressions for non-local returns. The points where these are actually
handled are spread through various locations in the file:
create_RELEASES/4 and write_releases_1/3, for example.
The case catch Exp of
form is used throughout the file because
value-based error flow is painful in nested structures.
So let's take a look at how we could refactor this with the new construct:
write_releases_m(Dir, NewReleases, Masters) ->
RelFile = filename:join(Dir, "RELEASES"),
Backup = filename:join(Dir, "RELEASES.backup"),
Change = filename:join(Dir, "RELEASES.change"),
begin
_ <~ backup_releases(Dir, NewReleases, Masters, Backup, Change,
RelFile),
_ <~ update_releases(Dir, NewReleases, Masters, Backup, Change),
_ <~ move_releases(Dir, NewReleases, Masters, Backup, Change, RelFile)
end.
backup_releases(Dir, NewReleases, Masters, Backup, Change, RelFile) ->
case at_all_masters(Masters, ?MODULE, do_copy_files,
[RelFile, [Backup, Change]]) of
ok ->
ok;
{error, {Master, R}} ->
remove_files(Master, [Backup, Change], Masters),
{error, {Master, R, backup_releases}}
end.
update_releases(Dir, NewReleases, Masters, Backup, Change) ->
case at_all_masters(Masters, ?MODULE, do_write_release,
[Dir, "RELEASES.change", NewReleases]) of
ok ->
ok;
{error, {Master, R}} ->
remove_files(all, [Backup, Change], Masters),
{error, {Master, R, update_releases}}
end.
move_releases(Dir, NewReleases, Masters, Backup, Change, RelFile) ->
case at_all_masters(Masters, file, rename, [Change, RelFile]) of
ok ->
remove_files(all, [Backup, Change], Masters),
ok;
{error, {Master, R}} ->
takewhile(Master, Masters, file, rename, [Backup, RelFile]),
remove_files(all, [Backup, Change], Masters),
{error, {Master, R, move_releases}}
end.
The only reasonable way to rewrite the code was to extract all three major multi-node operations into distinct functions. Among the improvements, no throw() would be needed in write_releases_m, therefore separating the flow control details from specific function implementations.

As a control experiment, let's try reusing our shorter functions with the previous flow:
%% Here is the same done through exceptions:
write_releases_m(Dir, NewReleases, Masters) ->
RelFile = filename:join(Dir, "RELEASES"),
Backup = filename:join(Dir, "RELEASES.backup"),
Change = filename:join(Dir, "RELEASES.change"),
try
ok = backup_releases(Dir, NewReleases, Masters, Backup, Change,
RelFile),
ok = update_releases(Dir, NewReleases, Masters, Backup, Change),
ok = move_releases(Dir, NewReleases, Masters, Backup, Change, RelFile)
catch
{error, Reason} -> {error, Reason}
end.
backup_releases(Dir, NewReleases, Masters, Backup, Change, RelFile) ->
case at_all_masters(Masters, ?MODULE, do_copy_files,
[RelFile, [Backup, Change]]) of
ok ->
ok;
{error, {Master, R}} ->
remove_files(Master, [Backup, Change], Masters),
throw({error, {Master, R, backup_releases}})
end.
update_releases(Dir, NewReleases, Masters, Backup, Change) ->
case at_all_masters(Masters, ?MODULE, do_write_release,
[Dir, "RELEASES.change", NewReleases]) of
ok ->
ok;
{error, {Master, R}} ->
remove_files(all, [Backup, Change], Masters),
throw({error, {Master, R, update_releases}})
end.
move_releases(Dir, NewReleases, Masters, Backup, Change, RelFile) ->
case at_all_masters(Masters, file, rename, [Change, RelFile]) of
ok ->
remove_files(all, [Backup, Change], Masters),
ok;
{error, {Master, R}} ->
takewhile(Master, Masters, file, rename, [Backup, RelFile]),
remove_files(all, [Backup, Change], Masters),
throw({error, {Master, R, move_releases}})
end.
Very little changes in the three distributed functions. However, the weakness
of this approach is that we have intimately tied implementation details of the
small functions to their parent's context. This makes it hard to reason about
these functions in isolation or to reuse them in a different context.
Furthermore, the parent function may capture throws
not intended for it.
It is my opinion that using value-based flow control, through similar refactorings, yields safer and cleaner code, which also happens to have far lower levels of nesting. It should therefore be possible to express more complex sequences of operations without making them any harder to read or reason about in isolation.
That is in part due to the nesting, but also because we take a more compositional approach, where there is no need to tie local functions' implementation details to the complexity of their overall pipeline and execution context.
It is also the best way to structure code in order to handle all exceptions and to provide the context they need as close as possible to their source, and as far as possible from the execution flow.
In Erlang, true
and false
are regular atoms that only gained special
status through usage in boolean expressions. It would be easy to think
that more functions would return yes
and no
were it not for control
flow constructs.
Similarly, undefined
has over years of use become a kind of default
"not found" value. Values such as nil
, null
, unknown
, undef
,
false
and so on have seen some use, but a strong consistency in format
has ended up aligning the community on one value.
When it comes to return values for various functions, {ok, Term}
is
the most common one for positive results that need to communicate a
value, ok
for positive results with no other value than their own
success, and {error, Term}
is most often used for errors. Pattern
matching and assertions have made it easy to know whether a call worked
or not from its structure alone.
However, many success values are still larger tuples: {ok, Val,
Warnings}
, {ok, Code, Status, Headers, Body}
, and so on. Such
variations are not problematic on their own, but it would likely not
hurt too much either to use {ok, {Val, Warnings}}
or {ok, {Code,
Status, Headers, Body}}
.
In fact, using more standard forms could lead to easier generalizations and abstractions that can be applied to community-wide code. By choosing specific formats for control flow on value-based error handling, we explicitly encourage this form of standardization.
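For instance, a function returning the suggested {ok, {Val, Warnings}} shape composes directly with the construct proposed here, whereas {ok, Val, Warnings} would not. A small sketch, where compile_all/1 and report/1 are hypothetical functions:

    begin
        {Val, Warnings} <~ compile_all(Files),  % expects {ok, {Val, Warnings}}
        [report(W) || W <- Warnings],
        {ok, Val}
    end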
This section will detail the decision-making behind this EEP, including:

- begin ... end as a construct and its scope
- {badunwrap, Val} as a default exception

There's a lot of content to cover here.
Multiple languages have value-based exception handling, many of which have a strong functional slant.
The most famous case is possibly Haskell with the Maybe
monad, which
uses either Nothing
(meaning the computation returned nothing) or
Just x
(their type-based equivalent of {ok, X}
). The union of both
types is denoted Maybe x
. The following examples are taken from
Haskell/Understanding
monads/Maybe.
Values for such errors are tagged in functions as follows:
safeLog :: (Floating a, Ord a) => a -> Maybe a
safeLog x
| x > 0 = Just (log x)
| otherwise = Nothing
Using the type annotations directly, it is possible to extract values (if any) through pattern matching:
zeroAsDefault :: Maybe Int -> Int
zeroAsDefault mx = case mx of
Nothing -> 0
Just x -> x
One thing to note here is that as long as you cannot find a value to
substitute for Nothing, or cannot take a different branch, you are
forced to carry that uncertainty with you through all the types in the
system.
This is usually where Erlang stops. You have the same possibilities (albeit dynamically checked), along with the possibility of transforming invalid values into exceptions.
Haskell, by comparison, offers monadic operations and its do notation to abstract over things:
getTaxOwed name = do
number <- lookup name phonebook
registration <- lookup number governmentDatabase
lookup registration taxDatabase
In this snippet, even though the lookup
function returns a Maybe x
type, the do notation abstracts away the Nothing
values, letting the
programmer focus on the x
part of Just x
. Even though the code is
written as if we could operate on discrete values, the function
automatically re-wraps its result into Just x
and any Nothing
value
just bypasses operations.
As such, the developer is forced to acknowledge that the whole function's flow is conditional to values being in place, but they can nevertheless write it mostly as if everything were discrete.
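For comparison, a sketch of the same flow using the construct proposed in this document, assuming a hypothetical lookup/2 that returns {ok, Value} | {error, not_found}:

    get_tax_owed(Name, Phonebook, GovernmentDb, TaxDb) ->
        begin
            Number <~ lookup(Name, Phonebook),
            Registration <~ lookup(Number, GovernmentDb),
            lookup(Registration, TaxDb)
        end.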
OCaml supports exceptions, with constructs such as raise (Type
"value")
to raise an exception, and try ... with ...
to handle them.
However, since exceptions aren't tracked by the type system,
maintainers introduced a Result
type.
The type is defined as
type ('a, 'b) result =
| Ok of 'a
| Error of 'b
which is reminiscent of Erlang's {ok, A}
and {error, B}
. OCaml users
appear to mostly use pattern matching, combinator libraries, and monadic
binding to deal with value-based error handling, something similar to
Haskell's usage.
Rust defines two types of errors: unrecoverable ones (using panic!
)
and recoverable ones, using Result<T, E> values. The latter is of
interest to us, and defined as:
enum Result<T, E> {
Ok(T),
Err(E),
}
This would intuitively translate to the Erlang terms {ok, T}
and {error,
E}
. The simple way to handle these in Rust is through pattern matching:
let f = File::open("eep.txt");
match f {
Ok(file) => do_something(file),
Err(error) => {
panic!("Error in file: {:?}", error)
},
};
Specific error values have to be well-typed, and it seems that the Rust community is still debating implementation details about how to best get composability and annotations within a generic type.
However, their workflow for handling these is well-defined already. This
pattern matching form has been judged too cumbersome. To automatically
panic on error values, the .unwrap()
method is added:
let f = File::open("eep.txt").unwrap();
In Erlang, we could approximate this with:
unwrap({ok, X}) -> X;
unwrap({error, T}) -> exit(T).
F = unwrap(file:open("eep.txt", Opts)).
Another construct exists to return errors to caller code more directly,
without panics, with the ?
operator:
fn read_eep() -> Result<String, io::Error> {
let mut h = File::open("eep.txt")?;
let mut s = String::new();
h.read_to_string(&mut s)?;
Ok(s)
}
Any value Ok(T)
encountering ?
is unwrapped. Any value Err(E)
encountering ?
is returned to the caller as-is, as if a match
with
return
had been used. This operator however requires that the
function's type signature use the Result<T, E>
type as a return value.
Prior to version 1.13, Rust used the try!(Exp)
macro to the same
effect, but found it too cumbersome. Compare:
try!(try!(try!(foo()).bar()).baz())
foo()?.bar()?.baz()?
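The flow enabled by ? is close to what this proposal offers. As a rough sketch using standard file module calls, read_eep could be approximated as:

    read_eep() ->
        begin
            Dir <~ file:get_cwd(),
            Bin <~ file:read_file(filename:join(Dir, "eep.txt")),
            {ok, Bin}
        end.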
Swift supports exceptions, along with type annotations declaring that a
function may raise exceptions, and do ... catch
blocks.
There is a special operator try?
which catches any thrown exception
and turns it into nil
:
func someThrowingFunction() throws -> Int {
// ...
}
let x = try? someThrowingFunction()
Here x
can either have a value of Int
or nil
. The data flow is
often simplified by using let
assignments in a conditional expression:
func fetchEep() -> Eep? {
if let x = try? fetchEepFromDisk() { return x }
if let x = try? fetchEepFromServer() { return x }
return nil
}
Go has some fairly anemic error handling. It has panics, and error values. Error values must be assigned (or explicitly ignored) but they can be left unchecked and cause all kinds of issues.
Nevertheless, Go exposed plans for new error handling in future versions, which can be interesting.
Rather than changing semantics of their error handling, Go designers are mostly considering syntactic changes to reduce the cumbersome nature of their errors.
Go programs typically handle errors as follows:
func main() {
hex, err := ioutil.ReadAll(os.Stdin)
if err != nil {
log.Fatal(err)
}
data, err := parseHexdump(string(hex))
if err != nil {
log.Fatal(err)
}
os.Stdout.Write(data)
}
The new proposed mechanism looks as follows:
func main() {
handle err {
log.Fatal(err)
}
hex := check ioutil.ReadAll(os.Stdin)
data := check parseHexdump(string(hex))
os.Stdout.Write(data)
}
The check
keyword asks to implicitly check whether the second return
value err
is equal to nil
or not. If it is not equal to nil
, the
latest defined handle
block is called. It can return the result out to
exit the function, repair some values, or simply panic, to name a few
options.
Elixir has a slightly different semantic approach to error handling compared
to Erlang. Exceptions are discouraged for control flow (while Erlang
specifically uses throw
for it), and the with
macro is introduced:
with {:ok, var} <- some_call(),
{:error, _} <- fail(),
{:ok, x, y} <- parse_name(var)
do
success(x, y, var)
else
{:error, err} -> handle(err)
nil -> {:error, nil}
end
The macro allows a sequence of pattern matches, after which the do ... block is called. If any of the pattern matches fails, the failing value gets re-matched in the optional else ... end section.
This is the most general control flow in this document, being fully flexible with regards to which values it can handle. This was done in part because there is not a strong norm regarding error or valid values in either the Erlang nor Elixir APIs, at least compared to other languages here.
This high level of flexibility has been criticized in some instances as being a bit confusing: it is possible for users to make error-only flows, success-only flows, mixed flows, and consequently the else clause can become convoluted.
The OK library was released to
explicitly narrow the workflow to well-defined errors. It supports three forms,
the first of which is the for
block:
OK.for do
user <- fetch_user(1)
cart <- fetch_cart(1)
order = checkout(cart, user)
saved_order <- save_order(order)
after
saved_order
end
It works by only matching on {:ok, val}
to keep moving forwards when
using the <-
operator: the fetch_user/1
function above must return
{:ok, user}
in order for the code to proceed. The =
operator is
allowed for pattern matches the same way it usually does within Elixir.
Any return value that matches {:error, t}
ends up returning directly
out of the expression. The after ... end
section takes the last value
returned, and if it isn't already in a tuple of the form {:ok, val}
, it
wraps it as such.
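For reference, the same sequence under the current proposal would look as follows, with the notable difference that the final value must be wrapped by hand since no auto-wrapping takes place:

    begin
        User <~ fetch_user(1),
        Cart <~ fetch_cart(1),
        Order = checkout(Cart, User),
        SavedOrder <~ save_order(Order),
        {ok, SavedOrder}
    end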
The second variant is the try
block:
OK.try do
user <- fetch_user(1)
cart <- fetch_cart(1)
order = checkout(cart, user)
saved_order <- save_order(order)
after
saved_order
rescue
:user_not_found -> {:error, missing_user}
end
This variant will capture exceptions as well (in the rescue
block),
and will not re-wrap the final return value in the after
section.
The last variant for the library is the pipe:
def get_employee_data(file, name) do
{:ok, file}
~>> File.read
~> String.upcase
end
The goal of this variant is to simply thread together operations that
could result in either a success or error. The ~>>
operator matches
and returns an {:ok, term}
tuple, and the ~>
operator wraps a value
into an {:ok, term}
tuple.
begin ... end Expressions

Abstractions over error flow require defining a scope limiting the way
flow is controlled. Before choosing the begin ... end expression, the
following items needed consideration: how other languages scope similar
constructs, prior art in Erlang, the shape of existing Erlang
expressions, and finally the choice of begin ... end itself.
In the languages mentioned earlier, two big error handling categories seem to emerge.
The first group of languages seems to track its error handling at the
function level. For example, Go uses return
to return early from the
current function. Swift and Rust also scope their error handling
abstractions to the current function, but they also make use of their
type signatures to keep information about the control flow
transformations taking place. Rust uses the Result<T, E>
type
signature to define what operations are valid, and Swift asks of
developers that they either handle the error locally, or annotate the
function with throws
to make things explicit.
On the other hand, Haskell's do notation is restricted to specific expressions, and so are all of Elixir's mechanisms.
Erlang, Haskell, and Elixir all primarily use recursion as an iteration
mechanism, and (outside of Haskell's monadic constructs) do not support
return
control flow; it is conceptually more difficult for a return
(or break
) to be useful when iteration requires recursion:
"returning" by exiting the current flow may not bail you out of what the
programmer might consider a loop, for example.
Instead, Erlang would use throw()
exceptions as a control flow
mechanism for non-local return, along with a catch
or a try ...
catch
. Picking a value-based error handling construct that acts at the
function level would not necessarily be very interesting since almost
any recursive procedure would still require using exceptions.
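As an illustration of that pattern, here is a sketch of a throw-based non-local return in a recursive traversal, with hypothetical find/2 and walk/2 helpers:

    find(Pred, Tree) ->
        try walk(Pred, Tree) of
            ok -> not_found
        catch
            throw:{found, V} -> {ok, V}  % non-local return from deep recursion
        end.

    walk(Pred, {node, V, Left, Right}) ->
        case Pred(V) of
            true -> throw({found, V});
            false -> walk(Pred, Left), walk(Pred, Right)
        end;
    walk(_Pred, leaf) ->
        ok.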
As such, it feels simpler to use a self-contained construct built to specifically focus on sequences of operations that contain value-based errors.
Prior attempts at abstracting value-based error handling in Erlang overloaded special constructs with parse transforms in order to provide specific workflows.
For example, the fancyflow
library tried to abstract the following code:
sans_maybe() ->
case file:get_cwd() of
{ok, Dir} ->
case
file:read_file(
filename:join([Dir, "demo", "data.txt"]))
of
{ok, Bin} ->
{ok, {byte_size(Bin), Bin}};
{error, Reason} ->
{error, Reason}
end;
{error, Reason} ->
{error, Reason}
end.
as:
-spec maybe() -> {ok, non_neg_integer()} | {error, term()}.
maybe() ->
[maybe](undefined,
file:get_cwd(),
file:read_file(filename:join([_, "demo", "data.txt"])),
{ok, {byte_size(_), _}}).
And Erlando would replace:
write_file(Path, Data, Modes) ->
Modes1 = [binary, write | (Modes -- [binary, write])],
case make_binary(Data) of
Bin when is_binary(Bin) ->
case file:open(Path, Modes1) of
{ok, Hdl} ->
case file:write(Hdl, Bin) of
ok ->
case file:sync(Hdl) of
ok ->
file:close(Hdl);
{error, _} = E ->
file:close(Hdl),
E
end;
{error, _} = E ->
file:close(Hdl),
E
end;
{error, _} = E -> E
end;
{error, _} = E -> E
end.
With monadic constructs in list comprehensions:
write_file(Path, Data, Modes) ->
Modes1 = [binary, write | (Modes -- [binary, write])],
do([error_m ||
Bin <- make_binary(Data),
Hdl <- file:open(Path, Modes1),
Result <- return(do([error_m ||
file:write(Hdl, Bin),
file:sync(Hdl)])),
file:close(Hdl),
Result]).
Those cases specifically aimed for a way to write sequences of operations where pre-defined semantics are bound by a special context, but are limited to overloading constructs rather than introducing new ones.
By comparison, most of Erlang's control flow expressions follow similar structures. See the following most common ones:
case ... of
Pattern [when Guard] -> Expressions
end
if
Guard -> Expressions
end
begin
Expressions
end
receive
Pattern [when Guard] -> Expressions
after % optional
IntegerExp -> Expressions
end
try
Expressions
of % optional
Pattern [when Guard] -> Expressions
catch % optional
ExceptionPattern [when Guard] -> Expressions
after % optional
Expressions
end
It therefore logically follows that if we were to add a new construct, it should be of the form
<keyword>
...
end
The questions remaining are: which keyword to choose, and which clauses to support.
begin ... end
Initially, a format similar to Elixir's with
expression was being
considered:
<keyword>
Expressions | UnwrapExpressions
of % optional
Pattern [when Guard] -> Expressions
end
With this construct, the basic <keyword> ... end
form would follow the
currently proposed semantics, but the of ...
section would allow
pattern matching on any return value from the expression, whether
{error, Reason}
or any non-exception value returned by the last
expression in the main section.
This form would be in line with what try ... of ... catch ... end
allows: once the main section is covered, more work can be done within
the same construct.
However, try ... of ... catch ... end
has a specific reason for
introducing the patterns and guards: protected code impacting tail
recursion.
In a loop such as:
map_nocrash(_, []) -> [];
map_nocrash(F, [H|T]) ->
try
F(H)
of
Val -> [Val | map_nocrash(F, T)]
catch
_:_ -> map_nocrash(F, T)
end.
The of section allows work to continue in the case where no exception
has happened, without having to protect more than the current scope of
the function, and without preventing tail recursion by forcing each
iteration to remain on the stack.
No such concerns exist for value-based error handling, and while the
of ... end
section might be convenient at times, it is not strictly
necessary for the construct to be useful.
What was left was to choose a name. Initially, the <keyword>
value
chosen was maybe
, based on the Maybe monad. The problem is that
introducing any new keyword carries severe risks to backwards
compatibility.
For example, all of the following words were considered:
=======  =================  =========================================
Keyword  Times used in OTP  Rationale
         as a function
=======  =================  =========================================
maybe     0                 can clash with existing used words,
                            otherwise respects the spirit
option   88                 definitely clashes with existing code
opt      68                 definitely clashes with existing code
check    49                 definitely clashes with existing code
let       0                 word is already reserved and free, but
                            makes no sense in context
cond      0                 word is already reserved and free, may
                            make sense, but would prevent the
                            addition of a conditional expression
given     0                 could work, kind of respects the context
when      0                 reserved for guards, could hijack in new
                            context but may be confusing
begin     0                 carries no conditional meaning, mostly
                            free for overrides
=======  =================  =========================================
Initially, this proposal expected to use the maybe
keyword:
maybe
Pattern <op> Exp,
...
of
Pattern -> Exp % optional
end
but for the reasons mentioned in the previous section, the of ...
section became non-essential.
Then, with the strong requirements for backwards compatibility making it
difficult to introduce new keywords, along with the possibility to reuse
begin
without changing any of its current behavior, this form became the
most interesting one.
The term begin
is also reminiscent of transactions and abortive
contexts, which means that although not an ideal fit for value-based
error flow, it is also not entirely outlandish and could accept the new
added optional semantics without being too out of place.
In order to form UnwrapExpr
, there is a need for a mechanism to
introduce pattern matching with distinct semantics from regular pattern
matching.
A naive parse transform approach with fake function calls would be the most basic way to go:
begin
unwrap(Pattern, Exp),
% variables bound in Pattern are available in scope
end
However, this would introduce pattern matches in non-left-hand-side positions and make nesting really weird to deal with without exposing parse transform details and knowing how the code is translated.
A prefix keyword such as let <Pattern> = <Exp>
could also be used.
Such keywords unfortunately suffer the same issues as maybe
would
have, and let
typically has different implications.
An infix operator seems like a good fit since pattern matching already uses them in multiple forms:

- = is used for pattern matches. Overloading it in error flow would prevent regular matching from being used
- := is used for maps; using it could work, but would certainly be confusing when handling nested maps in a pattern
- <- could make sense. It is already restricted in scope to list and binary comprehensions and would therefore not clash nor be confused. However, the existing semantics of the operator imply a literal pattern match working like a filter. We're looking for the filter-like approach, but want to introduce implicit elements ({ok|error, ...})
- <= same as <-, but for binary generators

It would make sense to check for new operators specifically for this context given the semantics:
========  ===========================================================
Operator  Description
========  ===========================================================
#=        no clash with other syntax (maps, records, integers), no
          clash with the abstract patterns EEP either
!=        no clash with message passing, but is sure to annoy anyone
          used to C-style inequality checks
<~        works with no known conflict; shouldn't clash with ROK's
          frame proposals (uses infix ~ and < > as delimiters)
<|        reverse pipe operator; no obvious clash either
========  ===========================================================
There is no strong argument for or against most of these. The choice of
<~
mostly comes down to having similarity to list comprehensions' <-
operator both in semantics and appearance, although being different
overall.
Within the expected usage of the unwrap expressions, the <~
operator
needs to have a precedence rule such that:
X = {Y,X} <~ <Exp>
Is considered a valid pattern match operation with X = {Y,X}
being the
whole left-hand-side pattern, such that operation priorities are:
lhs <~ rhs
Instead of
lhs = rhs <~ <...>
In all other regards, the precedence rules should be the same as =
in
order to provide the most unsurprising experience possible.
Other approaches were considered in making this proposal, and ultimately disregarded.
with
The Elixir approach is fairly comprehensive, and rather powerful. Rather than handling success or errors, it generalizes over pattern matching as a whole.
To explore bringing these semantics into the current proposed construct,
we will use the <-
operator from list comprehensions to mean "match
the whole pattern or exit the block". So instead of
begin
{X,Y} <~ id({ok, {X,Y}}),
...
end
We would have to write:
begin
{ok, {X,Y}} <- id({ok, {X,Y}}),
...
end
While this mechanism is fine for handling skipped patterns, it has some problematic weaknesses in the context of error handling.
One example of this can be taken from the OTP pull request that adds a new return value to packet reading based on inet options: #1950.
This PR adds a possible value for packet reception to the current form:
{ok, {PeerIP, PeerPort, Data}}
and makes it possible to alternatively get:
{ok, {PeerIP, PeerPort, AncData, Data}}
Based on socket options set earlier. So let’s put it in context for the current proposal:
begin
{X,Y} <~ id({ok, {X,Y}}),
{PeerIP, PeerPort, Data} <~ gen_udp:recv(...),
...
end
If AncData
is received, an exception is raised: the value was not an
error but didn’t have the shape or type expected for the successful
pattern to match. Errors are still returned properly by exiting the
begin ... end
block, and we ensure correctness in what we handle and
return.
However, had we used this generalized form:
begin
{ok, {X,Y}} <- id({ok, {X,Y}}),
{ok, {PeerIP, PeerPort, Data}} <- gen_udp:recv(...),
...
end
Since the <-
operator would force a return on any non-matching value,
the whole expression, if the socket is misconfigured to return
AncData
, would return {ok, {PeerIP, PeerPort, AncData, Data}}
on a
failure to match.
Basically, an unexpected but good result could be returned from a
function using the begin ... end
construct, which would look like a
success while it was actually a complete failure to match and handle the
information given. This is made even more ambiguous when data has the
right shape and type, but a set of bound variables ultimately define
whether the match succeeds or fails (in the case of a UDP socket,
returning values that comes from the wrong peer, for example).
In the worst cases, it could let raw unformatted data exit a conditional
pipeline with no way to detect it after the fact, particularly if later
functions in begin ... end
apply transformations to text, such as
anonymizing or sanitizing data. This could be pretty unsafe
and near impossible to debug well.
Think for example of:
-spec fetch() -> iodata().
fetch() ->
begin
{ok, B = <<_/binary>>} <- f(),
true <- validate(B),
{ok, sanitize(B)}
end.
If the value returned from f()
turns out to be a list (say it’s a
misconfigured socket using list
instead of binary
as an option), the
expression will return early, the fetch()
function will still return
{ok, iodata()}
but you couldn’t know as a caller whether it is the
transformed data or non-matching content. It would not be obvious to
most developers either that this could represent a major security risk
by allowing unexpected data to be seen as clean data.
This specific type of error is in fact possible in Elixir, but no such warning appears to have been circulating within its community so far.
It is basically a risky pattern if you want your code to be strict or future-proof in the context of error handling. The current proposal, by comparison, would raise an exception on unexpected good values, therefore preventing ways to sneak such data into your control flow:
-spec fetch() -> iodata().
fetch() ->
begin
B = <<_/binary>> <~ f(),
_ <~ validate(B), % returns ok if valid
{ok, sanitize(B)}
end.
Here misconfigured sockets won’t result in unchecked data passing through your app.
The only way to give a similar amount of safety to the general pattern
approach is through an else
clause which handles all known patterns to
implicitly exclude all unknown patterns:
-spec fetch() -> iodata().
fetch() ->
begin
{ok, B = <<_/binary>>} <- f(),
true <- validate(B),
{ok, sanitize(B)}
else
{error, _} = E -> E;
false -> false
end.
This is the solution Elixir uses as well. Unless the clause is mandatory
(it is not in Elixir), this level of additional matching is purely
optional; the developer has no obvious incentive to go and handle these
errors, and if they do, the exception raised will be through a missing
clause in the else
section, which will obscure its origin and line
number.
It would also allow some functions to return unexpected values from
other ones. In the previous example, f()
must be allowed to return
false
if validate(B)
may return it. There is no way to separate such
clauses.
None of these problems exist as long as we normalize the matching
mechanism on well-defined "good" and "bad" values (ok | {ok, Term}
and
{error, Term}
). This separation between good and bad values makes it possible to
know what needs to return early without conflicting with what is a
valid or invalid pattern.
From the moment we decide to pick such values, unwrapping them in
patterns can make code clearer: {error, X} <- exp()
would be a pattern
that can never match by definition, since only good values are allowed
to go through and all errors return early. Automatically unwrapping good
values prevents such nonsensical expressions.
These tricky corner cases explain why the <~
pattern is preferred to
the general <-
pattern's semantics in this proposal.
One pain point frequently brought up about Erlang concerns the pipelining of operations. Could it be possible to make some operations easier to chain?
If we take a set of functions f()
, g()
, and h()
that all return
{ok | error, _}
tuples, current day Erlang requires:
{ok, X} = f(),
{ok, Y} = g(X),
{ok, Z} = h(Y),
Z
Could there be an easier way to handle this type of chaining, based on
say, an unwrap
function:
unwrap({ok, X}) -> X.
main() ->
unwrap(h(unwrap(g(unwrap(f()))))).
It appears that, generally, this turns out to be simple enough to do with the fold approach mentioned earlier.
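As a sketch of that fold approach, a hypothetical pipe/2 helper can thread {ok | error, _} results through a list of functions:

    %% Returns {ok, Z} | {error, Reason}, stopping at the first error.
    pipe(Init, Funs) ->
        lists:foldl(fun(F, {ok, Acc}) -> F(Acc);
                       (_F, {error, _} = Err) -> Err
                    end, Init, Funs).

    main() ->
        pipe(f(), [fun g/1, fun h/1]).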
Overall, the various existing mechanisms appeared slightly inconvenient, but not inconvenient enough to be worth adding a whole new language mechanism just for it.
cond and cond let
Anthony Ramine recommended looking into reusing the already reserved
cond
and let
keywords. He mentioned Rust planning something based on
these and how it could be ported to Erlang based on his prior work on
supporting the cond
construct within the language.
The proposed mechanism would look like:
cond
X > 5 -> % regular guard
Exp;
f() < 18 -> % function used in guard, as originally planned
Exp;
let {ok, Y} = exp(), Y < 5 ->
Exp
end
The last clause would allow Y
to be used in its own branch only if it
matches and all guards succeed; if the binding fails, a switch is
automatically made to the next branch.
As such, more complex sequences of operations could be covered as:
cond
let {ok, _} = call1(),
let {ok, _} = call2(),
let Res = call3() ->
Res;
true ->
AlternativeBranch
end
This mechanism is, in my opinion, worth exploring and maybe adding to the language, but on its own does not adequately solve error handling flow issues, since errors cannot be extracted easily from failing operations.
Auto-wrapping return values is something Elixir's OK
library does,
as well as Haskell's do notation, but that neither Rust nor Swift does.
It seems that there is no very clear consensus on what could be done.
Thus, for the simplicity of the implementation and backwards
compatibility of the begin ... end
expression, just returning the
value as-is without auto-wrapping seems sensible.
It would therefore be up to the developer to just return whatever value best matches their function's type signature, making it easier to integrate return values with the system they already have.
It also lets sequences of operations potentially return ok on success,
even if their individual functions returned values such as true, for
example, rather than {ok, true}.
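A small sketch of this behaviour, with hypothetical check_a/1 and check_b/1 functions returning the standard tuples:

    validate(X) ->
        begin
            _ <~ check_a(X),  % accepts ok | {ok, _}; {error, _} returns early
            _ <~ check_b(X),
            ok                % returned as-is, never re-wrapped as {ok, ok}
        end.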
It is fairly straightforward to see why {ok, V}
and {error, T}
are used in
pattern matches as error values: they're the most standard way to
communicate a value and an error in non-overlapping patterns whichever
way you want to match.
On the other hand, it is less obvious why _ <~ Exp
should positively
match on ok
alone, and why, for example, error
as an atom would
raise an exception as not matching any patterns.
The reason ok
is considered valid can be found in comparing common
Erlang return values with their matches in other languages.
The following functions return ok
when everything went well but
nothing is worth reporting. The list is not exhaustive:
- lists:foreach/2
- the file module
- disk_log
- the io module
- the logger module
- the application module, for functions interacting with config and starting or loading applications

The pattern is fully entrenched as a core pattern in Erlang and OTP, and very attached to side-effectful operations.
The interesting aspect comes from seeing what Rust does for similar
functions, which is to just return their own unit type, denoted ().
When used with the Result type, it is returned as Ok(()).
The Erlang equivalent would probably be {ok, undefined}
, but ok
as a
single atom currently plays that role fine, and so it was decided to
support it; it will let error flow integrate well with side-effectful
functions.
The same cannot be said of error
as an atom result. Most errors can
and should return context with them that qualifies the error result,
since they often have more than one reason to fail. As evidence for this
line of thought, it is currently not possible to raise exceptions
without a Reason
, whether done through throw/1
, error/1
,
exit/1-2
, or raise/3
.
Aligning with the standard practices of the Erlang language validates
using _ <~ Exp as the one pattern that also accepts a bare ok, since it
essentially matches what would be a non-existing value, one that never
needs to be bound in further contexts.
Discussions on earlier drafts of this proposal asked whether it would
make sense to choose all good values to be those in a tuple starting
with ok
(ok | {ok, _} | {ok, _, _} | ...
), and all error values to be all those starting with error
({error, _} | {error, _, _} | ...).
This approach would allow more flexibility on possible error values, but would make composition more difficult. Let's take the following three function signatures as an example:
-spec f() -> ok | {error, term()}.
-spec g() -> {ok, term()} | {error, term(), term()}.
-spec h() -> {ok, term(), [warning()]} | {error, term()}.
If a single begin ... end
block calls to these as the potential return
value of a function, the caller now has to have the following type
specification:
-spec caller() -> ok | {ok, term()} | {ok, term(), [warning()]}
| {error, term()} | {error, term(), term()}.
As you call more and more functions and compose them together, the cross-section of what is a valid returning function grows in complexity and may even end up giving more trouble to tools such as Dialyzer.
By comparison, the currently suggested mechanism can never get more complex than:
-spec caller() -> ok | {ok, term()} | {error, term()}.
Or, if we prefer parametrized types:
-type result(E) :: ok | {error, E}.
-type result(R, E) :: {ok, R} | {error, E}.
-spec caller() -> result(term()) | result(term(), term()).
By restricting the possible patterns (and therefore return values), we can ensure better long-term composability and easier understanding of various such expressions.
The exception format proposed here is {badunwrap, Value}
. This format
is chosen following Erlang/OTP standards:
- badarg
- badarith
- badfun
- {badmatch, Val}
Since "unwrapping" is how the kind of operation where X
is extracted
from {ok, X}
, the name badunwrap
was chosen, along with the
mismatching value being borrowed from {badmatch, _}
.
The possibility of an early exit from a begin ... end
expression
means that variables declared within its scope are now potentially
unsafe to use outside of it.
This is a change of behaviour that brings begin
in line with the
variables bound within a case ... end
branch, a try/catch
clause, or
a receive ... end
branch.
This lack of safety only starts at the first UnwrapExpr encountered,
since all variables bound before it respect the same semantics
as the existing begin ... end
expression. If this analysis is done
rather than just declaring all variables as unsafe wholesale, then there
is no backwards compatibility concern to be had.
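A sketch of the intended analysis, with hypothetical precompute/0, f/0, g/2, and use/1 calls:

    Res = begin
              A = precompute(),  % bound before the first UnwrapExpr: stays safe
              B <~ f(),          % first UnwrapExpr: the block may exit early here
              g(A, B)
          end,
    use(A),    % fine: A was bound before the first UnwrapExpr
    use(Res),  % fine: Res is always bound
    use(B)     % should trigger a compiler warning: B is unsafe here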
The need for a new operator means code built with support for the new expressions won't be portable to older Erlang releases.
No reference implementation is usually required at this step, but one is nevertheless provided in the original repository for this EEP draft, at bitbucket.org/ferd/unwrap/. The implementation uses parse transforms rather than an operator, since it would be difficult to add custom operators at this point of the process.