Author: Richard A. O'Keefe <ok(at)cs(dot)otago(dot)ac(dot)nz>
Status: Draft
Type: Standards Track
Erlang-Version: R12B-4
Created: 27-Aug-2008
Post-History:
A module may request that bit fields be range checked.
A new directive is added.
-bit_range_check(Wanted).
where Wanted is 'false' or 'true'.
Recall that a segment of a bit string (or binary) has the form
Value [':' Size] ['/' Type_Specifier_List]
where Type_Specifier_List
includes such things as 'integer',
'signed', and 'unsigned'. Currently the documentation states
that
"Signedness ... Only matters for matching and when the type
is integer. The default is unsigned."
Combining the Size
with the Unit
gives a Size_In_Bits
.
The on-line Erlang manual does not state in section 6.16 that
in constructing a bit string the bottom Size_In_Bits
bits of
an integer are used with the rest quietly ignored, but it is so.
The directive -bit_range_check(false)
makes explicit the
programmer's intention that this C-like truncation should happen.
The directive -bit_range_check(true)
says that it is a checked
run-time error in
Value:Size/unsigned-integer-unit:1
or constructions otherwise equivalent to it if Value does not
lie in the range 0 <= Value < 2**Size
, and it is a checked
run-time error in
Value:Size/signed-integer-unit:1
or constructions otherwise equivalent to it if Value does not
lie in the range -(2**(Size-1)) <= Value < 2**(Size-1)
.
The error that is raised is like the error that would be raised
for (1//0):Size/Type_Specifier_List
except for using 'badrange'
instead of 'badarith'.
The behaviour of integer bit syntax segments in the absence of
a -bit_range_check
directive is implementation defined and
subject to change.
The BEAM system is extended with a new instruction or instructions
similar to the existing instruction or instructions for integer
segments but checking the range. The compiler is extended to
generate them for <<...>>
expressions in the range of a
-bit_range_check(true)
directive.
A -bit_range_check
directive may not appear after a bit syntax
pattern or expression or after another -bit_range_check
directive.
It keeps on coming as an unpleasant surprise to Erlang programmers that this truncation happens. Quiet destruction of information is otherwise alien to Erlang: integer arithmetic is unbounded, not wrapped as in some (but not all) C systems; element/2 doesn't take indices modulo tuple size but raises an exception if the index is out of range, and so on.
In any case where the truncation is wanted, an Erlang programmer can already write
(Value rem 256):unsigned-integer
and the Erlang compiler could notice this and optimise the 'rem' operation away, so the truncation is not only unusual in Erlang, it is also unexpected in this particular case.
It is not only unexpected, it removes a chance to find mistakes, so it would seem to be undesirable.
Edwin Fine asked "How difficult could it be to add optional run- time checking to detect this condition without a serious risk of adverse effects on the correctness of Erlang run-time execution?"
Björn Gustavsson replied "it would be better to add optional support in the compiler to turn on checks (either for an entire module, or for individual segments of a binary). If someone writes an EEP, we will consider implementing it."
This is that EEP.
The Erlang/OTP team regard the old behaviour as a feature, and wish to retain it. In particular, they wish modules that were written expecting the old behaviour to continue to work (for now) without modification.
One alternative would be to add new syntax, such as having a new 'checked' specifier, so that
Value/checked-unsigned-integer
would require a value in the range 0..255. But many Erlang programmers will want to use this as the normal case, and will not like the safe version being so much more effort to write than the unsafe version.
It appears that "truncation wanted/not wanted" is not a matter of this expression or that, but of this programmer or that, and we can expect that each module will be written by someone expecting only one behaviour or expecting only the other.
Adding a
-bit_range_check(true).
directive to a module is more work than doing nothing at all, but programmers who want this behaviour should be able to set up their editing environment to have this line in their template for creating new Erlang modules.
There are several questions:
Bit strings: Assume X = <<5:3/unsigned-integer-unit:1>>
.
Currently, <<X:2/bits>>
quietly truncates X
. This drops bits
from the right of X
, giving <<2:2>>
. If this worked the same
as integers, you would expect <<1:2>>
. This is certainly
very odd. Since we get truncation on the left and padding on
the left for integers, we naturally expect padding on the
right for bit strings to go with truncation on the right.
But <<X:4/bits>>
isn't <<10:4>>
, it's a runtime exception.
All very odd indeed. It would certainly be desirable to have
an easy way for the programmer to indicate whether they wanted
truncation on the left or the right and padding on the left or
the right. Perhaps a new built in function
set_bit_size(Bit_String, Desired_Width,
Truncation, Padding, Fill)
Bit_String : a bit string
Desired_Width : a non-negative integer, the width wanted
Truncation: 'left' | 'right' | 'error';
if bit_size(Bit_String) > Desired_Width
truncate on the left/truncate on the right/
report an error
Padding: 'left' | 'right' | 'error';
if bit_size(Bit_String) < Desired_Width
pad on the left/pad on the right/report an error
Fill: 0 | 1 | 'copy';
pad with 0/pad with 1/pad with a copy of the
last bit at the end where padding is done.
However, that idea is only partly baked, and is not part of the current proposal. As things currently stand, using the bit syntax and relying on implicit truncation is the simplest way to extract the leading bits of a bit string.
As long as the name of the directive is intention-revealing,
it doesn't matter very much what it is.
I proposed bit_range_check
because it is all about checking,
ranges in bit syntax, but since in this draft it does NOT apply
to bit string segments, perhaps bit_integer_range_check
would
be better.
The arguments false and true seem clear enough. Alternatives would be something like
-bit_integer_range(check).
-bit_integer_range(no_check).
That would be fine too.
Classical Pascal compilers let you do things like
{$I-} (* disable index checks *)
(* code with no index checks *)
{$I+} (* re-enable index checks *)
Allowing multiple -bit_range_check
directives in a module could
let you use code written for the old approach inside a module
that otherwise uses the new approach. I don't believe that we
want to encourage that sort of thing: it is MUCH easier when
reading a module if all of it follows the same rule.
It is also easier for an Erlang compiler that expects to be able
to process function definitions in any order. The compiler can
check for one of these directives anywhere in a module before it
handles any bit syntax forms anywhere. However, it is easier for
people reading a module if, when they first see a <<...>>
construction, they have already seen any directive that might
affect what it means.
The restrictions on the number and placement of these directives can always be relaxed later if necessary.
All existing Erlang code remains acceptable with unchanged semantics.
None, because I still can't find my way around the compiler.
This document has been placed in the public domain.