Author: Sergey Prokhorov <seriy(dot)pr(at)gmail(dot)com>
Status: Accepted/23.0 Proposal is to be implemented in OTP release 23.0
Type: Standards Track
Erlang-Version: 23.0
Created: 07-Oct-2019
Post-History:

EEP 51: Underscore digit separator in numeric literals

Abstract

This EEP extends numeric literal syntax, allowing underscore _ as digits separator. It extends syntax of integer, float and based integer literals.

Copyright

This document has been placed in the public domain.

Specification

Current syntax for numeric literals can be described by following scheme:

DIGIT = [0-9]
BASE_DIGIT = [0-9a-zA-Z]
SIGN = [+-]

INTEGER = SIGN? DIGIT+

FLOAT = SIGN? DIGIT+ '.' DIGIT+ ('e' SIGN? DIGIT+)?

BASED_INTEGER = SIGN? DIGIT+ '#' BASE_DIGIT+

EEP extends this syntax by allowing single underscore _ character as a separator between two DIGIT or BASE_DIGIT characters. I.e., characters from both left and right sides of _ should be digits. Leading, trailing or repeated underscores should not be allowed as well as underscores next to - . # e. Separator is only used to visually separate groups of characters. It doesn't change semantics of literal and is removed by lexer.

Following are examples of valid numeric literals:

123_456
1_2_3_4_5
123_456.789_123
1.0e1_23
16#DEAD_BEEF
2#1100_1010_0011

And following are examples of literals that are not allowed:

_123  % will be interpreted as variable name
123_
123__456  % only single underscore allowed
123_.456  % underscore can only separate digits, not other symbols
123._456
16#_1234
16#1234_

The underscores are to be lost when representing abstract forms or terms as strings (erl_pp, io_lib).

Motivation

Digit separator becomes especially handy for relatively long literals like thousands separator for decimals Million = 1_000_000, SecondsInDay = 86_400 byte separator for hexadecimal 16#DEAD_BEEF_CAFE. Group separator makes such literals more readable (when used wisely) and makes it easier to visualy spot typos like MicroSecond = Second * 100000. The digit separator feature was adopted by multiple programming languages starting from at least Ada83.

Rationale

Underscore character was choosen as separator because it is used by multiple other languages such as "Ada 83", "C#", "Java 7", "Perl 2.0", "Python 3.6", "Ruby", "Rust", "Swift" and "Go". Also it doesn't conflict with current Erlang syntax (unlike, for example, ,). The only ambiguous decision was about where exactly digit separators should be allowed. It was decided that underscore should be allowed only between digits as meant by being a "digit separator".

Backwards Compatibility

All existing Erlang code remains acceptable. The implementation is done entirely in the Erlang lexer, so, no changes are needed in parser and even tools that examine ASTs will be unaffected. External tools like code editors and syntax highlighters might need to be updated to be able to recognize new syntax for numeric literals.

Reference Implementation

The reference implementation is provided in a form of pull-request on GitHub

https://github.com/erlang/otp/pull/2324