Author: Richard A. O'Keefe <ok(at)cs(dot)otago(dot)ac(dot)nz>
Status: Draft
Type: Standards Track
Erlang-Version: R16A
Created: 07-Feb-2013
Post-History:
A new function erlang:set_process_info_limit/2
is added,
allowing a process to set limits on its own memory use.
erlang:set_process_info_limit(Item, Limit :: integer()) ->
Old_Limit :: integer()
An Old_Limit
of 0 means that no limit has been set.
A Limit of 0 means that no limit is to be set.
The Items that can be set are
memory
, the number of bytes that may be used for stack, heap, and
internal structures.
memory_words
, the same value as memory
, but expressed in
words to be consistent with heap_size
, total_heap_size
, and
stack_size
in erlang:process_info/[1,2]
. Those functions should
also be revised to accept this Item.
message_queue_len
, the number of unprocessed messages.
(Aside.) The documentation refers to message_queue_len
but the system generates and recognises message_queue_len
.
(End aside.)
The values that are actually set may be less than Limit as long as any physically realisable system in the node as configured cannot exceed the value used.
A non-zero memory limit is checked after each garbage
collection or other memory restructuring; setting the memory
limit to a non-zero value less than the current
process_info(self(), 'memory')
forces an immediate garbage
collection.
If the memory required by a process exceeds its limit, the
process is exited with reason memory
.
A non-zero message queue length limit is checked whenever the message queue length is about to be incremented, or when such a limit is set.
If the message queue length limit is exceeded, the process
is exited with reason message_queue_len
.
It is currently possible for an Erlang process to grab memory without limit and eventually take down the whole node. This problem is frequently reported in the Erlang mailing list.
It is also possible for the message queue of an Erlang process to grow without limit. This problem too is reported often enough to be recognised.
This is an EEP rather than a library change request because it requires low level runtime system changes to support it. For example, the limits have to be stored somewhere, suggesting a change in the data structure holding information about a process. The limits have to be checked, meaning that changes to the garbage collector and the message sending core are required. The limits have to be enforced, meaning that two new exit reasons have to be supported. And the new exit reasons and the new function have to be documented, requiring changes to more than one document.
The need is long-standing, and at least the presence of this EEP may provoke discussion leading to some resolution.
The function that sets a limit should have a name based
in the name of the function that reports current use.
That function is process_info
, hence the name I've
chosen here is set_process_info_limit
.
The erlang:process_info/[1,2]
functions can report on
any Erlang process. But it is clearly dangerous to let
a process set another process's limits. All we actually
need is for a process to be able to set its own limits
in a startup phase. That could be done by adding
{'memory',Size}
and {'message_queue_len',Count}
options
to spawn_opt/[2,3]
. However,
spawn_opt(Fun, [{memory,128*1024}])
can be mimicked by
spawn(fun () ->
erlang:set_process_info_limit(memory, 128*1024),
Fun()
end)
and set_process_info_limit/2
allows a process to set
different limits at different times. If you are trying
to protect the system from malice, setting limits in
spawn_opt/[2,3]
is the way to go. If you are trying to
protect the system from accident, letting a process
set its own limits is the way to go.
The names for the Items to be set clearly must be identical to the names used for the same thing elsewhere.
The memory_words
item is introduced because in a world where
some of my programs run 32-bit and some run 64-bit it is just
too confusing to count bytes, especially as stack size and so
on are measured in words, not bytes.
I considered allowing total_heap_size
and stack_size
and so on
to be given their own limits, but on finding that total_heap_size
"currently includes the stack of the process", decided that it
was "currently" a bad idea.
I would have liked to allow setting a limit on the number
of reductions, so that a process that doesn't intend to
live forever can ensure its own eventual death. However,
the documentation warns that erlang:bump_reductions/1
"might be removed", so presumably reduction counting
per se might well disappear.
The point of letting the value actually set for a limit
to be smaller is to allow an implementation to use only
values that will fit in a size_t
, so that low level
code does not need to deal with bignums. On a
POSIX-compliant operating system, an Erlang implementation
may use the RLIMIT_DATA
value for the UNIX process
if the memory limit is bigger, and may use RLIMIT_DATA
divided by the minimum size of a message for the message
queue length. Or it might use other appropriate system
limits.
The biggest question is "what happens if a limit is exceeded?"
For memory, we could exit with system_limit
as reason, but
that wouldn't make clear what limit had been exceeded. It
seems advisable to introduce a new reason, and making the
reason the same as the name of the limit seems the least
confusing approach.
For message queue length, I would prefer it if the process that sends the message were the one to get a run time error. However, the Erlang documentation guarantees that sending a message to a pid always succeeds, whether the process is live or dead, and I don't want to change too much. A message queue might build up because some process(es) is(are) sending junk; we would prefer exiting junk senders to exiting junk receivers. However, if the junk receiver isn't cleaning out the junk, that junk is never going away, so it's always going to be costing time to skip over as well as memory to store. That argument applies to another option that I considered: just discarding messages that would make the queue too long. The Erlang Way is to let subsystems that get into trouble crash. So let it be.
The new function is not exported by default so it cannot be called accidentally.
None in this draft.
This document has been placed in the public domain.