Author: Alceste Scalas <alceste(at)crs4(dot)it>
Status: Draft
Type: Standards Track
Created: 3-Sep-2007
Erlang-Version: R12B
Post-History:

EEP 7: Foreign Function Interface (FFI)

Abstract

This EEP describes a Foreign Function Interface (FFI) for Erlang/OTP, that allows to easily perform direct calls of external C functions. It introduces three new BIFs (ffi:raw_call/3, erl_ddll:load_library/3 and ffi:raw_call/2) that accomplish the main FFI tasks: loading generic C libraries, making external function calls and performing automatic Erlang-to-C and C-to-Erlang type conversions.

It also introduces two auxiliary BIFs for converting C buffers/strings into binaries (ffi:raw_buffer_to_binary/2 and ffi:raw_cstring_to_binary/1), a new ffi Erlang module that provides a higher-level API with stricter type checking, and some utility macros. Finally, it extends erl_ddll:info/2 with FFI information.

Motivation

The current Erlang extension mechanisms can be divided in two main categories:

  1. absolute stability at the price of speed (C nodes, pipe drivers);

  2. more speed at the (potential) price of stability (linked-in drivers).

Linked-in drivers have thus become the standard way for creating library bindings when efficiency is an issue. In both cases, however, the Erlang driver interface implies the development of relevant amounts of glue code, mostly because the communication between Erlang and C always requires data parsing and (de)serialization. Several tools have been created in order to autogenerate (at least part of) that glue: from the (now unmaintained) IG driver generation tool to the newer Erlang Driver Toolkit (EDTK) and Dryverl.

But, even with the help of these tools, developing an Erlang driver is a difficult and time-consuming task (especially when interfacing external libraries with tens or hundreds of functions), and the glue code itself increases the possibility to introduce bugs --- that, in the case of linked-in drivers, usually mean VM crashes. For all these reasons, the lack of libraries and the difficulty of interfacing them from other languages is one of the negative aspects that are usually associated with Erlang.

The same problems also arise when a developer needs to replace performance-critical portions of his/her Erlang code with optimized C functions. In this case, also the data serialization/deserialization overhead may be a significant issue.

An easier method for interfacing Erlang and C code could drastically extend the Erlang capabilities and open new usage scenarios.

Rationale

This EEP proposes a Foreign Function Interface (FFI) extension that would allow to easily perform direct C function calls. This concept is implemented in almost every language, with two main (non-exclusive) approaches:

  1. automatic type conversions between the host and the foreign language (examples: Python, Haskell);

  2. documented C interface for handling host language types from the foreign language (examples: Java, Python (API)).

This EEP follows the first approach, but (when possible) also reuses part of the existing C Driver API (and, thus, allows to manage ErlDrvBinary and ErlIOVec pointers in the external C functions).

The FFI has been designed not to require language changes or introduce incompatibilities.

The BIFs and functions proposed in this EEP don't give any access to the Erlang VM internals --- but the called C functions could leak memory and/or cause the Erlang VM to crash. The FFI is, thus, not intended for "casual" Erlang developers: this is a tool designed for library bindings developers (that should take care of hiding FFI calls from final users), and advanced programmers looking for an easy (and efficient) way to call C code from Erlang.

Overview

In order to call a C function, the FFI needs a port opened towards the required C code. Thus, with the current driver loading mechanism, a developer would be required to:

  1. create a C file with a void ErlDrvEntry structure and driver init function;

  2. compile it and possibly link it against the required C libraries, thus obtaining a void Erlang driver;

  3. load the driver in the Erlang VM, by using erl_ddll:load/2.

In order to simplify this procedure, this EEP proposes the erl_ddll:load_library/3 function, that allows to load a generic library in the Erlang VM --- even if it lacks the structure of an Erlang linked-in driver.

erl_ddll:load_library/3 also offers an option to preload a list of C function symbols and signatures, thus precompiling the internal structures needed for performing dynamic function calls. Information about preloaded data can be retrieved with erl_ddll:info/2.

Once a library or driver has been loaded, erlang:open_port/2 or erlang:open_port/1 could be used to get a port for the FFI functions, and perform calls either through the low-level or the high-level APIs.

Low-level API

The low-level FFI methods are denoted by the raw_ prefix. The main function is the ffi:raw_call/3 BIF, that performs a direct C function call through an open port. It converts C types to/from Erlang types.

When taken alone, ffi:raw_call/3 has got a major drawback: it introduces great call overhead, due to the C symbol lookup and the dynamic construction of the function call.

In order to exploit preloading option of erl_ddll:load_library/3, the ffi:raw_call/2 BIF is introduced: it avoids symbol lookup and call structure compilation, thus guaranteeing a lower call overhead than ffi:raw_call/3.

Furthermore, the low-level interface provides two BIFs for creating an Erlang binary from a C pointer (possibly returned by a FFI call). These BIFs are ffi:raw_buffer_to_binary/2 and ffi:raw_cstring_to_binary/1.

High-level API

The high-level interface is built upon the low-level one. It introduces the concept of type-tagged values: any value passed to or returned from FFI calls has the form of a {Type, Value} tuple. This allows to:

  1. increase the readability of FFI calls;

  2. make the C calls safer: the consistency of tagged values is checked before the values themselves are passed to the low-level API. Furthermore, the preload information given to erl_ddll:load_library/3 is used (when available) to ensure that the tagged values actually match the function signature;

  3. simulate the static typing of C code, thus requiring proper and explicit "casts" when a tagged value needs to be converted to another type.

These checks are performed by ffi:call/3, ffi:buffer_to_binary/2 and ffi:cstring_to_binary/1 (the type-tagged equivalents of the low-level BIFs). Type-tagged values can also be checked with ffi:check/1. Furthermore, the allowed minimum and maximum value of each FFI type can be examined with ffi:min/1 and ffi:max/1.

Utility macros

The FFI defines a series of utility macros in the ffi_hardcodes.hrl header file, that could be used for binary matching of C buffers and structures.

Specifications

Types

c_func_name()

c_func_name() = atom() | string()

Name of a C function.

type_tag()

type_tag() = atom()

Valid FFI type atom. For the list of allowed values, see the Appendix.

tagged_value()

tagged_value() = tuple(type_tag(), term())

Type-tagged value used for FFI calls.

tagged_func_name()

tagged_func_name() = tuple(type_tag(), c_func_name())

C function name with return type.

func_index()

func_index() = integer()

Function position on the list of preloads given to erl_ddll:load_library/3.

tagged_func_index()

tagged_func_index() = tuple(type_tag(), func_index())

C function index with return type.

signature()

signature() = tuple(type_tag(), ...)

Signature of a C function: return type followed by arguments types (if any).

erl_ddll:load_library/3

erl_ddll:load_library(Path, Name,
                      OptionsList) -> ok | {error, ErrorDesc}

Types:

  • Path = Name = string() | atom()

  • OptionList = [Option]

  • Option = tuple(preload, [Preload])

  • Preload = tuple(c_func_name(), signature())

Load a generic shared library.

If an ErlDrvEntry structure and a driver init function are found when loading the library, this BIF will behave like erl_ddll:load/2. The function parameters are also the same of erl_ddll:load/2, with the following addition:

OptionList is a list of options for library/driver loading. The supported options are:

  • {preload, PreloadList} Preload the given list of functions, and prepare their call structures. Each PreloadList element is a tuple in the form:

    tuple(c_func_name(), `signature())
    

    i.e. the function name followed by its return and arguments types.

The function return values are the same of erl_ddll:load/2.

Once a library has been loaded, it is possible to use erlang:open_port/2 to get a port. That port could always be used with ffi:call/3, ffi:raw_call/3 or ffi:raw_call/2. However, if the loaded library does not contain a proper ErlDrvEntry structure and a driver init function, the port will not be usable with erlang:port_command/2, erlang:port_control/3 etc.

The following example loads the C standard library and preloads some functions: ::

ok = erl_ddll:load_library("/lib", libc,
                           [{preload,
                             [{puts, {sint, nonnull}},
                              {putchar, {sint, sint}},
                              {malloc, {nonnull, size_t}},
                              {free, {void, nonnull}}]}]).

erl_ddll:load_library/2

erl_ddll:load_library(Path, Name)

Utility function that calls erl_ddll:load_library/3 with an empty OptionsList.

erlang:open_port/1

erlang:open_port(Library)

Types:

  • Library = string() | atom()

Open a port towards the specified shared library, possibly loaded with erl_ddll:load_library/3. Calling this function is equivalent to:

erlang:open_port({spawn, Library}, [binary])

erl_ddll:info/2

This EEP proposes a new parameter for the erl_ddll:info/2 BIF: the 'preloads' atom. It allows to retrieve information about FFI preloads for the given library.

The preload information is a list of proplists, one for each preloaded function. Each proplist, in turn, has the following format:

[ { index,     integer()   },     % Position in the preload list
  { name,      string()    },     % Function name
  { address,   integer()   },     % Function address
  { signature, signature() } ]    % Function signature

This information would be made available also through erl_ddll:info/0 and erl_ddll:info/1.

ffi:raw_call/3

ffi:raw_call(Port, CallArgs, Signature) -> term()

Types:

  • Port = port()

  • CallArgs = tuple(cfuncname(), Arg1, ...)

  • Arg1, ... = term()

  • Signature = signature()

Call the specified C function.

This BIF accepts the following parameters:

  • Port

    A port opened towards the required driver/library.

  • CallArgs

    A tuple with the function name (atom or string) followed by its arguments (if any).

  • Signature

    Function signature.

This BIF returns the return value of the C function being called (or 'void' if the return type is void). It automatically converts Erlang terms to/from C values. The supported C types and conversions are reported in the Appendix.

The following example calls the malloc() and free() functions from the standard C library (it should work with any Erlang linked-in driver): ::

Pointer = ffi:raw_call(Port, {malloc, 1024}, {pointer, size_t}),
ok = ffi:raw_call(Port, {free, Pointer}, {void, pointer}).

WARNING: bugs and/or misuses of the external C functions can affect the Erlang VM, possibly making it crash. Use this BIF with extreme care.

ffi:raw_call/2

ffi:raw_call(Port, OptimizedCall) -> term()

Types:

  • Port = port()

  • OptimizedCall = {FuncIndex, Arg1, ...}

  • FuncIndex = func_index()

  • Arg1, ... = term()

Call a function preloaded with the 'preload' option of erl_ddll:load_library/3.

This BIF accepts the following parameters:

  • Port

    A port opened towards the required driver/library (that must have been loaded with erl_ddll:load_library/3).

  • OptimizedCall A tuple with the function index (i.e. its position in the preload list) followed by its arguments (if any).

This BIF returns the return value of the C function being called (or 'void' if the return type is void). It automatically converts Erlang terms to/from C values. The supported C types and conversions are reported in the Appendix.

The following example calls malloc() and free(), after they have been preloaded with the code sample shown in erl_ddll:load_library/3:

Port = open_port({spawn, "libc"}, [binary]),
Pointer = ffi:raw_call(Port, {3, 1024}),
ffi:raw_call(Port, {4, Pointer})

WARNING: bugs and/or misuses of the external C functions can affect the Erlang VM, possibly making it crash. Use this BIF with extreme care.

ffi:raw_buffer_to_binary/2

ffi:raw_buffer_to_binary(Pointer, Size) -> binary()

Types:

  • Pointer = integer()

  • Size = integer()

Return a binary with a copy of Size bytes read from the given C pointer (represented by an integer, possibly returned by a FFI call).

WARNING: passing the wrong pointer to this BIF may cause the Erlang VM to crash. Use with extreme care.

ffi:raw_cstring_to_binary/1

ffi:raw_cstring_to_binary(CString) -> binary()

Types:

  • CString = integer()

Return a binary with a copy of the given NULL-terminated C string (an integer representing a pointer, possibly returned by a FFI call). The binary will include the trailing 0.

WARNING: passing a wrong pointer to this BIF may cause the Erlang VM to crash. Use with extreme care.

ffi:call/3

call(Port, CFunc, Args) -> RetVal

Types:

  • Port = port()

  • CFunc = c_func_name() | func_index() | tagged_func_name()_ |taggedfuncindex()`

  • `Args = [tagged_value()]

  • RetVal = tagged_value()

Call the C function CFunc with the given list of arguments, using the port Port. If the function was preloaded with ffi:load_library/3, all the type tags will be matched against the preloaded signature before performing the call.

Return the return value of the C function, with the proper type tag.

Note: if CFunc is not of type tagged_func_name(), the C function will be called if and only if it was preloaded with erl_ddll:load_library/3 (it is required in order to determine its return type).

As an example, the following malloc() calls are all valid and equivalent when executed after the code sample shown in erl_ddll:load_library/3:

%% Use function name, but require preloads for return type
{nonnull, Ptr1} = ffi:call(Port, "malloc", [{size_t, 1024}]),
{nonnull, Ptr2} = ffi:call(Port, malloc, [{size_t, 1024}]),

%% Use function index from preloads list
{nonnull, Ptr3} = ffi:call(Port, 3, [{size_t, 1024}]),
{nonnull, Ptr4} = ffi:call(Port, {nonnull, 3}, [{size_t, 1024}]),

%% These calls do not require any preload information
{nonnull, Ptr5} = ffi:call(Port, {nonnull, "malloc"}, [{size_t, 1024}]),
{nonnull, Ptr6} = ffi:call(Port, {nonnull, malloc}, [{size_t, 1024}]),

WARNING: bugs and/or misuses of the external C functions can affect the Erlang VM, possibly making it crash. Use this BIF with extreme care.

ffi:buffer_to_binary/2

ffi:buffer_to_binary(TaggedNonNull, Size) -> binary()

Types:

  • TaggedNonNull = tuple(nonnull, integer())

  • Size: integer()

Return a binary with a copy of Size bytes read from the given C pointer.

WARNING: passing a wrong pointer to this function may cause the Erlang VM to crash. Use with extreme care.

ffi:cstring_to_binary/1

ffi:cstring_to_binary(TaggedCString) -> binary()

Types:

  • TaggedCString = tuple(cstring, integer())

Return a binary with a copy of the given NULL-terminated C string.

WARNING: passing a wrong pointer to this function may cause the Erlang VM to crash. Use with extreme care.

ffi:sizeof/1

ffi:sizeof(TypeTag) -> integer()

Types:

  • TypeTag: type_tag()

Return the size (in bytes) of the given FFI type, on the current platform.

ffi:check/1

ffi:check(TaggedValue) -> true | false

Types:

  • TaggedValue = tagged_value()

Returns 'true' if the given type-tagged value is well-formed and consistent (i.e. it falls in the allowed range for its type, on the current platform). Otherwise, returns 'false'.

ffi:min/1

ffi:min(TypeTag) -> integer()

Types:

  • TypeTag = type_tag()

Return the minimum value allowed for the given FFI type, on the current platform.

ffi:max/1

ffi:max(TypeTag) -> integer()

Types:

  • TypeTag = type_tag()

Return the maximum value allowed for the given FFI type, on the current platform.

ffi_hardcodes.hrl

The ffi_hardcodes.hrl file is part of the Erlang ffi library. It defines a set of macros for handling FFI types sizes, and for easy binary matching on C buffers and structures:

  • FFI_HARDCODED_<TYPE>

    An Erlang bit-syntax snippet (Size/TypeSpecifier) that could be used to match the given FFI type inside a binary (possibly obtained from a C buffer). For example, the following binary matching:

    <<ULong:?FFI_HARDCODED_ULONG, _Rest/binary>> = Binary
    

    on x86-64 will expand to:

    <<ULong:64/native-unsigned-integer, _Rest/binary>> = Binary
    
  • FFI_HARDCODED_SIZEOF_<TYPE>

    The type size in bytes

  • FFI_HARDCODED_<TYPE>_BITS

    The type size in bits

As implied by their name, the ffi_hardcodes.hrl contents are specific to the build platform, and when they are used, they will be hard-coded in the resulting .beam files. Thus, these macros should be avoided if a developer expects his/her FFI-based code to be portable without recompilation. The recommended method for getting FFI type sizes in a portable way is the ffi:sizeof/1 function.

Further notes

Notes on FFI preloading

When a library is loaded with erl_ddll:load_library/3, it may be reloaded or unloaded just like any Erlang linked-in driver. If the 'preload' option is used, then two additional behaviors arise:

  • if erl_ddll:load_library/3 is called two or more times with the same library, then the associated preload list must be rebuilt according to the last call. If no 'preload' option is used, then the last preloads (if any) must be kept intact;

  • if an erl_ddll:reload/2 is issued, then the last preloads must be refreshed by performing a new symbol lookup in the loaded library. If one or more symbols could not be found anymore, then they must be disabled (and an error must raised when trying to use them with ffi:raw_call/2).

Notes on vararg functions

ffi:call/3 and ffi:raw_call/3 may be used to call vararg C functions, simply by providing the desired number of arguments.

In order to exploit the preloading optimizations, however, it is necessary to use a different preload for each different function call signature. For example, if a developer is going to call printf() with different arguments, he/she will need to use a preloading list like the following one:

ok = erl_ddll:load_library("/lib", libc,
                           [{preload,
                             [{printf, {sint, cstring}},
                              {printf, {sint, cstring, double}},
                              {printf, {sint, cstring, uint, sint}},
                              {printf, {sint, cstring, cstring}}]}]).

Notes on C pointers and Erlang binaries

As reported in the Appendix, an Erlang binary can be passed to a C function as a 'pointer' value. In this case, the C function will receive a pointer to the first byte of binary data.

That pointer will be valid only until the C function returns. If the C side needs to access the pointer data later, then it should use the 'binary' FFI type (see next paragraph) or copy the data itself in a safe place.

Notes on Erlang binaries and reference counting

As reported in the Appendix, when the 'binary' FFI type is used as argument, the C function will also receive a binary (in the form of an ErlDrvBinary pointer). Correspondingly, a C function with 'binary' FFI return type must return an ErlDrvBinary pointer. Furthermore, an 'erliovec' argument type will cause the conversion of an Erlang iolist() into an ErlIOVec (and its pointer will be passed to the C function).

There are three rules for properly handling the refcount of binaries passed to, or returned from, the C side through a FFI call.

  1. when a binary is received as argument (either directly, or inside an ErlIOVec), and the C side needs to keep a reference, then the refcount must be increased;

  2. when a binary is created with driver_alloc_binary(), it will have the refcount value of 1. It is considered to be still referenced by the C side;

  3. as a consequence of the previous point, if the C side wants to return a newly-crated binary without keeping references, it must call driver_binary_dec_refc() before returning.

Notes on type-tagged values

As reported above, the high-level FFI API is based on type-tagged values. Type tags, however, may introduce yet another way to annotate/represent the types of Erlang function parameters --- and it may become an annoying redundancy, expecially now that type contracts are (probably) going to be introduced in Erlang.

Thus, the high-level FFI API should be considered highly experimental and subject to change, depending on how type contracts will allow to achieve the same tasks (see [High-level API][]). This issue will need to be explored if/when contracts will be available in the standard Erlang/OTP distribution.

Backwards Compatibility

This EEP, and the proposed FFI patches (see below), do not introduce incompatibilities with the standard OTP release. However, three (possibly) relevant internal changes are required:

  1. the driver_binary_dec_refc() function must be allowed to reach the refcount of 0 without errors or warnings (even when debugging). This is necessary in order to allow a C function to create a binary, drop its references and return it to the Erlang VM (see 'Notes on Erlang binaries and reference counting');

  2. as a consequence of the previous point, driver_binary_inc_refc() must be allowed to reach a minimum refcount of 1 without errors or warnings (the current minimum value is 2);

  3. the iolist() -> ErlIOVec conversion code in io.c needs to be exposed as a stand-alone function, to be used by the FFI.

Reference implementation

An implementation of this EEP is available on muvara.org as a set of patches against OTP R11B-5.

The code is based on the GCC FFI library (libffi). libffi is multi-platform, can be packaged and used separately from the GCC source code, and is released under a very permissive license (compatible with the Erlang Public License). It has been used to implement the FFI interface of several applications and languages, including Python.

The current EEP implementation looks for libffi on the build system, and links the Erlang emulator against it (preferring the libffi shared library, when available). It may be a "good enough" approach, since libffi is usually pre-packaged and easily available on GNU/Linux, BSD and Solaris distributions. However, this approach may create troubles for developers that compile everything from scratch, could not install a precompiled libffi package, or just want to force static linking between the Erlang emulator and libffi. In order to address these issues, it is customary that a copy of libffi is distributed together with the host language, and possibly kept in sync with the upstream version. This is what Python actually does, and Erlang/OTP could possibly adopt the same approach depending on the developers' feedback.

Appendix

Erlang-to-C automatic type conversions

The following table reports the Erlang-to-C conversions, used for passing Erlang terms as C function call arguments.

====================== ===============================
 C argument type        Supported Erlang types
====================== ===============================
uchar                  integer()
schar                  integer()
ushort                 integer()
sshort                 integer()
uint                   integer()
sint                   integer()
ulong                  integer()
slong                  integer()
uint8                  integer()
sint8                  integer()
uint16                 integer()
sint16                 integer()
uint32                 integer()
sint32                 integer()
uint64                 integer()
sint64                 integer()
float                  float()
double                 float()
longdouble             float()
pointer                binary() | integer()
cstring                binary() | integer()
nonnull                binary() | integer()
size_t                 integer()
ssize_t                integer()
pid_t                  integer()
off_t                  integer()
binary                 binary()
erliovec               iolist()
====================== ===============================

C-to-Erlang automatic type conversions

The following table reports the C-to-Erlang conversions, used for converting C function return values into Erlang terms.

====================== ===============================
 C return type          Resulting Erlang type
====================== ===============================
uchar                  integer()
schar                  integer()
ushort                 integer()
sshort                 integer()
uint                   integer()
sint                   integer()
ulong                  integer()
slong                  integer()
uint8                  integer()
sint8                  integer()
uint16                 integer()
sint16                 integer()
uint32                 integer()
sint32                 integer()
uint64                 integer()
sint64                 integer()
float                  float()
double                 float()
longdouble             float()
pointer                integer()
cstring                integer()
nonnull                integer()
size_t                 integer()
ssize_t                integer()
off_t                  integer()
pid_t                  integer()
binary                 binary()
====================== ===============================

Copyright

Copyright (C) 2007 by CRS4 (Center for Advanced Studies, Research and Development in Sardinia) - http://www.crs4.it/

Author: Alceste Scalas

This EEP is released under the terms of the Creative Commons Attribution 3.0 License. See http://creativecommons.org/licenses/by/3.0/