Author: Walter Weinmann <walter.weinmann@gmail.com>
Status: Draft
Type: Standards Track
Created: 06-Dec-2016
Erlang-Version: OTP 20.0
Post-History: 6-Dec-2016
This EEP proposes the creation of a new module named b_trees for the administration of b-trees. Both the optional persistence and the sort order should be implemented by pluggable functionality.
This document has been placed in the public domain.
{MinimumSubtrees,
MaximumKeys,
SizeKeyValues,
SortFunction/2,
State,
Tree}
Tree
is composed of nodes of the form
{KeyNumber,
SubtreeNumber,
[{Key, Value}],
[Tree]}
and the "empty b-tree" node
nil
State
is a tuple composed of the following parameters:
{StateTarget,
DeleteFunction/3,
InsertFunction/3,
LookupFunction/3}
Since the b-trees are always balanced, there is no need for a balance operation.
b_tree() = {pos_integer(),
pos_integer(),
non_neg_integer(),
sort_function(),
state(),
tree()}
A general balanced tree.
iterator() = [{key_values(), subtrees()}]
A general balanced tree iterator.
Types:
Tree1 = Tree2 = Tree3 = b_tree() | gb_trees:tree()
Copies tree Tree1 to an empty tree Tree2. Both trees may be either of type b-tree or binary tree (gb_trees). Returns the new tree Tree3 of the same type as tree Tree2.
Types:
Key = any()
B-Tree1 = B-Tree2 = b_tree()
Removes the node with key Key from b-tree B-Tree1 and returns the new b-tree B-Tree2. Assumes that key Key is present in b-tree B-Tree1, crashes otherwise.
Types:
Key = any()
B-Tree1 = B-Tree2 = b_tree()
Removes the node with key Key from b-tree B-Tree1 if key Key is present in b-tree B-Tree1, otherwise does nothing. Returns the new b-tree B-Tree2.
Types:
Order = pos_integer()
B-Tree = b_tree()
Returns a new empty b-tree. The order is defined as the maximum number of children nodes a non-leaf node may hold.
Types:
Key = any()
Value = any()
B-Tree1 = B-Tree2 = b_tree()
Inserts key Key with value Value into b-tree B-Tree1 if key Key is not present in b-tree B-Tree1, otherwise updates the current value of key Key to value Value in b-tree B-Tree1. Returns a new b-tree B-Tree2.
Types:
B-Tree1 = B-Tree2 = b_tree()
List = [{Key, Value}]
Turns an ordered list List of key value tuples into a b-tree. The given b-tree B-Tree1 must be empty. The list must not contain duplicate keys.
Types:
Key = any()
B-Tree = b_tree()
Value = any()
Retrieves the value stored with key Key in b-tree B-Tree. Assumes that key Key is present in b-tree B-Tree, crashes otherwise.
Types:
B-Tree = b_tree()
Returns the height of b-tree B-Tree as an integer. Assumes that b-tree B-Tree is non-empty.
Types:
Key = any()
Value = any()
B-Tree1 = B-Tree2 = b_tree()
Inserts key Key with value Value into b-tree B-Tree1 and returns the new b-tree B-Tree2. Assumes that key Key is not present in b-tree B-Tree1, crashes otherwise.
Types:
Key = any()
B-Tree = b_tree()
Returns true
if key Key is present in b-tree B-Tree, otherwise false
.
Types:
B-Tree = b_tree()
Returns true
if b-tree B-Tree is an empty b-tree, otherwise false
.
Types:
B-Tree = b_tree()
Iterator = iterator()
Returns iterator Iterator that can be used for traversing the entries
of b-tree B-Tree; see next/1
. The implementation of this iterator is
very efficient; traversing the whole b-tree using next/1
is only slightly
slower than getting the list of all key-value pairs using to_list/1
and traversing that. The main advantage of the iterator approach is that
it does not require the complete list of all key-value pairs to be built
in memory at one time.
Types:
Key = any(9
B-Tree = b_tree()
Iterator = iterator()
Returns iterator Iterator that can be used for traversing the entries
of b-tree B-Tree; see next/1
. The difference, as compared to the
iterator returned by iterator/1, is that the first key greater than
or equal to key Key is returned.
Types:
B-Tree = b_tree()
Key = any()
Returns the keys in b-tree B-Tree as an ordered list.
Types:
B-Tree = b_tree()
Key = any()
Value = any()
Returns a tuple {Key, Value}, where Key is the largest key in b-tree B-Tree, and Value is the value associated with this key. Assumes that b-tree B-Tree is not empty.
Types:
Key = any()
B-Tree = b_tree()
Value = any()
Looks up key Key in b-tree B-Tree. Returns {value, Value}, or none if key Key is not present.
Types:
Function = fun((Key, Value1) -> Value2)
B-Tree1 = B-Tree2 = b_tree()
Key = any()
Value1 = Value2 = any()
Maps function Function(Key, Value1) -> Value2 to all key value pairs of b-tree B-Tree1. Returns the new b-tree B-Tree2 with the same set of keys as b-tree B-Tree1 and the new set of values.
Types:
Iterator1 = Iterator2 = iterator()
Key = any()
Value = any()
Returns the tuple {Key, Value, Iterator2}, where Key is the smallest key referred to by iterator Iterator1, and iterator Iterator2 is the new iterator to be used for traversing the remaining nodes, or the atom 'none' if no nodes remain.
Types:
B-Tree1 = B-Tree2 = b_tree()
Name : Value = sort : Function = fun((Key1, Key2) -> equal |
greater |
less)
| state : {StateTarget,
Function = fun(StateTarget, delete, Key)
-> true,
Function = fun(StateTarget, insert, Subtrees)
-> Key,
Function = fun(StateTarget, lookup, Key)
-> Subtrees}
Sets the parameter Name to value Value in the empty b-tree B-Tree1 and returns the new b-tree B-Tree2. This function can only be used in conjunction with an empty b-tree.
Types:
B-Tree = b_tree()
Returns the number of key value pairs in b-tree B-Tree as an integer.
Returns 0 (zero) if b-tree B-Tree is empty.
Types:
B-Tree = b_tree()
Returns the number of total nodes and the number of leaf nodes in b-tree B-Tree as a tuple of two integers. Returns {0, 0} (zero) if b-tree B-Tree is empty.
Types:
B-Tree = b_tree()
Key = any()
Value = any()
Returns tuple {Key, Value}, where Key is the smallest key in b-tree B-Tree, and Value is the value associated with this key. Assumes that b-tree B-Tree is not empty.
Types:
Key1 = Key2 = any()
equal = greater = less = atom()
Returns the atom 'greater' if Key1 > Key2, the atom 'less' if Key1 < Key2 and otherwise the atom 'equal'.
Types:
Key1 = Key2 = any()
equal = greater = less = atom()
Returns the atom 'less' if Key1 > Key2, the atom 'greater' if Key1 < Key2 and otherwise the atom 'equal'.
Types:
Key = any()
B-Tree1 = B-Tree2 = b_tree()
Removes the node with key Key from b-tree B-Tree1 and returns the new b-tree B-Tree2. Assumes that key Key is present in b-tree B-Tree1, crashes otherwise.
Types:
Key = any()
B-Tree1 = B-Tree2 = b_tree()
Removes the node with key Key from b-tree B-Tree1 if key Key is present in b-tree B-Tree1, otherwise does nothing. Returns the new b-tree B-Tree2.
Types:
B-Tree1 = B-Tree2 = b_tree()
Key = any()
Value = any()
Returns tuple {Key, Value, B-Tree2}, where Key is the largest key in
b-tree B-Tree1, Value is the value associated with this key, and b-tree
B-Tree2 is this b-tree with the corresponding key value pair deleted.
Assumes that b-tree B-Tree1 is not empty.
Types:
B-Tree1 = B-Tree2 = b_tree()
Key = any()
Value = any()
Returns tuple {Key, Value, B-Tree2}, where Key is the smallest key in
b-tree B-Tree1, Value is the value associated with this key, and b-tree
B-Tree2 is this b-tree with the corresponding key value pair deleted.
Assumes that b-tree B-Tree1 is not empty.
Types:
B-Tree = b_tree()
Key = any()
Value = any()
Converts b-tree B-Tree into an ordered list of key value tuples.
Types:
Key = any()
Value = any()
B-Tree1 = B-Tree2 = b_tree()
Updates key Key to value Value in b-tree B-Tree1 and returns the new b-tree B-Tree2. Assumes that key Key is present in b-tree B-Tree1.
Types:
B-Tree = b_tree()
Value = any()
Returns the values in b-tree B-Tree as an ordered list, sorted by their corresponding keys. Duplicates are not removed.
{StateTarget, DeleteFunction, InsertFunction, LookupFunction}
StateTarget = any()
DeleteFunction(StateTarget, delete, Key) -> true
InsertFunction(StateTarget, insert, Subtrees) -> Key
LookupFunction(StateTarget, lookup, Key) -> Subtrees
Examples for state targets are a Dets table or a Mnesia table. The delete
function takes a state target, the atom delete
and a key as arguments
and returns the atom true
if successful. The insert function takes a
state target, the atom insert
and a subtrees data structure as arguments
and returns a key if successful. The lookup function takes a state target,
the atom lookup
and a key as arguments and returns a subtrees data
structure if successful.
The following examples are based on Mnesia.
persistence_by_mnesia(_, delete, SubtreesKey)
when is_list(SubtreesKey) ->
true;
persistence_by_mnesia(StateTarget, delete, SubtreesKey) ->
F = fun() ->
ok = mnesia:delete({StateTarget, SubtreesKey}),
true
end,
mnesia:activity(transaction, F);
persistence_by_mnesia(_, insert, []) ->
[];
persistence_by_mnesia(StateTarget, insert,
[{_, _, [{Key, _} | _], _} | _] = Subtrees) ->
SubtreesKey = list_to_binary(Key),
F = fun() ->
ok = mnesia:write(StateTarget,
#subtrees{subtreesKey = SubtreesKey,
subtrees = Subtrees}, write),
SubtreesKey
end,
mnesia:activity(transaction, F);
persistence_by_mnesia(_, lookup, SubtreesKey)
when is_list(SubtreesKey) ->
SubtreesKey;
persistence_by_mnesia(StateTarget, lookup, SubtreesKey) ->
F = fun() ->
[{subtrees, SubtreesKey, Subtrees}] = mnesia:read(StateTarget,
SubtreesKey),
Subtrees
end,
mnesia:activity(transaction, F).
Creating the Mnesia table:
-record(subtrees, {subtreesKey, subtrees}).
{atomic, ok} = mnesia:create_table(StateTargetName, [{record_name,
subtrees}]),
Creating the b-tree:
BTree1 = b_trees:empty(500),
BTree2 = b_trees:set_parameter(BTree1, state,
{StateTargetName,
fun persistence_by_mnesia/3,
fun persistence_by_mnesia/3,
fun persistence_by_mnesia/3}),
FunctionName(Key1, Key2) -> equal | greater | less
Key1 = Key2 = any()
The sort function takes two keys as arguments and returns the atom less
if Key1 < Key2, the atom greater
if Key1 > Key2 and otherwise the
atom equal
.
-spec sort_descending(key(), key()) -> sort_result().
sort_descending(Key_1, Key_2) ->
if
Key_1 < Key_2 -> greater;
Key_1 > Key_2 -> less;
true -> equal
end.
BTree1 = b_trees:empty(500),
BTree2 = b_trees:set_parameter(BTree1, sort, fun sort_descending/2),
B-trees are self-balancing tree data structures that keep data sorted and allow searches, sequential access, insertions, and deletions in logarithmic time. B-trees are a generalization of a binary search trees in that a node can have more than two children. Unlike self-balancing binary search trees, the b-tree is optimized for systems that read and write large blocks of data. B-trees are a good example of a data structure for external memory.
The functional design of the module b_trees is based on the module gb_trees:
b_trees | gb_trees
------------------|---------
n/a | balance/1
copy/2 | n/a
delete/2 | delete/2
delete_any/2 | delete_any/2
empty/1 | empty/0
enter/3 | enter/3
from_dict/2 | from_orddict/1
get/2 | get/2
height/1 | n/a
insert/3 | insert/3
is_defined/2 | is_defined/2
is_empty/1 | is_empty/1
iterator/1 | iterator/1
iterator_from/2 | iterator_from/2
keys/1 | keys/1
largest/1 | largest/1
lookup/2 | lookup/2
map/2 | map/2
next/1 | next/1
set_parameter/3 | n/a
size_key_values/1| size/1
size_nodes/1 | n/a
smallest/1 | smallest/1
sort_ascending/2 | n/a
sort_descending/2| n/a
take/2 | take/2
take_any/2 | take_any/2
take_largest/1 | take_largest/1
take_smallest/1 | take_smallest/1
to_list/1 | to_list/1
update/3 | update/3
values/1 | values/1
The functions delete/2
and insert/3
are implementations of the algorithms
of Cormen, Thomas; Leiserson, Charles; Rivest, Ronald; Stein, Clifford (2009),
Introduction to Algorithms (Third ed.), MIT Press and McGraw-Hill, pp. 484-504,
ISBN 0-262-03384-4. Chapter 18: B-Trees.
No issues - except module name collisions.
The reference implementation can be fetched from Github:
https://github.com/walter-weinmann/b_trees