Notes #12 --- B-Trees and related topics
AVL trees
Summarizing
previous notes :
An AVL tree has max depth O (lg n). Thus a find operation has complexity O (lg n).
An insertion of a node into an AVL tree requires O (lg n) time to find the insertion point, and then an O (lg n) traversal back toward the root.
- At each level, there is a constant amount of work updating and comparing heights, and at most a constant amount of work redirecting pointers during rotation.
- The reverse traversal terminates when
- (a) the height of a node has not changed, or
- (b) the height of a node changes, but its sibling has the same height (so its parent would satisfy (a)), or
- (c) we perform a (single or double) rotation --- there is at most rotation per insertion, or
- (d) we reach the root, and increment its height.
- A deletion requires: O (lg n) time to find the node to be deleted and exchange it with the immediately following or preceding node, plus an O (lg n) traversal back toward the root, updating heights and checking the balance condition.
- Unlike the case of insertion, in deletion a rotation or having siblings with equal height does not terminate the process --- there may be a rotation on every level (at constant cost); the process only terminates if some node on the path has its height unchanged, or we reach the root and decrement its height.
We can generalize AVL trees by:
- Using k-ary rather than binary trees, or
- Allowing a height difference as large as d, instead of 1.
For k-ary trees, we can get fancier conditions, such as
- "no two children's heights can differ by more than d", or
- "adjacent heights cannot differ by more than d".
Exercise
Write recursions for the minimum number of nodes:
- In a binary AVL-like tree with d = 2.
- In a ternary AVL tree with the usual balance condition (d = 1).
- In a ternary AVL tree in which adjacent heights can differ by at most 1.
B-Trees
B-trees are the data structure on which VSAM and SIS files are built. The key notions are as follows:
- Nodes can contain more than one key (and one more pointer than keys).
- The number of keys in a node is bounded below, except at the root.
- All leaves are on the same level of the tree.
- The tree grows upward at the root, by splitting, rather than downward at the leaves (by adding children).
Standard B-trees
An (m,n)-B-tree has at least m and at most n pointers at any node except the root (or is the empty tree).
A B-tree of order n has m = ceil (n/2).
The root in such a tree has between 2 and n pointers.
This is usually what is meant by "B-tree" without qualifications.
find in a B-tree is as in any k-ary search tree.
insert finds the location point as usual.
- If the (leaf) node is not full, it inserts the key in that node, adjusting pointers.
- If the node is full, it splits the node (which now contains n keys) into two with m-1 keys, plus the middle node, which it now attempts (recursively) to insert in the parent node.
- Splitting the root increases the height of the tree. The work in adjusting pointers at any level is proportional to the number of keys in the node.
delete finds the key as usual, and as usual (if the key is not in a leaf) exchanges it with its immediate successor (which must always exist).
- If there are more than m-1 keys in the (leaf) node, it can delete the key without a problem.
- If there are exactly m-1 keys, then --- and this is the tricky part --- it tries to borrow from its right sibling (from the left sibling if it is rightmost); if that sibling node has more than m-1 keys, it moves the largest up to the parent, and moves the key in the parent down into the current node.
- Otherwise, the sibling node has m-1 keys, the current node has m-2, after deletion, and we can combine them with the key in the parent separating them into a node with 2m - 3 + 1 <= n - 1 nodes. We then (recursively) try to delete the key we just put into the new node from the parent.
Miller has a formal definition and an extended example on pages 285--308.
B*-trees
In addition to using rotations in deletion, we can try to use them in insertion. This can be done with regular B-tree insertion, but we can also use it to allow the minimum number of pointers to be closer to 2n/3 rather than n/2. In a B*-tree,
- m = ceil ((2n-1)/3).
- The root can have between 2 and 1 + 2 floor ((2m-2)/3) pointers. This allows the root to be split into three nodes when the height of the tree increases.
- Insertion now uses rotation, checking the left or right sibling.
- Note that the leftmost and rightmost children have to check their unique adjacent sib. A split divides the node and the sib just checked (together with their dividing key in the parent) into three nodes, and promotes the two dividing sibs (and now the parent has to be checked to see if it is overfull).
- Deletion checks both left and right sibs.
- In the case of the leftmost and rightmost children, the two nearest sibs are checked, and a double rotation may be needed. If all three nodes are short, all three (and the two dividing keys) are merged into two nodes, and one dividing key returned to the parent (which now has to be checked for underflow).
Modifications
As usual, store pointers to records rather than records in nodes.
B+-trees: don't store records or pointers in the internal nodes, but only keys (as in ISAM or VSAM); use a consistent decision about whether the internal keys are stored to the left (typical) or to the right of the key.
Use a different blocking factor in leaves than used above (especially for B+-trees in which data is stored in the leaves).
In a B+-like tree, thread either the leaves (as in SIS files), or the level above (as in VSAM). In an ordinary tree, thread as for threaded binary trees.
Remarks
In practice, n is typically around 20--25.
The depth of any B-tree-like structure is no more than 2 + log_m (n/2) where leaves have depth 0.
Exercises
Insert the following single sequence of keys into
39, 48, 61, 85, 17,
21, 8, 25, 42, 70,
77, 31, 99, 96, 88,
74, 66, 3, 15, 21
- (a) a binary search tree
- (b) an AVL tree
- (c) a ternary search tree
- (d) a (2,4)-B-tree --- that is, a standard B-tree of order 4.
- In each case above, determine
- (a) the mean internal path length --- the average depth of a key in the tree.
- (b) the mean external path length --- the average depth of a failure node.