Notes #12 --- B-Trees and related topics

AVL trees

Summarizing  previous notes : 

  1. An AVL tree has max depth O (lg n). Thus a find operation has complexity O (lg n).
  2. An insertion of a node into an AVL tree requires O (lg n) time to find the insertion point, and then an O (lg n) traversal back toward the root.
  1. A deletion requires: O (lg n) time to find the node to be deleted and exchange it with the immediately following or preceding node, plus an O (lg n) traversal back toward the root, updating heights and checking the balance condition.

We can generalize AVL trees by:

  1. Using k-ary rather than binary trees, or
  2. Allowing a height difference as large as d, instead of 1.

For k-ary trees, we can get fancier conditions, such as

Exercise

Write recursions for the minimum number of nodes:

  1. In a binary AVL-like tree with d = 2.
  2. In a ternary AVL tree with the usual balance condition (d = 1).
  3. In a ternary AVL tree in which adjacent heights can differ by at most 1.

B-Trees

B-trees are the data structure on which VSAM and SIS files are built. The key notions are as follows:

  1. Nodes can contain more than one key (and one more pointer than keys).
  2. The number of keys in a node is bounded below, except at the root.
  3. All leaves are on the same level of the tree.
  4. The tree grows upward at the root, by splitting, rather than downward at the leaves (by adding children).

Standard B-trees

find in a B-tree is as in any k-ary search tree.

insert finds the location point as usual.

delete finds the key as usual, and as usual (if the key is not in a leaf) exchanges it with its immediate successor (which must always exist).

Miller has a formal definition and an extended example on pages 285--308.

B*-trees

In addition to using rotations in deletion, we can try to use them in insertion. This can be done with regular B-tree insertion, but we can also use it to allow the minimum number of pointers to be closer to 2n/3 rather than n/2. In a B*-tree,

  1. m = ceil ((2n-1)/3).
  2. The root can have between 2 and 1 + 2 floor ((2m-2)/3) pointers. This allows the root to be split into three nodes when the height of the tree increases.
  3. Insertion now uses rotation, checking the left or right sibling.
  1. Deletion checks both left and right sibs.

Modifications

  1. As usual, store pointers to records rather than records in nodes.
  2. B+-trees: don't store records or pointers in the internal nodes, but only keys (as in ISAM or VSAM); use a consistent decision about whether the internal keys are stored to the left (typical) or to the right of the key.
  3. Use a different blocking factor in leaves than used above (especially for B+-trees in which data is stored in the leaves).
  4. In a B+-like tree, thread either the leaves (as in SIS files), or the level above (as in VSAM). In an ordinary tree, thread as for threaded binary trees.

Remarks

  1. In practice, n is typically around 20--25.
  2. The depth of any B-tree-like structure is no more than 2 + log_m (n/2) where leaves have depth 0.

Exercises

  1. Insert the following single sequence of keys into
          39, 48, 61, 85, 17,
          21,  8, 25, 42, 70,
          77, 31, 99, 96, 88,
          74, 66,  3, 15, 21
  1. In each case above, determine