Hybrid structures

As a rule, we would like to use multilists if the number of records per (K,KV) pair is small, principally because of the overhead cost for inverted lists, and inverted lists if it is large, principally because of the reduced cost of intersection queries. Two schemes try to compromise.

Hybrid list organization uses multilists for short (K,KV) lists, and inverted lists for long lists. This requires making more use of the count field in the KV header node, and keeping a tag there to signal which type of list is used. In the simplest implementation, the decision is made on a single length threshold t ; this can cause problems if insertions and deletions make the length flip-flop across the threshold. A more sophisticated implementation would use two thresholds, hi and lo --- a list which gets longer than hi is converted to an inverted list, and one which gets shorter than lo to a multilist --- guaranteeing that at least hi - low insertions/deletions occur between changes.

In a hybrid list organization, queries entirely involving inverted lists are handled by inverted list methods; those involving multilists by multilist methods.

Exercise: How should a query involving both inverted lists and multilists be handled? Your answer may depend on the form of the list.

Cellular multilists attempt to increase the efficiency of multilists by maintaining a list of lists, where each list belongs to a particular "cell". The cells used are independent of the key and key value, and typically correspond to physical structures such as cylinders of a disk, or pages of virtual memory. Intersections are speeded up in two ways. First, instead of using the list with global minimum size, we can use the smaller list in each cell --- conceivably even determining that an intersection is empty. Second, by keeping all pointers into the same physical region, we reduce the cost of the physical access to memory.