B-Tree Index Files

Primary disadvantage of index-sequential file organization is that performance degrades as the file grows. This can be remedied by costly re-organizations.
B-tree file structure maintains its efficiency despite insertions and deletions, but it also imposes some overhead.
A B-tree index is a balanced tree in which every path from the root to a leaf is of the same length.
- Each node in the tree, except the root, must have between and children, where is fixed for a particular tree.
- A typical node (Figure 8.7) contains up to search key values , and pointers .
  
  Figure 8.7: Typical node of a B+-tree.
- Search key values in a node are kept in sorted order.
- For leaf nodes, points to either a file record with search key value , or a bucket of pointers to records with that search key value.
- Bucket structure is used if search key is not a primary key, and file is not sorted in search key order.
- Pointer (th pointer in the leaf node) is used to chain leaf nodes together in linear order (search key order). This allows efficient sequential processing of the file.
- The range of values in each leaf do not overlap.
- Non-leaf nodes form a multilevel index on leaf nodes.
- Pointer points to a subtree containing search key values and .
- Pointer points to a subtree containing search key values .
- Pointer points to a subtree containing search key values .
Figures 8.8 (textbook fig. 8.9) and textbook fig. 8.10 show B-trees for the deposit file with =3 and =5.

Figure 8.8: B+-tree for file with .
Processing Queries in a B-Tree
Suppose we want all records with a search key value of .
- Examine the root node and find the smallest search key value .
- Follow pointer to another node.
- If follow pointer .
- Otherwise, find the appropriate pointer to follow.
- Continue down through non-leaf nodes, looking for smallest search key value and following the corresponding pointer.
- Eventually we arrive at a leaf node, where pointer will point to the desired record or bucket.
In processing a query, we traverse a path from the root to a leaf node. If there are search key values in the file, this path is no longer than .
This means that the path is not long, even in large files. Reasonable values for are from 10 to 100 (or even larger). This allows a 3 to 9 level lookup for a file with a million search key values.
Insertions and Deletions:
Insertion and deletion are more complicated, as they may require splitting or combining nodes to keep the tree balanced. If splitting or combining are not required, insertion works as follows:
- Find leaf node where search key value should appear.
- If value is present, add new record to the bucket.
- If value is not present, insert value in leaf node (so that search keys are still in order).
- Create a new bucket and insert the new record.
If splitting or combining are not required, deletion works as follows:
- Deletion: Find record to be deleted, and remove it from the bucket.
- If bucket is now empty, remove search key value from leaf node.
Insertions Causing Splitting:
When insertion causes a leaf node to be too large, we split that node. In figure 8.9, assume we wish to insert a record with a bname value of ``Clearview''.
- There is no room for it in the leaf node where it should appear.
- We now have values (the search key values plus the new one we wish to insert).
- We put the first values in the existing node, and the remainder into a new node. (NOTE: earlier printings of the text say , which is inconsistent with their example.)
- Figure 8.11 shows the result.
- The new node must be inserted into the B-tree.
- We also need to update search key values for the parent (or higher) nodes of the split leaf node. (Except if the new node is the leftmost one)
- Order must be preserved among the search key values in each node.
- If the parent was already full, there will now be children, and so it will have to be split.
- When a non-leaf node is split, the children are divided among the two new nodes, with the left one getting children, and the right one getting the remainder.
- In the worst case, splits may be required all the way up to the root. (If the root is split, the tree becomes one level deeper.)
- Note: when we start a B-tree, we begin with a single node that is both the root and a single leaf. When it gets full and another insertion occurs, we split it into two leaf nodes, requiring a new root.
- Point to remember: We split leaf nodes based on the number of search key values, but we split non-leaf nodes based on the number of pointers (children).
Deletions Causing Combining:
Deleting records may cause tree nodes to contain too few pointers. Then we must combine nodes.
- If we wish to delete ``Downtown'' from the B-tree of figure 8.12, this occurs.
- In this case, the leaf node is empty and must be deleted.
- If we wish to delete ``Perryridge'' from the B-tree of figure 8.13, the parent is left with only one pointer, and must be coalesced with a sibling node.
- Sometimes higher-level nodes must also be coalesced.
- If the root becomes empty as a result, the tree is one level less deep (figure 8.14).
- Sometimes the pointers must be redistributed to keep the tree balanced.
- Deleting ``Perryridge'' from figure 8.12 produces figure 8.15.
To summarize:
- Insertion and deletion are complicated, but require relatively few operations.
- Number of operations required for insertion and deletion is proportional to logarithm of number of search keys.
- B-trees are fast as index structures for database.

Next: B-Tree Index Files Up: Indexing & Hashing Previous: Secondary Indices

Page created and maintained by Osmar R. Zaï ane
Last Update: Wed Nov 15 11:12:38 PST 1995