-
Primary disadvantage of index-sequential file organization is that
performance degrades as the file grows.
This can be remedied by costly re-organizations.
-
B
-tree file structure maintains its efficiency despite insertions and
deletions, but it also imposes some overhead.
-
A B
-tree index is a balanced tree in which every path from the root
to a leaf is of the same length.
-
Figures 8.8 (textbook fig. 8.9) and textbook fig. 8.10
show B
-trees for the deposit file with
=3 and
=5.
Figure 8.8: B+-tree for
file with
.
-
Processing Queries in a B
-Tree
Suppose we want all records with a search key value of
.
-
In processing a query, we traverse a path from the root to a leaf node.
If there are
search key values in the file, this path is no longer than
.
This means that the path is not long, even in large files.
Reasonable values for
are from 10 to 100 (or even larger).
This allows a 3 to 9 level lookup for a file with a million search key values.
-
Insertions and Deletions:
Insertion and deletion are more complicated, as they
may require splitting or combining nodes to keep the tree balanced.
If splitting or combining are not required, insertion works
as follows:
- Find leaf node where search key value should appear.
- If value is present, add new record to the bucket.
- If value is not present, insert value in leaf node (so that search keys
are still in order).
- Create a new bucket and insert the new record.
If splitting or combining are not required, deletion works as follows:
- Deletion: Find record to be deleted, and remove it from the bucket.
- If bucket is now empty, remove search key value from leaf node.
-
Insertions Causing Splitting:
When insertion causes a leaf node to be too large, we split that node.
In figure 8.9, assume we wish to insert a record with a bname
value of ``Clearview''.
- There is no room for it in the leaf node where it should appear.
- We now have
values (the
search key values plus the
new one we wish to insert).
- We put the first
values in the existing node,
and the remainder into a new node.
(NOTE: earlier printings of the text say
, which is
inconsistent with their example.)
- Figure 8.11 shows the result.
- The new node must be inserted into the B
-tree.
- We also need to update search key values for the parent (or higher)
nodes of the split leaf node.
(Except if the new node is the leftmost one)
- Order must be preserved among the search key values in each node.
- If the parent was already full, there will now be
children, and so
it will have to be split.
- When a non-leaf node is split, the children are divided among the
two new nodes, with the left one getting
children,
and the right one getting the remainder.
- In the worst case, splits may be required all the way up to the root.
(If the root is split, the tree becomes one level deeper.)
- Note: when we start a B
-tree, we begin with a single node
that is both the root and a single leaf.
When it gets full and another insertion occurs, we split it into two
leaf nodes, requiring a new root.
- Point to remember: We split leaf nodes based on the number of
search key values, but we split non-leaf nodes based on the number of
pointers (children).
-
Deletions Causing Combining:
Deleting records may cause tree nodes to contain too few pointers.
Then we must combine nodes.
- If we wish to delete ``Downtown'' from the B
-tree of figure 8.12,
this occurs.
- In this case, the leaf node is empty and must be deleted.
- If we wish to delete ``Perryridge'' from the B
-tree of
figure 8.13, the parent is left with only one pointer, and must be coalesced
with a sibling node.
- Sometimes higher-level nodes must also be coalesced.
- If the root becomes empty as a result, the tree is one level less deep
(figure 8.14).
- Sometimes the pointers must be redistributed to keep the tree balanced.
- Deleting ``Perryridge'' from figure 8.12 produces figure 8.15.
-
To summarize:
- Insertion and deletion are complicated, but require relatively few
operations.
- Number of operations required for insertion and deletion is
proportional to logarithm of number of search keys.
- B
-trees are fast as index structures for database.