If we process tuples on a per-block basis we can save many accesses.
The idea is that, if both relations have tuples stored together physically,
we can examine all the tuple pairs for a block of each relation at one time.
We still need to read all the tuples of one relation for a block of the other
relation.
The block method algorithm is:
afor each block Bd of deposit do
begin
for each block Bc of customer do
begin
for each tuple d in Bd do
begin
for each tuple c in Bc do
begin
test pair (d, c) to see if a tuple
should be added to the result
end
end
end
end
- Instead of reading the entire customer relation for each
tuple of deposit, we read the entire customer relation once
for each block of deposit.
- Since there are 500 blocks of deposit tuples and 10 blocks of
customer tuples, reading customer once for each block of
deposit requires 10 * 500 = 5000 accesses to customer blocks.
- Total cost is then 5000 + 500 (for accesses to deposit blocks)
= 5500.
- This is obviously a significant improvement over the non-block method,
which required roughly 100,000 or 2,000,000 accesses.
- Choice of customer for the inner loop is arbitrary, but does
provide a potential advantage.
- Being the smaller relation, it may be possible to keep it all in main
memory.
- If this was the case, we would only require 500 blocks to read
deposit plus 10 blocks to read customer into main memory, for a
total of 510 block accesses.