
Physical Organization

  1. In order to reduce contention for disk access, the database can be partitioned over several disks, allowing several disk accesses to be serviced in parallel.
  2. In order to exploit the potential for parallel disk access, we must choose a good distribution of data among the disks.
  3. For the parallel 2-way join, it is useful to distribute the tuples of each individual relation among several disks (disk striping). For example, assign tuples to disks based on the value of the hash function used by the hash-join algorithm, so that all tuples that share a bucket are assigned to the same disk. Each bucket's group of tuples is assigned to a separate disk, if possible; otherwise the groups are distributed uniformly among the available disks. This allows the parallel 2-way hash-join to exploit parallel disk access.
  4. For the pipeline-join, it is desirable that each relation be kept on one disk and the distinct relations be assigned to separate disks to the degree possible.

    For example, for computing tex2html_wrap_inline1226 , if each relation is on a different disk, contention is eliminated between processors tex2html_wrap_inline870 and tex2html_wrap_inline872 .

  5. The optimal physical organization differs from query to query. The DBA must therefore choose a physical organization that is expected to perform well for the anticipated mix of database queries.
  6. The query optimizer must choose from the various parallel and sequential techniques by estimating the cost of each technique on the given physical organization.
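
The hash-based placement described in item 3 can be sketched as follows. This is only an illustration under stated assumptions, not the method of any particular system: the function names (bucket_of, assign_buckets_to_disks, stripe_relation), the use of CRC32 as the hash function, and the round-robin rule for mapping buckets to disks are all hypothetical choices made for the example.

```python
import zlib
from collections import defaultdict


def bucket_of(join_key, num_buckets):
    """Hypothetical stand-in for the hash function of the hash-join algorithm.

    CRC32 is used only because it is deterministic across runs.
    """
    return zlib.crc32(repr(join_key).encode("utf-8")) % num_buckets


def assign_buckets_to_disks(num_buckets, num_disks):
    """Map each bucket to a disk (assumed policy: round-robin).

    With at least as many disks as buckets, every bucket gets its own disk;
    otherwise the buckets are spread evenly over the available disks.
    """
    return {b: b % num_disks for b in range(num_buckets)}


def stripe_relation(tuples, key_index, num_buckets, num_disks):
    """Place each tuple on the disk that owns the bucket of its join key."""
    bucket_to_disk = assign_buckets_to_disks(num_buckets, num_disks)
    disks = defaultdict(list)
    for t in tuples:
        b = bucket_of(t[key_index], num_buckets)
        disks[bucket_to_disk[b]].append(t)
    return dict(disks)


if __name__ == "__main__":
    # Two toy relations joined on their first attribute.
    r = [(1, "a"), (2, "b"), (3, "c"), (1, "d")]
    s = [(1, "x"), (3, "y"), (4, "z")]
    print(stripe_relation(r, 0, num_buckets=4, num_disks=2))
    print(stripe_relation(s, 0, num_buckets=4, num_disks=2))
```

Because both relations are striped with the same hash function and the same bucket-to-disk map, tuples with equal join keys always end up in the same bucket, so each bucket of the join can be read from its own disk while other buckets are read from other disks in parallel.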

Osmar Zaiane
Sun Jul 26 17:45:14 PDT 1998