Next: Index Definition in SQL Up: Indexing & Hashing Previous: Dynamic Hashing

Comparison of Indexing and Hashing

To make a wise choice between the methods seen, database designer must consider the following issues:
- Is the cost of periodic re-organization of index or hash structure acceptable?
- What is the relative frequence of insertion and deletion?
- Is it desirable to optimize average access time at the expense of increasing worst-case access time?
- What types of queries are users likely to pose?
The last issue is critical to the choice between indexing and hashing. If most queries are of the form
```
 aaaaaaaaaaaa¯select   

		 from r

		 where    
```
then to process this query the system will perform a lookup on an index or hash structure for attribute with value c.
For these sorts of queries a hashing scheme is preferable.
- Index lookup takes time proportional to log of number of values in R for .
- Hash structure provides lookup average time that is a small constant (independent of database size).
However, the worst-case favors indexing:
- Hash worst-case gives time proportional to the number of values in R for .
- Index worst case still log of number of values in R.
Index methods are preferable where a range of values is specified in the query, e.g.
```
 aaaaaaaaaaaa¯select   

		 from r

		 where    and    
```
This query finds records with values in the range from to .
- Using an index structure, we can find the bucket for value , and then follow the pointer chain to read the next buckets in alphabetic (or numeric) order until we find .
- If we have a hash structure instead of an index, we can find a bucket for easily, but it is not easy to find the ``next bucket''.
- A good hash function assigns values randomly to buckets.
- Also, each bucket may be assigned many search key values, so we cannot chain them together.
- To support range queries using a hash structure, we need a hash function that preserves order.
- For example, if and are search key values and then .
- Such a function would ensure that buckets are in key order.
- Order-preserving hash functions that also provide randomness and uniformity are extremely difficult to find.
- Thus most systems use indexing in preference to hashing unless it is known in advance that range queries will be infrequent.

Next: Index Definition in SQL Up: Indexing & Hashing Previous: Dynamic Hashing

Osmar Zaiane
Mon Jul 13 13:28:03 PDT 1998