Static Hash Functions

Index schemes force us to traverse an index structure. Hashing avoids this: the address of a data item is computed directly by applying a function to the search key value.
A hash function h is a function from the set of all search key values K to the set of all bucket addresses B.
- We choose a number of buckets to correspond to the number of search key values we will have stored in the database.
- To perform a lookup on a search key value K_i, we compute h(K_i), and search the bucket with that address.
- If two search keys K_i and K_j map to the same address, because h(K_i) = h(K_j), then the bucket at that address will contain records with both search key values.
- In this case we will have to check the search key value of every record in the bucket to get the ones we want.
- A good hash function gives an average-case lookup time that is a small constant, independent of the number of search keys.
- We hope records are distributed uniformly among the buckets.
- The worst hash function maps all keys to the same bucket.
- The best hash function maps all keys to distinct addresses.
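The lookup procedure described above can be sketched in a few lines. This is a minimal illustration, not the notes' own implementation: the bucket count of 8, the function names, and the choice of hash function (sum of character codes modulo the number of buckets) are all assumptions made for the example.

```python
# Minimal sketch of a static hash structure: a fixed number of buckets,
# each holding (search_key, record) pairs. All names here are illustrative.

NUM_BUCKETS = 8  # assumed value, fixed when the structure is created


def h(key: str) -> int:
    """Hypothetical hash function: sum of character codes mod NUM_BUCKETS."""
    return sum(key.encode("utf-8")) % NUM_BUCKETS


buckets = [[] for _ in range(NUM_BUCKETS)]


def insert(key: str, record: str) -> None:
    """Place the record in the bucket whose address is h(key)."""
    buckets[h(key)].append((key, record))


def lookup(key: str) -> list:
    """Search only bucket h(key), but check every record in it,
    because different keys can hash to the same address."""
    return [rec for k, rec in buckets[h(key)] if k == key]


insert("ab", "record 1")
insert("ba", "record 2")    # same characters, so same sum and same bucket
print(h("ab") == h("ba"))   # True
print(lookup("ab"))         # ['record 1']
```

Both keys land in the same bucket, so the lookup for "ab" has to compare keys inside the bucket to exclude the record for "ba".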
Ideally, distribution of keys to addresses is uniform and random.
Suppose we have 26 buckets, and map names beginning with the i-th letter of the alphabet to the i-th bucket.
- Problem: this does not give uniform distribution.
- Many more names will be mapped to "A" than to "X".
- Typical hash functions perform some operation on the internal binary machine representations of characters in a key.
- For example, compute the sum, modulo the number of buckets, of the binary representations of the characters of the search key.
- See figure 8.18, using this method for 10 buckets (assuming the i-th character in the alphabet is represented by integer i).
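The two schemes above can be compared with a short sketch. The sample names are hypothetical and are not taken from figure 8.18; the function names are likewise assumptions. As in the text, the i-th letter of the alphabet is represented by the integer i.

```python
from collections import Counter


def letter_value(ch: str) -> int:
    """Position of a letter in the alphabet: a=1, b=2, ..., z=26."""
    return ord(ch.lower()) - ord("a") + 1


def first_letter_hash(name: str) -> int:
    """Scheme 1: one bucket per initial letter (buckets 0..25)."""
    return letter_value(name[0]) - 1


def sum_hash(name: str, num_buckets: int = 10) -> int:
    """Scheme 2: sum of the letter values, modulo the number of buckets.
    Characters outside a-z (spaces, punctuation) are skipped."""
    total = sum(letter_value(c) for c in name if "a" <= c.lower() <= "z")
    return total % num_buckets


# Hypothetical sample keys, chosen to show the skew toward "A".
names = ["Adams", "Allen", "Anderson", "Avery", "Baker", "Xu"]

print(Counter(first_letter_hash(n) for n in names))
print(Counter(sum_hash(n) for n in names))
```

With the first-letter scheme, four of the six sample names fall into bucket 0. The sum-based scheme depends on every character of the key, so keys sharing an initial are not forced into the same bucket.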
Insertion and deletion are simple.
Open hashing occurs where records are stored in different buckets. Compute the hash function and search the corresponding bucket to find a record.
Closed hashing occurs where all records are stored in one bucket. The hash function computes addresses within that bucket.
- Deletions are difficult.
- Not used much in database applications.
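To see why deletions are awkward in the closed scheme, here is one possible reading of it as a single array of slots. The notes do not say how collisions are resolved, so linear probing is an assumption here, as are the table size of 7, the integer keys, and all function names.

```python
# Sketch: all records live in one fixed array; the hash function picks a slot.
# Collision handling by linear probing is an assumption, not from the notes.

EMPTY = object()     # slot never used
DELETED = object()   # "tombstone": slot used once, record since removed
SIZE = 7             # assumed table size
slots = [EMPTY] * SIZE


def h(key: int) -> int:
    return key % SIZE


def insert(key: int) -> int:
    """Store key in the first free slot at or after h(key); return its index."""
    for step in range(SIZE):
        i = (h(key) + step) % SIZE
        if slots[i] is EMPTY or slots[i] is DELETED:
            slots[i] = key
            return i
    raise RuntimeError("table is full")


def find(key: int):
    """Probe from h(key) until the key or a never-used slot is found."""
    for step in range(SIZE):
        i = (h(key) + step) % SIZE
        if slots[i] is EMPTY:
            return None          # the probe sequence ends here
        if slots[i] is not DELETED and slots[i] == key:
            return i
    return None


def delete(key: int) -> None:
    """Mark the slot as DELETED rather than EMPTY. Resetting it to EMPTY
    would cut the probe sequence and hide keys stored after it."""
    i = find(key)
    if i is not None:
        slots[i] = DELETED


insert(3)    # 3 % 7 == 3, goes to slot 3
insert(10)   # 10 % 7 == 3 as well, so it is pushed to slot 4
delete(3)
print(find(10))  # 4: still reachable because slot 3 holds a tombstone
```

The tombstone is the extra bookkeeping that makes deletion harder than in the bucket-per-address scheme, where a record can simply be removed from its bucket.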
Drawback to our approach: Hash function must be chosen at implementation time.
- Number of buckets is fixed, but the database may grow.
- If the number is too large, we waste space.
- If the number is too small, we get too many "collisions", resulting in records of many search key values being in the same bucket.
- Choosing the number to be twice the number of search key values in the file gives a good space/performance tradeoff.
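The space/performance tradeoff can be explored with a small experiment. This is a sketch under stated assumptions: the keys are the integers 0 to 999, the hash function is CRC-32 from Python's standard zlib module reduced modulo the bucket count, and the three bucket counts are arbitrary choices for too few, twice the number of keys, and too many.

```python
import zlib


def h(key, num_buckets: int) -> int:
    """Assumed hash function: CRC-32 of the key's text form, mod bucket count."""
    return zlib.crc32(str(key).encode("utf-8")) % num_buckets


def bucket_stats(keys, num_buckets: int) -> dict:
    """Count how many buckets stay empty (wasted space) and how full the
    fullest bucket gets (worst-case search within a bucket)."""
    counts = [0] * num_buckets
    for k in keys:
        counts[h(k, num_buckets)] += 1
    return {
        "empty": counts.count(0),
        "largest": max(counts),
        "average": len(keys) / num_buckets,
    }


keys = range(1000)
for num_buckets in (100, 2000, 20000):
    print(num_buckets, bucket_stats(keys, num_buckets))
```

With 100 buckets every lookup scans about ten records on average; with 20000 buckets at least 19000 are guaranteed to be empty. The count of 2000 corresponds to the "twice the number of search key values" rule from the text.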

Page created and maintained by Osmar R. Zaïane
Last Update: Wed Nov 15 11:12:38 PST 1995