Bonus Lab 5: Hash Tables
Hashing
Given a key, we can use a hash function to compute an index in a hash table.
INDEX = hash_function( KEY )
Searching is extremely fast. Therefore, hashing is suitable for applications like compilers, spell checkers, and security tools.
But it has a limitation: The table size is limited.
Collisions can occur: some keys are mapped to the same location by the hash function.
Critical Issues
1. Design a good hash function
- It should be fast to compute
- It should minimize the number of collisions
2. Design a method to resolve collisions when they occur
Open Addressing
- Linear probing
- hi(K) = (hash(K) + i) mod m
- Any key that hashed into a cluster (a block of contiguously occupied table entries) will require several attempts to resolve the collision
- Blocks of occupied cells start forming: primary clustering.
- Quadratic probing
- hi(K) = ( hash(K) + i^2 ) mod m, where i is the number of collisions
- Two keys with different home positions will have different probe sequences
- But keys that hash to the same home position will probe the same alternative cells: secondary clustering.
- Double hashing to alleviate the problem of clustering
- hi(K) = ( hash(K) + i * hash2(K)) mod m
- hi(K) = ( hash(K) + i * hash2(K)) mod m
Separate Chaining
- Keep a linked list of keys that hash to the same value
- Advantage: deletion is easy
- Disadvantage: memory allocation in linked list manipulation will slow down the program
Lab Task
Implement a hash table structure based on the following files:
link2.h: A list class that supports various list operations and is used in the hash table class. You must NOT modify this file.
table2.h: A hash table structure. Finish all parts marked with "===== TO-DO =====":
- insert()
- remove()
- find_node()
grader.cpp: A grader program to test your hash table implementation. It should show that you get full marks after you have finished table2.h correctly. Do NOT modify this file. Run the program and show your codes to your TA.
References
- Hash applet