Hash Table Collisions
Occurs when 2 different keys are mapped to the same index
2 main ways of dealing with them:
When insertion results in collision, look for somewhere else to put the value
Start at index to which hash function mapped the inserted item
Look for free space in array following a search patten (probing)
Linear probing (bad)
Quadratic probing (fine)
Double hashing (good)
Linear: table searched sequentially
Each time probed for free location, add one to index.At end, wrap around. Just look for next empty
space in array.
Leads to primary clustering. Table contains groups of consecutively occupied locations, clusters get
larger as time goes on, reduces efficiency.
Searching with linear probing:
Searching similar to insertion.
Search for 59, 59 mod 23 = 13, index 13 does not contain 59, but it contains something else. Cannot yet
conclude that table does not contain 59. Cannot conclude until you get to a free space. Probe, check,
add one to index, continue until hit 59 or free space.
Refinement of linear, prevents primary clustering
Don't just add one to index, add p^2. p is number of times you've had to do this probing.Always go
from original index.
1st probe: h(x) + 1^2
Results in secondary clustering. Same sequence of probes is used when two different values hash to
same location. Delays collision resolution.
Problem theoretically - analysis shows that it isn't really a problem in practice.
After some time, a sequence of probes repeats itself. If you loop like this, cannot insert that new value.
Generally does not cause problems if data not significantly skewed, hash table large enough (nearest
prime to 2*items to insert), function scatters data evenly across table
Not likely to happen unless you write a really really bad hash function.
Also think about table size 50000~ compared to table size 7
Also if your table is full of stuff. But don't want it to get over half full.
Performance of hash tables using open addressing degrades if over half full
If more than 2/3 full, need to do something.
Double Hashing: In linear and quadratic, probing sequence is independant of key.
Double hashing is key dependant probe sequence
Asecond hash function determine's the probes sequence.
Will get rid of secondary clusters, hopefully primary too.