Class Notes (1,052,067)
CA (601,920)
U of S (3,567)
CMPT (50)
Lecture 17

CMPT 115 Lecture Notes - Lecture 17: Perfect Hash Function, Pseudorandom Number Generator, Hash Table

16 pages57 viewsWinter 2016

Department
Computer Science
Course Code
CMPT 115
Professor
Jason Bowie
Lecture
17

This preview shows pages 1-3. to view the full 16 pages of the document.
Hashing (Hashed Searching)
CMPT 115 lecture notes
Notes written by Michael Horsch, Mark Eramian, Ian McQuillan, Lingling Jin, and Dmytro Dyachuk
Objectives
After this topic, students are expected to be able to
1. deﬁne what hash functions and collisions are in your own words;
2. calculate the indices of given elements using the combination of diﬀerent hash functions and collision
resolutions;
3. implement the Hash Table Class.
Contents
1 Introduction 1
2 Hash Functions 3
2.1 DirectHashing ............................................ 4
2.2 ModuloDivision ........................................... 4
2.3 Folding ................................................ 4
2.4 Extraction............................................... 5
2.5 PseudorandomHashing ....................................... 5
3 Collision Resolution 5
3.2 Chaining................................................ 9
4 Hash Table Design Issues 9
5 The Hash Table Class 10
5.1 UsingChaining............................................ 10
5.2 UsingRehashing ........................................... 13
1 Introduction
We’ve seen diﬀerent ways to search arrays:
Linear Search (O(n))
Binary Search ((log n))
Binary search was more eﬃcient, but required the table to be sorted.
We can do better...without requiring a sorted array.
A table is just a list of records with diﬀerent ﬁelds:
1

Unlock to view full version

Only half of the first page are available for preview. Some parts have been intentionally blurred.

Team W L
Huskies 8 0
Rams 4 4
Lancers 5 3
Dinos 7 1
Like an array, each table row has an index starting at 0.
A table can easily be represented by an array of structures.
Team
refToString teamName
int wins
int losses
end Team
Team westernConference[8]
A simple array of, say, integers is just a table with a single column.
If the table is an array:
Linear search — O(n)worst case time complexity
Binary search (if the table is sorted already) — O(log n)worst case time complexity
If the table is a linked-list:
Linear search — O(n)worst case time complexity
If the table is a binary search tree:
Binary search — O(n)worst case time complexity (but O(log n)if the tree is not degenerate).
Question: Can we do better?
. . . each data element had a speciﬁc place in the table reserved for it
. . . you could retrieve an element without searching
. . . adding to, and searching in, the table required O(1) time, worst-case
We can get close to this perfect world with a technique called hash tables.
Every student has a unique record at the university.
Every student has a unique student number.
(The student number is ideal as a key!)
Idea: Create an array that uses student number as the index.
Algorithm search(bigArray, studentNumber)
Pre: bigArray::an array of student records
studentNumber::integer
Return: a student record
return bigArray [studentNumber ]
What is wrong with this idea?
2

Unlock to view full version

Only half of the first page are available for preview. Some parts have been intentionally blurred.

The array would be far too big!
The array would be mostly empty!
. . . but search and insertion would be fast!
What if there were a way to
translate a student number to a unique index
use a much smaller array
still have O(1) worst case insertion and search
2 Hash Functions
Ahash function transforms (translates) a data key into an array index.
Given a key, we use the hash function to know where to put or ﬁnd the data.
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0
1
Element
key h
hash function h(key)This element gets stored at
index 8.
We usually denote the hash function using the letter h, and h(key)as the resulting index for a given
key.
Question: What is the time complexity of the search?
Any function hto transform a key to an index is called a hash function.
A hash function is called perfect if it transforms every unique key to a unique address.
If his a perfect hash function, the search always takes exactly the same amount of time, regardless of
how many elements are in the table (O(1)).
But this leads to familiar problems. With a perfect hash function:
table capacity must be greater than the number of possible keys and this isn’t always known
most of the table will be unused if only a small number of keys are used.
Sometimes perfection is not desirable!
Let Nrepresent the number of possible uSask student numbers.
Let nrepresent the number of real uSask students, past and present.
Obviously: N >> n.
We could try to ﬁnd a function hthat maps all nreal student numbers to an array of exactly the right
size, so that every index is unique.
3