# CMPT 115 Lecture Notes - Lecture 17: Perfect Hash Function, Pseudorandom Number Generator, Hash Table

16 pages57 viewsWinter 2016

Department

Computer ScienceCourse Code

CMPT 115Professor

Jason BowieLecture

17This

**preview**shows pages 1-3. to view the full**16 pages of the document.**Hashing (Hashed Searching)

CMPT 115 lecture notes

Notes written by Michael Horsch, Mark Eramian, Ian McQuillan, Lingling Jin, and Dmytro Dyachuk

Objectives

After this topic, students are expected to be able to

1. deﬁne what hash functions and collisions are in your own words;

2. calculate the indices of given elements using the combination of diﬀerent hash functions and collision

resolutions;

3. implement the Hash Table Class.

Contents

1 Introduction 1

2 Hash Functions 3

2.1 DirectHashing ............................................ 4

2.2 ModuloDivision ........................................... 4

2.3 Folding ................................................ 4

2.4 Extraction............................................... 5

2.5 PseudorandomHashing ....................................... 5

3 Collision Resolution 5

3.1 OpenAddressing ........................................... 5

3.2 Chaining................................................ 9

4 Hash Table Design Issues 9

5 The Hash Table Class 10

5.1 UsingChaining............................................ 10

5.2 UsingRehashing ........................................... 13

1 Introduction

•We’ve seen diﬀerent ways to search arrays:

–Linear Search (O(n))

–Binary Search ((log n))

•Binary search was more eﬃcient, but required the table to be sorted.

•We can do better...without requiring a sorted array.

•A table is just a list of records with diﬀerent ﬁelds:

1

###### You're Reading a Preview

Unlock to view full version

Only half of the first page are available for preview. Some parts have been intentionally blurred.

Team W L

Huskies 8 0

Rams 4 4

Lancers 5 3

Dinos 7 1

•Like an array, each table row has an index starting at 0.

•A table can easily be represented by an array of structures.

Team

refToString teamName

int wins

int losses

end Team

Team westernConference[8]

•A simple array of, say, integers is just a table with a single column.

•If the table is an array:

–Linear search — O(n)worst case time complexity

–Binary search (if the table is sorted already) — O(log n)worst case time complexity

•If the table is a linked-list:

–Linear search — O(n)worst case time complexity

•If the table is a binary search tree:

–Binary search — O(n)worst case time complexity (but O(log n)if the tree is not degenerate).

Question: Can we do better?

•. . . each data element had a speciﬁc place in the table reserved for it

•. . . you could retrieve an element without searching

•. . . adding to, and searching in, the table required O(1) time, worst-case

We can get close to this perfect world with a technique called hash tables.

•Every student has a unique record at the university.

•Every student has a unique student number.

•(The student number is ideal as a key!)

•Idea: Create an array that uses student number as the index.

Algorithm search(bigArray, studentNumber)

Pre: bigArray::an array of student records

studentNumber::integer

Return: a student record

return bigArray [studentNumber ]

•What is wrong with this idea?

2

###### You're Reading a Preview

Unlock to view full version

Only half of the first page are available for preview. Some parts have been intentionally blurred.

•The array would be far too big!

•The array would be mostly empty!

•. . . but search and insertion would be fast!

•What if there were a way to

–translate a student number to a unique index

–use a much smaller array

–still have O(1) worst case insertion and search

2 Hash Functions

•Ahash function transforms (translates) a data key into an array index.

•Given a key, we use the hash function to know where to put or ﬁnd the data.

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

0

1

Element

key h

hash function h(key)This element gets stored at

index 8.

•We usually denote the hash function using the letter h, and h(key)as the resulting index for a given

key.

•Question: What is the time complexity of the search?

•Any function hto transform a key to an index is called a hash function.

•A hash function is called perfect if it transforms every unique key to a unique address.

•If his a perfect hash function, the search always takes exactly the same amount of time, regardless of

how many elements are in the table (O(1)).

•But this leads to familiar problems. With a perfect hash function:

–table capacity must be greater than the number of possible keys and this isn’t always known

ahead of time.

–most of the table will be unused if only a small number of keys are used.

•Sometimes perfection is not desirable!

•Let Nrepresent the number of possible uSask student numbers.

•Let nrepresent the number of real uSask students, past and present.

•Obviously: N >> n.

•We could try to ﬁnd a function hthat maps all nreal student numbers to an array of exactly the right

size, so that every index is unique.

3

###### You're Reading a Preview

Unlock to view full version

#### Loved by over 2.2 million students

Over 90% improved by at least one letter grade.