INB371 Week 8 Lecture
Records and Keys
If we have a record class that stores information for students, we want to save the data according to a
predetermined key value. The key could be any of the fields but you’ll want it to be a unique thing like
name or ID. Because we’ve got a list of integers, we can map the integers to the student records. IF
there was 500 students at the Jedi Academy, we could create a vector of size 500 for the records. We
can put the student with classrank i into location studentData[i], so that the key is the index. Many of
the records are left blank, so it wastes memory, but you can look up a record and insert/delete in O(1)
complexity, because you can just use indexes.
Hash tables are also called maps, and they are a list of records so that the key of the record is mapped to
an index in the list. Generating an index from the key is called a ‘hash function’. Each location in the
hash table is called a ‘bucket’. Buckets can be empty, so it uses a lot of memory, but search, insert and
delete are all O(1). The assumption that all generated hash keys are unique isn’t valid. Two students
with the same GPA would be mapped to the same bucket. This is called a “Collision”. Therefore, in the
implementation of your hash table, you need to figure out a solution as to what happens when a
Open Addressing maps each key to a particular index, and if it’s used, then it moves to the next bucket
in the sequence. Then, when searching, if the index doesn’t match the one you’re looking for, it will
move to the next bucket and check that, and keep going until it either finds it or there is an empty
bucket. If you’re trying to an insert a record into the index 4, but only 3 is available, it has to iterate
through the entire table, then start at the start again. This degenerates the complexity to O(N). If you
delete “C3PO”, and then try to find R2D2, it will go to the index 4 and see it’s empty and the result will
be that R2D2 isn’t in the table. To combat this, you put a special value in Index 4 saying that something’s
been deleted from that index, which means you will search through the table looking for R2D2. Open
Addressing presents a problem when keys are clustered around one part of the table, giving you
nowhere to place a new record. To solve this, you can use Quadratic Probing, which makes larger steps
between indexes each time.
Another solution is to allow each bucket to store multiple records, which is called separate chaining. By
putting a vector in each bucket, you’d have to have a hash table implemented as a vector of vectors of
If we were to want to use an ID# or name as the key, we’d have to convert it to an integer before
making it an index. To do this, we have to use a hash function. The function H(x) maps the key x to some
index in the table.
A good hash function that distributes keys across a narrower space is imperative for good performance.
As an exam