CSI 2110
Summary
Fall 2012
Analysis of Algorithms
Algorithm - a step by step procedure for solving a problem in a finite amount of time. Analyzing an
algorithm means determining its efficiency.
Primitive operations - low-level computations, independent of the programming language, that can be
identified in the pseudocode.
Big-Oh - given two functions f(n) and g(n), we say f(n) is O(g(n)) if and only if there are positive
constants c and n0 such that f(n) ≤ c·g(n) for n ≥ n0. It means that an algorithm has the complexity of
AT MOST g(n).
Just bound each term by the highest-degree term of n, add the coefficients, and you have your c (the
coefficient) and g(n) (the function).
Logarithms are always in base 2 in this class unless otherwise stated
Always want the lowest possible bound, approximation should be as tight as possible
Drop lower order terms and constant factors
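A worked instance of the definition (the constants c = 8 and n0 = 1 below are one valid choice, not the only one):

```latex
% Show f(n) = 3n^2 + 5n is O(n^2): bound each term by the
% highest-degree term (for n >= 1 we have n <= n^2), then add coefficients.
f(n) = 3n^2 + 5n \le 3n^2 + 5n^2 = 8n^2 \quad \text{for all } n \ge 1,
\qquad \text{so } f(n) \in O(n^2) \text{ with } c = 8,\ n_0 = 1.
```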
Big-Omega - f(n) is Ω(g(n)) if there are positive constants c and n0 such that f(n) ≥ c·g(n) for all n ≥ n0.
It means an algorithm has the complexity of AT LEAST g(n).
Big-Theta - f(n) is Θ(g(n)) if it is both O(g(n)) and Ω(g(n)). It means the complexity IS EXACTLY g(n).
Properties of logarithms and exponentials:
log(xy) = log x + log y
log(x/y) = log x − log y
log(x^a) = a·log x
log_b x = (log x)/(log b)
a^(b+c) = a^b · a^c
(a^b)^c = a^(b·c)
Remember the following equations:
∑_{i=1..n} i = n(n + 1)/2
∑_{i=1..n} i² = n(n + 1)(2n + 1)/6
∑_{i=0..n} 2^i = 2^(n+1) − 1
∑_{i=0..n} a^i = (a^(n+1) − 1)/(a − 1), for a ≠ 1
∑_v deg(v) = 2m, so visiting all neighbours of all vertices is O(n + m) — only if the graph is implemented as an adjacency list
Stacks and Queues
Stacks - First in, last out.
Queues - First in, first out.
Extendable Arrays
Growth function is N → 2N: when the array is full, a new array of twice the length is created.
Regular push: 1
Special push: cost of creating the new array (the new length) + copying the old elements into the new array
(length of old array) + 1
Phase: starts at creation of the new array and ends with the last element being pushed into it. The one pushed
after that is a special push, and starts a new phase. With doubling, the amortized cost per push is O(1).
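The doubling strategy and the push costs above can be sketched as follows (a minimal, illustrative class; the initial capacity of 1 is an arbitrary choice, and the real java.util.ArrayList grows differently):

```java
// Minimal extendable array sketch matching the notes: the growth
// function doubles the capacity (N -> 2N) on a "special push".
public class ExtendableArray {
    private int[] data = new int[1]; // initial capacity (illustrative choice)
    private int size = 0;
    public int capacity() { return data.length; }
    public int size() { return size; }
    public void push(int x) {
        if (size == data.length) {               // special push: array is full
            int[] bigger = new int[2 * data.length]; // create new array (cost ~ new length)
            for (int i = 0; i < size; i++)       // copy old elements (cost = old length)
                bigger[i] = data[i];
            data = bigger;                       // this starts a new phase
        }
        data[size++] = x;                        // regular push: cost 1
    }
    public int get(int i) { return data[i]; }
}
```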
Complexities
Method               ArrayList   LinkedList   Unsorted Seq   Sorted Seq
size                 O(1)        O(1)         O(1)           O(1)
isEmpty              O(1)        O(1)         O(1)           O(1)
get                  O(1)        O(n)         -              -
replace              O(1)        O(n)         -              -
insert               O(n)        O(1)*        O(1)           O(n)
remove               O(n)        O(1)*        O(n)           O(1)
minKey/minElement    -           -            O(n)           O(1)
removeMin            -           -            O(n)           O(1)
(* at a given position; O(n) if the position must first be found)
Selection Sort
Using an external data structure - insert your elements into the first data structure, repeatedly find the
smallest one, and add it into your second data structure. Takes O(n²) time.
In place (not using an extra data structure) - search through the array, find the smallest, and swap it to
the front of the unsorted part. Repeat until it's sorted.
Complexity: average and worst case: performs O(n²) operations regardless of input, since finding the
minimum of the remaining elements is always executed in O(n).
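The in-place version above can be sketched as:

```java
import java.util.Arrays;

// In-place selection sort: repeatedly find the smallest element of the
// unsorted suffix and swap it to the front. Always Theta(n^2)
// comparisons, since each minimum search pass is O(n).
public class SelectionSort {
    public static void sort(int[] a) {
        for (int i = 0; i < a.length - 1; i++) {
            int min = i;
            for (int j = i + 1; j < a.length; j++)
                if (a[j] < a[min]) min = j;          // find smallest in a[i..]
            int tmp = a[i]; a[i] = a[min]; a[min] = tmp; // swap it to the front
        }
    }
    public static void main(String[] args) {
        int[] a = {29, 10, 14, 37, 13};
        sort(a);
        System.out.println(Arrays.toString(a)); // [10, 13, 14, 29, 37]
    }
}
```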
Insertion Sort
Using an external data structure - insert the elements of the first data structure into the other one,
adding each element before or after the existing elements according to size. Then, remove them in
order and add them back into the old ADT.
In place - Take the second element and switch it with the first if needed. Take the third element and
switch it with the second, and then the first if need be, and so on.
Complexity: average and worst case is O(n²); best case (already sorted, in place) is O(n).
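The in-place version above can be sketched as:

```java
import java.util.Arrays;

// In-place insertion sort: take each element in turn and shift it left
// past larger elements until it is in order. Worst/average case O(n^2);
// best case O(n) on already-sorted input (the while loop never runs).
public class InsertionSort {
    public static void sort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int key = a[i];
            int j = i - 1;
            while (j >= 0 && a[j] > key) { // shift larger elements right
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key;                // drop key into its place
        }
    }
    public static void main(String[] args) {
        int[] a = {5, 2, 4, 6, 1, 3};
        sort(a);
        System.out.println(Arrays.toString(a)); // [1, 2, 3, 4, 5, 6]
    }
}
```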
Trees
Graph - consists of vertices and edges.
Tree - a connected graph that contains no cycles.
Root - a node without a parent.
Internal node - a node with at least one child.
External node - a node without any children.
Ancestor - parent, grandparent, great-grandparent, etc.
Descendant - child, grandchild, great-grandchild, etc.
Subtree - a tree consisting of a node and its descendants.
Distance - number of edges between two nodes.
Depth - number of ancestors.
Height - the maximum depth.
Preorder - visit, left, right.
Postorder - left, right, visit.
Inorder - left, visit, right.
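The three traversal orders can be sketched on a tiny linked binary tree (the Node class here is an illustrative helper, not course code):

```java
// The three traversal orders on a tiny linked binary tree.
// Node values are chosen so the inorder traversal comes out sorted.
public class Traversals {
    public static class Node {
        int val; Node left, right;
        public Node(int v, Node l, Node r) { val = v; left = l; right = r; }
    }
    public static String pre(Node n)  { return n == null ? "" : n.val + " " + pre(n.left) + pre(n.right); }
    public static String in(Node n)   { return n == null ? "" : in(n.left) + n.val + " " + in(n.right); }
    public static String post(Node n) { return n == null ? "" : post(n.left) + post(n.right) + n.val + " "; }
    public static void main(String[] args) {
        //       2
        //      / \
        //     1   3
        Node root = new Node(2, new Node(1, null, null), new Node(3, null, null));
        System.out.println("preorder:  " + pre(root));   // 2 1 3
        System.out.println("inorder:   " + in(root));    // 1 2 3
        System.out.println("postorder: " + post(root));  // 1 3 2
    }
}
```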
Binary Trees
Each node has at most two children.
Examples: decision trees (yes/no), arithmetic expression trees.
Full binary tree - each node is either a leaf, or has two children.
In the book, children are completed with dummy nodes, and all trees are considered full.
Perfect binary tree - a full binary tree with all the leaves at the same level.
Complete binary tree - perfect until level h − 1, then one or more leaves at level h (filled from left to right).
Properties of Binary Trees
o In a full binary tree: (number of external nodes) = (number of internal nodes) + 1
In Perfect Binary Trees:
Maximum number of nodes at each level i is 2^i
Properties of Height (n nodes)
Binary: log₂(n + 1) − 1 ≤ h ≤ n − 1
Binary (Full): log₂(n + 1) − 1 ≤ h ≤ (n − 1)/2
Binary (Complete): h = ⌊log₂ n⌋ (integer part of log₂ n)
Binary (Perfect): h = log₂(n + 1) − 1
Implementing Complete Binary Trees with ArrayList
root, parent, leftChild, rightChild, isRoot, isLeaf, size, isEmpty all have complexity O(1).
Using ranks starting at 1 (the node at rank i):
Left child of i is 2i, if 2i ≤ n.
Right child of i is 2i + 1, if 2i + 1 ≤ n.
Parent of i is ⌊i/2⌋, if i > 1.
Root is at index 1, if n ≥ 1.
Is a leaf? True if 2i > n.
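The rank formulas above are pure arithmetic, which is why all these methods are O(1); as a sketch (1-based ranks, n = number of nodes):

```java
// Rank arithmetic for a complete binary tree stored in an array,
// using 1-based ranks as in the formulas above.
public class HeapRanks {
    public static int left(int i)                { return 2 * i; }
    public static int right(int i)               { return 2 * i + 1; }
    public static int parent(int i)              { return i / 2; } // int division = floor
    public static boolean hasLeft(int i, int n)  { return 2 * i <= n; }
    public static boolean hasRight(int i, int n) { return 2 * i + 1 <= n; }
    public static boolean isRoot(int i)          { return i == 1; }
    public static boolean isLeaf(int i, int n)   { return 2 * i > n; }
}
```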
Representing General Trees
A tree T is represented by a binary tree T' with the following algorithm ("left child, right sibling"), seen in class:
Each node v of T corresponds to a node v' of T'.
The first child of v in T is the left child of v' in T'.
The sibling immediately following v in T is the right child of v' in T'.
If v is internal and w is its first child, then w' is the left child of v' in T'.
If v has a sibling w immediately following it, then w' is the right child of v' in T'.
If v is a leaf and has no following sibling, then the children of v' are (external) leaves.
Heaps
MinHeap - all children are larger than (or equal to) their parent; the minimum is at the root.
MaxHeap - all children are smaller than (or equal to) their parent; the maximum is at the root.
Height of a heap: a heap storing n keys has a height of ⌊log₂ n⌋ = O(log n).
Removal -
Remove the top element
Replace with last key in the heap.
Begin the downheap.
Downheap
Compares the parent with the smallest child.
If the child is smaller, switch the two.
Keep going.
Stops when the key is smaller than or equal to the keys of both its children, or the bottom of the heap
is reached.
Insertion
Add key into the next available position.
Begin upheap.
Upheap
Similar to downheap.
Swap parent-child keys out of order.
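The insertion (upheap) and removal (downheap) steps above can be sketched with an array-based min-heap (index 0 is left unused so the rank formulas left = 2i, right = 2i + 1, parent = ⌊i/2⌋ apply directly):

```java
import java.util.ArrayList;

// Array-based min-heap illustrating upheap (on insert) and
// downheap (on removeMin), using 1-based ranks.
public class MinHeap {
    private final ArrayList<Integer> a = new ArrayList<>();
    public MinHeap() { a.add(null); } // rank 0 unused
    public int size() { return a.size() - 1; }
    public int min() { return a.get(1); }

    public void insert(int key) {
        a.add(key);                    // add at next available position
        int i = size();
        while (i > 1 && a.get(i / 2) > a.get(i)) { // upheap: swap while parent larger
            swap(i, i / 2);
            i = i / 2;
        }
    }
    public int removeMin() {
        int top = a.get(1);
        a.set(1, a.get(size()));       // replace root with last key in the heap
        a.remove(size());
        downheap(1);
        return top;
    }
    private void downheap(int i) {
        while (2 * i <= size()) {
            int c = 2 * i;             // pick the smallest child
            if (c + 1 <= size() && a.get(c + 1) < a.get(c)) c = c + 1;
            if (a.get(i) <= a.get(c)) break; // heap order restored: stop
            swap(i, c);
            i = c;
        }
    }
    private void swap(int i, int j) { int t = a.get(i); a.set(i, a.get(j)); a.set(j, t); }
}
```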
Regular Heap Construction - We could insert the n items one at a time with a sequence of heap insertions,
∑_{i=1..n} O(log i) = O(n log n). But we can do better with bottom-up heap construction.
Bottom-up Heap Construction (WILL be a question on midterm and/or final exam!)
Idea: recursively rearrange each subtree in the heap starting with the leaves.
Switch starting from the bottom, work your way up to bigger subtrees and rearrange them.
To construct a heap bottom-up, construct a tree with the number of nodes you need to insert.
Start from the back of the array, and start adding from the right of the subtree, filling all the
leaves.
These are all okay. Move on to the right of the level and fill in the entire level (again,
working backwards from the array). Rearrange those subtrees.
Continue to the next level. Fill, rearrange, etc.
Number of Swaps: How many nodes at level i? There are 2^i, and each does at most h − i swaps, so the
total is ∑_{i=0..h} 2^i (h − i) = O(n).
Complexity is O(n), which is the best we can possibly get.
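The idea above — downheap each internal node, starting from the last one and working back to the root — can be sketched on a plain array (0-based indexing here, unlike the 1-based ranks used elsewhere in the notes):

```java
// Bottom-up min-heap construction on a plain array: downheap each
// internal node from the last one back to the root. Total work is O(n).
public class BottomUpHeap {
    public static void buildMinHeap(int[] a) {
        for (int i = a.length / 2 - 1; i >= 0; i--) // last internal node first
            downheap(a, i);
    }
    private static void downheap(int[] a, int i) {
        int n = a.length;
        while (2 * i + 1 < n) {
            int c = 2 * i + 1;                      // left child (0-based ranks)
            if (c + 1 < n && a[c + 1] < a[c]) c++;  // pick the smaller child
            if (a[i] <= a[c]) break;                // heap order holds: stop
            int t = a[i]; a[i] = a[c]; a[c] = t;
            i = c;
        }
    }
}
```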
Heaps are implemented the same way as binary trees.
Heap Sort - Cost is O(n log n)
Heap Sort In Place
Phase 1 - We build a max-heap to occupy the whole structure.
Phase 2 - We start the "sequence" part empty and we grow it by removing, at each step, the max value
from the heap and by adding it to the "sequence" part, always maintaining the heap properties for the
"heap" part.
Take the root and switch it with the rightmost leaf of the heap part. Reheap. Keep going along the last
row the same way: switch with root, reheap, etc.
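The two phases above can be sketched as follows (0-based array indexing here):

```java
import java.util.Arrays;

// In-place heap sort: phase 1 builds a max-heap over the whole array
// (bottom-up); phase 2 repeatedly swaps the root (max) with the last
// position of the heap part, shrinks the heap, and reheaps.
public class HeapSort {
    public static void sort(int[] a) {
        for (int i = a.length / 2 - 1; i >= 0; i--)  // phase 1: bottom-up build
            downheap(a, i, a.length);
        for (int end = a.length - 1; end > 0; end--) {
            int t = a[0]; a[0] = a[end]; a[end] = t; // max joins the sorted part
            downheap(a, 0, end);                     // reheap the shrunken heap
        }
    }
    private static void downheap(int[] a, int i, int n) {
        while (2 * i + 1 < n) {
            int c = 2 * i + 1;
            if (c + 1 < n && a[c + 1] > a[c]) c++;   // larger child (max-heap)
            if (a[i] >= a[c]) break;
            int t = a[i]; a[i] = a[c]; a[c] = t;
            i = c;
        }
    }
    public static void main(String[] args) {
        int[] a = {5, 1, 4, 2, 3};
        sort(a);
        System.out.println(Arrays.toString(a)); // [1, 2, 3, 4, 5]
    }
}
```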
Maps and Dictionaries
Map - multiple items with the same key are NOT allowed
Dictionary - multiple items with the same key ARE allowed
Unordered Sequence
Searching and removing take O(n) time.
Inserting takes O(1) time.
Applications to log files (frequent insertions, rare searches and removals)
Ordered Sequence - Array-based
Searching takes O(log n) with binary search.
Inserting and removing take O(n) time.
Applications to look-up tables (frequent searches, rare insertions and removals).
Start in the middle; if the key we're searching for is larger, go to the right. If it's smaller, go to the
left.
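The search just described can be sketched iteratively:

```java
// Iterative binary search over a sorted array: start in the middle;
// if the target is larger go right, if smaller go left. O(log n).
public class BinarySearch {
    public static int search(int[] a, int key) {
        int lo = 0, hi = a.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;        // avoids int overflow of lo + hi
            if (a[mid] == key) return mid;
            else if (a[mid] < key) lo = mid + 1; // key is larger: go right
            else hi = mid - 1;                   // key is smaller: go left
        }
        return -1; // not found
    }
}
```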
Binary Search Trees
A binary tree such that keys stored in the left subtree of a node v are less than the key of v, and keys
stored in the right subtree are greater than or equal to the key of v.
External nodes do not hold elements but serve as place holders.
Inorder traversal gives you the keys in increasing order
Complexity:
o Worst case: O(n), where all keys are to the right or to the left.
o Best case: O(log n), where leaves are on the same level or on an adjacent level.
Insertion
Always insert at the end of the search tree based on the correct order
If you're adding a double, always insert to the right, not to the left of the tree.
Deletion
If it’s at the end, you can just remove it.
If it's not at the end, replace it with the next in the inorder traversal.
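The insertion and search conventions above (smaller keys left, duplicates always to the right) can be sketched with a minimal linked BST (the Node helper class is illustrative):

```java
// A minimal linked binary search tree: smaller keys to the left, keys
// greater than OR EQUAL to the right (doubles go right, per the notes).
public class BST {
    static class Node { int key; Node left, right; Node(int k) { key = k; } }
    private Node root;

    public void insert(int key) { root = insert(root, key); }
    private Node insert(Node n, int key) {
        if (n == null) return new Node(key);     // always insert at the bottom
        if (key < n.key) n.left = insert(n.left, key);
        else n.right = insert(n.right, key);     // doubles go to the right
        return n;
    }
    public boolean contains(int key) {
        Node n = root;
        while (n != null) {
            if (key == n.key) return true;
            n = (key < n.key) ? n.left : n.right;
        }
        return false;
    }
    public String inorder() {                    // gives keys in increasing order
        StringBuilder sb = new StringBuilder();
        inorder(root, sb);
        return sb.toString().trim();
    }
    private void inorder(Node n, StringBuilder sb) {
        if (n == null) return;
        inorder(n.left, sb);
        sb.append(n.key).append(' ');
        inorder(n.right, sb);
    }
}
```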
AVL Trees
AVL trees are balanced.
They are binary search trees such that, for every internal node v of T, the heights of the children of v
can differ by at most 1.
Height of an AVL tree is always O(log n).
The height satisfies h < 2 log₂ n + 2.
Insertion
Balanced - if for every node v, the heights of v's children differ by at most 1.
If the tree becomes unbalanced, we need to rebalance.
Rebalance
o Identify three nodes (grandparent z, parent y, child x) and the 4 subtrees attached to them.
o Find the unbalanced node: this is the grandparent z; its taller child is the parent y, and the
taller child of y is x.
Choice of x is not unique (when the heights are equal).
o Identify the four subtrees, left to right, as T0, T1, T2, and T3.
o The first of x, y, z in the inorder traversal is a, the second is b, and the third is c. Node b
becomes the root of the restructured subtree, with a and c as its children and T0–T3 reattached in order.
Removal
Remove the same way as in a binary search tree. However, this may cause an imbalance.
Complexity
Searching, inserting, and removing are all O(log n); that's what makes AVL trees so nice.
(2,4) Trees
A (2,4) tree is a multi-way search tree with the following properties:
o Node size property: every internal node has at most four children
o Depth property: all external nodes have the same depth
Can't have more than four children or fewer than two children.
Depending on the number of children, an internal node of a (2,4) tree is called a 2-node, 3-node,
or 4-node.
Searching in a (2,4) tree with n items takes O(log n) time.
Min number of items: when all internal nodes have 1 key and 2 children: n = 2^h − 1, so h ≤ log₂(n + 1).
Maximum number of items: when all internal nodes have three keys and four children:
n = ∑_{i=0..h−1} 3·4^i = 4^h − 1, so h ≥ log₄(n + 1).
Insertion
Insert similar to a binary search tree: insert at the bottom after searching for where it goes.
May cause overflow, since you're only allowed three keys in one node.
o Take the third key, send it up to the parent, and make new nodes out of the first two keys (a
3-node) and the fourth key (a 2-node).
Insertion takes O(log n) time.
Deletion
Replace the deleted key with the inorder successor
Can cause underflow; might need to fuse nodes together to fix this. To handle an underflow at
node v with parent u, we consider two cases:
o Case 1: the adjacent siblings of v are 2-nodes.
Fusion operation: we merge v with an adjacent sibling w and move an item from u to
the merged node v'.
After a fusion, the underflow may propagate to the parent u.
o Case 2: an adjacent sibling w of v is a 3-node or a 4-node.
Transfer operation:
1. We move a child of w to v
2. We move an item from u to v
3. We move an item from w to u
After a transfer, no underflow occurs.
Deleting takes O(log n) time. Note that searching, inserting, and deleting all take O(log n) time.
Hash Tables
Problem A / Address Generation: Construction of the hash function h. It needs to be simple to calculate,
and must uniformly distribute the elements in the table. For all keys k, h(k) is the position of k in the
table. This position is an integer. Also, h(k1) = h(k2) if k1 = k2. Searching for a key and inserting a key
(all dictionary ADT operations) takes O(1) expected time. We have the function h(k) = h2(h1(k)) where we
have the two following sub-functions:
1. Hash code map
o They reinterpret a key as an integer. They need to:
i. Give the same result for the same key
ii. Provide a good "spread"
o Polynomial accumulation
We partition the bits of the key into a sequence of components of fixed length:
a0, a1, ..., an−1.
Then we evaluate the polynomial p(z) = a0 + a1·z + a2·z² + ... + an−1·z^(n−1) at a fixed value
z, ignoring overflows. It can be evaluated in O(n) with Horner's rule.
Especially suitable for strings.
o Examples
Memory address (we reinterpret the memory address of the key object as an integer,
this is the default hash code for all Java objects)
Integer cast (Reinterpret the bits of the key as an integer)
Component sum (We partition the bits of the key into components of fixed length and
we sum the components).
2. Compression map
o They take the output of the hash code and compress into the desired range.
o If the result of the hash code was the same, the result of the compression map should be
the same.
o Compression maps should maximize "spread" so as to minimize collisions
o Examples
Division: h2(y) = y mod N, where N is usually chosen to be a prime number (number
theory).
Multiply Add Divide (MAD): h2(y) = (a·y + b) mod N, where a and b are nonnegative
integers such that a mod N ≠ 0.
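Both stages above — a polynomial hash code (via Horner's rule) followed by a MAD compression map — can be sketched together. The constants z = 33, a = 3, b = 7, N = 101 are illustrative choices, not values fixed by the notes:

```java
// Polynomial accumulation hash code plus a MAD compression map.
public class HashDemo {
    // Hash code: evaluate a0 + a1*z + a2*z^2 + ... with Horner's rule,
    // ignoring int overflow, as described in the notes.
    public static int polyHash(String key, int z) {
        int h = 0;
        for (int i = 0; i < key.length(); i++)
            h = h * z + key.charAt(i);
        return h;
    }
    // Compression map (MAD): h2(y) = (a*y + b) mod N, with a mod N != 0.
    public static int mad(int y, int a, int b, int N) {
        int r = (int) (((long) a * y + b) % N); // long math avoids overflow
        return r < 0 ? r + N : r;               // keep the result in [0, N)
    }
    public static void main(String[] args) {
        int code = polyHash("cat", 33);
        int cell = mad(code, 3, 7, 101);
        System.out.println("cell = " + cell);   // always in 0..100
    }
}
```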
Problem B / Collision Resolution: What strategy do we use if two keys map to the same location h(k)?
Load factor of a hash table: λ = n/N, where n is the number of elements and N is the number of cells. The
smaller the load factor, the better.
Linear Probing: We have h(k, i) = (h(k) + i) mod N, for i = 0, 1, ..., N − 1. Consider a hash table A that
uses linear probing. In order to search, we do the following:
findElement(k): We start at cell h(k), and we probe consecutive locations until one of the
following occurs:
o An item with key k is found
o An empty cell is found
o N cells have been unsuccessfully probed
To handle insertions and deletions, we introduce a special object, called AVAILABLE, which
replaces deleted elements.
removeElement(k): We search for an item with key k. If such an item is found, we
replace it with the special object AVAILABLE and we return the element e. Otherwise, we return NO_SUCH_KEY.
insertItem(k, e): We throw an exception if the table is full. We start at cell h(k). We probe
consecutive cells until one of the following occurs:
o A cell i is found that is either empty or labelled AVAILABLE
o N cells have been unsuccessfully probed
We store item (k, e) in cell i.
Performance of Linear Probing: Average number of probes for a successful search is (1/2)(1 + 1/(1 − λ)),
where λ is the loading factor. Even if it's 90% full, on average you'll find it in 5 or 6 tries, which is very
good. In the worst case, hashing is no better than AVL: when clustering occurs, it's O(n).
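The findElement / removeElement / insertItem outlines above can be sketched for String keys (the table size of 7 and the sentinel marker are illustrative choices):

```java
// Linear probing with an AVAILABLE marker for deleted slots.
public class LinearProbingTable {
    private static final String AVAILABLE = "\0AVAILABLE\0"; // deleted-slot marker
    private final String[] keys;
    public LinearProbingTable(int capacity) { keys = new String[capacity]; }
    private int home(String k) { return Math.abs(k.hashCode() % keys.length); }

    public int find(String k) {                  // returns cell index, or -1
        int i = home(k);
        for (int probes = 0; probes < keys.length; probes++) {
            if (keys[i] == null) return -1;      // empty cell: not present
            if (keys[i].equals(k)) return i;     // item with key k is found
            i = (i + 1) % keys.length;           // probe the next cell
        }
        return -1;                               // N cells probed unsuccessfully
    }
    public void insert(String k) {
        int i = home(k);
        for (int probes = 0; probes < keys.length; probes++) {
            // stop at a cell that is either empty or labelled AVAILABLE
            if (keys[i] == null || keys[i] == AVAILABLE) { keys[i] = k; return; }
            i = (i + 1) % keys.length;
        }
        throw new IllegalStateException("table is full");
    }
    public boolean remove(String k) {
        int i = find(k);
        if (i < 0) return false;
        keys[i] = AVAILABLE;                     // mark, don't empty, the cell
        return true;
    }
}
```

Marking deleted cells AVAILABLE instead of emptying them matters: a search must keep probing past them, otherwise keys inserted after a collision would become unreachable.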
Quadratic Probing: We have h(k, i) = (h(k) + i²) mod N. The problem with this is that the modulo is
harder to calculate and it only visits half of the table, but it's not a big deal. Similar problem: you avoid
linear clustering, but every key that's mapped to the same cell will follow the same path. There's more
distributed clustering, called secondary clustering, which should be avoided.
Double Hashing: We have h(k, i) = (h(k) + i·d(k)) mod N, where h is the primary hashing function
and d is the secondary hashing function (with d(k) ≠ 0).
Bubble Sort
You literally bubble up the largest element: move from the front to the end, bubbling the largest
value to the end using pairwise comparisons and swapping. Once you go through the whole array,
you start from the first element again and go through it the same way.
In order to detect that an array is already sorted so we don't have to go through it again
unnecessarily, we can use a boolean flag. If no swaps occurred, we know that the collection is
already sorted. The flag needs to be reset after each "bubble up".
Complexity is O(n²); with the flag, the best case (already sorted) is O(n).
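The bubble-up passes and the early-exit flag above can be sketched as:

```java
import java.util.Arrays;

// Bubble sort with the early-exit boolean flag: each pass bubbles the
// largest remaining value to the end; if a pass makes no swaps, the
// array is already sorted and we stop.
public class BubbleSort {
    public static void sort(int[] a) {
        boolean swapped = true;
        for (int end = a.length - 1; end > 0 && swapped; end--) {
            swapped = false;                 // reset the flag before each pass
            for (int j = 0; j < end; j++) {
                if (a[j] > a[j + 1]) {       // pairwise compare and swap
                    int t = a[j]; a[j] = a[j + 1]; a[j + 1] = t;
                    swapped = true;
                }
            }
        }
    }
    public static void main(String[] args) {
        int[] a = {4, 2, 5, 1, 3};
        sort(a);
        System.out.println(Arrays.toString(a)); // [1, 2, 3, 4, 5]
    }
}
```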
Recursive Sorts
Divide and Conquer paradigm:
Divide: divide the input data into two disjoint subsets