# CS240 Lecture Notes - Binary Search Tree, AVL Tree, Radix Sort

Department: Computer Science
Course Code: CS240
Professor: Therese Biedl

Algorithm: a step-by-step process for carrying out a series of computations.
Program: an implementation of an algorithm.
The running time and memory requirements of a program depend on the hardware, OS, and compiler; efficiency is measured in time and memory used.
Worst-case: Tworst(n) = max { T(I) : |I| = n }
Best-case: Tbest(n) = min { T(I) : |I| = n }
Average-case: Tavg(n) = (1 / |{I : |I| = n}|) * sum of T(I) over all instances I with |I| = n
Notation / Definition:
f(n) is O(g(n)): there exist constants c > 0 and n0 > 0 s.t. for all n >= n0, 0 <= f(n) <= c*g(n)
f(n) is Omega(g(n)): there exist constants c > 0 and n0 > 0 s.t. for all n >= n0, 0 <= c*g(n) <= f(n)
f(n) is Theta(g(n)): there exist constants c1, c2 > 0 and n0 > 0 s.t. for all n >= n0, 0 <= c1*g(n) <= f(n) <= c2*g(n)
f(n) is o(g(n)): for all constants c > 0, there exists a constant n0 > 0 s.t. for all n >= n0, 0 <= f(n) < c*g(n)
f(n) is omega(g(n)): for all constants c > 0, there exists a constant n0 > 0 s.t. for all n >= n0, 0 <= c*g(n) < f(n)
x = log_a(b)  <=>  a^x = b
log_a(b) = log_c(b) / log_c(a)   (change of base)
log(a) + log(b) = log(a*b)
sum(i = 0 to n) i = n(n+1)/2
sum(i = 0 to n-1) q^i = (1 - q^n)/(1 - q)   (q != 1)
sum(i = 1 to n) 1/i = Theta(log n)
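The identities above can be sanity-checked numerically; a small sketch (all names local to this example):

```python
import math

# Change of base: log_a(b) = log_c(b) / log_c(a)
a, b, c = 2.0, 8.0, 10.0
assert math.isclose(math.log(b, a), math.log(b, c) / math.log(a, c))

# Product rule: log(a) + log(b) = log(a*b)
assert math.isclose(math.log(3) + math.log(5), math.log(15))

# Arithmetic series: sum_{i=0}^{n} i = n(n+1)/2
n = 100
assert sum(range(n + 1)) == n * (n + 1) // 2

# Geometric series: sum_{i=0}^{n-1} q^i = (1 - q^n)/(1 - q), q != 1
q = 0.5
assert math.isclose(sum(q**i for i in range(n)), (1 - q**n) / (1 - q))

# Harmonic series: sum_{i=1}^{n} 1/i grows like ln n, i.e. Theta(log n)
h = sum(1.0 / i for i in range(1, n + 1))
assert math.log(n) < h < math.log(n) + 1
```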
Priority queue realizations:
1. Unsorted array (gives selection sort): Insert O(1); ExtractMax O(n)
2. Sorted array (gives insertion sort): Insert O(n); ExtractMax O(1)
3. Heap: Insert O(log n); ExtractMax O(log n); build a PQ in O(n); little extra space used; gives sorting in O(n log n)
Heap: a binary tree that stores items with keys; all levels are full except possibly the last, which is filled from the left; the key of each parent is at least as large as its children's keys (max-heap); height is Theta(log n).
Advantage of storing it in an array: easier to implement, more space-efficient.
Indices (1-based): left(i) = 2i, right(i) = 2i + 1, parent(i) = floor(i/2)
Building heaps:
Solution 1: start with an empty heap, insert items one by one, and bubble up (i = 1 to n); worst-case time Theta(n log n).
Solution 2: heapify, which bubbles down (i = n/2 down to 1); worst-case time Theta(n).
Sorting using heaps (heapsort): O(n log n)
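Solution 2 (bubble-down heapify) can be sketched as follows; this uses 0-based indices, so the children of i are 2i+1 and 2i+2 rather than the 1-based formulas above:

```python
def bubble_down(A, i, n):
    """Sift A[i] down until the max-heap property holds below it."""
    while True:
        left, right = 2 * i + 1, 2 * i + 2   # 0-based children
        largest = i
        if left < n and A[left] > A[largest]:
            largest = left
        if right < n and A[right] > A[largest]:
            largest = right
        if largest == i:
            return
        A[i], A[largest] = A[largest], A[i]
        i = largest

def heapify(A):
    """Build a max-heap in place in Theta(n): bubble down nodes n/2 - 1 .. 0."""
    n = len(A)
    for i in range(n // 2 - 1, -1, -1):
        bubble_down(A, i, n)
```

After `heapify([2, 9, 7, 6, 5, 8])` the largest key sits at index 0.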
QuickSort running time:
Best case: the pivot splits the array into two equal halves: Theta(n log n)
Worst case: the pivot is the min or max: Theta(n^2)
Average case: splits into parts of size (k-1) and (n-k): Theta(n log n)
Auxiliary space: Theta(n) because of the recursion depth; recursing only on the smaller part (and looping on the larger) reduces the auxiliary space to O(log n).
A randomized algorithm uses random numbers in its execution, and its running time depends on them:
T_exp(x) = E[T(x, R)] = sum over R of T(x, R) * Pr[R]
QuickSort is the fastest comparison-based sorting algorithm in practice in the average case.
Linear-time sorting
Counting Sort: stable, but keys must be integers in {0, ..., k-1}; running time Theta(n + k), which is Theta(n) when k = O(n).
Radix Sort: elements have the same number of digits d, and each digit has a value in {0, ..., k-1}; running time Theta(d(n + k)), which is Theta(n) when k = O(n) and d is constant.
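A sketch of LSD radix sort built on a stable counting sort per digit (function names are illustrative; assumes non-negative integers):

```python
def counting_sort_by_digit(A, exp, base=10):
    """Stable counting sort of A by the digit at place value exp."""
    count = [0] * base
    for x in A:
        count[(x // exp) % base] += 1
    for d in range(1, base):          # prefix sums give end positions
        count[d] += count[d - 1]
    out = [0] * len(A)
    for x in reversed(A):             # reversed scan keeps the sort stable
        d = (x // exp) % base
        count[d] -= 1
        out[count[d]] = x
    return out

def radix_sort(A, base=10):
    """LSD radix sort: d passes of counting sort, Theta(d(n + k)) overall."""
    exp = 1
    while A and max(A) // exp > 0:
        A = counting_sort_by_digit(A, exp, base)
        exp *= base
    return A
```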
| Realization        | Insert    | Search    | Delete    |
|--------------------|-----------|-----------|-----------|
| Unordered array    | Θ(1)      | Θ(n)      | Θ(1)      |
| Ordered array      | Θ(n)      | Θ(log n)  | Θ(n)      |
| Binary search tree | Θ(height) | Θ(height) | Θ(height) |
Balanced node: a node is balanced if the heights of its two subtrees differ by at most one (an empty subtree is defined to have height -1).
AVL Tree: a binary search tree in which every node is balanced; the height of an AVL tree is O(log n).
Insert: O(log n), performs at most one rotation or double rotation.
Delete: O(log n), performs rotations as many times as needed.
Each rotation takes O(1).
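Why rotations are O(1): only a constant number of pointers change. A sketch of a single right rotation on a minimal node class (class and function names are illustrative):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate_right(y):
    """Right-rotate the subtree rooted at y; its left child x becomes
    the new subtree root. O(1): only two pointers are reassigned."""
    x = y.left
    y.left = x.right      # x's right subtree moves under y
    x.right = y
    return x              # caller re-attaches the new root
```

A left rotation is symmetric, and a double rotation is a rotation at the child followed by the opposite rotation at the node.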
B-Trees: a generalization of binary search trees.
- A node x stores k keys in non-decreasing order and references to its k + 1 children.
- The keys of the subtrees at the children lie between the corresponding boundary keys of x.
- All leaves are at the same level.
- B-tree of order M: each node has at most M - 1 keys (and therefore at most M children); each non-root node has at least ceil(M/2) - 1 keys (and therefore at least ceil(M/2) children).
A B-tree of order M with n keys has height Theta(log_ceil(M/2) n) = Theta(log n).
Search time: O(height * log_2 M) = O(log_ceil(M/2) n * log_2 M) = O(log n).
Insert: put the key in a leaf; if the leaf is full (i.e., already has M - 1 keys), preemptively split it into two nodes of about ceil(M/2) - 1 and floor(M/2) keys and move the median up to the parent.
Delete: if the key is in a leaf, delete it there; if it is in an internal node, swap with a key in a leaf, then delete; if a leaf drops below ceil(M/2) - 1 keys, replenish:
- Transfer: borrow from a sibling via the parent (the sibling's key goes up to the parent, and the parent's key comes down into the node).
- Merge: if a transfer is not possible, merge with a sibling and bring the parent's separating key down.
The smallest value of M is 4.
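Search descends one node per level, doing a binary search among the node's keys to pick the child between the surrounding boundary keys. A sketch on a minimal node representation (class and field names are illustrative):

```python
import bisect

class BNode:
    def __init__(self, keys, children=None):
        self.keys = keys                 # sorted list of at most M-1 keys
        self.children = children or []   # len(keys)+1 children, or [] for a leaf

def btree_search(node, key):
    """Descend the B-tree; ~log2(M) comparisons per node via binary search."""
    while node is not None:
        i = bisect.bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return True
        if not node.children:            # reached a leaf without finding key
            return False
        node = node.children[i]          # child between the boundary keys
    return False
```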
Hashing:
- Keys come from a universe {0, ..., U-1}; the table has M buckets (slots).
- Distinct keys hashing to the same slot cause collisions.
- Direct addressing: space O(U); insert, delete, and search take Theta(1).
- Uniform hashing: make every slot equally likely.
- Interpret keys as natural numbers. Division method: h(k) = k mod M, with M a prime. Multiplication method: h(k) = floor(M * (kA - floor(kA))), where 0 < A < 1 is a suitable constant.
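Both hash functions are one-liners; a sketch with a common (illustrative) choice of A, the inverse golden ratio:

```python
import math

M = 11                        # table size: a prime, for the division method
A = (math.sqrt(5) - 1) / 2    # ~0.618..., a popular choice of 0 < A < 1

def h_division(k):
    """Division method: h(k) = k mod M."""
    return k % M

def h_multiplication(k):
    """Multiplication method: h(k) = floor(M * (kA - floor(kA)))."""
    frac = (k * A) % 1.0      # fractional part of k*A
    return int(M * frac)
```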
Dealing with collisions
Chaining: each bucket is an unsorted linked list; the load factor is alpha = n/M, where n is the number of elements. The expected chain length is alpha, so the expected search time is Theta(alpha + 1). Keep the load factor constant; no explicit rehashing is needed.
Open addressing: all keys are stored in the table itself, following a probe sequence; keep the load factor below 1. Rehashing is required more often. Trade-off: fast computation (linear probing) vs. better average time (double hashing). Deletion is problematic, but no space is wasted on pointers.
- Linear probing: h(k, i) = (h(k) + i) mod M; easy to compute, but long runs of occupied buckets tend to build up (primary clustering).
- Quadratic probing: h(k, i) = (h(k) + c1*i + c2*i^2) mod M; fairly easy to compute; choose c1, c2 carefully so the probe sequence is a permutation. No primary clustering, but two keys with the same hash value still get the same probe sequence.
- Double hashing: h(k, i) = (h1(k) + i*h2(k)) mod M; h2 must be chosen carefully to avoid clustering; not too hard to compute in practice.
Deleting tags the slot as "deleted"; over time this requires rehashing to avoid an over-accumulation of deleted cells.
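Why deletion needs the "deleted" tag: an emptied slot would cut probe sequences short. A small linear-probing sketch with lazy deletion (all names illustrative):

```python
M = 11
EMPTY, DELETED = object(), object()    # sentinel tags for slot states
table = [EMPTY] * M

def probe(k, i):
    """Linear probing: h(k, i) = (h(k) + i) mod M, with h(k) = k mod M."""
    return (k % M + i) % M

def insert(k):
    for i in range(M):
        slot = probe(k, i)
        if table[slot] in (EMPTY, DELETED):   # deleted slots are reusable
            table[slot] = k
            return slot
    raise RuntimeError("table full; rehash needed")

def search(k):
    for i in range(M):
        slot = probe(k, i)
        if table[slot] is EMPTY:    # a truly empty slot ends the probe
            return None
        if table[slot] == k:        # DELETED slots are skipped, not stopped at
            return slot
    return None

def delete(k):
    slot = search(k)
    if slot is not None:
        table[slot] = DELETED       # lazy deletion: tag, don't empty
    return slot
```

If `delete` set the slot back to `EMPTY`, a later `search` for a key further down the same probe run would stop early and miss it.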
Average search cost in linear probing: successful 1/2 * (1 + 1/(1-alpha)); unsuccessful 1/2 * (1 + 1/(1-alpha)^2)
Average search cost in double hashing: successful (1/alpha) * ln(1/(1-alpha)); unsuccessful 1/(1-alpha)
Primary clustering: the phenomenon where blocks of consecutive non-empty buckets are likely to grow ever larger; it slows down linear probing (long probe sequences).
In practice, hashing is usually preferred over balanced trees; avoid patterns in the input data.
Predecessor: rightmost node in the left subtree. Successor: leftmost node in the right subtree.
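Both are simple walks once you are at the node; a BST sketch assuming the relevant subtree exists (class and function names are illustrative):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def predecessor(node):
    """Rightmost node in the left subtree (assumes node.left exists)."""
    cur = node.left
    while cur.right is not None:
        cur = cur.right
    return cur

def successor(node):
    """Leftmost node in the right subtree (assumes node.right exists)."""
    cur = node.right
    while cur.left is not None:
        cur = cur.left
    return cur
```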
Extendable hashing:
- Hashing in external memory (we want to minimize the number of memory-block transfers): build a hash table that can grow into more external memory without rehashing everything.
- Problems with ordinary hashing in external memory:
Open addressing incurs many block transfers
Chaining spreads a bucket over more than one block
Rehashing is expensive
Hash function h: U -> {0,1}^L (each hash value is a binary sequence of length L)
Parts:
- Directory (root): a hash table with references to leaf blocks, indexed by the d leading bits of the hash value.
- Leaves: leaf blocks in external memory, each holding at most S keys (keys are ordered within a leaf).
The directory (root) has size 2^d (d is the order), where d <= L.
Local depth k: the keys within a block agree on their first k bits, and all keys with those k leading bits are in that block.
For a block with local depth k, 2^(d-k) directory entries reference that block.
Limit rule: if lim (n -> infinity) f(n)/g(n) = L, then
f(n) is o(g(n)) if L = 0
f(n) is Theta(g(n)) if 0 < L < infinity
f(n) is omega(g(n)) if L = infinity
Preorder: root-L-R
Inorder: L-root-R
Postorder: L-R-root
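The three traversal orders can be sketched on trees represented as nested `(key, left, right)` tuples (a representation chosen just for this example):

```python
def preorder(t):
    """Root - Left - Right; t is (key, left, right) or None."""
    if t is None:
        return []
    key, left, right = t
    return [key] + preorder(left) + preorder(right)

def inorder(t):
    """Left - Root - Right; on a BST this yields keys in sorted order."""
    if t is None:
        return []
    key, left, right = t
    return inorder(left) + [key] + inorder(right)

def postorder(t):
    """Left - Right - Root."""
    if t is None:
        return []
    key, left, right = t
    return postorder(left) + postorder(right) + [key]
```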
Extendable hashing operations: Insert: insert normally, or:
- Block is full (= S keys) and k < d: split the block into two, separating keys according to their first k+1 bits; update the directory references and increase the local depth k of both blocks; repeat until nothing overflows (at most until k = d).
- Block is full (= S keys) and k = d: double the size of the directory (d = d + 1), update references, then split as above.
Disadvantage: some space is wasted, but with uniformly distributed hash values the pages are expected to be about 69% full.
The directory is much smaller than the number of stored keys, so it may fit in main memory; most operations then access O(1) leaf pages. To get more space, add a block; we rarely need to copy the directory, and we never move all items in the dictionary.
GallopSearch(x, A, left, right):   (fragment)
    ...
    else if right - left = 1 and A[left] = x then return left
    else
        step = 1
        while left + step <= right do
            if A[left + step] < x then
                step = step * 2                            (galloping: double the step)
            else
                return GallopSearch(x, A, left + step/2, left + step)
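A runnable version of the idea in the fragment above, a sketch that replaces the final recursive call with a binary search over the last doubling interval (function name and structure are illustrative):

```python
import bisect

def gallop_search(x, A):
    """Find the index of x in sorted A: double the step (gallop) until we
    overshoot, then binary-search the last interval. O(log i) time, where
    i is the position of x."""
    if not A or A[0] > x:
        return None
    step = 1
    while step < len(A) and A[step] < x:
        step *= 2                              # galloping phase
    lo, hi = step // 2, min(step, len(A) - 1)  # x, if present, lies in [lo, hi]
    i = bisect.bisect_left(A, x, lo, hi + 1)
    return i if i < len(A) and A[i] == x else None
```

Galloping beats a plain binary search when the target sits near the front of a long array.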