false

Class Notes
(835,107)

Canada
(508,933)

University of Waterloo
(18,569)

Computer Science
(798)

CS 240
(49)

Therese Biedl
(26)

Lecture

Unlock Document

Computer Science

CS 240

Therese Biedl

Fall

Description

Dictionary as: Unsorted array: Multi-Dimensional data Dictionaries for text strings
Claim: sorting items by their probabilities of Storage: Binary Tries: time O(# of bits in x) space O( sum
access in non-increasing order minimizes the cost -Combine multi-dimensional key into one key: of the bits in all keys) Insert: 1- leaf: no action is
of Search. impossible to search by one aspect of keys required 2- internal: add nodes or ( - no bits left :
Proof Idea: If not in the right order, then if we - Use dictionary for each dimension: wastes space nothing - bits left :add )
exchange two items which are in the wrong -Use dictionaries for multi-dimensional data: Fix prefix issue: - allow interior nodes to store
order, then the cost decreases. quad, k-d, range keys – add $ at end of every word
Move-To-Front heuristic: Upon a successful Running Time 1D: Equal(n) reportall keys , Tries for String: alphabet sigma, store multiway
search, bring the element forward to the front. O(k+logn) where k is # of reported elements tree, each node has up to sigma + 1 children,
(used when not knowing prob ahead of time)The Quadtrees: Consider a square that encloses all might have unnecessary node
cost of shift is already dealt with in the search, points .Break it into four quadrants.Break each Compressed Tries: eliminate nodes with only one
works well in practice quadrant into quadrants recursively until every child , & nodes store an index indicating the
Sorted Array: square contains at most one point. Worst Case character it splits by
Interpolation Search: when dividing quadrants but not efficiently
Idea: take advantage of the value of the key to [x.x+x’/2] x [y,y+y’/2] , Range search Equal(n) String Matching
determine its location even if empty. very easy to compute & handle , Applications: information retrieval, compression, data-
mining, classification
Steps: - search(x) in array in index range[L,R] space wasteful , they don’t reflect the nodes, they
-compute ration t = (A[R]-x)/(A[R]- A[L]) just divide into quadrants Naïve alg (brute Force) : worst input P= (a^m-1)*b , T = a^n
- compare result with key at floor(t*L + (1-t)*R) KdTrees: kd-trees split rectangle into two part of , worst case performance O (n-m+1)*m , start with first char
and decide the part to recurse in median of the point in it. Build kd time in pattern and skip by one in Tif miss match
Run Time: O(n) worst case but if keys are random Equal(nlogn) height Equal(logn) assume no two Boyer-Moore ( Matching P backwards ) alg: heuristics :
then O(loglogn) search cost points with same x,y coordinates, fix by breaking - Character jump: check char occurrence in pattern & shift
- Suffix shift: check suffix occurrence in pattern & shift
Gallop Search: ties according to the other coordinate, costly after
Good for large arrays The number of comparison insert & delete causes unbalance, so do periodic No improvement in worst case to naïve P=ba..a T=aa.aa
is still O (logn) (we have at most doubled the rebalancing. Range search O(k+G) & G is the # of Worst case running time O(m*(n-m+1) + |alphabet| )
number of comparisons compared to binary regions unsuccessfully & not fully in R. # of On typical English text the alg probes 30% of chars in T
search. region that intersect a line satisfies recurrence KMP alg: Failing fxn run time O(m) , KMP run time : 1-
Ordered Lists: relation Q(n)=2*Q(n/4)+1 , and solves to increase I & j by one, 2- decrease j by atleast one, 3- increase
Insert/delete: O(1) , search O(n) ( no BS) Q(n)=O(root(n)) I by one .(1) and (3) execute in time at most n, and (2)
Skip Lists: Complexity RSKD O(k+root(n)), not that easy but executes no more often than (1),hence a total of 2n
Idea: each key (node) has more than one link ( we no bad cases like quadtree comparison. Hence total run time O(m+n)
make binary search work for lists , Nodes are Range Search: Each item will be stored Suffix tree: idea: Preprocess text T rather than pattern P.
ordered in increasing order and have multiple repeatedly O(logn) times. Use BST inside BS, Observation: P is a substring of T ifand only if P is a prefix
of some suffix of T. build a compressed trie that stores all
varied links . Expected time for all operations is space O(nlogn)
Equal(log n). In practice, performsbetter Insert point by x, from leaf walk back up & insert suffixes of text T. -insert suffixes in decreasing order of
than balanced BSTs. P[#links =i] = 1/ 2^I , lower point in subtrees of the nodes. Problem length.- if a suffix is a prefix of another suffix, we do not
or increase the dummy node heightwhen needed rebalancing at most liner time . Range search insert it. Store two indexes l , r on each node v (both
O(k+log^2(n)) internal nodes and leaves) where node v corresponds to
Naïve Boyer-Moore KMP Orthogonal Range Search : we specify a range for substring T[l ..r ]. How: goes throu the tree looking for
Preprocess P - O(m+|sigma|) O(m) - Brute-Force prefix matching P , if node reached is less than M then
Preprocess T - - - certain O(n^2)s of the Naive(P[0..m −1],T[0..n − 1])
Search P in T O(n*m) O(n*m) O(n+m) O(m) for i 0 to n − m do unsuccessful, only check for at least M for successful search
Extra

More
Less
Related notes for CS 240

Join OneClass

Access over 10 million pages of study

documents for 1.3 million courses.

Sign up

Join to view

Continue

Continue
OR

By registering, I agree to the
Terms
and
Privacy Policies

Already have an account?
Log in

Just a few more details

So we can recommend you notes for your school.

Reset Password

Please enter below the email address you registered with and we will send you a link to reset your password.