CISC320 Lecture Notes - Lecture 14: Substring, Computer Forensics, The Algorithm
Document Summary
Input description: text string t of length n, pattern string p of length m. Problem: find the first (or all) instances of pattern p in text t. Check at each position of t for the pattern p. ((n m + 1) m) complexity worst case. Running time equals time to match strings since there is no preprocessing. Rabin and karp proposed an algorithm that works well for strings as well as other related objects. Uses preprocessing (m) time and worst-case runtime is ((n m + 1) m) The algorithm is based on hashing: we compute a hash function on p and every m-character substring of t. If these two strings are identical, hash values will be the same. Hash values will usually be different if strings are different. Given a pattern p = p[1 m] and a text t[1 n]: Let us denote the decimal value of the length m substring ts = t[s+1 s+m] for s=0, n-m.