CS61A Lecture 13: Strings and Sequence Processing

Strings are an Abstraction
  Representing data: '200', '1.2e-5', 'False', '(1, 2)'
  Representing language and words
  Representing programs

String Literals Have Three Forms
  Single-quoted
  Double-quoted
  Triple-quoted, which may span multiple lines
  Single-quoted and double-quoted strings are equivalent
  Literals may contain non-Latin characters
  A backslash "escapes" the following character
  The "line feed" character (\n) represents a new line

Strings are Sequences
  An element of a string is itself a string, but with only one character
  Use eval to evaluate a string as a Python expression
  Count non-overlapping occurrences of a substring with .count

String Membership Differs from Other Sequence Types
  The "in" and "not in" operators match substrings, not just single elements
  When working with strings, we usually care about words more than characters
  The count method also matches substrings

Encoding Strings

American Standard Code for Information Interchange (ASCII)
  The layout was chosen to support sorting by character code
  Rows 2-5 are a useful 6-bit (64 element) subset
  Control characters (top two rows) were designed for transmission

Unicode Standard
  109,000 characters
  93 scripts (organized)
  Enumeration of character properties, such as case
  Supports bidirectional display order
  A canonical name for every character, e.g. U+0058 LATIN CAPITAL LETTER X

UTF-8 Encoding
  UTF: UCS (Universal Character Set) Transformation Format
  Unicode: correspondence between characters and integers
  UTF-8: correspondence between integers and bytes
  A byte is 8 bits and can encode any integer from 0 to 255
  Variable-length encoding: integers vary in the number of bytes required to encode them
  In Python: the length of a string is measured in characters, the length of a bytes object in bytes

Sequence Problems
  Sum the even members of the first n Fibonacci numbers
  List the letters in the acronym for a name, which includes the first letter of each capitalized word

Mapping a Fu
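
The string behaviors and the two sequence problems outlined in these notes can be tried out with a short Python sketch. This is only an illustration: the sample strings and the helper names fib, sum_even_fibs, and acronym are assumptions made for this example, and the Fibonacci sequence is assumed to start 0, 1, 1, 2, ...

```python
import unicodedata

# Strings are sequences: an element of a string is a one-character string.
city = 'Berkeley'
first = city[0]                        # 'B', itself a string of length 1

# Membership on strings matches substrings; on lists it matches elements.
in_string = 'here' in "Where's Waldo?"  # True: 'here' is a substring
in_list = [2, 3] in [1, 2, 3, 4]        # False: [2, 3] is not an element

# .count counts non-overlapping occurrences of a substring.
pairs = 'aaaa'.count('aa')             # 2, not 3

# eval evaluates a string as a Python expression.
pair = eval('(1, 2)')                  # the tuple (1, 2)

# Unicode maps characters to integers; UTF-8 maps those integers to bytes.
code_point = ord('X')                  # 0x58, i.e. U+0058
name = unicodedata.name('X')           # 'LATIN CAPITAL LETTER X'
greeting = '您好'
encoded = greeting.encode('utf-8')     # each of these characters takes 3 bytes


def fib(k):
    """Return the k-th Fibonacci number, assuming fib(0) == 0, fib(1) == 1."""
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a


def sum_even_fibs(n):
    """Sum the even members of the first n Fibonacci numbers."""
    return sum(f for f in map(fib, range(n)) if f % 2 == 0)


def acronym(name):
    """List the first letter of each capitalized word in name."""
    return [word[0] for word in name.split() if word[0].isupper()]


print(len(greeting), len(encoded))     # characters vs. bytes
print(sum_even_fibs(20))
print(acronym('University of California Berkeley'))
```

Both helpers are written as a map over a sequence followed by a filter and a final aggregation, which is one common way to phrase sequence problems like these; other formulations (explicit loops, for instance) work equally well.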
