Introduction to Information Technology
ITEC 1000 – Winter 2010 – Peter Khaiter
Lecture 3 – Data Formats – Jan 18

Data Forms
- Human communication
    o Includes language, images and sounds.
- Computers
    o Process and store all forms of data in binary format.
- Conversion to computer-usable representation using data formats
    o Define the different ways human data may be represented, stored and processed by a computer.
- Proprietary formats
    o Unique to a product or company.
    o E.g. Microsoft Word, WordPerfect.
- Standards (evolve in two ways):
    o Proprietary formats become de facto standards (e.g. Adobe PostScript).
    o Invented by an international standards organization (e.g. Moving Picture Experts Group, MPEG).

Alphanumeric Data
- Characters (r, T), number digits (0-9), punctuation (!, ;), special-purpose characters ($, &).
- Four codes/standards to represent letters and numbers:
    o BCD (Binary-Coded Decimal).
    o Unicode.
    o ASCII (American Standard Code for Information Interchange).
    o EBCDIC (Extended Binary Coded Decimal Interchange Code).

ASCII Features
- Developed by ANSI (American National Standards Institute).
- Defined in ANSI document X3.4-1977
- 7-bit code
- 8th bit is unused (or used for a parity bit or to indicate an “extended” character set)
- 2^7 = 128 different codes
- Two general types of codes:
    o 95 are “Printing” codes (displayable on a console)
    o 33 are “Control” codes (control features of the console or communications channel)
- Represents
    o Latin alphabet, Arabic numerals, standard punctuation characters
    o Plus small set of accents and other European special characters (Latin-1 ASCII)

EBCDIC
- 8-bit code
- Developed by IBM
- IBM and compatible mainframes only
- Rarely used today (common in archival data)
    o Character codes differ from ASCII
- Conversion software to/from ASCII available

Unicode
- Most common 16-bit form represents 65,536 characters
- ASCII Latin-1 is a subset of Unicode
    o Values 0 to 255 in Unicode table
- Multilingual: defines codes for
    o Nearly every character-based alphabet
    o Large set of ideographs for Chinese, Japanese and Korean
    o Composite characters for vowels and syllabic clusters required by some languages
- Allows software modifications for local languages

Collating Sequence
- Collating sequence: the order of the codes in the representation table
- Determines sorting and selection of the alphanumeric data
- Collating sequences are different in ASCII and EBCDIC:
    o Small letters precede capitals in EBCDIC; reverse in ASCII
    o Numbers collate first in ASCII; in EBCDIC, last

Two Classes of Codes
- Printing characters
    o Produce output on the screen or printer
- Control characters
    o Control position of output on screen or printer
    o Cause action to occur
    o Communicate status between computer and I/O device

Alphanumeric Input: Keyboard
- Scan code
    o Two different binary scan codes are generated: one when the key is struck and one when it is released
    o Converted to Unicode, ASCII or EBCDIC by software in terminal or PC
    o Received by the host as a stream of text and other characters, i.e.
      in the sequence typed
- Advantages
    o Easily adapted to different languages or keyboard layouts
    o Separate scan codes for key press/release allow multiple-key combinations
        ▪ Examples: shift and control keys

OCR (Optical Character Recognition)
- Scans text and inputs it as character data
- Special OCR software required
- Used to read specially encoded characters
    o Example: magnetically printed check numbers
- Attempts to recognize hand-written input (limited, only carefully printed)

Bar Code Readers
- Used in applications that require fast, accurate and repetitive input with minimal employee training
- Examples: supermarket checkout counters and inventory control
- Alphanumeric data in a bar code (e.g., 780471 108801 90000) is read optically using a wand that converts it into electrical binary signals
- A bar code translation module converts the binary input into a sequence of number codes, one code per digit, which are then translated to Unicode or ASCII.

Other Alphanumeric Input
- Magnetic stripe reader – alphanumeric data from credit cards
- Voice
    o Digitized audio recording is common, but conversion to alphanumeric data is difficult
        ▪ Requires knowledge of sound patterns in a language (phonemes) plus rules for pronunciation, grammar, and syntax

Image Data
- Photographs, figures, icons, drawings, charts and graphs
- Two approaches:
    o Bitmap or raster images of photos and paintings with continuous variation (e.g., GIF, JPEG)
    o Object or vector images composed of graphical shapes like lines and curves defined geometrically
- Differences include:
    o Quality of the image
    o Storage space required
    o Time to transmit
    o Ease of modification

Image Input
- Image scanning (moves over the image, converting it dot by dot into a stream of binary numbers, pixels, representing black or white, levels of gray, or a colour) – bitmap image
- Digital/video cameras – bitmap image
- Pointing devices (mouse, pen) – object image

Bitmap Images
- Each individual pixel (picture element) in a graphic is stored as
  a binary number
    o Pixel: a small area with an associated coordinate location
    o Example: each point represented by a 4-bit code corresponding to 1 of 16 shades of gray

Bitmap Display
- Monochrome: black or white
    o 1 bit per pixel
- Gray scale: black, white or 254 shades of gray
    o 1 byte per pixel
- Color graphics: 16 colors, 256 colors, or 24-bit true color (16.7 million colors)
    o 4, 8, and 24 bits respectively

Storing Bitmap Images
- Frequently large files
    o Example: 600 rows of 800 pixels with 1 byte for each of 3 colors → 600 × 800 × 3 = 1,440,000 bytes, a ~1.5 MB file
- File size affected by
    o Resolution (the number of pixels per inch)
    o Amount of detail affecting clarity and sharpness of an image
- Levels: number of bits for displaying shades of gray or multiple colors
    o Palette: color
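
The storage arithmetic in the Storing Bitmap Images example (600 rows of 800 pixels, 1 byte for each of 3 colors) can be checked with a short Python sketch. The function name bitmap_size_bytes is a hypothetical helper, not part of the lecture, and the calculation assumes an uncompressed image with no file header, so real GIF or JPEG files will differ.

```python
def bitmap_size_bytes(rows, cols, bits_per_pixel):
    """Raw size of an uncompressed bitmap (assumption: no header, no compression)."""
    total_bits = rows * cols * bits_per_pixel
    return (total_bits + 7) // 8  # round up to a whole number of bytes


# The example from the notes: 600 rows of 800 pixels, 3 bytes (24 bits) per pixel.
print(bitmap_size_bytes(600, 800, 24))  # 1440000 bytes, roughly 1.5 MB

# The same 800 x 600 image at the bit depths listed under Bitmap Display.
depths = [("monochrome", 1), ("16 colors", 4), ("256 colors or gray scale", 8), ("true color", 24)]
for label, bits in depths:
    # 2 ** bits is the number of distinct values a pixel can take at this depth.
    print(label, 2 ** bits, "levels,", bitmap_size_bytes(600, 800, bits), "bytes")
```

Halving the bit depth halves the raw size, which is why the number of levels appears alongside resolution as a factor in file size.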
