Compsci Final Exam review.docx

17 Pages
277 Views
Unlock Document

Department
Computer Science
Course
Computer Science 1011A/B
Professor
Andrew Roberts
Semester
Winter

Description
Compsci Exam review Data Structures - Data Structure: organization of data on a storage medium (stores addresses of data) o Exact organization affects how efficiently operations can be done - Most common operations: 1. Add data to data structure 2. Search for data 3. Modify data 4. Delete data - Storage medium: something that stores bytes (ex. Memory chip, hard disk, cd…) o Every byte has an address (single number)  16 bit addresses = 2^16 bytes  32 bit addresses = > 4 million bytes (4gb) o For a block of memory with M bytes:  The first byte has an address of 0 and the last one has an address of (M-1) - Contiguous data structure: one possible data representation o Represented by length of item followed by the data for that item o Addition/deletion of element takes a long time o Immediately after the next item then occurs - Linked list data structure: every item has its own block of memory at any arbitrary location - Addition/deletion of item is much quicker - At the end of first item, you address the byte of the next item (next item addresses item of ending of previous item) - Pointer representation: shows Linked list data structure visually (addresses are seen as “pointers”) o \ at end is a null which terminates sequence o Could be used to decode Huffman codes Trees – allow for fast searching and sorting of data - Consist of: o Keys: the content we already know Inside a Node o Value: what we are searching for - When searching for data in a linked list you use Lexicographic order o String A is less than string B if A would be before B in a dictionary o String A is greater than string B if A would be after B in a dictionary  If you have Niger and Nigeria, then Niger would come first as it is in alphabetical order - Searching for data in linked list: o Start at first node (set currentPtr = startPtr) o While key in current node is less than k: o (set currentPtr to the nextPtr inside the node pointed to by currentPtr) o If key in current node is equal to k, success [Else (k is greater than key in current node): k can’t be in list] o To find average you add up how many steps it takes to locate each key and then divide by the number of keys (Arg=1 + Bol=2 + · · · + ven=13) / 13, i.e. 7 approximately n/2) where n = number of items in list Binary Search Trees - Every node has a key, value and two pointers o Left pointer points to subtree with keys less than that in the original node o Right pointer points to subtree with keys greater than that in original node - Finding the key: k o If key in current node is greater than k, move down and left (set currentPtr to theleftPtr inside the node) o If key in current node is less than k, move down and right (set currentPtr to the rightPtr inside the node) o To find the average: you take the # of steps it took to get to your destination * the # of nodes on that level of the tree  Average = (1*1 + 2*2 + 3*4 + 4*6)/ 13 - How many nodes can fit on each level? o In a binary tree with N levels, each level can hold: 2^N – 1 nodes  4 levels = 15 ( 2^4 – 1) Logarithms - Log base 2 = the inverse of 2 to the power of… - N= 2^m then m = log2(n) (log base 2 of n) o Log2(64) = 6, and log2(32) = 5 - To find the average of a balanced binary tree it is much more efficient to take log2(n) (n = number of items on tree) o Better than with a linked list Terminology - Root node = first node on tree (has no parents) - Leaf node = last nodes on tree (have no children) - Height of tree = maximum distance from root to leaf (in a Balanced binary search tree, the heights of the left and right sub-trees of a node differ by at most 1 Two other forms of trees 1. Tries (does not have to be binary, a node can have more than two children) - Each node contains a character (child nodes contain different character than one before it) - Child nodes are in lexicographical order (alphabetical order) - Node may be marked - Path from root to marked node contains a word associated with the node (net, newt, next …etc) - More efficient when: o Many children per node (dense branching) o Characters to search for come in one at a time - typically used for Google search – predicts what you are searching for as you type 2. Heaps (good for quickly sorting data in a different order) – A binary tree in which: - Every node has a smaller value than its parent - Tree is balanced (left and right nodes differ in height by at most 1) - Population is now the key and the name is the Data - No left to right ordering - Algorithm for Heap (most efficient sorting algorithm yet): (n nodes in a heap) o Put countries in heap where pop is the key (n · log2(n) operations) o Remove node and put it next in list o reform the heap (log2(n) operations) - Algorithm for re-forming heap (example in class where country took node with higher population and reformed the heap tree moving each country up to the previous spot of the old one) o While some node has children but no country:  See which child of that node has higher population  Take country from child node and give it to parent node Google - Founded by Larry Page and Sergey Brin o Marissa Mayer was VP until she left to become the CEO of Yahoo (2012)  “google is easy, making google is hard.” Mayer’s 4 key components to search: 1. Comprehensiveness (need to crawl and index large percentage of web) 2. Relevance (rank search results to put those most useful to users first) 3. Speed (must do everything quickly) 4. User experience (need to be easy and enjoyable) Web-Crawling (Comprehensiveness) - Many web crawlers, each one: o Visits multiple pages by following links o Caches copy of each page (saves it on google) and extracts the information  Extracted info makes searches faster and assigns importance to pages o When nothing to do the web-crawlers visit a queue of pages and puts every link on those pages into the queue o News sites and popular ones are crawled more frequently than others o Stores each page found in cache with a docID Information Extraction (speed) – one of main factors contributing to googles growth - One Hit: small package of data about word occurrence in a document (place in an inverted index) o The Doc ID o The word (unique ID) o Position: number of bytes from start of document that the occurrence happens o Emphasis: was word in capital letters, in a heading, in a bigger font? Page Rank Algorithm - Assigns importance rankining to each web document o Importance based on pages the pages that certain page is linked to - Invented by larry paige - Key to googles success - Example: 1. Assign equal rank to each document (red) 2. Split up rank evenly over all the links going out from the document (blue) 3. Delete old ranks 4. Recomputate ranks of documents as the sum of ranks coming into that certain document over the links 5. Store new ranks and repeat as many times as you want 6. Rankings eventually settle down to an equilibrium state (eventual ranking reflects true importance of documents) Google’s User experience – Home page - Simple - Contains: o HTML links, PNG (default logo), JPEG (logo, sometimes), animated GIF (logo), Javascipt (logo or search bar) Google Text Completion (uses a Trie tree) - For every character you type: o Javascript sends a query to google web server o Google returns a list of likely completions for your word - Pros: o Enhances user experience o Directs users to already prepared lists o Gives google a headstart on searching Ranking Search Results - Builds list of items that have all your search terms using page rank - Uses a heap to sort docs with highest score first Protocol – regional/international standards on sequences of bits sent to convey data o Basis of reliability of internet communication o Implemented in software + hardware Process of communicating a message on The Internetz 1. Tell http (hypertext transfer protocol) what you want a. HTTP: specializes in transferring web documents – Depends on TCP and IP 2. Http then tells the TCP (transmission control protocol) to send the request/message a. TCP reliably transfers data on internet 3. TCP breaks down request/data/message into blocks and transfers those to the IP (Internet Protocol) a. IP: simple flexible protocol for transferring data quickly 4. IP sends data in the right order to correct TCP where message is being delivered 5. TCP conveys it to correct HTTP Internet Protocol – basis of the internet - All machines on internet implement IP - Quickly routes short messages or “packets” - Unreliable o Doesn’t guarantee delivery of IP packets  Described as a datagram service (a datagram is a comp. protocol message that might not be delivered) - IP Packets: o total length is 2^16-1 (65,535 bytes) o contains IP packet header  destination address (32 bits) (where packet is going)  source address – 32 bits (numeric ip address of sender)  total length of packet – 16 bits (o – 2^16 -1)  version of IP being used (first 4 bits – usually 0100) o Contains IP payload  Actual stuff the user needs to transport (between 0 and 65,515 bytes (65,535 – 20)) - Process of IP packet o Sender fills out header + payload, then sends out packet o Packet is verified at every hop or stop (computer or router)  Routers receive packets, look at destination address and forward them o Forwarded to destination where it is copied into memory - Packets are dropped when: 1. There is a transmission error (different values in header than computer value) 2. If “time to live” TTL has been exceeded a. Packet header contains TTL (8 bits) b. At every hop 1is subtracted from TTL c. If TTL reaches 0 then packet is dropped 3. Traffic congestion a. Some packets have same destination address (must wait for one before to be sent) b. Some packets may be overwritten in memory by new incoming packets - IP payload consists of TCP Segment (maximum of 65,515 bytes) o Contains segment header (20 and 60 bytes) Consists of :  Destination port (16 bits) – port on receiving computer to direct segments to  Source port (16 bits) – port on sending computer  Sequence number (32 bits) – identifier of segment in a sequence of segments o Contains TCP Payload (0 – 65,495) (65,515 – 20 bytes) o Two Main types of segments:  Null Payload segments (segments with 0 bytes in payload) to help TCP agents communicate  User Segments (segments with >0 bytes in payload) with actual user data Transmission Control Protocol – handles disassembly, transmission, retransmission and reassembly of data - Controls flow of data on internet - Consists of TCP agent (TCP agents send TCP segments to other agents) - Allows computers to detect when packets are dropped and to regulate how fast/slow packets are sent - When a program wants to send data: o Breaks data up into TCP segments o Sends each segment as payload in separate IP packet o Each segment could travel by a different route - TCP’s main job (make sure all segments are received and reassembled in the correct order) Transmitting long messages - For a message of 200,00 bytes the TCP breaks it up as 3 segments with 60,000 bytes in payload and one with 20,000 bytes in payload (sends in order) - Sequence number in each segment is first byte in the segment (segment 0 = sequence number (0), segment 1: sequence number = 60,000) - Receiver looks at all segments received if and creates an ACK number (a number confirming it has all segments based on their sequence numbers) and sends it back to sender - Sender looks at ACK number, if correct then its good however if it’s not it re-sends all data that was missing Pros and cons of TCP and IP - Pro: Will transfer data very reliably - Pro: Guarantees no corruption or loss of data if all segments are received - Con: sometimes quick delivery loses reliable delivery - Con: sometimes over wireless many IP packets are lost HTTP (Hypertext Transfer Protocol) - Developed by Tim Berners Lee - The TCP payload carries a HTTP message Terms - User agent : e.g. a Web browser program; something used to carry out user’s wishes - Resource : anything someone might want to get from another computer - Web server : program running on a host, accepting requests from user agents for resources - Client: word used for a user agent when emphasizin
More Less

Related notes for Computer Science 1011A/B

Log In


OR

Join OneClass

Access over 10 million pages of study
documents for 1.3 million courses.

Sign up

Join to view


OR

By registering, I agree to the Terms and Privacy Policies
Already have an account?
Just a few more details

So we can recommend you notes for your school.

Reset Password

Please enter below the email address you registered with and we will send you a link to reset your password.

Add your courses

Get notes from the top students in your class.


Submit