CS431 Chapter Notes - Chapter 2: MapReduce, Memory Management, Instruction Set

18 views, 20 pages

whitemammoth538

17 Apr 2019

School

University of Waterloo

Department

Computer Science

Course

CS431

Professor

Adam Roegiest


Document Summary

Hadoop is still lower-level than we would like. Example: find the top 10 most visited pages in each category. Pig is a higher-level language built on MapReduce, at the cost of some speed. Take advantage of Pig as glue code for scale-out plumbing. There is a need to design a data processing language "from scratch".

Data-parallel dataflow languages: Pig, Dryad (DryadLINQ), Flume (FlumeJava), Spark. Want to apply a bunch of operations to compute some result. Assumption: a static collection of records (what's the limitation?).

However, map only solves some classes of problems. Problem: writing to HDFS is always slow, which is the price of resiliency. Chaining map-map has to go through HDFS. For reduce-reduce, the data has to be read back and fed into the next reduce, which is a fundamental drawback. The size of the input is not known until runtime.

DryadLINQ: .NET constructs for combining imperative and declarative programming. The program is compiled into computations that run on Dryad.
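
The "top 10 most visited pages in each category" example can be read as a short chain of dataflow operations: count, join, group, then take the top k. Below is a minimal single-machine sketch in plain Python, not Pig or Hadoop code. The function name `top_pages_per_category` and the input shapes (`visits` as `(user, url)` pairs, `pages` as `(url, category)` pairs) are assumptions made for illustration and do not come from the notes.

```python
from collections import Counter, defaultdict


def top_pages_per_category(visits, pages, k=10):
    """Return the k most visited URLs for each category.

    visits: iterable of (user, url) pairs      (hypothetical input shape)
    pages:  iterable of (url, category) pairs  (hypothetical input shape)
    """
    # Step 1: group visits by URL and count them.
    visit_counts = Counter(url for _user, url in visits)

    # Step 2: join the counts with the url -> category table.
    # URLs with no visits are dropped, as in an inner join.
    by_category = defaultdict(list)
    for url, category in pages:
        if url in visit_counts:
            by_category[category].append((url, visit_counts[url]))

    # Step 3: within each category, sort by descending count
    # (ties broken by URL) and keep the first k rows.
    return {
        category: sorted(rows, key=lambda r: (-r[1], r[0]))[:k]
        for category, rows in by_category.items()
    }
```

In hand-written MapReduce each of these steps would typically be its own job, with intermediate results written to HDFS in between. A dataflow language lets the same pipeline be stated as a few operations and leaves the job plumbing to the system.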
