CS431 Chapter Notes - Chapter 2: MapReduce, Memory Management, Instruction Set

Document Summary

Hadoop is still lower-level than we would like. Example: find the top 10 most visited pages in each category. Pig is a higher-level language built on MapReduce, and it is slower. Take advantage of Pig as glue code for scale-out plumbing. We have a need to design a data processing language "from scratch". Data-parallel dataflow languages: Pig, DryadLINQ, FlumeJava, Spark. Want to apply a bunch of operations to compute some result. Assumption: static collection of records (what's the limitation?). However, map alone solves only some classes of problems. Problem: writing to HDFS is always slow, which is the price of resiliency. Chaining map to map has to go through HDFS. A fundamental con of reduce-to-reduce: the data has to be read back and fed into reduce. Size of input is not known until runtime. DryadLINQ: .NET constructs for combining imperative and declarative programming. The program is compiled into computations that run on Dryad.
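
The "top 10 most visited pages in each category" example can be sketched as a small dataflow: join visits with page categories, group by category and URL, count, then keep the top n per category. The following is a minimal single-machine sketch in plain Python, not Pig or Hadoop code. The function name, the record layouts `(user, url)` and `(url, category)`, and the sample data are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def top_pages_per_category(visits, page_categories, n=10):
    """Return {category: [(url, visit_count), ...]} with at most n URLs each.

    visits          -- iterable of (user, url) records (assumed layout)
    page_categories -- iterable of (url, category) records (assumed layout)
    """
    # "Join" step: build a lookup from URL to category.
    category_of = dict(page_categories)

    # "Group and count" step: one Counter of URL visit counts per category.
    counts = defaultdict(Counter)
    for _user, url in visits:
        category = category_of.get(url)
        if category is None:
            continue  # inner-join semantics: skip visits to uncategorized pages
        counts[category][url] += 1

    # "Top n" step: most visited URLs within each category.
    return {cat: counter.most_common(n) for cat, counter in counts.items()}


# Hypothetical sample data.
sample_visits = [
    ("amy", "cnn.com"),
    ("amy", "bbc.com"),
    ("bob", "cnn.com"),
    ("bob", "espn.com"),
    ("cat", "x.org"),  # no category, dropped by the join
]
sample_pages = [
    ("cnn.com", "news"),
    ("bbc.com", "news"),
    ("espn.com", "sports"),
]

top = top_pages_per_category(sample_visits, sample_pages)
```

Each step here (join, group, count, top n) is one operation in the dataflow. Expressed directly in MapReduce, the same steps would need several chained jobs, which is the kind of plumbing a higher-level dataflow language is meant to hide.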
