CS431 Chapter Notes - Chapter 8: Graph Partition, Big Data, Mysql
Document Summary
Schimmy: separate the two dataflows, shuffle only the messages. Basic idea: merge join between graph structure and messages. To coarsen the graph you need to identify dense local regions. To identify dense local regions quickly you to need traverse local edges. But to traverse local edges efficiently you need th e local structure! Computational model based on bulk synchronous parallel (bsp) Receives messages directed at it from previous superstep. Emits messages to other vertices (for the next superstep) Is woken up if new messages received computation halts when all vertices are inactive. Vertices are hash partitioned (by default) and assigned to workers. Master tells all workers to advance a single superstep overhead to ensure everything works correctly. Worker delivers messages from previous superstep, executing vertex computation. Messages sent asynchronously (in batches) - not perserving order. Worker notifies master of number of active vertices. Handle i/o reading and writing the graph computation/messaging of assigned partitions.