CS431 Chapter 7: Mutable Table

16 views24 pages

whitemammoth538

17 Apr 2019

School

University of Waterloo

Department

Computer Science

Course

CS431

Professor

Adam Roegiest

For unlimited access to Textbook Notes, a Class+ subscription is required.

Chapter 7 – Mutable Table

The Fundamental Problem - We want to keep track of mutable state in scalable manner

Assumptions

State organized in terms of logical records

•

State unlikely to fit on single machine, must be distributed

•

Use RDBMS

Relational model with schemas

•

Powerful, flexible query language

•

Transactional semantics: ACID

•

Rich ecosystem, lots of tool support

•

NoSQL for scalability of flexibility

Horizontally scale “simple operations”

Replicate/distribute data over many servers

Simple call interface

Weaker concurrency model than ACID

Efficient use of distributed indexes and RAM

Flexible schemas

NoSQL - core ideas

Partitioning (sharding) - increase scalability and to decrease latency

Do this on distributive search engine

Replication - to increase robustness (availability) and to increase throughput

Caching - to reduce latency

Key-Value Stores: Data Model

Stores associations between keys and values

Keys are usually primitives

Int, strings, raw bytes

○

Values can be primitive or complex

Tl;dr

Single row transactions

Easy to implement, obvious limitations

Implement global consensus protocol for every transaction

Guarantee consistency, slow

Eventual consistency

Local machine + external machine have to reach consensus

Entity groups -groups of entities that share affinity

User + user's photos + user's posts

Google spanner

Timestamp - global way, T1 will always happen before T21.

Full ACID translations across multiple datacenters, across continents! External

consistency (= linearizability):

system preserves

happens-before

relationship among transactions

How?

Given write transactions A and B, if A happens before B then timestamp (A)

< timestamp (B)

Chapter 7 -Mutable Table

Friday, April 5, 2019

21:51

Unlock document

This preview shows pages 1-3 of the document.
Unlock all 24 pages and 3 million more documents.

Already have an account? Log in

Tl;dr

Single row transactions

Easy to implement, obvious limitations

Implement global consensus protocol for every transaction

Guarantee consistency, slow

Eventual consistency

Local machine + external machine have to reach consensus

Entity groups -groups of entities that share affinity

User + user's photos + user's posts

Google spanner

Timestamp - global way, T1 will always happen before T21.

Full ACID translations across multiple datacenters, across continents! External

consistency (= linearizability):

system preserves

happens-before

relationship among transactions

How?

Given write transactions A and B, if A happens before B then timestamp (A)

< timestamp (B)

Unlock document

This preview shows pages 1-3 of the document.
Unlock all 24 pages and 3 million more documents.

Already have an account? Log in

Values can be primitive or complex

Primitives: ints, strings, etc.

○

Complex: JSON, HTML fragments, etc

○

Key-Value Stores: Operations

Get and put

Multi-get, multi-put

Range queries, secondary index lookups

Key-value Stores: Implementation

Non-persistent: In-memory hash table <- may lose all your data in memory,

Persistent: Wrapper around a traditional RDBMS

What if data doesn’t fit on a single machine - partition the data, hash the key

○

Simple Solution: Partition:

Partition the key space across multiple machines

Hash partitioning, for n machines, store key k at machine h(k) mod n

○

Q: how do we know which physical machine to contact?

Quasi-idea - hash the machine

○

Each machine holds pointers to predecessor and successor

Gets routed to correct one in o(n) hops

Each machine holds pointers to predecessor and successor + "finger table" (+2, +4, +8)

Service Entry - one node holds

The key-service entry

Query the service registry and return the node

Latency not so long

Common model

If there's a new node, the entry of the new node will be passed on to each node by gossip

Replicate: plus 1 and minus 1 case

Circles don’t fail, adjust things appropriately

Replication in predecessor and successor node

Virtual Nodes

Don’t directly hash servers

Create a large number of virtual nodes, map to physical servers

Better load redistribution in event of machine failure

When new server joins, evenly shed load from other servers

Tl;dr

Single row transactions

Easy to implement, obvious limitations

Implement global consensus protocol for every transaction

Guarantee consistency, slow

Eventual consistency

Local machine + external machine have to reach consensus

Entity groups -groups of entities that share affinity

User + user's photos + user's posts

Google spanner

Timestamp - global way, T1 will always happen before T21.

Full ACID translations across multiple datacenters, across continents! External

consistency (= linearizability):

system preserves

happens-before

relationship among transactions

How?

Given write transactions A and B, if A happens before B then timestamp (A)

< timestamp (B)

Unlock document

This preview shows pages 1-3 of the document.
Unlock all 24 pages and 3 million more documents.

Already have an account? Log in

CS431 Chapter 7: Mutable Table

Get access