CSE 150 Lecture Notes - Lecture 16: State Space, Indep
Markov Decision Processes (MDPs)
Definition:
-State space S with states s ∈ S
-Action space A with actions a ∈ A
-Transition probabilities for ALL state-action pairs (s,a)
P(s'|s,a) = P(s_{t+1} = s' | s_t = s, a_t = a) ← prob of moving s → s' after taking action a
Assumptions on probs
1. P(s_{t+1}|s_t,a_t) = P(s_{t+1}|s_t,a_t,s_{t-1},a_{t-1},...,s_1,a_1) (conditional indep. — the Markov property: the next state depends only on the current state and action)
2. P(s_{t+1} = s'|s_t = s, a_t = a) = P(s_{t+1+k} = s'|s_{t+k} = s, a_{t+k} = a) for ALL integers k (time-independent, i.e. stationary, transition probabilities)
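Both assumptions can be made concrete in a small sketch: a single lookup table P, indexed only by the current (state, action) pair, is enough to sample the next state at every time step. The two-state MDP and its action names here are made up for illustration.

```python
import random

# Hypothetical 2-state MDP with actions 'stay' and 'go'.
# P[(s, a)] is the distribution over next states s'.
# Markov property: sampling s_{t+1} needs only (s_t, a_t), no earlier history.
# Time-independence: the SAME table is consulted at every time step t.
P = {
    (0, 'stay'): {0: 0.9, 1: 0.1},
    (0, 'go'):   {0: 0.2, 1: 0.8},
    (1, 'stay'): {0: 0.1, 1: 0.9},
    (1, 'go'):   {0: 0.7, 1: 0.3},
}

def step(s, a, rng=random):
    """Sample s' ~ P(. | s, a)."""
    dist = P[(s, a)]
    return rng.choices(list(dist), weights=list(dist.values()))[0]
```

Note that `step` takes no time index and no history argument; that signature *is* the two assumptions.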
Reward function
R(s,s’,a) = real-valued reward AFTER taking action a
in state s
and landing in state s’
Simplifications for CSE 150
1.) state AND action spaces are discrete and finite
2.) reward function R(s,s',a) = R(s) = R_s depends ONLY on the current state!
3.) rewards are bounded |R_s| < ∞ for ALL states (and deterministic)
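Under these simplifications, a complete MDP is just a finite table of numbers: a |A|×|S|×|S| transition array and a length-|S| reward vector. A minimal sketch (the particular probabilities and sizes are made up):

```python
# Finite MDP under the course simplifications:
# discrete finite S and A, deterministic bounded reward R(s).
S = 3  # number of states, indexed 0..S-1
A = 2  # number of actions, indexed 0..A-1

# P[a][s][s2] = P(s2 | s, a); each row is a distribution over next states.
P = [
    [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0]],  # action 0
    [[1.0, 0.0, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 0.0, 1.0]],  # action 1
]

# R[s] = R(s): reward depends only on the current state (simplification 2),
# and is a bounded deterministic number (simplification 3).
R = [0.0, 0.0, 1.0]

# Sanity check: every P(.|s,a) sums to 1.
for a in range(A):
    for s in range(S):
        assert abs(sum(P[a][s]) - 1.0) < 1e-9
```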
Example: board-game w/ dice
S = board positions AND rolls of dice
A = set of possible “moves”
P(s'|s,a) = {how the state changes due to the agent's move, the opponent's roll of the dice,
the opponent's move, and the agent's roll of the dice}
R(s) = { +1  if s is a winning state
         -1  if s is a losing state
          0  ALL other states }
Decision-making
-policy: “deterministic” mapping/assignment of states to actions
π: S → A
# policies: |A|^|S| ← exponentially large in state space!
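The count |A|^|S| follows because a deterministic policy independently picks one of |A| actions for each of the |S| states. For a toy MDP the whole policy space can be enumerated (state and action names here are invented):

```python
from itertools import product

# Hypothetical tiny MDP: 3 states, 2 actions.
states  = ['s0', 's1', 's2']
actions = ['left', 'right']

# Each policy pi: S -> A is one assignment of an action to every state.
policies = [dict(zip(states, assignment))
            for assignment in product(actions, repeat=len(states))]

# |A|^|S| = 2^3 = 8 deterministic policies.
assert len(policies) == len(actions) ** len(states)
```

With realistic state spaces this enumeration is hopeless, which is exactly why MDP algorithms avoid searching over policies directly.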
Dynamics under policy π
P(s'|s,π(s)) ← action under π in state s
Experience under policy π
State   s_0 → a_0 = π(s_0) → s_1 → a_1 = π(s_1) → ...
Reward  r_0 = R(s_0)         r_1 = R(s_1)
How to measure accumulated rewards over time?
Discount factor γ with 0 ≤ γ < 1
Long-term discounted return = Σ_{t=0}^∞ γ^t r_t
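Because rewards are bounded and 0 ≤ γ < 1, the infinite sum converges, and truncating it after T steps costs at most on the order of γ^T. A minimal sketch of the discounted return on a (finite prefix of a) reward sequence:

```python
# Discounted return of a reward stream r_0, r_1, ...:
# sum over t of gamma^t * r_t. With bounded rewards and 0 <= gamma < 1
# the infinite series converges; truncation error shrinks like gamma^T.
def discounted_return(rewards, gamma):
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total

# Constant reward r_t = 1 approaches the geometric-series limit 1/(1 - gamma).
approx = discounted_return([1.0] * 200, gamma=0.9)
```

For γ = 0.9 the limit 1/(1 − γ) is 10, and 200 terms already land within ~10⁻⁸ of it.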