CSE 150 Lecture Notes - Lecture 16: State Space, Indep

Markov Decision Processes (MDPs)
Definition:
-State space S with states s ∈ S
-Action space A with actions a ∈ A
-Transition probabilities for ALL state-action pairs (s, a):
P(s'|s,a) = P(s_{t+1} = s' | s_t = s, a_t = a) ← prob. of moving s → s' after taking action a
Assumptions on probs
1. P(s_{t+1} | s_t, a_t) = P(s_{t+1} | s_t, a_t, s_{t-1}, a_{t-1}, ..., s_1, a_1) (conditional indep.: the next state depends only on the current state and action)
2. P(s_{t+1} = s' | s_t = s, a_t = a) = P(s_{t+1+k} = s' | s_{t+k} = s, a_{t+k} = a) for ALL integers k (time-independent function)
Reward function
R(s, s', a) = real-valued reward AFTER taking action a in state s and landing in state s'
Simplifications for CSE 150
1.) state AND action spaces are discrete and finite
2.) reward function R(s, s', a) = R(s) = R_s depends ONLY on the current state!
3.) rewards are bounded: |R_s| < ∞ for ALL states (and deterministic)
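A minimal sketch of how such a finite MDP can be written down as plain arrays (the 3-state, 2-action numbers below are invented for illustration and are not from the lecture): P[a, s, s'] holds P(s'|s,a) and R[s] holds the state-only reward R_s.

```python
import numpy as np

n_states, n_actions = 3, 2          # discrete, finite S and A (simplification 1)

# P[a, s, s'] = P(s' | s, a); every row over s' must sum to 1.
P = np.zeros((n_actions, n_states, n_states))
P[0] = [[0.8, 0.2, 0.0],
        [0.0, 0.9, 0.1],
        [0.0, 0.0, 1.0]]
P[1] = [[0.1, 0.9, 0.0],
        [0.0, 0.5, 0.5],
        [0.0, 0.0, 1.0]]

# R[s] = R_s: bounded, deterministic, and a function of the current state
# only (simplifications 2 and 3).
R = np.array([0.0, 0.0, 1.0])

# Transition probabilities must be defined for ALL state-action pairs (s, a).
assert np.allclose(P.sum(axis=2), 1.0)
```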
Example: board-game w/ dice
S = board positions AND rolls of dice
A = set of possible “moves”
P(s'|s,a) = {HOW the state changes due to the agent's move, the opponent's roll of the dice, the opponent's move, and the agent's roll of the dice}
R(s) = { +1 win
-1 lose
0 ALL other states }
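A tiny sketch of the game's reward as code, assuming each state can be tested for a win/loss outcome; the string encoding ("WIN"/"LOSE") is an illustrative choice, not something given in the notes.

```python
def reward(outcome: str) -> int:
    """R(s): +1 on a winning state, -1 on a losing state, 0 everywhere else."""
    if outcome == "WIN":
        return +1
    if outcome == "LOSE":
        return -1
    return 0  # ALL other (mid-game) states

print(reward("WIN"), reward("LOSE"), reward("MIDGAME"))   # 1 -1 0
```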
Decision-making
-policy: “deterministic” mapping/assignment of states to actions
π: S → A
# policies: |A|^|S|, exponentially large in the size of the state space!
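To make the count |A|^|S| concrete, here is a sketch (with a made-up tiny S and A) that enumerates every deterministic policy π: S → A explicitly; even at |S| = 3, |A| = 2 there are already 2^3 = 8 of them, which is why brute-force enumeration does not scale.

```python
from itertools import product

S = [0, 1, 2]                     # toy state space
A = ["left", "right"]             # toy action space

# Each deterministic policy assigns exactly one action to every state.
policies = [dict(zip(S, choice)) for choice in product(A, repeat=len(S))]

print(len(policies), len(A) ** len(S))    # 8 8  -> |A|^|S| policies
```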
Dynamics under policy π
P(s’|s,π(s)) ← action under π in state s
Experience under policy π
State   s_0 --a_0 = π(s_0)--> s_1 --a_1 = π(s_1)--> s_2 --> ...
Reward  r_0 = R(s_0)          r_1 = R(s_1)          r_2 = R(s_2)
How to measure accumulated rewards over time?
Discount factor 0 ≤ γ < 1
Long-term discounted return = Σ_{t=0}^∞ γ^t r_t
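Putting the last few ideas together, here is a self-contained sketch (all numbers invented for illustration) that rolls out a fixed deterministic policy π, collects rewards r_t = R(s_t), and accumulates the discounted return Σ γ^t r_t; a finite horizon T stands in for the infinite sum, which is a reasonable approximation because 0 ≤ γ < 1 makes the tail vanish.

```python
import numpy as np

rng = np.random.default_rng(0)

P = np.array([[[0.8, 0.2],        # P[a, s, s'] = P(s' | s, a)
               [0.1, 0.9]],
              [[0.5, 0.5],
               [0.0, 1.0]]])
R = np.array([0.0, 1.0])          # R[s]: reward depends only on the current state
pi = {0: 1, 1: 0}                 # deterministic policy pi: S -> A
gamma, T = 0.9, 100               # discount factor and truncation horizon

s, ret = 0, 0.0
for t in range(T):
    ret += (gamma ** t) * R[s]            # add gamma^t * r_t
    a = pi[s]                             # a_t = pi(s_t)
    s = rng.choice(2, p=P[a, s])          # s_{t+1} ~ P(. | s_t, a_t)

print(ret)                                # approximate long-term discounted return
```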