CSE 150 Lecture Notes - Lecture 18: Markov Decision Processes, Discounting, Policy and Value Iteration

Policy Iteration
Thm: policy iteration converges in a finite # of steps to an optimal policy π*
Pros/cons:
(+) converges “fairly” quickly
(-) each step requires solving linear equations
O(n³) per step for an MDP with n states
Q: What to do if n is “prohibitively” large?
Answer 2: Compute V*(s) directly
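As a concrete illustration of the O(n³) evaluation step, here is a minimal sketch of one policy-iteration step in Python/NumPy, assuming a small made-up MDP stored as a transition tensor P[a, s, s'] = P(s'|s,a) and a reward vector R(s); none of these numbers come from the lecture, they are placeholders only.

import numpy as np

# Hypothetical 3-state, 2-action MDP (illustrative numbers, not from the notes).
n_states, n_actions = 3, 2
gamma = 0.9                                   # discount factor γ
R = np.array([0.0, 1.0, 10.0])                # reward R(s)
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))  # P[a, s, s'] = P(s'|s,a)

def policy_iteration_step(pi):
    # Policy evaluation: solve (I - γ P_pi) V = R, the O(n³) linear system.
    P_pi = P[pi, np.arange(n_states), :]      # row s is P(·|s, pi(s))
    V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R)
    # Policy improvement: Q(s,a) = R(s) + γ Σ_{s'} P(s'|s,a) V(s'), then argmax over a.
    Q = R[:, None] + gamma * np.einsum('asp,p->sa', P, V)
    return np.argmax(Q, axis=1), V

pi = np.zeros(n_states, dtype=int)            # start from an arbitrary policy
while True:                                   # converges in a finite number of steps
    new_pi, V = policy_iteration_step(pi)
    if np.array_equal(new_pi, pi):
        break
    pi = new_pi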
Optimality in MDPs
Thm: there is (at least) one "optimal" policy π* for which Vπ*(s) ≥ Vπ(s) for ALL policies π and ALL states s
Optimal value functions (definitions):
V*(s) := Vπ*(s)
Q*(s,a) := Qπ*(s,a)
Given π*(s), we can compute V*(s) by solving linear equations!
V*(s) = R(s) + γ Σ_{s'} P(s'|s,π*(s)) V*(s')
And we can compute
Q*(s,a) = R(s) + γ Σ_{s'} P(s'|s,a) V*(s')
Given V*(s), can we recover π*(s)? YES
How? Compute Q*(s,a) using the equation above
Then π*(s) = argmax_a Q*(s,a)
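This recovery step is short enough to spell out in code; the sketch below assumes the same hypothetical P[a, s, s'] and R(s) arrays as the earlier policy-iteration sketch.

import numpy as np

def extract_policy(V_star, P, R, gamma):
    # Q*(s,a) = R(s) + γ Σ_{s'} P(s'|s,a) V*(s')
    Q_star = R[:, None] + gamma * np.einsum('asp,p->sa', P, V_star)
    # π*(s) = argmax_a Q*(s,a)
    return np.argmax(Q_star, axis=1)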
Value iteration
How to compute V*(s) “directly”?
V*(s) = max_a Q*(s,a)
      = max_a [R(s) + γ Σ_{s'} P(s'|s,a) V*(s')]
      = R(s) + γ max_a [Σ_{s'} P(s'|s,a) V*(s')]    [Bellman optimality equations]
for s = 1 to n, where n = # of states in the MDP
Bellman Optimality Equations:
n equations (s = 1 to n)
n unknowns (V*(s), s = 1 to n)
These equations are NONLINEAR (because of the max over actions)
How to solve?
Algorithm for value iteration
(1) Initialize V_0(s) = 0 for all s   // array of guessed values, initialized to 0
(2) Iterate for s = 1 to n:
V_{k+1}(s) = R(s) + γ max_a [Σ_{s'=1}^{n} P(s'|s,a) V_k(s')]