CS188 - ALL.pdf


Department: Computer Science
Course: COMPSCI 188
Professor: Abbeel
Semester: Spring 2013

Description
LECTURE: Markov Decision Processes II (February 19, 2013)

Announcements
- HW4 out tomorrow, 2/20.
- P2 due Friday, 5pm.

Bellman Equations
- The Bellman equations characterize the optimal values; value iteration computes them. (They are written out after these notes.)

Policy Methods
- Fixed policy: the expectimax tree is much simpler, because now there is only one action per state. V^π(s) is the expected total discounted reward starting in s and following π.
- How do we calculate these values?
  1. The same way as before: value iteration (with no max, since the action is fixed).
  2. With the max gone, we have a system of linear equations and can solve it directly.

Policy Extraction
- Imagine we already have the optimal values. To turn them into actions, we do a one-step expectimax.
- Policy extraction: the policy implied by the values.

Policy Iteration
- Value iteration has its problems:
  1. It is really slow.
  2. The policy converges long before the values do.
- Use policy iteration instead (sketched in code after these notes):
  Step 1: Policy evaluation.
  Step 2: Policy improvement (one-step lookahead).
  Step 3: Repeat.

Double Bandits
- Offline planning: we know the probabilities of everything, so we can plan the whole course of action before taking the first one.
- Online planning: we don't know the probabilities; we have to play to find them out. This is reinforcement learning.
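For reference, here are the Bellman equations the notes refer to, in the standard form used in CS188. The symbols T (transition model), R (rewards), and γ (discount) are implied rather than spelled out in the original note:

```latex
% Bellman optimality equation: characterizes V*, the optimal values.
V^*(s) = \max_a \sum_{s'} T(s,a,s')\,\bigl[R(s,a,s') + \gamma V^*(s')\bigr]

% Fixed-policy version: no max, since pi dictates the single action per state.
V^\pi(s) = \sum_{s'} T(s,\pi(s),s')\,\bigl[R(s,\pi(s),s') + \gamma V^\pi(s')\bigr]

% Policy extraction: a one-step expectimax against the optimal values.
\pi^*(s) = \operatorname*{arg\,max}_a \sum_{s'} T(s,a,s')\,\bigl[R(s,a,s') + \gamma V^*(s')\bigr]
```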
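A minimal Python sketch of value iteration and policy extraction. The toy MDP below (a cool/warm/overheated "racing" chain with slow/fast actions) and all of its numbers are illustrative assumptions, not taken from the lecture:

```python
GAMMA = 0.9

# mdp[state][action] -> list of (next_state, probability, reward) triples.
# This is a made-up toy problem for illustration.
mdp = {
    "cool": {
        "slow": [("cool", 1.0, 1.0)],
        "fast": [("cool", 0.5, 2.0), ("warm", 0.5, 2.0)],
    },
    "warm": {
        "slow": [("cool", 0.5, 1.0), ("warm", 0.5, 1.0)],
        "fast": [("overheated", 1.0, -10.0)],
    },
    "overheated": {},  # terminal: no actions, value stays 0
}

def q_value(values, s, a):
    """One-step expectimax backup: expected reward plus discounted next value."""
    return sum(p * (r + GAMMA * values[s2]) for s2, p, r in mdp[s][a])

def value_iteration(eps=1e-6):
    """Repeat Bellman backups until the values stop changing."""
    values = {s: 0.0 for s in mdp}
    while True:
        new = {s: max(q_value(values, s, a) for a in mdp[s]) if mdp[s] else 0.0
               for s in mdp}
        if max(abs(new[s] - values[s]) for s in mdp) < eps:
            return new
        values = new

def extract_policy(values):
    """Policy extraction: pick the action a one-step expectimax prefers."""
    return {s: max(mdp[s], key=lambda a: q_value(values, s, a))
            for s in mdp if mdp[s]}

if __name__ == "__main__":
    v_star = value_iteration()
    print("values:", v_star)
    print("policy:", extract_policy(v_star))
```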
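And a sketch of the two policy methods from the notes: policy evaluation by solving the linear system directly, and policy iteration built on top of it. It reuses mdp, GAMMA, q_value, and extract_policy from the sketch above; the construction of the linear system is one plausible implementation, not the lecture's code:

```python
import numpy as np

# Reuses mdp, GAMMA, q_value, and extract_policy from the previous sketch.
states = list(mdp)
idx = {s: i for i, s in enumerate(states)}

def evaluate_policy(policy):
    """Policy evaluation: solve V = R^pi + GAMMA * T^pi V exactly,
    rewritten as the linear system (I - GAMMA * T^pi) V = R^pi."""
    A = np.eye(len(states))    # becomes I - GAMMA * T^pi, row by row
    b = np.zeros(len(states))  # expected immediate reward under pi
    for s in states:
        if not mdp[s]:         # terminal state: row stays V(s) = 0
            continue
        for s2, p, r in mdp[s][policy[s]]:
            A[idx[s], idx[s2]] -= GAMMA * p
            b[idx[s]] += p * r
    v = np.linalg.solve(A, b)
    return {s: v[idx[s]] for s in states}

def policy_iteration():
    """Alternate evaluation and one-step-lookahead improvement until stable."""
    policy = {s: next(iter(mdp[s])) for s in mdp if mdp[s]}  # arbitrary start
    while True:
        values = evaluate_policy(policy)   # Step 1: policy evaluation
        improved = extract_policy(values)  # Step 2: policy improvement
        if improved == policy:             # Step 3: repeat until no change
            return policy, values
        policy = improved
```

Because each evaluation is exact, the loop stops as soon as improvement leaves the policy unchanged, which illustrates the point above: the policy settles long before the values themselves would converge under plain value iteration.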