CIS 140 Lecture 8: Reinforcement Learning
Document Summary
Negative punishment: taking away something good (e.g., a time-out).

Classical conditioning: like operant conditioning, it enables behavior based on prediction.
During training: conditioned stimulus (CS) -> unconditioned stimulus (US) -> unconditioned response (UR).
After training: conditioned stimulus (CS) -> conditioned response (CR).
Often the UR and CR are the same action.
Ex: Pavlov's dogs learned to associate a sound with food.
Ex: the immune system and pancreas respond based on predictions from the environment.
Ex: advertising (babies, tires, and happiness).

Classical vs. operant conditioning:
Classical: involves placing a neutral signal before a reflex (UR); focuses on involuntary, automatic behaviors; helps predict when the reflex (UR) will be useful; stimulus -> reaction.
Operant: involves applying reinforcement or punishment after a behavior; strengthens or weakens voluntary behaviors; helps predict which behaviors will be rewarded; behavior -> consequence.

Mutant flies: some do better with classical conditioning, some do better with operant conditioning (a double dissociation).
Checking for a double dissociation requires two measures and two manipulations. A double dissociation is evidence that the two measures are mediated by separate processes.

Modeling classical conditioning: the typical learning curve.
Association strength grows more slowly as the number of trials increases, and it reaches an asymptote.
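
The learning curve described above can be sketched in a few lines of Python. The notes do not name a specific model, so this is only an illustration under an assumption: on each trial, association strength moves a fixed fraction of the remaining distance toward the asymptote (an error-correction update in the style of the Rescorla-Wagner model). The function name and the parameters learning_rate, asymptote, and initial_strength are hypothetical choices made for this example.

```python
def learning_curve(n_trials, learning_rate=0.3, asymptote=1.0, initial_strength=0.0):
    """Return association strength after each CS-US pairing.

    Assumed update rule (not taken from the lecture notes):
        strength += learning_rate * (asymptote - strength)
    """
    strengths = []
    strength = initial_strength
    for _ in range(n_trials):
        # The change is proportional to how far strength is from the asymptote,
        # so early trials produce large gains and later trials produce small ones.
        strength += learning_rate * (asymptote - strength)
        strengths.append(strength)
    return strengths


curve = learning_curve(10)

# Per-trial gain: how much association strength grew on each trial.
gains = [curve[0]] + [later - earlier for earlier, later in zip(curve, curve[1:])]

for trial, (strength, gain) in enumerate(zip(curve, gains), start=1):
    print(f"trial {trial:2d}: strength={strength:.3f}  gain={gain:.3f}")
```

Under this assumed rule the per-trial gain shrinks on every trial while the strength stays below the asymptote, which matches the shape the notes describe. A larger learning_rate gives a steeper early curve; it does not change the asymptote.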