PSY260H1F (Summer) Lecture 4

PSY260H1F L4; May 23, 13

Operant Conditioning: Ch. 7 (5 in 2nd)

Instrumental Conditioning
• Operant Conditioning: process whereby organisms learn to make responses in order to obtain or avoid important conseqs
  o A form of associative learning, like classical conditioning
  o Dif than classical conditioning, since organism interacts w envt
    ▪ Classical response is reflexive
    ▪ Classical outcome can't be stopped – in operant it can be avoided
  o aka instrumental conditioning
• When determining whether a paradigm is classical or operant, focus on the conseq
  o Classical: conseq occurs regardless of response given
  o Operant: response given affects the conseq
• Similarities also exist btwn the 2 paradigms
  o If no pairing btwn response & conseq → extinction
• Operant conditioning is based on avoiding or obtaining a specific outcome
  o It requires an organism to operate in its envt to determine an outcome

Bhvr'al Processes
• Thorndike: first to study bhvr'al outputs due to operant conditioning
  o Puzzle boxes – put cat inside box w latches, has to learn how to get out w trial & error; gradually figures it out
    ▪ Initially escapes by accident → gets reward
    ▪ Discrete trial paradigm – completely controlled by expt'er
• The findings of the puzzle box work suggested that organisms:
  o More likely repeat actions they have experienced as producing satisfying conseqs
  o Less likely repeat actions they have experienced as producing undesirable conseqs
  o = law of effect
• [Graph: puzzle box learning curve]
  o W passage of more trials → quicker escape time
  o (-)ve accelerating

Law of Effect
• Law of Effect: probability that a particular bhvr'al response increases or decreases depending on the conseqs that have followed that response in the past
• Stimulus S → Response R → Outcome O

Free-Operant Learning
• Thorndike's learning procedures involved discrete trials
  o Discrete Trials: operant conditioning paradigm where the experimenter defines the beginning and end of each trial
• B.F. Skinner believed he could refine Thorndike's techniques, and devised the Skinner box to do this
  o Gives animal more control
• [Figure: Skinner Box]
• Skinner Box: conditioning chamber where reinforcement/punishment is automatically delivered when an animal makes a response (ex: lever pressing)
  o aka operant chamber
• Skinner's paradigm = Free Operant Paradigm: operant conditioning paradigm where the animal can operate the apparatus "freely", responding to obtain reinforcement/avoid punishment, whenever it chooses
  o Commonly called operant conditioning
• Once you stop presenting the reward (lever pressing bhvr stops yielding food) → animal starts to get tired (can have quick drop in graph)
• Can recondition the bhvr again
  o Animal is likely to check the lever again later
  o If food comes out → reconditioned

Free Operant Learning
• Adding a stimulus to free operant expts can make them more elaborate
• S (Light ON) → R (Lever Press) → O (Food Release)
• S (Light OFF) → R (Lever Press) → O (NO Food Release)
• Providing conseqs to ↑ probability of a bhvr occurring again in future → reinforcement
• Providing conseqs to ↓ probability of a bhvr occurring again in future → punishment
• Thorndike & Skinner believed Reinforcement is more effective than Punishment in learning
  o However if Punishment administered properly, can be just as effective
• Dif than punishers & reinforcers – conseqs/outcomes (O)
• Punishment & Reinforcement – the operant conditioning style

Components of Learned Association
• According to Thorndike and Skinner, operant conditioning consists of 3 components:
  o A stimulus (or set of stimuli)
  o A response (or set of responses)
  o An outcome
• Operant conditioning can be considered as a 3-way association btwn S, R, and O

Components of Learned Association: Stimuli
• Discriminative Stimuli: in operant conditioning, stimuli that signal whether a particular response will lead to a particular outcome
  o Ex. what happens if you start to run before or after a whistle (stimulus) – get disqualified, or get to run
• Sometimes a particular set of stimuli, responses, and outcomes might become so strongly associated that they become inflexible
  o When this happens, we can produce a habit slip, particularly when we may not be thinking very clearly
  o Ex. waking up early to alarm – closely associated alarm (S) w waking up (R) & getting to work early (O) → get so dependent on alarm, go to work early on a Saturday
  o Ex. if someone moves to condo, might mistake one for another → park in "your" spot → put key in "your" room
  o Ex. Drive to your old house out of habit
  o Associate things that look similar → make same response even if not appropriate

Components of Learned Association: Responses
• Responses: Bhvr given in response to a stimulus in order for a particular outcome to come about
• Shaping: operant conditioning technique in which successive approximations to a desired response are reinforced
  o Ex. Potty-training – tell your mom you have to pee → next time get to washroom → next time get to toilet
    ▪ Might stop praising once they learn the full bhvr
  o Involved, long process
  o Vs saying that everything was wrong, say that they're on the right track
• Chaining: operant conditioning technique where organisms are gradually trained to execute complicated sequences of discrete responses
  o Ex. training animals
  o Teach separate components of a bhvr until get to the final bhvr
  o Used for more complicated sequences
    ▪ But only for 2-3 steps
  o Alternate: Backwards chaining: used for >3 steps; taught backwards
    ▪ For even longer complex bhvrs
  o Creates a memory of a chain of events

Components of Learned Association: Reinforcers
• Reinforcer: particular conseq for an associated bhvr that increases the likelihood of the bhvr being repeated in the future
• Primary Reinforcers: stimuli such as food, water, sex, and sleep that are innately reinforcing, meaning that organisms are naturally driven to obtain these things & will tend to repeat the bhvrs that increase their access to them
  o Considered necessary for life
• Secondary Reinforcers: stimuli that have no natural or intrinsic value but that have been paired w primary reinforcers or provide access to primary reinforcers
  o Most common: Money
  o Ex. Getting a good rating on their job → more likely to keep job & earn money so that you can provide for yourself
  o Ex. Maintaining appearance → appear more attractive to other ppl → sexual interaction
  o Ppl all driven to dif reinforcers
• Reinforcers serve to increase the likelihood of a bhvr
  o Can also increase likelihood of (-)ve bhvrs

Components of Learned Association: Punishers
• Punishers: conseq of bhvr that leads to decreased likelihood of that bhvr occurring again in the future
  o Outcome

Effectiveness of Punishment
• 4 key factors determine punishment effectiveness:
1. Discriminative stimuli for punishment can encourage cheating:
  o Discriminative stimuli can signal if a response will be punished, causing someone to alter their bhvr to avoid punishment only when they believe there will be a conseq
  o Ex. If someone knows they're being watched → act better, don't actually learn right vs wrong – their bhvr doesn't affect their learning
  o Ex. Why there are helicopter patrols watching cars; Nanny-cams – don't know they're being watched
    ▪ Once reprimanded – learn to act better since know they can get caught; keeps them on their toes
2. Concurrent reinforcement can undermine punishment:
  o Effectiveness of punishment can be counteracted if reinforcement occurs along with punishment
  o Ex. if speeding makes a person feel good
    ▪ A ticket might not be enough to deter them
    ▪ Suggest to them to get that reinforcement in a dif way, ex. going to a race track instead of speeding for the thrill
3. Punishment leads to more variable bhvr:
  o The law of effect suggests punishment will lead to reduction in a future response, but does not specify what alternate response will occur when an organism explores other possible responses, and as such punishment is not a good way to shape or train particular desired bhvrs
  o Ex. if tell kid to stop smoking → might stop smoking, but also do other drugs instead
    ▪ So important to introduce them to something else, ex. art
  o If instead the goal is to shape a desired bhvr, reinforcement is a faster way to produce learning than simply punishing the alternate undesired response, as it reduces the likelihood of an organism exploring undesirable alternate bhvrs
  o Ex. female chimps punished for mating w less desirable members – better results if reward them for mating w more desirable members
    ▪ That way they gain something from the situation
4. Initial intensity of punishment determines effectiveness:
  o Punishment is most effective if a strong punisher is used from the outset – if prior weak punishers are initially given instead, they undermine the effectiveness of the severe punisher when it finally comes later on
  o If don't punish severely enough at the beginning → can get attached to the bhvr & repeat it
  o Ex. If give a warning & no ticket for speeding → keep speeding & getting larger tickets → by the time they get a large fine, they don't care because they liked the thrill

Putting it all Together: Building the S-R-O Association
• Reinforcement Schedules: Rules determining when outcomes are delivered in an expt

Timing Affects Learning
• Operant conditioning is faster if the response-outcome (R-O) interval is short
  o Typically, immediate conseqs produce best learning
  o Whether the conseq is reinforcement or punishment
• Schlinger & Blakely (1994)
  o 3 grps of rats
  o Immediate reward delivery following lever press = quicker association formation than delayed reward presentation
  o 4s delay – still considered a slow learning curve
  o Explain to child immediately what they did wrong, but punish later at home → still not as effective
• Self-Control: an organism's willingness to forgo a small immediate reinforcement in favor of a large future reinforcement
  o Trade-off can be seen in humans & animals alike
  o Age impacts ability to wait for delayed reinforcement
    ▪ Study by Green, Fry & Myerson
    ▪ Take $500 now or $1000 in a yr – older ppl chose the latter
    ▪ Self-control regulated by frontal lobes, PFC – less mature in younger ppl, role played by experience
• "Pre-commitments" help improve ability to wait for a reward – don't want to let ppl down
  o Make it harder to go back on commitments needed for long term achievements
  o Ppl can still obtain the immediate reward w pre-commitments, it's just more difficult to do
    ▪ Ppl would need to break their pre-commitment – could require a sacrifice or punishment
    ▪ The difficulty associated w breaking a pre-commitment helps ppl stick to their commitment or promise
  o Ex. 2 ppl working together to save up for a house – 1 person can't make big purchases or they'll let the other person down; set up a savings account

Conseqs/Outcomes Can Be Added or Subtracted
• (+)ve Reinforcement: type of operant conditioning in which the response causes a reinforcer to be "added" to the envt; over time, the response becomes more frequent
  o S (toilet present) → R (empties bladder) → O (praise)
• (+)ve Punishment: type of operant conditioning in which the response causes a punisher to be "added" to the envt; over time, the response becomes less frequent
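
A minimal sketch of the law of effect and of the "added" conseqs defined in these notes, in Python. The update rule (move the response probability a fixed fraction toward 1 after a reinforcer, toward 0 after a punisher), the function names, and the `rate` value are illustrative assumptions, not something given in the lecture.

```python
def update_probability(p, consequence, rate=0.2):
    """Return the new response probability after one consequence.

    Assumed rule (not from the lecture): a 'reinforcer' moves p a fixed
    fraction of the way toward 1, a 'punisher' moves p the same fraction
    of the way toward 0, and None (no consequence) leaves p unchanged.
    """
    if consequence == "reinforcer":
        return p + rate * (1.0 - p)
    if consequence == "punisher":
        return p - rate * p
    return p


def run_trials(p_start, consequence, n_trials, rate=0.2):
    """Apply the same consequence on every trial and record p after each one."""
    history = [p_start]
    p = p_start
    for _ in range(n_trials):
        p = update_probability(p, consequence, rate)
        history.append(p)
    return history


if __name__ == "__main__":
    # (+)ve reinforcement: the response becomes more frequent over trials.
    print([round(p, 3) for p in run_trials(0.5, "reinforcer", 5)])
    # (+)ve punishment: the response becomes less frequent over trials.
    print([round(p, 3) for p in run_trials(0.5, "punisher", 5)])
```

Under these assumed settings the reinforced response climbs from 0.5 toward 1 and the punished response falls toward 0, with each step smaller than the one before it.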

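
The self-control trade-off in these notes (a small reward now vs a larger reward later) is often modelled with a discounting function. The sketch below assumes a hyperbolic form, value = amount / (1 + k * delay). The function names and the two `k` values are hypothetical and are not taken from the Green, Fry & Myerson study.

```python
def discounted_value(amount, delay_years, k):
    """Present value of a delayed reward under an assumed hyperbolic form.

    k is a hypothetical discounting rate: a larger k means delayed
    rewards lose value faster.
    """
    return amount / (1.0 + k * delay_years)


def choose(small_now, large_later, delay_years, k):
    """Return 'later' if the discounted large reward beats the small one."""
    later_value = discounted_value(large_later, delay_years, k)
    return "later" if later_value > small_now else "now"


if __name__ == "__main__":
    # The $500 now vs $1000 in a year choice, with two made-up k values.
    for k in (0.5, 2.0):
        print(k, choose(500, 1000, 1, k))
```

Here a smaller `k` stands for someone who devalues delayed rewards less and so waits for the $1000, while a larger `k` leads to taking the $500 now.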