PSY260H1F (Summer) Lecture 4

Daniela Bellicoso

PSY260H1F L4; May 23, 13

Operant Conditioning: Ch. 7 (5 in 2nd)

Instrumental Conditioning
• Operant Conditioning: process whereby organisms learn to make responses in order to obtain or avoid important conseqs
  o A form of associative learning, like classical conditioning
  o Dif than classical conditioning, since organism interacts w envt
    ▪ Classical response is reflexive
    ▪ Classical outcome can't be stopped – in operant it can be avoided
  o aka instrumental conditioning
• When determining whether a paradigm is classical or operant, focus on the conseq
  o Classical: conseq occurs regardless of response given
  o Operant: response given affects the conseq
• Similarities also exist btwn the 2 paradigms
  o If no pairing btwn response & conseq → extinction
• Operant conditioning is based on avoiding or obtaining a specific outcome
  o It requires an organism operate in its envt to determine an outcome

Bhvr'al Processes
• Thorndike: first to study bhvr'al outputs due to operant conditioning
  o Puzzle boxes – put cat inside box w latches, has to learn how to get out w trial & error; gradually figures it out
    ▪ Initially escapes by accident → gets reward
    ▪ Discrete trial paradigm – completely controlled by expt'er
• The findings of the puzzle box work suggested that organisms:
  o More likely repeat actions they have experienced as producing satisfying conseqs
  o Less likely repeat actions they have experienced as producing undesirable conseqs
  o = law of effect
• W passage of more trials → quicker escape time
  o (-)ve accelerating

Law of Effect
• Law of Effect: probability that a particular bhvr'al response increases or decreases depending on the conseqs that have followed that response in the past
• Stimulus S → Response R → Outcome O
• Reinforcer: particular conseq for an associated bhvr that increases the likelihood of the bhvr being repeated in the future
• Dif than punishers & reinforcers – conseqs/outcomes (O)
• Punishment & Reinforcement – the operant conditioning style

Components of Learned Association
• According to Thorndike and Skinner, operant conditioning consists of 3 components:
  o A stimulus (or set of stimuli)
  o A response (or set of responses)
  o An outcome
• Operant conditioning can be considered as a 3-way association btwn S, R, and O

Components of Learned Association: Stimuli
• Discriminative Stimuli: in operant conditioning, stimuli that signal whether a particular response will lead to a particular outcome
  o Ex. what happens if you start to run before or after a whistle (stimulus) – get disqualified, or get to run
• Sometimes a particular set of stimuli, responses, and outcomes might become so strongly associated that they become inflexible
  o When this happens, we can produce a habit slip, particularly when we may not be thinking very clearly
  o Ex. waking up early to alarm – closely associated alarm (S) w waking up (R) & getting to work early (O) → get so dependent on alarm, go to work early on a Saturday
  o Ex. if someone moves to condo, might mistake one for another → park in "your" spot → put key in "your" room
  o Ex. Drive to your old house out of habit
  o Associate things that look similar → make same response even if not appropriate

Components of Learned Association: Responses
• Responses: Bhvr given in response to a stimulus in order for a particular outcome to come about
• Shaping: operant conditioning technique in which successive approximations to a desired response are reinforced
  o Ex. Potty-training – tell your mom you have to pee → next time get to washroom → next time get to toilet
    ▪ Might stop praising once they learn the full bhvr
  o Involved, long process
  o Vs saying that everything was wrong, say that they're on the right track
• Chaining: operant conditioning technique where organisms are gradually trained to execute complicated sequences of discrete responses
  o Ex. training animals
  o Teach separate components of a bhvr until get to the final bhvr
  o Used for more complicated sequences
    ▪ But only for 2-3 steps
  o Alternate: Backwards chaining: used for >3 steps; taught backwards
    ▪ For even longer complex bhvrs
  o Creates a memory of a chain of events

Free-Operant Learning
• Thorndike's learning procedures involved discrete trials
  o Discrete Trials: operant conditioning paradigm where the experimenter defines the beginning and end of each trial
• B.F. Skinner believed he could refine Thorndike's techniques, and devised the Skinner box to do this
  o Gives animal more control
• Skinner Box: conditioning chamber where reinforcement/punishment is automatically delivered when an animal makes a response (ex: lever pressing)
  o aka operant chamber
• Skinner's paradigm = Free Operant Paradigm: operant conditioning paradigm where the animal can operate the apparatus "freely", responding to obtain reinforcement/avoid punishment, whenever it chooses
  o Commonly called operant conditioning
• Once you stop presenting the reward (lever pressing bhvr stops yielding food) → animal starts to get tired (can have quick drop in graph)
• Can recondition the bhvr again
  o Animal is likely to check the lever again later
  o If food comes out → reconditioned

Free Operant Learning
• Adding a stimulus to free operant expts can make them more elaborate
• S (Light ON) → R (Lever Press) → O (Food Release)
• S (Light OFF) → R (Lever Press) → O (NO Food Release)
• Providing conseqs to ↑ probability of a bhvr occurring again in future → reinforcement
• Providing conseq to ↓ probability of a bhvr occurring again in future → punishment
• Thorndike & Skinner believed Reinforcement is more effective than Punishment in learning
  o However if Punishment administered properly, can be just as effective

Components of Learned Association: Reinforcers
• Primary Reinforcers: stimuli such as food, water, sex, and sleep that are innately reinforcing, meaning that organisms are naturally driven to obtain these things & will tend to repeat the bhvrs that increase their access to them
  o Considered necessary for life
• Secondary Reinforcers: stimuli that have no natural or intrinsic value but that have been paired w primary reinforcers or provide access to primary reinforcers
  o Most common: Money
  o Ex. Getting a good rating on their job → more likely to keep job & earn money so that you can provide for yourself
  o Ex. Maintaining appearance → appear more attractive to other ppl → sexual interaction
  o Ppl all driven to dif reinforcers
• Reinforcers serve to increase the likelihood of a bhvr
  o Can also increase likelihood of (-)ve bhvrs

Components of Learned Association: Punishers
• Punishers: conseq of bhvr that leads to decreased likelihood of that bhvr occurring again in the future
  o Outcome

Effectiveness of Punishment
• 4 key factors determine punishment effectiveness:
1. Discriminative stimuli for punishment can encourage cheating:
  o Discriminative stimuli can signal if a response will be punished, causing someone to alter their bhvr to avoid punishment only when they believe there will be a conseq
  o Ex. If someone knows they're being watched → act better, don't actually learn right vs wrong – their bhvr doesn't affect their learning
  o Ex. Why there are helicopter patrols watching cars; Nanny-cams – don't know they're being watched
    ▪ Once reprimanded – learn to act better since know they can get caught; keeps them on their toes
2. Concurrent reinforcement can undermine punishment:
  o Effectiveness of punishment can be counteracted if reinforcement occurs along with punishment
  o Ex. if Speeding makes a person feel good
    ▪ A ticket might not be enough to deter them
    ▪ Suggest to them to get that reinforcement in a dif way, ex. going to a race track instead of speeding for the thrill
3. Punishment leads to more variable bhvr:
  o The law of effect suggests punishment will lead to reduction in a future response, but does not specify what alternate response will occur when an organism explores other possible responses, and as such punishment is not a good way to shape or train particular desired bhvrs
  o Ex. if tell kid to stop smoking → might stop smoking, but also do other drugs instead
    ▪ So important to introduce them to something else, ex. art
  o If instead the goal is to shape a desired bhvr, reinforcement is a faster way to produce learning than simply punishing the alternate undesired response, as it reduces the likelihood of an organism exploring undesirable alternate bhvrs
  o Ex. female chimps punished for mating w less desirable members – better results if reward them for mating w more desirable members
    ▪ That way they gain something from the situation
4. Initial intensity of punishment determines effectiveness:
  o Punishment is most effective if a strong punisher is used from the outset – if prior weak punishers are initially given instead, they undermine the effectiveness of the severe punisher when it finally comes later on
  o If don't punish severely enough at the beginning → can get attached to the bhvr & repeat it
  o Ex. If give a warning & no ticket for speeding → keep speeding & getting larger tickets → by the time they get a large fine, they don't care because they liked the thrill

Putting it all Together: Building the S-R-O Association
• Reinforcement Schedules: Rules determining when outcomes are delivered in an expt

Timing Affects Learning
• Operant conditioning is faster if the response-outcome (R-O) interval is short
  o Typically, immediate outcome conseqs produce best learning
  o Whether conseq is reinforcement or punishment
• Schlinger & Blakely (1994)
  o 3 grps of rats
  o Immediate reward delivery following lever press = quicker association formation than delayed reward presentation
  o 4s delay – still considered a slow learning curve
• Explain to child immediately what they did wrong, but punish later at home → still not as effective
• Self-Control: an organism's willingness to forgo a small immediate reinforcement in favor of a large future reinforcement
  o Trade-off can be seen in humans & animals alike
  o Age impacts ability to wait for delayed reinforcement
    ▪ Study by Green, Fry & Myerson
    ▪ Take $500 now or $1000 in a yr – older ppl chose the latter
    ▪ Self-control is regulated by frontal lobes, PFC – less mature in younger ppl, role played by experience
• "Pre-commitments" help improve ability to wait for a reward – don't want to let ppl down
  o Make it harder to go back on commitments needed for long term achievements
  o Ppl can still obtain the immediate reward w pre-commitments, it's just more difficult to do
    ▪ Ppl would need to break their pre-commitment – could require a sacrifice or punishment
    ▪ The difficulty associated w breaking a pre-commitment helps ppl stick to their commitment or promise
  o Ex. 2 ppl working together to save up for a house – 1 person can't make big purchases or they'll let the other person down; set up a savings account

Conseqs/Outcomes Can Be Added or Subtracted
• (+)ve Reinforcement: type of operant conditioning in which the response causes a reinforcer to be "added" to the envt; over time, the response becomes more frequent
  o S (toilet present) → R (empties bladder) → O (praise)
• (+)ve Punishment: type of operant conditioning in which the response causes a punisher to be "added" to the envt; over time, the response becomes less frequent
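
Toy sketch of the law of effect from these notes, in Python. The update rule is an illustrative assumption, not something given in lecture: the response probability moves a fixed fraction toward 1 after a reinforcer and toward 0 after a punisher. The names `update_response_probability`, `simulate`, and `rate` are also made up for the example. With repeated reinforcement the gains shrink on each trial, which roughly matches the (-)ve accelerating learning curve described for the puzzle box.

```python
def update_response_probability(p, consequence, rate=0.2):
    """Return the new probability of a response after one consequence.

    Hypothetical rule for illustration only (not from the lecture):
    a reinforcer moves p a fraction `rate` of the way toward 1,
    a punisher moves p a fraction `rate` of the way toward 0,
    and "none" leaves p unchanged.
    """
    if consequence == "reinforcer":
        return p + rate * (1.0 - p)
    if consequence == "punisher":
        return p - rate * p
    if consequence == "none":
        return p
    raise ValueError(f"unknown consequence: {consequence!r}")


def simulate(p, consequences, rate=0.2):
    """Apply a sequence of consequences and return the probability after each trial."""
    history = [p]
    for consequence in consequences:
        p = update_response_probability(p, consequence, rate)
        history.append(p)
    return history


if __name__ == "__main__":
    # Lever press followed by food on every trial: response becomes more frequent.
    for trial, p in enumerate(simulate(0.1, ["reinforcer"] * 5)):
        print(f"trial {trial}: P(response) = {p:.3f}")
```

Swapping the list for `["punisher"] * 5` shows the opposite trend: the response becomes less frequent over time.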