Operant and Classical Conditioning


The article below is written by Elijah Ozbat, a freshman in high school. He is a passionate future marine animal trainer who is doing his research now so he can be fully prepared later. I have worked with animal trainers who don’t understand operant conditioning and classical conditioning as thoroughly as Elijah. So many people e-mail me and ask what they can do now to become a trainer. The answer: learn!

Start learning now.

Read the article below, read the rest of the articles on this website, read my book, sign up for my newsletter, join organizations, research dolphins…do it all!

No matter your age, you can always be learning. 

 

Operant Conditioning in Full Detail

By: Elijah Ozbat

Note: This article contains advanced content. If you don’t understand certain words or concepts, that is completely fine.

Many people have a limited understanding of how marine animal training works. They think that you just teach a dolphin to jump twenty feet in the air and that’s that. In reality, marine animal training is based on a psychological technique called operant conditioning. If you are a regular reader of this site, and you have read Wear a Wetsuit to Work: How You Can Become a Marine Mammal Trainer, then you will already have a basic understanding of how operant conditioning works. But you probably have never learned in elaborate detail how it works. Until now. So for all of you knowledge junkies out there, get ready.

Operant conditioning is the process of increasing or decreasing specific behaviors through consequences. Basically put, you give a good reward for a good behavior, and you either refrain from rewarding or you give an unpleasant consequence for bad behavior. The chart below lays out the basic concepts of operant conditioning:

Note:

Stimulus– anything that causes an effect

Positive– presence of a stimulus

Negative– removal of a stimulus

 

Operant Conditioning and Classical Conditioning

Before we can go into full detail, you must be able to distinguish between classical conditioning and operant conditioning.

Conditioning

First of all, conditioning in general is the training of a behavior through a stimulus. This applies to all types of conditioning, but the “stimulus” part is applied in different ways, and this is what distinguishes one type of conditioning from another. The main types of conditioning, as I have already mentioned, are classical conditioning and operant conditioning.

Classical Conditioning

Classical conditioning is the process of pairing a conditioned stimulus with an unconditioned stimulus. An unconditioned stimulus is something that is already naturally rewarding to an individual. For example, food is naturally reinforcing to dogs. A conditioned stimulus is something that was previously not reinforcing but, through classical conditioning, is conditioned to be reinforcing. For example, the ringing of a bell may not be naturally reinforcing to a dog, but if the pairing is done right, the dog will associate the bell with food and drool whenever he hears the bell ring. In short, you take a conditioned stimulus and pair it with an unconditioned stimulus; in this process, the individual learns to find the conditioned stimulus reinforcing.

Here is a good example of classical conditioning. Ever wonder why your dog starts to drool when you are opening a dog food can? The answer: classical conditioning. Saliva is naturally produced by a dog when it eats to assist with digestion. Salivating is the main behavior that is being focused on here. The dog knows that every time before he gets fed, you open a can. As a result, the dog learns to associate the opening of the can with being fed. This causes it to salivate when you open the can.

By pairing an unconditioned stimulus, the dog food, with a conditioned stimulus, the opening of a can, you have successfully conditioned your dog to find the opening of a can reinforcing, and this brings out a behavior: salivation. Now you will be able to get your dog to salivate whenever you open a can. Make sure it always receives food immediately after you open the can; otherwise it will get confused and the effect will eventually wear off.
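
The can-and-food example can be sketched as a toy simulation. This is only an illustration, not a model taken from the article: the idea that the association is a number between 0 and 1, the `learning_rate` value, and the function names are all assumptions. It shows the two points made above: repeated pairing builds the association, and the effect wears off when the can is no longer followed by food.

```python
def update_association(strength, food_followed, learning_rate=0.3):
    """Return the new can-food association (0.0 to 1.0) after one can opening.

    The update rule and learning_rate are illustrative assumptions.
    """
    if food_followed:
        # Pairing: move part of the way toward the maximum association (1.0).
        return strength + learning_rate * (1.0 - strength)
    # No food after the can: the association fades part of the way toward 0.
    return strength - learning_rate * strength


def run_trials(trials, strength=0.0, learning_rate=0.3):
    """Apply update_association over a sequence of True/False trials."""
    history = []
    for food_followed in trials:
        strength = update_association(strength, food_followed, learning_rate)
        history.append(round(strength, 3))
    return history


# Ten can openings followed by food, then ten without food.
history = run_trials([True] * 10 + [False] * 10)
print("association after pairing:", history[9])
print("association after food stops:", history[19])
```

In this sketch the association climbs close to 1 during the paired trials and falls back toward 0 once the food stops, which mirrors the advice to always feed the dog right after opening the can.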

In the marine animal training industry, classical conditioning is used to pair primary reinforcers with secondary reinforcers. A primary reinforcer is an item that is of instinctual value to an individual (in the case of marine mammals, fish). A secondary reinforcer is an item that is not instinctually of any value to the individual, but through the use of classical conditioning, can be taught to be reinforcing (in the case of marine mammals, toys, ice, and gelatin). By pairing an unconditioned stimulus (fish) with a conditioned stimulus (a toy), a trainer can teach the marine mammal to associate receiving a toy with receiving fish, and as a result, the mammal learns to find receiving a toy reinforcing.

Operant Conditioning

Operant conditioning works under similar principles, but in this process, you modify the frequency of a specific behavior through a consequence, an event that happens after the behavior. In this sense, operant conditioning is distinguishable from classical conditioning. Basically, you ask for a behavior to be performed, and if it is performed correctly, you give a reinforcing consequence. This gives an incentive for doing the behavior, so the frequency at which the behavior is performed increases. Operant conditioning can also be used to decrease undesired behavior by either giving an undesirable consequence or removing a pleasurable consequence. Note: marine animal trainers never punish animals for performing undesirable behavior, and food is never withheld from the animals.

In the marine animal training industry, operant conditioning is used to increase the frequency of behaviors that marine mammals perform through consequence. Notice how I said behaviors, and not tricks. What happens is that a trainer asks a mammal to do a behavior, and if that behavior is done correctly, the mammal will be rewarded. In doing this, the frequency of that behavior is increased. The mammal has a reason to jump 10 ft. in the air: fish, toys, or some other type of reinforcement. A dolphin is not taught to jump up high in the air; it already knows how to do that. Trainers just encourage the dolphin and give it an incentive to do that. In this manner, marine mammals are participants in performing behaviors, not victims, as many people seem to think. The mammal has a choice of doing behaviors; if it does not want to do behaviors, it won’t.

The Difference?

The main difference between classical conditioning and operant conditioning is that classical conditioning involves pairing conditioned stimuli with unconditioned stimuli, for the purpose of teaching an individual to find a previously neutral stimulus to be reinforcing. In contrast, operant conditioning is the specific modification of a behavior through consequence.

You may notice that conditioning is basically just a lot of cause-and-effect.

Basic Operant Conditioning Terms

There are two main classifications of stimuli: core stimuli and additional stimuli. Core stimuli are involved in every type of operant conditioning, and additional stimuli may or may not be involved. Note: These next few terms have different meanings than the connotative meanings people would normally associate with them (e.g. “positive” does not necessarily mean good, “negative” does not necessarily mean bad, and a “consequence” may be good or bad).

Core Stimuli

Positive– a consequence (pleasant or unpleasant) is given following a behavioral response

Negative– a consequence is held back following a behavioral response

Reinforcement– a consequence used to increase the occurrence of a behavior

Punishment– a consequence used to decrease the occurrence of a behavior

Additional Stimuli

Antecedent stimuli– stimulus that occurs before a behavior is performed

Extinction– Lack of any consequence following a behavior. Inconsequential behaviors will tend to occur less often, resulting in a decline of that behavior’s frequency

Different combinations of stimuli create a list of five basic consequences:

Note: These terms can be very misleading; again, the psychological meanings are different than their connotative meanings.

Positive Reinforcement: a behavior is followed by a pleasant consequence; increases desired behavior; referred to as reinforcement

Negative Reinforcement: MISLEADING; an unpleasant consequence is removed following a desired behavior; increases desired behavior; referred to as escape

Positive Punishment: MISLEADING; an unpleasing consequence is added following an undesired behavior; decreases undesired behavior; referred to as punishment

Negative Punishment: a pleasing consequence is removed following an undesired behavior; decreases undesired behavior; referred to as penalty

Extinction: a previously reinforced behavior is no longer reinforced; used when certain behaviors are no longer useful
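
The five consequences above can be read as a small lookup: whether a stimulus is added or removed gives "positive" or "negative", and whether the behavior increases or decreases gives "reinforcement" or "punishment". The sketch below encodes that reading. The function name and the string labels for its arguments are hypothetical, chosen only for this illustration.

```python
def classify_consequence(stimulus_change, behavior_effect=None):
    """Name the operant conditioning consequence.

    stimulus_change: "added", "removed", or "none" (no consequence at all)
    behavior_effect: "increases" or "decreases" (ignored for "none")
    """
    if stimulus_change == "none":
        # A previously reinforced behavior is no longer reinforced.
        return "extinction"
    polarity = {"added": "positive", "removed": "negative"}[stimulus_change]
    kind = {"increases": "reinforcement", "decreases": "punishment"}[behavior_effect]
    return f"{polarity} {kind}"


# The terms describe what happens to the stimulus and the behavior,
# not whether the consequence is "good" or "bad".
print(classify_consequence("added", "increases"))    # positive reinforcement
print(classify_consequence("removed", "increases"))  # negative reinforcement (escape)
print(classify_consequence("added", "decreases"))    # positive punishment
print(classify_consequence("removed", "decreases"))  # negative punishment (penalty)
print(classify_consequence("none"))                  # extinction
```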

Complex Operant Conditioning Terms

Escape – a behavior eliminates an existing unpleasant stimulus; e.g. pinching your nose when you smell something rotten

Avoidance – preparing for a potential unpleasant stimulus (e.g. wearing sunscreen to avoid getting sunburns)

Note: Avoidance behavior cannot be reinforced since the behavior does not occur at the same time that the unpleasant stimulus occurs

Noncontingent Reinforcement: delivery of pleasant stimuli regardless of behavior performed; may be used to get rid of one undesired behavior by rewarding multiple alternative behaviors

Note: Use of the term “noncontingent reinforcement” is disputed as no single behavior is identifiably strengthened

Schedules of Reinforcement

These are rules that regulate the delivery of reinforcement. The rule may be a set time between deliveries or a set number of behaviors to be performed.

Fixed Time Interval Schedule– Reinforcement is given after a set amount of time has passed since the last reinforcement was delivered.
20 Seconds => Reinforcement
20 Seconds => Reinforcement

Variable Time Interval Schedule– Reinforcement is given after a changeable amount of time has passed since the last reinforcement was delivered.
20 Seconds => Reinforcement
30 Seconds => Reinforcement

Fixed Ratio Schedule– Reinforcement is given after a set number of behaviors have been done since the last reinforcement was delivered.
2 Behaviors => Reinforcement
2 Behaviors => Reinforcement

Variable Ratio Schedule– Reinforcement is given after a changeable number of behaviors have been done since the last reinforcement was delivered.
2 Behaviors => Reinforcement
4 Behaviors => Reinforcement

Continuous Reinforcement– Reinforcement is given after every behavior.
Behavior => Reinforcement
Behavior => Reinforcement

The variable schedules are usually the most effective schedules.
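
The schedules above can be sketched as small rule objects that decide when reinforcement is due. This is a minimal illustration that follows the simple definitions given in this article; the class names, method names, and the ranges used for the variable schedules are all assumptions made for the example.

```python
import random


class RatioSchedule:
    """Reinforce after a required number of behaviors since the last reinforcement."""

    def __init__(self, next_required):
        # next_required is a function returning how many behaviors are needed next.
        self._next_required = next_required
        self._required = next_required()
        self._count = 0

    def record_behavior(self):
        """Record one behavior; return True if it earns reinforcement."""
        self._count += 1
        if self._count >= self._required:
            self._count = 0
            self._required = self._next_required()
            return True
        return False


class IntervalSchedule:
    """Reinforce once the required time has passed since the last reinforcement."""

    def __init__(self, next_wait):
        # next_wait is a function returning how many seconds to wait next.
        self._next_wait = next_wait
        self._wait = next_wait()
        self._last_reinforced = 0.0

    def check(self, now):
        """Return True if reinforcement is due at time `now` (in seconds)."""
        if now - self._last_reinforced >= self._wait:
            self._last_reinforced = now
            self._wait = self._next_wait()
            return True
        return False


rng = random.Random(0)  # seeded so the variable schedules are repeatable

continuous = RatioSchedule(lambda: 1)                      # every behavior
fixed_ratio = RatioSchedule(lambda: 2)                     # every 2 behaviors
variable_ratio = RatioSchedule(lambda: rng.randint(1, 4))  # 1 to 4 behaviors
fixed_interval = IntervalSchedule(lambda: 20)              # every 20 seconds
variable_interval = IntervalSchedule(lambda: rng.choice([20, 30]))

print([fixed_ratio.record_behavior() for _ in range(4)])
# [False, True, False, True]
print([t for t in range(0, 61, 5) if fixed_interval.check(t)])
# [20, 40, 60]
```

Continuous reinforcement is written here as a ratio schedule that requires one behavior, which is a design choice for the sketch rather than a claim about how trainers classify it.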

Discrimination, Generalization, and Context

These are situational tactics where most behavior is controlled by circumstantial stimuli:

Discrimination– a technically correct behavior is only reinforced under specific circumstances. E.g. you are trapped in a room and there are two buttons: red and blue. You are given food for pressing the red button, but not the blue button. Both behaviors are the same: pressing a button. But you are only rewarded for pressing a specific button.

Generalization– a tendency to respond to stimuli that are similar to previously discriminatory stimuli. E.g. a crimson button appears, and you press that button because its color is similar to red.

Context– Stimuli that are always present, such as chairs, tables, walls, etc. E.g. the room’s atmosphere turns from a dark, eerie room to a fancy room with a hot tub in the middle. Since the context changed, you won’t feel as convinced to hit the red button; you’ll be soaking in the hot tub.
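
The button example can be sketched in a few lines. Discrimination is the rule for which press gets rewarded, and generalization is the tendency to press buttons that look similar to the rewarded one. The RGB values, the distance measure, and the `threshold` are assumptions made purely for illustration; the article does not say how similarity is judged.

```python
import math

# Approximate RGB values for each button color (illustrative assumptions).
COLORS = {
    "red": (255, 0, 0),
    "crimson": (220, 20, 60),
    "blue": (0, 0, 255),
}

REINFORCED_COLOR = "red"


def is_reinforced(color):
    """Discrimination: only presses on the red button are reinforced."""
    return color == REINFORCED_COLOR


def will_press(color, threshold=100.0):
    """Generalization: press any button whose color is close enough to red."""
    distance = math.dist(COLORS[color], COLORS[REINFORCED_COLOR])
    return distance <= threshold


for color in COLORS:
    print(color, "pressed:", will_press(color), "rewarded:", is_reinforced(color))
```

With these assumed values, crimson is close enough to red to be pressed even though only the red button is actually rewarded, while blue is not pressed at all.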

Variables that Affect a Consequence’s Effectiveness

Different factors can affect the effectiveness of consequential stimuli, both reinforcements and punishments:

Satiation/Deprivation: The effectiveness of a consequence will be lower if an individual’s desire for that consequence is already fulfilled. The effectiveness will be higher if that individual’s desire for the consequence still needs to be fulfilled. (E.g. I have 5 tacos from Taco Bell. I eat 4 tacos, and I find them very satisfying. However, I am already full, so I won’t find the fifth taco very satisfying. I may even avoid eating it.)

Immediacy: The time at which a consequence is given relative to when an action was performed will have an impact on the effectiveness of that consequence. (E.g. if a child is spanked for something he/she did two weeks ago, that punishment will probably not prevent the child from doing it again; however, if that child is spanked immediately after the action, then that punishment will be very effective in preventing him/her from doing it again.) It isn’t hard to figure out that animals would rather have consequences sooner rather than later.

If you don’t remember anything else, remember this:
Operant conditioning is the basic psychological tool used to train animals. Trainers implement these basic tactics to train the animals to perform various behaviors. By encouraging behavior through positive reinforcement, trainers can effectively increase the frequency of desired behaviors. Also, trainers do not refer to behaviors as good or bad. They may refer to them as “desired” or “undesired” or “favorable” or “unfavorable.”

Congratulations, you now have more of an understanding of operant conditioning than many people do!

3 comments

  • Pingback: Meet Future Trainer, Elijah Ozbat! - Marine Mammal Trainer

  • Wow, thanks for the write-up, Elijah! And thanks for sharing it with us, Kyle. Now I won’t have to read BF Skinner’s book (kidding)! This is quite easy to understand because of the writing style. Thanks again!

  • Overall, good job, Elijah! However, I would like to discuss some things you said.
    First you state “stimulus- anything that causes an effect” and that is not necessarily true. A stimulus is any detectable change in the environment. Think of it like the wind. The wind is a stimulus for you and I and any dolphin at station. It doesn’t cause an effect but it is a change in the environment that can be detected and is a stimulus. The reason this is important is because the other stimuli in the environment need to be accounted for when looking at behavior, not just the ones the trainers are providing.
    Second, “an unconditioned stimulus is something that is already naturally rewarding to an individual.” Does it have to be rewarding? No, it does not. You can create a delta (or a conditioned punisher, which is often used in dog training) by pairing an unconditioned punisher with a conditioned stimulus. In marine mammal training, usually a reinforcing stimulus is paired with an unconditioned stimulus to create a bridge, as well as pairing a behavior with an sD, but a delta is not usually created. Some facilities, however, do have deltas because they can then pinpoint (similarly to a bridge) the exact moment when the animal fails.
    Third, and this is more advice than anything, don’t lump the whole field together by saying that it does not punish. Have you been to the facilities in Dubai or Singapore or even Dolphin Quest in Hawaii or Bermuda? Have you seen the way they train? They could use nets there as a punisher. Since you don’t know that, it is a very broad statement that you cannot confirm as true for the whole industry and probably should be avoided.
    Fourth, be careful using terms such as “unpleasant” or “pleasing” when defining stimuli. I find freshly pickled pickles on my burger reinforcing, but do you? I don’t know that so I cannot say that it is pleasant or unpleasant. Just like we don’t know with the animals, trying to understand feeling and saying “pleasant” or “unpleasant” is a dangerous game. Use reinforcing or aversive instead.
    Fifth, avoidance is defined as performing a certain behavior so that an aversive stimulus is postponed or delayed. Think of it this way: you ask a dolphin for a fluke present for a blood draw, and the dolphin sees the vet there and decides to break station and swim around the pool. The dolphin is then avoiding the aversive stimulus of the needle stick. As a trainer, you can reapproximate, wait for the animal to come back to station and ask again, timeout, or wait for the animal to come back to station and move on while having the vet reinforce, so that the reinforcement is paired with the vet, thus making the vet more reinforcing as opposed to aversive. Also, your note following avoidance kind of confuses me. Avoidance is naturally reinforcing because an aversive is delayed. Avoidance also can be reinforced. Since avoidance is a behavior, when the animal does return to station correctly, if that animal is reinforced, then you just reinforced breaking and returning to station. Is that a behavior you wish to increase in frequency?
    In your schedules of reinforcement, when talking about interval schedules, you also should include that the reinforcement is delivered after the completion of the first correct behavior following the interval. For example, a fixed interval schedule at 25 seconds, behaviors are asked throughout the 25 seconds, but if a behavior is completed at 24 seconds correctly, the following correct behavior is reinforced. Same situation but the last behavior is a failure, a trainer should LRS the last behavior then repeat or ask a different behavior to get a success then reinforce.

    You have a much higher understanding of training methods than I did at your age, and I’m in the field. Brilliant work. Keep working, but be careful with wording and broad statements. Be as specific as you can without excluding circumstances. Also, don’t forget your ABCs of training: A- antecedent, B- behavior, C- consequence.
