Operant and Classical Conditioning
The article below is written by Elijah Ozbat, a freshman in high school. He is a passionate future marine animal trainer who is doing his research now so he can be fully prepared later. I have worked with animal trainers who don’t understand operant conditioning and classical conditioning as thoroughly as Elijah. So many people e-mail me and ask what they can do now to become a trainer. The answer: learn!
Start learning now.
No matter your age, you can always be learning.
Operant Conditioning in Full Detail
By: Elijah Ozbat
Note: This article contains advanced content. If you don’t understand certain words or concepts, that is completely fine.
Many people have a limited understanding of how marine animal training works. They think that you just teach a dolphin to jump twenty feet in the air and that’s that. In reality, marine animal training is based on a psychological technique called operant conditioning. If you are a regular reader of this site, and you have read Wear a Wetsuit to Work: How You Can Become a Marine Mammal Trainer, then you will already have a basic understanding of how operant conditioning works. But you probably have never learned in elaborate detail how it works. Until now. So for all of you knowledge junkies out there, get ready.
Operant conditioning is the process of increasing or decreasing specific behaviors through their consequences. Basically put, you reward a desired behavior, and you either withhold the reward or give an unpleasant consequence for an undesired behavior. The terms below lay out the basic concepts of operant conditioning:
Stimulus– anything that causes an effect
Positive– addition of a stimulus
Negative– removal of a stimulus
Before we can go into full detail, you must be able to distinguish between classical conditioning and operant conditioning.
First of all, conditioning in general is the training of a behavior through a stimulus. This applies to all types of conditioning, but the stimulus is applied in different ways, and that is what distinguishes one type of conditioning from another. The main types of conditioning, as I have already mentioned, are classical conditioning and operant conditioning.
Classical conditioning is the process of pairing a conditioned stimulus with an unconditioned stimulus. An unconditioned stimulus is something that is already naturally rewarding to an individual. For example, food is naturally reinforcing to dogs. A conditioned stimulus is something that was previously not reinforcing but, through classical conditioning, is conditioned to be reinforcing. For example, the ringing of a bell is not naturally reinforcing to a dog, but if the bell is repeatedly paired with food, the dog will associate the bell with food and drool whenever he hears it ring. In short, you take a conditioned stimulus and pair it with an unconditioned stimulus; in this process, the individual learns to find the conditioned stimulus reinforcing.
Here is a good example of classical conditioning. Ever wonder why your dog starts to drool when you are opening a dog food can? The answer: classical conditioning. A dog naturally produces saliva when it eats, to assist with digestion. Salivating is the behavior being focused on here. Every time before the dog gets fed, you open a can. As a result, the dog learns to associate the opening of the can with being fed, and this causes it to salivate when you open the can. By pairing an unconditioned stimulus (the dog food) with a conditioned stimulus (the opening of a can), you have successfully conditioned your dog to find the opening of a can reinforcing, and this brings out a behavior: salivation. Now your dog will salivate whenever you open a can. Make sure it always receives food immediately after you open the can; otherwise, the association will weaken and the effect will eventually wear off.
In the marine animal training industry, classical conditioning is used to pair primary reinforcers with secondary reinforcers. A primary reinforcer is an item that is of instinctual value to an individual (in the case of marine mammals, fish). A secondary reinforcer is an item that is not instinctually of any value to the individual but, through the use of classical conditioning, can be taught to be reinforcing (in the case of marine mammals, toys, ice, and gelatin). By pairing an unconditioned stimulus (fish) with a conditioned stimulus (a toy), a trainer can teach the marine mammal to associate receiving a toy with receiving fish, and as a result, the mammal learns to find receiving a toy reinforcing.
Operant conditioning works under similar principles, but in this process, you modify the frequency of a specific behavior through a consequence, an event that happens after the behavior. In this sense, operant conditioning is distinguishable from classical conditioning. Basically, you ask for a behavior to be performed, and if it is performed correctly, you give a reinforcing consequence. This gives an incentive for doing the behavior, so the frequency at which the behavior is performed increases. Operant conditioning can also be used to decrease undesired behavior by either giving an undesirable consequence or removing a pleasurable consequence. Note: marine animal trainers never punish animals for performing undesirable behavior, and food is never withheld from the animals.
In the marine animal training industry, operant conditioning is used to increase the frequency of behaviors that marine mammals perform through consequence. Notice how I said behaviors, and not tricks. What happens is that a trainer asks a mammal to do a behavior, and if that behavior is done correctly, the mammal is rewarded. In doing this, the frequency of that behavior is increased. The mammal has a reason to jump 10 ft. in the air: fish, toys, or some other type of reinforcement. A dolphin is not taught to jump high in the air; it already knows how to do that. Trainers just encourage the dolphin and give it an incentive to do so. In this manner, marine mammals are participants in performing behaviors, not victims, as many people seem to think. The mammal has a choice about doing behaviors; if it does not want to do them, it won’t.
The main difference between classical conditioning and operant conditioning is that classical conditioning involves pairing conditioned stimuli with unconditioned stimuli, for the purpose of teaching an individual to find a previously neutral stimulus to be reinforcing. In contrast, operant conditioning is the specific modification of a behavior through consequence.
You may notice that conditioning is basically just a lot of cause-and-effect.
Basic Operant Conditioning Terms
There are two main classifications of stimuli: core stimuli and additional stimuli. Core stimuli are involved in every type of operant conditioning, and additional stimuli may or may not be involved. Note: These next few terms have different meanings than the connotative meanings people would normally associate with them (e.g. “positive” does not necessarily mean good, “negative” does not necessarily mean bad, and a “consequence” may be good or bad).
Positive– a consequence (pleasant or unpleasant) is given following a behavioral response
Negative– a consequence (pleasant or unpleasant) is removed or held back following a behavioral response
Reinforcement– a consequence used to increase the occurrence of a behavior
Punishment– a consequence used to decrease the occurrence of a behavior
Antecedent stimuli– stimulus that occurs before a behavior is performed
Extinction– Lack of any consequence following a behavior. Inconsequential behaviors will tend to occur less often, resulting in a decline of that behavior’s frequency
Different combinations of stimuli create a list of five basic consequences:
Note: These terms can be very misleading; again, the psychological meanings are different from their connotative meanings.
Positive Reinforcement: a behavior is followed by a pleasant consequence; increases desired behavior; referred to as reinforcement
Negative Reinforcement: MISLEADING; an unpleasant consequence is removed following a desired behavior; increases desired behavior; referred to as escape
Positive Punishment: MISLEADING; an unpleasant consequence is added following an undesired behavior; decreases undesired behavior; referred to as punishment
Negative Punishment: a pleasant consequence is removed following an undesired behavior; decreases undesired behavior; referred to as penalty
Extinction: a previously reinforced behavior is no longer reinforced; used when certain behaviors are no longer useful
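The four named combinations above can be sketched as a small lookup. This is only an illustration of the definitions in this article: the function name `classify_consequence` and its two parameters are made up for this example. Extinction is left out because no stimulus is added or removed in that case.

```python
from typing import Tuple


def classify_consequence(stimulus_is_pleasant: bool, stimulus_is_added: bool) -> Tuple[str, str]:
    """Return (term, effect on the behavior) for one consequence.

    "Positive" means a stimulus is added; "negative" means it is removed.
    Reinforcement increases a behavior; punishment decreases it.
    """
    if stimulus_is_added:
        if stimulus_is_pleasant:
            return ("positive reinforcement", "increases")
        return ("positive punishment", "decreases")
    if stimulus_is_pleasant:
        return ("negative punishment", "decreases")
    return ("negative reinforcement", "increases")


# A pleasant stimulus (fish) is added after a behavior.
print(classify_consequence(stimulus_is_pleasant=True, stimulus_is_added=True))
# An unpleasant stimulus is removed after a behavior (escape).
print(classify_consequence(stimulus_is_pleasant=False, stimulus_is_added=False))
```

Notice that both "negative reinforcement" and "positive reinforcement" increase a behavior, which is why the everyday meanings of "positive" and "negative" are misleading here.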
Complex Operant Conditioning Terms
Escape – a behavior eliminates an existing unpleasant stimulus; e.g. pinching your nose when you smell something rotten
Avoidance – preparing for a potential unpleasant stimulus (e.g. wearing sunscreen to avoid getting sunburns)
Note: Avoidance behavior cannot be reinforced since the behavior does not occur at the same time that the unpleasant stimulus occurs
Noncontingent Reinforcement: delivery of pleasant stimuli regardless of the behavior performed; may be used to get rid of one undesired behavior by rewarding multiple alternative behaviors
Note: Use of the term “noncontingent reinforcement” is disputed as no single behavior is identifiably strengthened
Schedules of Reinforcement
These are rules that regulate the delivery of reinforcement. A schedule may set the time between deliveries or the number of behaviors that must be performed.
Fixed Time Interval Schedule– Reinforcement is given after a set amount of time since the last reinforcement was delivered.
20 Seconds => Reinforcement
20 Seconds => Reinforcement
Variable Time Interval Schedule– Reinforcement is given after a varying amount of time since the last reinforcement was given.
20 Seconds => Reinforcement
30 Seconds => Reinforcement
Fixed Ratio Schedule– Reinforcement is given after a set number of behaviors are done since the last reinforcement was delivered
2 Behaviors => Reinforcement
2 Behaviors => Reinforcement
Variable Ratio Schedule– Reinforcement is given after a varying number of behaviors are done since the last reinforcement was delivered
2 Behaviors => Reinforcement
4 Behaviors => Reinforcement
Continuous Reinforcement– Reinforcement is given after every behavior
Behavior => Reinforcement
Behavior => Reinforcement
The variable schedules are usually the most effective schedules, because the animal cannot predict exactly when the next reinforcement will come.
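The schedules above can be sketched with one small helper. This is only a toy illustration: the function name `reinforcement_points` and the idea of cycling through a list of gaps are assumptions made for this example. A real variable schedule would normally pick its gaps unpredictably; the fixed lists here simply mirror the numbers used in the examples above.

```python
from itertools import cycle


def reinforcement_points(total, gaps):
    """Return the points (seconds or behavior counts) at which reinforcement is given.

    total: how many seconds or behaviors the session lasts.
    gaps:  how many seconds or behaviors must pass before each reinforcement;
           the list repeats from the start when it runs out.
    """
    if any(gap <= 0 for gap in gaps):
        raise ValueError("every gap must be a positive number")
    points = []
    position = 0
    for gap in cycle(gaps):
        position += gap
        if position > total:
            break
        points.append(position)
    return points


print(reinforcement_points(60, [20]))      # fixed time interval: [20, 40, 60]
print(reinforcement_points(60, [20, 30]))  # variable time interval: [20, 50]
print(reinforcement_points(8, [2]))        # fixed ratio: [2, 4, 6, 8]
print(reinforcement_points(8, [2, 4]))     # variable ratio: [2, 6, 8]
print(reinforcement_points(4, [1]))        # continuous: [1, 2, 3, 4]
```

The same helper covers both interval and ratio schedules because the only difference between them is what is being counted: seconds or behaviors.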
Discrimination, Generalization, and Context
These are situational tactics where most behavior is controlled by circumstantial stimuli:
Discrimination– a technically correct behavior is only reinforced under specific circumstances. E.g. you are trapped in a room and there are two buttons: red and blue. You are given food for pressing the red button, but not the blue button. The behavior is the same in both cases: pressing a button. But you are only rewarded for pressing a specific button.
Generalization– a tendency to respond to stimuli that are similar to a stimulus you were previously trained to discriminate. E.g. a crimson button appears, and you press that button because its color is similar to red.
Context– Stimuli that are always present, such as chairs, tables, walls, etc. E.g. the room changes from a dark, eerie room to a fancy room with a hot tub in the middle. Since the context changed, you won’t feel as compelled to hit the red button; you’ll be soaking in the hot tub.
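The button example can be sketched in a few lines. This is a toy model, not how psychologists actually measure generalization: the function name `presses_button`, the use of RGB color distance as "similarity", and the threshold of 100 are all assumptions chosen just for illustration.

```python
import math

# Standard RGB values for the three button colors in the example.
RED = (255, 0, 0)
BLUE = (0, 0, 255)
CRIMSON = (220, 20, 60)


def presses_button(color, trained_color=RED, threshold=100.0):
    """Return True if the color is close enough to the trained color.

    The threshold is an arbitrary, made-up cutoff for "similar enough".
    """
    return math.dist(color, trained_color) <= threshold


print(presses_button(RED))      # True: the trained stimulus
print(presses_button(CRIMSON))  # True: similar to red (generalization)
print(presses_button(BLUE))     # False: never rewarded (discrimination)
```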
Variables that Affect a Consequence’s Effectiveness
Different factors can affect the effectiveness of consequential stimuli, both reinforcements and punishments:
Satiation/Deprivation: The effectiveness of a consequence will be lower if an individual’s desire for that consequence has already been fulfilled. The effectiveness will be higher if that individual’s desire for the consequence has not been fulfilled. (E.g. I have 5 tacos from Taco Bell. I eat 4 tacos, and I find them very satisfying. However, I am already full, so I won’t find the fifth taco very satisfying. I may even avoid eating it.)
Immediacy: The time at which a consequence is given relative to when an action was performed will have an impact on the effectiveness of that consequence. (E.g. if a child is spanked for something he/she did two weeks ago, that punishment will probably not prevent the child from doing it again; however, if the child is spanked immediately, the punishment will be much more effective in preventing him/her from doing it again.) The same holds for animals: a consequence works best when it comes sooner rather than later.
If you don’t remember anything else, remember this:
Operant conditioning is the basic psychological tool used to train animals. Trainers implement these basic tactics to train the animals to perform various behaviors. By encouraging behavior through positive reinforcement, trainers can effectively increase the frequency of desired behaviors. Also, trainers do not refer to behaviors as good or bad. They may refer to them as “desired” or “undesired” or “favorable” or “unfavorable.”
Congratulations, you now have more of an understanding of operant conditioning than many people do!