I frequently run into trainers who tell me that they “use Positive Reinforcement.” They don’t. They think they do, but they don’t. They can’t. It’s actually not possible to use Positive Reinforcement or any of the other quadrants of the Operant Conditioning model defined in the behavioural research of B.F. Skinner, Keller Breland, and many others. It would be like saying that a criminal used “murder” to kill his victim.
Perhaps we should start with a brief introduction for those who aren’t familiar with the Operant Conditioning (OC) model. Operant Conditioning is a behaviour modification technique and it defines behaviour using two variables – whether the behaviour increases or decreases as a result of the training and whether something was added to the trainee’s environment or was removed. It seems to be an unfortunate accident of history that some confusing terms were used to define these two variables.
In the OC model, any behaviour that becomes more intense, more frequent, or more likely as a result of the training is said to be “Reinforced” and any behaviour that becomes less intense, less frequent, or less likely as a result of the training is said to have been “Punished.” So behaviour increases indicate Reinforcement and behaviour decreases indicate that Punishment of the behaviour has occurred. To make matters even more fuzzy, if you added something to the trainee’s environment (like food or petting), that is called “Positive” in the OC model but it really means “additive.” And if you remove something from the environment, that is called “Negative” in the OC model but it really means “subtractive.”
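For readers who think more easily in code, the two variables can be sketched as a small lookup. This is only an illustration of the definitions above; the names `StimulusChange`, `BehaviourChange`, and `classify_quadrant` are made up for this example and do not come from the behavioural literature.

```python
from enum import Enum


class StimulusChange(Enum):
    """What happened to the trainee's environment (hypothetical names)."""
    ADDED = "Positive"    # something was added, e.g. food or petting
    REMOVED = "Negative"  # something was taken away


class BehaviourChange(Enum):
    """What happened to the behaviour afterwards (hypothetical names)."""
    INCREASED = "Reinforcement"  # more intense, frequent, or likely
    DECREASED = "Punishment"     # less intense, frequent, or likely


def classify_quadrant(stimulus: StimulusChange, behaviour: BehaviourChange) -> str:
    """Combine the two variables into the name of one OC quadrant."""
    return f"{stimulus.value} {behaviour.value}"


# Something was added and the behaviour increased.
print(classify_quadrant(StimulusChange.ADDED, BehaviourChange.INCREASED))
# Something was removed and the behaviour decreased.
print(classify_quadrant(StimulusChange.REMOVED, BehaviourChange.DECREASED))
```

Note that neither variable says anything about whether the experience was pleasant or unpleasant; the quadrant name is built only from "added or removed" and "increased or decreased."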
Confusion right from the start
It’s unfortunate that the initial researchers of Operant Conditioning chose to use words that already had common meanings slightly different from the way they used them. “Positive” usually means something good or pleasant and “negative” means kind of the opposite. “Reinforcement” is a pretty good term for describing what it does but “Punishment” is full of all kinds of meaning that doesn’t really line up with what OC means to communicate.
Here’s an example. If I say that I had trained my dog with “Positive Punishment”, someone unfamiliar with the OC model might think that I had reprimanded my dog in a cheerful or upbeat way. And they would not be surprised if the behaviour I “punished” didn’t actually decrease. That’s because the common understanding of the words “positive” and “punishment” doesn’t carry precise definitions.
It’s this linguistic fogginess that I believe is causing all kinds of problems in the dog world these days. The confusion goes beyond the definition of the terms; there is also the problem of how the terms are used. The OC model can only work if you use it AFTER your dog’s behaviour has changed in some way. In other words, you can’t determine what aspect of Operant Conditioning has had an effect until after you have some change in your dog’s behaviour to examine.
Getting the horse in front of the cart
The basis of Operant learning is dictated by results. Whatever happens as a result of a dog’s behaviour will determine if they will be more or less likely to repeat the behaviour. If the result was the addition of something the dog wanted, it is likely (but not certain!) that the behaviour they just did will become more likely in the future. And that is an important distinction. We might intend to reinforce a behaviour but the results do not necessarily follow the intentions of the trainer.
Some of the confusion comes from the fact that the rewards we provide to try to reinforce or increase a behaviour are often successful. So it becomes a kind of shorthand to say that we “use Positive Reinforcement” when we mean that we gave our dog a reward in response to a behaviour. The same is true if we remove our attention from our dog in order to decrease a behaviour; we may fall into shorthand by saying we “used Negative Punishment” to decrease the behaviour.
But here’s the rub – what if what we did in our training doesn’t actually have the effect on our dog that we intended? What if I give my dog a yummy treat every time she comes to me when I call her name, but sometimes she responds to the call and other times she doesn’t? If her recall never gets better, if she doesn’t come more frequently or more consistently, have I “used Positive Reinforcement” or did I just give my dog a treat? According to the strict definitions of science, if the behaviour didn’t increase, it could not be Positive Reinforcement! The same is true if I try to stop my dog from jumping up by yelling at her and pushing her down. I could say that I have used Positive Punishment (because I added the yelling and pushing to try to reduce the jumping behaviour) but if my dog continues to jump up with the same frequency, then I haven’t “punished” the jumping up in behavioural terms, although it was my intention to “punish” her for jumping up! Do you see the confusion here?
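The two examples above can be sketched in a few lines of Python. This is a rough illustration under simplifying assumptions: the function name `quadrant_from_results` and its parameters are hypothetical, the rates are made-up numbers, and treating "no change" as exact equality stands in for the judgment a trainer would really make about whether a behaviour has meaningfully changed.

```python
from typing import Optional


def quadrant_from_results(stimulus_added: bool,
                          rate_before: float,
                          rate_after: float) -> Optional[str]:
    """Name the quadrant from observed results (hypothetical helper).

    Returns None when the behaviour did not change, because then neither
    Reinforcement nor Punishment has occurred, whatever was intended.
    """
    if rate_after == rate_before:
        return None
    prefix = "Positive" if stimulus_added else "Negative"
    suffix = "Reinforcement" if rate_after > rate_before else "Punishment"
    return f"{prefix} {suffix}"


# Treats were added for recalls, but the recall rate never improved.
print(quadrant_from_results(True, rate_before=0.5, rate_after=0.5))

# Yelling and pushing were added, but jumping continued at the same frequency.
print(quadrant_from_results(True, rate_before=8, rate_after=8))

# Only when the behaviour actually changes does a quadrant name apply.
print(quadrant_from_results(True, rate_before=0.5, rate_after=0.9))
```

The trainer's intention is not an input to the function at all; only what was added or removed and what the behaviour did afterwards.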
The last 15 years or so have seen a tremendous push to change how we think about dogs and dog training. A large part of that effort has been the introduction of animal and behavioural sciences to the dog training and dog owning communities. In our haste to advance, we may just have caught ourselves in a trap of our own making.
Perhaps we have simplified things too far. So far, in fact, that discussions and debates are arising about the usefulness and validity of Operant Conditioning. Many times these discussions can seem like arguments over religion – the equivalent of “how many angels can dance on the head of a pin?” In all of it, one very important point can get overlooked.
The usefulness of Operant Conditioning, at least as I learned it, for dog training is that it is a remarkably effective forensic tool. That is, it is a great system for helping us analyze how behaviours come about or diminish. Once I choose to begin referring to the OC terms and quadrants as if they are my intentions, that usefulness begins to break down. The reason is simple. Just like the examples I provided above, sometimes our intentions in training do not match the results. If I make the claim that I will use Positive Reinforcement to train a behaviour and that behaviour does not increase, then I have created a paradox. I may have indeed added rewards to our training session (the additive property of Positive in OC) but the behaviour did not increase so it cannot be Reinforcement!
Trainers often use the shorthand of saying that they will employ Positive Reinforcement for a simple and logical reason. They have used certain training techniques in the past and these have produced behaviours by Positive Reinforcement. Even so, that past experience does not guarantee that future uses for different behaviours or with different dogs will produce the same results.
Am I just splitting hairs?
Well, yes and no. I have a number of dog training colleagues with whom I can have conversations where we use this kind of shorthand without confusion. We are all on the same page and all have the same basic understanding of behavioural science and animal learning. We all know what we mean.
But here is that trap that I referred to earlier. Not everyone in the dog world is on the same page regarding behavioural science and animal learning. So when someone like me, who has been using this kind of training for 13 years, has a conversation with someone new to the science and methods, things can get confusing. Unfortunately, most if not all of the disagreements I see between trainers can be traced back to different levels of understanding of the science or to the incorrect use of the terms for the concepts they are trying to discuss.
The sciences of Operant Conditioning, Classical Conditioning, Ethology, Animal Learning, and Biology are complex and all of them should inform how we live and work with our dogs to a greater or lesser degree. But we do ourselves no favours by using shortcuts and shorthand in our efforts to make our point quickly. Sometimes it just takes time and effort to explain what we mean.
A counter-production of experts
I have talked in this column before about the need for the use of scientific terms in their proper context and using their proper definition. The positive training community is growing rapidly and, like a game of “Telephone”, as these concepts and techniques get passed from one dog trainer to another, it seems that the meanings are getting jumbled up. And nothing good will come of many people using the same words to mean different things.
The most obvious example of this problem that I see is Operant Conditioning. It is a framework designed to be used to determine how and why behaviour happened. That’s past tense! So all of this talk about why you can’t use Positive Reinforcement to teach good performance in agility or that we should never use Negative Reinforcement to address behaviour issues should just STOP.
Operant Conditioning is neutral. It tells us what happened so we can adjust how we are training and be more effective. I may intend to use Positive Reinforcement to teach my dog but if I do my training in an area she finds scary, I may actually end up with less of the behaviour than when I started – and that’s Positive Punishment.
As good trainers, we should be using and talking about Operant Conditioning in the proper context: one that helps us to define what happened from the observed results of our training session, not what we intended or wanted to happen. Misrepresenting the science doesn’t do anything good. We confuse each other, we get frustrated when we think the science doesn’t work, and the critics of this kind of training have more to criticize.
Until next time, have fun with your dogs!
Diagram attribution –
Operant Conditioning Simplified – Eric Brad
Operant Conditioning Expanded – William S. Altman from this website