Operant Conditioning
Operant conditioning, sometimes called ''instrumental conditioning'' or ''instrumental learning'', was first extensively studied by Edward L. Thorndike (1874-1949), who observed the behavior of cats trying to escape from home-made puzzle boxes. When first confined in the boxes, the cats took a long time to escape. With experience, ineffective responses occurred less frequently and successful responses occurred more frequently, enabling the cats to escape in less time over successive trials. In his Law of Effect, Thorndike theorized that successful responses, those producing ''satisfying'' consequences, were ''stamped in'' by the experience and thus occurred more frequently. Unsuccessful responses, those producing ''annoying'' consequences, were ''stamped out'' and subsequently occurred less frequently. In short, some consequences ''strengthened'' behavior and some consequences ''weakened'' behavior. B.F. Skinner (1904-1990) built upon Thorndike's ideas to construct a more detailed theory of operant conditioning based on reinforcement and punishment.

REINFORCEMENT AND PUNISHMENT

Reinforcement and punishment, the core ideas of operant conditioning, are either positive (adding a stimulus to an organism's environment) or negative (removing a stimulus from an organism's environment). This yields four basic consequences, plus the case of no consequence (i.e. nothing happens). Note that organisms are not reinforced or punished; behavior is reinforced or punished.
''Four contexts of operant conditioning:'' Here the terms ''"positive"'' and ''"negative"'' are not used in their popular sense; rather, ''"positive"'' refers to addition and ''"negative"'' refers to subtraction. What is added or subtracted may be either an appetitive or an aversive stimulus. Hence ''positive punishment'' is sometimes a confusing term, as it denotes the addition of an aversive stimulus (such as spanking or an electric shock), a context that may seem very negative in the lay sense. The four situations are:
# Positive reinforcement occurs when a behavior (response) is followed by an appetitive (commonly seen as pleasant) stimulus that increases that behavior. In the Skinner box experiment, a stimulus such as food or sugar solution is delivered when the rat presses the lever.
# Negative reinforcement occurs when a behavior (response) is followed by the removal of an aversive (commonly seen as unpleasant) stimulus, thereby increasing that behavior. In the Skinner box experiment, an example of negative reinforcement is a loud noise that sounds continuously inside the rat's cage until the rat presses the lever, at which point the noise ceases.
# Positive punishment occurs when a behavior (response) is followed by an aversive stimulus, such as a shock or a loud noise, resulting in a decrease in that behavior.
# Negative punishment occurs when a behavior (response) is followed by the removal of an appetitive stimulus, such as taking away a child's toy, resulting in a decrease in that behavior.
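The four situations above form a two-by-two grid: whether a stimulus is added or removed, and whether the behavior increases or decreases. The following is a minimal Python sketch of that grid, offered only as an illustration. The names `Consequence` and `classify` are hypothetical and do not come from the original text.

```python
from enum import Enum


class Consequence(Enum):
    """The four basic consequences described in the text."""
    POSITIVE_REINFORCEMENT = "positive reinforcement"
    NEGATIVE_REINFORCEMENT = "negative reinforcement"
    POSITIVE_PUNISHMENT = "positive punishment"
    NEGATIVE_PUNISHMENT = "negative punishment"


def classify(stimulus_added: bool, behavior_increases: bool) -> Consequence:
    """Map the two yes/no questions onto one of the four consequences.

    "Positive" means a stimulus was added and "negative" means one was
    removed; "reinforcement" means the behavior increases and
    "punishment" means it decreases.
    """
    if behavior_increases:
        if stimulus_added:
            return Consequence.POSITIVE_REINFORCEMENT
        return Consequence.NEGATIVE_REINFORCEMENT
    if stimulus_added:
        return Consequence.POSITIVE_PUNISHMENT
    return Consequence.NEGATIVE_PUNISHMENT


# Examples taken from the text:
# food delivered after a lever press, and lever pressing increases
print(classify(stimulus_added=True, behavior_increases=True).value)
# loud noise stops after a lever press, and lever pressing increases
print(classify(stimulus_added=False, behavior_increases=True).value)
# a toy is taken away, and the behavior decreases
print(classify(stimulus_added=False, behavior_increases=False).value)
```

Note that the function classifies the effect on behavior, not on the organism, in keeping with the point that behavior, not the organism, is reinforced or punished.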
DRAWBACKS AND LIMITATIONS OF OPERANT CONDITIONING

Skinner's construct of learning did not include what Nobel Prize winning biologist Konrad Lorenz termed "fixed action patterns," or reflexive, impulsive, or instinctive behaviors. These behaviors were said by Skinner and others to exist outside the parameters of operant conditioning. In dog training that makes use of the prey drive, particularly the training of working dogs and detection dogs, stimulating these fixed action patterns, which stem from the dog's predatory instincts, is the key to producing very difficult yet consistent behaviors, and in most cases does not involve operant, classical, or any other kind of conditioning. The key to understanding this is that, according to the laws of operant conditioning, any behavior that is consistently rewarded, every single time, will be produced only intermittently and will not be reliable. However, in detection dogs, any correct behavior of indicating a "find" must always be rewarded with a tug toy or a ball throw. This is because the prey drive, once started, follows an inevitable sequence: the search, the eye-stalk, the chase, the grab-bite, the kill-bite. This is why dogs trained for detection work through the prey drive only work well if they are reinforced every single time they behave correctly, which breaks one of the laws of operant conditioning. Some trainers are now using the prey drive to train pet dogs and find that the dogs respond far better to training than when only the principles of operant conditioning are used. According to Skinner and his disciple Keller Breland (who invented clicker training), those principles break down when strong instincts are at play.

AVOIDANCE LEARNING

Avoidance training belongs to negative reinforcement schedules. Performing the instrumental response terminates or prevents an aversive stimulus.
There are two commonly used kinds of experimental setting: discriminated and free-operant avoidance learning.

'' Discriminated avoidance learning ''
: In discriminated avoidance learning, a novel stimulus such as a light or a tone is followed by an aversive stimulus such as a shock (CS-US, similar to classical conditioning). Whenever the animal performs the instrumental response, the stimulus currently being presented, the CS (conditioned stimulus) or the US (unconditioned stimulus), is removed. During the first trials (called escape trials), the animal usually experiences both the CS and the US, and performs the instrumental response to terminate the aversive US. Over time, the animal learns to perform the response during the presentation of the CS, thus preventing the aversive US from occurring. Such trials are called avoidance trials.

'' Free-operant avoidance learning ''
: In this experimental setting, no discrete stimulus signals the occurrence of the aversive stimulus. Rather, the aversive stimuli (mostly shocks) are presented without explicit warning stimuli.
: Two crucial time intervals determine the rate of avoidance learning. The first is the S-S interval (shock-shock interval): the amount of time that passes between successive presentations of the shock (unless the instrumental response is performed). The second is the R-S interval (response-shock interval), which specifies the length of time following an instrumental response during which no shocks will be delivered. Note that each time the organism performs the instrumental response, the shock-free R-S interval starts anew.

TWO-PROCESS THEORY OF AVOIDANCE

This theory was originally proposed to explain learning in discriminated avoidance learning. It assumes that two processes take place.

'' a) Classical conditioning of fear ''
During the first trials of the training, the organism experiences both the CS and the aversive US (escape trials).
The theory assumes that during those trials classical conditioning takes place through the pairing of the CS with the US. Because of the aversive nature of the US, the CS comes to elicit a conditioned emotional reaction (CER): fear. In classical conditioning, presenting a CS that has been paired with an aversive US disrupts the organism's ongoing behavior.

'' b) Reinforcement of the instrumental response by fear-reduction ''
Because the CS signaling the aversive US has, through the first process, itself become aversive by eliciting fear in the organism, reducing this unpleasant emotional reaction serves to motivate the instrumental response. The organism learns to make the response during the CS, thus terminating the aversive internal reaction elicited by the CS. An important aspect of this theory is that the term "avoidance" does not really describe what the organism is doing. It does not "avoid" the aversive US in the sense of anticipating it. Rather, the organism escapes an aversive internal state caused by the CS.
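The timing rules of free-operant avoidance described earlier (the S-S and R-S intervals) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not a description of any particular laboratory procedure. The function name `shock_times` and its parameters are hypothetical. The first shock is assumed to be scheduled one S-S interval after the session starts, and a response is assumed to postpone a shock only if it occurs strictly before the scheduled shock time.

```python
def shock_times(response_times, ss_interval, rs_interval, session_length):
    """Return the times at which shocks are delivered in one session.

    Assumptions (not from the source text): the first shock is due one
    S-S interval after time 0, and a response made at exactly the
    scheduled shock time does not prevent that shock.
    """
    # Only responses that fall inside the session are considered.
    responses = sorted(t for t in response_times if 0 <= t <= session_length)
    shocks = []
    next_shock = ss_interval
    i = 0
    while True:
        if i < len(responses) and responses[i] < next_shock:
            # A response restarts the shock-free R-S interval.
            next_shock = responses[i] + rs_interval
            i += 1
        elif next_shock <= session_length:
            # No response came in time: deliver the shock and schedule
            # the next one a full S-S interval later.
            shocks.append(next_shock)
            next_shock += ss_interval
        else:
            break
    return shocks


# No responses: shocks recur at every S-S interval.
print(shock_times([], ss_interval=5, rs_interval=20, session_length=20))
# -> [5, 10, 15, 20]

# Responding more often than once per R-S interval avoids all shocks.
print(shock_times([2, 12, 22], ss_interval=5, rs_interval=20, session_length=30))
# -> []
```

In this sketch, a single response delays the next shock by one R-S interval, after which shocks resume at the S-S rate until the next response.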