B. F. Skinner’s evolving views of punishment: II. 1940-1960

Bruna Colombo dos Santos; Marcus Bentes de Carvalho Neto

Conceptual articles

Received: 11 March 2020

Accepted: 28 July 2020

DOI: https://doi.org/10.5514/rmac.v46.i2.77884

Funding

Funding source: Brazilian Federal Agency for the Support and Evaluation of Graduate Education (CAPES)

Award recipient: Bruna Colombo dos Santos

Abstract: The reserve concept was the basis for Skinner considering punishment asymmetrical to reinforcement in the 1930s. In this paper we explore why he abandoned the reflex reserve concept in the 1950s, and what the implications of that were for his view of punishment. Skinner continued to claim that punishment was asymmetrical to reinforcement. We conclude that, although the reserve concept was nominally abandoned, its logic remained. We also discuss the terminology and definition of punishment and its explanatory mechanisms.

Keywords: punishment, 1950s, B. F. Skinner.

Summary: The reserve concept was the basis on which Skinner considered punishment asymmetrical to reinforcement. In this article we show why Skinner abandoned the reserve concept in the 1950s, and what the implications of that were for his point of view on punishment. Skinner continued to affirm that punishment was asymmetrical to reinforcement. We conclude that, although the reserve concept was nominally abandoned, its logic was maintained. We also discuss the terminology and definition of punishment and its explanatory mechanisms.

Key words: punishment, 1950s, B. F. Skinner.

In the first part of this review (Santos & Carvalho Neto, 2020) we described how Skinner changed his views about punishment in the 1930s. We showed that in 1935 Skinner held a symmetrical view of punishment, which he called “negative conditioning”. In 1938, the way Skinner talked about punishment changed: he still called it “negative conditioning” but also “negative reinforcement”, and he started to question whether punishment really weakened behavior. We demonstrated that the basis for this questioning was the reflex reserve concept (hereafter, the reserve concept). For Skinner, reinforcement built a reserve (a number of responses that could be emitted in extinction) that, according to his experiments, could not be destroyed by punishment.

Nevertheless, Skinner started to question the reserve concept in 1940. In an unpublished 1977 letter to Michael Zeiler (courtesy of the Archives of Harvard University), he indicated that it was the study of increasingly complex reinforcement schedules that definitively ended the usefulness of the reserve concept. This review will show that this final change took place in 1950. Before this, however, Skinner discussed punishment in Walden Two (1948). The discussion of punishment in Walden Two can therefore be interpreted as having occurred in a period of transition between the questioning and the definitive abandonment of the reserve concept. After his work in 1950, Skinner spoke about punishment in a more focused way, in 1953 and again in 1957. We will evaluate the effect that the ultimate contesting of the reserve concept had on the concept of punishment during this period.

Andery (1990) suggested that Skinner (1938/1991) began a new stage in his descriptive and explanatory system focused on human behavior that culminated in a proposal for an experimental society, Walden Two. In addition, she suggested that the 1950s was a period of extrapolation of behavioral science to human issues. An example is the textbook Science and Human Behavior (Skinner, 1953), which served as the basis for subsequent experimental analysis of human behavior (Lattal & Perone, 1998). During this period, Skinner also became a psychologist with popular visibility in the United States (Rutherford, 2003).

Punishment in the late 1940s: Terminology and definition

At the end of the 1940s the term “punishment” was first used in published papers. The term appeared in Skinner and Campbell (1947), in which they described the construction of an apparatus for the repeated use of electric shocks. As it was a technical paper, there was no presentation or discussion of concepts. Skinner and Campbell (1947) simply used the term “punishment” without defining it. The word “punishment” was also used in Walden Two, where Skinner continued to use it interchangeably with the term “negative reinforcement”: “Punishment. Negative reinforcement. The threat of pain. It is a primitive principle of control” (1948, p. 302). In this novel he also presented the operations involved and the weak effect of punishment on the probability of a response:

The old school made the amazing mistake of supposing that the reverse [of positive reinforcement] was true, that by removing a situation a person likes or setting up one he doesn’t like – in other words punishing him – it was possible to reduce the probability that he would behave in a given way again. That simply doesn’t hold. […] We are gradually discovering – at an untold cost in human suffering – that in the long run punishment doesn’t reduce the probability that an act will occur. (Skinner, 1948, p. 260)

Skinner defined punishment in the first sentence of this passage. If “removing a situation a person likes or setting up one he doesn’t like” can be interpreted as the withdrawal of positive reinforcers and the presentation of negative reinforcers, then, in Walden Two, punishment is defined in these terms.

Skinner (1948) used the term “probability” in discussing punishment. The adoption of the term “probability” occurred before 1948, appearing in Skinner (1947) as an “end term” within behavioral science, in the sense that probability should be the analytical tool of a predictive science of behavior. One might ask whether or not it was being conflated with the term strength. Analytically speaking, the term probability does not seem to add anything beyond what the term strength accomplishes, that is, the analysis remains the same as in the 1930s, where, in the long term, punishment does not maintain a reduction in response frequency (for more details on the concepts of strength and probability, see Johnson & Morris, 1987; Ferreira & De Rose, 2010). Skinner (1948) also maintained the distinction between immediate suppressive (short-term or temporary) and long-term effects on behavior in the same manner as he did in the 1930s (p. 260).

Skinner also discussed punishment intensity in a dialogue between Castle and Frazier¹, where Castle says that if punishment is strong enough, behavior will not be repeated. The answer given by Frazier was: “He’ll still tend to repeat it. He’ll want to repeat it. We haven’t really altered his potential behavior at all” (Skinner, 1948, pp. 260-261). In this excerpt, Skinner talks about “tendency” and “potential behavior.” In the 1930s, he questioned the status of punishment because it did not affect the number of “potential” responses available to be elicited (reserve). He seems to invoke in this context a similar explanatory logic, but with other semantics. This is consistent with the analysis offered in the present review, in which the 1940s are considered a transitional period between the questioning and the permanent abandonment of the reserve concept.

Bringing together the terminological characteristics and definitions presented here provides a definition of punishment in 1948 that reads as follows: “Punishment, or negative reinforcement, is (1) a primitive control technique, (2) wherein the presentation of negative reinforcement or removal of positive reinforcement occurs, (3) which has a temporary effect and does not reduce the probability of a behavior in the long-term.”

The final throes of the reserve concept: 1950

In the 1930s, Skinner considered punishment both symmetrically (1935) and asymmetrically (1938/1991) to reinforcement (Santos & Carvalho Neto, 2020). Arguably, the principal basis for asymmetry between reinforcement and punishment was the reserve concept.

Skinner (1948) considered punishment in terms of the withdrawal of a positive reinforcer and the presentation of a negative reinforcer. In terms of operations, these could be considered the opposite of reinforcement. However, he argued that, in the long term, the effects of punishment would not be symmetrically opposed to those of positive reinforcement because punishment does not alter an organism’s tendency to behave. In discussing punishment, he seemed to use an explanatory logic quite similar to the logic of the reserve. It may therefore be argued that, in 1948, he continued to consider punishment asymmetric in relation to reinforcement and that, although the term “reserve” was not being used, its logic remained.

Skinner (1950) discussed the reserve concept, criticizing its usefulness in behavioral science:

One way of considering the question of why extinction curves are curved is to regard extinction as a process of exhaustion comparable to the loss of heat from source to sink or the fall in the level of a reservoir when an outlet is opened. Conditioning builds up a predisposition to respond – a “reserve” – which extinction exhausts. This is perhaps a defensible description at the level of behavior. The reserve is not necessarily a theory in the present sense, since it is not assigned to a different dimensional system. It could be operationally defined as a predicted extinction curve, even though, linguistically, it makes a statement about the momentary condition of a response. But it is not a particularly useful concept, nor does the view that extinction is a process of exhaustion add much to the observed fact that extinction curves are curved in a certain way. (Skinner, 1950, p. 203)

In this excerpt, the usefulness of the reserve concept was questioned, along with the notion of extinction as an exhaustion process; reserve was defined as “a predicted extinction curve.” Although Skinner (1950) argued that the reserve concept was not a theory², because it did not, for example, invoke explanations in another domain that could not be observed, he judged it to be useless. From this point onward, there is no explicit use of the term by Skinner, at least not in the texts selected for this review. The term occurred again only in his autobiography (Skinner, 1979) and in a review of “The Behavior of Organisms” (Skinner, 1989, p. 125), where he stated that the concept should have been abandoned sooner, because speculating about what is happening within the organism was a violation of a basic principle.

Considering the reserve concept useless has certain implications for Skinner’s behavioral system, which was, from the mid-1930s, based on this concept. If the reserve concept does not serve, then the definitions of conditioning as the “creation of a reserve” and extinction as “exhaustion” also should no longer serve, nor should the notions of drive and emotion in terms of changing the ratio between strength and reserve. The division of behavioral operations is therefore lost, along with the basis for classifying them differently.

Skinner (1950) tried to explain extinction without resorting to the notion of exhaustion. He said that two variables are important in extinction: emotion, generated by the failure of responses to produce reinforcement, and novelty, in that the extinction situation (after continuous reinforcement) is quite distinct from the conditioning situation: because responses no longer produce reinforcement, there is no ingestion, and emotional responses are produced. Skinner considered the “novelty” factor to be the most important in explaining extinction, and its role can be observed with the use of periodic reinforcement³.

When there is periodic reinforcement, conditioning and extinction situations become more similar because there are periods of extinction during periodic reinforcement. Thus, there is adaptation of emotional responses, and the novelty factor is lower. Extinction curves with fewer and longer cyclical fluctuations therefore are produced. Skinner (1950) noted, however, that, if the interval is fixed, there is a possibility of discrimination because high response rates are correlated with the presentation of reinforcement and low rates are correlated with the absence of reinforcement.

Skinner (1950) stated that the novelty factor can be decreased by using aperiodic reinforcement, which prevents the formation of a discrimination⁴. Within this variable-interval schedule, there is no correlation between different response rates and different reinforcement probabilities. Smooth extinction curves are therefore produced, in which responding is constant and more sustained than in the extinction curves that develop following either continuous or periodic reinforcement.

With these conclusions, Skinner (1950) argued that the fact that intermittent reinforcement produces extinction curves containing a larger number of responses than those obtained after continuous reinforcement is hard to explain if one expects a linear relation between the number of reinforcers and the number of responses in extinction. This means that extinction curves can contain many more responses than those obtained after continuous reinforcement, even when the number of reinforcers is the same. Such results violate both the reserve principle (where there is an established relation between the number of reinforcers and number of responses in extinction) and the principle of extinction as exhaustion.

Skinner (1950) showed that intermittent reinforcement schedules were crucial for abandonment of the reserve concept. Furthermore, there was a change in the treatment of extinction. It could be argued, then, that in this context, extinction lost its symmetrical character in relation to conditioning because it was no longer a simple process of removing responses built through conditioning. Rather, it depended on other factors, such as the similarity of extinction to the conditioning condition and thus the reinforcement schedule used, and discrimination.

It is possible to argue that Skinner (1950), even with a new analysis of extinction, still regarded it as the opposite of reinforcement, as illustrated by statements like these: “As the organism learns, the rate rises. As it unlearns (for example, in extinction) the rate falls” (Skinner, 1950, p. 197); “Learning is said to take place because the reinforcement is pleasant, satisfying, tension reducing and so on. The converse process of extinction is explained with comparable theories” (Skinner, 1950, p. 200).

In 1948, it appears that punishment continued to be considered asymmetric in relation to reinforcement and that the explanatory logic of the reserve was used. As noted previously, the reserve concept began to be called into question in 1940⁵; however, Skinner arguably did not abandon it completely until at least 1950. Walden Two was written in this period; therefore, the concept’s logic, although shaken, may have been maintained by Skinner in this work.

The reserve concept was finally abandoned by the end of the 1940s because it no longer had predictive value. In the first part of this review (Santos & Carvalho Neto, 2020) we suggested that the reserve was Skinner’s basis for considering punishment asymmetrical to reinforcement. The question that now arises is how Skinner maintained his view of punishment as asymmetrical to reinforcement without the reserve concept. In the next sections, we describe the definition of punishment, its explanatory mechanisms, and the issue of symmetry and asymmetry in relation to reinforcement in the 1950s, without the reserve.

Punishment in the 1950s: Terminology and definition

The only term used in the 1950s was “punishment” (Rogers & Skinner, 1956; Skinner, 1953, 1953/2005, 1955a/1999, 1955b/1999, 1957, 1957/1992). The term “negative reinforcement” came to describe a behavioral procedure/process in which a class of responses produced removal of (escape) or avoidance of a negative reinforcing stimulus and, as a consequence, the probability of the response class increased in similar conditions (Skinner, 1953/2005).

Negative reinforcement became a type of reinforcement. The terms “positive” and “negative” no longer indicated an increase or decrease in the strength of the operant, but rather stimulus “addition” or “subtraction” operations. This distinction, based on the operation, was made by Keller and Schoenfeld (1950) and subsequently was adopted by Skinner (1953/2005) (Michael, 1975).

Skinner (1953/2005, 1957/1992) defined the terms “punishment” and “reward” as retroactive effects of the consequences of behavior on the organism. Skinner identified the lay term “reward” with the technical term “reinforcement,” and “punishment” remained a lay and technical term, as negative reinforcement now had another meaning.

The types of consequences that may be retroactive on the organism were classified as positive and negative reinforcers. These consequences were identified through their effects (increase) on the probability of the class of responses on which they are contingent: positive reinforcers are produced by the response, and negative reinforcers are removed; the process (behavioral change) or procedure (operations performed) was called positive or negative reinforcement. In describing the two types of reinforcement and the differences between them, Skinner said: “The difference between the two cases will be clearer when we consider the presentation of a negative reinforcer and the removal of a positive. These are the consequences which we call punishment (Chapter XII)” (Skinner, 1953, p. 73). One can observe a further specification of the term “punishment”: presentation of negative reinforcers and the removal of positive reinforcers.

Skinner (1953/2005, pp. 71, 78, and 182) also referred to punishment as a “control technique” used to reduce behavioral tendencies that are constructed by reinforcement (Skinner, 1953/2005, p. 182). He said, however, that punishment does not put an end to these tendencies, because its suppressive effects are temporary and behavioral reduction is not permanent: “More recently, the suspicion has also arisen that punishment does not in fact do what it is supposed to do. An immediate effect in reducing a tendency to behave is clear enough, but this may be misleading. The reduction in strength may not be permanent” (Skinner, 1953/2005, p. 183).

Some points are worth mentioning. The first is the use of the word “tendency” which seems to be identified by Skinner (1953/2005) – along with predisposition – with probability, both being described based on frequency. Since the 1930s, the basic data of behavior analysis have been the rate or frequency of responses, which is used to infer the concept of strength and now of probability. Note that Skinner (1948) used the term “probability” and here uses the term “strength,” supporting the interpretation that they could be used with the same connotation⁶.

It seems safe to state that when Skinner (1953/2005) used the terms “tendency” or “predisposition,” he was referring to the probability of the organism behaving, inferred from the frequency, that is, how often a response occurs over a period of time. This leads to the second point that should be highlighted in terms of punishment: its temporary versus permanent effect, a distinction maintained by Skinner since the 1930s. When he spoke of punishment, this temporal division seemed to be defining: punishment has temporary effects on probability (tendency); these effects do not last over time.

Effect durability is a complicated defining characteristic from a practical and an experimental point of view because one might ask: How long, in units of time, is “temporary” and “permanent”? By which criteria can one ascertain whether something is temporary or not? Upon what does durability depend? For how long must an experiment continue to ascertain whether a response remains suppressed or not?

Skinner (1953/2005) maintained this temporal distinction based on one of the experiments he published in 1938, which also is cited in the 1953 book. In this experiment, he used a mild punishment (a slap on the paws of rats for 10 min) and observed that when the punishment was discontinued, the response recovered completely. He also stated that when the punishment was severe it was more difficult to demonstrate that the responses would reappear, but even in these conditions, after a period of time, the rate did not remain low and returned to the levels that would have been expected had punishment not been administered.

As the durability of suppression in punishment is not permanent, Skinner (1953/2005) did not consider it to be “opposite to reward” (p. 184). He argued that punishment does not have effects that are comparable, albeit different in direction, to those of reinforcement. Thus, his definition could not follow the logic of the definition of reinforcement. It is assumed that because of this he said:

We must define punishment without presupposing any effect. This may appear to be difficult. In defining a reinforcing stimulus we could avoid specifying physical characteristics by appealing to the effect upon the strength of the behavior. If a punishing consequence is also defined without reference to its physical characteristics and if there is no comparable effect to use as a touchstone, what course is open to us? The answer is as follows. We first define a positive reinforcer as any stimulus the presentation of which strengthens the behavior upon which it is made contingent. We define a negative reinforcer (an aversive stimulus) as any stimulus the withdrawal of which strengthens the behavior. Both are reinforcers in the literal sense of reinforcing or strengthening a response. Insofar as scientific definition corresponds to lay usage, they are both “rewards.” In solving the problem of punishment we simply ask: What is the effect of withdrawing a positive reinforcer or presenting a negative? (Skinner, 1953/2005, pp. 184-185).

The statement that one should define punishment without assuming any effect can cause confusion because Skinner (1953, 1953/2005) presented punishment as one of the retroactive effects of behavioral consequences and also made statements about its temporary effects (Skinner, 1953/2005). Thus, one might ask: If punishment is classified as a retroactive effect of consequences, and if Skinner presents short-term effects, how could he say that punishment has no effect? This confusion is resolved when one observes that, by stating that punishment has no effect, Skinner was probably referring to an effect comparable to that of reinforcement.

In summary, five defining elements were identified that if grouped together would yield the following definition: Punishment is a behavioral control technique (1), characterized by the presentation of negative reinforcement or removal of positive reinforcement contingent on a class of responses (2). These operations act retroactively on behavior (3), producing temporary suppression of a response class (4), and this retroactive action is not comparable to the effects of reinforcement (5). Elements 1, 2 and 3 were present in the 1948 definition, and most of them (2, 3, 4 and 5) were in the definition subsequently provided by Skinner (1957/1992). It also is part of the definition of punishment that its effects must not be considered comparable to the effects of reinforcement. Thus, the thesis of asymmetry arises again, even within Skinner’s (1950) own discussion of the topic.

Explanatory mechanisms of behavioral suppression

When discussing punishment, Skinner (1948) argued that its effects are immediate and that it does not reduce the probability of the punished response in the long term. However, in this novel, Skinner does not describe the behavioral mechanisms involved in this temporary suppression, as he did in 1938.

The “immediate versus long-term effects” dichotomy remained in Skinner’s 1950s writings about punishment (1953/2005; 1957/1992). Skinner (1953/2005) argued that the effects of punishment on behavior were immediate or temporary, that is, punishment did not have long-term effects. He based this claim on at least one experiment published in 1938 (Experiment II). In his words:

The difference between immediate and long-term effects of punishment is clearly shown in animal experiments. In the process of extinction the organism emits a certain number of responses which can be reasonably well predicted. As we have seen, the rate is at first high and then falls off until no significant responding occurs. The cumulative extinction curve is one way of representing the net effect of reinforcement, an effect which we may describe as a predisposition to emit a certain number of responses without further reinforcement. If we now punish the first few responses emitted in extinction the theory of punishment would lead us to expect that the rest of the extinction curve would contain fewer responses. If we could choose a punishment which subtracted the same number of responses as are added by a reinforcement, then fifty reinforced responses followed by twenty-five punished responses should leave an extinction curve characteristic of twenty-five reinforced responses. When a similar experiment was performed, however, it was found that although punishing responses at the beginning of an extinction curve reduced the momentary rate of responding, the rate rose again when punishment was discontinued and that eventually all responses came out. The effect of punishment was a temporary suppression of the behavior, not a reduction in the total number of responses (Skinner, 1953/2005, pp. 183-184).

In this excerpt, Skinner (1953/2005) showed that his view of extinction was remarkably similar to the one he held in the 1930s. The extinction curve represents the net effect of reinforcement and was therefore its best measure. Punishment would have long-term effects if it affected the total number of responses emitted in extinction. Skinner observed that punishment did not have this effect; it only disrupted responding at the beginning of extinction. He further argued that even under severe punishment the total number of responses emitted during extinction was not lower than if no punishment had been administered.
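The arithmetic in Skinner’s example can be made explicit with a minimal sketch. It assumes, purely for illustration, that each reinforcement adds a fixed number of responses to the extinction curve (the hypothetical parameter `responses_per_reinforcement`, which is not a value Skinner reported), and it contrasts the subtractive prediction that he attributed to the traditional theory of punishment with the temporary-suppression outcome he described. The function names are ours, not Skinner’s.

```python
def subtractive_prediction(reinforced, punished, responses_per_reinforcement=1.0):
    """Total extinction responses if each punishment cancelled one reinforcement.

    This is the outcome that the traditional theory of punishment would lead
    one to expect, according to the passage quoted above.
    """
    remaining = max(reinforced - punished, 0)
    return remaining * responses_per_reinforcement


def temporary_suppression_outcome(reinforced, punished, responses_per_reinforcement=1.0):
    """Total extinction responses if punishment only delays responding.

    The number of punished responses is accepted as an argument but does not
    change the total: eventually all responses are emitted.
    """
    return reinforced * responses_per_reinforcement


# Skinner's example: fifty reinforced responses, twenty-five punished responses.
k = 1.0  # hypothetical scaling factor, for illustration only
print(subtractive_prediction(50, 25, k))         # equals the total for 25 reinforcements alone
print(temporary_suppression_outcome(50, 25, k))  # equals the total for 50 reinforcements alone
```

Under these assumptions the two accounts differ only in whether the punished responses are subtracted from the total, which is the comparison Skinner used to argue that punishment is not simply the reverse of reinforcement.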

Even if punishment did not have long-term effects on behavior, Skinner (1953/2005) had to explain why the “immediate suppression” occurred. To do so, he described three effects of punishment: (1) eliciting incompatible respondent behavior and emotional predispositions; (2) building, through pairing, new conditioned aversive stimuli (properties of the punished behavior and context) that will elicit incompatible respondent behavior and emotional predispositions; and (3) selecting any response class that removes or avoids these conditioned aversive stimuli (negative reinforcement).

These effects, which we often call explanatory mechanisms, are important because they explain why behavior is suppressed by punishment as soon as punishment occurs (first mechanism) and why behavior remains suppressed when punishment no longer occurs (second and third mechanisms). We use the term explanation in discussing these mechanisms, but it is important to clarify that this term does not refer to a theory in the sense criticized by Skinner (1950). We describe the three mechanisms in the order presented by Skinner (1953/2005). Keeping the order is important because, in our interpretation, the mechanisms are hierarchical. That is, the first one needs to occur for the second to take place, and the second needs to occur for the third to take place. We recognize that this is a hypothesis that could be tested.

First mechanism: Explanation of behavioral suppression when punishment is in force

Skinner (1953/2005) stated that the first behavioral suppression mechanism involved in punishment is confined to the immediate situation, when punishment is being administered. The presentation of conditional or unconditional aversive stimuli elicits incompatible behavior and generates emotional responses that interfere with punished behavior (p. 186). So, at this moment, the behavior is suppressed. This effect need not be followed by any lasting behavioral change that could be observed once punishment is no longer in force. Skinner stressed that the behavioral suppression observed when punishment is in force is not typical of punishment itself, but of the presentation of aversive stimuli, whether contingent or not on a response class. That is, any aversive stimulus could disrupt an ongoing response when presented, regardless of its functional relation to behavior.

Skinner (1953/2005) gave the example of the child who is pinched by his mother for laughing in church – the pinch elicits incompatible responses (e.g., crying), so the laughter stops momentarily. Regarding emotional predispositions, Skinner said that a man can be stopped from escaping, for example, simply by making him “angry.” This can be done via an emotional operation that will change the probability of certain responses that are maintained via common consequences. When someone is angry, the reinforcing value of the consequence “doing harm” increases and the probability of the behavior that produced this result in the past increases. These responses, in this case, are incompatible with the escape response because, to produce harm the individual must be close to the object or person on which the damage will be inflicted.

Second and third mechanisms: Explanation for maintained behavioral suppression when punishment is no longer in force

Skinner (1953/2005) argued that punishment may have effects that go beyond the immediate situation. In this case, he explained why punished behavior remains suppressed even after removal of the punishment. Two mechanisms are involved in lasting suppression: (1) conditioning of neutral stimuli to become conditional aversive stimuli; and (2) negative reinforcement.

The conditioning of neutral stimuli into conditional aversive stimuli works similarly to the first mechanism. The difference is that the stimuli that elicit incompatible responses and emotional predispositions have been conditioned according to Pavlovian principles. The neutral stimuli that will be conditioned may arise from (1) the punished behavior itself, whereby stimulation arising from the response itself is paired with the aversive stimulus; and (2) external stimulation that occurs concomitantly with the punished response. Both become conditional aversive stimuli capable of evoking incompatible behavior (Skinner, 1953/2005, p. 187).

Incompatible behavior may be (1) respondent – e.g., responses of glands and smooth muscles; and (2) operant – e.g., emotional predispositions, which are changes in the normal probability of behavior, as in the example of the angry man. The emotional predispositions could be treated as motivating operations (Laraway et al., 2003). Although the second mechanism is not primary in explaining the suppressive effects of punishment, it is important because the behavior that was punished does not occur because stimulation resulting from the response itself or external circumstances produces respondent and operant responses that interfere with the punished response.

The third mechanism (negative reinforcement) is, for Skinner (1953/2005), the most important: when a response is followed by an aversive stimulus, any stimulation that accompanies the response, whether arising from the behavior itself or from concomitant circumstances, will be conditioned aversively. Because the response-dependent removal of aversive stimuli can act as negative reinforcement, any response that reduces conditional aversive stimulation is negatively reinforced (p. 188).

In Skinner’s view (1953/2005), behavior that reduces or prevents aversive stimulation must be specified for both theoretical and practical reasons (p. 189). Such behavior can be: (1) the opposite of the punished behavior; (2) “doing nothing,” in the sense of standing still; or (3) behavior appropriate to other variables that occur in the situation but that are not sufficient to explain the degree of probability without the addition of negative reinforcement (e.g., imagine a situation in which a child stays off task in the classroom most of the time, and the teacher sets up a behavioral program in which the child has to work on a task in his seat for 10 min to gain access to 10 min of recess. The teacher observes that, when this contingency is in force, the frequency of on-task behavior increases, but not to the point he would consider acceptable. The teacher then adds a response cost contingency in which the child loses 1 min of recess for every minute he stays off task. After that, the teacher observes an increase in the frequency of on-task behavior).

Skinner (1953/2005) discussed in greater detail behavior that interferes with punished behavior, labeled as “doing nothing” or “doing anything else.” He noted different types of conflict generated by punishment⁷: (1) the response produces both positive and negative reinforcers (e.g., eating food that tastes “good” but results in poor digestion); (2) the response produces first negative and then positive reinforcers; and (3) the response produces an aversive stimulus unless another one is emitted (e.g., putting on a raincoat on a dark, cold day: if one fails to emit this particular response, one will contact an aversive stimulus, cold rain).

Skinner (1953/2005) noted that it is tempting to formulate these cases without mentioning the incompatible behavior. When one is concerned only with whether the individual does or does not perform a particular response, and the responses that do occur are not specified, there is a tendency to talk about negative probability. However, Skinner stressed that the purpose of a science of behavior, prediction and control, is achieved when dealing with positive, but not with negative, probabilities.

To be concerned with what the organism does is perhaps one of the most important contributions of Skinner’s formulation of punishment, whether or not it is the most appropriate with regard to the description and explanation of behavior. It is valuable to consider the responses that occur when the punished behavior no longer occurs because they may not be more effective or “better,” from an ethical point of view, than the punished behavior, either for the individual or for society. Thus, a formulation that emphasizes these responses may be important to the behavior analyst because it draws attention to what the individual is doing.

The temporary effects of punishment are explained via mechanisms that have been described previously. How “temporary” these effects are will depend on the degree of conditioning of the stimuli generated by the response itself or by concomitant circumstances. Therefore, parameters such as the intensity and duration of the aversive stimuli play an important role in the durability of suppressive effects.

Figure 1 is a schematic representation of the explanatory mechanisms of response suppression involved in punishment, based on Skinner (1953/2005):

Figure 1
Diagrammatic representation of the mechanisms of punishment in the 1950s

Note. The abbreviations correspond to the following: S1, S2, S3, S4, S5... = context; R = punished response; sR1, sR2, sR3... = properties of punished R; SR+ = positive reinforcement; Sav = aversive stimulus; Rp1,2,3 = elicited responses; Re1,2,3 = emotional predispositions; (...) = time; Sav1, Sav2, Sav3, Sav4, Sav5... = antecedent aversive stimuli (context); sRav1, sRav2, sRav3, sRav4, sRav5... = aversive properties of the punished response; Ri = incompatible response; (black line upwards) p = increased probability; horizontal black line = produces; continuous line = evokes; horizontal black line cut by continuous line = removes or reduces.

According to Figure 1, one can observe that a response class (R) is emitted and produces a reinforcer (SR+) in a context (S1, S2, S3, S4...). However, this response class also produces an aversive stimulus (Sav).

Suppression now begins because the aversive stimulus elicits respondent behavior (Rp1,2,3) and emotional predispositions (Re1,2,3). The aversive stimulus also changes the function of the response properties (sRav1, sRav2, sRav3...) and of the context (Sav1, Sav2, Sav3...), which become conditional aversive stimuli. When the organism is placed in the same context and the probability of the punished behavior increases, the properties of the behavior and the context now function as aversive stimuli, so any behavior that eliminates these conditional aversive stimuli is negatively reinforced.

Negative reinforcement is an added mechanism that had not been presented in the 1930s. Skinner (1938/1991) had not developed the concept of negative reinforcement, so it was not included in his analysis of punishment. Once Keller and Schoenfeld (1950) had made the distinction between positive and negative reinforcement and Skinner had adopted it, it became possible to include this new explanatory mechanism in the analysis of punishment.

The remaining question is: What were the conditions that led Skinner to include this new mechanism? Keller and Schoenfeld (1950) did not invoke negative reinforcement in their analysis of punishment. An explanation of punishment as occurring via the emergence of an antagonistic response was, however, posited by Konorski and Miller (1937). It seems plausible to consider that Konorski and Miller’s critique had some influence, although there was a considerable time gap between the two works.

Another factor that appeared was a merging of emotion and motivation (Skinner, 1953/2005). Skinner argued that the presentation of an aversive stimulus resembled a sudden increase in deprivation in its effects on behavior. However, because deprivation is an operation and differs from the presentation of stimuli, he argued that deprivation and the presentation of an aversive stimulus should remain in separate fields (motivation and emotion, respectively; for details, see Pereira, 2013). Skinner (1957/1992) introduced a distinct concept in recognizing the presentation of aversive stimuli as a motivational operation. Thus, whether one analyzes punishment in terms of emotion or motivating operations does not appear to result in any great theoretical differences, although Skinner continued to use the former.

Symmetry and asymmetry with reinforcement

In the 1950s, the hypothesis that punishment is asymmetrical to reinforcement was maintained. Skinner (1953/2005; 1957/1992) stated at various times that punishment is not the opposite of reward or reinforcement (Skinner, 1953/2005, pp. 184, 230), that it does not produce a direct weakening of responses (Skinner, 1953/2005, p. 360), and that the assumption that punitive consequences would reverse the effect of reinforcement has not survived experimental analysis (Skinner, 1957/1992, p. 166).

The reserve concept, which underpinned the thesis of asymmetry in the 1930s, was abandoned in the 1940s. Although the term “reserve” was no longer used in the 1950s, the logic involved in the concept remained when Skinner talked about punishment. Skinner (1953/2005) cited one of his 1930s experiments to illustrate why punishment is not the opposite of reinforcement, making an input-output type analogy to describe that, if 50 responses are reinforced and 25 punished, one would have to expect an extinction curve containing 25 responses (p. 184). This example included characteristics of the reserve concept (a potential number of responses that will be emitted in extinction) and assumed a simple relation between the number of reinforcers and the number of subsequent responses, a relation that Skinner had already questioned (Skinner, 1950).

It therefore seems that some of Skinner’s older ideas survived in the analysis of punishment in the 1950s. Although questioned in 1950, the notion of extinction continued to be represented by Skinner as symmetrically opposed to reinforcement. Skinner (1953/2005) noted, for example, that extinction removes an operant from the repertoire of an organism (p. 71) and that it has the effect of reversing the process generated by reinforcement (p. 206).

It has been noted that, by abandoning the reserve concept, Skinner abandoned the basis for the thesis of asymmetry between reinforcement and punishment. Although he had nominally abandoned the concept, some of its defining elements remained in use. For example, he continued to state that reinforcement builds a number of potential responses (Skinner, 1957, p. 2) and that extinction was its main measure (Skinner, 1953/2005, p. 184).

These ideas express the notion that the effects of reinforcement are observed in the future, and the reserve concept had the function of explaining why this occurs. The way Skinner discussed the reserve in the 1930s was more rigid and was characterized by the metaphor of a hydraulic system, which experimentation with reinforcement schedules and drive proved inadequate. There was no exact relation between the number of reinforcers and the number of responses in extinction, and changes in drive produced direct changes in the number of responses. Although Skinner could no longer speak about the reserve, the fact that reinforcement changed the organism, in the sense of constructing a potential for responses observed in the future, was not disputed. The contested issues were the accuracy of the numerical relation between reinforcers and responses and the fact that other variables could change this potential number.

The core of the reserve concept (change in the organism observed in the future) remained, and with it, the thesis of asymmetry, despite the questions posed by Skinner himself (1950). However, even before his 1950 work, Skinner seemed to flirt with the symmetry thesis. Skinner mentored Estes’ (1944) monograph, in which Estes extended Skinner’s analysis of punishment. In discussing the experiments, Skinner (1979) stated that “... although strong punishment evidently “reduced the reserve, the eventual rate of engaging in punished behavior was not much affected” (p. 278). In this assertion, Skinner admitted that very intense punishment eliminates tendencies to behave (“reserve”), which would leave punishment, at high intensities, diametrically opposed to positive reinforcement. This position differs from the one Skinner (1938/1991) argued for, even when he used the “slap” for an extended period and observed no response recovery. However, although Skinner seems to have recognized this fact, it is unclear whether this recognition occurred at the time the monograph was being produced (1940s) or later, during the writing of his autobiography. If the first interpretation is correct, this recognition did not affect the way he portrayed punishment in 1953 and 1957. If the second is correct, some type of change in his conceptualization of punishment should be identifiable in the 1970s.

Final considerations

At the end of the 1940s, the term “punishment” was first used in published documents. However, Skinner (1948) also used the term “negative reinforcement” with the same connotation. In 1948, the definition of punishment remained the opposite of reinforcement in operational terms, but Skinner maintained the hypothesis of asymmetry, along with the division between temporary and permanent effects. The 1940s may have been years of transition from the questioning of the reserve concept to its definitive abandonment. Thus, the definition of punishment presented in 1948 and the explanation of its effects maintain the same logic as in the 1930s. Skinner (1950) announced the futility of the reserve concept and based his argument on studies with intermittent reinforcement schedules that, together with the data obtained in 1940, eventually broke the presumed relation between the number of reinforcers and the number of responses observed in extinction. In this article, Skinner questioned whether extinction was really the reverse of reinforcement, despite maintaining this assumption in his other works.

With the final contesting of the reserve concept, the definition of punishment presented in the 1950s was analyzed. Several factors were considered defining, especially the operations that remained the same and the division between temporary and permanent effects of punishment. Three mechanisms were used to explain the temporary effects of punishment and how temporary they are. The first two, in a way, already were present in the 1930s, and only the third (negative reinforcement) was added. The thesis of asymmetry was maintained even after it was contested in 1950. It was argued that, although the reserve concept had been abandoned nominally and in a strict sense, its main characteristic (change in the organism produced by the reinforcement) remained in Skinner’s analysis, allowing him, arguably, to sustain the thesis of asymmetry.

Acknowledgments

The authors thank the Brazilian Federal Agency for the Support and Evaluation of Graduate Education (CAPES) for financing this research through a doctoral scholarship and sandwich doctorate granted to the first author. The authors also thank Professor Dr. Carlos Souza (UFPA), who improved the quality of this paper, and the B. F. Skinner Foundation for its support of the first author.

References

Andery, M. A. P. A. (1990). Uma tentativa de (re)construção do mundo: a ciência do comportamento como ferramenta de intervenção [An attempt of world’s (re)construction: science of behavior as an intervention tool] [Unpublished doctoral dissertation]. Pontifícia Universidade Católica de São Paulo.

Keller, F. S., & Schoenfeld, W. N. (1995). Principles of psychology: A systematic text in science of behavior. B. F. Skinner Foundation. (Original work published 1950).

Lattal, K. A., & Perone, M. (1998). The experimental analysis of human operant behavior. In K. A. Lattal & M. Perone (Eds.), Handbook of research methods in human operant behavior (pp. 3-14). Springer Science & Business Media.

Laraway, S., Snycerski, S., Michael, J., & Poling, A. (2003). Motivating operations and terms to describe them: some further refinements. Journal of Applied Behavior Analysis, 36, 407–414. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1284457/

Michael, J. (1975). Positive and negative reinforcement, a distinction that is no longer necessary; or a better way to talk about bad things. Behaviorism, 3, 33-44.

Pereira, M.B. R. (2013). A noção de motivação na análise do comportamento [The notion of motivation in behavior analysis] [Unpublished master thesis]. Pontifícia Universidade Católica de São Paulo.

Rogers, C. R., & Skinner, B. F. (1956). Some issues concerning control of human behavior: A symposium. Science, 124, 1057-1066. https://science.sciencemag.org/content/124/3231/1057

Rutherford, A. (2003). B. F. Skinner’s technology of behavior in American life: from consumer culture to counterculture. Journal of the History of the Behavioral Sciences, 39(1), 1-23. https://doi.org/10.1002/jhbs.10090

Santos, B. C., & Carvalho Neto, M. B. (2020). B. F. Skinner’s evolving views of punishment: I. 1930-1940. Mexican Journal of Behavior Analysis, 45(2), 149-172. http://dx.doi.org/10.5514/rmac.v45.i2.75561

Skinner, B. F. (1935). Two types of conditioned reflex and a pseudotype. The Journal of General Psychology, 12(1), 66-77. https://doi.org/10.1080/00221309.1935.9920088

Skinner, B. F. (1947). Experimental psychology. In W. Dennis (Ed.), Current trends in experimental psychology (pp. 16-49). University of Pittsburgh Press.

Skinner, B. F., & Campbell, S. L. (1947). An automatic shocking-grid apparatus for continuous use. Journal of Comparative and Physiological Psychology, 40, 305-307. https://doi.org/10.1037/h0063537

Skinner, B. F. (1953). Some contributions of an experimental analysis of behavior to psychology as a whole. American Psychologist, 8, 69-78. https://doi.org/10.1037/h0054118

Skinner, B. F. (1957). The experimental analysis of behavior. American Scientist, 45, 343-371.

Skinner, B. F. (1977). Unpublished letter to Michael Zeiler. Harvard University Archives: Cambridge.

Skinner, B. F. (1979). The shaping of a behaviorist: part two of an autobiography. Alfred Knopf.

Skinner, B. F. (1989). Recent issues in the analysis of behavior. Merrill.

Skinner, B. F. (1991). The behavior of organisms: An experimental analysis. B. F. Skinner Foundation (Original work published 1938).

Skinner, B. F. (1992). Verbal behavior. B. F. Skinner Foundation (Original work published 1957).

Skinner, B. F. (1999). Freedom and the control of men. In V. G. Laties & A. C. Catania (Eds.), Cumulative record: Definitive edition (pp. 3-18). B. F. Skinner Foundation (Original work published 1955a).

Skinner, B. F. (1999). The control of human behavior (Abstract). In V. G. Laties & A. C. Catania (Eds.), Cumulative Record: Definitive Edition (pp. 19-24). B. F. Skinner Foundation (Original work published 1955b).

Skinner, B. F. (2005). Science and human behavior. B. F. Skinner Foundation (Original work published 1953).

Notes

1 Frazier is the planner of the Walden Two community, which is an experimental community based on behavioral science principles. Castle is a philosopher who, along with Burris, visits Walden Two in order to get to know the community. Castle represents traditional thinking and Frazier experimental thinking based on behavioral science.

2 Skinner (1950) defines theory as any explanation for an observed event that refers to events that are located on another dimension or level of observation, described in different terms and measured in terms of different dimensions (p. 193).

3 When describing periodic reinforcement, Skinner (1950) outlined an intermittent schedule in which the presentation of reinforcement depended on the response, but in which reinforcement availability was assigned after a fixed time interval had elapsed.

4 Skinner (1950) defined aperiodic reinforcement as a schedule in which there are intervals between reinforced responses that are so short that no non-reinforced response intervenes, and long intervals (two minutes in this case). Other intervals are distributed periodically between the lowest and highest interval value, with the average equal to one minute (p. 207).

5 Skinner (1989) stated that he abandoned the reserve concept within a year of the publication of his 1938 book, but he still used it after that period, in the works published in 1941.

6 Probability and strength seem to have the same meaning, as can be seen in the statements: “In operant conditioning we “strengthen” an operant in a sense of making a response more probable or, in actual fact, more frequent” (Skinner, 1953, p. 65); “Our basic datum is not the occurrence of a given response as such, but the probability that it will occur at a given time. Every verbal operant may be conceived of as having under specified circumstances an assignable probability of emission – conveniently called “strength”” (Skinner, 1957, p. 22).

7 A class of responses that is punished is likely to be maintained by reinforcing consequences in a schedule. Because the same class produces positive and negative consequences, it creates conflict.
