Delay Discounting in Pigeons Using a Token Reinforcement System
Ricardo S. Campos-Rivera; Cristiano V. dos Santos
Descuento Temporal en Palomas Usando un Programa de Reforzamiento de Tokens
Revista Mexicana de Análisis de la Conducta, vol. 48, núm. 1, pp. 71-93, 2022
Sociedad Mexicana de Análisis de la Conducta

Abstract: Delay discounting is the decrease of the value of a reward due to a delay in its receipt. Results with human and non-human animals have shown several similarities, but there have also been consistent differences. In order to further explore the sources of these differences, we used a token reinforcement system to evaluate delay discounting in pigeons with a procedure more similar to the ones used with humans. Pigeons were exposed to an adjusting-amount task with tokens as rewards. In Phases 1 and 3, tokens were exchangeable immediately after their receipt in each trial, whereas in Phase 2, tokens were exchangeable after four trials. The results of Phases 1 and 3 showed a pattern of discounting similar to previous studies, whereas in Phase 2 subjects showed a greater degree of discounting in comparison to the other phases. We argue that the greater degree of discounting might be due to the accumulation of tokens and the actual delay to the food.

Keywords: delay discounting, token reinforcement system, species differences.

Resumen: El descuento temporal es la disminución en el valor de una recompensa debido a una demora en su obtención. Los resultados con animales humanos y no-humanos muestran similitudes, pero también existen diferencias consistentes. Con el fin de explorar las fuentes de estas diferencias, usamos un sistema de reforzamiento de tokens para evaluar descuento temporal en palomas con un procedimiento más similar al usado con humanos. Las palomas fueron expuestas a una tarea de ajuste de la magnitud con tokens como recompensas. En las Fases 1 y 3, los tokens eran intercambiados inmediatamente después de su obtención en cada ensayo. En la Fase 2, los tokens eran intercambiados después de cuatro ensayos. Los resultados de las Fases 1 y 3 mostraron un patrón de descuento similar al de otros estudios mientras que, en la Fase 2, los sujetos mostraron un mayor nivel de descuento en comparación con las otras fases. Argumentamos que el mayor nivel de descuento podría deberse a la acumulación de los tokens y a la demora real hasta la comida.

Palabras clave: Descuento temporal, Sistema de reforzamiento de tokens, Diferencias entre especies.


Research article


Ricardo S. Campos-Rivera
Universidad de Guadalajara, México
Cristiano V. dos Santos
Universidad de Guadalajara, México

Received: December 12, 2021

Approved: April 30, 2022

Introduction

Delay discounting is the decrease in the subjective value of a reward due to the delay to its receipt (Odum, 2011) and has been used to explain the preference for more immediate over more delayed rewards, even when the latter have a larger magnitude. It has been observed in a variety of species, such as humans (Rachlin et al., 1991), rats (Krebs et al., 2016; Reynolds et al., 2002; Turturici et al., 2018), mice (Mitchell, 2014), great apes (Rosati et al., 2007) and pigeons (Ainslie & Herrnstein, 1981; Green et al., 2010), which suggests delay discounting may be adaptive (Fawcett et al., 2012).

Assessing delay discounting involves procedures in which subjects are exposed to repeated choices between two alternatives that differ in delay and magnitude of reward. One dimension of one of the alternatives is then varied until subjects show equal preference for both. This is the indifference point, at which the values of the two alternatives are assumed to be equal.

The shape of the discount function, which decreases rapidly at short delays and levels off at longer delays (McKerchar & Renda, 2012), is also a feature shared among many species. In addition, discount rates may be affected by the long-term use of psychotropic drugs, an effect seen in both humans (Kirby et al., 1999; Madden et al., 1997) and nonhuman animals (Eppolito et al., 2013; Logue et al., 1992). Furthermore, the age of the subjects, whether humans (Green et al., 1999) or rats (Renda et al., 2018), is inversely related to discount rates.
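Although no discounting model was fit in the present study, this shape is commonly summarized by the hyperbolic discounting equation, in which the subjective value V of a reward of amount A delayed by D units of time is V = A / (1 + kD), where k is a free parameter indexing the steepness of discounting: value drops sharply at short delays and levels off as the delay grows.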

Despite these similarities, humans tend to show smaller discounting rates than nonhumans (Tobin & Logue, 1994). This conclusion is usually based on the fact that humans can wait months (e.g., Rachlin et al., 1991), whereas nonhumans usually wait seconds. Nevertheless, Jimura et al. (2009) showed that humans may show a decrease in the subjective value of a reward with intervals as short as 8 s or 16 s when the rewards are real and consumable. Another variable that fails to generalize between humans and nonhumans is the magnitude effect (smaller magnitudes are discounted at a higher rate than larger magnitudes). This effect has been widely reported with humans (Green et al., 1997; Green et al., 2013; Mellis et al., 2017) but appears to be absent with nonhumans (Calvert et al., 2010; Farrar et al., 2003; Green et al., 2004; Holt & Wolf, 2019), with exceptions that are difficult to interpret due to methodological differences (Evenden & Ryan, 1996; Grace et al., 2012).

Given the number of studies suggesting the absence of an effect, it may be tempting to conclude that an inherent difference between humans and nonhumans exists. Nonetheless, a study by Reyes-Huerta and dos Santos (2016) showed that the magnitude effect is also absent when humans are exposed to alternatives whose magnitudes must be estimated (a feature similar to the procedures with nonhumans) rather than expressed as symbols. In their study, they assessed delay discounting in human participants with two tasks. One was the typical task in which participants chose among alternatives with magnitudes expressed as numbers, for example, $2000 in one month or $750 now. The other was a variation in which the magnitude of each alternative was represented with dots of different colors and had to be estimated, for example, 200 green dots in one month or 75 blue dots now. Except for the color of the dots, they all had the same value ($10). Their results showed that the magnitude effect was present in the condition in which the magnitudes were expressed as numbers but absent when the magnitudes were presented as dots.

Regarding the differences found between human and nonhuman behavior, Hackenberg (2005) argued that these may be of two types: quantitative (part of a cross-species behavioral continuum) or qualitative (distinctive in their own right, calling for special principles of human behavior). However, arriving at an unambiguous conclusion on this question is often difficult because of the procedural differences among studies with different species. One such difference is that studies with humans usually involve earning points at one time (A) and exchanging them for another reinforcer (for example, money) at a later time (B). This arrangement may be achieved in studies with nonhumans by means of a token system, which would help reduce the differences between species.

Token reinforcement systems consist of three interdependent schedules: (a) the token-production schedule, which specifies the response requirement for producing tokens; (b) the exchange-production schedule, which establishes how exchange opportunities are made available; and (c) the token-exchange schedule, by which tokens are exchanged for other reinforcers (Hackenberg, 2009, 2018). Thus, subjects respond for a token that is later exchangeable for food instead of responding for a primary reinforcer (such as food or water), a feature that makes the procedures with humans and nonhumans more similar.

The pioneering study by Jackson and Hackenberg (1996) showed that pigeons’ choices may be affected by the delivery of tokens. In their Experiment 1, they exposed pigeons to choices between two alternatives, one immediately available token or three delayed tokens, and forced them to accumulate tokens that were exchangeable for food later in the session or at the end of it, similar to the studies with humans. Across conditions, they manipulated the number of choice trials needed to initiate the exchange period (when the tokens could be exchanged for food), from 1 to 10 trials. Their results showed a preference for the smaller, immediate alternative over the larger, delayed one, regardless of the number of trials needed to initiate the exchange period, because the total delay to the food was shorter for the smaller-sooner alternative than for the larger-later alternative; that is, the delay to the exchange was unequal between alternatives. This preference was reversed in a second experiment in which the delay to the exchange was equal for both alternatives.

In addition to choice, other similarities between the behavior of pigeons and humans have been found under token reinforcement. For example, pigeons avoided situations in which responding could diminish the number of tokens obtained (Pietras & Hackenberg, 2005) and preferred a token exchangeable for more reinforcers over one exchangeable for fewer reinforcers (specific versus generalized conditioned reinforcer; Andrade & Hackenberg, 2017). However, even though choice and preference for larger-later and smaller-sooner reinforcers have been studied with tokens, the analysis of a discount function, in which indifference points are assessed with a titration procedure, has not yet been conducted using tokens as reinforcers.

Another feature of the studies with humans is the delay to the exchange of the rewards. With human participants, the exchange usually occurs at the end of some period, commonly at the end of an experimental session, whereas nonhumans are typically allowed to exchange immediately after token receipt. Thus, contrary to pigeons, humans are usually forced to accumulate token reinforcers. The aim of the present study is therefore twofold: first, to evaluate whether pigeons discount delayed rewards when those rewards are tokens exchangeable for food and, second, to assess the effect of delaying the exchange period, thereby forcing the subjects to accumulate tokens.

Method

Subjects

Three male pigeons, identified as Subject 1, Subject 2 and Subject 3, served as subjects. They had previous experience with progressive-ratio, fixed-interval and tandem schedules, but no previous experience with token reinforcement systems. They were housed individually and their weights were maintained at 80 % (±15 g) of their free-feeding weights. All procedures were approved by an ethics committee.

Apparatus

One experimental chamber designed by MED was used. Its dimensions were 32 cm long, 25 cm wide and 30 cm tall. At the top center of the back wall, a chamber light (28 V) was located. On the front wall of the chamber, three response keys were arranged in a horizontal row. The distance from the chamber floor to the center of each key was 14.5 cm and each key had a diameter of 2.2 cm. The side keys could be illuminated with either a red or a green light and the center key with either a blue or a yellow light. A food magazine was placed below the center key and was illuminated when activated. The distance from the chamber floor to the magazine was 2 cm. Above the response keys, a matrix of 256 light-emitting diodes (LEDs) was placed (8 rows and 32 columns). Every turning on and off of a single LED was accompanied by a click. The dimensions of the matrix were 3.2 cm wide, 1.3 cm tall and 12.8 cm long, and it was parallel to the ceiling and floor of the chamber. Each LED had a diameter of 0.3 cm and functioned, hereafter, as a token. The matrix was controlled by a microcontroller outside of the experimental chamber that received electric pulses from the chamber. All experimental conditions were programmed in MED-PC IV on a computer running Windows XP.

Procedure

Pre-training

Subjects were exposed to magazine-training sessions in which a variable-time 30-s (VT 30-s) schedule was in effect. Sessions began with the placement of the subject in the chamber with all lights off. When the VT interval elapsed, the magazine was activated for 3-s. An intertrial interval of 20-s was in effect before another VT value was selected. Sessions ended after 32 food deliveries. This condition lasted until the subjects ate from the magazine consistently, as evaluated by visual inspection; the rationale was to ensure that subjects would eat reliably from the magazine. Then, the subjects were exposed to one session with all the chamber lights on (keys, tokens and chamber light) but with no programmed contingencies, to ensure that the lights would not be novel stimuli. After that, training with the tokens began.

Illumination of tokens

The first illuminated LED was located at the intersection of the first row and first column. One inoperative LED (column) was always left between consecutive operative LEDs. Similarly, a complete row of inoperative LEDs was always left after a row with operative LEDs. Therefore, a total of 64 LEDs could be used. LEDs were turned off in the reverse order (the LED illuminated last was the first one turned off).

Training in token systems

Each token could be exchanged for access time to grain. During the first four days, sessions began with the illumination of the chamber light and 32 tokens. The interval between token illuminations was 0.3-s. After the illumination of all the tokens, the center key was illuminated in blue and remained on for the remainder of the session, and the chamber light was turned off. A VT 60-s schedule was in operation and, when that value had elapsed, the last token was turned off and, 0.5-s afterwards, the magazine was activated for 3-s. After 15-s had elapsed, another VT value was selected and the process was repeated until all the tokens were exchanged. This procedure was in effect until the subject ate from the magazine on five consecutive occasions.

In the following two days, sessions began with the illumination of the chamber light and 32 tokens. After the illumination of the tokens, the chamber light went off and an intertrial interval (ITI) with an average of 60-s began. When the ITI had elapsed, the center key, hereafter deemed the exchange key, was illuminated in blue. The exchange key went off after a response was made or after 8-s had elapsed since its illumination; then the last token was exchanged and the magazine was activated for 2-s. When the magazine was deactivated, another value for the ITI was selected and the cycle repeated until all the tokens were exchanged. Two more sessions were conducted, but all token exchanges were contingent upon a response to the exchange key and sessions ended when all the tokens were exchanged.

Assessment of preference for magnitude

After token training, subjects were exposed to a choice between two alternatives with no delay. One alternative had a magnitude of five tokens and the other delivered one token. The side keys were used for the presentation of the alternatives. Sessions consisted of 40 trials divided into 10 blocks. Each block consisted of two forced-choice trials followed by two free-choice trials; the forced trials were always the first and second of the block. The start of each trial was signaled by the illumination of the center key in yellow. A response turned off the center key and illuminated one or both side keys, depending on whether it was a forced or a free trial, respectively. A red key was associated with the smaller alternative and a green key was associated with the larger alternative. A response to an illuminated side key turned off that key (or both keys if it was a free trial) and delivered the corresponding number of tokens. If the smaller alternative was presented on one side (left or right) in the first forced trial, the larger alternative was presented on the opposite side in the second forced trial. During free trials, the smaller alternative had a 0.5 probability of appearing on either side and the larger alternative appeared on the opposite side.

After the delivery of the last token in a trial, the center key was illuminated in blue, signaling the beginning of the exchange period. During this period, a response to the exchange key turned off the last token and, 0.5-s after that, the magazine was activated for 1.5-s. The total time of access to the food was 1.5-s and 7.5-s for the smaller and larger alternative, respectively. When all the tokens were exchanged, the exchange key went off and an intertrial interval (ITI) began. The duration of the ITI was adjusted so that each trial lasted exactly 70-s.

In cases where the exchange period extended beyond the 70-s interval, the exchange period was lengthened as much as necessary and the next trial began immediately after it ended. If the exchange period ended before the conclusion of the 70-s interval, the lights remained off until the start of the next trial, signaled by the illumination of the center key in yellow. This procedure was in effect until the subjects chose the larger alternative in at least 80 % of the free-choice trials for five consecutive sessions. With this procedure, we wanted to establish whether the subjects could discriminate between a smaller and a larger magnitude of tokens and preferred the larger number of tokens, without which a delay discounting procedure would be pointless.

Adjusting amount procedure

After the completion of the previous assessment, an adjusting-amount delay discounting procedure was carried out. This procedure was similar to the one used by Green et al. (2004), with the difference that tokens exchangeable for food, rather than food itself, were used as rewards.

Sessions consisted of 10 blocks of four trials: two forced- and two free-choice trials. The sequence of the trials was the same as in the previous assessment. The green key was associated with the larger-later reinforcer and the red key was associated with the smaller-sooner alternative.

The delay to the larger-later alternative was varied across conditions, but its magnitude was always five tokens. The magnitude of the smaller-sooner alternative varied in each block of trials depending on the subject's previous choices, but it was always delivered immediately after its choice. The rule for adjusting the magnitude of the smaller-sooner alternative was as follows: if the larger-later alternative was chosen in both free trials, the smaller-sooner magnitude increased by one token in the next block; if the smaller-sooner alternative was chosen in both free trials, the smaller-sooner magnitude decreased by one token; if each option was chosen once, the magnitude remained unchanged. The magnitude of the adjustable option could not be less than one token or greater than five tokens. At the beginning of each delay condition, the smaller-sooner magnitude was set to one token, and the value at the beginning of each successive session was the same as in the last block of the previous session.
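As an illustration only (the experiment itself was programmed in MED-PC, not Python), the titration rule just described can be sketched as follows; the function and variable names are ours and hypothetical.

def adjust_smaller_sooner(magnitude, free_choices):
    # Update the smaller-sooner magnitude after one block of two free-choice trials.
    # magnitude: current number of tokens for the smaller-sooner alternative (1-5).
    # free_choices: two entries, each 'LL' (larger-later) or 'SS' (smaller-sooner).
    if free_choices.count('LL') == 2:      # larger-later chosen on both free trials
        magnitude += 1                     # smaller-sooner becomes more generous
    elif free_choices.count('SS') == 2:    # smaller-sooner chosen on both free trials
        magnitude -= 1                     # smaller-sooner becomes less generous
    # one choice of each: magnitude remains unchanged
    return max(1, min(5, magnitude))       # bounded between one and five tokens

For example, adjust_smaller_sooner(3, ['LL', 'LL']) returns 4, whereas adjust_smaller_sooner(1, ['SS', 'SS']) stays at the lower bound of one token.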

If a subject failed to respond in any link (initial link or choice link) after 70-s, the trial was repeated until a choice was made. In each condition, a different delay to the larger-later alternative was programmed. The programmed delays were 0-, 1-, 2-, 4- and 8-s and were scheduled between the choice for larger-later alternative and the delivery of the tokens.

The delay to the exchange period was manipulated across phases. In the first phase, Exchange 1 to 1, the exchange of the tokens was enabled after every trial. In the second phase, Exchange 4 to 1, the exchange was enabled after the completion of four trials. The third phase was identical to the first. Table 1 shows the order of the delays within each phase per subject.

Table 1
Experimental design and sequence of delay conditions

Note. The number inside and outside of the parentheses shows the order of exposure and the number of sessions for each delay condition, respectively.

Every session was divided into two halves of five blocks each, and the mean value of the smaller-sooner magnitude was calculated for each half (mid-session). Indifference points were computed using the average value of the adjustable alternative in the last 50 blocks of each delay condition and were considered stable when (a) at least 20 sessions had been run, (b) the mean of each of the last 10 mid-sessions did not differ by more than two tokens from the general mean of those 10 mid-sessions, and (c) the data points for the 10 mid-sessions did not show a trend. The absence of a trend was assessed using the C statistic for time series (Tryon, 1982; p > .2, two-tailed test).
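For illustration, and assuming the standard formulation of Tryon's (1982) C statistic, C = 1 − Σ(x(i+1) − x(i))² / [2 Σ(x(i) − x̄)²] with standard error √[(n − 2) / ((n − 1)(n + 1))], the stability check might be sketched in Python as follows. The z cutoff corresponding to a two-tailed p > .2 and all names are our assumptions, not part of the original analysis.

import math

def tryon_c(series):
    # Tryon's (1982) C statistic and its z score for a short time series.
    n = len(series)
    mean = sum(series) / n
    ss_successive = sum((series[i + 1] - series[i]) ** 2 for i in range(n - 1))
    ss_deviation = sum((x - mean) ** 2 for x in series)
    if ss_deviation == 0:        # a flat series shows no trend
        return 1.0, 0.0
    c = 1 - ss_successive / (2 * ss_deviation)
    se = math.sqrt((n - 2) / ((n - 1) * (n + 1)))
    return c, c / se

def is_stable(mid_session_means, sessions_run):
    # Apply the three stability criteria described above to the last 10 mid-sessions.
    last10 = mid_session_means[-10:]
    grand_mean = sum(last10) / len(last10)
    within_two_tokens = all(abs(m - grand_mean) <= 2 for m in last10)   # criterion (b)
    _, z = tryon_c(last10)
    no_trend = abs(z) < 1.2816   # criterion (c): |z| below the two-tailed p = .2 cutoff (assumed)
    return sessions_run >= 20 and within_two_tokens and no_trend        # criterion (a)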

Results

Indifference points as a function of delay are shown in Figure 1. In the first phase (Exchange 1 to 1), indifference points decreased as the delay to the reward increased, except for the first point of Subject 1. Additionally, the decrease was steeper when the delay changed from 0 s to 1 s than across the remaining delays. In the second phase, Exchange 4 to 1, the points show a rapid devaluation of the larger reward even when the delay to that alternative was 0 s, and this pattern was maintained at all subsequent delays for Subject 2 and Subject 3. The third column shows the return to the first phase, Exchange 1 to 1; here, indifference points increased compared to the previous phase.

Subject 1 showed a nonsystematic data pattern according to Johnson and Bickel's (2008) algorithm, which states that a pattern of data is nonsystematic if (1) any point is greater than the preceding one by more than 20% of the larger magnitude (a difference of 1 token in this case) or (2) the last point is not lower than the first point by at least 10% of the larger magnitude (0.5 token in this case). By these rules, Subject 1 showed nonsystematic data in 30 % of the points across both Exchange 1 to 1 phases combined according to Criterion 1 and in 20 % of the points in the Exchange 4 to 1 phase. Data for Subject 2 were considered nonsystematic in the Exchange 4 to 1 phase.
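A minimal Python sketch of these two criteria, with the thresholds taken from the description above and the larger magnitude set to five tokens (the function name is ours):

def nonsystematic_flags(indifference_points, larger_magnitude=5):
    # Johnson and Bickel's (2008) criteria as summarized above.
    # indifference_points: mean adjusted amounts ordered from shortest to longest delay.
    rise_limit = 0.20 * larger_magnitude   # 1 token when the larger magnitude is 5
    drop_limit = 0.10 * larger_magnitude   # 0.5 token when the larger magnitude is 5

    # Criterion 1: some point exceeds the preceding one by more than the rise limit.
    criterion_1 = any(later - earlier > rise_limit
                      for earlier, later in zip(indifference_points, indifference_points[1:]))

    # Criterion 2: the last point is not below the first by at least the drop limit.
    criterion_2 = (indifference_points[0] - indifference_points[-1]) < drop_limit
    return criterion_1, criterion_2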

The last row shows the mean indifference points in each phase. At this level, a more orderly pattern of data may be seen in the first and second Exchange 1 to 1 phases. The Exchange 4 to 1 phase shows a nonsystematic pattern of data according to Criterion 2.


Figure 1
Mean number of tokens for the adjusting alternative in stability as a function of delay

Note. The first, second and third rows correspond to Subjects 1, 2 and 3, respectively. The last row shows the mean number of tokens for the three subjects. Each column corresponds to one phase.

Discussion

The objective of the present study was to evaluate whether, and to what extent, tokens are discounted by delay and to assess the effect of delaying the exchange period. To our knowledge, this is the first study to evaluate delay discounting in nonhuman animals using rewards that are not directly consumable. Across three phases, the number of trials needed to enable the exchange period was manipulated. In Phases 1 and 3 (Exchange 1 to 1), the exchange period was enabled after each trial. In Phase 2 (Exchange 4 to 1), the exchange period was enabled after four trials.

During Exchange 1 to 1, the patterns of discounting were similar to those of previous studies that used food as a reinforcer, showing the same tendency of diminishing value as a function of the delay to the reward, with a faster decrease at shorter delays. During Exchange 4 to 1, however, there was a rapid devaluation of the reward even when the delay to the larger-later magnitude was 0 s; two factors may have contributed to this result.

In the exchange component of the token system, each exchange period involved fewer food deliveries in Exchange 1 to 1 than in Exchange 4 to 1: the maximum number of food deliveries per exchange period was five in Exchange 1 to 1, whereas the minimum was eight in Exchange 4 to 1. This situation can be seen as analogous to a bundled reward and, according to the Hyperbolic Value Added Model (Mazur, 2001), each additional reward in a bundle adds progressively less value. In other words, if several rewards are delivered as a bundle, the last ones have less impact on its value. In this context, the five food deliveries in Exchange 1 to 1 would not be valued very differently from the eight food deliveries in Exchange 4 to 1. In addition, there might have been an unintended delay associated with the larger-later alternative, because there was an interval between the illumination of each token; thus, a larger number of tokens implied a longer wait for the reward. If we add that choosing the larger-later alternative over the smaller-sooner one delayed the food further, it is reasonable to expect fewer choices of the larger-later alternative in Exchange 4 to 1.
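To make the diminishing-gains argument concrete, the following sketch sums hyperbolically discounted component rewards, in line with the idea of hyperbolic value addition; the discount parameter and the spacing between food deliveries are arbitrary illustrative values, not estimates from our data.

def bundle_value(n_deliveries, spacing=5.0, k=0.5, amount=1.0):
    # Summed hyperbolic value of equal rewards delivered 'spacing' seconds apart.
    # Each successive delivery is more delayed, so it adds less value than the previous one.
    return sum(amount / (1 + k * (i * spacing)) for i in range(n_deliveries))

print(round(bundle_value(5), 2), round(bundle_value(8), 2))
# With these illustrative values, five deliveries are worth about 1.66 units and eight
# about 1.85: the three extra deliveries add comparatively little value to the bundle.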

Nevertheless, if the unintended delay associated with the larger-later alternative were the sole variable responsible for preference, the same pattern should have been observed during Exchange 1 to 1. The second factor that may have added value to the smaller-sooner alternative is the accumulation of tokens from previous trials. Because of the tokens accumulated during the preceding forced-choice trials, the difference in magnitudes between alternatives becomes smaller, and the delay to the reward may have a greater impact on choice. These factors stress the importance of the delay to the exchange as a variable that can control behavior even when other alternatives offer a larger magnitude of reinforcement (Hackenberg & Vaidya, 2003), as well as the importance of the precise arrangements by which rewards are earned.

A complementary hypothesis is that a common delay to food was added to both alternatives (the three trials preceding the trial that ended with food delivery) in the Exchange 4 to 1 phase. A study with these characteristics was reported by Calvert et al. (2011), who found that, when both alternatives shared a common delay, the devaluation of the reward was steeper if the common delay (for both alternatives) and the unique delay (for each alternative) were not signaled differentially. Their results are similar to ours in the sense that subjects discounted steeply even at short delays in the Exchange 4 to 1 phase. This may suggest that the accumulation of tokens and the addition of a common delay to both alternatives have a similar impact on the devaluation of a reward.

Regarding nonsystematic data, the algorithm by Johnson and Bickel (2008) is commonly used with human participants. Yet the criteria of the algorithm are not based on the species being evaluated but on the data, so in principle it may be applied to any data set as long as it describes a discount function. In the present study, the percentage of nonsystematic data was 33% and 11% for Criteria 1 and 2, respectively. Similar studies (nonhuman subjects, adjusting-amount procedure, at least four points in the delay discounting function) have found systematic data (Calvert et al., 2010, Exp. 2; Calvert et al., 2011, Exp. 2; Green et al., 2010; Green et al., 2004; Holt & Wolf, 2019; Oliveira et al., 2013, Exp. 2; Reynolds et al., 2002; Richards et al., 1997, Exp. 1 and 3; Woolverton et al., 2007), but others have found nonsystematic data according to Criterion 1 (Calvert et al., 2010, Exp. 1; Green et al., 2004; Oliveira et al., 2013, Exp. 1; Richards et al., 1997, Exp. 2) and Criterion 2 (Calvert et al., 2011, Exp. 1; Holt et al., 2018). The range of nonsystematic data in those studies was 3.85% to 40% for Criterion 1 and 5.71% to 25% for Criterion 2, so the results of the present study are within the expected range. It may be argued that the proportion of nonsystematic data found here is larger than in the majority of previous studies and that the procedure is therefore not suitable for the assessment of delay discounting. However, given the absence of other studies with token systems in the delay discounting literature, it is not possible to draw a stronger conclusion. Further investigation is needed to evaluate the validity and reliability of the data obtained with this procedure.

It may also be argued that the tokens were just an irrelevant feature of the experimental situation and that the delay to the food was the only variable affecting choice, hence the similarity with procedures without tokens. However, the pigeons were sensitive to the magnitudes when they were exposed to the choice condition with a 0 s delay for both alternatives (prior to the delay discounting task). The fact that pigeons consistently chose the larger magnitude suggests, at least in part, that control by magnitude, and therefore by the tokens, was in effect. The third experiment by Jackson and Hackenberg (1996) also suggests that the tokens play a role in the performance of pigeons. In one condition, named NLED, they assessed choice between one food delivery and three food deliveries, both after 6 seconds. In another condition, named LED, the choice was between one token delivered immediately and three tokens delivered after a delay of six seconds; the delay to the exchange period was equal for both alternatives. The results showed that in the NLED condition the subjects were near indifference between the alternatives whereas, in the LED condition, subjects showed a consistent preference for the alternative signaled with three tokens. Thus, the tokens enhanced choice of the alternative with the greater magnitude. These results, in conjunction with the preference for the larger magnitude found in the present study, suggest that the tokens were not being ignored by the subjects.

Nonetheless, the choice of a larger number of tokens over a smaller one might be accounted for not only by the number of tokens delivered, and therefore their greater reinforcing properties (relative to the smaller alternative), but also by the covariation between the number of tokens and the amount of food of that alternative. This covariation is a critical point because it makes it difficult to determine whether the tokens generate an effect by themselves, through the relation they maintain with food, or whether the food alone controls choice. However, this covariation between food and tokens is a defining property of a token, and its effects may be difficult to separate from those of the primary reinforcer (Bullock & Hackenberg, 2015; Hackenberg, 2018).

One limitation of this study concerns the programmed delays to the primary reward (food). The delay to the exchange period was manipulated across phases, but the delay to the exchange was always unequal between alternatives, and this may obscure conclusions about the role that the delay to the food may have in the control of behavior (Hackenberg & Vaidya, 2003). Further investigation is needed to evaluate the role of the delay to the exchange period on discounting.

The use of token schedules reduces the differences between the protocols employed with humans and nonhumans. By doing so, we will be in a better position to identify the processes responsible for choice in intertemporal situations and to provide a better understanding of behavior management and behavioral economics. One characteristic of tokens is that they allow the study of choice with non-consumable rewards with either humans or nonhumans. A ubiquitous finding is that humans tend to discount consumable rewards at a higher rate than non-consumable ones (Odum et al., 2020). The use of token systems would allow studying the generality of this result in a greater number of species.

One question that remains unanswered is the process by which a token becomes a conditioned reinforcer, and a complete evaluation of its role is beyond the scope of this study. For example, being exchanged for a primary reinforcer may be necessary for a token to become a conditioned reinforcer, but it is still unclear whether an operant contingency is also needed or whether the temporal relation between the token and the reinforcer (a Pavlovian relation) suffices.

References
Ainslie, G., & Herrnstein, R. J. (1981). Preference reversal and delayed reinforcement. Animal Learning & Behavior, 9(4), 476–482. https://doi.org/10.3758/BF03209777
Andrade, L. F., & Hackenberg, T. D. (2017). Substitution effects in a generalized token economy with pigeons. Journal of the Experimental Analysis of Behavior, 107(1), 123–135. https://doi.org/10.1002/jeab.231
Bullock, C. E., & Hackenberg, T. D. (2015). The several roles of stimuli in token reinforcement. Journal of the Experimental Analysis of Behavior, 103(2), 269–287. https://doi.org/10.1002/jeab.117
Calvert, A. L., Green, L., & Myerson, J. (2010). Delay Discounting of Qualitatively Different Reinforcers in Rats. Journal of the Experimental Analysis of Behavior, 93(2), 171–184. https://doi.org/10.1901/jeab.2010.93-171
Calvert, A. L., Green, L., & Myerson, J. (2011). Discounting in pigeons when the choice is between two delayed rewards: Implications for species comparisons. Frontiers in Neuroscience, 5(AUG), 1–10. https://doi.org/10.3389/fnins.2011.00096
Eppolito, A. K., France, C. P., & Gerak, L. R. (2013). Effects of acute and chronic morphine on delay discounting in pigeons. Journal of the Experimental Analysis of Behavior, 99(3), 277–289. https://doi.org/10.1002/jeab.25
Evenden, J. L., & Ryan, C. N. (1996). The pharmacology of impulsive behaviour in rats: The effects of drugs on response choice with varying delays of reinforcement. Psychopharmacology, 128(2), 161-170. https://doi.org/10.1007/s002130050121
Farrar, A. M., Kieres, A. K., Hausknecht, K. A., De Wit, H., & Richards, J. B. (2003). Effects of reinforcer magnitude on an animal model of impulsive behavior. Behavioural Processes, 64(3), 261–271. https://doi.org/10.1016/S0376-6357(03)00139-6
Fawcett, T. W., McNamara, J. M., & Houston, A. I. (2012). When is it adaptive to be patient? A general framework for evaluating delayed rewards. Behavioural Processes, 89(2), 128–136. https://doi.org/10.1016/j.beproc.2011.08.015
Grace, R. C., Sargisson, R. J., & White, K. G. (2012). Evidence for a magnitude effect in temporal discounting with pigeons. Journal of Experimental Psychology: Animal Behavior Processes, 38(1), 102–108. https://doi.org/10.1037/a0026345
Green, L., Myerson, J., & Calvert, A. L. (2010). Pigeons’ discounting of probabilistic and delayed reinforcers. Journal of the Experimental Analysis of Behavior, 94(2), 113–123. https://doi.org/10.1901/jeab.2010.94-113
Green, L., Myerson, J., & Mcfadden, E. (1997). Rate of temporal discounting decreases with amount of reward. Memory & Cognition, 25(5), 715–723. https://doi.org/10.3758/BF03211314
Green, L., Myerson, J., & Ostaszewski, P. (1999). Discounting of delayed rewards across the life span: age differences in individual discounting functions. Behavioural Processes, 46(1), 89–96. https://doi.org/10.1016/S0376-6357(99)00021-2
Green, L., Myerson, J., Holt, D. D., Slevin, J. R., & Estle, S. J. (2004). Discounting of delayed food rewards in pigeons and rats: is there a magnitude effect? Journal of the Experimental Analysis of Behavior, 81(1), 39–50. https://doi.org/10.1901/jeab.2004.81-39
Green, L., Myerson, J., Oliveira, L., & Chang, S. E. (2013). Delay discounting of monetary rewards over a wide range of amounts. Journal of the Experimental Analysis of Behavior, 100(3), 269–281. https://doi.org/10.1002/jeab.45
Hackenberg, T. D. (2005). Of pigeons and people: Some observations on species differences in choice and self-control. Brazilian Journal of Behavior Analysis, 1(2), 135–147.
Hackenberg, T. D. (2009). Token reinforcement: a review and analysis. Journal of the Experimental Analysis of Behavior, 91(2), 257–286. https://doi.org/10.1901/jeab.2009.91-257
Hackenberg, T. D. (2018). Token reinforcement: Translational research and application. Journal of Applied Behavior Analysis, 51(2), 393–435. https://doi.org/10.1002/jaba.439
Hackenberg, T. D., & Vaidya, M. (2003). Determinants of pigeons’ choices in token-based self-control procedures. Journal of the Experimental Analysis of Behavior, 79(2), 207–218. https://doi.org/10.1901/jeab.2003.79-207
Holt, D. D., & Wolf, M. R. (2019). Delay discounting in the pigeon: In search of a magnitude effect. Journal of the Experimental Analysis of Behavior, 111(3), 436–448. https://doi.org/10.1002/jeab.515
Holt, D. D., Wolf, M. R., & Skytta, R. D. (2018). Towards validation of delay discounting in the pigeon. Journal of the Experimental Analysis of Behavior, 110(3), 394–411. https://doi.org/10.1002/jeab.470
Jackson, K., & Hackenberg, T. D. (1996). Token reinforcement, choice, and self-control in pigeons. Journal of the Experimental Analysis of Behavior, 66(1), 29-49. https://doi.org/10.1901/jeab.1996.66-29
Jimura, K., Myerson, J., Hilgard, J., Braver, T. S., & Green, L. (2009). Are people really more patient than other animals? Evidence from human discounting of real liquid rewards. Psychonomic Bulletin & Review, 16(6), 1071–1075. https://doi.org/10.3758/PBR.16.6.1071
Johnson, M. W., & Bickel, W. K. (2008). An algorithm for identifying nonsystematic delay-discounting data. Experimental and Clinical Psychopharmacology, 16(3), 264-274. https://doi.org/10.1037/1064-1297.16.3.264
Kirby, K. N., Petry, N. M., & Bickel, W. K. (1999). Heroin addicts have higher discount rates for delayed rewards than non-drug-using controls. Journal of Experimental Psychology: General, 128(1), 78–87.
Krebs, C. A., Reilly, W. J., & Anderson, K. G. (2016). Reinforcer magnitude affects delay discounting and influences effects of d-amphetamine in rats. Behavioural Processes, 130, 39–45. https://doi.org/10.1016/j.beproc.2016.07.004
Logue, A. W., Tobin, H., Chelonis, J. J., Wang, R. Y., Geary, N., & Schachter, S. (1992). Cocaine decreases self-control in rats: a preliminary report. Psychopharmacology, 109(1-2), 245–247.
Madden, G. J., Petry, N. M., Badger, G. J., & Bickel, W. K. (1997). Impulsive and self-control choices in opioid-dependent patients and non-drug-using control patients: Drug and monetary rewards. Experimental and Clinical Psychopharmacology, 5(3), 256–262. https://doi.org/10.1037/1064-1297.5.3.256
Mazur, J. E. (2001). Hyperbolic value addition and general models of animal choice. Psychological Review, 108(1), 96-112. https://doi.org/10.1037/0033-295X.108.1.96
McKerchar, T. L., & Renda, C. R. (2012). Delay and Probability Discounting in Humans: An Overview. The Psychological Record, 62(4), 817–834. https://doi.org/10.1007/BF03395837
Mellis, A. M., Woodford, A. E., Stein, J. S., & Bickel, W. K. (2017). A second type of magnitude effect: Reinforcer magnitude differentiates delay discounting between substance users and controls. Journal of the Experimental Analysis of Behavior, 107(1), 151–160. https://doi.org/10.1002/jeab.235
Mitchell, S. H. (2014). Assessing delay discounting in mice. Current Protocols in Neuroscience, 1–12. https://doi.org/10.1002/0471142301.ns0830s66
Odum, A. L. (2011). Delay discounting: Trait variable? Behavioural Processes, 87(1), 1–9. https://doi.org/10.1016/j.beproc.2011.02.007
Odum, A. L., Becker, R. J., Haynes, J. M., Galizio, A., Frye, C. C., Downey, H., Friedel, J. E., & Perez, D. M. (2020). Delay discounting of different outcomes: Review and theory. Journal of the Experimental Analysis of Behavior, 113(3), 657-679. https://doi.org/10.1002/jeab.589
Oliveira, L., Calvert, A. L., Green, L., & Myerson, J. (2013). Level of deprivation does not affect the degree of discounting in pigeons. Learning & Behavior, 41(2), 148–158. https://doi.org/10.3758/s13420-012-0092-4
Pietras, C. J., & Hackenberg, T. D. (2005). Response-cost punishment via token loss with pigeons. Behavioural Processes, 69(3), 343–356. https://doi.org/10.1016/j.beproc.2005.02.026
Rachlin, H., Raineri, A., & Cross, D. (1991). Subjective probability and delay. Journal of the Experimental Analysis of Behavior, 55(2), 233–244. https://doi.org/10.1901/jeab.1991.55-233
Renda, C. R., Rung, J. M., Hinnenkamp, J. E., Lenzini, S. N., & Madden, G. J. (2018). Impulsive choice and pre-exposure to delays: IV. Effects of delay- and immediacy-exposure training relative to maturational changes in impulsivity. Journal of the Experimental Analysis of Behavior, 109(3), 587–599. https://doi.org/10.1002/jeab.432
Reyes-Huerta, H. E., & dos Santos, C. V. (2016). The absence of numbers to express the amount may affect delay discounting with humans. Journal of the Experimental Analysis of Behavior, 106(2),
Reynolds, B., De Wit, H., & Richards, J. B. (2002). Delay of gratification and delay discounting in rats. Behavioural Processes, 59(3), 157–168. https://doi.org/10.1016/S0376-6357(02)00088-8
Richards, J. B., Mitchell, S. H., de Wit, H., & Seiden, L. S. (1997). Determination of discount functions in rats with an adjusting-amount procedure. Journal of the Experimental Analysis of Behavior, 67(3), 353–366. https://doi.org/10.1901/jeab.1997.67-353
Rosati, A. G., Stevens, J. R., Hare, B., & Hauser, M. D. (2007). The Evolutionary Origins of Human Patience: Temporal Preferences in Chimpanzees, Bonobos, and Human Adults. Current Biology, 17(19), 1663–1668. https://doi.org/10.1016/j.cub.2007.08.033
Tobin, H., & Logue, A. W. (1994). Self-control across species (Columba livia, Homo sapiens, and Rattus norvegicus). Journal of Comparative Psychology, 108(2), 126–133. https://doi.org/10.1037/0735-7036.108.2.126
Tryon, W. W. (1982). A simplified time-series analysis for evaluating treatment interventions. Journal of Applied Behavior Analysis, 15(3), 423-429. https://doi.org/10.1901/jaba.1982.15-423
Turturici, M., Ozga, J. E., & Anderson, K. G. (2018). Pair Housing Alters Delay Discounting in Lewis and Fischer 344 Rats. The Psychological Record, 68(1), 61–70. https://doi.org/10.1007/s40732-018-0268-1
Woolverton, W. L., Myerson, J., & Green, L. (2007). Delay Discounting of Cocaine by Rhesus Monkeys. Experimental and Clinical Psychopharmacology, 15(3), 238–244. https://doi.org/10.1037/1064-1297.15.3.238
