Paper | Open access

Evolutionary advantages of adaptive rewarding

Attila Szolnoki and Matjaž Perc

Published 11 September 2012 © IOP Publishing and Deutsche Physikalische Gesellschaft

Citation: Attila Szolnoki and Matjaž Perc 2012 New J. Phys. 14 093016. DOI: 10.1088/1367-2630/14/9/093016

Abstract

Our well-being depends on both our personal success and the success of our society. The realization of this fact makes cooperation an essential trait. Experiments have shown that rewards can elevate our readiness to cooperate, but since giving a reward inevitably entails paying a cost for it, the emergence and stability of such behavior remains elusive. Here we show that allowing for the act of rewarding to self-organize in dependence on the success of cooperation creates several evolutionary advantages that instill new ways through which collaborative efforts are promoted. Ranging from indirect territorial battle to the spontaneous emergence and destruction of coexistence, phase diagrams and the underlying spatial patterns reveal fascinatingly rich social dynamics that explain why this costly behavior has evolved and persevered. Comparisons with adaptive punishment, however, uncover an Achilles heel of adaptive rewarding, coming from over-aggression, which in turn hinders optimal utilization of network reciprocity. This may explain why, despite its success, rewarding is not as firmly embedded into our societal organization as punishment.


Content from this work may be used under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

Responsible use of public goods and continual investment in the common pool are of paramount importance for sustainable development on a global scale. Losing sight of this and over-exploiting these goods for short-term benefits inevitably creates systemic risks that may lead to the 'tragedy of the commons' [1]. The public goods game captures the essence of the underlying social dilemma succinctly by requiring players to decide simultaneously whether or not they wish to bear the cost of cooperation and thus to contribute to the common pool. Regardless of their decision, each member of the group receives an equal share of public goods after the initial contributions are multiplied by a factor that takes into account the added value of collaborative efforts. Individuals benefit most by defecting, while the group is most successful if everybody cooperates. Historical evidence suggests that humans have developed remarkable other-regarding abilities to mitigate between-group conflicts [2], as well as helping each other by rearing one another's offspring [3]. However, while these issues might have sparked our cooperative behavior, it is mechanisms like kin and group selection as well as different forms of reciprocity [4] or other recently identified mechanisms [5–8] that have probably been instrumental in drawing out its full potential and solidifying it as one of the most distinguishable behavioral traits of mankind.
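The dilemma described above can be made concrete with a minimal numerical sketch (our own illustrative helper, not from the paper): each cooperator contributes 1, the pot is multiplied by r > 1 and split equally among all G group members, so a defector always out-earns a cooperator in the same group even though full cooperation maximizes the group total.

```python
def pgg_payoffs(n_cooperators, G=5, r=2.0):
    """Return (cooperator payoff, defector payoff) for one public goods group."""
    share = r * n_cooperators / G      # equal share of the multiplied pool
    return share - 1.0, share          # cooperators also bear the cost of 1
```

For example, with G = 5 and r = 2 (values used later in the paper), four cooperators facing one defector each earn 0.6 while the defector earns 1.6; yet five cooperators together earn 5.0, more than the mixed group's total of 4.0.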

Reward and punishment [9] are also cited frequently as viable means for promoting the evolution of public cooperation, although punishment has received substantially more attention, as reviewed comprehensively in [10]. Related to that, it is important to note that recent research related to antisocial punishment [11, 12] and reward in particular [13] has questioned the aptness of sanctioning for elevating collaborative efforts and raising social welfare. Indeed, while the majority of previous studies addressing the 'stick versus carrot' dilemma concluded that punishment is more effective than reward in sustaining cooperation [9, 10], evidence suggesting that reward may be as effective as punishment and may lead to higher total earnings without potential damage to reputation [14] or fear of retaliation [15] is increasing rapidly. Moreover, in their recent paper [12], Rand and Nowak provide firm evidence that antisocial punishment renders the concept of sanctioning ineffective, and argue further that healthy levels of cooperation are likelier to be achieved through less destructive means.

Regardless of whether we place the burden of cooperation promotion on punishment [16–19] or reward [20–22], the problem with both actions is that they are costly. In particular, punishment implies paying a cost for another person to incur a cost, while rewards also obviously incorporate a cost to bear, but in this case for another person to experience a benefit. Cooperators who abstain therefore become 'second-order free-riders', and they can seriously challenge the success of sanctioning [23–25] as well as rewarding [21]. Here we focus on the latter and take into account the fact that our willingness to reward others depends sensitively on the success of antisocial behavior. If defection is on the rise, we may feel more inclined to support cooperation by means of additional incentives in order to avert an impending social decline. On the other hand, if everybody is already cooperating such actions may appear superfluous. Moreover, there is a permanent tendency to eschew the costs that are associated with administering rewards. Inspired by these observations, we introduce a third strategy to the spatial public goods game to supplement the traditional cooperators and defectors, namely the so-called rewarding cooperators, and show that adaptive rewarding yields several evolutionary advantages that can overcome the 'second-order free-rider' problem. Compared to steady rewarding [21], for example, the cyclic dominance between the three competing strategies can be broken, which in turn leads to higher levels of cooperation and even to completely defector-free states. Nevertheless, punishment still outperforms rewarding, because it acts more coherently with network reciprocity.
We thus arrive at interesting and partly counterintuitive conclusions that extend the existing theory of sanctioning and rewarding in structured populations [21, 26–29], as well as supplementing the array of recently identified mechanisms that promote cooperation in public goods games, ranging from complex interaction networks and coevolution [30–34] through diversity [35–39] to the risk of collective failures [40] and selection pressure [41]. Before presenting the main results, however, we proceed with a detailed description of the studied spatial public goods game.

2. Spatial public goods game with adaptive rewarding

The game is contested by cooperators (sx = C), defectors (sx = D) and rewarding cooperators (sx = R), who initially populate the square lattice with equal probability. A player x plays the public goods game with its k = G − 1 = 4 interaction partners as a member of all the g = 1,...,G = 5 groups it belongs to. Both cooperating strategies contribute 1 to the public good, while defectors contribute nothing. The sum of all contributions in each group is multiplied by the factor r > 1, reflecting the synergetic effects of cooperation, and the resulting amount is divided equally among all group members irrespective of their strategy. Adaptive rewarding is accommodated by assigning each rewarding cooperator an additional parameter πx, which keeps score of the rewarding activity. While this parameter is initially zero, subsequently, whenever a defector succeeds in passing on its strategy, all the remaining rewarding cooperators in all the groups containing the defeated player increase their rewarding activity by one, i.e. πx = πx + 1. The related costs increase accordingly. Maintaining the latter is, however, undesirable, and hence at every second round all rewarding cooperators decrease their rewarding activity by one, provided πx > 0, so that the activity never becomes negative. The payoff of player x adopting sx = C in a given group g of size G is thus
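The adaptive bookkeeping described above can be sketched as follows; this is a hedged illustration with our own function names, not the authors' code. When a defector successfully passes on its strategy, every rewarding cooperator sharing a group with the defeated player raises its activity by one, and every second round all activities drift back towards zero by one step, never going negative.

```python
def raise_activity(pi, affected):
    """Increase the rewarding activity of every rewarding cooperator in
    `affected` (those sharing a group with the defeated player) by one."""
    for x in affected:
        pi[x] += 1

def decay_activity(pi):
    """Constant drift towards non-rewarding, applied every second round:
    each activity decreases by one but never drops below zero."""
    for x in pi:
        pi[x] = max(pi[x] - 1, 0)
```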

$P_{C}^{g} = r\,\frac{N_C + N_R + 1}{G} - 1 + \frac{\Delta}{k}\sum_{i} \pi_i, \qquad (1)$

where NC, ND and NR are the number of other cooperators, defectors and rewarding cooperators in the group g, respectively. The sum runs across all the neighbors in the group, while πi is the actual rewarding activity of player i. The corresponding payoff of a rewarding cooperator at site x is

$P_{R}^{g} = P_{C}^{g} - (N_C + N_R)\,\pi_x\,\frac{\alpha\Delta}{k}, \qquad (2)$

while a defector, whose payoff is derived exclusively from the contributions of others, gets

$P_{D}^{g} = r\,\frac{N_C + N_R}{G}. \qquad (3)$

As these expressions show, each player adopting sx = C or sx = R is rewarded with an amount πiΔ/k from every rewarding cooperator, having rewarding activity πi, who is a member of the same group. At the same time, each rewarding cooperator bears the cost πiαΔ/k for every cooperator rewarded. Self-rewarding is excluded. Here Δ and α are important free parameters, determining the incremental step used for the rewarding activity and the cost of rewards, respectively. Note that α is actually the ratio between the cost of rewarding and the reward that is allotted to cooperators.
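The payoff structure of equations (1)–(3) can be sketched for a single group as follows. This is a simplified illustration with our own naming, not the authors' implementation: `strategies` maps the G group members to 'C', 'D' or 'R', `pi` holds the rewarding activities (zero for non-rewarding players), and `me` is the focal player.

```python
def group_payoff(me, strategies, pi, r, delta, alpha, G=5):
    """Single-group payoff of the focal player `me`, following the model
    described above: pooled contributions multiplied by r and split equally,
    rewards of pi_i * delta / k from each rewarding cooperator, and a cost
    of pi_x * alpha * delta / k per cooperator rewarded."""
    k = G - 1
    n_contrib = sum(1 for s in strategies.values() if s in ('C', 'R'))
    share = r * n_contrib / G                      # equal split of the pool
    others = [x for x in strategies if x != me]    # self-rewarding excluded
    if strategies[me] == 'D':
        return share                               # no cost, no rewards
    reward = sum(pi[x] * delta / k for x in others if strategies[x] == 'R')
    payoff = share - 1.0 + reward                  # contribution cost of 1
    if strategies[me] == 'R':
        n_rewarded = sum(1 for x in others if strategies[x] in ('C', 'R'))
        payoff -= pi[me] * alpha * delta / k * n_rewarded
    return payoff
```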

The stationary fractions of cooperators ρC, defectors ρD and rewarding cooperators ρR on the square lattice are determined by means of a random sequential update comprising the following elementary steps. Firstly, a randomly selected player x plays the public goods game with its partners as a member of all the five groups it belongs to. The overall payoff it obtains thereby is thus $P_{s_x} = \sum _g P_{s_x}^g$ . Next, one of the four nearest neighbors of player x is chosen randomly. This player y also acquires its payoff Psy, just as player x did previously. Finally, if sx ≠ sy player y imitates the strategy of player x with the probability $q=1/\{1+\exp [(P_{s_y}-P_{s_x})/K]\}$ , where K determines the level of uncertainty in strategy adoptions. Without loss of generality we set K = 0.5 [42], implying that better-performing players are readily imitated, but it is not impossible to adopt the strategy of a player performing worse. Each full Monte Carlo step of the game involves all players having a chance to adopt a strategy from one of their neighbors once on average. Depending on the proximity to phase transition points and the typical size of emerging spatial patterns, the linear system size was varied from L = 200 to 2000, and the equilibration required up to $10^6$ full rounds of the game to avoid finite-size effects.
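The imitation rule above can be sketched directly (the function name is ours):

```python
import math

def imitation_probability(payoff_x, payoff_y, K=0.5):
    """Fermi-type probability q = 1 / (1 + exp((P_y - P_x) / K)) that
    player y imitates the strategy of player x; K is the uncertainty."""
    return 1.0 / (1.0 + math.exp((payoff_y - payoff_x) / K))
```

A better-performing player x (payoff_x > payoff_y) is imitated with probability greater than 1/2, yet adopting the strategy of a worse-performing player remains possible, exactly as noted in the text.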

It is worth noting that this set-up enables us to directly compare the effectiveness of adaptive rewarding with steady rewarding efforts studied previously in [21]. While the simulation details are identical in both cases, in the steady rewarding model players adopting sx = R always reward every cooperator with a reward Δ/k and therefore bear the cost of rewarding αΔ/k. The initially set rewarding activity of rewarding cooperators πx = 1 never increases or decreases, while Δ simply determines the strength of rewards. As in the adaptive model, α determines just how costly rewards are. For further details we refer to [21], where the steady rewarding model was presented and studied in detail. Moreover, the outcome of the model studied here can also be compared to that obtained by means of adaptive punishment, as studied recently in [29]. The main difference is that while rewarding cooperators increase their rewarding activity to reward cooperators, punishing cooperators increase their punishing activity to punish defectors. In both cases a constant drift towards inactivity in terms of either punishment or reward is assumed. For further details we again refer to [29], while here we proceed with presenting the main results.

3. Results

3.1. Adaptive versus steady rewarding

Firstly, it is instructive to compare the impact of the newly introduced adaptive rewarding with that of steady rewarding at the same synergy factor r and the cost of reward α. As shown in figure 1 (left), the application of steady rewards yields a stable presence of defectors virtually across the whole span of Δ. This implies that no matter how strong the rewarding, defection cannot be eliminated. Here, rewarding cooperators enable the survival of cooperators, who act as second-order free-riders and in turn provide easy targets for defectors, thus creating a closed loop of dominance. The persistence of defectors is thus a direct consequence of 'second-order free-riding', which emerges almost as soon as rewarding cooperators are able to invade defectors. Notably, though, there is a very narrow span of intermediate Δ values, at which steady rewarding is just successful enough to overcome defection, but not sufficiently so to enable cooperators to free-ride on the newly acquired success. For adaptive rewarding, however, the outcome is significantly different, as shown in figure 1 (right). To begin with, much lower values of Δ suffice to elicit the downfall of defectors. But even more importantly, second-order free-riding never gets a foothold in the population. Accordingly, as Δ increases rewarding cooperators gradually rise to complete dominance, despite the very low synergy factor (r = 2) governing the production of public goods. As demonstrated in [21], defector-free states are also attainable with steady rewarding, but require α < 0.05, i.e. very low administration costs for the rewards. Adaptive rewarding is thus more effective, predominantly because second-order free-riders fail to induce cyclic dominance between the three competing strategies.


Figure 1. Fractions of the three competing strategies in dependence on Δ, as obtained at r = 2 and α = 0.1 for steady (left) and adaptive (right) rewarding. While steady rewarding fails to eliminate defection due to the spontaneous emergence of cyclic dominance that is brought about by second-order free-riding, adaptive rewarding suffers from no such drawbacks, gradually leading to complete dominance of rewarding cooperators as Δ increases.


3.2. Phase diagrams and spatial patterns

The comparison with steady rewarding invites further exploration. In particular, the question is whether coexistence in the absence of cyclic dominance is nevertheless possible, and to what degree the results presented in figure 1 are robust to parameter variations. To address this systematically, we proceed with the presentation of characteristic phase diagrams and spatial patterns for different values of r.

Figure 2 features the full Δ − α phase diagram for r = 4.4. It is important to note that for such a relatively high value of r cooperators can survive in the presence of defectors without rewards, solely on the basis of spatial reciprocity. Accordingly, if rewarding is inefficient and costly, rewarding cooperators die out, leaving D + C as the stable two-strategy phase. As α decreases, however, rewarding cooperators become more and more competitive, which culminates in the outbreak of the stable D + R phase if Δ is sufficiently small. However, the discontinuous D + C → D + R transition is deceptive, in that it suggests that the competition is won or lost directly between cooperators and rewarding cooperators. This is in fact not the case because in the absence of defectors the relation between the two eventually becomes neutral. Which of C and R is the winner is therefore determined indirectly in terms of which of the two strategies is more successful in invading defectors. This indirect territorial battle is illustrated in figure 3, where in the upper row cooperators are more successful, while in the bottom row rewarding cooperators prevail. Note that in both cases cooperators and rewarding cooperators form compact clusters that are isolated from one another, which is a direct consequence of coarsening within a finite size domain. An identical phenomenon was reported in [28], where punishing cooperators and cooperators (second-order free-riders) engaged in indirect competition that was mediated by defectors, and where the winner was also determined based on the success and efficiency of this invasion. It is also worth emphasizing that the fraction of defectors changes insignificantly during this evolutionary process, regardless of whether finally the D + C or the D + R phase is reached, i.e. C(R) spread almost exclusively at the expense of R(C) (not shown).
Thus, defectors truly just mediate the difference in efficiency between cooperators and rewarding cooperators.


Figure 2. Full Δ − α phase diagram, as obtained at r = 4.4. Red dashed line depicts first-order phase transitions while blue solid lines depict continuous second-order phase transitions. Symbols mark the surviving strategies in the stationary state. Besides stable two-strategy D + C and D + R phases, the coexistence of all three competing strategies is also possible, where D and C form an alliance to compete against R. Notably, R(C) denotes the defection-free phase, but since in the absence of defectors strategies R and C become equivalent, the evolutionary process proceeds via slow logarithmic coarsening, as in the voter model [43]. However, since at the time of extinction of defectors the majority of players are rewarding cooperators, the system finally arrives at the R phase with a significantly higher probability. Notably, the dominance of strategy R becomes more evident if rare mutations are allowed, similar to what was reported for punishment in [44].


Figure 3. Indirect territorial competition between cooperators C (blue) and rewarding cooperators R (green) that is mediated by defectors D (red). In the upper row cooperators are more successful in invading defectors. Accordingly, rewarding cooperators are crowded out, leaving a stable D + C phase (top right) as the final stationary state. In the bottom row the situation is reversed. Rewarding cooperators outperform cooperators in the indirect competition against defectors, ultimately arriving at a stable D + R phase. The spatial segregation of indirectly competing strategies against a third strategy (defectors) creates the blueprint for a discontinuous phase transition, as marked by the red dashed D + C → D + R transition line in figure 2. Parameter values are: r = 4.4, Δ = 0.4, α = 0.9 (top row) and α = 0.5 (bottom row).


Returning to the phase diagram presented in figure 2, it can be observed that as Δ increases, the discontinuous first-order phase transitions give way to a continuous transition line leading to the D + C + R coexistence. In contrast to the steady rewarding model, however, here the coexistence is not rooted in a dynamical invasion process of the form D → C → R → D, but rather it is due to a static equilibrium. For details concerning the dynamical invasion fronts brought about by steady rewarding we refer to [21], while here we elaborate further on the static equilibrium that is characteristic of adaptive rewarding. Figure 4 (left) features a cross-section of the phase diagram presented in figure 2 at Δ = 1.5. It can be observed that as the cost of rewarding (α) increases, the pure R phase transforms into the three-strategy D + C + R phase, which for still higher values of α becomes the D + C phase. This indicates that as rewarding cooperators lose their ability to deter defectors, they also simultaneously enable the existence of cooperators. Since the value of r is sufficiently high, cooperators can coexist with defectors, in fact forming an alliance with them to compete against rewarding cooperators. The emergence of this alliance can also be inferred from the cross-section plot, where ρC and ρD change simultaneously as α increases but all the while their ratio remains approximately the same. A characteristic spatial pattern attesting to this fact is presented in figure 4 (right), where the D + C patches, which are locally similar to the stable morphology plotted in the upper right panel of figure 3, are surrounded by invading green R players. For the latter the cost of rewarding is simply too high to eliminate defectors, which brings along the second-order free-riders to form the free-riding D + C alliance.
It is also worth pointing out that as soon as rewarding cooperators die out, the fractions of the D and C strategies cease to vary, indicating that the two do indeed form an alliance that depends only on the value of r.


Figure 4. Left panel features a cross-section of the phase diagram presented in figure 2, as obtained at Δ = 1.5. As α increases the rewarding cooperators first give way to a three-strategy D + C + R phase, while for still higher values of α they succumb completely to the free-riding D + C alliance. Right panel depicts a characteristic snapshot of the three-strategy phase taken at Δ = 1.4 and α = 0.42, where the D + C alliance (red and blue) competes against invading R (green). Note that here the D + C patches are practically identical with the stationary state depicted in the top right panel of figure 3.


If, however, the adaptive rewarding response is made more severe while at the same time rewards remain sufficiently affordable, the three-strategy phase terminates in a defector-free state, denoted as R(C) in figure 2. The absence of defectors makes cooperators and rewarding cooperators two equivalent strategies. Note that there is a constant drift towards non-rewarding if defectors fail to spread. This can be because they are altogether missing, as is the case in the R(C) phase, or because they are outside the immediate neighborhood of R and thus spread undetected. The evolutionary process proceeds without surface tension via logarithmically slow coarsening, as is characteristic of the universality class of the voter model [43]. In [44], albeit within a model based on steady punishment, we have demonstrated that the prevalence of 'active cooperators'—here players adopting strategy R—can be accelerated very effectively by means of rare mutations. The latter give rise to occasional defectors, who in turn mediate the winner in a manner similar to the indirect territorial battle described for the D + C → D + R transition.

If the added value of collaborative efforts is smaller, i.e. if r decreases, the phase diagrams change significantly, primarily because the D + C alliance is no longer possible. Figure 5 features two phase diagrams, as obtained for r = 3.5 (left) and r = 2 (right), where the differences compared with figure 2 are clearly inferable. If the cost of rewarding is substantial, defectors are the only ones to survive. Naturally, the lower the value of r, the lower the value of α that still warrants defector dominance. The pure D phase becomes the two-strategy D + R phase by means of a continuous phase transition even at small r, if the value of Δ is not too small and the value of α is not too large. Continuing further towards more efficient rewarding may lead to the defector-free R(C) state, which has the same properties as described above for r = 4.4. As with the D + R phase, the extent of the R(C) region shrinks, as expected, with decreasing r towards higher Δ and lower α.


Figure 5. Full Δ − α phase diagrams, as obtained at r = 3.5 (left) and r = 2 (right). As in figure 2, blue solid lines depict continuous second-order phase transitions and symbols mark the surviving strategies in the stationary state. Since the synergetic effects of collaborative efforts are too weak, cooperators can no longer survive alone in the presence of defectors. Accordingly, the D + C phase is missing. Instead, as Δ increases, and if α is sufficiently small, the pure D phase gives way to the two-strategy D + R phase, which may further transform into the three-strategy D + C + R phase, but only if r is sufficiently large (left). At r = 2, for example, the three-strategy phase is no longer attainable on the considered Δ − α-plane. For small rewarding costs the defector-free R(C) phase is obtained (having the same properties as described for r = 4.4), although its area shrinks continuously as r decreases.


The second-order free-riders, on the other hand, can only survive in the three-strategy D + C + R phase, but its existence is limited to high values of Δ, intermediate α and still sufficiently high values of r, as can be observed by comparing the left and right panels of figure 5. Importantly, this three-strategy phase is qualitatively different from that described above for r = 4.4. As mentioned, because of lower r here cooperators cannot survive alone if surrounded solely by defectors. In fact, they can survive only where defectors and rewarding cooperators meet, i.e. along the D + R interfaces. The characteristic snapshot presented in figure 6 (right) confirms such a spatial configuration within the three-strategy phase. Details of its emergence are inferable from the cross-section of the phase diagram presented in figure 6 (left), which reveals that as α exceeds a critical value the efficiency of R weakens to the point where defectors are able to survive. The stable presence of a small fraction of cooperators, surviving at the D + R interfaces, accompanies this transition. Interestingly, as α is further increased the rewarding cooperators are not the first to become extinct; the second-order free-riders are, because they fail to harvest the benefits of decreased rewarding efficiency. This indicates that, especially at small synergy factors, only a fine balance of all the other parameters enables the survival of second-order free-riding.


Figure 6. Left panel features a cross-section of the phase diagram presented in figure 5 (left), as obtained at Δ = 2.0. As α increases the rewarding cooperators first give way to a three-strategy D + C + R phase, but further on persevere longer than second-order free-riders. At smaller values of r the latter require a delicate balance of conditions to survive, and can do so only along the D + R interfaces. Right panel depicts a characteristic snapshot of such a three-strategy phase, taken at Δ = 2.0 and α = 0.55. Small and rare patches of cooperators (blue) can survive where defectors (red) and rewarding cooperators (green) meet.


3.3. Reward versus punishment

Finally, we address the 'stick versus carrot' dilemma within the realm of adaptive modeling. To do so, we first focus solely on the competition between defectors and rewarding cooperators. The question is, given a fixed cost of administering rewards α, what is the minimum required value of Δ to warrant the complete elimination of defectors? The answer is presented in figure 7 as a function of the synergy factor r (solid green line). Next, we answer the same question again, but replacing the rewarding cooperators with punishing cooperators. For consistency we use the same value of α, but accordingly it now represents the punishment cost rather than the cost of rewarding. The dashed gray line in figure 7, depicting the results for adaptive punishment, falls significantly below that obtained with adaptive rewarding. This leads to the conclusion that adaptive punishment, which we studied separately in [29], is more effective than adaptive rewarding in warranting defector-free states.


Figure 7. Minimum required strength of the adaptive response Δm that warrants extinction of defectors in dependence on the synergy factor r at α = 0.4. The efficiency of adaptive rewarding (solid green line) is compared with that of adaptive punishment (dashed gray line), and it can be observed clearly that the latter is more effective.


An intuitive explanation as to why this is the case is presented in figure 8, where we follow the evolution of interfaces separating defectors and punishing cooperators (top row) as well as defectors and rewarding cooperators (bottom row) under identical conditions. It can be observed that while rewarding cooperators are more successful in penetrating the area of defectors, the punishing cooperators advance less quickly but maintain a compact phase. For example, in the third snapshot from the left some rewarding cooperators have already reached the border of the lattice while punishing cooperators have yet to advance notably. However, rewarding cooperators have to pay a price for their over-aggressive invasion, namely an irregular interface that facilitates coexistence with defectors. Paradoxically, the less aggressive effect of punishment, which focuses on repairing the gaps in the phalanx rather than on advancing into the territory of defectors at any cost, turns out to be more effective in the end. Punishing cooperators rise to complete dominance with the aid of nearly flawless support from network reciprocity [45]. Rewarding cooperators, on the other hand, sacrifice the latter to advance more quickly, but then fail to create the desired defector-free state. The Achilles heel of rewarding is thus an excessively aggressive invasion towards defectors that neglects the benefits of network reciprocity.


Figure 8. Comparison of the evolution of interfaces separating punishing cooperators and defectors (top row), and rewarding cooperators and defectors (bottom row). It can be observed that while rewarding cooperators (green) advance more quickly into the territory of defectors (red), the punishing cooperators (gray) are relentlessly determined to keep their phase compact. Although the latter consequently advance less quickly, they ultimately succeed in completely eliminating the defectors. Rewarding cooperators, on the other hand, have to settle for coexistence. Note that darker shades of gray (green) denote players with higher punishing (rewarding) activity. The parameter values are the same for both cases, namely r = 2, Δ = 2 and α = 0.4, while the snapshots were taken at 1, 70, 300, 1000, 3000 and 6000 full Monte Carlo steps.


4. Summary

We have shown that adaptive rewarding creates several evolutionary advantages by means of which public cooperation is promoted, and many of these go beyond those provided by steady rewarding [21]. Phase diagrams and the corresponding analysis of spatial patterns reveal that, if the added value of collaborative efforts is substantial, rewarding cooperators fight an indirect territorial battle with cooperators. The catalysts are the defectors, who essentially determine the winner depending on who can invade them more successfully. If the parameters determining adaptive rewarding are set properly, most notably if the rewards are sufficiently but not overly cheap and the response to invading defectors is sufficiently strong, the three competing strategies form a stable phase wherein defectors form a free-riding alliance with cooperators, i.e. second-order free-riders, to compete against rewarding cooperators. This three-strategy phase can also be observed at intermediate multiplication factors, although its extent in the phase diagrams shrinks continuously as the synergetic effects of cooperation are lowered, and accordingly the D + C alliance becomes increasingly difficult to sustain. It is also worth emphasizing that the spatial dynamics enabling the three-strategy phase change as well. While for sufficiently high multiplication factors cooperators can survive alone in the presence of defectors, at lower values of r they can avoid extinction only in the immediate vicinity of D + R interfaces. If either the cost of rewarding is decreased further or the adaptive response is made even more severe, the coexistence is terminated, which leads to a defector-free state. Due to a constant drift towards non-rewarding in the absence of defectors able to spread successfully, cooperation and rewarding cooperation become equivalent strategies, and accordingly the victor is determined via slow logarithmic coarsening, as known from the voter model [43].
In the majority of cases, however, rewarding cooperators occupy the larger portion of the square lattice at the time defectors die out, and accordingly they are the more likely winners. This competition becomes even more biased in the presence of rare mutations.

Comparing the outcome with that elicited by adaptive punishment [29], we find that the supreme efficiency of rewarding cooperators in invading defectors lessens the effectiveness of network reciprocity, which in turn means that the more slowly advancing adaptive punishers have the more effective and indeed the more successful strategy. We report that the minimum required fine to reach a defector-free state is much lower than the minimum reward needed to achieve the same goal. Thus, while a deep invasion of isolated players into the territory of defectors is better supported by rewarding, punishers can reach the collective target of eliminating defectors only by collaborating and 'holding the line'. The latter statement is reminiscent of an instruction frequently given to soldiers engaging in combat, highlighting the continued importance of network reciprocity despite additional, locally more effective means to overcome defection. The uncovered Achilles heel of rewards may also provide further clues as to why order and justice in society are maintained by laws that focus on sanctioning rather than rewarding: although the former acts more subtly and requires a higher coherence between group members, given the same conditions it provides greater collective well-being for the whole community.

Acknowledgments

This research was supported by the Hungarian National Research Fund (grant K-101490) and the Slovenian Research Agency (grant J1-4055).
