Letters to the editors

Vol. 6, No. 2 / July 2021

To the editors:

The iterated prisoner’s dilemma had been well studied before the William Press and Freeman Dyson paper came out, and there was little anticipation that any significant new results would turn up.1 I had been introduced to game theory by my colleague Morton Davis, whose elementary survey I highly recommend.2 I remember my surprise and delight at reading Press and Dyson’s results, now well sketched by Press in his essay. While the mathematics was not especially sophisticated, I would not call these results at all minor. They were new, and unexpected, and excited considerable response.3

I would like to emphasize three points about the sorts of models described in Press’s article. First, the payoffs are long-term average payoffs. We imagine an infinite sequence of rounds. We average the payoffs of the first n rounds and then take the limit as n tends to infinity. In the memory-one case, the limits can be computed by using the Markov chain model Press describes. The important result is that the payoffs of any finite collection of initial rounds have no effect on the limit. Of course, what we regard ourselves as modeling is a large, though finite and indefinite, number of rounds. In that case, the effect of the early rounds is negligible, although no longer zero. This means that the early rounds can be used to provide a player with data about the strategy that the opponent is adopting. That is, after a while each player has enough information to make a good statistical estimate of the opponent’s strategy.
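
As a concrete sketch of this averaging (mine, not from the letter), assume the conventional payoff values T = 5, R = 3, P = 1, S = 0 and let each player react only to the previous round, in the memory-one fashion discussed below. The point to notice is that the long-run average comes out essentially the same whatever the opening moves.

```python
import random

# Conventional prisoner's dilemma payoffs, an assumption of this sketch:
# (my move, opponent's move) -> my points.
PAYOFF = {("c", "c"): 3, ("c", "d"): 0, ("d", "c"): 5, ("d", "d"): 1}

def move(strategy, last_me, last_you):
    """A memory-one move: cooperate with a probability that depends only on
    the outcome of the previous round."""
    return "c" if random.random() < strategy[(last_me, last_you)] else "d"

def average_payoff(strat_a, strat_b, first_a, first_b, rounds=200_000):
    """Average of player A's payoffs over the first `rounds` rounds."""
    a, b = first_a, first_b
    total = 0
    for _ in range(rounds):
        total += PAYOFF[(a, b)]
        a, b = move(strat_a, a, b), move(strat_b, b, a)
    return total / rounds

# A slightly noisy tit for tat: copy the opponent's last move 95% of the time.
noisy_tft = {("c", "c"): 0.95, ("c", "d"): 0.05, ("d", "c"): 0.95, ("d", "d"): 0.05}

# Opening with mutual cooperation or mutual defection barely changes the average.
print(average_payoff(noisy_tft, noisy_tft, "c", "c"))
print(average_payoff(noisy_tft, noisy_tft, "d", "d"))
```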

Second, these strategies are not fixed. Unlike the computer tournaments where fixed strategies compete against one another, here we are thinking of real repeated play. After I have an estimate of what my opponent is doing, I may wish to change my own strategy in response, and my opponent may change as well. After a while, each of us detects the change, and we make new estimates and new responses. This constitutes a kind of negotiation without communication and corresponds, I believe, to the theory of mind that Press regards as underlying the play in the iterated game. Eventually each of us commits to a strategy, and we each harvest our own long-term payoff.

Third, the payoffs should be thought of as utilities, not as dollars or years off a prison sentence. That is, the numbers we use are proxies for the ordering of the desirability of the different outcomes. In their classic book, John von Neumann and Oskar Morgenstern devised a means to convert a preference ordering to a numerical scale.4 The details are not so important; what matters is that the numerical ordering should reflect the real preference order of the outcomes.

Let me illustrate all this by using John Nash’s ultimatum game, which Press mentions. Bob offers Alice a division of 100 points. If she rejects the offer, then both get 0. Suppose Bob proposes (99, 1). We might expect Alice to reject this, accepting the loss of 1 to punish Bob for his outrageous offer. However, this means that Alice in fact prefers the (0, 0) outcome to the (99, 1) alternative. If, instead, the numerical values reflect the players’ actual preferences, then since 1 > 0, Alice prefers the (99, 1) offer to (0, 0) and so she accepts it. On the other hand, if the situation is one of repeated play, then Alice’s decision to reject the preferred alternative of (99, 1) is completely rational. We may imagine that she has a threshold, perhaps (60, 40), above which she will accept. By repeatedly rejecting the lower offers, she informs Bob that he must raise the offer. Similarly, Bob may have a level, perhaps (50, 50), below which he will not go. The pattern of offers and rejections allows the players to find a region of mutually acceptable offers. They then repeatedly use such an offer to get the corresponding long-term payoff.
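
A toy sketch of that negotiation dynamic, using the thresholds above and a concession rule of my own invention (Bob raises a rejected offer by one point at a time):

```python
def repeated_ultimatum(alice_threshold=40, bob_floor=50, rounds=200):
    """Negotiation by rejection: Alice accepts any split giving her at least
    `alice_threshold` points; Bob opens greedily and concedes one point per
    rejection, but never offers more than 100 - bob_floor."""
    alice_share = 1
    bob_total = alice_total = 0
    for _ in range(rounds):
        if alice_share >= alice_threshold:    # accepted: the split is paid out
            bob_total += 100 - alice_share
            alice_total += alice_share
        elif alice_share < 100 - bob_floor:   # rejected: both get 0, Bob raises his offer
            alice_share += 1
    # As the number of rounds grows, the averages approach the repeated (60, 40) split.
    return bob_total / rounds, alice_total / rounds

print(repeated_ultimatum())
```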

To understand the utility issue, consider how prisoner’s dilemma situations are actually dealt with. Suppose I walk out of prison having betrayed my comrade, who receives the long sucker’s payoff sentence. As I walk down the street a week later, my ex-partner’s cousin walks up and shoots me in the head. Anticipation of such an event serves to reduce the desirability of the betrayal choice. It may become the least desirable outcome. If each of us has such threatening relatives, the effect is to change the utilities of the various alternatives, stabilizing the cooperative outcome, which now becomes a Nash equilibrium. But the game is no longer really a prisoner’s dilemma.

This is the reason I am skeptical about some of the experiments concerning the prisoner’s dilemma. Some of these were actually done in prisons, and I cannot imagine that a few packs of cigarettes are a sufficient reward to brave the wrath of the other player, whom you are going to meet later in the yard.5

The zero determinant (ZD) strategies discovered by Press and Dyson form a three-dimensional subset of the four-dimensional space of memory-one strategies. A memory-one strategy is a four-tuple of probabilities (p_cc, p_cd, p_dc, p_dd), where p_x is the probability of cooperating (playing c) when x was the outcome pair of the previous round, the first letter recording my own previous play and the second my opponent’s. Parenthetically, the use of probabilities strictly between 0 and 1 requires that each player have a gadget which produces a 1 or a 0 with preset probabilities. Think of something like the old Magic 8-Ball. To play c with probability p, I set the device to produce 1 with probability p. I give it a shake and look. If a 1 turns up, I play c. Otherwise, I defect, playing d.
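
For memory-one strategies the long-term payoffs can be computed exactly from the stationary distribution of the four-state Markov chain, rather than by simulation. A minimal sketch, with the same assumed payoff values as before and with function and variable names of my own:

```python
import numpy as np

# Payoffs to players X and Y for the outcomes cc, cd, dc, dd, seen from X's
# side, again assuming T = 5, R = 3, P = 1, S = 0.
PAYOFF_X = np.array([3.0, 0.0, 5.0, 1.0])
PAYOFF_Y = np.array([3.0, 5.0, 0.0, 1.0])

def long_run_payoffs(p, q):
    """Exact long-term average payoffs when X plays the memory-one strategy
    p = (p_cc, p_cd, p_dc, p_dd) and Y plays q.  Assumes the chain has a
    unique stationary distribution, which is guaranteed whenever every
    probability lies strictly between 0 and 1."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)[[0, 2, 1, 3]]  # Y sees cd and dc with the roles swapped

    # Transition matrix over the outcomes cc, cd, dc, dd.
    M = np.column_stack([p * q, p * (1 - q), (1 - p) * q, (1 - p) * (1 - q)])

    # Stationary distribution v: solve v M = v with the entries of v summing to 1.
    A = np.vstack([M.T - np.eye(4), np.ones(4)])
    b = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
    v = np.linalg.lstsq(A, b, rcond=None)[0]

    return float(v @ PAYOFF_X), float(v @ PAYOFF_Y)

# About 2.25 points each, matching the noisy tit-for-tat simulation sketched earlier.
print(long_run_payoffs((0.95, 0.05, 0.95, 0.05), (0.95, 0.05, 0.95, 0.05)))
```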

The classic tit-for-tat (TFT) strategy of Anatol Rapoport, given by (1, 0, 1, 0), is a ZD strategy. The work of Press and Dyson shows that the ZD strategies are easy to compute with. Examples of all the strategy types I want to consider occur among the ZD strategies.

Call a strategy agreeable if p_cc = 1, firm if p_dd = 0, and generous if p_cd, p_dd > 0. Thus, an agreeable strategy always cooperates in response to cc and a firm strategy always defects in response to dd. A generous strategy responds to an opponent’s defection by still cooperating with some positive probability.
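
In code these definitions are one-liners; the value p = 0.1 used for generous TFT below is merely illustrative.

```python
def is_agreeable(p): return p[0] == 1               # always cooperates after cc
def is_firm(p):      return p[3] == 0               # always defects after dd
def is_generous(p):  return p[1] > 0 and p[3] > 0   # may cooperate after an opponent's defection

tft          = (1, 0, 1, 0)
generous_tft = (1, 0.1, 1, 0.1)

print(is_agreeable(tft), is_firm(tft), is_generous(tft))                              # True True False
print(is_agreeable(generous_tft), is_firm(generous_tft), is_generous(generous_tft))  # True False True
```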

Fixation at cc can occur if, and only if, both players use agreeable strategies, and, furthermore, this is the only way that both players can achieve the cooperative payoff (usually labeled R). Fixation at dd can occur if both players use a firm strategy. Observe that TFT is both agreeable and firm and thus either fixation possibility can occur when both play TFT. In addition, there is a third possibility, namely alternating cd and dc. Which of these occurs depends on the initial play and on whether errors occur during the iteration.
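
The three possibilities for TFT against TFT are easy to see by iterating the deterministic rule directly (a small sketch, with function names of my own):

```python
def tft_move(last_me, last_you):
    """Tit for tat: copy the opponent's previous move."""
    return last_you

def iterate_tft(first_a, first_b, rounds=8):
    """Outcomes of TFT against TFT for the given opening moves."""
    a, b = first_a, first_b
    outcomes = []
    for _ in range(rounds):
        outcomes.append(a + b)
        a, b = tft_move(a, b), tft_move(b, a)
    return outcomes

print(iterate_tft("c", "c"))  # fixation at cc
print(iterate_tft("d", "d"))  # fixation at dd
print(iterate_tft("c", "d"))  # alternation between cd and dc
```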

Press and Dyson observed that there are ZD strategies whereby a player can control the opponent’s payoff independent of the opponent’s play. Examples of these had been described earlier and called equalizer strategies.6 However, the greatest interest has focused on the collection of firm ZD strategies that Press and Dyson discovered and labeled coercive.

If Bob plays a coercive strategy, then no matter how Alice plays she receives less than he does. Any change of strategy on her part that increases her score increases his by even more. Furthermore, her best reply to his coercive strategy, that is, a play which gives her the maximum possible payout against his strategy, yields him a payout larger than the cooperative payoff R. The sum of the two payoffs is always bounded by 2R, and so if one player receives more than R, the other player receives less than R.
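
The arithmetic can be checked with the long_run_payoffs sketch above. One coercive strategy, obtained from the Press and Dyson construction with extortion factor 3 under the payoff values assumed there, is (11/13, 1/2, 7/26, 0); the particular numbers are my own worked example. If Alice accedes by always cooperating, Bob collects more than R = 3 while she stays well below it, and the ratio of their gains over P is pinned at the extortion factor.

```python
# A coercive zero determinant strategy for Bob with extortion factor 3,
# under the payoffs assumed above (my own worked example).
coercive = (11 / 13, 1 / 2, 7 / 26, 0)
all_cooperate = (1, 1, 1, 1)

# The chain still has a unique stationary distribution here, since Alice
# always cooperates and Bob's first and third probabilities lie inside (0, 1).
bob, alice = long_run_payoffs(coercive, all_cooperate)
print(round(bob, 3), round(alice, 3))          # about 3.727 for Bob, 1.909 for Alice
print(round((bob - 1) / (alice - 1), 6))       # (s_Bob - P) / (s_Alice - P): the extortion factor 3
```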

I want to contrast this with good strategies.7

If Alice plays a good strategy, which is a type of agreeable strategy, then the only way that Bob can receive a payoff of at least R is by playing in such a way that both of them get R. This outcome is assured if Alice’s strategy is generous as well as good and if Bob also plays generous and good. TFT is an example of a good strategy which is not generous. Generous TFT, (1, p, 1, p) with p positive and sufficiently small, is both good and generous. If p is too large, i.e., if the strategy is too generous, then it is no longer good and can be exploited by the opponent. The precise value of the threshold depends on the payoff values.8
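
As one illustrative check, not the full analysis of the cited papers, the long_run_payoffs sketch above shows how an unconditional defector fares against generous TFT (1, p, 1, p) as p grows; under the payoff values assumed there, this particular exploiter already reaches R = 3 once p reaches 1/2.

```python
always_defect = (0, 0, 0, 0)

# Payoffs of generous TFT (Alice) against unconditional defection (Bob)
# as the generosity p increases, under the payoffs assumed above.
for p in (0.1, 0.3, 0.5, 0.7):
    alice, bob = long_run_payoffs((1, p, 1, p), always_defect)
    print(p, round(alice, 2), round(bob, 2))  # Bob reaches R = 3 once p hits 0.5
```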

Now consider negotiation as play proceeds. Bob’s threat with a coercive strategy is to hold out no matter what Alice does until she is forced to give in. This is what Dyson meant by “go to lunch” as a way of ignoring alternatives proposed by Alice. Presumably she will then do the best she can, giving Bob a payoff above R.

Notice that if both players use coercive strategies, we have a classic game of chicken, with both receiving terrible payoffs until one of them cracks and submits to the other.

But now what happens if Alice plays generous and good against Bob’s coercive strategy? She is doing worse than he, but both are below R. Her threat is to hold out until he submits and switches to generous and good, at which point they will both receive R.

We are left with a matter of opinion as to which threat is stronger. Press favors the coercive strategy, while I believe that Alice’s threat is more credible. By submitting to Bob, she would still not receive a payoff of R; by holding out, she can keep both of them below R, whereas if he switches they can both get R.


  1. William Press and Freeman Dyson, “Iterated Prisoner’s Dilemma Contains Strategies that Dominate any Evolutionary Opponent,” Proceedings of the National Academy of Sciences 109, no. 26 (2012): 10,409–13, doi:10.1073/pnas.1206569109. 
  2. Morton Davis, Game Theory: A Nontechnical Introduction (Mineola, NY: Dover Publications, Inc., 1983). 
  3. See for example, Alexander Stewart and Joshua Plotkin, “Extortion and Cooperation in the Prisoner’s Dilemma,” Proceedings of the National Academy of Sciences 109, no. 26 (2012): 10,134–35, doi:10.1073/pnas.1208087109; Ethan Akin, “What You Gotta Know to Play Good in the Iterated Prisoners Dilemma,” Games 6, no. 3 (2015): 175–90, doi:10.3390/g6030175; Ethan Akin, “The Iterated Prisoner’s Dilemma: Good Strategies and Their Dynamics,” in Ergodic Theory, Advances in Dynamical Systems, ed. Idris Assani (Berlin: De Gruyter, 2016): 77–107; Ethan Akin, “Good Strategies for the Iterated Prisoners Dilemma: Smale vs Markov,” Journal of Dynamics and Games 4, no. 3 (2017): 217–53, doi:10.3934/jdg.2017014; and Christian Hilbe, Martin Nowak, and Karl Sigmund, “The Evolution of Extortion in Iterated Prisoner’s Dilemma Games,” Proceedings of the National Academy of Sciences 110, no. 17 (2013): 6,913–18, doi:10.1073/pnas.1214834110. 
  4. John von Neumann and Oskar Morgenstern, Theory of Games and Economic Behavior (Princeton, NJ: Princeton University Press, 1944). 
  5. I should mention that there are some technical issues with the application of the von Neumann–Morgenstern utility theory to the prisoner’s dilemma. These are dealt with in Akin, “What You Gotta Know.” 
  6. Maarten Boerlijst, Martin Nowak, and Karl Sigmund, “Equal Pay for All Prisoners,” American Mathematical Monthly 104, no. 4 (1997): 303–305, doi:10.1080/00029890.1997.11990641. 
  7. Examples of good strategies can be found in Akin, “What You Gotta Know,” and Akin, “The Iterated Prisoner’s Dilemma: Good Strategies and Their Dynamics.” 
  8. See Akin, “The Iterated Prisoner’s Dilemma: Good Strategies and Their Dynamics.” 

Ethan Akin is Professor of Mathematics at The City College of New York.
