Custom Development

AI Challenge at the Summa Summit

Ben Northrop

At the Summa Summit this year, we tried out a new, hands-on coding activity that we dubbed "Coding Spaces". The general idea was to allow our technical consultants to get down and dirty (geeky!) - dabble in some new technologies, collaborate and program with others, and maybe even build something cool. Topics included Angular, Scala, Clojure, LEAP Motion, Salesforce, and Common Lisp to name a few. It was a ton of fun. I led one session, an AI Challenge, and this post is a recap of what our group did in case you're interested, or in case you'd like to try this challenge on your own (the source is included).

The Challenge: Prisoners Dilemma Tournament

The challenge we competed in was a Prisoners Dilemma tournament, with a twist of Survivor. A little background is probably in order! The Prisoners Dilemma is a canonical game in game theory, which is typically used to explain why people don't cooperate with each other, even when it seems like it's in their best interests to do so. The game gets its name from this story:

You and your buddy get arrested for a crime that you both committed. It's clear that the authorities want you to confess, so they split you up into separate interrogation rooms, and interrogate you individually. If you confess and your buddy doesn't, the cops will go easy on you, say 1 year in jail, but they'll throw the book at your buddy, say 4 years in jail. The reverse situation is true as well - if he confesses and you don't, he gets 1 year and you get 4. If you both confess, however, they go a little easier on both of you, say 3 years. If neither of you confess, they give you 2 years on some trumped up charge.

Of course you know it'd be better to both "stay mum", but when you look at the outcomes in the table below, it's clear that it's in your best interests to confess no matter what your buddy does in the other room. If he stays silent, you're better off confessing (1 year

 

  Player 2 Stays Silent Player 2 Confesses
Player 1 Stays Silent 2 years, 2 years 4 years, 1 year
Player 1 Confesses 1 year, 4 years 3 years, 3 years

 

In the 1980s, political scientist Robert Axelrod organized a tournament where participants of different backgrounds (mathematics, psychology, economics, etc.) created programs to play an iterated prisoners dilemma (i.e. round robin, multiple games) against each other. To everyone's surprise, the algorithm that won was the simplest of all, a "tit-for-tat" program devised by mathematician Anatole Rapaport. This tournament is often cited as a defense of cooperation in the large. Even though in small groups or in isolated instances being "mean" (in game theory parlance, "defecting") is the better strategy, over time it's much better to be "nice" (i.e. "cooperate").

This is a obviously a blazingly fast tour of the Prisoners Dilemma game, and I'm sure I didn't do it justice. I had read about this tournament initially in a book called the Origin of Virtue, and then in Axelrod's own The Evolution of Cooperation a while back, and had always been fascinated by it. Is it really the case that tit-for-tat wins out? How long does it take for being "nice" to pay off? What if you're the only "nice" person (program!) in a world of "meanies"?

Well, this is what we endeavored to find out in our Coding Spaces. We replicated the famous Axelrod tournament, and as you'll see, got some very interesting results!

Game Setup

I created a simple engine to pit different game playing agents against each other. All the participant needed to do was implement the following interface (in any JVM language - I used Groovy). If you want to follow along, you can download the source from GitHub here.

  package com.summa.summit.spd

  public interface Player { 

     /** Called at the beginning of each round */
     void onNewRound(int numGames)

     /** Called when your about to play a new opponent */
     void onNewOpponent(String opponentId)

     /** Choose to cooperate or defect against the other player */
     Decision play()

     /** Called after play() is called, and tells what your opponent did */
     void onGamePlayed(Decision opponentsDecision)

     /** If playing survivor mode, at the end of the round vote one player off */
     String voteOffIsland()

     /** Return the name of your program */
     String getName()

  }

Note that the Decision returned by the play() method is just a simple enum with two values: Cooperate and Defect. To give an example of a simple player, a very naive, always-be-"nice" strategy could look like this (in Groovy):

  package com.summa.summit.spd

  public class MrNiceGuy implements Player { 

    def opponents = [:]
    def currentOpponent = ""

    Decision play() {
      return Decision.Cooperate // always be nice!
    }

    String voteOffIsland() {
      return opponents.isEmpty() ? currentOpponent : opponents.max { it.value }.key
    }

    String getName() {
      return "Mr. Nice Guy"
    }

    void onGamePlayed(Decision opponentsDecision) {
      if(opponentsDecision == Decision.Defect) {
        opponents[currentOpponent]++ // vote off the meanest person
      }
    }

    void onNewOpponent(String opponentName) {
      currentOpponent = opponentName
      opponents << [(currentOpponent) : 0]
    }

    void onNewRound(int numGames) {
      currentOpponent = ""
      opponents = [:]
    }

  }

Tournament

Out of the box, the initial Prisoners Dilemma game has four built-in players...

Player Strategy
Mr. Nice Guy Always be nice
Joe Evil Always be evil
Rupert Random Randomly cooperate or defect
Bob Backstabber Always cooperate, except once in a while defect

 

If you're so inclined, once you download the game engine from GitHub (here), you could play just these guys against each other by running the following from the root directory:

> java -cp ./dist/groovy.jar;./spd.jar com.summa.summit.spd.SurvivorPrisonersDilemma

This should give you a main menu, like the following:

MAIN MENU
-----------------------------
1) View players
2) Add players
3) Remove player
4) Set # games per round (current = 10)
5) Set survivor mode (current = false)
6) Play!
7) Exit

Ok, back to the recap. On top of the default players, we had 7 "real" participants. Their strategies were, roughly:

Player Strategy
Frank Dux Tit-for-Tat except 1/9 defect + defect on final game
Adamo Count defects and cooperates. If defects > cooperates then defect. Also, defect every other move.
SAY WHAT AGAIN? Effectively, always defect.
SummaNoid 3000 Tit-for-Tat
El Santo Track +/- of defect/cooperate. Cooperate when opponent cooperates more than defects. (random component too).
Spaghetti Coder Cooperate when opponents overall cooperate percentage greater than defect percentage.
Sharbaugh Track % mean.

When percentMean Defect.

When percentMean > 75% ==> Defect.

When percentmean > 25% && Be randomly mean as much as they are. i.e. if they are mean 40% of the time, be mean 40% randomly.

 

Again, if you're so inclined, you can just run the following (on a Windows machine, sorry!) from the root to play all these players against each other:

> spd.bat

Results

When you run the tournament in basic (non-survivor) mode, what you'll find is that with low number of games (~100), the mostly or purely "evil" strategies always do the best:

SAY WHAT AGAIN!: 2,482
SpaghettiCoder: 2,484
Joe Evil: 2,486

It's only when you increase the number of games past about 1,000 that Sharbaugh, a opportunistic cooperator, begins to take the lead. Tagging not far behind, however, are still the evil strategies

SAY WHAT AGAIN!: 24,596
Joe Evil: 24,606
SpaghettiCoder: 24,628
Sharbaugh: 24,855

There are a few interesting observations to be gleaned here.

First, the Tit-for-Tat strategies never do well because they can't establish trust with anyone. SummaNoid 3000, a pure Tit-for-Tat wants to cooperate with Frank Dux, but as soon as Frank Dux randomly defects, SummaNoid 3000 never forgives, and they end up defecting against each other the rest of the way and miss out on the potential benefit of cooperation.

Meanwhile, the evil strategies thrive by preying on the blissfully, ignorant players (Mr. Nice Guy and Bob Backstabber). When you take these players out of the mix, however, the mean strategies fall to the bottom, and the cooperators like Sharbaugh, Frank Dux, and SummaNoid 3000 float to the top. Here's a sample 1,000-game round without the "dumb" strategies:

Rupert Random: 13,459
El Santo: 15,012
Adamo: 15,014
SAY WHAT AGAIN!: 1,5014
SpaghettiCoder: 15,020
Frank Dux: 15,041
SummaNoid 3000: 15,157
Sharbaugh: 15,443

In other words, when others are very nice but not very smart, it's easy to get away with being "mean"!

There was also an optional survivor mode component to the game, where at the end of each round each player could vote one other player "off the island". Strategies differed as to whom to vote off. Some voted off the "meanest", some the "nicest", and some whomever did best against them. Because many players had non-deterministic (random) strategies, there didn't seem to be any predictability in these games. Although El Santo and Sharbaugh seem to win more often (from what it seems), other players also win from time to time. It's not clear if there's a conclusion to draw from the survivor mode.

In the end, we could analyze and play this game for hours/days/weeks more, but alas the fun had to end, and we had to get back to real work. It was a ton of fun, and hopefully we'll do it again soon. Let me know if you want to use the source code here, or just share an AI challenge that you've played.

Ben Northrop
ABOUT THE AUTHOR

Distinguished Technical Consultant