real gl: bluffing

question: is bluffing ever a mathematically optimal strategy?

answer: I think yes, for some definition of optimal.

let's define 'optimal strategy' as the best strategy to have if your opponent knows what strategy you have. That is, they know what algorithm you will use to make decisions.

By this definition, the optimal strategy in rock-paper-scissors is to choose randomly. If you don't choose randomly, and your opponent knows the distribution over your choises, then they'll choose the thing that beats the thing you are most likely to pick.

Now consider a simple betting game. The deck has only four cards: the jack, queen, king and ace of ❤'s. Each player is dealt a card randomly, which they can look at. Both players begin the betting by putting 1 gold into a pot. One player goes first. They can either 'call' or 'raise'. If they 'call', betting stops, and the player with the higher card wins. If they 'raise', they add 2 gold to the pot. The other player can then 'fold' -- automatically forfeiting the pot -- or 'call', adding 2 gold and giving the pot to whoever has the higher card. The game alternates who goes first each round.

You can play against the computer (nemesis), by pressing 'play' to the right. I'll tell you nemesis's strategy:

If nemesis goes first, it will only raise if it has the ace -- unless it has the jack and decides to bluff, which it will do with 50% probability.

If nemesis goes second, it will usually only call a raise if it has the ace, but there is a 25% chance that it will also call a raise with either a queen or king.

I believe that this strategy is optimal. That is, I don't think you can win all of nemesis's gold no matter how you play (unless you are genuinly lucky, or read nemesis's mind using the Javascript console, or.. I'm wrong ;)

Why do I think this strategy is optimal?

Short answer: it involves Linear Programming.

Longer answer: well, we usually play games by reacting to each bit of information we receive when we receive it, but we could, if we wanted, decide before the game even begins how we will react to each possible thing that can happen. For instance, if we are going first, we can decide before the cards are dealt how to react to each card, e.g., 'only raise if we get an ace'. If we want to bluff, we can make that decision before the game begins too, e.g., 'raise if we get an ace or a jack'.

Now there are four possible cards, and one decision to make given each card (call or raise), which gives us 2⁴ or 16 possible ways to play. If we go second, there are also 16 possible ways to play (there is only one choice to make for each card, namely: if they raise, will we call?). If we want to incorporate randomness, we can do so by choosing randomly from among the 16 options.

This lets us build a 16x16 matrix, where each row represents a way to play when going first, and each column represents a way to play when going second. The cells hold the expected transfer of gold given two ways of playing against each other (where positive amounts favor player one, and negative amounts favor player two).

Each player needs to choose a strategy. We can represent each player's choice as a 16 element vector: each element represents the probability of chosing one of the 16 possible strategies, so the elements must sum to 1. If player one's vector is A, and player two's vector is B, and the 16x16 matrix of expected winnings is W, then the expected winnings of player one versus player two is A * W * B^T, which will be a single number.

Now imagine that we're player one. We have some control over what A * W will look like. Note that it will be a 16 element vector. Note also that player two, being evil, knows what our strategy is. That is, they can look at that 16 element vector, and find the most favorable element of it for them, and adjust B so it has a 1 corresponding to that element, and a 0 everywhere else. Hence, we want to adjsut A so as to maximize the minimal element of A * W. This can be done with a linear program. We can also use the same technique, transposing everything, to find the optimal strategy as player two.

I generated such a Linear Program in JavaScript, combining both problems into a single program. You can see it by pressing the button below (and you can inspect the JavaScript of the frame to see how it works). The variables look like 'p1s0'. This represents player one -- the player going first -- strategy 0, which is the strategy of never raising.

It turns out there is a Linear Program solver written in JavaScript too: http://www.zweigmedia.com/RealWorld/simplex.html.

It tickles me to see an LP solver in JavaScript, though it does have a couple nuances: first, it requires each variable to appear in the objective function, even if we're just multiplying it by 0. Also, it implicitly adds the constraint that each variable be positive. This was a problem for my program, and is why you'll see the magic number 10, which effectively allows p1 and p2 to be 'negative', i.e., as low as -10.

The solution is p1s8 = 0.5, p1s9 = 0.5, p2s8 = 0.75, p2s14 = 0.25. This means that player one should choose strategy 8 half the time, and strategy 9 half the time. Strategy 8 is 'only raise with the ace', and strategy 9 is 'raise with either the ace or jack'. Likewise, player two should choose strategy 8 with 75% probability, otherwise strategy 14. Strategy 8 for player two is 'only call a raise with the ace', and strategy 14 is 'call a raise with anything but the jack'. (Note: this is not the only solution. p2s8 = 0.5, p2s12 = 0.5 also works, where strategy 12 is 'call a raise with a king or ace'.)

real gl

5/25/12

bluffing

No comments:

Post a Comment