5/24/12

glicko rating

I've been using the Elo rating system for talent court. I like that the update rules are very simple. However, people come into the system with a score of 1500, and if they lose, their score drops below 1500, meaning that they are listed below people who have never played. I feel like this can be discouraging: "I'd be doing better if I had never played." One hack I've seen for dealing with this is to only give people a score after they've played a few games, which is what I currently do.
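For concreteness, here's a minimal sketch of a single Elo update in JavaScript (the K-factor of 32 is just a common choice, not necessarily the one I use):

    // A minimal sketch of a single Elo update.
    // K = 32 is a common choice of K-factor, not necessarily mine.
    function eloUpdate(winner, loser, K) {
      K = K || 32;
      // Expected score of the winner against the loser.
      var expected = 1 / (1 + Math.pow(10, (loser - winner) / 400));
      return {
        winner: winner + K * (1 - expected),
        loser: loser - K * (1 - expected)
      };
    }

    // A new player at 1500 loses their first game and drops below 1500:
    // eloUpdate(1520, 1500).loser  ->  ~1484.9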

More recently, I read a bit about TrueSkill, and saw a clever trick to deal with this problem. First, TrueSkill represents each player's skill with a Gaussian distribution (a mean and a standard deviation) rather than a single number. If a player wins, their mean will increase, and their standard deviation will generally decrease, since more is known about their skill.

Now here's the trick: instead of showing people their mean skill, we show them their mean skill minus a few standard deviations. Conceptually, we show them a score that we are ~99% sure they are better than. This does something a little counter-intuitive -- if you play your first game and lose, your score will increase. Your mean skill decreased, but your standard deviation decreased by enough that the mean minus three standard deviations actually increased. This has the emotionally pleasing property that people's scores almost always increase the more they play, and better players' scores increase faster.
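Here's a sketch of that display rule, using Glicko's names and defaults for a brand-new player (r = 1500, RD = 350), which is where this post is headed anyway:

    // Show a conservative score: a value we're ~99% sure the
    // player's true skill exceeds.
    function displayedScore(player) {
      return player.r - 3 * player.RD;
    }

    // Glicko's defaults for a brand-new player:
    displayedScore({ r: 1500, RD: 350 });  // 1500 - 3*350 = 450

    // After that player loses one game to another 1500/350 player,
    // the Glicko update gives roughly r = 1338, RD = 290:
    displayedScore({ r: 1338, RD: 290 });  // 1338 - 3*290 = 468 > 450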

This system also has a nice way to reduce people's scores over time. Why would we do that? The cynical reason is that we want to keep people playing, by making their score go down if they don't. A more diplomatic reason is something like: because the scores are only meaningful when compared to other people's scores, and because new people are coming into the system all the time, we want to keep people's scores up-to-date. Another way of saying this is that we become less and less sure of a person's skill over time. This way of looking at the problem suggests a clever solution: we can represent this decrease in certainty by increasing a player's standard deviation, which has the side-effect we want of reducing the score that they see.
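Glicko does exactly this: at the start of each rating period, every player's RD gets inflated, capped at the brand-new-player value of 350. A sketch, with an illustrative choice of the constant c:

    // Inflate a player's RD after t rating periods of inactivity.
    // c controls how fast certainty decays (c = 50 is illustrative);
    // 350 is the ceiling, the RD of a brand-new player.
    function decayRD(RD, t, c) {
      c = c || 50;
      return Math.min(Math.sqrt(RD * RD + c * c * t), 350);
    }

    // decayRD(80, 4)  ->  ~128, so the displayed r - 3*RD drops
    // even though r itself hasn't changed.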

So why is this post titled "glicko rating" instead of "TrueSkill"? Well, TrueSkill is sort of an extension of the Glicko rating system, and TrueSkill's main contribution seems to be adding support for teams. I don't care about teams for now, and the Glicko system is simpler to implement: the update rules are right on Wikipedia.
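For the single-game, single-opponent case, those rules boil down to a few lines. Here's my own transcription of the Glicko-1 update (a sketch, not the function embedded in this page):

    var q = Math.log(10) / 400;

    // g() discounts a game's impact based on the opponent's RD.
    function g(RD) {
      return 1 / Math.sqrt(1 + 3 * q * q * RD * RD / (Math.PI * Math.PI));
    }

    // Expected score of player p against opponent o.
    function expected(p, o) {
      return 1 / (1 + Math.pow(10, -g(o.RD) * (p.r - o.r) / 400));
    }

    // Update player p after one game against opponent o.
    // s is the actual score: 1 for a win, 0 for a loss, 0.5 for a draw.
    function glickoUpdate(p, o, s) {
      var E = expected(p, o);
      var gRD = g(o.RD);
      var d2 = 1 / (q * q * gRD * gRD * E * (1 - E));
      var denom = 1 / (p.RD * p.RD) + 1 / d2;
      return {
        r: p.r + (q / denom) * gRD * (s - E),
        RD: Math.sqrt(1 / denom)
      };
    }

    // Example: a brand-new player loses their first game.
    // glickoUpdate({ r: 1500, RD: 350 }, { r: 1500, RD: 350 }, 0)
    //   ->  { r: ~1338, RD: ~290 }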

Up above is a widget you can use to see what happens to two players' scores (labeled r, for rating) and standard deviations (labeled RD, for rating deviation, I think). Press one of the "win" buttons to see what happens when that player wins. You can also modify the r's and RD's directly. Note that I show Gaussian distributions to represent red's and blue's skill, whereas I've read that the Glicko system actually uses a logistic distribution.

Up above is also a JavaScript function that implements the update rules. You're free to take this. The Glicko system itself is public domain, and so is this function. Note: don't go looking for the "actual" source code for that function in the HTML. What I "actually" do is call eval on the HTML pre element that you see above.
