5/31/12

writing to size

Rant: one thing I hate about papers in academia is page minimums and maximums. This is something I've hated all through school. For me, the problem was generally that I wanted to say something in fewer words than the teacher required, and I felt like I was adding bullshit to my paper to fill the remaining space.

"How long was your thesis?"  "Oh, 500 pages"  "Wow, you must have done a lot of work!"

"How long was your thesis?"  "Oh, 70 pages"  "Wow, you're adviser was fine with only 70 pages?"

As if page counts indicate the amount of work done.. As if the amount of work done indicates the amount of useful insight provided to the world..

I feel like people should write whatever's on their mind to say about their research, put it online, and expand it where and if people ask questions about it.

don't prepare

I'm reading Quora posts for a paper. I came across this answer when reading about wastes of time: "In my opinion, when you don't prepare, you are forced to react to things happening after the fact instead of handling them in stride..."

My reaction to this is: being prepared is a tradeoff.

Say you're going out of the country.. it seems good to "prepare" by bringing your passport. That will save lots of time. And in fact, many less extreme examples also have the property of "if you prepare beforehand, you'll save time later" -- BUT: you spend time preparing, and not all of the eventualities you are prepared for actually happen. And not all of the things that are bound to happen are actually that bad. If you forget to bring your toothbrush, you can get one at the hotel. If you forget to bring underwear, you can.. make do.

My thoughts here are analogous to "lazy evaluation" in computer science. The idea is that sometimes you can make an algorithm faster overall by only doing work when a request comes in, and just doing the work you need to. This has the disadvantage that it'll be slower to respond to requests, because you weren't "prepared" with the answer ahead of time, BUT, you don't waste time preparing for things that never happen.
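
Here's a tiny sketch of that idea in code (the function names are made up, just to illustrate):

function lazy(compute) {
    var done = false, value
    return function () {
        if (!done) {              // only "prepare" when a request actually comes in
            value = compute()
            done = true
        }
        return value              // later requests reuse the cached result
    }
}

// nothing happens here -- if nobody ever calls itinerary(), we never pay for it
var itinerary = lazy(function () { return planForEveryContingency() })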

This speaks to my physical filing system as well. My filing system is a stack. To file a paper, I put it on the stack. If I need a paper, I search through the stack until I find it. Searching takes a while; however, I rarely need anything from my stack, and I save so much time filing stuff that the total time I end up spending in the filing system is lower with the stack.

In any case, it seems like a tradeoff, and the tradeoff doesn't always favor preparation.

5/29/12

Google Tech Talk (presenter)

This talk presents the goal of the 'google of work', where people hire experts to help with their work as seamlessly as they use google searches. The talk gives an overview of current efforts toward this goal, including TalentCourt and Pomodoro Hiring.

Here's the presentation. Note that all the visible images should have terms-of-use that permit commercial use, but if you delete some of them, you'll see "nicer" images that do not necessarily support commercial use.


5/28/12

mturk blunder

Ugg.. so I created a HIT using mturk's own designer interface with the following HTML:


<h1>Please make this sketch look good.</h1>
<p><img src="http://example.com/my_sketch.png" alt="" /></p>
<p>Upload your version when you're done. Thanks!</p>
<p><input type="file" name="drawing" value="" /></p>

This does not work. Apparently.

The results have a "drawing" field, but it just shows the filename of the file that was uploaded. The actual file is irretrievable.

My mistake is a repeat of a mistake made by someone on this Stack Overflow question.

Alas.

UPDATE: so, I was able to get a couple of these images using the NotifyWorkers API call, telling my workers that the drawing was lost, and asking them to e-mail it to me.

5/26/12

current information system

My grittiest most personal processing happens in some text editor, on paper, and within my skull.

If I think something is not too personal or offensive, and has a slight possibility of being interesting, I put it here, in metameaninglessness.

If a post on metameaninglessness seems interesting enough, it is cleaned a slight bit, and reposted on realgl.

If a post requires data files, these are put in my Dropbox Public folder. If a post requires JavaScript fanciness, this is put in glittle.org (though I may start putting those in Dropbox as well, or the other way around.. not sure what is best.. I really wish blogger itself would let me post other types of files the same way it lets me post image files).

[update: metameaninglessness has been merged with real gl]

bayesian truth serum data

In a previous post, I took 100 yes/no questions and asked 20 turkers to answer each question. I also asked each turker to predict how other people would answer each question. [update: this is a refined version of this post]

That post also included an app where readers could answer questions themselves, and then see how turkers and other blog readers answered those questions.

At this point, there are about 40 answers for each question -- so about 20 answers from turkers, and 20 from blog-readers.

Here is an anonymized version of the data as JSON. The main object has three keys: users, questions, and answers*.

This plot shows people's average guesses as a function of the actual answers for a question. A point at X, Y on this plot means that for some question, X*100% of the people answered "yes", but the average guess people made about how many people would answer "yes" was Y*100%.
My main takeaway from this plot is that it seems like people are conservative in their guesses.

The next plot shows people's average guesses as a function of the actual answers for a question, grouped by how people answered the question, where blue is "yes" and red is "no". A red point on this plot at X, Y means that for some question, X*100% of the people answered "yes", but the average guess made by those who answered "no" was that Y*100% of the people would answer "yes".
I was about to write that people tend to over-predict that people will answer the same way they did, but that is not quite true. For instance, on questions where 90% of people say "yes", the people who say "yes" predict that only 70% will say "yes". Hence, I should instead say that people who answered "yes" to a question tend to predict that the outcome will have more yesses than predicted by those people who answered "no" to the same question.

If you run any analysis on the data, I'd love to hear about it! Here are some questions I'd like to know the answers to:
  • Are any questions correlated with each other? I'm not sure if there is enough data for this. We need instances where many people answered the same questions.
  • Which questions are the most popular? One way to find out may be to look at the "otherQuestion" data*, e.g., a user had a choice between two questions to answer, and they answered this one instead of the "otherQuestion", so perhaps they like this one more.
  • Are there any interesting differences between turkers and blog readers?
  • How many questions did people answer? I'm sure this follows a power law.
  • Are there any interesting results for specific questions?

* Note that some of the answers have extra fields: otherQuestion, otherQuestionAbove, and otherQuestionOlder. These provide data about what other question was visible in the interface when the given answer was provided, and whether it appeared above the answered question, and whether it had been in the interface longer. These fields should only appear in instances where there was only one other question available to answer. Note that the interface begins by showing three questions simultaneously, but it pretends that the top questions are "older" for the purposes of otherQuestionOlder.

bts data

Here's data about this experiment, having to do with bayesian truth serum. The data contains answers to 100 questions from both turkers and blog readers (all anonymized).

The questions were displayed three at a time, where usually one of the questions was showing the results, and there were two questions you could answer. In such cases:

  • 57.2% of the time, people answered the top question
  • 76.4% of the time, people answered the older question (the one that had been there longer)

I'm not sure these numbers are really meaningful, though, because one common case is probably this: there are 3 questions showing, the user answers the first one (which is not included in this data, since there are 3 questions instead of 2), and then the user answers the second one. On the other hand, that question would be both on-top and older, so the bias would skew each figure by the same amount, meaning that the difference between the two percentages probably is meaningful (i.e. older-ness of a question is more predictive than which question is on top).

Graph 1: actual vs mean-guess



Graph 2: actual vs mean-guess for people who answered 'yes' and 'no'


Here's the Excel file for the above plots.

JavaScript Eval code (after pasting this data into the lower-left textarea):


var db = eval(input)   // 'input' is the pasted data; 'foreach' and 'print' come from the eval widget, not standard JS
foreach(db.questions, function (q) {
    q.answers = {}     // counts of each answer text for this question
    q.guesses = []     // everyone's guess of the "yes" percentage
})
foreach(db.answers, function (a) {
    var q = db.questions[a.question]
    q.answers[a.text] = q.answers[a.text] ? q.answers[a.text] + 1 : 1
    if (a.text.match(/yes|no/))
        q.guesses.push(a.guess)
})
foreach(db.questions, function (q) {
    // print one CSV line per question: actual "yes" fraction, then mean guess
    var s = ""
    s += q.answers.yes / (q.answers.yes + q.answers.no)
    s += ','
    var sum = 0
    var total = 0
    foreach(q.guesses, function (g) { sum += g; total += 1 })
    s += sum / total
    print(s)
})



5/25/12

iterative image descriptions revisited, again

A while ago, I ran an experiment on mturk involving people iteratively improving image descriptions.

If we want to build a model of the steps involved in this process, we might consider three things:

  • I : the distribution of description qualities when someone writes a description from scratch.
  • M : the distribution of description qualities when someone writes a description based on a description of some quality.
  • V : the probability of a voter correctly picking the description of higher quality given two descriptions with some difference in quality (presumably the probability would be 0.5 if the descriptions had equal quality).
Here is the data for I

Here is the data for M (each row has two numbers, X and Y, where someone took a description of quality X and wrote a description of quality Y)

Here is the data for V (each row has three numbers, X, Y, Z, where someone saw a description of quality X and Y, and they voted for the "correct" one iff Z = 1)

Here's a histogram plot of I (R command: hist(I$rating, breaks=0:10) )

Here's a sortof 2d histogram (looking down) of M (thanks to Panos Ipeirotis for these R commands):

> library(KernSmooth)
> m <- read.csv("M.txt")
> model <- bkde2D(m, bandwidth=c(2,2), gridsize=c(200, 200), truncate=TRUE)
> filled.contour(model$x1, model$x2, model$fhat, xlim=range(c(1,10)), ylim=range(c(1,10)), nlevels=50, col=terrain.colors(70, alpha =1.0))




Here's an even better plot from Panos Ipeirotis:


> library(KernSmooth)  # still needed (from the block above) for bkde2D and dpik
> library(som)         # provides normalize
> m <- read.csv("M.txt")
> model <- bkde2D(m, bandwidth=c(dpik(m$r1), dpik(m$r2)), gridsize=c(150, 150), truncate=TRUE)
> filled.contour(model$x1, model$x2, normalize(model$fhat, byrow=TRUE), xlim=range(c(3,10)), ylim=range(c(3,10)), nlevels=40, col=terrain.colors(50, alpha=1.0))



academic paper reviews

I often get invitations to review papers, and although I have some misgivings about the academic publication process in my field, I feel some obligation to help out with the system anyway (since I haven't managed to create something better).

However, now I'm thinking I may want to stop reviewing papers for a different reason, which is that when I read papers to review them, I read interesting ideas that I'm not allowed to tell anyone about, or presumably incorporate into my own problem solving repertoire.

Usually I'm pretty good about not telling people about these ideas until they are published (assuming they get published), but just today, I had the experience of one of these ideas finding its way into my research, i.e., I have a problem, and the solution that I'm thinking about is influenced by a paper that I reviewed for a conference, and now I'm not sure what to do. Should I not implement the solution? This seems ridiculous. I feel like I would come up with some solution on my own, but it's impossible now to know what that solution would have been.

The process of trying to assign credit for ideas is a hindrance. Ideas should be absolutely free in every sense. Ugg.. I hate this so much.

dixit and ...


consciousness vs intelligence


penrose tiling

penrose tiling using canvas

meditation


symbolic reasoning


bluffing

question: is bluffing ever a mathematically optimal strategy?

answer: I think yes, for some definition of optimal.

let's define 'optimal strategy' as the best strategy to have if your opponent knows what strategy you have. That is, they know what algorithm you will use to make decisions.

By this definition, the optimal strategy in rock-paper-scissors is to choose randomly. If you don't choose randomly, and your opponent knows the distribution over your choices, then they'll choose the thing that beats the thing you are most likely to pick.

Now consider a simple betting game. The deck has only four cards: the jack, queen, king, and ace of a single suit. Each player is dealt a card randomly, which they can look at. Both players begin the betting by putting 1 gold into a pot. One player goes first. They can either 'call' or 'raise'. If they 'call', betting stops, and the player with the higher card wins. If they 'raise', they add 2 gold to the pot. The other player can then 'fold' -- automatically forfeiting the pot -- or 'call', adding 2 gold and giving the pot to whoever has the higher card. The game alternates who goes first each round.

You can play against the computer (nemesis), by pressing 'play' to the right. I'll tell you nemesis's strategy:

If nemesis goes first, it will only raise if it has the ace -- unless it has the jack and decides to bluff, which it will do with 50% probability.

If nemesis goes second, it will usually only call a raise if it has the ace, but there is a 25% chance that it will also call a raise with either a queen or king.

I believe that this strategy is optimal. That is, I don't think you can win all of nemesis's gold no matter how you play (unless you are genuinely lucky, or read nemesis's mind using the JavaScript console, or.. I'm wrong ;)

Why do I think this strategy is optimal?

Short answer: it involves Linear Programming.

Longer answer: well, we usually play games by reacting to each bit of information we receive when we receive it, but we could, if we wanted, decide before the game even begins how we will react to each possible thing that can happen. For instance, if we are going first, we can decide before the cards are dealt how to react to each card, e.g., 'only raise if we get an ace'. If we want to bluff, we can make that decision before the game begins too, e.g., 'raise if we get an ace or a jack'.

Now there are four possible cards, and one decision to make given each card (call or raise), which gives us 2^4, or 16, possible ways to play. If we go second, there are also 16 possible ways to play (there is only one choice to make for each card, namely: if they raise, will we call?). If we want to incorporate randomness, we can do so by choosing randomly from among the 16 options.

This lets us build a 16x16 matrix, where each row represents a way to play when going first, and each column represents a way to play when going second. The cells hold the expected transfer of gold given two ways of playing against each other (where positive amounts favor player one, and negative amounts favor player two).

Each player needs to choose a strategy. We can represent each player's choice as a 16 element vector: each element represents the probability of choosing one of the 16 possible strategies, so the elements must sum to 1. If player one's vector is A, and player two's vector is B, and the 16x16 matrix of expected winnings is W, then the expected winnings of player one versus player two is A * W * B^T, which will be a single number.

Now imagine that we're player one. We have some control over what A * W will look like. Note that it will be a 16 element vector. Note also that player two, being evil, knows what our strategy is. That is, they can look at that 16 element vector, and find the most favorable element of it for them, and adjust B so it has a 1 corresponding to that element, and a 0 everywhere else. Hence, we want to adjust A so as to maximize the minimal element of A * W. This can be done with a linear program. We can also use the same technique, transposing everything, to find the optimal strategy as player two.
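
Here's a minimal sketch of how that 16x16 matrix of expected winnings can be built (illustrative code, not the actual generator; a strategy is a bitmask where bit 0 means "with the jack" and bit 3 means "with the ace", matching the strategy numbering below):

// Cards: 0 = jack, 1 = queen, 2 = king, 3 = ace.
// s1 says which cards player one raises with; s2 says which cards player two
// calls a raise with. Returns the gold transferred to player one for one deal.
function transfer(s1, s2, c1, c2) {
    var higher = c1 > c2 ? 1 : -1            // +1 if player one has the better card
    if (!(s1 & (1 << c1))) return higher      // player one just calls: 1 gold moves
    if (!(s2 & (1 << c2))) return 1           // player one raises, player two folds
    return 3 * higher                         // raise and call: 3 gold moves
}

var W = []
for (var s1 = 0; s1 < 16; s1++) {
    W[s1] = []
    for (var s2 = 0; s2 < 16; s2++) {
        var sum = 0, deals = 0
        for (var c1 = 0; c1 < 4; c1++)
            for (var c2 = 0; c2 < 4; c2++)
                if (c1 != c2) { sum += transfer(s1, s2, c1, c2); deals++ }
        W[s1][s2] = sum / deals               // expected winnings for player one
    }
}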

I generated such a Linear Program in JavaScript, combining both problems into a single program. You can see it by pressing the button below (and you can inspect the JavaScript of the frame to see how it works). The variables look like 'p1s0'. This represents player one -- the player going first -- strategy 0, which is the strategy of never raising.

It turns out there is a Linear Program solver written in JavaScript too: http://www.zweigmedia.com/RealWorld/simplex.html.

It tickles me to see an LP solver in JavaScript, though it does have a couple nuances: first, it requires each variable to appear in the objective function, even if we're just multiplying it by 0. Also, it implicitly adds the constraint that each variable be positive. This was a problem for my program, and is why you'll see the magic number 10, which effectively allows p1 and p2 to be 'negative', i.e., as low as -10.

The solution is p1s8 = 0.5, p1s9 = 0.5, p2s8 = 0.75, p2s14 = 0.25. This means that player one should choose strategy 8 half the time, and strategy 9 half the time. Strategy 8 is 'only raise with the ace', and strategy 9 is 'raise with either the ace or jack'. Likewise, player two should choose strategy 8 with 75% probability, otherwise strategy 14. Strategy 8 for player two is 'only call a raise with the ace', and strategy 14 is 'call a raise with anything but the jack'. (Note: this is not the only solution. p2s8 = 0.5, p2s12 = 0.5 also works, where strategy 12 is 'call a raise with a king or ace'.)

a short short story

A boy named brown eyes (B) flew across the country to do an internship in silicon valley. While there he met a girl named green eyes (G). She was beautiful and wicked smart.

B was religious, and he knew G wasn't, but he wanted to date her anyway — maybe she would see the light. To avoid misery and heartache, and a possibly pointless long distance relationship, he prayed to god, asking if he should ask her out.

and god said "yes"

so he did. He asked her to dinner, and she accepted — knowing he was religious, but thinking maybe he would see the darkness.

B: so, do you believe in anything?

G: no

B: what about gravity?

G: no. I mean, I act as if it's true when I need to make decisions, because I don't have a better model right now. but for all I know, gravity could be wrong.

B: what about your own existence? do you buy the argument 'I think therefore I am'?

G: no.. — B: ..but how can you doubt that you doubt things?

G: well, I'm just not that smart. I don't trust my logical reasoning abilities, or logic or reason for that matter, enough to be sure that I'm drawing the correct conclusion from that line of argument. The fundamental nature of the reality I think I'm in could be completely different than I think.

B: so.. you think that you might not be thinking right now, or not really here..

G: yes.. I do think that, though I don't 'believe' it..

B: right, of course, because you don't believe anything..

B thought in his mind: "wtf god? I thought you said 'yes' I should ask her out, but this doesn't seem to be going well"

..and so it went, in the mind of god, as he thought about how to answer B's prayer — whether he should ask G out — and since saying "yes" seemed to be going poorly, god gave his final answer: "no".

so he didn't.

cube and color


truth vs insight


meaning of meaning


idea dna


drop blog


This was the first entry in my Drop Blog, which has now moved to here. The hope was that it would be really easy to post stuff using Dropbox. Unfortunately, for anyone to see it, I would end up needing to post links to stuff on Facebook and Twitter, and there was no index anywhere.. so now I think it's easier to do stuff on blogger (although it has the disadvantage of not having an easy way to put JavaScript in my posts, but my current strategy for that is to post stuff on glittle.org, and put iframes in posts).

Many of the Drop Blog posts were drawn on index cards, and then photographed. I included a zen magnet in this shot, in order to convince people that it really was a photograph, and not some photoshop filter to make it look that way.

TurkPad

I created something called TurkPad, which is essentially just Etherpad, but with the ability to bring in turk workers. And it's free (for you). People can spend up to $1 for each TurkPad instance. The site spends my own turk money, and reveals my turk account balance.

There is also a PayPal link for people to donate money, with the vague hope that people will donate money to cover the cost of the experiments they run. I could make people pay with PayPal to start each experiment, but then it would have more friction, and people might complain to me saying: "Hey, I paid you money, and I didn't get what I wanted!"

Here are the results from several early trials (I forget what the people/rates were for each one, but I give my guesses):


Brainstorming a company name (I think 10 people at $0.05/task):

original typewith.me pad: http://typewith.me/p/DgLMG36g
STARTPlease brainstorm a name for a company that let's the crowd invest in startup companies (like 100,000 people each investing $10 in a startup).
put name suggestions here (1 per turker) thanks!: CROWDVEST
Micro Pass
Startup Express.
Community Oriental Investers
Small but Strong    
Simple Start Investments
 Micro Investments


John Horton suggested this task: "Come up with questions to ask applicants to a data entry job" (I think 20 people at $0.05)

original PiratePad: http://piratepad.net/zkVypZgPFb
Example Questions to Ask Applicants
============================
1. What will be the hardest part of completing this job? 
2. Do you have any questions about the task?
3. What one of your past jobs is most like this job?
4. Do you sacrifice accuracy for speed
5.  What is your words per minute speed?
6.  What is the accuracy value you have at that speed?
7.  What types of software are you familiar with?
8.  Data Entry is somewhat repetitive, will you get bored?
9.  Data entry requires you to work independantly, somewhat more alone than other jobs, would you feel isolated from other coworkers?
[...some blank lines...]
1. What will be the hardest part of completing this job?
2. Do you have any questions about the task?
3. What one of your past jobs is most like this job?
4. Do you sacrifice accuracy for speed?
5. What is your words per minute speed?
6.What is the accuracy value you have at that speed?
7. What types of software are you familiar with?
8. Data entry is somewhat repetitive, will you get bored?
9. Data entry requires you to work independantly, somewhat more alone than other jobs, would you feel isolated from others coworkers?
[...some blank lines...]
1. What outside factors may cause poor productivity for you.
2. Will you have many distractions.
3. how much money pay for each page ??
[...some blank lines...]
1. Have you done data entry job before    
2. How familiar are you with computers   
3. How familiar are you with web browsing
4. What is your typing WPM
5. How is your eye sight
6. Do you have reading glasses
7. Do you like working with team or isolated
8. Will you be confortable working long hours in front of the computer    
[...some blank lines...]
1.What is your typing speed
2.Which level you have passed in typing
3. Did you take typing as a personal hobby?
4.What is the most important aspect that you think is necessary for a data entry operator?


Panos Ipeirotis suggested this task (I think 10 people at $0.05):

original PiratePad here: http://piratepad.net/DwGZbiGN22
Please generate an idea for a Tweet (< 100 characters) for this article: http://represent.berkeley.edu/umati/
Very much to reading and also better to get good improvement on my brain. 
YOUR TWEET IDEA HERE: #TheUmatiProject Using kiosks for targeted outsourcing #UCBerkeley Crowdsourcing is a great way to efficiently use the talents of turkers
it is very useful to dud.
Crowdsourcing goes local #crowdsourcing20
Crowdsourcing done by experts #Crowdsour
#UCBerkeley creates #crowdsourceing vending machines
not bad.
really good
People with skills go where the money is #StatingTheObvious


Coin flipping (I think 10 people at $0.01/task):

original typewith.me pad: http://typewith.me/p/DgLMG36gdE
Please flip a coin, and write whether it landed on heads or tails:
Heads
- heads
heads
tails
heada
heads
heads
tails
tails
heads
heads
tails
tails
heads
tails
heads
heads
tails
heads
heads
tails
tails
tails
tails
hea
tail
heads


human computer interaction


My original thought in drawing this had to do with the debate about whether computers can be conscious, or experience emotions and what not. So, I wanted to have a couple of robots debating about whether computers can be conscious, not seeming to notice that they are themselves computers. Because of course, I tend to think that humans are computers.

However, I showed this to someone for feedback, and they thought the setting didn't match the topic. They thought the robots should be in easy-chairs with pipes, or something similar, so as to invoke the idea of philosophical discussion. I also seem to recall her saying my female robot was a bimbo.

moon direction

On the topic of astronomy, something I realized relatively recently is this: the bright part of the moon points toward the sun. I mean, I knew the light was coming from the sun, but I didn't realize the practical implication of this: you can look at the moon, and know from that where the sun is located. If the sun has set, but it's now too dark to see where it set, the moon is pointing there. This in turn can be used to determine which direction is West.

5/24/12

solar system

I was talking to Allison about the origin of the solar system, which involved a cloud of matter, and things sortof condensed and drained toward the center, and planets are little eddies and swirls around the main whirlpool which is the sun.

And I asked why everything should all be rotating in the same direction in a plane, which seems sensible for water draining since it's flat to begin with.. but if you start with a bunch of particles flying around in 3D, why gravitate toward a plane, and why all in the same direction?

The main insight is that, if you create a little universe, and put a bunch of particles in it, and each one has a random direction and speed, then naturally they'll all start orbiting each other in some crazy chaotic pattern.. BUT, the system has a center of mass, which seems natural maybe, and that center of mass is moving at some constant speed in some direction, which is a little weirder, but I still buy it.. AND, the entire system has an angular momentum. There is some 3d line passing through the center of mass of the system that everything more-or-less spins around, and a particular direction (clockwise or counter-clockwise) and a particular speed of rotation, on the whole.

That is to say, my insight for today [edit: this was originally written a few months back], is that any random configuration of particles moving around in space defines a 3d vector through space. Even before anything happens with some nebula that's thinking about forming a solar system, you can say "well, the sun will be here, and the planets will be rotating on this plane, in this direction."
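
To make "defines a 3d vector" concrete, here's a little sketch of computing that vector for a list of particles (illustrative only; the particle format is made up):

// Total angular momentum of a particle cloud about its center of mass:
// L = sum of m * (r - R) x (v - V), where R and V are the center-of-mass
// position and velocity. Particles look like {m: mass, r: [x,y,z], v: [vx,vy,vz]}.
function angularMomentum(particles) {
    var M = 0, R = [0, 0, 0], V = [0, 0, 0]
    particles.forEach(function (p) {
        M += p.m
        for (var i = 0; i < 3; i++) { R[i] += p.m * p.r[i]; V[i] += p.m * p.v[i] }
    })
    for (var i = 0; i < 3; i++) { R[i] /= M; V[i] /= M }
    var L = [0, 0, 0]
    particles.forEach(function (p) {
        var r = [p.r[0] - R[0], p.r[1] - R[1], p.r[2] - R[2]]
        var v = [p.v[0] - V[0], p.v[1] - V[1], p.v[2] - V[2]]
        L[0] += p.m * (r[1] * v[2] - r[2] * v[1])
        L[1] += p.m * (r[2] * v[0] - r[0] * v[2])
        L[2] += p.m * (r[0] * v[1] - r[1] * v[0])
    })
    return L   // the axis the nebula will (roughly) end up spinning around
}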

There's also the issue of why things actually form a plane, since they could rotate at odd angles and still keep the same angular momentum.. my current thought is this: if everything glommed together into exactly 2 equally sized balls, then they would necessarily be rotating in a plane, in a certain direction.. and everything else is a gradient towards that: as things glom together, they become more and more in a plane, and more and more in a given direction.

Update: I guess another way of looking at it is that if you manage to spin a basketball very very quickly, it will sortof flatten into a pancake.. and so since random collections of particles are typically spinning, on the whole, around some axis, it makes sense that they would flatten into a pancake.

Of course, the thing bringing the top and bottom of the basketball toward the center is that they are physically attached to the sides of the ball that are moving away from the center, pulling them in. In the case of our particle cloud, that job is done by gravity.

pascal's wager

A self-proclaimed rationalist/atheist once asserted that she did not have a satisfactory rejection of Pascal's Wager. Pascal essentially says that we should believe in the Christian God, if there's any chance that He exists, because the consequences of not doing so are super bad.
Doesn't that argument call for worshiping the Flying Spaghetti Monster (FSM) as well?
Probably not.. assuming you can only worship one god, Pascal's Wager seems to suggest worshiping the most likely one. Now here's the problem: the FSM is made-up, and it's easy to make up an infinite number of similar gods, like asserting that Russell's teapot is a god. If each of these gods has a non-zero probability of existing, then they can't all have the same probability, or else the probabilities would sum to more than 1. So there is some most-likely god, and that seems unlikely to be the FSM.
Well, if I worship one god, aren't I also failing to worship another god?
Probably. The strategy doesn't guarantee avoiding some god's hell. But that doesn't mean you shouldn't worship the most-likely god.
But, Wikipedia also suggests the argument of "inauthentic belief", saying that if I believe in the Christian God because of Pascal's Wager, He'll know it, and not accept me into Christian heaven.
Well, that argument makes two doubtful assumptions. First, it assumes that the Christian God would frown on people who try to follow His gospel motivated by Pascal's Wager. I think most Christians would say "it's a good start, and Jesus will help you, if you really want to follow Him." Second, it assumes that you cannot authentically modify your belief. I think people can change their beliefs. Also, you would need to be sure that inauthentic belief wouldn't get you into heaven in order to argue for not trying, according to Pascal's Wager.
Hm.. Well, maybe the math doesn't actually work out to negative infinity for hell. Maybe hell can't possibly be as bad in the billionth year as it is on the first day, and maybe the probability of hell lasting a billion years is less than the probability of hell lasting a million years, and maybe there actually are some cancelling effects, like a hell that can only be avoided by being atheist..
Er.. maybe. It seems hard to be sure you have worked it all out correctly, and any doubts seem like they would lean in Pascal's favor, since the stakes are so high..
Let's simplify the math. Let's say there are just two possibilities: God or no God, and you can choose whether to believe. If God exists, he'll reward you with heaven or hell based on whether you choose to believe in Him. However, God is unlikely to exist. Would you choose to believe?
Ugg.. hmm.. No. I mean, if I thought there was a 0.0000001% chance of God, then I would believe there was a 0.0000001% chance of God.
Do you understand how eternity works?
Yes, but I just don't feel the length of eternity. I guess I discount highly improbable future happiness in favor of more certain near-term happiness, where I guess it makes me happy to believe whatever makes the most sense to me at the time.
More than that, it seems useful to believe whatever makes sense, since this has the most likely chance of helping me gain more insight.
I see.. if you thought there was a 99% chance of God, but God was only satisfied with 100% belief, would you bump up your belief to 100% to avoid a 99% chance of eternal hell?
The romantic in me says no, stick to your guns, but I probably would delude myself in that case.
Hm.. me too. I think the issue is essentially similar to our discussion of rational gambling, where there seems to be no objectively optimal decision. In particular, it does not seem objectively irrational to choose to believe in God, even if His existence seems astronomically unlikely. Nor does it seem objectively irrational to be an atheist, despite a slight prospect of eternal punishment.

music is a drug

and when I'm on a roll, in the zone, working hard, staying up late.. I take lots of that drug. Caffeine doesn't work for me. Music does.

flipping coins

When you ask a turker to flip a coin, it will land on heads about 2/3 of the time. There is an interesting series of blog posts about this: Coin Flipping, Coin Flipping with Bonuses, Coin Flips Revisited, and even more recently Michael Bernstein repeated the experiment with 1000 turkers and got 687 heads.

So when Google consumer surveys came out, I asked Allison Moberger to post a question for me: "Please flip an actual coin and choose the result below," with the options of "Heads" and "Tails".

We went for the smallest experiment allowed by Google, which is 1000 people, at $0.10 per question, so $100.

The result: 65.9% heads.



I didn't realize this, but even though you pay for 1000 people, Google doesn't have demographic information on everyone who answers, and it throws out people it doesn't know about. Hence, this 65.9% is from 794 'respondents with demographics', and I'm not sure how to see what the remaining 206 people did. [EDIT: Jack Hebert says "To see all 1000 responses, click on the 'gear' icon then set 'Prefer weighted' to Off." He's right. Thanks Jack! He also mentioned that I can share the data. Apparently it is public by default, so here's the link.]

The demographics are cool though, and they automatically look for statistically significant differences among the demographics, called "Insights". It says "124 insights investigated. 2.5 false discoveries expected on average." There were indeed two insights, so they're probably both false: "Among women, those in the US South picked Heads more than those in the US Northeast." and "Among people in the US South, women picked Heads more than men."

bayesian truth serum

I took 100 yes/no questions and asked 20 turkers to answer each question. I also asked each turker to predict how other people would answer each question.

You can see these questions to the right. If you answer one, and predict how other people will answer it, then you can see how the turkers and other blog readers answered the question. You also get points: 100 points if your prediction is exactly right, and it goes down to 0 when you are 50 points away from the right percentage. New questions will come when you answer these questions.
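
In other words, the scoring is a linear ramp, something like this (a sketch of the idea, not necessarily the exact code in the widget):

// prediction and actual are both percentages of "yes" answers, 0 to 100
function score(prediction, actual) {
    return Math.max(0, 100 - 2 * Math.abs(prediction - actual))
}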

Sometimes you'll see that the average answer and average prediction are pretty different, suggesting cases where people have a distorted view of reality. I'd like to make the data public and say something interesting about it, but first I want to let people make guesses, and try to get a good score, without already knowing the answers.

For me, the idea of asking people to predict what other people will do comes from 'bayesian truth serum', though that paper proposes some fancy math which I am not using here. In particular, bayesian truth serum gives you a score based not only on your prediction, but also on your answer, which would seem like it incentivizes people to answer dishonestly so as to maximize their score, but for some math reason, they claim it doesn't. Here, I'm just giving a score based on your prediction, which is simpler to understand.

This idea also came up again for me when listening to a talk by Justin Wolfers at the Collective Intelligence conference about forecasting elections. Wolfers has some interesting data showing that asking 'who do you think will win' is more predictive than 'who will you vote for', which sortof makes sense since you're sortof asking people to take the average vote of all the people they know, which may effectively increase your sample size; however, the effect is stronger than I thought, since he also has data showing that asking a biased sample of people, like all republicans, 'who do you think will win?' is better than asking an unbiased sample of people 'who will you vote for?'.

Anyway, long story short, I wanted to play around with the idea of asking 'how will other people answer this question?'. Hopefully a lot of people will answer the questions in this prototype, and I can write a subsequent blog post about that data.

rational gambling

from an old blog post on glittle.org/blog.. decided to move it here instead of realgl.blogger.com because it seems a bit half-baked:

I used to think it was irrational to buy a $1 lottery ticket, or play roulette -- a tax on people who are bad at math -- but now I don't.
Um.. because it is a fun experience?
No. I used to think the rational thing to do was maximize expected money.
Why is that irrational?
Well, there are times when maximizing expected money will probably leave me broke. Let's say someone offers to triple my bet if I win a fair coin toss. I expect to make $1.50 for every dollar I bet, so the "rational" thing to do is bet all my money. If I win, and they offer another go at it, I should bet everything again. In fact, if they offer 100 flips, I should bet everything on every flip, which will very likely leave me broke.
True, though you could maximize your expected log-money. The log of zero is negative infinity, so you'd never risk all your money. Also, the log would value each new dollar less and less, which sortof makes sense, since one extra dollar when you have a billion dollars isn't worth much.
Sure, that turns out to be the kelly criterion for this game. But what is special about the log? I mean, someone could offer so much money for winning 100 tosses in a row that I would still bet everything on that remote possibility -- everything except a penny, that is, to avoid having nothing.
Hm.. What if you imagined that you were competing against someone else with the same opportunity, and you wanted to maximize your chance of ending up with more money than them? This would stop you from caring too much about remote possibilities, since you would want to do whatever maximized your money most of the time. This also sortof makes sense, since people like to have more money than other people.
Yeah, that also turns out to be the kelly criterion for this game. But what is special about ending up with more money than someone else playing the same game? I mean, consider a game with a single coin that is biased slightly in my favor, and if I win, I get back slightly more than I bet. To have the best chance of having more money than my friend, who has the same opportunity, I should bet everything, but this means I'm expected to lose about half my money.
Hm.. Well, you could maximize your expected utility. Von Neumann and Morgenstern suggest that if you're rational, your preferences for decisions like this will be consistent, and that that implies there will exist some function mapping dollars to utiles such that you are really maximizing your expected utility.
That's an interesting reframing of the problem, but saying it is rational to have consistent preferences does not say what those preferences should be, i.e., they don't say what the mapping should be from dollars to utiles. I claim that this mapping is arbitrary. I can't think of any objective reason to prefer one mapping over another.
As an example, it seems fine to value $36 a hundred times more than $1. After all, you can buy something with $36. You probably can't buy anything with just $1, not even 1/36th of something. Hence, it may make sense for you to put your single dollar down on red 27 in roulette for a small chance of winning $36, since you have a 1/38 chance of getting a hundred times as many utiles as you had with your $1, which is an expected gain in utility.
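
(Aside: for the triple-your-bet coin toss above, maximizing expected log-money -- the kelly criterion -- works out to betting a fixed fraction of the bankroll. A quick sketch of the calculation:)

// Kelly fraction for a bet won with probability p that pays b-to-1 (net):
// maximizing expected log-money gives f* = (p * (b + 1) - 1) / b
function kellyFraction(p, b) {
    return (p * (b + 1) - 1) / b
}

// "triple my bet if I win a fair coin toss" means winning nets $2 per $1 bet:
kellyFraction(0.5, 2)   // 0.25 -- bet a quarter of the bankroll, not everything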

cyclic json

I think it's useful sometimes to serialize cyclic data structures as JSON. Douglas Crockford wrote a nifty way of doing it here. The basic idea is to "decycle" a data structure before serializing it as JSON, replacing redundant references to an object with string-paths pointing to a single version of that object within the data structure. When deserializing such an object, the original links are restored with a "recycle" function.

I had two concerns with Crockford's implementation: First, I was afraid it would be too slow. Crockford's comments point out that each time the algorithm processes an object, it does a linear search to see if it has processed the object already, i.e., O(n^2) running time. My solution (premature optimization?) is to add a tag to objects in the original data structure, and just check for this tag when processing new objects. These tags are removed before the method returns.

Second, Crockford marked cyclic references with an object that looks like this: { "$ref" : "$[\"path\"][2][\"object\"]" }. This breaks if an object happens to contain "$ref" as a key. My solution is to find a unique key, which means I need to say somewhere what that key is. (update 1/30/12) I do this by adding a wrapper object with a property called "cycle_root". The wrapper object has another property where the key is the cycle_root, and the value is the decycled data structure. Press "run me" above to see what this looks like. You can also play with the code to see how different data structures are handled.

(update 1/4/12) Third, the original algorithm processes the objects depth first. Now it processes the objects breadth first. To see why this might be useful, consider an array where each element contains a pointer to the next element in the array:

depth-first (old)
{
    "cycle_root": "root_0",
    "root_0": [
        {
            "next": {
                "next": {
                    "next": {
                        "next": "root_0[0]"
                    }
                }
            }
        },
        "root_0[0][\"next\"]",
        "root_0[0][\"next\"][\"next\"]",
        "root_0[0][\"next\"][\"next\"][\"next\"]"
    ]
}
breadth-first (new)
{
    "cycle_root": "root_0",
    "root_0": [
        {
            "next": "root_0[1]"
        },
        {
            "next": "root_0[2]"
        },
        {
            "next": "root_0[3]"
        },
        {
            "next": "root_0[0]"
        }
    ]
}
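
Here's a rough sketch of the tag-based, breadth-first approach (just an illustration of the idea; the actual code is in the "run me" widget mentioned above):

// Tag each object with the path where it was first seen, walk the structure
// breadth-first, and replace repeat references with that path string.
// (Sketch only; the root key and tag name are hard-coded here and assumed
// not to collide with anything in the data.)
function decycleSketch(root) {
    var rootKey = "root_0", TAG = "__path__"
    var tagged = [], queue = []

    function enter(value, path) {
        if (value === null || typeof value !== "object") return value
        if (value.hasOwnProperty(TAG)) return value[TAG]   // seen before: emit its path
        value[TAG] = path
        tagged.push(value)
        var copy = value instanceof Array ? [] : {}
        queue.push([value, copy])
        return copy
    }

    var result = { cycle_root: rootKey }
    result[rootKey] = enter(root, rootKey)
    while (queue.length > 0) {
        var pair = queue.shift(), obj = pair[0], copy = pair[1]
        for (var key in obj) {
            if (key === TAG || !obj.hasOwnProperty(key)) continue
            var step = obj instanceof Array ? "[" + key + "]" : "[" + JSON.stringify(key) + "]"
            copy[key] = enter(obj[key], obj[TAG] + step)
        }
    }
    tagged.forEach(function (o) { delete o[TAG] })   // clean up the temporary tags
    return result
}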

fuck

so, fuck is a word I sometimes use to help myself start thinking. The canonical example of staring at a white page, not knowing what to write, for me, often begins with writing "FucK" (sometimes with fancy sizes or cases to the letters). The idea is to let myself know that anything is ok to write. There is no quality bar. Sortof like with brainstorming, where they say that anything is ok.

I think in society today, we often think that we're letting anything go in sessions like brainstorming, but we are in fact not doing so. It is really hard to let anything go, since we start to worry about offending people (even the word "fuck" itself is offensive to many people).

In a meta way, this blog is kindof a "fuck" blog, where I allow myself to write anything. No quality bar whatsoever.

Of course, I do have an even more open blog than this (which is currently in Evernote).. and at some point I'd like to merge the two. We'll see how that goes.

glicko rating

I've been using the Elo rating system for talent court. I like that the update rules are very simple. However, people come into the system with a score of 1500, and if they lose, then their score drops below 1500, meaning that they are listed below people who have never played. I feel like this can be discouraging: "I'd be doing better if I never played." One hack I've seen for dealing with this is to only give people a score after they've played a few games, which is what I currently do.

More recently, I read a bit about TrueSkill, and saw a clever trick to deal with this problem. First, TrueSkill represents each player's skill with a gaussian distribution (mean and standard deviation) rather than a single number. If a player wins, their mean will increase, and their standard deviation will generally decrease, since more is known about their skill.

Now here's the trick: instead of showing people their mean skill, we show them their mean skill minus a few standard deviations. Conceptually, we show them a score that we are ~99% sure they are better than. This does something a little counter-intuitive -- if you play your first game, and lose, your score will increase. Actually, your mean score decreased, but your standard deviation decreased more, such that the mean minus three standard deviations actually increased. This has the emotionally pleasing property that people's scores generally always increase the more they play, where better players' scores increase faster.

This system also has a nice way to reduce people's scores over time. Why would we do that? The cynical reason is that we want to keep people playing, by making their score go down if they don't. A more diplomatic reason is something like: because the scores are only meaningful when compared to other people's scores, and because new people are coming into the system all the time, we want to keep people's scores up-to-date. Another way of saying this is that we become less and less sure of a person's score over time. This way of looking at the problem suggests a clever solution: we can represent this decrease in certainty by increasing a player's standard deviation, which has the side-effect we want of reducing the score that they see.

So why is this post titled "glicko rating" instead of "TrueSkill"? Well, TrueSkill is sortof an extension of the Glicko rating system, where the main contribution of TrueSkill seems to be adding support for teams. I don't care about teams for now, and the glicko system is simpler to implement, i.e., the update rules are on wikipedia.

Up above is a widget you can use to see what happens to two players' scores (labeled r, for rating) and standard deviations (labeled RD, for rating deviation, I think). Press one of the "win" buttons to see what happens when that player wins. You can also modify the r's and RD's directly. Note that I show gaussian distributions to represent red's and blue's skill, whereas I've read here that the Glicko system actually uses a logistic distribution.

Up above is also a JavaScript function that implements the update rules. You're free to take this. The glicko system itself is public domain, and so is this function. Note: don't go looking for the "actual" source code for that function in the html. What I "actually" do is call eval on the html pre element that you see above.
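
In case it helps to see the math written out, here's a sketch of a single-game update following the formulas on Wikipedia (illustrative; it may not match the function above exactly):

// One-game Glicko update for a player with rating r and deviation RD,
// against an opponent (rj, RDj), where s = 1 for a win, 0.5 for a draw, 0 for a loss.
function glickoUpdate(r, RD, rj, RDj, s) {
    var q = Math.log(10) / 400
    var g = 1 / Math.sqrt(1 + 3 * q * q * RDj * RDj / (Math.PI * Math.PI))
    var E = 1 / (1 + Math.pow(10, -g * (r - rj) / 400))
    var dSquaredInv = q * q * g * g * E * (1 - E)          // this is 1/d^2
    var newRD = Math.sqrt(1 / (1 / (RD * RD) + dSquaredInv))
    var newR = r + q / (1 / (RD * RD) + dSquaredInv) * g * (s - E)
    return { r: newR, RD: newRD }
}

// e.g. glickoUpdate(1500, 350, 1500, 350, 1) -- two fresh players: the winner's
// rating goes up, the loser's goes down, and both RDs shrink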

talent court

Summary

Check out TalentCourt.

Introduction

This post expands upon the ideas of the quality game post. It is an idea dump that I'd like to turn into a paper at some point (note that some of this 'dump' is written by Allison Moberger, who by coincidence is currently at the top of TalentCourt writing [or was, when this post was originally written], so the bad writing here is probably mine).

Motivation

In short, we want real-time access to experts to perform micro-tasks. For example, we might want to hire an expert JavaScript programmer for the next 10 minutes to help write a utility method. Here are some example use-cases:

Logo design: Imagine designing a logo in the following way: pay 20 expert sketch artists $1 each for a 5 minute sketch. Then use traditional crowdsourcing to find the best 5 sketches. Then pay 5 expert photoshop artists $5 each for a 20 minute mockup of the best sketches. Then use traditional crowdsourcing to find the best mockup, and pay 1 expert designer $20 for 1 hour to put the finishing touches on the best mockup. Now we've paid a bit over $100 for a logo. It would be interesting to see how this logo compares to what is generated for a similar amount on a site like 99 designs. The advantage of this approach is that nobody works for no pay, which may be more efficient, e.g., more quality per dollar.

Micro-outsourcing: Max Goldman has built a system that supports real-time collaborative programming called Collabode. One use-case mentioned in the Collabode paper is micro-outsourcing, where a main programmer delegates tasks, like filling in method bodies of a new class, in real-time as part of their flow. It would be interesting to see if this style of programming actually works. One could also imagine a similar working style for writing, where a main writer writes an outline for paragraphs of a paper, and then hires expert writers in real-time to flesh out the outlines. This might allow the writer to spend more time thinking at a higher level about the overall ideas and organization of the paper, and "directing" the creation of the paper.

Self-repairing web services: Imagine that you have written a web service that periodically grabs data from one source (e.g. Rhapsody), munges it, and sends it to another service (e.g. last.fm). Now imagine that the data source changes the format of their data, which breaks some regular expressions in the parsing script, raising an exception. It might be cool if the system could handle the exception by automatically hiring an expert to repair the script.

Problem

The essential problem is this: if you are going to hire someone in real-time, you need some very fast method of determining whether they are qualified for your job, and currently there is no such method. There are lots of systems that try to do this, but they all fall short in some way:

Standardized tests: The most commonly used method of identifying skilled applicants is the standardized test. Nearly everyone has been subject to this, as the use of tests like the SAT, ACT, GRE, MCAT, and LSAT are widespread throughout all levels of the academic community. If an applicant performs poorly on a standardized test, the admissions office can simply dismiss the application and move on to others. However, there are downsides to standardized testing: they're expensive to create and administer, difficult to keep secret, and easily-gameable. Rather than learning and understanding all of the math or language through life experience, many simply take classes to study specifically for the test, and this could cause an applicant to receive a high score even though their knowledge is lacking. For online tests, cheating is even harder to prevent. For example, oDesk has standardized tests for many skills, but when I type "odesk java test" into Google, it auto-suggests "odesk java test answers", and the top link has the answers.

Ratings: Sites like eBay and oDesk allow users to rate prior transactions. On oDesk especially, this can create a "cold-start" problem, where one needs work to get a rating, but needs a good rating to get work. Ratings also have an added problem of "grade inflation"; as anything less than the best rating could hurt a candidate's future prospects, employers feel some pressure to hand out the best rating unless something went very badly. They are also static; a past employer's ratings tell a new potential employer nothing about any skills a candidate has gained since they last worked for that employer. Finally, ratings do not include a notion of how hard a task was, or how closely related it is to the current employer's needs.

Portfolio: Artists and designers often present a portfolio of their work that employers can use to judge their skill and style. Unfortunately, portfolios can also be gamed — people can post work that they didn't do. A more subtle and less intentional way of cheating is for someone to include work in a portfolio that they contributed to, without explaining exactly what they did. This is understandable in a close collaboration, where it is difficult to tease apart who contributed what, but the net effect in any case is that portfolios cannot always be trusted.

Interviews: Another option, frequently used by technical employers, is the expert interview; that is, applicants have a face-to-face interview with an expert in the field, whose knowledge allows them to evaluate the skill of the applicants. This also tends to be gameable to some degree, as a large portion of the internet seems devoted to preparing interviewees to answer "Microsoft questions"; it also has a problem of scale, as the time and energy required to find the best candidates increases to prohibitive levels very quickly relative to the pool size. Finally, it also has another flaw at the meta-assessment level: how does one identify an expert interviewer, except through an interview with another expert? There is no good way to "assess the assessors".
The last two techniques are not really suitable for real-time hiring anyway, because they are subjective, and generally require a human to spend time converting the material in the portfolio or interview into a yes/no decision about whether to hire. In practice, this assessment can take more time than a micro-task itself.

Solution

Our essential idea is to build competitive games around different skills, and use an elo-style rating system to encode the skill of the players. One way to think of these games is as games with a purpose, where the purpose of each game is to evaluate the skill of the players. Depending on the skill, the game may involve players doing some or all of the following:
  • generating questions
  • answering questions
  • comparing answers
Why do we think this will work? We see a number of potential advantages:
  1. By generating random questions, or having players generate new questions as part of the game, we can afford to keep the lifetime of questions very small, to mitigate the danger of people posting "cheat sheets" online.
  2. By comparing questions, we can guard against needing to know how difficult a question is, since it will be the same difficulty for both players, and each player just needs to answer it better than their opponent.
  3. By having humans evaluate questions, we allow for "essay questions", or questions with subjective answers, which guards against many issues with multiple choice questions.
  4. By making each contest small, we allow players to increase their score without blocking out a large time in their schedule (e.g. 5 minutes games as opposed to a 30min or 1hour test).
  5. Because of the dynamic nature of the elo rating system, it notices as a player increases their skill (as opposed to standardized tests, which may require people to wait a month or a year before taking a test again.. presumably to prevent people from simply remembering the questions from the last time).
  6. Again because of the nature of the elo rating system, a player may not need to play many games before the system has a good idea how skilled they are (similar to the adaptive questions in the GRE).
  7. Because scores are relative to other people, the elo rating system can differentiate between people over a very broad range of skill levels, as opposed to standardized tests like the GRE math test, where many people get a perfect score, and the test has no ability to discriminate between these people.
What have we done so far?
TalentCourt currently includes a Writing and Drawing game. Each game proceeds as follows: two users are shown a random prompt (3 random words for writing, and 1 random word for drawing). Each user then has 5 minutes to write a short passage using all three of the words in some meaningful way, or draw the given word. Then three different people are asked to select the best passage or sketch. We previously gave voters instructions like choose "the most natural sounding paragraph", but the current version has no instructions for voting. The passage with the most votes wins, and this user's score increases, while the loser's score decreases. Scores are updated according to the elo rating system (and will probably move to the TrueSkill algorithm soon).
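
For reference, the core elo update is tiny -- something like this (a sketch, not necessarily TalentCourt's exact code; K is a tunable constant controlling how fast ratings move):

// Expected score for A against B, then nudge A's rating toward the actual result.
// score is 1 if A won, 0 if A lost.
function eloUpdate(ratingA, ratingB, score, K) {
    var expected = 1 / (1 + Math.pow(10, (ratingB - ratingA) / 400))
    return ratingA + K * (score - expected)
}

// e.g. eloUpdate(1500, 1600, 1, 32) rewards the underdog more than
// eloUpdate(1600, 1500, 1, 32) rewards the favorite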

Cheating

We want to use the scores from these games for hiring decisions, which provides a lot of incentive to cheat. We designed the games to mitigate various methods of cheating:

Perhaps the most obvious way to cheat at either game is to use Google. In the English game, one could enter the three words as a query to try to find a paragraph someone else has written using the words. The search results tend to be webpages that contain all 3 words without all the words being close enough together to use as input in the game. Hopefully, under time pressure, finding a short passage with all 3 words in a sensible context is more difficult than writing one from scratch. If it becomes a problem, we could also use four words instead of three.

Similarly, in the Sketch game, one could search the word and find an image, but the data uploaded to the server includes a sequence of painting strokes (rather than the raw pixel data), making it difficult to cheat by uploading a pre-drawn image. This does not guard against the possibility of a user tracing an image or drawing using another image as a reference. (Note: actually, the current implementation does upload the raw image data in addition to the strokes, but this is a temporary work-around for some technical issues.)

A different way to cheat would be to game the votes, rather than the inputs. It is possible for users to have friends, or other accounts, vote for their entries; to help prevent this, we do not reveal the authors of each entry to voters until after their votes have been cast, but there is still the possibility that users could communicate outside the system. We also do not let users choose which contests to vote on, so in a liquid market, the likelihood of voting on a friend's work would hopefully be small.

A related way to cheat would be to have confederate competitors deliberately perform poorly. The design of the game makes this difficult for a number of reasons: users do not get to choose who to compete against, and are not told who they are competing against until the voting on their input is closed; the elo rating system also makes it difficult to gain substantial rating increases from this method, as large score increases are gained from defeating superior opponents, not those who perform poorly.

Future Work

What ideas do we have for the future?

Programming Game: There are two ways this might work:

Way 1: Do it similar to the writing game, but instead of words, we use methods from the standard API of the language being tested, and ask people to write a short program that uses all 3 methods in some meaningful way.

Way 2: Have people generate programming interview questions. This idea is less baked — it is not clear whether people will be good at coming up with questions, so our first tests will involve just asking programmers to come up with questions, to get an idea of what sorts of questions they'll ask.

Graphic Design Game: This could be similar to the sketch game, where there is an HTML-based drawing tool, except this one might include a set of shapes and text that the designer is allowed to move/scale/rotate and set the color of. The prompt might also include a word like "peaceful", "energetic", or "efficient" that the design should convey.