5/26/12

bayesian truth serum data

In a previous post, I took 100 yes/no questions and asked 20 turkers to answer each question. I also asked each turker to predict how other people would answer each question. [update: this is a refined version of this post]

That post also included an app where readers could answer questions themselves, and then see how turkers and other blog readers answered those questions.

At this point, there are about 40 answers for each question -- about 20 answers from turkers, and 20 from blog readers.

Here is an anonymized version of the data as JSON. The main object has three keys: users, questions, and answers*.
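To get a feel for the shape of the data, here is a minimal sketch of loading it. Only the three top-level keys are guaranteed by the description above; the per-record fields in the sample ("id", "source", "user", "question", "answer", "prediction") are assumptions for illustration.

```python
import json

# A tiny sample in the (assumed) shape of the anonymized data. Only the
# top-level keys "users", "questions", and "answers" are described in the
# post; the per-record fields below are guesses for illustration.
sample = """
{
  "users": [{"id": 1, "source": "turker"}, {"id": 2, "source": "reader"}],
  "questions": [{"id": 10, "text": "Do you like coffee?"}],
  "answers": [
    {"user": 1, "question": 10, "answer": "yes", "prediction": 0.7},
    {"user": 2, "question": 10, "answer": "no",  "prediction": 0.4}
  ]
}
"""

data = json.loads(sample)
print(len(data["users"]), "users,",
      len(data["questions"]), "questions,",
      len(data["answers"]), "answers")
```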

This plot shows people's average guesses as a function of the actual answers for a question. A point at X, Y on this plot means that for some question, X*100% of the people answered "yes", but the average guess people made about how many people would answer "yes" was Y*100%.
My main takeaway from this plot is that it seems like people are conservative in their guesses.
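The quantities behind this plot are easy to recompute: for each question, x is the fraction of "yes" answers and y is the mean predicted "yes" fraction. A sketch on toy records (the field names "question", "answer", and "prediction" are assumptions about the JSON schema):

```python
from collections import defaultdict

# Toy answer records in the assumed schema.
answers = [
    {"question": 10, "answer": "yes", "prediction": 0.8},
    {"question": 10, "answer": "yes", "prediction": 0.6},
    {"question": 10, "answer": "no",  "prediction": 0.4},
    {"question": 11, "answer": "no",  "prediction": 0.2},
]

# Group answers by question.
by_q = defaultdict(list)
for a in answers:
    by_q[a["question"]].append(a)

# For each question: (actual yes fraction, mean predicted yes fraction).
points = {}
for q, rows in by_q.items():
    x = sum(r["answer"] == "yes" for r in rows) / len(rows)
    y = sum(r["prediction"] for r in rows) / len(rows)
    points[q] = (x, y)

print(points)
```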

The next plot shows people's average guesses as a function of the actual answers for a question, grouped by how people answered the question, where blue is "yes" and red is "no". A red point on this plot at X, Y means that for some question, X*100% of the people answered "yes", but the average guess made by those who answered "no" was that Y*100% of the people would answer "yes".
I was about to write that people tend to over-predict agreement with their own answer, but that is not quite right. For instance, on questions where 90% of people say "yes", even the people who say "yes" predict that only 70% will say "yes". A more accurate statement is that people who answered "yes" to a question tend to predict a higher fraction of "yes" answers than people who answered "no" to the same question.
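The grouped version of the computation just splits the mean prediction by how the predictor answered. A sketch for a single question, again with assumed field names:

```python
# Split the mean predicted "yes" fraction by the predictor's own answer,
# as in the second plot. Field names are assumptions about the schema.
rows = [
    {"answer": "yes", "prediction": 0.8},
    {"answer": "yes", "prediction": 0.7},
    {"answer": "no",  "prediction": 0.3},
]

def mean_prediction(rows, group):
    """Mean prediction among answerers in `group` ("yes" or "no")."""
    preds = [r["prediction"] for r in rows if r["answer"] == group]
    return sum(preds) / len(preds) if preds else None

print(mean_prediction(rows, "yes"))  # mean over the two "yes"-answerers
print(mean_prediction(rows, "no"))   # mean over the one "no"-answerer
```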

If you run any analysis on the data, I'd love to hear about it! Here are some questions I'd like to know the answers to:
  • Are any questions correlated with each other? I'm not sure if there is enough data for this. We need instances where many people answered the same questions.
  • Which questions are the most popular? One way to find out might be to look at the "otherQuestion" data*: when a user had a choice between two questions and answered this one instead of the "otherQuestion", perhaps they liked this one more.
  • Are there any interesting differences between turkers and blog readers?
  • How many questions did people answer? I'm sure this follows a power law.
  • Are there any interesting results for specific questions?
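As a starting point for the "how many questions did people answer?" question, counting answers per user is a one-liner with a Counter (the "user" field is again an assumed schema detail):

```python
from collections import Counter

# Toy records: users 1, 2, 3 answered 3, 2, and 1 questions respectively.
answers = [{"user": u} for u in [1, 1, 1, 2, 2, 3]]

# Answers per user, sorted descending -- the shape of this distribution
# is what you would inspect for a power law.
counts = Counter(a["user"] for a in answers)
print(sorted(counts.values(), reverse=True))  # [3, 2, 1]
```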

* Note that some of the answers have extra fields: otherQuestion, otherQuestionAbove, and otherQuestionOlder. These record which other question was visible in the interface when the given answer was provided, whether it appeared above the answered question, and whether it had been in the interface longer. These fields should only appear in instances where there was only one other question available to answer. Note that the interface begins by showing three questions simultaneously, but it pretends that the top questions are "older" for the purposes of otherQuestionOlder.
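One way to use these fields for the popularity question above is to count how often each question was chosen over the alternative shown alongside it. The records below are invented to match the footnote's description:

```python
from collections import Counter

# Invented records matching the described schema: each has the answered
# question plus the alternative that was visible at the time.
answers = [
    {"question": 10, "otherQuestion": 11, "otherQuestionAbove": True},
    {"question": 10, "otherQuestion": 12, "otherQuestionAbove": False},
    {"question": 12, "otherQuestion": 10, "otherQuestionAbove": True},
]

# How often was each question picked when exactly one alternative was shown?
chosen_over = Counter(a["question"] for a in answers if "otherQuestion" in a)
print(chosen_over)  # question 10 chosen twice, question 12 once
```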
