That post also included an app where readers could answer questions themselves, and then see how turkers and other blog readers answered those questions.
At this point, there are about 40 answers for each question: roughly 20 from turkers and 20 from blog readers.
Here is an anonymized version of the data as JSON. The main object has three keys: users, questions, and answers*.
This plot shows people's average guesses as a function of the actual answers for a question. A point at X, Y on this plot means that for some question, X*100% of the people answered "yes", but the average guess people made about how many people would answer "yes" was Y*100%.
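For anyone who wants to recompute the points in this plot, here is a minimal sketch. The field names "question", "answer", and "guess" are my assumptions for illustration, not the actual schema of the JSON file, so adjust them to whatever the answer records really contain.

```python
from collections import defaultdict

def plot_points(answers):
    """Return {question_id: (actual_yes_fraction, mean_guess)}.

    Assumes (hypothetically) that each answer record looks like
    {"question": id, "answer": True/False, "guess": 0.0 to 1.0}.
    """
    total = defaultdict(int)        # number of answers per question
    yes = defaultdict(int)          # number of "yes" answers per question
    guess_sum = defaultdict(float)  # sum of guesses per question
    for a in answers:
        q = a["question"]
        total[q] += 1
        if a["answer"]:
            yes[q] += 1
        guess_sum[q] += a["guess"]
    return {q: (yes[q] / total[q], guess_sum[q] / total[q]) for q in total}

# Made-up sample records, only to show the shape of the computation.
sample = [
    {"question": "q1", "answer": True, "guess": 0.5},
    {"question": "q1", "answer": False, "guess": 0.25},
    {"question": "q1", "answer": True, "guess": 0.75},
    {"question": "q1", "answer": True, "guess": 0.5},
]
points = plot_points(sample)
```

Each value in `points` is one (X, Y) pair: X is the fraction who actually answered "yes", and Y is the average guess for that question.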
The next plot shows people's average guesses as a function of the actual answers for a question, grouped by how people answered the question, where blue is "yes" and red is "no". A red point on this plot at X, Y means that for some question, X*100% of the people answered "yes", but the average guess made by those who answered "no" was that Y*100% of the people would answer "yes".
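The grouped version can be sketched the same way, keeping one running average per (question, own answer) pair. As before, the field names "question", "answer", and "guess" are hypothetical placeholders rather than the confirmed layout of the data.

```python
from collections import defaultdict

def grouped_points(answers):
    """Return {(question_id, answered_yes): (actual_yes_fraction, mean_guess_in_group)}.

    Assumes (hypothetically) that each answer record looks like
    {"question": id, "answer": True/False, "guess": 0.0 to 1.0}.
    """
    total = defaultdict(int)          # answers per question
    yes = defaultdict(int)            # "yes" answers per question
    group_n = defaultdict(int)        # answers per (question, own answer)
    group_guess = defaultdict(float)  # sum of guesses per (question, own answer)
    for a in answers:
        q = a["question"]
        said_yes = bool(a["answer"])
        total[q] += 1
        if said_yes:
            yes[q] += 1
        group_n[(q, said_yes)] += 1
        group_guess[(q, said_yes)] += a["guess"]
    return {
        (q, said_yes): (yes[q] / total[q], group_guess[(q, said_yes)] / n)
        for (q, said_yes), n in group_n.items()
    }

# Made-up sample records, only to show the shape of the computation.
sample = [
    {"question": "q1", "answer": True, "guess": 0.5},
    {"question": "q1", "answer": True, "guess": 0.75},
    {"question": "q1", "answer": True, "guess": 0.25},
    {"question": "q1", "answer": False, "guess": 0.25},
]
grouped = grouped_points(sample)
```

Keys with `True` correspond to the blue points and keys with `False` to the red points.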
If you run any analysis on the data, I'd love to hear about it! Here are some questions I'd like to know the answers to:
- Are any questions correlated with each other? I'm not sure if there is enough data for this. We need instances where many people answered the same questions.
- Which questions are the most popular? One way to find out may be to look at the "otherQuestion" data*: when a user had a choice between two questions and answered this one instead of the "otherQuestion", perhaps they like this one more.
- Are there any interesting differences between turkers and blog readers?
- How many questions did each person answer? I'd guess the distribution follows a power law.
- Are there any interesting results for specific questions?
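As a starting point for the "how many questions did each person answer" item, here is a rough sketch that counts answers per user and then tallies how many users share each count. The "user" field name is an assumption on my part, not something confirmed by the data file.

```python
from collections import Counter

def answers_per_user(answers):
    """Count answers per user; assumes a hypothetical "user" field on each answer."""
    return Counter(a["user"] for a in answers)

def count_distribution(per_user):
    """Map 'number of questions answered' to 'number of users with that count'."""
    return Counter(per_user.values())

# Made-up sample records, only to show the shape of the computation.
sample = [
    {"user": "u1"}, {"user": "u1"}, {"user": "u1"},
    {"user": "u2"},
    {"user": "u3"},
]
per_user = answers_per_user(sample)
distribution = count_distribution(per_user)
```

Plotting `distribution` on log-log axes is one informal way to eyeball whether the shape looks like a power law; with only about 40 answers per question, it may be too noisy to say much.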
* Note that some of the answers have extra fields: otherQuestion, otherQuestionAbove, and otherQuestionOlder. These record which other question was visible in the interface when the given answer was provided, whether it appeared above the answered question, and whether it had been in the interface longer. These fields should appear only when exactly one other question was available to answer. Note that the interface begins by showing three questions simultaneously, but it pretends that the top questions are "older" for the purposes of otherQuestionOlder.
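Here is one way the popularity idea could be sketched using otherQuestion: for every answer that carries the field, credit the answered question with a "win" and the other question with a "loss", then compute a win rate. Only otherQuestion comes from the data description above; the "question" field name is an assumption for illustration.

```python
from collections import Counter

def choice_rates(answers):
    """Return {question_id: fraction of head-to-head choices it won}.

    Uses the otherQuestion field described above; the "question" field
    name is a hypothetical placeholder for the answered question's id.
    Answers without otherQuestion are skipped.
    """
    chosen = Counter()  # times a question was answered over the alternative
    passed = Counter()  # times a question was the alternative left unanswered
    for a in answers:
        other = a.get("otherQuestion")
        if other is None:
            continue
        chosen[a["question"]] += 1
        passed[other] += 1
    return {
        q: chosen[q] / (chosen[q] + passed[q])
        for q in set(chosen) | set(passed)
    }

# Made-up sample records, only to show the shape of the computation.
sample = [
    {"question": "q1", "otherQuestion": "q2"},
    {"question": "q1", "otherQuestion": "q3"},
    {"question": "q2", "otherQuestion": "q1"},
    {"question": "q3"},  # no otherQuestion, so it is ignored
]
rates = choice_rates(sample)
```

A raw win rate like this ignores position and age effects, so it is probably worth splitting the tallies by otherQuestionAbove and otherQuestionOlder before reading much into it.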