I ran an experiment on mturk here, and then I decided that I made a mistake here. I reran the experiment, and here are the results from that (note that only 90 people did it before the HIT expired).
The results were similar, but not quite as good. The most common answer was again 50%, followed by a three-way tie between 100%, 75% and 70%. If we restrict ourselves to just the people who guessed that the most common answer would be 50%, then the most common answer is again 50%, followed by a three-way tie between 67%, 33% and 20%. This result isn't as nice as before, where the next most common answer was just 66%. However, both 67% and 33% are more reasonable than 100% or 75%. I'm not sure where 20% is coming from.
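To make the "restrict ourselves to people who guessed 50%" step concrete, here is a minimal Python sketch of that tally. It assumes each response is a pair (own answer, predicted most-common answer). The function names and the toy list are made up for illustration and are not the actual MTurk data.

```python
from collections import Counter

def most_common_with_ties(values):
    """Return (top_count, sorted list of every value that reaches that count)."""
    counts = Counter(values)
    top = max(counts.values())
    return top, sorted(v for v, c in counts.items() if c == top)

def conditional_tally(responses, predicted):
    """Count own answers among respondents whose prediction equals `predicted`."""
    return Counter(own for own, guess in responses if guess == predicted)

# Made-up toy responses as (own_answer, predicted_popular_answer) pairs.
# These are NOT the real results, only an illustration of the filtering step.
toy = [(50, 50), (67, 50), (50, 50), (100, 100), (33, 50), (75, 75)]

overall = Counter(own for own, _ in toy)   # tally over everyone
filtered = conditional_tally(toy, 50)      # tally over those who predicted 50
```

Comparing `overall` with `filtered` shows which answers drop out once respondents who mispredicted the popular answer are removed. `most_common_with_ties` reports ties explicitly, which matters when several answers share second place.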
I also ran an experiment with the same idea, but a different brain teaser:
Imagine you have 9 identical looking balls, and a weighing scale. All the balls weigh the same except one, which is heavier.
How many weighings on the scale do you need in order to identify the heavier ball?
What answer do you think most other people will give?
The results from this are here. Note that only 64 people did the HIT before it expired. Also note that there was more variance in how people answered: some people added words, and some of the answers appear to be percentages rather than numbers. The most common answers were 3 and 4, at 19% and 17% of responses respectively. If we restrict ourselves to people who guessed that the most common answer would be 3 or 4, then all the answers are 3 and 4, except one person who answered 8. Only 3 people gave the correct answer of 2, but two of these people guessed that most people would answer 2, and one of these people guessed that most people would answer 1 -- which seems sketchy to me.
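For reference, the answer of 2 comes from splitting the balls into three groups and weighing two of them against each other. Each weighing has three outcomes (left heavier, right heavier, balanced), so one weighing cannot separate 9 candidates, while two weighings give 3 x 3 = 9 outcomes. A minimal Python sketch of the strategy follows; `find_heavy` and its simulated scale are hypothetical names used only for illustration.

```python
def find_heavy(weights):
    """Find the single heavier ball with a balance scale.

    Returns (index_of_heavy_ball, number_of_weighings_used).
    The scale is simulated by comparing the summed weights of two groups.
    """
    candidates = list(range(len(weights)))
    weighings = 0
    while len(candidates) > 1:
        third = (len(candidates) + 2) // 3        # ceil(n / 3)
        left = candidates[:third]
        right = candidates[third:2 * third]       # same size as `left`
        rest = candidates[2 * third:]             # left off the scale
        weighings += 1
        left_weight = sum(weights[i] for i in left)
        right_weight = sum(weights[i] for i in right)
        if left_weight > right_weight:
            candidates = left
        elif right_weight > left_weight:
            candidates = right
        else:
            candidates = rest                     # balanced: heavy ball is off the scale
    return candidates[0], weighings

# Try every possible position of the heavier ball among 9.
results = []
for heavy in range(9):
    balls = [1] * 9
    balls[heavy] = 2
    results.append(find_heavy(balls))
```

With 9 balls the candidate set shrinks from 9 to 3 to 1, so every entry in `results` uses exactly two weighings.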
So, I'm less confident in this technique. I still think there may be some clever way to get turkers to reason through difficult brain teasers -- maybe it would help to ask people why they gave the answer they did, and have other people rate those bits of reasoning?