If we want to build a model of the steps involved in this process, we might consider three things:

- I : the distribution of description qualities when someone writes a description from scratch.
- M : the distribution of description qualities when someone writes a description based on a description of some quality.
- V : the probability of a voter correctly picking the description of higher quality given two descriptions with some difference in quality (presumably the probability would be 0.5 if the descriptions had equal quality).
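
To see how the three pieces could fit together, here is a minimal simulation sketch in base R. Everything in it is an assumption made for illustration: the normal and logistic shapes, their parameters, the function names (`draw_I`, `draw_M`, `prob_V`, `simulate`), and the idea that the process alternates an improvement step with a vote between the old and new description. Real versions of I, M and V would be fitted to the data.

```r
# Hypothetical stand-ins for I, M and V; none of these are fitted to the data.
set.seed(1)

# Keep qualities on the 1..10 rating scale.
clamp <- function(q) pmin(pmax(q, 1), 10)

# I: quality of a description written from scratch (assumed normal, clamped).
draw_I <- function() clamp(rnorm(1, mean = 5, sd = 2))

# M: quality of a description written from one of quality q
# (assumed to gain 0.5 on average, with noise).
draw_M <- function(q) clamp(rnorm(1, mean = q + 0.5, sd = 1.5))

# V: probability of voting for the better description given the quality
# gap d >= 0. Assumed logistic in d, so that prob_V(0) is 0.5.
prob_V <- function(d) plogis(0.8 * d)

# One run: write from scratch, then repeatedly rewrite and vote old vs. new.
simulate <- function(steps = 5) {
  current <- draw_I()
  for (i in seq_len(steps)) {
    candidate <- draw_M(current)
    better <- max(current, candidate)
    worse <- min(current, candidate)
    voted_correctly <- runif(1) < prob_V(better - worse)
    current <- if (voted_correctly) better else worse
  }
  current
}

# Distribution of final qualities over many simulated runs.
final <- replicate(1000, simulate())
summary(final)
```

Swapping in empirical versions of the three functions (for example, resampling rows of the M data for `draw_M`) would turn this from a toy into a model of the actual process.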

Here is the data for M (each row has two numbers, X and Y, where someone took a description of quality X and wrote a description of quality Y)

Here is the data for V (each row has three numbers, X, Y and Z, where someone saw two descriptions of quality X and Y, and voted for the "correct" one iff Z = 1)
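
One way to turn rows like these into an estimate of V is a logistic regression on the quality gap |X - Y|. The sketch below is only an illustration: it uses synthetic data rather than the real file, the column names `x`, `y`, `z` and `d` are hypothetical, and fitting without an intercept is a modelling assumption that forces the predicted probability to be 0.5 when the two descriptions have equal quality.

```r
# Synthetic stand-in for the V data; the column names are hypothetical.
set.seed(2)
n <- 500
v <- data.frame(x = sample(1:10, n, replace = TRUE),
                y = sample(1:10, n, replace = TRUE))
v$d <- abs(v$x - v$y)                     # quality gap between the two descriptions
v$z <- rbinom(n, 1, plogis(0.6 * v$d))    # 1 = voter picked the better one

# Logistic regression with no intercept, so the fit passes through 0.5 at d = 0.
fit <- glm(z ~ 0 + d, family = binomial, data = v)

# Estimated probability of a correct vote for gaps of 0 to 5 quality points.
p <- predict(fit, newdata = data.frame(d = 0:5), type = "response")
print(round(p, 2))
```

With the real data, `v` would come from reading the file and renaming its three columns; the rest of the sketch would stay the same.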

Here's a histogram of I (R command: hist(I$rating, breaks=0:10))

Here's a sort of 2D histogram (looking down) of M (thanks to Panos Ipeirotis for these R commands):

> library(KernSmooth)

> m <- read.csv("M.txt")

> model <- bkde2D(m, bandwidth=c(2,2), gridsize=c(200, 200), truncate=TRUE)

> filled.contour(model$x1, model$x2, model$fhat, xlim=range(c(1,10)), ylim=range(c(1,10)), nlevels=50, col=terrain.colors(70, alpha =1.0))

Here's an even better plot from Panos Ipeirotis:

> library(KernSmooth)  # provides bkde2D and dpik

> library(som)  # provides normalize

> m <- read.csv("M.txt")

> model <- bkde2D(m, bandwidth=c(dpik(m$r1), dpik(m$r2)), gridsize=c(150, 150), truncate=TRUE)

> filled.contour(model$x1, model$x2, normalize(model$fhat, byrow=TRUE), xlim=range(c(3,10)), ylim=range(c(3,10)), nlevels=40, col=terrain.colors(50, alpha=1.0))
