real gl: work notes

I woke up "early" and immediately checked my app. I slept all I could. It's still running.. phew.

my friend at work says "feel free to put your code in public githubs". done. I love my job :)

before doing so, I removed a couple of files completely from the repo history and then re-added them. There were a couple of bits of confidential information that I feared might be in these files.

I'm using "connect-mongo" for sessions. There are 185 users, and 7492 sessions. That seems like a lot more sessions than there should be.. I wonder if connect-mongo expects me to purge old ones..

Stack Overflow says the default is to never delete them, so you need to add some configuration:

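var express = require('express');
// connect-mongo (0.x, Express 3 era) is initialized by passing it express
var MongoStore = require('connect-mongo')(express);
var app = express();

var sessionStore = new MongoStore({
  db: 'myappsession',
  clear_interval: 3600 // purge expired sessions once an hour (seconds)
});

app.use(express.cookieParser()); // Express 3 sessions need cookies parsed first
app.use(express.session({
  secret: "myappsecret",
  cookie: { maxAge: 24 * 60 * 60 * 1000 }, // one day (milliseconds)
  store: sessionStore
}));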

Note that clear_interval is expressed in seconds, while cookie.maxAge is in milliseconds.
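To make the seconds-vs-milliseconds point concrete, here's a rough sketch of what a periodic purge of expired sessions amounts to. It is not connect-mongo's actual code: the in-memory Map, the `expires` field, and the names `purgeExpired` and `startPurging` are all made up for illustration.

```javascript
// A simplified, in-memory stand-in for a session store. The real
// connect-mongo store keeps sessions in MongoDB; the Map and the
// `expires` field here are assumptions made for illustration.
const CLEAR_INTERVAL_SECONDS = 3600;           // like clear_interval: seconds
const COOKIE_MAX_AGE_MS = 24 * 60 * 60 * 1000; // like cookie.maxAge: milliseconds

// Delete every session whose expiry time has passed; return how many went.
function purgeExpired(sessions, nowMs) {
  let removed = 0;
  for (const [id, session] of sessions) {
    if (session.expires <= nowMs) {
      sessions.delete(id); // deleting during Map iteration is safe in JS
      removed++;
    }
  }
  return removed;
}

// Run the purge periodically. setInterval expects milliseconds, so the
// seconds-based interval has to be converted.
function startPurging(sessions) {
  const timer = setInterval(
    () => purgeExpired(sessions, Date.now()),
    CLEAR_INTERVAL_SECONDS * 1000
  );
  timer.unref(); // don't keep the Node process alive just for this timer
  return timer;
}
```

With an hourly purge, an expired session can linger for up to an hour before it's removed, which is harmless as long as expired sessions are also rejected when they're read.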

(Hm.. I miss GitHub-style Markdown for code blocks. They look so pretty on GitHub.)

Oh no, when I try git push heroku master, it says "Updates were rejected because the tip of your current branch is behind its remote counterpart"

that's true, I guess, since I removed those two files from the history.. so what now? I think the answer is shower and eat..

before showering, I searched a little to populate my mind with ideas while showering, but the solution is apparently simple: git push heroku master --force

heroku bought it, and the app appears to be running.. oh wait, it popped up my catch-all error dialog saying "something happened, try refreshing". I try refreshing, and it works.. er.. fantastic :) I'm sure that error is going to be seen by a lot of people and frighten them. I should note the time so I can see what impact this has on writer throughput: 9:55pm.

well, I still see people doing things in the activity feed, so I guess it didn't frighten everyone away. phew.

now on to showering..

thoughts from shower:

if someone writes an answer and clicks submit, then sees the error, which tells them to copy-paste their work and refresh, they may no longer see the interface for answering the question after they refresh. If they don't, it probably means the answer was successfully submitted, but they wouldn't know that, and may worry that their copy-pasted answer is now useless and un-paid-for.. but now that I think about it, that's unlikely for a regular error. It is likely if I deploy a new version, though. Maybe I should add a message saying that their work was probably submitted successfully if they no longer see the task interface after refreshing..
..this wouldn't be an issue if people were paid by the hour rather than by the task. Then people would just grumble and say: "you're paying me to use a buggy interface.. fine, so long as you pay me."
..I'd like to make a case for pay-by-hour, but that might require some statistics on our effective hourly rate. I log when people submit tasks, but not when they start them.. I should add that, though I can probably get what I need from the time between submissions.. which may make more sense anyway, since we also want to count the time it takes them to navigate the interface and select their next task.
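A rough sketch of the time-between-submissions idea, assuming a hypothetical submission log where each record has `workerId`, `submittedAt` (milliseconds) and `pay` (dollars). The function name `effectiveHourlyRates` and the rule that long gaps count as breaks are assumptions for illustration, not something the app is known to do.

```javascript
// Estimate each worker's effective hourly rate from submission times alone.
// Assumed (hypothetical) record shape:
//   { workerId: string, submittedAt: ms timestamp, pay: dollars }
// The gap between consecutive submissions is counted as time spent on the
// later task, including browsing for it. Gaps longer than maxGapMs are
// treated as breaks and skipped (the threshold is a judgment call).
function effectiveHourlyRates(submissions, maxGapMs) {
  const byWorker = new Map();
  for (const s of submissions) {
    if (!byWorker.has(s.workerId)) byWorker.set(s.workerId, []);
    byWorker.get(s.workerId).push(s);
  }

  const rates = {};
  for (const [workerId, list] of byWorker) {
    list.sort((a, b) => a.submittedAt - b.submittedAt);
    let workedMs = 0;
    let earned = 0;
    // Start at 1: the first submission has no earlier one to measure from.
    for (let i = 1; i < list.length; i++) {
      const gap = list[i].submittedAt - list[i - 1].submittedAt;
      if (gap <= maxGapMs) {
        workedMs += gap;
        earned += list[i].pay;
      }
    }
    // Dollars per hour, or null if there's no usable interval yet.
    rates[workerId] = workedMs > 0 ? earned / (workedMs / 3600000) : null;
  }
  return rates;
}
```

For example, three $0.50 tasks submitted ten minutes apart count as 20 minutes of work for $1.00, i.e. $3/hour; the first task of each session is dropped because there's nothing to measure its start from.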

oh.. I need to grab food. I hope Subway stays awake till 11.. nope.. frozen food from Safeway it is..

I had another thought on the way: it would be nice to have a generic chat window in the app to communicate with the workforce (to tell them, "hey, you probably saw an error, don't worry, everything's fine") and to let them communicate and commiserate with each other. I've heard of this being done with good success.

there's got to be some generic chat interface, like Etherpad but for chat.. looks like there are several. They cost money. Shoutmix is $50/month for 250 real-time users. How many real-time users will I have? Not sure.. about that many, I suppose..

Hm.. after all that thinking, I think I'm going to not add any features for now. The most important thing is to keep the thing stable and running.

hm.. something is nagging at the back of my mind about the scalability of this app. Heroku lets me add more dynos to support a higher load, fine, but what about the database? I'm not worried about the database growing too large.. but how much CPU am I using on the database?

how much scalability am I really gaining from Heroku and MongoDB? talentcourt spends 1 to 3 milliseconds processing each request. At 3 milliseconds per request, one process can handle about 300 requests per second. I have no idea whether the MongoHQ database I'm using can handle 300 requests per second.
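The arithmetic: at 1 to 3 ms per request, a single process handling one request at a time tops out somewhere between roughly 330 and 1000 requests per second. Here's a small sketch of that estimate, plus a crude way to time a synchronous handler with Node's `process.hrtime.bigint()`; `requestsPerSecond`, `measureMsPerCall` and the handler passed to it are hypothetical. It ignores time spent waiting on the database (which in Node doesn't block the process), so it says nothing about whether the database keeps up.

```javascript
// Back-of-envelope throughput: if each request keeps one process busy for
// msPerRequest and requests are handled one after another, that process
// tops out at 1000 / msPerRequest requests per second.
function requestsPerSecond(msPerRequest) {
  return 1000 / msPerRequest;
}

// Measure the average cost of a synchronous function in milliseconds using
// Node's high-resolution clock. `handler` is a stand-in for request work.
function measureMsPerCall(handler, iterations) {
  const start = process.hrtime.bigint();
  for (let i = 0; i < iterations; i++) handler(i);
  const elapsedNs = process.hrtime.bigint() - start;
  return Number(elapsedNs) / 1e6 / iterations;
}
```

So `requestsPerSecond(3)` is about 333 and `requestsPerSecond(1)` is 1000, per process; adding dynos multiplies that, but only if the database can absorb the extra queries.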

maybe I should test this. not on my running app, but on a new app. there's a load-testing add-on for heroku..

hehe, looking at Load Impact: "We are a cloud service. We use all the latest and most popular buzzwords."

..heroku's Blitz add-on seems good. Much cheaper than Load Impact (which, as far as I can tell, wants.. well.. it's unclear. I think about $100 to simulate 5000 users for 15 minutes). Blitz will let me simulate 5000 users for 1 minute for.. 3 cents? Is that right? It says $1300/month, but I only pay for the time I use.. I assume I'm paying for the service, not for each of the 5000 simulated users.. hm..
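Checking the 3-cent figure by prorating $1300 over a month of minutes. The 30-day month and strictly per-minute billing are assumptions, not confirmed billing terms; the function names are hypothetical.

```javascript
// Prorate a monthly list price down to a per-minute charge, assuming a
// 30-day month and billing for only the minutes actually used.
const MINUTES_PER_MONTH = 30 * 24 * 60; // 43,200

function costPerMinute(monthlyDollars) {
  return monthlyDollars / MINUTES_PER_MONTH;
}

function costOfRun(monthlyDollars, minutes) {
  return costPerMinute(monthlyDollars) * minutes;
}

// $1300/month works out to roughly 3 cents per minute of testing.
console.log(costOfRun(1300, 1).toFixed(4)); // "0.0301"
```

By the same assumption, a 15-minute run would be about 45 cents, versus the roughly $100 guessed for Load Impact.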

oh, note to self: I remembered an issue that is going to come up, which I probably do need to fix. I'm currently querying the database for random available tasks in a way that is fast as long as most of the tasks are available, but that will get slower as fewer tasks are available. I don't think this will be trivial to fix by just adding an index.. here's the essence of the query:

find objects where _id >= random_md5_hash and availableToAnswerAt < current_time, sort by _id

if I simply added an index on availableToAnswerAt, there would still be the issue of sorting by _id..
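Here's an in-memory simulation of why the current query slows down, assuming only `_id` is indexed: the database walks tasks in `_id` order starting from the random hash and has to check `availableToAnswerAt` on each one until it finds an open task. The task shape and `firstAvailableById` are hypothetical; `examined` counts how many tasks get looked at.

```javascript
// Simulate "find _id >= randomHash and availableToAnswerAt < now, sort by
// _id" when only _id is indexed. tasksSortedById must be in _id order,
// mimicking the order an _id index is walked in.
function firstAvailableById(tasksSortedById, randomHash, now) {
  let examined = 0;
  for (const task of tasksSortedById) {
    if (task._id < randomHash) continue; // an index seek jumps past these
    examined++; // each of these has to be fetched and checked
    if (task.availableToAnswerAt < now) return { task, examined };
  }
  return { task: null, examined };
}
```

With every task open, the first task examined is returned; with only the last task open, every task from the hash onward gets examined. (What the app does when nothing at or after the hash is open, e.g. retrying from the start, isn't covered here.)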

hm.. most available tasks will have availableToAnswerAt == 0. Only tasks that have been grabbed by someone will have a larger value. perhaps the right thing is to create an index on both availableToAnswerAt and _id, in that order, and change the query to this:

find objects where availableToAnswerAt < current_time and _id >= random_md5_hash, sort by availableToAnswerAt and then by _id

that's probably good anyway. It favors un-grabbed tasks over grabbed ones, even if the grabbed ones were grabbed long ago, and may have been abandoned (if those users grab another task, it will reset the availableToAnswerAt to 0).
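A sketch of the revised query, again in memory: filter on both conditions, then order by `availableToAnswerAt` and then `_id`, which is the key order a compound index `{ availableToAnswerAt: 1, _id: 1 }` would store. `pickTask` and `buildQuery` are hypothetical helpers; `buildQuery` just returns plain filter and sort documents in MongoDB's query syntax, leaving out the driver call.

```javascript
// Pick a task the way the revised query would: among tasks that are open
// (availableToAnswerAt < now) with _id >= randomHash, take the first in
// (availableToAnswerAt, _id) order, so never-grabbed tasks (0) come first.
function pickTask(tasks, randomHash, now) {
  const candidates = tasks.filter(
    (t) => t.availableToAnswerAt < now && t._id >= randomHash
  );
  candidates.sort(
    (a, b) =>
      a.availableToAnswerAt - b.availableToAnswerAt ||
      (a._id < b._id ? -1 : a._id > b._id ? 1 : 0)
  );
  return candidates[0] || null;
}

// The matching MongoDB filter and sort documents. Key order in `sort`
// matters: it has to match the compound index's field order.
function buildQuery(randomHash, now) {
  return {
    filter: { availableToAnswerAt: { $lt: now }, _id: { $gte: randomHash } },
    sort: { availableToAnswerAt: 1, _id: 1 },
  };
}
```

In the mongo shell, the index itself would be created with something like `db.tasks.createIndex({ availableToAnswerAt: 1, _id: 1 })` (older shells used `ensureIndex`), where `tasks` is a placeholder collection name.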

ok, good.. let's do it.. hm.. need to add _.extend to my utility library. done.
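For reference, a minimal `extend` in the spirit of Underscore's `_.extend`. The actual utility library isn't shown, so this is a guess at its shape; unlike Underscore's version, it copies only the sources' own enumerable properties. In modern JavaScript, `Object.assign` does the same job.

```javascript
// Copy each source's own enumerable properties onto target, in order, so
// later sources win on key collisions. null/undefined sources are skipped.
function extend(target, ...sources) {
  for (const source of sources) {
    if (source == null) continue;
    for (const key of Object.keys(source)) {
      target[key] = source[key];
    }
  }
  return target; // mutated in place and returned, like _.extend
}
```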

ok, now done with the original task. a couple of notes:

- I was complaining before about not being able to use variables as keys in object literals. I experimented with a way of doing this inline. If I want {[a] : 1, [b] : 2}, I would normally do: var x = {}; x[a] = 1; x[b] = 2; but I can do it inline with _.unPairs([[a, 1], [b, 2]]), where _.unPairs is a utility function that converts an array of key/value tuples into an object.

- also note: the query for getting available tasks is generated dynamically, and I wanted to make sure it was generating the correct query. I would usually do this by putting the query into a temporary variable and printing it out, but I found it easier to turn on profiling in MongoDB and see the query in the log. I discovered I was right about MongoHQ only logging queries over 100ms; that is apparently the default setting in MongoDB. It is possible to adjust it like this: db.setProfilingLevel(1,1). The first 1 is the profiling level, and the second 1 is the threshold in milliseconds for a query to be logged. That second number defaults to 100. Apparently I can't make it 0, which is unfortunate, because it was only luck that my index hadn't finished building the first time I ran the queries, so they took more than 1 millisecond. After the index was built, they took less than 1 millisecond and didn't show up in the log for me to inspect.
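A sketch of an `unPairs` helper like the one described (the implementation is a guess), alongside two built-ins that newer JavaScript added for exactly this: ES2015 computed property names, which make `{[a]: 1, [b]: 2}` legal syntax, and ES2019's `Object.fromEntries`, which is essentially `unPairs`.

```javascript
// Hypothetical unPairs: turn [[key, value], ...] into a plain object.
function unPairs(pairs) {
  const obj = {};
  for (const [key, value] of pairs) obj[key] = value;
  return obj;
}

const a = "first";
const b = "second";
const viaHelper = unPairs([[a, 1], [b, 2]]);
// ES2015 computed property names allow this directly in a literal:
const viaLiteral = { [a]: 1, [b]: 2 };
// ES2019 Object.fromEntries is the built-in equivalent of unPairs:
const viaBuiltin = Object.fromEntries([[a, 1], [b, 2]]);
```

All three produce `{ first: 1, second: 2 }`.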

real gl

2/11/13

work notes
