real gl: work notes

apparently one user couldn't log in. I tried logging in with a brand new test oDesk account, and succeeded, so I don't think login is broken completely.. somehow this one user is getting an error, and from what they've told me, it sounds like a legitimate error. My best guess about what could be causing it is this code:

var o = new odesk(odeskApiKey, odeskApiSecret)
o.OAuth.accessToken = token
o.OAuth.accessTokenSecret = tokenSecret
o.get('auth/v1/info', function(err, data) {
if (err) return done(err)
var user = {
_id : data.auth_user.uid,
name : data.auth_user.first_name + " " + data.auth_user.last_name,
img : data.info.portrait_100_img,
country : data.info.location.country,
profile : data.info.profile_url
}
db.collection('users').insert(user, function () {
done(null, user)
})
})

..specifically the part in bold. As far as I can tell, that is the only piece of code that could be different for a particular user. I could see an exception being thrown if data.auth_user was null, or data.info, or more likely data.info.location.

I tried to find the exception in the logs — assuming there was an exception thrown somewhere — but heroku only gives me the most recent 1500 lines, and those quickly get consumed with normal request logs. I wish it kept the most recent 1500 error logs separately.

I'm thinking about storing my own error logs within the database. I could use a mongodb capped collection to make sure they don't grow too big if some error becomes very frequent.

First, I want to find out where exactly an exception thrown in the code above would have gone.. does express catch it? (If not, how does it serve up an HTTP 500?)

Ok, express does catch the exception. So express probably has some way for me to register a callback to be executed when exceptions happen..

Or better yet, there's an addon called Loggly that will store.. for free.. 25mb of logs.. which looks like it will be about 100,000 log items. And it keeps them for 24 hours. So assuming I don't hit my cap, I could tell the person to try logging in again, and the error should show up in Loggly..

But they probably won't login until tomorrow morning, and I'll probably wake up several hours after they do. I wonder how quickly I consume 100,000 logs. Let's look at the timestamp of the oldest log of the 1500 heroku will give me.. looks like 2.2 hours, so 1500/2.2 = 682 per hour, over 24 hours is 16,363, so the entry will probably survive until I get up. Of course, given that it's about 2am, the last 2.2 hours probably aren't the most eventful of the day. If the site had ten times as much traffic during the day, it would give me a 14 hour window between when they login, and when I check..

And I would need to make sure to check — if I forgot, and checked the next day, it would be gone..

And I'll need to search for the error in the logs. Loggly has a search tool, but the error will consume multiple lines. It's probably easy to find the first line by searching for "error", but will Loggly show me the lines immediately after that? Hm.. fiddling with the interface, it's not immediately clear to me how to even search for "error". They have a sort of command line, and I don't know what I'm meant to type in it.

I think I'll just log the exceptions to a "logs" collection, and make it a capped collection in mongodb.

Hm.. we're going to run "db.createCollection('logs', {capped:true, size:10000})" in the database. I wonder what a good place to note that is.. I could put it in the readme file.. it would be better if the code itself created this. I'll have to try creating it twice, and see if that's a problem. Or maybe there's a way to check for its existence first.. ok, apparently it doesn't care if I call it twice, so I'll just create it at the beginning, right after connecting to the database.. there's no callback for waiting to connect to the database, so I'll just assume it's connected, or that I'm allowed to pretend as though it is..

ok, done. generic error logging added. Let's push to heroku, and test it there.. Success! Now I'll just e-mail that person, tell them to try logging in, and see what my error log collection has to say tomorrow.

..I've been thinking about logging and monitoring.. I'd like some elegant catch-all solution to having a sense for whether the app is working, and how close it is to breaking down. I feel like the logs are probably a good heart-beat for the app, and they have timing information in them.

I got rid of Loggly and added Logentries. It doesn't have a size quota, just a day quota, and even the free version is 7 days. Also, it says it will e-mail me if any heroku specific bad log messages appear, saying things like "memory limit exceeded" and such.

I wish Logentries, or one of these other loggers, would parse the timing information from the request logs, and tell me averages times for various requests.. it seems like the monitoring tools want to profile my app by embedding code in it, which I'm wary about.

ok.. well.. I'm going to go to bed soon. I told the people using this app that they could add more people to it.. currently there are about 250. They might bring that number up to 600. I think I'm going to add a couple more dynos tonight, for peace of mind. I only pay by the hour, so that peace of mind will cost $2.50. Worthwhile I would say.

Hm.. I went to the logs to see if I could see my new dynos at work, but I don't see them. I still see references to web.1 and web.2, but no web.3 or web.4... oh.. Logentries doesn't populate the whole list of log entries, I need to keep scrolling to the bottom before it fetches more. Ok, I see web.3 and web.4. Horray.

all the dynos are working so hard. they're so cute. serving requests.. and such.. please don't get overwhelmed dynos!

on to bed.

real gl

2/12/13

work notes

No comments:

Post a Comment