real gl: at scale

so for the first time in my life, a web app I wrote is getting about one request every second.

I want to sleep, but I'm watching the live query feed provided by mongoHQ (which hosts the database), and every query in it is taking over 100ms. Often 300ms or 600ms.

no query at all is taking under 100ms. Not a single one. The lowest I've seen is 105ms.

I can't find any description of this feed, but I'm guessing it only shows queries that take over 100ms, and that there are other queries happening faster than that. I've run some queries which did not show up in the feed, so I know it's not showing everything.

there was a query showing up a lot in the live feed with times sometimes over a second. It seemed like an index could help. I was going to add an ensureIndex into the code (and I did), but before deploying it, I realized I could just issue the command directly into the mongo terminal, so I did that.
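
a rough sketch of what issuing that command looks like. the collection name `items` and the field `owner` are made up (the real schema isn't shown here), and `db` is the handle the mongo shell gives you, passed in as an argument so the same snippet can be pasted into the shell or run against a stub. `ensureIndex` was the shell helper at the time of writing; newer shells call it `createIndex`.

```javascript
// sketch only: `items` and `owner` are hypothetical names, not the real schema.
// `db` is the database handle the mongo shell provides.
function addIndexAndCheck(db) {
  // build an ascending index on the field the slow query filters by
  db.items.ensureIndex({ owner: 1 });

  // explain() describes how the query was executed, which shows
  // whether it used the index or scanned every document
  return db.items.find({ owner: "someone" }).explain();
}
```

in the shell this would be called as `addIndexAndCheck(db)`, or the two statements can simply be typed one after the other.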

that query has stopped showing up in the feed, and fewer things in general are showing up, though still a fair amount. nothing else is obviously asking for an index..

hm.. things stopped showing up in the feed altogether.. that's kind of scary. the site is still working. I can still run queries in the console. but things do appear to be faster.. hm.. can I conjure up a query that should take more than 100ms and see if it shows up in the feed, to verify that the feed is still working? apparently I cannot. my first attempt needed to scan every document, but finished in 58ms.

well, I'll assume for now that adding that index fixed it. or maybe because it's a shared database, it was being slowed down by something not related to me.

I really wish I understood better where the time goes in web applications. there are so many parts, and I don't know all the parts, and I don't know how long the parts typically take. if this were a desktop app running on my machine, I could profile it, and find out where the time is going. how can I profile a web app? how can I see everything that is going on, or even a statistical aggregate of everything that is going on? I'd really love to see a breakdown accounting for all the time of a web request involving a database access.
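
one low-tech way to get part of that breakdown is to time each stage by hand and aggregate the numbers. this is a minimal sketch, not a real profiler: `startTimer` and `summary` are names invented here, the stage names are whatever you choose, and it only sees the stages you remember to wrap.

```javascript
// sketch: hand-rolled per-stage timing with a simple aggregate.
const timings = {}; // stage name -> list of durations in ms

// call startTimer("db") before a stage, then call the returned
// function when the stage finishes (e.g. inside its callback)
function startTimer(stage) {
  const start = process.hrtime.bigint(); // monotonic clock, nanoseconds
  return function stop() {
    const ms = Number(process.hrtime.bigint() - start) / 1e6;
    (timings[stage] = timings[stage] || []).push(ms);
    return ms;
  };
}

// count, mean and max per stage
function summary() {
  const out = {};
  for (const stage of Object.keys(timings)) {
    const list = timings[stage];
    const total = list.reduce(function (a, b) { return a + b; }, 0);
    out[stage] = {
      count: list.length,
      meanMs: total / list.length,
      maxMs: Math.max.apply(null, list)
    };
  }
  return out;
}

// hypothetical usage around a database call:
//   const stop = startTimer("db");
//   runQuery(function (err, docs) { stop(); /* ...render... */ });
```

printing `summary()` every minute or so gives a crude statistical aggregate per stage.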

(ahh, the live feed is working.. I ran a query that scanned every document with "explain" and it took 140ms, and did show up in the feed. the first thing to show up for a long while. that's nice)

I feel like I'm stuck with looking at request logs, and seeing if there are requests that take a while to complete.
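
if request logs are the tool, they can at least be filtered to the slow ones. a minimal sketch, assuming a plain node `http` handler: `logSlowRequests` is a made-up name, and the 100ms default just mirrors the feed's apparent cutoff.

```javascript
// sketch: wrap an http handler so that only slow requests are logged.
function logSlowRequests(handler, thresholdMs, log) {
  if (thresholdMs === undefined) thresholdMs = 100;
  log = log || console.log;
  return function (req, res) {
    const start = Date.now();
    // "finish" fires once the response has been handed off to the OS
    res.on("finish", function () {
      const ms = Date.now() - start;
      if (ms >= thresholdMs) {
        log(req.method + " " + req.url + " took " + ms + "ms");
      }
    });
    handler(req, res);
  };
}

// hypothetical usage:
//   require("http").createServer(logSlowRequests(myHandler)).listen(process.env.PORT);
```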

I guess what I really want to know is: how close is my app to reaching its maximum capacity? what is my app's maximum capacity?

I suppose the limiting factors are cpu and memory for both my app and the database, as well as database locks that limit parallelism..
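
a back-of-envelope way to put a number on that: if each request occupies one of `parallel` slots for `serviceMs` milliseconds, throughput can't exceed `parallel / serviceMs`. the function name and the "one query at a time" assumption below are illustrative only; the real degree of parallelism is unknown.

```javascript
// sketch: upper bound on requests per second for a given
// per-request service time and number of requests that can run at once.
function maxRequestsPerSecond(serviceMs, parallel) {
  return (parallel * 1000) / serviceMs;
}

// using times seen in the feed, and pessimistically assuming
// a lock lets only one query run at a time:
console.log(maxRequestsPerSecond(100, 1)); // 10
console.log(maxRequestsPerSecond(600, 1)); // roughly 1.67
```

at about one request per second, the 600ms case would already be uncomfortably close to that ceiling, which is why the query times matter.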

can heroku tell me what percent of the cpu I'm using on my dynos, or how much memory I'm using?
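
whatever the platform reports, a node process can at least describe itself. a minimal sketch using only node's built-in `os` and `process` modules; `snapshot` is a made-up name, and on shared hosting the load average probably reflects the whole machine rather than just this app.

```javascript
const os = require("os");

// sketch: a self-reported resource snapshot for the current node process.
function snapshot() {
  const mem = process.memoryUsage();
  const mb = 1024 * 1024;
  return {
    rssMb: mem.rss / mb,           // total memory held by the process
    heapUsedMb: mem.heapUsed / mb, // memory used by JavaScript objects
    load1: os.loadavg()[0]         // 1-minute load average (always 0 on Windows)
  };
}

// hypothetical usage: log one line per minute
//   setInterval(function () { console.log(JSON.stringify(snapshot())); }, 60000);
```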

..the internet says "use New Relic". I added New Relic. New Relic says "no node.js". fine. removed New Relic..

..hm.. seems like something called nodetime could work, but it's in beta, and I'm too afraid to try it.

I think I'm just going to go to bed and hope for the best..

2/10/13
