In the early days of my IT career, we had to physically own the hardware and contract with a colocation facility to host our servers. My fingers were worn numb from crimping cables to wire up our cage. And, though it didn't happen often, we would sometimes actually call our hosting facility and ask them to manually reboot servers.
We have come a long way since then.
With services like Heroku, procuring servers and deploying apps is now as easy as typing in a few commands on your computer, and increasing the number of machines that you need is as easy as clicking and dragging your mouse.
That said -- since it's so easy to scale up the size of your cluster at a service like Heroku, it's also really simple to quickly scale up your monthly Heroku bill.
At some point, instead of continually throwing resources (i.e., money) at the problem, it's probably a good idea to try to optimize and tune your application. The first step that most people take is to add some basic availability monitoring. We use Pingdom to monitor the availability of our site -- which is a great layer on top of the de facto monitoring system of someone calling you and complaining that "the website is down." However, Pingdom monitors your website from the frontend, which means that its availability and performance metrics measure the whole stack, including, most notably, network latency and AWS issues. While this is really important to understand, we found that we needed a little more insight when trying to optimize the app itself.
So, we decided to add two more add-ons to our Heroku app: Papertrail and New Relic. If your application generates a lot of console logs, like ours does, then Papertrail is a great way to store that output and make it actually useful. Papertrail makes your logs searchable -- and as long as your console logs contain informative messages, it can be very helpful in tracking down errors or performance issues in your app. With Heroku specifically, it can be very frustrating to encounter the dreaded "H10" error code, which is thrown not by your app, but by Heroku's system. But, at the very least, with Papertrail, you'll be able to see when these errors occur.
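To make "informative messages" a bit more concrete, here is a minimal sketch of one way to write console logs that are easy to search later. The `formatLogLine` helper, the `key=value` layout, and the field names are all hypothetical choices for illustration; they are not something Papertrail or Heroku requires, and this is not necessarily how our app does it.

```javascript
// Hypothetical helper (not part of Papertrail or Heroku): builds one log line
// out of key=value pairs so that a log search can match on a single field.
function formatLogLine(level, event, fields = {}) {
  const parts = [`level=${level}`, `event=${event}`];
  for (const [key, value] of Object.entries(fields)) {
    // JSON.stringify puts quotes around strings, so values with spaces stay intact.
    parts.push(`${key}=${JSON.stringify(value)}`);
  }
  return parts.join(' ');
}

// Anything written to stdout ends up in the app's console logs.
console.log(formatLogLine('error', 'checkout_failed', { orderId: 1234, durationMs: 812 }));
```

With lines shaped like this, searching for something like `event=checkout_failed` should narrow things down much faster than scanning free-form text.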
Once you isolate the network issues, AWS issues and Heroku issues, you're finally able to start working on optimizing your own app. And for this, we installed New Relic.
Our application is written in Node.js, for which the New Relic tools are still officially in beta, but we still found them to be quite useful in providing a modicum of reporting about our application's performance. Once we integrated the newrelic agent, we began getting helpful metrics describing the performance of our app. Their "Apdex" score was a decent measure of our app's performance, and it definitely helped us figure out what performance issues were happening and when.
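For anyone unfamiliar with it, Apdex boils a pile of response times down to a single score between 0 and 1: requests at or under a threshold T count as "satisfied," requests between T and 4T count as "tolerating" (worth half), and anything slower counts as "frustrated" (worth nothing). Here is a rough sketch of that calculation. The `apdex` function name, the sample response times, and the 500 ms threshold are made up for illustration; New Relic computes the score for you.

```javascript
// Apdex = (satisfied + tolerating / 2) / total samples
//   satisfied:  response time <= T
//   tolerating: T < response time <= 4T
//   frustrated: response time > 4T (contributes nothing)
function apdex(responseTimesMs, thresholdMs) {
  if (responseTimesMs.length === 0) return null; // no samples, no score

  let satisfied = 0;
  let tolerating = 0;
  for (const t of responseTimesMs) {
    if (t <= thresholdMs) satisfied += 1;
    else if (t <= 4 * thresholdMs) tolerating += 1;
  }
  return (satisfied + tolerating / 2) / responseTimesMs.length;
}

// Illustrative numbers only: 3 satisfied, 2 tolerating, 1 frustrated.
const samples = [120, 300, 450, 800, 1900, 2500];
console.log(apdex(samples, 500).toFixed(2)); // prints "0.67"
```

This also shows why the score is good at the "what" and "when" but not much else: it tells you that a share of requests got slow, not which ones or for what reason.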
The difficult part, however, was trying to figure out the why. Tracking down the cause of any performance issue was a weighty challenge. Every time the Apdex told us that we were performing sub-optimally, we had to diagnose and solve the issue ourselves. Sometimes the cause would be external, like a network issue that slowed our app down, and other times it really would be an error in our code. One very helpful integration between Heroku and New Relic is that deployments are clearly marked in the dashboard -- that way, we could see when deployments happened, and whether they positively (or negatively) affected performance. One helpful thing to add would be indications of AWS, network or Heroku issues.
That said, determining the "why" continues to be a constant chore in performance tuning -- especially with a platform like Node.js, where external packages are constantly upgraded and changed. I understand how difficult it would be to make an agent that could identify the reason behind performance issues, but, just a few years ago, it would have been unfathomable to procure, set up and deploy clusters of servers with only a single command. So, I'm hopeful.