One of the most difficult realities for web developers to face is that their application code, elegant and beautiful as it may (or may not) be, does not run in the ivory tower of Code Perfection. It runs on a real machine (or several) in a real data center, competing for resources to serve real clients, and tripping over all-too-real limitations of the environment.
Operations people, those shadowy, pager-carrying folks that developers call "sysadmins", know that there is so much more to delivering a web application to its clients than simply deploying code. Web applications are not delivered the way packaged software was in the '90s, on a shrink-wrapped CD-ROM like a book. Web applications are not products at all; they are services, and services don't get to say "bring your own computer." Services must be delivered complete, with an entire stack of running programs and systems underneath them.
A web application, whether Java, Ruby, Python, PHP, or LOLcode, is incomplete until it is paired with a stack of servers and services on which to run it. Which language runtime must be installed? Which version of which web server(s)? How should the database server be tuned? How much RAM should be allocated to memcached? When should the logs be rotated? Developers often do not even think about these questions. When they do, the answers are usually provided as a narrative requirements list which some dedicated systems engineers must translate into a working system somehow.
Systems automation has now reached the point where this infrastructure can be delivered as code right along with the application code. Every web application should be delivered with Puppet configurations or Chef cookbooks to bring up a precisely tuned deployment stack designed for the application. Cloud-based infrastructure means you can even deliver the (virtual) hardware itself with the application. A good web application should come with a "deploy_to_ec2" script for instant production deployment.
Of course, there are other opinions. You may choose to outsource your operations work to a platform-as-service like Heroku or App Engine. If you want to live in a code-only world where infrastructure never crosses your mind, write your code to target deployment environments like these, and get used to the constraints they impose.
In my opinion, every web development team needs a systems engineer embedded as part of the team, developing and codifying the infrastructure alongside the application code. A web application delivered without infrastructure automation is incomplete.
2011-07-09
2011-06-19
Web Analytics for Operations
Web analytics packages, from free to exorbitant, have grown in complexity over the life of the web. That's great news for marketers using the web as a tool to deliver a message to an audience. These tools allow them to measure audience reach, time spent viewing a page, return visits, session length, and other useful customer engagement factors that help shape the business strategy.
Unfortunately, while the marketers have won some great tools, where does that leave the techies who need to operate the infrastructure? We don't need to know how long a visitor spent on the site, nor to measure the difference between a "page view" and an "interaction"; we need to know how many requests per second the application will generate. Where marketing-oriented analytics goes to great pains to filter out automated crawlers, we desperately need to know when a rampant robot is eating up server resources.
There isn't much in the way of off-the-shelf software to fit our needs. Mostly, we grow our own solutions, cobbled together with a tool here and a tool there.
Lately I've had a need to do some log analysis over a large farm of Apache web servers. I looked at a few open source packages that I knew about: AWStats and Webalizer being perhaps the best known. But I wasn't happy with either of these solutions. I wanted a tool that would allow me to aggregate not just hits, but time spent generating each page (in milliseconds), and I wanted to break down traffic by five-minute increments for a detailed shape in my graphs. So finally, and somewhat reluctantly, I settled on analog.
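As a rough illustration of the metric I was after, a short awk pass can bucket hits into five-minute intervals and average the generation time. This sketch assumes a LogFormat that appends %D (response time in microseconds) as the last field; that trailing field is an assumption about your Apache configuration, not part of the standard combined log format, so adjust the field references to match your own logs.

```shell
# Three sample lines in combined format with %D (microseconds) appended.
log='127.0.0.1 - - [19/Jun/2011:13:02:45 +0000] "GET / HTTP/1.1" 200 512 120000
127.0.0.1 - - [19/Jun/2011:13:04:10 +0000] "GET /about HTTP/1.1" 200 512 80000
127.0.0.1 - - [19/Jun/2011:13:07:00 +0000] "GET /feed HTTP/1.1" 200 512 100000'

report=$(printf '%s\n' "$log" | awk '
{
  hhmm = substr($4, 14, 5)              # HH:MM out of [dd/Mon/yyyy:HH:MM:SS
  split(hhmm, t, ":")
  bucket = sprintf("%s:%02d", t[1], int(t[2] / 5) * 5)
  hits[bucket]++
  usec[bucket] += $NF                   # trailing %D field, in microseconds
}
END {
  for (b in hits)
    printf "%s %d hits, %.1f ms avg\n", b, hits[b], usec[b] / hits[b] / 1000
}' | sort)
echo "$report"
```

Running it over the sample lines yields one row per five-minute bucket with a hit count and mean generation time in milliseconds, which is roughly the table I wanted analog to produce.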
Analog is neither pretty nor user-friendly. The configuration file is touchy and somewhat arcane, and its convention for command line parameters is non-standard. However, analog generates 44 different reports, including time breakdowns from annual down to my desired five-minute interval, reports for successes, failures, redirects, and other interesting outcomes, and a processing time report with fine resolution. It can read compressed log files, and it has no problem processing files out of chronological order.
Most importantly, analog is blazingly fast. It chewed through my 20 million lines of compressed Apache logs in six minutes. The speed at which it consumes log files seems to be limited more by I/O rate than CPU, though as a single-process, single-threaded application, analog will only tax one of your CPU cores. If you find CPU a limiting factor on a multi-core system, you might try decompressing the files using gzip and piping the output to analog. This allows the decompression to happen in a separate process, and therefore on a separate CPU core, but I don't know if that would speed things up much.
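The shape of that pipeline is sketched below; `wc -l` stands in for analog here, since whether a particular analog build will read its log from standard input is something to verify against its documentation before relying on it. The /tmp path is purely illustrative.

```shell
# Build a small compressed log so the pipeline has something to chew on.
printf 'GET /\nGET /about\nGET /feed\n' | gzip > /tmp/demo-access.log.gz

# gzip -dc decompresses in its own process, so on a multi-core box the
# decompression and the analysis can land on different cores. Swap in
# analog (reading stdin) for `wc -l` if your build supports that.
count=$(gzip -dc /tmp/demo-access.log.gz | wc -l | tr -d ' ')
echo "$count"
```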
I'm still not entirely pleased with this solution. I would prefer a solution that was a little more intuitive, and a little easier to customize. Analog has plenty of knobs to turn, but there is no built-in extension mechanism, so it makes me work pretty hard to pull out custom metrics.
I would love to hear what other folks are using to analyze their Apache logs. How do you get operational intelligence? Are you using remote logging? Shoot me an email or leave a comment.
2011-06-12
Occam's Moving Parts
As an architect of complex applications, I spend my day aggressively applying Occam's Razor, attempting to simplify large systems by removing as much as possible. But the nature of the work is such that the system can never be truly simple. No matter how much I try to simplify, I am left with that feeling that there are too many moving parts.
As a geek, I apply a systems approach to almost everything in my life. I have a system for preparing meals, a system for loading the dishwasher, a system for folding my underwear. I can't perform an activity more than once without thinking about optimizing and systemizing it somehow. I am always looking for patterns, and I am always looking for that piece that just doesn't fit.
This blog is intended to be a collection of my observations and ponderings on the systems of the world, particularly but not exclusively those in the technology and business realms. What are the moving parts and how do they fit together? How can we apply Occam's Razor to them? Which parts can be removed, and which parts are essential?
Like most of my writing, I expect to bore almost everyone, but hopefully fascinate and engage a few people.