The Ultimate Website Deployment Solution

Most basic websites tend to be deployed something like this:
  1. Write code
  2. Copy code to website hosting via FTP
  3. Refresh page to see latest changes
And… that works. But it leaves much to be desired, both in terms of performance and convenience, for you and for your users. I’ll go over some of those issues here, then dive into a little more technical detail in an upcoming post (or posts…). So, here are some of the things we would like to add to the deployment process:
  1. Version Control! If you haven’t used version control before, go check out Subversion. I would also recommend the Windows client TortoiseSVN. Honestly, version control has the potential to save your ass by allowing you to revert bad changes. The ability to compare current and past versions of files also makes it an invaluable development tool. Plus a million more things.

    As far as the auto-deployment utopia though, version control is key because it allows you to automatically pull the latest versions of files (or the latest stable versions if you prefer) directly to your production server. No hunting around for changed files and uploading via ftp. Plus, in the worst case, if the update breaks the live version of the site, you can easily revert to how it was before.

  2. Compression. One of the goals of good web development is of course to have pages load as quickly as possible. Yahoo offers a fantastic tool called YUI Compressor, which will compress (or minify) CSS and JavaScript files to a fraction of their initial size, while maintaining full functionality. It does this by eliminating unnecessary whitespace, optionally renaming variables (in a consistent manner of course), cutting out redundant braces, etc. You obviously wouldn’t want to actually work on the files in this format, but doing it before making them live helps page loads and essentially costs nothing.

    The more common way of compressing files to speed up downloads is through an algorithm like gzip. Generally this is done by a module built into the web server, such as Apache’s mod_deflate. (The files are then automatically extracted by the web browser once they’re received.) Fortunately, this is not mutually exclusive with the step above. You can (and should) absolutely minify then gzip and stack the gains!

    There’s a better way to do it than just using mod_deflate, however. That’s fine for dynamic files (like PHP) that need to be compressed on the fly. But what about static files, like JavaScript or CSS? Or even static HTML files? These don’t change, so what’s the point in re-compressing them every single time they’re sent to a user? Instead, we would like to gzip these files once, when they’re pushed to the live site, just like we’ll do for minification.

    By the way, if you’re wondering how much of a difference these steps can make, here’s a real world example from one of the SearchTempest results javascript files:

    • Uncompressed: 47.9 kB
    • Minified only: 26.9 kB
    • Gzipped only: 12.9 kB
    • Minified then Gzipped: 7.4 kB

    Looks worthwhile to me!
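    A quick way to see that stacking effect for yourself: the sketch below uses Python’s gzip module, with a deliberately crude whitespace/comment stripper standing in for a real minifier like YUI Compressor (the input and the resulting byte counts are made up for the demo, so they won’t match the SearchTempest numbers above).

```python
import gzip
import re

def crude_minify(js: str) -> str:
    # Very rough stand-in for a real minifier: strip comments and collapse
    # whitespace. A real tool also renames variables, drops redundant
    # braces, etc. Do NOT use this on real code (it ignores strings).
    js = re.sub(r"/\*.*?\*/", "", js, flags=re.S)  # block comments
    js = re.sub(r"//[^\n]*", "", js)               # line comments
    return re.sub(r"\s+", " ", js).strip()

# A fake, repetitive "script" to compress:
source = ("// helper\nfunction add(first, second) {\n"
          "    return first + second;  /* sum */\n}\n") * 200

minified = crude_minify(source)
gzipped = gzip.compress(source.encode())
both = gzip.compress(minified.encode())

print(f"uncompressed: {len(source)}, minified: {len(minified)}, "
      f"gzipped: {len(gzipped)}, minified+gzipped: {len(both)}")
```

    In a real deployment you’d run the minifier once at push time, then write the gzipped bytes out next to the original (e.g. main.js.gz) so the server can send the precompressed file instead of deflating it on every request.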

  3. Caching (and cache breaking): Modern web browsers have the ability to cache static files to avoid the unnecessary traffic (and time) caused by downloading files that haven’t changed. The problem is, by default the browser has to essentially guess whether a file is likely to have changed since your last visit to a site, and it can often be wrong. (That’s why in the ‘naive’ deployment steps at the top, you have to refresh the page to get the latest version.) I would rather not expect my users to do that…

    Fortunately, we can do better. Ideally we would like to solve two opposite problems:

    • Files are cached too long, resulting in broken web pages and requiring manual refreshes.
    • Files are cached… too short, resulting in unnecessary web traffic and delays, re-fetching files that haven’t changed.

    The solution in a nutshell is to give files new names every time you change their contents. HUH?? Lemme ‘splain. Say I want to include the script main.js in my homepage. What I will do instead is include main-tv1234.js. (tv = Tempest Version 🙂 ) I have the web server set up so that whenever it sees a file name ending in -tv####, it automatically strips that part out and internally redirects – in this case to main.js. So the browser thinks it’s getting a file called main-tv1234.js, and what it actually gets is main.js. So far so good. Now let’s say I make some changes to main.js. I will have the page call main-tv1235.js instead. From the web browser’s perspective, this is a completely different file, so it goes and grabs it. From the web server’s perspective, it’s still main.js, so you don’t need to go renaming all your files all the time!

    Of course, this probably still sounds tedious to you, and it would be if you had to do any of it manually. Instead, you use variables to reference the static files, and your deployment script automatically generates a database table (or a file) to set those variables. Nothing easier. 😉 And the best part is, now that you never have to worry about stale caches, you can explicitly set caches to expire a full year after download (e.g. using mod_expires), reducing unnecessary traffic to practically zero.

    Note that some suggest simply adding a query string to the file call, rather than changing the name; e.g. main.js?tv=1234. This will work sometimes, but not with all caches, and it’s only marginally easier than the above anyway. There are also plenty of partial solutions out there, using Last-Modified headers, ETags, etc., but none has the performance or reliability of this technique. (Even setting headers to disable caching completely, aside from being a huge performance hit, in fact does not prevent browsers from caching files in the short term.)
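    In Apache, the internal redirect for version-stamped names can be done with mod_rewrite. Here’s a sketch of what such a rule might look like; the -tv pattern and the js/css extensions follow this post’s convention, so adjust to taste:

```apache
# Serve /js/main-tv1234.js (any -tv<digits> stamp) from /js/main.js on disk.
# Requires mod_rewrite; goes in the vhost config or an .htaccess file.
RewriteEngine On
RewriteRule ^(.+)-tv\d+\.(js|css)$ $1.$2 [L]
```

    Pair this with far-future Expires headers on js/css (mod_expires) and the browser caches each stamped URL essentially forever, only re-downloading when the stamp changes.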

  4. No-Break Updates. One problem with updating a live site (especially a high volume one) via ftp is that it takes some time for the files to be copied. Any users who access the page in mid-transfer will see some hybrid, likely broken, version with some old files and some new. Switching from FTP to SVN doesn’t actually solve this.

    My ‘solution’ to this one is pretty simple. I have a parallel directory tree that I use to grab the newest files from Subversion and perform all the compression and auto-versioning described above. Then, I simply copy all updated files into the live tree at once. Because it’s a local copy rather than over the interwebs, it takes a fraction of a second and therefore mostly eliminates the problem. Of course technically a user could still try to load a page in that split second and get a mixture of files. I haven’t dealt with that yet because, honestly, eventually you just have to work on more important things. 🙂 You could just stop your web server, copy the files, then restart, but that’s like swatting a fly with a sledgehammer. I’m guessing the best way would be to (automatically of course) keep a mirror copy of the live site’s files. Before copying in an update, you would redirect the entire site to the mirror, update, remove the redirect, then update the mirror copy. But ya.. probably not a big deal except for very high volume sites.
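    The final “copy everything into the live tree at once” step might look something like this minimal Python sketch; the directory names and demo file are hypothetical:

```python
import shutil
from pathlib import Path

def push_live(stage: Path, live: Path) -> list[str]:
    """Copy every processed file from the staging tree into the live tree.

    All the slow work (pulling from Subversion, minifying, gzipping,
    version-stamping) already happened in `stage`; this last step is a
    purely local copy, which is what shrinks the window where a user
    could see a half-updated site down to a fraction of a second.
    """
    copied = []
    for src in sorted(p for p in stage.rglob("*") if p.is_file()):
        dest = live / src.relative_to(stage)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)
        copied.append(str(dest.relative_to(live)))
    return copied

# Tiny demo with made-up paths:
stage, live = Path("demo-staging"), Path("demo-live")
(stage / "js").mkdir(parents=True, exist_ok=True)
(stage / "js" / "main-tv1234.js").write_text("var x=1;")
print(push_live(stage, live))
```

    One common refinement on the mirror idea: build the new release as a sibling directory and atomically repoint a symlink that the docroot follows (a rename is atomic on the same filesystem), which closes even the split-second window without stopping the server.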

And that about covers it. The nice thing is, once these mechanisms are set up, deploying updates to your site(s) is actually simpler than the 3-step process above. My process now looks like this:
  1. Write code and commit changes
  2. Log into server via SSH. (Using PuTTY and Pageant, so I don’t even have to type a username.)
  3. Script asks me which site I want to update: (A)utoTempest, (M)ovieTempest, or (S)earchTempest.
  4. Profit
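That flow (minus the profit) can be sketched as a tiny wrapper script. Everything here beyond the three site names is hypothetical; the real prepare/push steps are the mechanisms described above, elided to comments:

```python
SITES = {"a": "AutoTempest", "m": "MovieTempest", "s": "SearchTempest"}

def choose_site(answer: str) -> str:
    """Map a one-letter menu answer (case-insensitive) to a site name."""
    key = answer.strip().lower()[:1]
    if key not in SITES:
        raise ValueError(f"unknown site: {answer!r}")
    return SITES[key]

def deploy(site: str, confirm) -> str:
    """Prepare a release, then push it live only after confirmation."""
    # 1. Prepare: svn export, minify, gzip, version-stamp (elided here).
    # 2. Safety valve: look before touching the live tree.
    if not confirm(f"Push {site} live? [y/N] "):
        return "aborted"
    # 3. Push: fast local copy into the live tree (elided here).
    return f"deployed {site}"

print(deploy(choose_site("S"), confirm=lambda prompt: True))
```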
Actually, to be fair, I also have the script ask for a confirmation after doing all the compression and such, but before pushing to the live site. Better safe than blah blah. But still. As promised, in an upcoming post I’ll get into a bit more detail on how to do the things described above. And of course, there are many more ways to improve your sites’ performance:
  • combining files to reduce HTTP overhead
  • reducing number and size of cookies
  • using cookie-free domains for static files
  • moving non-vital JavaScript to the bottom of the body
  • pre-caching files for future pages
  • etc.
If you’re interested, I may cover some of those in more detail later too. For now, I’ll leave you with one more great tool: YSlow, also by Yahoo. (Who knew Yahoo had all these great developer tools? YQL anyone?) Basically you feed it a site, and it identifies potential areas for optimization, along the lines of the above. Its recommendations shouldn’t be followed blindly, but when filtered through a human brain, they can be extremely beneficial. OK. Happy coding!