Normalize Accept-Encoding Headers via HAProxy for Squid Cache

I noticed that when browsing one of our heavy traffic website’s under different browsers, I would see completely separate versions of a page depending on which browser I used. We use a cluster of Squid servers to cache the many pages of this particular website, but I believed the only thing that needed to happen was to configure the HAProxy load balancer to balance requests based on URI so each page would always be served by the same Squid server (to optimize hit rate). Apparently not so!

So… what’s the deal?

Apparently, the Vary: Encoding header passed by Squid, in response to the browser requests, is used to make sure browsers receive only pages with encoding, if any, that they support. So, for example, Firefox would tell Squid that it accepts “gzip,deflate” as encoding methods, while Chrome was telling Squid it accepts “gzip,deflate,sdch”. Internet Explorer was only different from Firefox by a single space (“gzip, deflate”), but that was enough for Squid to cache and serve a completely separate object. I felt like this was a waste of resources. The “Big 3” browsers all support gzip and deflate (among other things), so in order to further optimize performance (and hit rate), I decided to normalize the Accept-Encoding headers.

My first attempt was to do something on the Squid servers themselves. However, the “hdr_replace” function I looked into did not have any impact on how Squid actually handles those requests, so I could hdr_replace all day long and each browser would still see separate cache objects for each individual page.

The alternative was to use HAProxy, but it turns out this works well! After a bit of reading, I found the “reqirep” function that allows one to rewrite any header passed down through the back-end servers. It uses regular expressions to do the deed, and after some testing/serverfaulting/luck I ended up adding the following command to the HAProxy backend configuration:

reqirep ^Accept-Encoding:\ gzip[,]*[\ ]*deflate.* Accept-Encoding:\ gzip,deflate

This regex will match for IE (gzip, deflate), Chrome (gzip,deflate,sdhc), and Firefox (gzip,deflate). It should also be noted that Googlebot uses the same header as Firefox. I didn’t bother looking into other browsers, as I was most concerned with the major browsers. If someone wants to contribute a better Regex to get the job done, let me know!

An additional reason normalized Accept-Encoding headers are good is that your cache will be primed much more quickly if everybody that visits your website are retrieving the same version of a page. Less work on your web server farm, database servers, and all that.

Speed is good. Good luck!

Improve Scalability and Performance in ASP.NET Apps

If you’re looking to scale out ASP.NET applications, here is an interesting article that goes into length on important aspects to improve performance for .NET applications in high-traffic environments.

    The article covers the following topics:

  • Optimizing the ASP.NET pipeline system
  • ASP.NET, AJAX caching, and what you need to know
  • Deploying ASP.NET from a staging to a production environment
  • Optimizing the ASP.NET process configuration
  • Using a Content Delivery Network (CDN) with your ASP.NET apps
  • ASP.NET 2.0 Membership tables
  • Progressive UI loading for a smoother, end-user, browser experience
  • Optimizing ASP.NET 2.0 Profile provider

Original article: Performance & Scalability in ASP.NET

Cloud Hosting – Scaling Websites the Easy Way

One often has to make a choice when it comes to website hosting. You weigh the variables and decide on the best solution for your hosting needs. Cloud hosting makes this decision a WHOLE lot easier. Let’s break it down.

Price. You want to get the best deal possible. Shared hosting probably comes to mind first. In the classic sense, shared hosting means a company has a server, and they load as many websites onto this server in order to make the most profit from one server. Sometimes, this can mean hundreds of websites on one box. One box… susceptible to the same physical hardware limitations as any other server. Sure, they might even include RAID, redundant power supplies, and a lot of disk space.

However, what happens when your website actually starts getting traffic? I had an experience where my company put their trust in a shared hosting company (*cough* Dreamhost *cough*). When it came down to it, one of our websites had a lot of visitors one evening, and after battling to keep things running smoothly, the host ultimately disabled our website via renaming the index file to index.php_disabled_by_host. Seriously? So much for saving money and “unlimited” space and bandwidth… which brings me to my next point.

Scalability. If you have a website that has outgrown shared hosting, what is your next move? Many people consider purchasing dedicated equipment for their website. A dedicated server is usually the first move. Not enough? Scaling out from this point then usually requires the purchase of another dedicated server and a load balancer, then it just gets pricier from there with a dedicated database server, file servers, caching servers, and more to handle growing traffic and load. We’re talking a significant amount of expenses just to get the ability to scale.

Scale My Site is the answer. The concept of a cloud host is that it takes the best of the scalable, dedicated world and lets you just pay for what you use. You put your website in the cloud and instantly your application is scaled across multiple webservers. Your files are stored on a redundant SAN mirrored across many physical drives. Database queries are performed on powerful, multi-node database clusters. You don’t have to think about “how am I going to handle all of that traffic?” because it just happens automatically. You no longer have to think about “do I need a Windows or Linux based account?”. It doesn’t matter. You can run ASP.NET applications side-by-side PHP web sites. It’s the cloud that doesn’t mind – it’s cool with whatever you want to do. I highly recommend checking out Ninja Systems, the cloud hosting company, if you are serious about scaling your website, and if you don’t want to waste your time recreating another scalable infrastructure that you need to manage yourself.