Exclude Squid Cache from updatedb.mlocate

One of the Squid servers in a cluster I manage had an unusually high load, while the rest of the Squid servers, handling a roughly equal number of connections, were within acceptable load levels.

While reviewing the process list, I found that updatedb.mlocate was running during the busiest time of the day. It turns out it is enabled by default in cron.daily on Debian Lenny, and it was using a noticeable amount of CPU. Wtf?

It was attempting to index all the files added to the Squid cache. On these servers, Squid lives on a separate partition mounted at /mnt/squid/, so updatedb has no way of knowing that it shouldn’t track all of those cache files alongside everything else on the system. Luckily, there is an easy solution.

Exclude your Squid cache location via the “PRUNEPATHS” variable in /etc/updatedb.conf (in my case it was /mnt/squid).
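
For reference, the finished line in /etc/updatedb.conf looked something like this (the other paths are whatever defaults your distribution ships with – only /mnt/squid is my addition):

PRUNEPATHS="/tmp /var/spool /media /mnt/squid"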

After I added my Squid cache location to PRUNEPATHS, the process dropped off the radar and server load returned to acceptable levels.

Normalize Accept-Encoding Headers via HAProxy for Squid Cache

I noticed that when browsing one of our heavy-traffic websites in different browsers, I would see completely separate versions of a page depending on which browser I used. We use a cluster of Squid servers to cache the many pages of this particular website, and I had believed the only thing that needed to happen was to configure the HAProxy load balancer to balance requests based on URI, so each page would always be served by the same Squid server (to optimize hit rate). Apparently not so!

So… what’s the deal?

Apparently, the Vary: Accept-Encoding header passed by Squid, in response to the browser requests, is used to make sure browsers receive only pages with an encoding, if any, that they support. So, for example, Firefox would tell Squid that it accepts “gzip,deflate” as encoding methods, while Chrome was telling Squid it accepts “gzip,deflate,sdch”. Internet Explorer differed from Firefox by only a single space (“gzip, deflate”), but that was enough for Squid to cache and serve a completely separate object. I felt like this was a waste of resources. The “Big 3” browsers all support gzip and deflate (among other things), so in order to further optimize performance (and hit rate), I decided to normalize the Accept-Encoding headers.

My first attempt was to do something on the Squid servers themselves. However, the “header_replace” directive I looked into does not have any impact on how Squid actually handles those requests, so I could rewrite headers all day long and each browser would still see separate cache objects for each individual page.

The alternative was to use HAProxy, and it turns out this works well! After a bit of reading, I found the “reqirep” function, which allows one to rewrite any request header before it is passed along to the back-end servers. It uses regular expressions to do the deed, and after some testing/serverfaulting/luck I ended up adding the following line to the HAProxy backend configuration:

reqirep ^Accept-Encoding:\ gzip[,]*[\ ]*deflate.* Accept-Encoding:\ gzip,deflate

This regex will match for IE (“gzip, deflate”), Chrome (“gzip,deflate,sdch”), and Firefox (“gzip,deflate”). It should also be noted that Googlebot uses the same header as Firefox. I didn’t bother looking into other browsers, as I was most concerned with the major ones. If someone wants to contribute a better regex to get the job done, let me know!
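
For context, here is roughly how that line sits in the backend section, alongside the URI balancing mentioned earlier (the backend name, server names, and addresses are illustrative, not my actual config):

backend squid_cluster
    balance uri
    reqirep ^Accept-Encoding:\ gzip[,]*[\ ]*deflate.* Accept-Encoding:\ gzip,deflate
    server squid1 10.0.0.11:3128 check
    server squid2 10.0.0.12:3128 check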

An additional benefit of normalized Accept-Encoding headers is that your cache will be primed much more quickly when everybody who visits your website retrieves the same version of a page. That means less work for your web server farm, database servers, and all that.

Speed is good. Good luck!

Google Search Gets a Facelift

I was trying to be productive and did a quick Google search when I noticed that the SERPs (search engine results pages) had just gotten a facelift. Not everyone is seeing this yet, so Google is probably rolling the changes out slowly, but here is a preview!

Screen-shot #1 shows the new, clean header with side navigation:

This second screen-shot shows the updated (and more colorful) pagination:

I like it. It appears that location-based search will play a larger role in upcoming search results, as the currently set location is directly under the search box.

Sorry, China, but you probably won’t get to see the new style anytime soon. :/

How to Install SQL Server Management Studio 2005 on Vista 64-bit

If you have tried to install the 64-bit version of SQL Server Management Studio on Vista 64-bit, then you have probably run into the following error:

Product: Microsoft SQL Server Management Studio Express — The installer has encountered an unexpected error installing this package. This may indicate a problem with this package. The error code is 29506.

The problem is that User Account Control (UAC) interferes with the installation process and causes it to fail. To get around the UAC issue, follow these few simple steps:

  1. Save the SQL Server Management Studio MSI file to, say, “C:\temp”
  2. Go to Start > All Programs > Accessories
  3. Right-click on “Command Prompt”, and click “Run as Administrator”
  4. Once you open the command prompt, browse to “C:\temp”
  5. Run the MSI file by typing in the name of the file and hitting Enter (see the example below)
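
In practice, steps 4 and 5 boil down to two commands (the MSI filename is just an example – use whatever name your download has):

cd C:\temp
msiexec /i SQLServer2005_SSMSEE_x64.msi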

The installation will run with the administrator privileges inherited from the elevated command prompt.

You should now be free of any error code 29506. Good luck!

Delete Duplicate Rows/Records in MySQL Table

Most articles on removing duplicate rows from a MySQL table involve three steps, but the following is what I use to purge duplicate records in one simple query.

DELETE FROM `myTable` WHERE id NOT IN (SELECT t1.id FROM (SELECT id, groupByColumn FROM `myTable` ORDER BY id DESC) as t1 GROUP BY t1.groupByColumn)

– “myTable” is the name of the table with duplicate rows
– “id” is the name of the primary-key column in “myTable”
– “groupByColumn” is the name of the column used to identify records as duplicates

Example: a table of videos, with duplicates matched on the “title” field.

DELETE FROM `videos` WHERE id NOT IN (SELECT t1.id FROM (SELECT id, title FROM `videos` ORDER BY id DESC) as t1 GROUP BY t1.title)
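
One caveat: the query above leans on MySQL’s permissive GROUP BY handling, which technically leaves the surviving row indeterminate (and strict modes such as ONLY_FULL_GROUP_BY in newer MySQL versions will reject it outright). A more deterministic alternative, worth testing on a copy of the table first, is a self-join delete that keeps the newest (highest id) row per title:

DELETE t1 FROM `videos` t1 JOIN `videos` t2 ON t1.title = t2.title AND t1.id < t2.id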

It’s a good SQL query to save or bookmark for those times when you need to do some maintenance or cleanup during development.

Improve Scalability and Performance in ASP.NET Apps

If you’re looking to scale out ASP.NET applications, here is an interesting article that covers at length the important aspects of improving performance for .NET applications in high-traffic environments.

The article covers the following topics:

  • Optimizing the ASP.NET pipeline system
  • ASP.NET, AJAX caching, and what you need to know
  • Deploying ASP.NET from a staging to a production environment
  • Optimizing the ASP.NET process configuration
  • Using a Content Delivery Network (CDN) with your ASP.NET apps
  • ASP.NET 2.0 Membership tables
  • Progressive UI loading for a smoother end-user browser experience
  • Optimizing ASP.NET 2.0 Profile provider

Original article: Performance & Scalability in ASP.NET

Recovery.gov Search Engine Conspiracy?

CNET reports that the new government website Recovery.gov has been using a special file to prevent being indexed by search engines. The robots.txt file, which has since been removed, contained the following code to prevent Google, Yahoo, and other reputable search engines from indexing any of its content:

# Deny all search bots, web spiders
User-agent: *
Disallow: /

The website, hailed by Obama and driven by the goal of a “transparent” government, is attempting to hide its content? Seems kind of fishy to me.

Personally, I believe it was a simple mistake on the part of the developers; disallowing search engine traffic via robots.txt is common practice during development, and the file was likely left in place by accident. Still, I think it’s a topic worthy of discussion.
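
For comparison, the permissive version a site would normally swap in at launch is the same file with an empty Disallow line (an empty value blocks nothing):

# Allow all search bots, web spiders
User-agent: *
Disallow: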

Original CNET article here.

Cloud Hosting – Scaling Websites the Easy Way

One often has to make a choice when it comes to website hosting. You weigh the variables and decide on the best solution for your hosting needs. Cloud hosting makes this decision a WHOLE lot easier. Let’s break it down.

Price. You want to get the best deal possible, and shared hosting probably comes to mind first. In the classic sense, shared hosting means a company has a server and loads as many websites onto it as possible in order to squeeze the most profit out of one machine. Sometimes this can mean hundreds of websites on one box. One box… susceptible to the same physical hardware limitations as any other server. Sure, it might even have RAID, redundant power supplies, and a lot of disk space.

However, what happens when your website actually starts getting traffic? I had an experience where my company put its trust in a shared hosting company (*cough* Dreamhost *cough*). One of our websites got a lot of visitors one evening, and after battling to keep things running smoothly, the host ultimately disabled our website by renaming the index file to index.php_disabled_by_host. Seriously? So much for saving money and “unlimited” space and bandwidth… which brings me to my next point.

Scalability. If you have a website that has outgrown shared hosting, what is your next move? Many people consider purchasing dedicated equipment, and a dedicated server is usually the first step. Not enough? Scaling out from that point usually requires purchasing another dedicated server and a load balancer, and it just gets pricier from there with a dedicated database server, file servers, caching servers, and more to handle the growing traffic and load. We’re talking about significant expense just to gain the ability to scale.

Scale My Site is the answer. The concept of a cloud host is that it takes the best of the scalable, dedicated world and lets you pay for only what you use. You put your website in the cloud, and your application is instantly scaled across multiple web servers. Your files are stored on a redundant SAN mirrored across many physical drives. Database queries are performed on powerful, multi-node database clusters. You don’t have to ask “how am I going to handle all of that traffic?” because it just happens automatically. You no longer have to ask “do I need a Windows or Linux based account?” It doesn’t matter: you can run ASP.NET applications side by side with PHP websites. The cloud doesn’t mind – it’s cool with whatever you want to do. I highly recommend checking out Ninja Systems, the cloud hosting company, if you are serious about scaling your website and don’t want to waste your time rebuilding yet another scalable infrastructure that you have to manage yourself.

Canon 5D Mark II is Insane

If you haven’t yet heard of the Canon EOS 5D Mark II… I’ll fill you in.

It’s a brand-new camera from Canon that features 21-megapixel resolution and insanely sharp HD video recording. One of the first people to get his hands on the camera was Vincent Laforet, who used a wide array of lenses to shoot a short film exclusively with the 5D: REVERIE.

My friend Ron Hsu already blogged about it earlier, but when you see the quality of this camera, you’ll agree the good word deserves to be spread.

If anyone decides to get one when it is officially released, I’d highly recommend uploading your videos to Infinovision so you retain the highest possible video quality.

Amazing!

How-to Build Your Own Joomla Video Community

You want people to take your Joomla website seriously. You want them to think that your “video gallery” makes your site pretty awesome. You want them to believe that you are the only one who has figured out how to embed videos from YouTube and all other 3rd-party video hosting sites… and you don’t want them to accidentally click on any of those 3rd-party site ads that take them away from your website.

What do you do?

To build your own Joomla video community from the ground up, you will need to:

  1. Create your own video player – most likely built in Flash
  2. Convert raw videos to an Internet-appropriate size & quality (see the ffmpeg sketch after this list)
  3. Upload your converted videos to your server (or hosting provider’s server)
  4. Keep a close eye on your server resources: disk space and bandwidth in particular. Videos will fill up your free space much more quickly than images, web pages, or any other type of file.
  5. Install a Joomla video gallery component that provides support for using your own files.
  6. Optional: If you want to allow users to “stream” videos (meaning they can jump around different parts of the video without downloading the entire thing), it will usually require a separate streaming server, and someone with the technical know-how to get things running smoothly.
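
As a rough sketch of step 2, a single ffmpeg command will produce a web-friendly file (the filenames, dimensions, and bitrates here are placeholders to tune for your own content):

ffmpeg -i raw-clip.mov -vf scale=640:-2 -c:v libx264 -b:v 800k -c:a aac -b:a 96k clip-web.mp4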

For those who don’t have the opportunity, time, know-how, or resources to do the above, what other option do you have if you don’t want low-quality videos with another business’ branding?

The answer: JVideo!

JVideo is a Joomla video gallery component – with it you can upload videos, organize videos, stream videos, and more. You don’t need to purchase additional hardware or worry about your videos directing your customers elsewhere. JVideo uses an API powered by Infinovision, so all you need to do is set up an account, install the component, and you’re good to go. It’s a new way to think about video hosting. In fact, you pretty much don’t have to think about it.

Check out the video below from the JVideo demo site and compare it to the YouTube equivalent (try full-screen for a real eye-opener):

YouTube’s version:

Check out the JVideo demo to see what they’re all about.