Exclude Squid Cache from updatedb.mlocate

One of the Squid servers in a cluster I manage had an unusually high load, while the rest of the Squid servers, handling roughly the same number of connections, stayed within acceptable load levels.

While reviewing the process list, I found that updatedb.mlocate was running during the busiest time of the day. It turns out it is enabled by default in /etc/cron.daily on Debian Lenny, and it was using a noticeable amount of CPU. Wtf?

It was attempting to index every file added to the Squid cache. On these servers, Squid has its own partition at /mnt/squid/, and updatedb has no way of knowing that it shouldn’t keep track of all these files alongside everything else on the system. Luckily, there is an easy solution.

Exclude your Squid cache location by adding it to the “PRUNEPATHS” variable in /etc/updatedb.conf (in my case, /mnt/squid).
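If you need to make this change on several Squid servers, a minimal sketch like the one below can script it. It assumes your updatedb.conf has a single line of the form PRUNEPATHS="…", and add_prunepath is a hypothetical helper name, not part of mlocate.

```shell
#!/bin/sh
# Hypothetical helper (not part of mlocate): append a directory to PRUNEPATHS
# in an updatedb.conf-style file. Assumes one line of the form PRUNEPATHS="...".
add_prunepath() {
    conf=$1
    dir=$2
    # Pull out the current space-separated list of pruned paths
    current=$(sed -n 's/^PRUNEPATHS="\(.*\)"$/\1/p' "$conf")
    # Do nothing if the directory is already excluded
    case " $current " in
        *" $dir "*) return 0 ;;
    esac
    # Append the directory inside the closing quote; the original is kept as $conf.bak
    sed -i.bak "s|^PRUNEPATHS=\"\(.*\)\"\$|PRUNEPATHS=\"\1 $dir\"|" "$conf"
}

# Example (as root): add_prunepath /etc/updatedb.conf /mnt/squid
```

Running it twice is harmless, since the second call finds the directory already listed and returns without editing the file.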

After I added my Squid cache location to PRUNEPATHS, the process dropped off the radar and server load returned to acceptable levels.

Normalize Accept-Encoding Headers via HAProxy for Squid Cache

I noticed that when browsing one of our high-traffic websites in different browsers, I would see a completely separate version of a page depending on which browser I used. We use a cluster of Squid servers to cache the many pages of this particular website, and I believed all that was needed was to configure the HAProxy load balancer to balance requests by URI, so each page would always be served by the same Squid server (to optimize hit rate). Apparently not so!

So… what’s the deal?

Apparently, the Vary: Accept-Encoding header on the responses Squid serves is used to make sure browsers receive only pages in an encoding (if any) that they support, so Squid stores a separate cached copy of a page for each distinct Accept-Encoding value it sees. For example, Firefox told Squid it accepts “gzip,deflate” as encoding methods, while Chrome sent “gzip,deflate,sdch”. Internet Explorer differed from Firefox by only a single space (“gzip, deflate”), but that was enough for Squid to cache and serve a completely separate object. That felt like a waste of resources. The “Big 3” browsers all support gzip and deflate (among other things), so to further optimize performance (and hit rate), I decided to normalize the Accept-Encoding headers.

My first attempt was to do something on the Squid servers themselves. However, the “header_replace” directive I looked into had no impact on how Squid actually handles those requests, so I could header_replace all day long and each browser would still get separate cache objects for each page.

The alternative was HAProxy, and it turns out this works well! After a bit of reading, I found the “reqirep” keyword, which uses a case-insensitive regular expression to rewrite request headers before they are passed to the back-end servers. After some testing, Server Fault searching, and luck, I ended up adding the following line to the HAProxy backend configuration:

reqirep ^Accept-Encoding:\ gzip[,]*[\ ]*deflate.* Accept-Encoding:\ gzip,deflate

This regex matches IE (“gzip, deflate”), Chrome (“gzip,deflate,sdch”), and Firefox (“gzip,deflate”). Googlebot also sends the same header as Firefox. I didn’t bother looking into other browsers, since I was most concerned with the major ones. If someone wants to contribute a better regex for the job, let me know!
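To sanity-check the pattern before touching the load balancer, you can approximate it with sed. This is only a sketch, not HAProxy itself: it translates the rule into POSIX extended regex syntax ([,]* becomes ,* and the escaped space becomes a plain space) and, unlike reqirep, it is case-sensitive. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical test helper: applies an ERE translation of the reqirep rule
# to a single header line and prints the result.
normalize_accept_encoding() {
    printf '%s\n' "$1" |
        sed -E 's/^Accept-Encoding: gzip,* *deflate.*/Accept-Encoding: gzip,deflate/'
}

# The three browser variants described above
for h in 'Accept-Encoding: gzip,deflate' \
         'Accept-Encoding: gzip, deflate' \
         'Accept-Encoding: gzip,deflate,sdch'; do
    normalize_accept_encoding "$h"
done
```

Each variant comes out as Accept-Encoding: gzip,deflate, so all three browsers would share one cached object, while a header such as “deflate” alone passes through untouched.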

Another reason normalized Accept-Encoding headers are good: your cache is primed much more quickly when everybody who visits your website retrieves the same version of a page. That means less work for your web server farm, database servers, and all that.

Speed is good. Good luck!

Using the ACL in HAProxy for Load Balancing Named Virtual Hosts

Until recently, I wasn’t aware of the ACL system in HAProxy, but once I found it I realized that I have been missing a very important part of load balancing with HAProxy!

While the full set of ACL options is listed in the configuration doc, the example below covers the basics you’ll need to build an HAProxy load balancer that supports multiple host headers.

Here is a quick example haproxy configuration file that uses ACLs:

global
    log local0
    log local1 notice
    maxconn 4096
    user haproxy
    group haproxy

defaults
    log global
    mode http
    option httplog
    option dontlognull
    retries 3
    option redispatch
    maxconn 2000
    contimeout 5000
    clitimeout 50000
    srvtimeout 50000

frontend http-in
    bind *:80
    acl is_www_example_com hdr_end(host) -i example.com
    acl is_www_domain_com hdr_end(host) -i domain.com
    use_backend www_example_com if is_www_example_com
    use_backend www_domain_com if is_www_domain_com
    default_backend www_example_com

backend www_example_com
    balance roundrobin
    cookie SERVERID insert nocache indirect
    option httpchk HEAD /check.txt HTTP/1.0
    option httpclose
    option forwardfor
    server Server1 cookie Server1
    server Server2 cookie Server2

backend www_domain_com
    balance roundrobin
    cookie SERVERID insert nocache indirect
    option httpchk HEAD /check.txt HTTP/1.0
    option httpclose
    option forwardfor
    server Server1 cookie Server1
    server Server2 cookie Server2

In HAProxy 1.3, the ACL rules are placed in a “frontend”, and (depending on the logic) the request is proxied to any number of “backends”. You’ll notice in our frontend named “http-in” that I’m checking the Host header using hdr_end, which simply checks whether the Host header ends with the provided argument (the -i flag makes the comparison case-insensitive).

You can find the rest of the Layer 7 matching options by searching for “7.5.3. Matching at Layer 7” in the configuration doc I linked to above. A few options I didn’t use but you might find useful are path_beg, path_end, path_sub, path_reg, url_beg, url_end, url_sub, and url_reg. The *_reg criteria let you match the URL or path against a regular expression, but keep the usual performance cost of regex in mind (especially on a load balancer).

HAProxy uses the first “use_backend” rule that matches a request, and if none match, it falls back to the “default_backend”. You can also combine several ACLs in a “use_backend” condition: ACLs listed one after another must all match, while “or” (or “||”) between them matches either one. See the configuration doc for more helpful info.
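To make the evaluation order concrete, here is a rough shell model of how the “http-in” frontend above chooses a backend. It is only an illustration, not how HAProxy is implemented, and it assumes the Host header carries no port number; pick_backend is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical model of the http-in frontend's backend selection.
pick_backend() {
    # -i on the ACLs means case-insensitive matching, so compare in lowercase
    host=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    # use_backend rules are checked top to bottom; the first match wins
    case "$host" in
        *example.com) echo www_example_com ;;  # is_www_example_com: hdr_end(host) -i example.com
        *domain.com)  echo www_domain_com ;;   # is_www_domain_com: hdr_end(host) -i domain.com
        *)            echo www_example_com ;;  # no ACL matched: default_backend
    esac
}

# Example: pick_backend WWW.Domain.com   (prints www_domain_com)
```

Like hdr_end, the suffix match also accepts hosts such as myexample.com, which is worth keeping in mind when several of your domains share a common ending.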

If you’re looking to use HAProxy with SSL, that requires a different approach, and I’ll blog about that soon.

Generate a PKCS #12 (PFX) Certificate from Win32 CryptoAPI PRIVATEKEYBLOB

We had an accounting system that used a Microsoft Win32 CryptoAPI blob to encrypt/decrypt credit card information for recurring customers. It was time for an upgrade to .NET land. Keith, the lead developer on this project, decided it would be beneficial to switch to X.509 certificates for better key management (and I wasn’t going to argue).

What we actually used to encrypt/decrypt cards in the legacy system was a PRIVATEKEYBLOB, and our ultimate goal was a certificate in the PKCS #12 format. My office machine runs Windows XP, and I wanted to use OpenSSL to convert the private key blob into something more suitable for our new system, but I didn’t want to transmit any of our top-secret keys across the VPN, or even across the local network for that matter.

OpenSSL didn’t support PRIVATEKEYBLOB as an input format until 1.0.0 Beta, but 0.9.8h was the only Windows binary readily available. So I grabbed the OpenSSL source and compiled it with GCC under Cygwin. If you don’t have Cygwin, it’s very easy to get started, and you can choose from a large variety of Linux packages during setup; just make sure you select GCC.

Here’s how to compile OpenSSL 1.0.0 Beta in a native Linux environment or under Cygwin:

[code lang="bash"]$> cd /usr/local/
$> wget http://www.openssl.org/source/openssl-1.0.0-beta3.tar.gz
$> tar -xzf openssl-1.0.0-beta3.tar.gz
$> cd openssl-1.0.0-beta3
$> ./config && make && make install && make clean[/code]

If something breaks during the build, check the online docs, or re-run Cygwin setup to make sure you selected the GCC toolset. From this point forward, I’ll assume you are using OpenSSL 1.0 in either a native Linux or a Cygwin environment. If you aren’t sure, run “openssl version” to check your ::drumroll please:: version number.

Let’s get started.

The OpenSSL command below takes your PRIVATEKEYBLOB and outputs an RSA private key in PEM format. Note the use of “MS\ PRIVATEKEYBLOB” instead of plain “PRIVATEKEYBLOB”: the backslash escapes the space after “MS” so the shell passes the whole format name as a single argument. If all goes well, you should end up with a PEM file. If not, try specifying a different input form (e.g. DER or PRIVATEKEYBLOB instead of MS\ PRIVATEKEYBLOB).

[code lang="bash"]$> openssl rsa -inform MS\ PRIVATEKEYBLOB -outform PEM -in private.pvk -out private.pem[/code]

Now that we have a PEM file with an RSA private key, we can generate a new certificate based on that key (command below). This generates an X.509 certificate valid for five years (1825 days). When you run it, you’ll be prompted for the usual country/state/city/company information; what you enter there is up to you. I would recommend adding a passkey when it prompts you at the end.

[code lang="bash"]$> openssl req -new -x509 -key private.pem -out newcert.crt -days 1825[/code]
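Before bundling, it’s worth confirming that the certificate really was built from your key. One common check, sketched below, compares the RSA modulus reported by openssl x509 and openssl rsa. cert_matches_key is a hypothetical helper name, and the sketch assumes private.pem is not passphrase-protected.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the certificate's public key matches
# the RSA private key, by comparing their moduli. Returns 2 if openssl fails.
cert_matches_key() {
    cert_mod=$(openssl x509 -noout -modulus -in "$1") || return 2
    key_mod=$(openssl rsa -noout -modulus -in "$2") || return 2
    [ "$cert_mod" = "$key_mod" ]
}

# Usage with the file names from this walkthrough:
# cert_matches_key newcert.crt private.pem && echo "certificate matches key"
```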

If all continues to go well, you should have a private key in PEM format and your brand new certificate. One last command is needed to generate the PKCS #12 (aka PFX) certificate bundle.

[code lang="bash"]$> openssl pkcs12 -export -in newcert.crt -inkey private.pem -name "My Certificate" -out myCert.p12[/code]

If you didn’t receive any errors, congratulations! You can now import this PKCS #12 bundle into any Windows certificate store and no longer need to hard-code blobs into your code.

Hope this saves someone a few hours.

How to Copy or Move a Joomla Website


If you manage one or more Joomla websites, eventually you’ll have to move them elsewhere. It’s pretty much a fact. Performance requirements will change, you’ll find better pricing elsewhere, your dedicated server died, etc.

There are a few options most people have to choose from:

  • FTP client and phpMyAdmin method (aka the long, boring method)
  • SSH/Shell method (aka the cool, quick method)
  • PHP system() method (aka middle of the road and kind of fun method)
  • Joomla “clone” component (your mom could do it)

Move Joomla with an FTP Client and phpMyAdmin

  1. Download the entire Joomla website via FTP client (you’re using SFTP to connect, right?)
  2. Use phpMyAdmin to export a SQL dump of your database
  3. Upload the entire Joomla website via FTP client
  4. Use phpMyAdmin on the new server to import the SQL dump from the old website
  5. Update configuration.php:
    1. Update the MySQL database credentials
    2. Update the tmp/logs path
    3. If you use FTP Layer, update the credentials
  6. Update .htaccess to match any changed server requirements

Easy and straightforward. It’s a long, slow process, but any junior network admin could handle it for you if you don’t want to get your hands dirty.

Clone Joomla with SSH (shell) Access

  1. Log in to your server via SSH
  2. Browse to your Joomla website root
  3. Run these commands:
    [code lang="bash"]tar -czf ../backup-example-com-20090619.tar.gz .

    mv ../backup-example-com-20090619.tar.gz ./

    mysqldump -u yourUsername -p -h yourMySQLHostname yourDatabaseName > backup-example-com-20090619.sql[/code]

  4. Do you need to move this to a remote server or another location on the same server?
    1. Local Path
      1. Copy both backup files to the new website root
      2. Browse to the new Joomla website root
    2. Remote Path
      1. Log in to the remote server via SSH
      2. Browse to the new Joomla website root
      3. Use wget to download the archive and SQL dump to this server:
        [code lang="bash"]wget http://www.example.com/backup-example-com-20090619.tar.gz
        wget http://www.example.com/backup-example-com-20090619.sql[/code]
  5. Run this command:
    [code lang="bash"]tar -xzf backup-example-com-20090619.tar.gz[/code]
  6. Run this command (assuming you have made a new, blank database)
    [code lang="bash"]mysql -u yourNewUsername -p -h yourNewMySQLHost yourNewDatabase < backup-example-com-20090619.sql[/code]
  7. Update configuration.php & .htaccess as shown in the first example

More complicated (obviously), but if you like doing things the hard (er, fun) way, then it’s a great way to go.
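The SSH steps above hard-code the date in the file names. If you repeat this often, a small sketch like the one below can generate the names and build the archive. Both function names are hypothetical, and the site name and date are only examples. The archive is written to the parent directory first so tar doesn’t try to include the file it is creating, then moved into the web root, mirroring the commands above.

```shell
#!/bin/sh
# Hypothetical helper: build a backup base name such as backup-example-com-20090619.
# The second argument (YYYYMMDD) is optional and defaults to today's date.
backup_basename() {
    site=$(printf '%s' "$1" | tr '.' '-')
    stamp=${2:-$(date +%Y%m%d)}
    printf 'backup-%s-%s\n' "$site" "$stamp"
}

# Hypothetical helper: archive the current directory into ./NAME.tar.gz.
# Writing to .. first keeps tar from archiving its own output file.
archive_site() {
    tar -czf "../$1.tar.gz" . && mv "../$1.tar.gz" ./
}

# Example, run from the Joomla root:
# name=$(backup_basename example.com)
# archive_site "$name"
# mysqldump -u yourUsername -p -h yourMySQLHostname yourDatabaseName > "$name.sql"
```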

Using PHP’s system() or back-tick Commands to Copy Joomla Website

I didn’t learn about this method until I started managing a whole slew of websites on a cloud hosting platform (Scale My Site). Cloud hosts (and many shared hosting platforms) don’t provide SSH access because it’s simply not feasible, particularly in the cloud, where your website runs across hundreds of different server nodes. You can perform the same steps as the SSH procedure above using PHP’s system execution functions or back-tick operator.

  1. Create a new file called copy-me.php
  2. Write the following code into this file:
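    [code lang="php"]<?php
    // yourPassword is a placeholder: a web request has no terminal for mysqldump to prompt on, so pass the password inline with -p (no space).
    echo `tar -czf ../backup-example-com-20090619.tar.gz . && mv ../backup-example-com-20090619.tar.gz ./`;
    echo `mysqldump -u yourUsername -pyourPassword -h yourMySQLHostname yourDatabaseName > backup-example-com-20090619.sql`;[/code]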
  3. Execute the PHP file by accessing it from a browser:
    Browser> http://www.example.com/copy-me.php
  4. Create a new file on the destination website called update-me.php
  5. Write the following code into this file:
    [code lang="php"]<?php
    // Download the archive and SQL dump from the old site, unpack the files, and import the database (yourNewPassword is a placeholder).
    echo `wget http://www.example.com/backup-example-com-20090619.tar.gz`;
    echo `wget http://www.example.com/backup-example-com-20090619.sql`;
    echo `tar -xzf backup-example-com-20090619.tar.gz`;
    echo `mysql -u yourNewUsername -pyourNewPassword -h yourNewMySQLHost yourNewDatabase < backup-example-com-20090619.sql`;[/code]
  6. Execute the PHP file by accessing it from your browser:
    Browser> http://www.exampleDestination.com/update-me.php
  7. Update the configuration.php and .htaccess files as needed

Cool, huh?

Use a Backup or Clone Component from Joomla Extension Directory

If you prefer not to do anything yourself and want to keep it as simple as possible, then a backup component from the JED is the way to go.


I have only used one such component, and I found it had a few bugs that needed to be worked out; the backup, move, and clone I needed ended up taking more time than doing it manually.


There are shortcuts you can take here depending on your environment. For instance, you don’t strictly need to create archives at all, since you can pipe the mysqldump output directly into another mysql command (with the new database’s credentials). However, I prefer to use archives and actual files, especially with the PHP-based method, because otherwise you could end up accidentally accessing the cloner file and wiping an existing MySQL database (if you aren’t careful). So, on top of all this, I’d recommend removing the copy-me and update-me files after using them.
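For reference, the piped shortcut might look like the sketch below. It assumes the credentials for both servers come from a MySQL option file such as ~/.my.cnf, since a pipeline with two interactive -p password prompts is awkward to use, and copy_database is a hypothetical name. The same caution applies: by default mysqldump output drops and recreates each table, so the target’s tables are replaced as soon as this runs.

```shell
#!/bin/sh
# Hypothetical helper: copy one MySQL database into another in a single pipeline,
# with no intermediate dump file. Credentials are assumed to come from ~/.my.cnf.
# Arguments: old_host old_db new_host new_db
copy_database() {
    mysqldump -h "$1" "$2" | mysql -h "$3" "$4"
}

# Example (placeholder names from the steps above):
# copy_database yourMySQLHostname yourDatabaseName yourNewMySQLHost yourNewDatabase
```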

Cloud Hosting – Scaling Websites the Easy Way

One often has to make a choice when it comes to website hosting. You weigh the variables and decide on the best solution for your hosting needs. Cloud hosting makes this decision a WHOLE lot easier. Let’s break it down.

Price. You want to get the best deal possible, and shared hosting probably comes to mind first. In the classic sense, shared hosting means a company has a server and loads as many websites onto it as possible to make the most profit from that one server. Sometimes this means hundreds of websites on one box. One box… susceptible to the same physical hardware limitations as any other server. Sure, they might even include RAID, redundant power supplies, and a lot of disk space.

However, what happens when your website actually starts getting traffic? My company once put its trust in a shared hosting company (*cough* Dreamhost *cough*). One evening, one of our websites got a lot of visitors, and after battling to keep things running smoothly, the host ultimately disabled our website by renaming the index file to index.php_disabled_by_host. Seriously? So much for saving money and “unlimited” space and bandwidth… which brings me to my next point.

Scalability. If your website has outgrown shared hosting, what is your next move? Many people consider buying dedicated equipment, and a dedicated server is usually the first step. Not enough? Scaling out from there usually requires another dedicated server and a load balancer, and it just gets pricier with a dedicated database server, file servers, caching servers, and more to handle growing traffic and load. That’s a significant expense just to gain the ability to scale.

Scale My Site is the answer. The concept of a cloud host is that it takes the best of the scalable, dedicated world and lets you pay only for what you use. You put your website in the cloud, and your application is instantly scaled across multiple web servers. Your files are stored on a redundant SAN mirrored across many physical drives. Database queries are performed on powerful, multi-node database clusters. You don’t have to think about “how am I going to handle all of that traffic?” because it just happens automatically. You no longer have to think about “do I need a Windows or Linux based account?” It doesn’t matter: you can run ASP.NET applications side by side with PHP websites. The cloud doesn’t mind; it’s cool with whatever you want to do. I highly recommend checking out Ninja Systems, the cloud hosting company, if you are serious about scaling your website and don’t want to waste time recreating a scalable infrastructure that you then have to manage yourself.

How to Back Up Joomla! 1.5 to Amazon S3 with Jets3t

Introduction to backing up a Joomla website to Amazon S3 storage using Jets3t.

We all know backups are important. I’ve found what I consider a pretty good backup solution using Amazon S3. It’s super cheap, your backups are in a secure location, and you can get to them from anywhere. For my backup solution, I’m using Debian Linux (Etch), but this whole setup is not dependent on your current favorite flavor of Linux because it uses Java.

  1. Sign up for Amazon S3: http://aws.amazon.com/s3/
  2. Install the latest Java Runtime Environment: http://java.sun.com/javase/downloads/index.jsp
  3. Download Jets3t: http://jets3t.s3.amazonaws.com/downloads.html
  4. Extract the Jets3t installation to a location on your server. Example: /usr/local/jets3t/
  5. Add your AWS access key and secret key to the “synchronize” tool configuration file. Example: /usr/local/jets3t/configs/synchronize.properties
  6. Use an S3 browser tool like Firefox S3 Organizer to add two buckets: one for file backups and one for MySQL backups.
  7. Add a MySQL user whose primary function is dumping data. Let’s call it ‘dump’ with the password ‘dump’:
    [code lang="bash"]mysql>GRANT SELECT, LOCK TABLES ON exampleDB.* to 'dump' identified by 'dump';[/code]
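  8. Build your backup script, called s3backup.sh. The variable values below are placeholders (the directories, bucket names, and weekday naming scheme are hypothetical examples), so replace them with your own:
    [code lang="bash"]#!/bin/sh
    # Placeholder settings: adjust every value to match your server
    JAVA_HOME=/usr/local/j2re1.4.2_17
    JETS3T_HOME=/usr/local/jets3t
    export JAVA_HOME
    export JETS3T_HOME
    SYNC="${JETS3T_HOME}/bin/synchronize.sh"
    WWWROOT=/var/www/example.com
    WWWDUMPDIR=/tmp
    MYSQLDUMPDIR=/tmp
    WWWBUCKET=example-www-backups
    MYSQLBUCKET=example-mysql-backups
    # Name the files by weekday so each day's upload replaces last week's copy
    dayOfWeek=`date +%a`
    dumpWWW="www-${dayOfWeek}.tar.gz"
    dumpSQL="mysql-${dayOfWeek}.sql.gz"
    # Dump and compress the database
    mysqldump -u dump -pdump exampleDB | gzip > "${MYSQLDUMPDIR}/${dumpSQL}"
    # Compress the website into an archive
    cd "${WWWROOT}" || exit 1
    tar -czf "${WWWDUMPDIR}/${dumpWWW}" .
    # Upload each file with Jets3t synchronize, then remove the local copy
    $SYNC --quiet --nodelete UP "${WWWBUCKET}" "${WWWDUMPDIR}/${dumpWWW}"
    rm -f "${WWWDUMPDIR}/${dumpWWW}"
    $SYNC --quiet --nodelete UP "${MYSQLBUCKET}" "${MYSQLDUMPDIR}/${dumpSQL}"
    rm -f "${MYSQLDUMPDIR}/${dumpSQL}"[/code]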
  9. Make sure your script has execute permission
  10. Add a cron job to perform daily backups:
    [code lang="bash"]$>crontab -e
    0 0 * * * /root/s3backup.sh[/code]

That’s it. Good luck!

Tips on Load Balancing a Joomla Cluster with HAProxy

For the past several weeks, I have been working with Joomla in a clustered environment. We have a single load balancer running HAProxy that sends requests to two web servers kept in sync with Unison. One server is a hybrid that runs both the MySQL database and Apache2/PHP5; the other is strictly Apache2/PHP5. We are temporarily renting two super-fast dedicated servers until we acquire some new hardware, so I had to make do with the few servers I had.

Update: Almost a full year after writing this post, I have completely switched all of my Joomla websites to the automatically scaling cloud host Scale My Site. Since then, we haven’t had to deal with HAProxy, load balancing, or anything else related to scaling, thanks to the hosting cloud’s seamlessly clustered environment. I highly recommend that anyone reading this article check out cloud hosting to get load balancing and scaling for a Joomla website without breaking a sweat.

The load balancer is located at our own colo. I followed the tutorial “Setting Up A High-Availability Load Balancer (With Failover and Session Support) With HAProxy/Heartbeat On Debian Etch” to set up two servers at our colo in an active/passive configuration, using Heartbeat for redundancy.

Weighted Load Balancing

Since I’m using only two web servers and one needs to serve database requests, I decided to set weights in HAProxy so that the hybrid server receives half as many requests as the dedicated web server. Here is an example of what my haproxy.cfg file contains:


global
    log local0
    log local1 notice
    maxconn 4096
    user haproxy
    group haproxy

defaults
    log global
    mode http
    option httplog
    option dontlognull
    retries 3
    maxconn 2000
    contimeout 5000
    clitimeout 50000
    srvtimeout 50000

listen webfarm
    mode http
    balance roundrobin
    cookie SERVERID insert nocache indirect
    option forwardfor
    option httpchk HEAD /check.txt HTTP/1.0

    # Stats
    stats enable
    stats auth admin:password

    # Web Node
    server SBNode1 cookie Server1 weight 20 check
    # Web + MySQL Node
    server SBNode2 cookie Server2 weight 10 check

How to Use the Administrator Control Panel in a Joomla Cluster

As many people know, it’s a huge pain to work with the administrator control panel in a clustered Joomla environment. First, you’ll keep getting kicked out every few page requests, even with sticky/persistent load balancing. Second, working with back-end WYSIWYG rich-text editors is nearly impossible. I figured out a way around it, and here’s what I did.

  1. Decide upon the management node
  2. Give the management node a public host entry in DNS (e.g. node1.yourdomain.com)
  3. Open configuration.php for editing
  4. Locate the “live site” variable ($mosConfig_live_site)
  5. Replace its value with "http://" . $_SERVER["HTTP_HOST"];
  6. Save

Using the current host as the live site allows you to use node1.yourdomain.com as an access point for the control panel. You can work in the control panel without doing this, but you will run into tons of problems with rich-text editors and custom components that request the live site URL in their underlying code.

Update: Recently, I implemented a load balancing solution using HAProxy that used the ACL system to send all traffic with /administrator/ in the URL to one “master” node, and it provided a way around the Joomla configuration change mentioned above. Check out this blog post for more info.

XenServer 3.2.0: Upgrade Debian Linux from Sarge to Etch

If you are running XenServer 3.2.0, then you have a built-in Debian Sarge image. If you want to upgrade an instance to Debian Etch (the latest stable release as of February 1, 2008), follow these steps. It isn’t as simple as running apt-get dist-upgrade, as other websites may have you believe. The following steps summarize the commands I ran while following the official upgrade guide.

If you aren’t running a XenServer instance, I would suggest following the official guide yourself (Debian Etch Upgrade Guide) to avoid anything bad happening. The instance I used was a fresh install of the Sarge image, so I won’t cover any special circumstances that may need to be addressed by those who have installed a whole load of extras.

Let’s go!

Choose a Mirror
Go to Debian Mirror List and select a mirror. I happened to choose http://ftp.us.debian.org/debian as my mirror because I’m in the United States.

Update /etc/apt/sources.list with…

deb http://ftp.us.debian.org/debian etch main contrib
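If you’d rather edit an existing sources.list than retype it, a minimal sed sketch follows. It assumes your entries name the release as “sarge” (including security lines such as sarge/updates); switch_to_etch is a hypothetical helper that keeps a .bak copy of the original. Review the result before running aptitude.

```shell
#!/bin/sh
# Hypothetical helper: rewrite "sarge" release names to "etch" in an APT
# sources file, keeping the original as FILE.bak. Only matches "sarge" after
# whitespace, e.g. "... debian sarge main" or "... sarge/updates main".
switch_to_etch() {
    sed -i.bak -E 's#([[:space:]])sarge([/[:space:]]|$)#\1etch\2#g' "$1"
}

# Example (as root): switch_to_etch /etc/apt/sources.list
```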

Perform the Upgrade

rm /etc/apt/preferences
mount -o remount,rw /
aptitude update
aptitude upgrade
aptitude install initrd-tools
aptitude dist-upgrade
aptitude update

See? Not so hard. That is all I needed to do to upgrade my XenServer 3.2.0 Debian Sarge instance to Debian Etch. I am not saying these simplified steps will work for everyone, but for those few that have the same type of setup as we do, these instructions should simplify the upgrade process. Please comment with questions and/or suggestions… and if all else fails, use the official guide!

– Matt