Exclude Squid Cache from updatedb.mlocate

One of the Squid servers in a cluster I manage had an unusually high load, while the rest of the servers, handling roughly the same number of connections, stayed within acceptable load levels.

While reviewing the process list, I found that updatedb.mlocate was running during the busiest time of the day. It turns out it is enabled by default in cron.daily on Debian Lenny, and it was eating a noticeable amount of CPU.

It was attempting to index all the files added to the Squid cache. On these servers, Squid has its own partition at /mnt/squid/, and updatedb has no way of knowing that it shouldn’t track those files alongside everything else on the system. Luckily, there is an easy solution.

Exclude your Squid cache location via the “PRUNEPATHS” variable in /etc/updatedb.conf (in my case it was /mnt/squid)
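On Debian, the resulting line in /etc/updatedb.conf looks something like this (the other entries here are typical Debian defaults; yours may differ):

```text
# /etc/updatedb.conf — add the Squid cache directory to the prune list
PRUNEPATHS="/tmp /var/spool /media /mnt/squid"
```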

After adding my Squid cache location to PRUNEPATHS, the process dropped off the radar and server load returned to acceptable levels.

Using the ACL in HAProxy for Load Balancing Named Virtual Hosts

Until recently, I wasn’t aware of the ACL system in HAProxy, but once I found it I realized I had been missing out on a very important part of load balancing with HAProxy!

While the full set of ACL options is covered in the configuration doc, the example below includes the basics you’ll need to build an HAProxy load balancer that supports multiple host headers.

Here is a quick example haproxy configuration file that uses ACLs:

global
    log 127.0.0.1 local0
    log 127.0.0.1 local1 notice
    maxconn 4096
    user haproxy
    group haproxy
    daemon

defaults
    log global
    mode http
    option httplog
    option dontlognull
    retries 3
    option redispatch
    maxconn 2000
    contimeout 5000
    clitimeout 50000
    srvtimeout 50000

frontend http-in
    bind *:80
    acl is_www_example_com hdr_end(host) -i example.com
    acl is_www_domain_com hdr_end(host) -i domain.com
    
    use_backend www_example_com if is_www_example_com
    use_backend www_domain_com if is_www_domain_com
    default_backend www_example_com

backend www_example_com
    balance roundrobin
    cookie SERVERID insert nocache indirect
    option httpchk HEAD /check.txt HTTP/1.0
    option httpclose
    option forwardfor
    server Server1 10.1.1.1:80 cookie Server1
    server Server2 10.1.1.2:80 cookie Server2

backend www_domain_com
    balance roundrobin
    cookie SERVERID insert nocache indirect
    option httpchk HEAD /check.txt HTTP/1.0
    option httpclose
    option forwardfor
    server Server1 192.168.5.1:80 cookie Server1
    server Server2 192.168.5.2:80 cookie Server2

In HAProxy 1.3, ACL rules are placed in a “frontend” and, depending on the logic, the request is proxied through to one of any number of “backends”. You’ll notice in our frontend named “http-in” that I’m checking the Host header using the hdr_end feature, which performs a simple check on whether the Host header ends with the provided argument.

You can find the rest of the Layer 7 matching options by searching for “7.5.3. Matching at Layer 7” in the configuration doc I linked to above. A few of the options I didn’t use but you might find useful are path_beg, path_end, path_sub, path_reg, url_beg, url_end, url_sub, and url_reg. The *_reg commands allow you to perform RegEx matching on the url/path, but there is the usual performance consideration you need to make for RegEx (especially since this is a load balancer).

The first “use_backend” that matches a request will be used, and if none are matched, then HAProxy will use the “default_backend”. You can also combine ACL rules in the “use_backend” statements to match one or more rules. See the configuration doc for more helpful info.
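As a sketch of combining rules: listing several ACL names on one “use_backend” line means all of them must match (AND), while the “or” keyword gives OR logic. The is_static ACL and static_example_com backend below are hypothetical, added only to illustrate:

```text
# both ACLs must match (AND) for this backend to be chosen
acl is_static path_end -i .jpg .gif .css .js
use_backend static_example_com if is_www_example_com is_static

# "or" matches if either ACL is true
use_backend www_example_com if is_www_example_com or is_www_domain_com
```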

If you’re looking to use HAProxy with SSL, that requires a different approach, and I’ll blog about that soon.

How-to Backup Joomla! 1.5 to Amazon S3 with Jets3t

Introduction to backing up a Joomla website to Amazon S3 storage using Jets3t.

We all know backups are important. I’ve found what I consider a pretty good backup solution using Amazon S3. It’s super cheap, your backups are in a secure location, and you can get to them from anywhere. For my backup solution, I’m using Debian Linux (Etch), but this whole setup is not dependent on your current favorite flavor of Linux because it uses Java.

  1. Sign up for Amazon S3: http://aws.amazon.com/s3/
  2. Install the latest Java Runtime Environment: http://java.sun.com/javase/downloads/index.jsp
  3. Download Jets3t: http://jets3t.s3.amazonaws.com/downloads.html
  4. Extract the Jets3t archive to a location on your server. Example: /usr/local/jets3t/
  5. Add your AWS access key and secret key to the “synchronize” tool configuration file. Example: /usr/local/jets3t/configs/synchronize.properties
  6. Use an S3 browser tool like Firefox S3 Organizer to add two buckets: one for file backups and one for MySQL backups.
  7. Add a MySQL user whose primary function is dumping data. Let’s call it ‘dump’ with the password ‘dump’:
    [code lang="sql"]mysql> GRANT SELECT, LOCK TABLES ON exampleDB.* TO 'dump'@'localhost' IDENTIFIED BY 'dump';[/code]
  8. Build your backup script (replace paths with your own) called s3backup.sh:
    [code lang="bash"]#!/bin/bash
    JAVA_HOME=/usr/local/j2re1.4.2_17
    export JAVA_HOME
    JETS3T_HOME=/usr/local/jets3t
    export JETS3T_HOME
    SYNC=/usr/local/jets3t/bin/synchronize.sh
    WWWROOT=/var/www/fakeuser/
    MYSQLBUCKET=example-bucket-mysql
    WWWBUCKET=example-bucket-www
    MYSQLDUMPDIR=/usr/local/mysql-dumps
    WWWDUMPDIR=/usr/local/www-dumps
    # Name the dumps after the day of the week so they rotate weekly
    dayOfWeek=$(date +%a)
    dumpSQL="backup-www-example-com-${dayOfWeek}.sql.gz"
    dumpWWW="backup-www-example-com-${dayOfWeek}.tar.gz"
    # Dump the database
    mysqldump -u dump -pdump exampleDB | gzip > "${MYSQLDUMPDIR}/${dumpSQL}"
    # Compress the website into an archive
    cd ${WWWROOT}
    tar -czf "${WWWDUMPDIR}/${dumpWWW}" .
    # Perform Jets3t synchronize with Amazon S3
    $SYNC --quiet --nodelete UP "${WWWBUCKET}" "${WWWDUMPDIR}/${dumpWWW}"
    rm -f "${WWWDUMPDIR}/${dumpWWW}"
    $SYNC --quiet --nodelete UP "${MYSQLBUCKET}" "${MYSQLDUMPDIR}/${dumpSQL}"
    rm -f "${MYSQLDUMPDIR}/${dumpSQL}"[/code]
  9. Make sure your script has execute permission (e.g. chmod +x /root/s3backup.sh)
  10. Add a cron job to perform daily backups:
    [code lang="bash"]crontab -e
    0 0 * * * /root/s3backup.sh[/code]
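One detail worth calling out: naming the dumps after `date +%a` gives you a rolling seven days of backups, since each weekday’s file is simply overwritten the following week. A quick sketch of the naming:

```shell
#!/bin/sh
# The backup filename cycles through seven values, one per weekday
dayOfWeek=$(date +%a)                                # e.g. Mon, Tue, ...
dumpSQL="backup-www-example-com-${dayOfWeek}.sql.gz"
echo "${dumpSQL}"
```

If you need longer retention, switch the suffix to something like `date +%d` (rolling month) instead.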

That’s it. Good luck!

Tips on Load Balancing a Joomla Cluster with HAProxy

For the past several weeks, I have been working with Joomla in a clustered environment. We have a single load-balancer running HAProxy that sends requests to two web servers synchronized with unison. One server is a hybrid and includes both the MySQL database as well as Apache2/PHP5. The other web server is strictly Apache2/PHP5. We have been renting two super fast dedicated servers temporarily until we acquire some new hardware, so I had to make do with what few servers I had.

Update: Having written this blog post almost a full year ago, I have since then completely switched all of my Joomla websites to the automatically scaling website cloud host: Scale My Site. Since doing so, we haven’t had to deal with HAProxy, load balancing, or anything with regard to scaling due to the hosting cloud’s seamlessly clustered environment. I highly recommend anyone reading this article right now to check out cloud hosting to get load balancing/scaling for your Joomla website without breaking a sweat.

The load balancer is located at our own colo. I followed the tutorial on Setting Up A High-Availability Load Balancer (With Failover and Session Support) With HAProxy/Heartbeat On Debian Etch to set up two servers at our colo in an Active/Passive configuration using Heartbeat for redundancy.

Weighted Load Balancing

Since I’m using only two web servers and one needs to serve database requests, I decided to set weights in HAProxy so that the hybrid server receives half as many requests as the dedicated web server. Here is an example of what my haproxy.cfg file contains:

/etc/haproxy.cfg

global
    log 127.0.0.1 local0
    log 127.0.0.1 local1 notice
    maxconn 4096
    user haproxy
    group haproxy

defaults
    log global
    mode http
    option httplog
    option dontlognull
    retries 3
    option redispatch
    maxconn 2000
    contimeout 5000
    clitimeout 50000
    srvtimeout 50000

listen webfarm 63.123.123.100:80
    mode http
    balance roundrobin
    cookie SERVERID insert nocache indirect
    option forwardfor
    option httpchk HEAD /check.txt HTTP/1.0

    # Stats
    stats enable
    stats auth admin:password

    # Web Node
    server SBNode1 63.123.123.101:80 cookie Server1 weight 20 check
    # Web + MySQL Node
    server SBNode2 63.123.123.102:80 cookie Server2 weight 10 check
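With weights 20 and 10, SBNode1 should receive roughly two-thirds of the requests, since each server’s share under weighted round-robin is its weight divided by the sum of all weights. A quick sanity check of the arithmetic:

```shell
#!/bin/sh
# Expected request share (percent) under weighted round-robin
w1=20  # SBNode1: web only
w2=10  # SBNode2: web + MySQL
total=$((w1 + w2))
echo "SBNode1: $((100 * w1 / total))%"   # prints SBNode1: 66%
echo "SBNode2: $((100 * w2 / total))%"   # prints SBNode2: 33%
```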

How to Use the Administrator Control Panel in a Joomla Cluster

Working with the administrator control panel in a clustered Joomla environment is a real pain. First, you keep getting kicked out every few page requests, even with sticky/persistent load balancing. Second, working with backend WYSIWYG rich-text editors is nearly impossible. I figured out how to get around it, and here’s what I did.

  1. Decide upon the management node
  2. Give the management node a public host entry in DNS (e.g. node1.yourdomain.com)
  3. Open configuration.php for editing
  4. Locate the “live site” variable ($mosConfig_live_site)
  5. Replace with "http://" . $_SERVER["HTTP_HOST"];
  6. Save
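The resulting line in configuration.php ($mosConfig_live_site is the Joomla 1.0-era variable name; adjust for your version) looks like:

```text
// configuration.php: build the live site URL from the requested host,
// so each cluster node answers correctly under its own hostname
$mosConfig_live_site = "http://" . $_SERVER["HTTP_HOST"];
```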

Using the current host as the live site allows you to use node1.yourdomain.com as an access point for the control panel. You can work in the control panel without doing this, but you will run into tons of problems with rich-text editors and custom components that request the live site URL in their underlying code.

Update: Recently, I implemented a load balancing solution using HAProxy that used the ACL system to send all traffic with /administrator/ in the URL to one “master” node, and it provided a way around the Joomla configuration change mentioned above. Check out this blog post for more info.
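In haproxy.cfg, that ACL approach looks something like the following sketch (the backend names here are hypothetical):

```text
frontend http-in
    bind *:80
    # send all Joomla administrator traffic to a single master node
    acl is_admin path_beg -i /administrator
    use_backend master_node if is_admin
    default_backend webfarm
```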

XenServer 3.2.0: Upgrade Debian Linux from Sarge to Etch

If you are running XenServer 3.2.0, you have a built-in Debian Sarge image. If you want to upgrade an instance to Debian Etch (the latest stable release as of February 1, 2008), follow these steps. It isn’t the simple apt-get dist-upgrade command that other websites may have you believe. The following steps summarize the commands I ran while following the official upgrade guide.

If you aren’t running a XenServer instance, I suggest following the official guide yourself to prevent anything bad from happening (Debian Etch Upgrade Guide). The instance I used was a fresh install of the Sarge image, so I won’t be going into any special circumstances that may need to be addressed by those who have installed a whole load of extras.

Let’s go!

Choose a Mirror
Go to Debian Mirror List and select a mirror. I happened to choose http://ftp.us.debian.org/debian as my mirror because I’m in the United States.

Update /etc/apt/sources.list with…

deb http://ftp.us.debian.org/debian etch main contrib
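The official upgrade guide also recommends pulling in the security archive, so your sources.list most likely wants a second line as well:

```text
deb http://ftp.us.debian.org/debian etch main contrib
deb http://security.debian.org/ etch/updates main contrib
```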

Perform the Upgrade

rm /etc/apt/preferences
mount -o remount,rw /
aptitude update
aptitude upgrade
aptitude install initrd-tools
aptitude dist-upgrade
aptitude update

See? Not so hard. That is all I needed to do to upgrade my XenServer 3.2.0 Debian Sarge instance to Debian Etch. I am not saying these simplified steps will work for everyone, but for those few that have the same type of setup as we do, these instructions should simplify the upgrade process. Please comment with questions and/or suggestions… and if all else fails, use the official guide!

– Matt