BUILDING A CLUSTER OF LOAD-BALANCED WEB SERVERS

The purpose of this article is to explain how to build a cluster of load-balanced web servers using the 'nix distribution of your choice. The cluster I built was a personal project, built at home on a single cable modem connection with a dynamic IP address using discarded equipment. My objective was no more than to be the first kid on my block to have load-balanced servers running in his basement. The resulting complex serves a single copy of the html using a single cgi directory, both of which are remotely maintainable so that my son and his friends can work on their sites from any of their houses. The design and techniques are scalable to larger and better-equipped commercial installations such as you may be considering. The article will show how my cluster was built and tested in phases and explain some of the puzzles which I encountered along the way.

OVERVIEW

First we need to understand what we are going to build. We are going to expand a site's web server capacity on a single connection by processing visitors in parallel rather than by distributing the load over multiple connections on different IP addresses. The latter can be accomplished via round-robin DNS (Domain Name System) though it could also be accomplished via the redirection technique used here. In our scheme, the visitor will arrive at the default port 80 at the site's single IP address and then he will be forwarded by a screening router to the primary web server. The primary web server will respond with a page which will take the visitor to another port on the same IP address. When the visitor returns on the new port, the screening router will forward him to the secondary web server which is listening on that port. With thousands of free port numbers, secondary servers can be added until either the primary server on port 80 is saturated, the bandwidth of the connection or the screening router is exceeded, or the heat dissipation from the servers becomes too intense. In other words, the web serving process becomes scalable.

Other than the increased capacity and the fact that the servers help our furnace heat the house in the winter, what other benefits can be derived from a scheme like this? The primary web server could look for a cookie on the browser and use it to send the visitor to the secondary server which he had visited before. If you have multiple internet connections from different ISPs, the visitor could be sent to the server attached to the connection which has the least latency to his subnet. This article includes scripts which allow the primary web server to determine which secondary servers are available (failover) and scripts which allow the primary server to determine the secondary server with the least load at the moment (balancing). The downside of this scheme can be that an intelligent primary server has to do some work before it can hand the visitor off to a secondary server. Two techniques will be shown which allow this work to be offloaded to other machines.

Synchronizing content (HTML) between web servers is boring, so we are not going to keep the content on the web servers. To do so would force the authors to store it in three places and force us to have our cgi data such as hit counters spread over three machines. The complex which we can build will have a central location for html, cgi code, and cgi data which will be securely maintainable from the internet. Does this cause a bottleneck between the content server and the web servers? I do not have the bandwidth to drive my complex hard enough to find out, but as long as the web servers are performing some sort of transform on the data, either through server side includes or a full blown PHP implementation, they should be busier with their work than the content server would be with its file serving. If the content is served as stored but follows the 80/20 rule, I would hope that the frequently retrieved 20% of the pages would be resident in the buffers of the web servers.

If sharing the content via NFS does not seem appropriate to you, use rdist instead to push the files out from a single repository. Though all of my servers are in my basement, yours could be physically dispersed so long as you can secure the communications between them. In all cases the redirection to a secondary server will be binding so that the visitor will remain on a single secondary web server. This supports a PHP/MySQL type of environment where the visitor would be tracked using a session key stored in a local database on the secondary server thereby distributing this load while less dynamic global information such as inventory levels, prices, and orders would be stored in a centralized database on the content server.

PLANNING AND SKILLSET

The server farm which you build can be as large or expensive as you want it to be, but all you really need is a screening router with port forwarding and four 486 or faster processors with 32 megabytes or more of ram, a nic, a patch cable and about a gigabyte of disk each. A more capable cluster can be built by making each of the three web servers dual-homed with 2 nic cards in each so that you can create an internal subnet for intra-server communication. One of the servers will be used as a common repository for site content, one will be the primary web server and two will be secondary web servers. One public IP address is sufficient but more can be used if they are available to you. If you have a public address for each web server, you will not need the screening router. Some possible topologies are shown in Figures 1, 2, 3, 4, and 5.

In order to build this cluster, you need to have some skills not covered by this article. You need to be able to install and apply security patches to the 'nix distribution of your choice on the hardware you have obtained. If you are brave enough to choose topology 5 (running naked on the internet), you will need to know how to harden your 3 web servers for exposure to the internet, how to set up one of the web servers to perform as a firewall for the content server and how to protect the open NFS client and NTP ports on the web servers from the internet. If you choose to clothe yourself with a screening router (recommended), you need to be able to configure it to port-forward so that web connections on specific ports are mapped to the same ports on the intended machines on your internal network.

If you choose topologies 2, 3 or 4 where the content server cannot be seen from the screening router, you will need to know how to enter a static route into the router so that the router will know to which host to hand off incoming packets which are intended for the content server. If you choose any topology other than the first, you will need to know how to configure your machines as dual-homed hosts and to allow one of them to forward packets to the content server. In any case, you will need to know how to administer services using either Sys-V scripts or rc.conf and know how to install ports, packages or RPM's such as S/Key, NTPD and Apache and how to set up server side includes in Apache. Advice on how to do these things is readily available on the internet and in the RTFM man pages.

PORT FORWARDING

Port forwarding is the method by which we are going to use the screening router as a relay to our internal machines. Eventually we will have seven paths into our network from the internet which are shown in Table 1. The packets from the internet will have their destination IP address changed to that of the server on the internal network and be presented to the internal server on the same port. The server's reply will be sent back to the requestor on the web as if it had come from the screening router which is all that your visitor can see. This process is formally known as Network Address Translation or NAT. Some routers can port map whereby they would change the port number as they forward the packets so that messages on the non-standard incoming http ports would be translated to the normal port 80 before they arrive at the secondary web servers, but this is not necessary.

Since you will be serving a web site, you might want to obtain a domain name and DNS services for it. Because I have a dynamic connection, I use the free services of www.dyndns.org to associate a hostname with my variable external IP address. If you have a static address which you are using for non-commercial purposes, I recommend the free services of http://soa.granitcanyon.com to provide DNS for your domain. If you are a commercial user, your company can obtain these services from their ISP. If you do not use a domain name, you can test your site by merely entering http:// followed by your external IP address from a machine which is outside of your internal network.

In all topologies other than the infamous number 5, the separate machines which comprise the cluster are mapped so that they appear to serve from the external IP address which is seen from the web. You will not be able to see how your site ends up looking from your own network. If you enter the external IP address, you will reach the router which may be very confused since it will see you as internal traffic. If you try port 80 on the internal address of your primary webserver, your browser will be sent to a different port on the same IP address by the redirection script. To actually reach the destination to which you are being sent, you will need to change the IP address yourself to that of the secondary server which serves on that port. To avoid this confusion, you may want to use a dial-up connection (an AOL CD?), your neighbor's connection (you are testing a what?), or your local library to make sure that things are working properly. Another alternative is to disconnect your router from the internet (GASP) and then use a cross-over cable to connect it to a spare PC with a browser which will then simulate a visitor from the internet.

STEP 1 - THE WEB SERVERS

Enough already, let's get started building. We are going to construct this beast in three steps: first we will get Apache and load balancing working on the web servers; then we will centralize the content; and finally we will make the content securely maintainable from anywhere on the internet. The first thing you need to do is to sketch your topology out on a napkin (or a more expensive design tool) and assign IP addresses to each nic card in your machines. Then, use your design document to set up your screening router if you have one. Send external queries on port 80 to the primary web server, queries on port 8086 to secondary web server 1 and queries on port 8087 to secondary web server 2. While you are logged into your router, look up your external IP address and write it down on a Post-It. Remember, if it is not static, it may change on you over the course of this project.

After you have planned, next build your hardware including the dual nic cards on selected machines and then install the distro of your choice on your 3 web servers. Resolve any issues with the dual nic cards and then install Apache and Lynx on them. Start Apache and set it up to start on boot. Use Lynx to browse to http://localhost and make sure that you see the default Apache splash screen. After you are satisfied with your initial installation, connect the external nic card to your router and then make the following changes to your Apache configuration file, httpd.conf:

  1. On the secondary web servers only, change the port to which they listen by adding or changing the port directive in httpd.conf for secondary web server 1 to 'Port 8086' and adding or changing the port directive for secondary web server 2 to 'Port 8087'. Look for either a line which says 'Port 80' and change it or uncomment the port directive which is commented out and waiting there for you.

  2. On all of the web servers, enable server side includes and make note of the values of the following parameters:

    1. DocumentRoot - this is where your html documents are. Mine were in /var/www/htdocs and I will assume that yours are the same in the configurations below. If yours is different, change my configuration files to match your installation.

    2. ScriptAlias - the second operand is your cgi-bin directory. Mine was /var/www/cgi-bin and I will assume that yours are the same in the configurations below. If yours is different, change my configuration files to match your installation.

    3. User and Group which set the credentials used to serve the pages. My user was www and my group was apache. These appear as file owners in Table 2.

Restart Apache using apachectl restart or the equivalent kill -HUP command to make your changes effective.
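
The Port change in item 1 above can also be scripted rather than made by hand. The following is only a minimal sketch: the function name set_apache_port is made up for illustration, and it assumes the Apache 1.3 style Port directive described above (later Apache releases use Listen instead). The location of httpd.conf depends on your distribution.

```shell
#!/bin/sh
# Hypothetical helper (not from the article's listings):
# set the Port directive in an Apache 1.3 style httpd.conf.
# Usage: set_apache_port /path/to/httpd.conf 8086
set_apache_port() {
    conf=$1
    port=$2
    tmp="$conf.new"
    if grep -q '^[[:space:]]*Port[[:space:]]' "$conf"; then
        # An active Port line exists: rewrite it
        sed "s/^[[:space:]]*Port[[:space:]].*/Port $port/" "$conf" > "$tmp"
    else
        # No active Port line: append one
        { cat "$conf"; echo "Port $port"; } > "$tmp"
    fi
    # Copy the result back so the original file keeps its owner and mode
    cat "$tmp" > "$conf" && rm -f "$tmp"
}
```

On secondary web server 1 this would be called with 8086 and on secondary web server 2 with 8087, followed by the Apache restart described above.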

Now cd to the DocumentRoot directory which holds the html documents on your web servers and make the following changes:

  1. On all of the web servers, copy index.html to splash.html using the -p option to preserve the permissions on the file.

  2. On the primary webserver only, add the html and shtml documents which appear in Listings 1, 2, 3, 4, 5 and 6.

  3. Make sure that Apache can read these pages by running chmod 644 * from the DocumentRoot directory.

Next add the check scripts shown in Listings 7 and 8 to the cgi-bin directory on your primary web server and make sure that Apache can run them by running chmod 755 check* from the cgi-bin directory. These scripts are each used in two places. The first web page of each type runs the script as it is served and uses the data which is directly echoed from it. The second version of each type uses the output which the script has echoed to a file. Notice that the final choice of destination is made by running JavaScript on the visitor's machine. I look upon this as a way to offload processing load onto the best kind of machines, ones which are not mine. Purists may wish to have the shell script issue a redirect to the visitor's browser to send it to the appropriate destination. Also please note that the scripts and web pages will need to be reworked if you are using topology 4 as the IP address changes instead of the port.
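
For the purists, a server-side redirect from a shell CGI might look something like the sketch below. This is not one of the article's listings. The function names are invented, the even/odd choice is just one cheap way to alternate between the two secondary ports used in this article, and HTTP_HOST is the standard CGI variable carrying the host name the browser asked for.

```shell
#!/bin/sh
# Hypothetical CGI sketch: choose a secondary server and redirect the browser.
# Ports 8086 and 8087 are the ones used in this article.

# Choose a port from a number: even -> 8086, odd -> 8087.
pick_port() {
    if [ $(( $1 % 2 )) -eq 0 ]; then
        echo 8086
    else
        echo 8087
    fi
}

# Print the CGI response: a Location header followed by a blank line.
emit_redirect() {
    host=$1
    port=$2
    page=$3
    printf 'Location: http://%s:%s/%s\n\n' "$host" "$port" "$page"
}

# Installed as a CGI, the script would end with something like the line
# below; $$ (the process id) serves as a cheap source of variation.
# emit_redirect "${HTTP_HOST%%:*}" "$(pick_port $$)" splash.html
```

The trade-off is the one already noted: the choice is now made on the primary web server instead of on the visitor's machine.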

Now fire up your external connection and test your router configuration by browsing to splash.html on ports 80, 8086 and 8087 of your external IP address by name or by number. The URL will be in the form of http://192.168.0.1:8086/splash.html, except that your external IP address or fully qualified domain name should be used. You should see 3 Apache splash screens with different port numbers at the bottom. If not, check your router setup. Assuming that this works, navigate to your external IP address only without a port or document name. This will hit port 80 on the screening router which will be forwarded to port 80 on your primary web server and open document index.html because this is the default document. If your browser has scripting enabled, index.html will send it to the Apache splash page on one of your secondary web servers. Pump on the refresh button for a while. You should be bounced from secondary server to secondary server like a Ping-Pong ball. If scripting is not enabled, you will be served the Apache splash screen by your primary web server. You are successfully distributing load between your servers.
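
If you would rather not type the three test URLs by hand, they can be generated with a few lines of shell. This is only a convenience sketch; test_urls is an invented name and EXTERNAL stands in for your own external address or host name.

```shell
#!/bin/sh
# Print the three splash.html test URLs for an external host name or address.
# The ports are the ones forwarded by the screening router in this article.
test_urls() {
    for port in 80 8086 8087; do
        echo "http://$1:$port/splash.html"
    done
}

# Example, run from a machine outside your network
# (EXTERNAL is a placeholder for your own address):
# for url in $(test_urls EXTERNAL); do lynx -dump "$url" | tail -n 3; done
```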

After you are bored with watching the port numbers change, navigate to the failover1.shtml document on port 80. It should work in the same manner but perhaps not quite as quickly. If it does not work, there is either a typo in failover1.shtml or in the checkping script or a problem with the ability of Apache to perform server side includes on your installation. You can browse to ohno.shtml on port 80 to look at the end of your Apache error log to see what is going wrong or tail the log directly. If failover1.shtml works, failover2.shtml, balance1.shtml and balance2.shtml should work too, if tried in that order. Notice that balance2.shtml serves faster than balance1.shtml but always sends you to the same secondary web server. That is because it merely includes the file previously created by checksite when it was run by balance1.shtml and returns that stale information to the client. After we have the content server running in the next step, we will make it run the check scripts periodically in order to keep their output files updated. This will allow failover2.shtml and balance2.shtml to distribute the load quickly and evenly over the available secondary servers without placing a load on the primary web server.
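
The balancing decision itself is small once the load figures are in hand. The sketch below is not the checksite script from Listing 8; it only illustrates the selection step, assuming the loads have already been gathered somehow as lines of "port load". The function name and the sample figures are invented.

```shell
#!/bin/sh
# Hypothetical sketch of the balancing decision (not the author's checksite).
# Reads lines of "port load" on stdin and prints the port with the lowest
# load. Gathering the load figures is deliberately left out.
least_loaded() {
    sort -k2,2n | head -n 1 | awk '{ print $1 }'
}

# Example with made-up numbers:
# printf '8086 0.42\n8087 0.17\n' | least_loaded
```

A failover check fits the same shape: leave out the lines for servers which did not answer before picking the winner.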

STEP 2 - THE CONTENT SERVER

Now that you are able to split the load, let's get rid of the bothersome task of synchronizing content between the web servers and consolidate our cgi directories so that we can have one hit counter updated by our secondary servers rather than two. Install the distro of your choice (again) on the content server. As you install, create an additional mount point to hold the shared content for your web site(s). I called mine /pool and will refer to yours as such in the configurations below. If you are using topology 4, deal with any issues which surround the creation of a triple-homed host and remember to use cross-over cables.

Next create the group and user which Apache uses to serve pages on your web servers on the new machine. Look for the line for the group in /etc/group on one of your web servers and then add the same line to /etc/group on the content server. Run vipw on one of your web servers to see the user entry for the Apache user and very carefully add the same line on the content server. This approach assumes that you are using the same distribution on all of your machines so that the same uid's and gid's will be present with the same meaning on all. This is essential so that NFS will not have to map users across machines, a frustrating process at best. If, like me, you used dissimilar distributions, create a new user and a new group on each of the machines using the same uid and gid for each and change httpd.conf to make Apache use these credentials instead of the default user and group. When you are done, install Lynx, plug the content server into your network, and verify that you can browse splash.html on the appropriate ports on your other boxen.
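
A quick way to confirm that the Apache user has the same uid and gid everywhere is to print them on each machine and compare. The helper below is a sketch with an invented name; it reads passwd-format text, in which the third and fourth colon-separated fields are the uid and gid.

```shell
#!/bin/sh
# Print "uid:gid" for a user, reading passwd-format text from stdin.
# ids_for is a made-up name; www is the Apache user from this article.
ids_for() {
    awk -F: -v u="$1" '$1 == u { print $3 ":" $4 }'
}

# Example, to be run on each machine and compared by eye:
# ids_for www < /etc/passwd
```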

Now install and set up anonymous FTP on your content server with write privileges to an incoming directory using the standard instructions for your distro. Use the special mount point (/pool) which you created above for the ftp user's home directory. Make sure that nothing is forwarding the FTP ports from the internet to your content server before you do this so that only your documents get published! Later we will open and secure this path to your content, but right now it is important that it does not exist. From the DocumentRoot directory on the primary web server, ftp to the content server and create an htdocs directory under the incoming directory which is under /pool, the ftp root. Use cd to go to this directory and mput all of the html and shtml to it. Quit and then cd to the cgi-bin directory on the primary web server, ftp to the content server again, create a cgi-bin directory under /pool/incoming and then populate it with the two check scripts.

Go back to the content server and add the wrapper script shown in Listing 9 to the /pool/incoming/cgi-bin directory. Then use chmod to set the permissions of your new htdocs directory and all of its contents to 744 and to set your cgi-bin directory and its contents to 750. Use chown to put the files in both directories into the group which runs Apache and make the owner root. This is key to allow Apache group access to these directories while denying it owner-level access. Now start the wrapper script in the background and add it to rc.local so that it will start on boot. By running the check scripts, it will create pingfile and sitefile in the cgi-bin directory. Run chown www:apache *file from the cgi-bin directory to ensure that the apache user will be able to read and update them also. Notice that anonymous ftp users who have neither group nor owner credentials will be able to see the HTML (which does no harm) but are unable to view the contents of the cgi-bin directory which should be hidden from them. Do an ls -l on both directories under /pool/incoming to check that the file owners and groups are being resolved and that the permissions are acceptable.
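
For readers who want a feel for what a wrapper of this kind does before turning to Listing 9, here is a rough sketch. It is not the author's script. The function name is invented, and the 40 second interval is the figure quoted in Step 3 of this article.

```shell
#!/bin/sh
# Hypothetical sketch of a wrapper (the author's version is Listing 9).
# run_checks runs every executable check* script in a directory once,
# from inside that directory so that output files land beside the scripts.
run_checks() {
    dir=$1
    for script in "$dir"/check*; do
        # Skip an unmatched pattern and anything that is not executable
        [ -f "$script" ] && [ -x "$script" ] || continue
        ( cd "$dir" && "./$(basename "$script")" ) > /dev/null 2>&1
    done
}

# The wrapper proper would then be an endless loop such as:
# while :; do run_checks /pool/incoming/cgi-bin; sleep 40; done
```

Keeping the loop separate from the function makes it possible to try a single pass by hand before committing the script to rc.local.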

While you are logged onto the content server, also complete your NFS server setup by doing the following:

  1. Making the following entry in /etc/exports after replacing x with the numbers which are right for your topology and changing the apache user and group if needed:
            /pool -alldirs -mapall=www:apache 192.168.x.2 192.168.x.3 192.168.x.4
    
  2. Setting up NFS to start on boot with the appropriate flags (usually -r). Change the number of NFS clients allowed if you have more servers than I do.

  3. Dealing with any other NFS server security issues particular to your distribution.

Manually start NFS to see if it comes up but remember that you will need a client to really test it.

To effectively run NFS, it is important to make all of your machines agree on the time. You can run timed on your internal network to let the band of clock chips agree on the proper time (kind of like suits in a meeting) or you can use a public NTP server and make one of your web servers into its slave. Pick a web server and make your screening router forward the NTP ports (37 and 123) to it. Install NTPD on that server, configure it to start on boot, and read about ntpd at http://www.ntp.org. Use the public server page referenced there to find a public stratum 2 NTP server and create the /etc/ntpd.conf configuration shown in Listing 10. If you are using topology 1, you will need to adjust the file since your server only has one address. Also touch /etc/ntp.drift and /var/log/ntp.log to create them and then reboot (yes, reboot) your server.

The reboot allows NTPD to make friends with the kernel so that it will be able to adjust the time. It will also start mountd and NFS properly so that you will not get an RPC: Program unregistered error. Read the man page on ntpq and then use it to check that you are indeed the slave of your chosen master on the internet. If so, you are now the proud owner of a stratum 3 time server. Repeat the process including the touch and reboot on the other 3 boxen using the /etc/ntpd.conf file shown in Listing 11 to make them stratum 4 servers synced to your new stratum 3 master.

Now that all of your machines agree on the time, visit your web servers one by one and do the following:

  1. Add the following lines to /etc/fstab (change the IP address for topology 4):
             192.168.1.5:/pool/incoming/htdocs  /var/www/htdocs  nfs ro,-i,-s,-x60,-b 0 0
    
             192.168.1.5:/pool/incoming/cgi-bin /var/www/cgi-bin nfs rw,-i,-s,-x60,-b 0 0
    
  2. Perform the NFS mounts from the data in fstab by running
        mount /var/www/htdocs
    
    and
        mount /var/www/cgi-bin
    
    from the command line. These will be automatic at boot.

  3. Check that the mount commands worked by running df. The following lines should appear at the bottom of the listing:

             Filesystem                       512-blocks  Used   Avail Cap Mounted on
               < local filesystems listed here >
             192.168.1.5:/pool/incoming/htdocs   3593564  1680 3304400  0% /var/www/htdocs
             192.168.1.5:/pool/incoming/cgi-bin  3593564  1680 3304400  0% /var/www/cgi-bin
    
  4. If they mounted, ls -lt your cgi-bin directory and look at the dates last changed on the sitefile and the pingfile. They should be very current since the script on your content server should be refreshing them. Check that the user and group names resolve and that the permissions are appropriate. Then cd ../htdocs and check them there too, just like you did on the content server. All files in these directories need to belong to the group which is used by Apache to serve the pages. If your mounts did not work, play with /etc/fstab as you umount and mount until they do.
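
The df check in item 3 can be automated with a small filter. This is a sketch only: is_nfs_mounted is an invented name, and it relies on df printing the mount point as the last column and a host:/path source as the first, as in the sample output above.

```shell
#!/bin/sh
# Succeeds if the given mount point appears in df output (read from stdin)
# with a remote "host:/path" source in the first column.
is_nfs_mounted() {
    awk -v mp="$1" '$NF == mp && $1 ~ /:\// { found = 1 } END { exit !found }'
}

# Example on a web server:
# df | is_nfs_mounted /var/www/htdocs  && echo "htdocs is mounted"
# df | is_nfs_mounted /var/www/cgi-bin && echo "cgi-bin is mounted"
```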

After you are all done with all of the web servers, go to your external connection (your neighbor's house) and try your site again. Does it still work? Does balance2.shtml now send you to both of your secondary servers? Call home and ask somebody to halt one of your secondary servers. After it comes down, retry the site and verify that you always get sent to the survivor.

By now I am sure that you have developed some opinions about these balancing pages. Go back home, log onto the primary web server and change the DirectoryIndex directive in httpd.conf to specify your favorite script. Run apachectl restart from the command line of the primary web server to make your choice effective. Now this will become the first page served to visitors who do not explicitly specify another document as their initial page.

Unless you changed the javascripts in the pages, the visitors are being sent to splash.html. Change this page on the content server to thank your neighbor for his help and then call him up and ask him to browse your site. He should see your new page and chuckle. You are now at the point where you can develop content for your servers, put it onto a diskette, mount that diskette on the content server and copy the files to the appropriate directory with the appropriate permissions.

STEP 3 - REMOTE ACCESS

We are now going to finish the project by allowing you to change the content of your site(s) when you are not physically there. There are inherently three levels of authorization for the content of our site: anonymous FTP users will be able to read any file based on the 'world' level of authorization (a mode ending in 4); users in the same group as the Apache user (see the set-up in Step 2, above) will be able to read, write and execute depending upon the 'group' level of authorization; and root will be able to read and write the pages in the /pool/incoming/htdocs directory of the content server and the code and data in the /pool/incoming/cgi-bin directory based on the 'owner' level of authorization. In this step we are going to establish users who will be authorized to remotely maintain these files in place of root.

These users will be set up to ftp files to and from the content server from the internet. As FTP users, they will not have their .profiles executed to initialize their umasks and so they will use the system default unless we override it. The system default umask of 022 is fine for the htdocs directory, which can contain world-readable files and directories which are only writeable by the owner. It will not work for the cgi-bin directory, which must contain group-writeable subdirectories and contents which must not be world-readable. To set up a special umask to support this directory, read man login.conf and then add the following lines to /etc/login.conf after the definition of the default user to create a new login class:

    cgi:\
        :umask=007:\
        :tc=default:

Make the class effective by running cap_mkdb /etc/login.conf and then add the following 2 users without creating their skeleton files:

  1. A user called data which would have the default login class, be a member of the same group as the Apache user which you added in step 2, above, and have a home directory of /pool/incoming/htdocs. Files created by this user will have a mode of 644 and directories will have a mode of 755.

  2. A user called code which would have the new cgi login class, be a member of the same group as the Apache user, and have a home directory of /pool/incoming/cgi-bin. Files created by this user should have a mode of 660 and directories will have a mode of 770.
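
The modes quoted for these two users follow directly from the umask: a new file is requested with mode 666 and a new directory with mode 777, and the umask clears bits from that request. A few lines of shell arithmetic show the calculation; mode_for is just an illustrative name.

```shell
#!/bin/sh
# Mode that results from a requested mode and a umask (both octal).
# The umask bits are cleared from the requested mode.
mode_for() {
    # $1 = requested mode, $2 = umask; the leading 0 marks them as octal
    printf '%o\n' $(( 0$1 & ~0$2 ))
}

mode_for 666 022   # file created by user data (default class)
mode_for 777 022   # directory created by user data
mode_for 666 007   # file created by user code (cgi class)
mode_for 777 007   # directory created by user code
```

The four results are 644, 755, 660 and 770, matching the modes given for the data and code users above.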

We will look at how to make new cgi files executable in a bit, but first log onto your content server as root. Go through the cgi-bin and htdocs directories and their contents using chown and chmod to alter the permissions and ownerships to match those shown in Table 2. Then add the two users to /etc/ftpchroot and use them to establish an ftp session from your primary web server. Check that the id's work as intended by using each to put a file and mkdir a directory and then perform an ls to check the new credentials and ownerships. Be sure that the files created by each id are owned by the group which was set up for the Apache user.

Also check your work by opening an anonymous ftp session to your content server. Make sure that you can only see the files in htdocs and that you cannot create files in any directories or cd to the directories which you cannot see from the ftp session. If this is so, open a path from the internet to your content server by having your router forward the ftp ports, 20 and 21, to the content server. If you are not using the simple topology, give your router a static route to your content server and enable IP forwarding on an intermediate machine.

Now go over to your neighbor's house again, browse to your site and then refresh it in his browser to verify that your site still works. If you get a 404 error meaning document not found or a 403 error meaning that you are not authorized, check your work. Also verify that you can establish an anonymous ftp session to your public IP address which will be hosted by your content server to check your forwarding and routing configuration. You could FTP as your 2 new users but you dare not because their passwords would be disclosed. Notice that your router will prevent them from establishing a rash telnet or rlogin session from the internet as it should because those ports are not forwarded.

I know that many people would use SSH for remote access, but I never promised my user community remote shell accounts and I am a bit tired of upgrading SSHD as every script kiddie in the world tries to penetrate it. I chose S/Key as an alternate way to secure maintenance of my site. S/Key is a one-time password system (similar to a one-time code pad) which allows the password to be remotely generated and then entered in plaintext over the internet. It is fairly secure because even when it is sniffed, it is already too late. The host (the content server in this case) will not accept the same password again as it will be expecting the next password in the predetermined sequence for the account.

Since S/Key does not require a special FTP client or tunneling software, it can be used to modify your site from just about any internet-connected machine in the world. Its sequence of one-time passwords is breakable using a cracker like Monkey, but I doubt that the ability to own the content of my site is worth the trouble. Unlike SSH, the data will be transferred in plaintext after the session is established, but most is web content anyway. Use another method to remotely transfer credit card numbers or sensitive cgi scripts over the internet or wait until you can access these files from your internal subnet.

If S/Key does not exist on your content server (try which keyinit to check), install it. Have a look at the discussion of RFC 1938 at http://www.lodestone.org/users/hoss/ops/, man skey and section 10.5 of the FreeBSD Handbook at http://www.freebsd.org. Touch /etc/skey.access to ensure that S/Key users have to use S/Key to login when they are not physically at the content server console and then run keyinit code and keyinit data to set up S/Key for each of the new users. Each time you run keyinit you will be asked for a secret password (use a pass phrase) and will receive a sequence number, a seed and an initial password made up of 6 short words.

Now login as one of the users and note that you are challenged with the sequence number and the seed. Respond with the 6 short words and you should be logged in. To get the password for the next sequence number, read man key and then run key with the next sequence number (one lower than the one you used to log in) and the same seed, reenter the pass phrase and receive another set of 6 short words. You can write these words down and take them with you or you can find a key program which uses the same hash function as the content server and which runs on your target machine. Some may be identified at http://www.cs.colorado.edu/csops/FAQ/proxy.html#4 or at http://www.msri.org/local/computing/skey. My son (who prepared the illustrations) carries a key generator on a diskette which he uses when ftping in from a dos window on a Windows machine at his high school.
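
Because the sequence counts down, it is easy to ask for the wrong number. The tiny sketch below only prints the key command to run next, given the sequence number and seed from your last challenge; the function name and the sample seed are made up, and the argument order follows the description above (sequence number, then seed).

```shell
#!/bin/sh
# Print the key command for the next login, given the sequence number
# and seed shown in the most recent successful challenge.
next_key_command() {
    echo "key $(( $1 - 1 )) $2"
}

# Example with an invented challenge of sequence 99 and seed ab12345:
# next_key_command 99 ab12345
```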

Before you quit the FTP sessions, see if the new users can chmod. I found that S/Key authentication was not sufficient to allow mine to chmod from their FTP session. To get around this problem, I added the checkexe script in Listing 12 into the cgi-bin directory and made it executable by hand. This script is run every 40 seconds by wrapper. If I upload a file called change2exe which contains a list of scripts into that directory, checkexe will make those scripts group-executable so long as they are owned by the user named code. Do this if you need to. Then get accustomed to using S/Key to authenticate your FTP sessions across your local subnet and finally take your ability to update your site out onto the road.
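
The idea behind checkexe can be sketched in a few lines. What follows is not Listing 12 but an illustration of the technique under some assumptions of my own: the function name is invented, names containing a slash are ignored so that the list cannot reach outside the directory, and the list is removed after it has been processed.

```shell
#!/bin/sh
# Hypothetical sketch of the idea behind checkexe (not the author's Listing 12).
# For each name listed in change2exe, add group execute permission if the
# file is owned by the expected user.
make_listed_executable() {
    dir=$1
    owner=$2
    list="$dir/change2exe"
    [ -f "$list" ] || return 0
    while read -r name; do
        # Ignore blank lines and anything containing a slash
        case $name in
            ''|*/*) continue ;;
        esac
        file="$dir/$name"
        [ -f "$file" ] || continue
        # The third field of ls -l output is the owning user
        actual=$(ls -ld "$file" | awk '{ print $3 }')
        [ "$actual" = "$owner" ] && chmod g+x "$file"
    done < "$list"
    # Assumption: the list is consumed once processed
    rm -f "$list"
}

# Example: make_listed_executable /pool/incoming/cgi-bin code
```

Checking the owner before changing the mode is what keeps files uploaded by anyone other than the code user from being made executable this way.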

CONCLUSION

I hope that you are able to use this information to create your own web-serving cluster, limited only by the bandwidth of your connection, the number of ports in your switch and your physical infrastructure. I would be interested to hear about clusters built from these instructions and can be reached through http://fredscottthompson.com.