The Two Servers

After moving our email facilities to the new server back in July, we finally got around to moving the various websites we host in November. With all mission-critical services off the old machine, we could give it a serious update. Since it was still running the by-now unsupported Debian Sarge with a 2.4 Linux kernel, this was somewhat overdue. With a fresh Debian Lenny install, a bit more RAM and some improvements to the cooling system, it’s now ready for action again. The package selection and configuration were mostly copied from the new server, so as far as software is concerned we now have two identical machines, which makes management a lot easier.

The next step is configuring the old machine as a proper backup server. Using it as a secondary (fallback) email server is of course fairly easy, though it needs to filter spam just as well as the primary, or else our excellent spam defense will break. Providing failover for other services is not always that straightforward. Dynamic websites need their databases kept in sync, and mailboxes need to stay consistent between (potentially concurrent) sessions as well. Because our servers are in separate datacentres, getting this synchronisation right is a bit tricky. We’re determined to make it work, however, because we want failover capabilities that reach beyond a single datacentre. Modern datacentres have very robust power and network infrastructures, but it is still not inconceivable that whole racks or even cages suffer an outage. The only way to guard against those problems, and against the risk of administrative mistakes, is to spread your equipment across different locations and hosting companies.

Doing automatic failover between locations is difficult, but not impossible. Without hardware load balancers, DNS has to be used both to spread the load across machines and to move clients away from failing servers. There will always be some unavoidable delay before clients notice a change. Keeping the TTL low in your DNS zones helps, but values below 1 hour are not always honoured. The opposite problem, clients arriving at an inactive backup server, we intend to solve using tunnels between the servers. That way, the client can be served from the active machine running the authoritative database, without the need to notify or otherwise redirect the client.
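
To make the DNS side of this a bit more concrete, here is a minimal Python sketch of the decision a failover script might make. It is not our actual setup: the `Server` type, both function names, the example addresses (taken from the documentation ranges) and the 60-second check interval are all assumptions made for illustration. It only shows which addresses would be published and roughly how long a well-behaved resolver could keep sending clients to a dead machine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    # Hypothetical record for one machine; not taken from any real tool.
    name: str
    address: str
    healthy: bool


def records_to_publish(servers, ttl=3600):
    """Return (address, ttl) pairs for the servers that should receive clients."""
    healthy = [s for s in servers if s.healthy]
    # If every check fails, keep publishing all servers: a broken monitor
    # should not remove the whole site from DNS.
    chosen = healthy or list(servers)
    return [(s.address, ttl) for s in chosen]


def worst_case_failover_delay(check_interval, ttl):
    """Seconds until resolvers that honour the TTL stop using a dead server.

    Assumes the failure happens right after a health check and that the
    record was cached just before the zone was updated.
    """
    return check_interval + ttl


servers = [
    Server("new", "192.0.2.1", healthy=False),
    Server("old", "198.51.100.1", healthy=True),
]
print(records_to_publish(servers))
print(worst_case_failover_delay(check_interval=60, ttl=3600))
```

Resolvers that ignore the TTL can of course take longer than this bound, which is exactly why the tunnels are needed as a second line of defence.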
