Archive for the ‘sysadmin’ Category

Drobo

Tuesday, June 2nd, 2009

Some time ago, I came across an intriguing device online, called Drobo. I watched the demonstration video and was immediately impressed. Basically, the device is an external, 4-bay SATA enclosure that internally uses a RAID-like mechanism that allows you to hot-swap drives in and out while also allowing you to mix drive sizes and brands. I found this very attractive as it gives you the security of RAID without the usual hassle. Since I was working on small servers with backup-facilities at the time, I was interested in using Drobo to deliver a large and flexible pool of storage. And, as my servers run Debian Linux in text-mode only, I was interested to see whether I could make Drobo work that way.

Linux is not supported officially by the producer of Drobo, Data Robotics Inc. They did however add support for the ext3 filesystem, though this is still in beta. Luckily, an excellent management interface for linux called Drobo-Utils was developed by Peter Silva. It features both a commandline interface as well as a GUI, though I have only used the former.

The design-choices made for Drobo do raise some issues, however. Firstly, the current 4-bay Drobo models are connected through USB2 or Firewire. This limits the possible throughput speed in such a way that any of the usual gains from striping in RAID are probably irrelevant. The peak sequential throughput rates using Firewire800 that are published on their website (52 MB/s read and 34 MB/s write) are beaten by most modern single 3.5″ SATA drives. So Drobo is useful for secure bulk storage, not for high-performance. Secondly, to avoid the user having to grow or shrink their filesystem when the drives in Drobo are changed, it uses thin-provisioning to present more diskspace to the OS than is (initially) available. This defaults to 2TB, which is also the maximum most stock linux kernels will support. If more than 2TB of usable diskspace is put into Drobo, it will present the OS with a second volume. Because this sort-of defeats the idea of ‘one big pool of storage’, I spent a lot of time looking for a way around this. As it turns out, Drobo does have a method to increase the size of the logical units (LUNs) that it presents to the OS. Drobo-Utils also supports this through the ‘setlunsize’ command. However, it is somewhat uncharted ground. After a lot of Googling, I found the proper kernel configuration settings to make linux support these larger LUN-sizes:

File Systems
   Partition Types
     [*] Advanced partition selection
     [*] EFI GUID Partition support (NEW)
Enable the block layer
   [*] Support for Large Block Devices

After compiling such a kernel, it is possible to connect a Drobo with larger LUN-sizes. Because ext3 with the default of 4kB block-size doesn’t support more than 8TB, I settled on 8TB as the LUN-size. Following the instructions in the Drobo-Utils README, I was able to partition Drobo to a virtual 8TB and create an ext3 filesystem on it. Initial tests with copying and removing a few tens of gigabytes worked very well. Because of reports that the capacity lights do not turn off after data is removed, I filled it up to 15%, at which point the second of the ten capacity lights came on (so they actually indicate 5-15-25% etc). After removing this data, the lights went off over the course of about half an hour. So, I figured reclaiming free space was functional, though at a delay.
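The 2TB and 8TB figures above are consistent with simple address arithmetic, which this small Python sketch works through. It rests on two assumptions that the text itself does not spell out: that the 2TB ceiling comes from 32-bit sector numbers with 512-byte sectors (as in classic MBR partition tables, whereas EFI GUID partition tables use 64-bit sector numbers), and that the 8TB ext3 figure comes from block numbers being handled as signed 32-bit values. The function name is made up for the illustration.

```python
# Address-space arithmetic behind the 2TB and 8TB figures (binary units).
TIB = 2 ** 40


def max_size_bytes(address_bits: int, unit_bytes: int) -> int:
    """Largest size addressable with `address_bits` bits of `unit_bytes` units."""
    return (2 ** address_bits) * unit_bytes


# Assumption: 32-bit sector numbers and 512-byte sectors, as in an MBR
# partition table. This caps a volume at 2 TiB.
mbr_limit = max_size_bytes(32, 512)

# Assumption: block numbers treated as signed 32-bit values, leaving 31
# usable bits. With 4 KiB blocks this gives 8 TiB.
ext3_limit = max_size_bytes(31, 4096)

print(mbr_limit // TIB, "TiB")
print(ext3_limit // TIB, "TiB")
```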

Unfortunately, after testing Drobo in the actual production environment, where it is filled up to 60-70% of its effective capacity and large amounts of data are added and deleted daily, it turns out that the device does lose track of the available free space on the volume. It regularly happens that I delete 50-60 GB and Drobo shows no drop in used space, not even 12 hours later. Reported space usage now hovers between 82% and 87%, giving a yellow alert whenever at 85% or more. It appears the situation is stable like this, but I don’t really trust it very much. It seems the only solution is to go back to a 2TB LUN-size. That’s being reported as fully functional. Apparently the beta support for ext3 fails at these large volume sizes.

My alternative idea to solve the multiple volume issue was to use LVM to bring multiple Drobo LUNs together in a single logical volume. However, because LVM uses a different disk layout than ext3, support for this would need to be added to Drobo’s firmware. Since Data Robotics explicitly states that LVM is not supported, there is little reason to expect that that will happen anytime soon.

Concluding, I’d say that Drobo is great for flexible bulk storage for your desktop or laptop computer. Its form-factor and connectivity support this notion. Maybe it was my mistake, trying to coerce Drobo into providing high-performance, highly-reliable storage for servers. It’s just not very well suited for that task. Coincidentally, as I was struggling with Drobo on linux, I learned quite a lot about software RAID and LVM2. This made me realise it’s quite possible to create a very flexible and secure storage setup using those, with little more than a case with hotswap SATA-bays and my favourite linux distro. So there will be more about this soon!

The Two Servers

Wednesday, January 7th, 2009

After moving our email facilities to the new server back in July, we finally got around to moving the various websites we host in November. Having moved all mission-critical services away from the old machine enabled us to give it a serious update. Since it still ran the by-now unsupported Debian Sarge OS, with a 2.4 linux kernel, this was somewhat overdue. With a fresh Debian Lenny install, a bit more RAM and some improvements on the cooling system, it’s now ready for action again. The package selection and configuration was mostly copied from the new server, so we now have, as far as software is concerned, two identical machines, which makes the management a lot easier.

The next step is configuring the old machine as a proper backup server. Using it as a secondary (fallback) email server is of course fairly easy, though it needs to be able to filter spam just as well as the primary, or else our excellent spam defense will break. Providing failover capabilities for other services is not always that straightforward. Dynamic websites need to have their databases in sync and mailboxes need to be consistent between (potentially concurrent) sessions as well. Because our servers are in separate datacentres, making this synchronisation work is a bit tricky. We’re determined to make this work, however, because we want to have failover capabilities beyond a single datacentre. Modern datacentres have very robust power and network infrastructures, but still it is not inconceivable that whole racks or even cages suffer an outage. The only way to avoid those problems and the risk of administrative mistakes is to spread your equipment across different locations and hosting companies.

Doing automatic failover between locations is difficult, but not impossible. Lacking the use of hardware load-balancers, DNS needs to be used to spread the load on machines and also to move clients away from failing servers. There will always be some delay there that cannot be avoided. Keeping the TTL low in your DNS zones helps, but values below 1 hour are not always honoured. The problem the other way around, clients arriving at inactive backup servers, we intend to solve using tunnels between the servers. That way, the client can be served from the active machine running the authoritative database, without the need to notify or otherwise redirect the client.
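To make the DNS part of this concrete, here is a minimal Python sketch of the decision logic only. It is not what we run: the function names, the health-check dictionary and the example addresses (taken from the documentation ranges) are all hypothetical, and actually updating the zone is left out.

```python
from typing import Dict, List


def records_to_publish(health: Dict[str, bool], fallback: str) -> List[str]:
    """Return the addresses that should be in the DNS zone right now.

    `health` maps a server address to the result of its latest health check.
    Healthy servers share the load. If none is healthy, keep publishing the
    fallback address so the name still resolves.
    """
    healthy = sorted(addr for addr, ok in health.items() if ok)
    return healthy or [fallback]


def worst_case_delay(ttl_seconds: int, check_interval_seconds: int) -> int:
    """Rough upper bound on how long a client may keep using a dead server.

    Assumes the failure happens right after a health check and that
    resolvers honour the TTL, which is not always the case.
    """
    return check_interval_seconds + ttl_seconds


# Both servers up: publish both. One down: only the survivor remains.
print(records_to_publish({"192.0.2.10": True, "198.51.100.20": True}, "192.0.2.10"))
print(records_to_publish({"192.0.2.10": False, "198.51.100.20": True}, "192.0.2.10"))
print(worst_case_delay(ttl_seconds=3600, check_interval_seconds=60))
```

The delay estimate shows why the TTL dominates: with a one-hour TTL, even a very frequent health check leaves clients pointed at the failed machine for up to about an hour.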

MMVI Spam Filters

Thursday, February 14th, 2008

I never thought that I would one day involve myself with the fight against spam. In general I don’t like the idea of computers examining the content of our email and trying to decide for us whether we want to receive that email or not. Computers are appallingly bad at interpreting human writing and images, especially so if the data at hand was created with the specific purpose of fooling them.

The change came with the realisation that the vast majority of spam these days is sent from virus-infested home-computers. I have extensive experience with these drones (as we call them) from my activities as an IRC operator and it soon came to me that it must be possible to differentiate between these end-user computers and proper email-servers. That way, it is possible to accept or deny email based on its source, rather than its content.

Besides allowing you to deny the email even before the actual body text is sent, it also allows you to do so with the sender still on the line. This is a very important benefit. If your filters decide to deny the delivery of the message, your server can tell this directly to the sending party. Since the From and Sender header lines of spam are usually faked, this is the only time you know for sure that you are talking back to the spammer. This way, if it was truly spam, the spammer may notice that you’re not buying it, but more importantly, if it wasn’t spam (a so-called false-positive), then the legitimate sender is also notified, instantly. It may be bothersome for a legitimate sender if their email didn’t make it through, but it’s much worse if their message was just silently discarded, for it may take them a long while to realise that it didn’t arrive. Not sending notifications of failed delivery to the faked From-address is very good practice. If you do, you’re just adding to the problem and run the risk of ending up on blacklists like backscatterers.org.

I’ve been developing the MMVI spam filters for 2 years now, so the ruleset has become quite complicated. But the 3 basic principles it is based on are:

  • denying mail from hosts with obvious generic hostnames or consumer-identifying tokens in their hostnames (for instance 123-123-123-123.isp.com or 294a7g2.adsl.isp.com).
  • denying mail from hosts that send a very wrong HELO/EHLO name in the SMTP transaction. Many home-computers are behind a NAT device, making them unable to know their external IP-address. As such they have difficulties with properly introducing themselves. Also, a lot of spamming software is a hastily hacked-together mess, that tends to mess up things like this.
  • denying mail from hosts that have no or incorrectly configured reverse DNS. These tend to be hosts on poorly configured networks, which seems to go hand-in-hand with being poorly secured as well. In many cases, you don’t want anything to do with these.
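The three principles above can be sketched in a few lines of Python. This is only an illustration, not the MMVI ruleset: the regular expressions, the token list, the definition of a ‘very wrong’ HELO and all function names are assumptions made for the example. The DNS lookups are passed in as arguments so the logic stays self-contained. The verdict is an SMTP reply, since the decision is made while the sender is still on the line.

```python
import re
from typing import Optional, Sequence

# Hypothetical patterns: an IP address embedded in the hostname, or a
# consumer-access token appearing as a label or label prefix.
EMBEDDED_IP = re.compile(r"(?:\d{1,3}[-.]){3}\d{1,3}")
CONSUMER_TOKEN = re.compile(
    r"(?:^|[.-])(?:adsl|dsl|dhcp|dialup|cable|pool|dyn)(?:[.-]|\d)",
    re.IGNORECASE,
)


def looks_generic(hostname: str) -> bool:
    """True for names like 123-123-123-123.isp.com or 294a7g2.adsl.isp.com."""
    return bool(EMBEDDED_IP.search(hostname) or CONSUMER_TOKEN.search(hostname))


def helo_is_wrong(helo: str, our_hostname: str) -> bool:
    """Assumed definition: empty, 'localhost', our own name, or a bare word."""
    name = helo.strip().lower()
    if not name or name == "localhost" or name == our_hostname.lower():
        return True
    is_address_literal = name.startswith("[") and name.endswith("]")
    return "." not in name and not is_address_literal


def reverse_dns_ok(client_ip: str, ptr_name: Optional[str],
                   forward_addresses: Sequence[str]) -> bool:
    """The PTR name must exist and resolve back to the connecting address."""
    return ptr_name is not None and client_ip in forward_addresses


def smtp_verdict(client_ip: str, ptr_name: Optional[str],
                 forward_addresses: Sequence[str],
                 helo: str, our_hostname: str) -> str:
    """Return the SMTP reply to give while the sender is still connected."""
    if ptr_name is None or not reverse_dns_ok(client_ip, ptr_name, forward_addresses):
        return "550 rejected: missing or inconsistent reverse DNS"
    if looks_generic(ptr_name):
        return "550 rejected: generic hostname"
    if helo_is_wrong(helo, our_hostname):
        return "550 rejected: invalid HELO name"
    return "250 ok"
```

A real ruleset would need exceptions on top of this, for the slightly misconfigured but legitimate mailservers.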

These rules are not black-and-white on MMVI; I’ve made a lot of attempts to redeem proper mailservers that have slight misconfigurations. The only thing I’ve not yet discussed is the lonely few who run their own mailserver off of their ADSL line at home. Being one of those, I do feel sympathy for them. To make it possible for them to keep doing so, each denial is sent with a specifically generated whitelist address, that can be used by a legitimate sender to whitelist their own hostname. Since both the rejected message and the whitelist request must be delivered by the same system (as they will be in the above scenario), this is very difficult to do for a spammer, who utilises thousands of systems but rebuilds the target list of email-addresses centrally.
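One way such a host-bound whitelist address could be generated is with a keyed hash of the sending hostname. The Python sketch below is an assumption about a possible design, not a description of the actual MMVI mechanism: the secret, the address format, the example domain and the function names are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical server-side secret; it never leaves the mailserver.
SECRET = b"change-me"


def whitelist_address(sending_host: str, domain: str = "example.net") -> str:
    """Derive the whitelist address handed out in a denial to `sending_host`."""
    digest = hmac.new(SECRET, sending_host.lower().encode(), hashlib.sha256).hexdigest()
    return f"whitelist-{digest[:16]}@{domain}"


def request_is_valid(address: str, delivering_host: str) -> bool:
    """Accept a whitelist request only if it is delivered by the same host
    that the address was generated for."""
    expected = whitelist_address(delivering_host)
    return hmac.compare_digest(address.encode(), expected.encode())


address = whitelist_address("mail.home.example.org")
print(request_is_valid(address, "mail.home.example.org"))
print(request_is_valid(address, "drone42.isp.example"))
```

Because the address is tied to the hostname it was issued to, a drone that harvests it and passes it on to a central list gains nothing: a request arriving from any other machine fails the check.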

Admittedly, this system would not hold up if spammers were to adapt their software specifically against our defenses. At that point though, we could switch the whitelisting procedure to something that properly ensures human-intervention, for instance a Captcha-like mechanism. For now though, small as we are, working with cheap hardware and free software, we enjoy being 99+% spam-free.

Dell PowerEdge R200

Wednesday, January 23rd, 2008

The Dell PowerEdge R200, a 1U rackserver, has been delivered. It’s wonderfully engineered. The case opens by just one hand-tightened screw. Inside is room for two 3.5″ harddrives, that can be serviced by just pulling out one pin and then taking out the HDD frame. Cooling is done by two sideways-blowing fans, taking the air through the tunnel-shaped CPU heatsink and a separate flow past the memory DIMMs. A single smaller fan takes care of the cooling for the PSU. The system comes configured with an advanced diagnostics program, available as a boot option from the BIOS. This program does take up 2.2 GB on two primary partitions of the first harddisk.

Both the harddrive and the slimline DVD/CDRW combo drive are SATA. This gave us a little trouble, since the linux version we intended to use on it (Debian Etch, the current stable branch) does not properly support these. The installer would not be able to read packages from the cdrom drive, nor would the harddisk appear in the partitioning menu. Trying the various related boot options we could find online did not solve the problem. In the end, we downloaded the Debian testing version (Lenny). This one does support both types of SATA drives, right out of the box. Lenny already seems rather stable and also offers the benefit of the most modern software versions, so we’ve decided to go ahead with it. With luck, by the time we’re ready to go live with the machine, Lenny may have reached stable.

Apache blackhole

Wednesday, December 12th, 2007

This tip helps limit the server resources (CPU and bandwidth) taken by worms, probes and misconfigurations hitting your Apache webserver. I’m assuming you already have virtual hosting and mod_rewrite enabled (most sites do).

Generally, requests to the server without a valid server hostname (like www.yourserver.com) will be answered by the topmost entry in your vhost configuration. In almost all cases, this will be either a worm just connecting to your IP-address directly without knowing which sites you run, or someone else’s DNS misconfiguration sending an unwitting user to your doorstep. In both cases, serving out your ‘default’ website is pointless. You don’t want the worm to probe around your site looking for vulnerabilities and if it really is a misdirected user, they’re not likely to be interested in your default site.
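The ‘topmost entry answers’ behaviour can be modelled in a few lines of Python. This is a simplified model for illustration only, not Apache’s actual code: it ignores ServerAlias and IPv6 address literals, and the function name and site labels are made up.

```python
from typing import List, Optional, Tuple


def select_vhost(host_header: Optional[str], vhosts: List[Tuple[str, str]]) -> str:
    """Pick the site that answers a request.

    `vhosts` is the ordered list of (server_name, site) pairs as they appear
    in the configuration. If the Host header matches no server name, or is
    missing, the first entry answers.
    """
    if host_header:
        wanted = host_header.split(":")[0].lower()  # drop an optional port
        for server_name, site in vhosts:
            if server_name.lower() == wanted:
                return site
    return vhosts[0][1]


config = [
    ("www.yourserver.com", "main site"),
    ("blog.yourserver.com", "blog"),
]
print(select_vhost("blog.yourserver.com", config))
print(select_vhost("192.0.2.10", config))  # a worm hitting the bare IP address
print(select_vhost(None, config))          # no Host header at all
```

With the configuration as listed, both the bare-IP request and the request without a Host header end up at the main site, which is exactly what the rest of this tip avoids.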

My solution is to create a default site that isn’t actually a site, but rather a short, simple message saying that the web address entered (if any) is wrong. This can simply be done by adding the following vhost entry at the top of the vhost configuration (in the case of apache2: in the file ‘default’ in /sites-available), just below “NameVirtualHost *”:

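<VirtualHost *>
  # Catch-all vhost: keep it first so it answers requests for unknown hostnames
  ServerName nohost
  # Short plain-text body sent with the 403 response
  ErrorDocument 403 "The website you requested was not found on this server"
  RewriteEngine on
  # Answer every request with 403 Forbidden
  RewriteRule . - [F]
</VirtualHost>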

Second server

Wednesday, December 12th, 2007

Joeri and I are looking to make the next step up in the MMVI project by adding a second server. This will give us some redundancy to prevent outages and also a lot more bandwidth so that we can hopefully put everything online that we always wanted to.

Initially we intended to build the second server ourselves (like the previous one) from used hardware. However, some very tempting offers from Dell have made us reconsider. It turns out that we can get a new, Dell-assembled and -tested machine for relatively little extra money. We’re now looking at different hosts, aiming as high as 1 TB (terabyte) of allowed data transfer per month.