Censorship in the Netherlands

January 11th, 2012

Today, a judge in the Netherlands ruled that two major national ISPs, Ziggo and XS4ALL, must block 3 IP-addresses and 24 domain names belonging to The Pirate Bay. Worse still, a Dutch lobby group for copyright enforcement, Stichting Brein, is allowed to provide additional domains and IP-addresses in the future that the ISPs are also required to block. No provision has been made for a neutral check of these additional blocks.

To prevent this first act of censorship in the Netherlands and to show that this is not the way to deal with undesirable content on the internet, I propose that those of us who have the ability donate subdomains and IP-addresses to fight it. You can do this by creating a subdomain like piratebay.yourdomain.nl and then configuring your webserver to act as a proxy, routing traffic from the real Pirate Bay to users who may no longer be able to access it. This takes some bandwidth, but not much, because The Pirate Bay only serves .torrent files and not the actual downloads.

So here are the step-by-step instructions:

  • Create a subdomain like piratebay.yourdomain.nl in your domain management interface
  • Enable proxying in Apache (assuming that’s what you’re running as your webserver) with: a2enmod proxy proxy_http (the proxy_http module is what lets ProxyPass talk to an http:// backend)
  • Add the following virtual host to your Apache site configuration:
<VirtualHost *>
  ServerName piratebay.yourdomain.nl
  <Proxy *>
    Order deny,allow
    Allow from all
  </Proxy>
  ProxyPass / http://thepiratebay.org/
  ProxyPassReverse / http://thepiratebay.org/
</VirtualHost>
  • Restart the webserver; under Debian and Ubuntu the command is: /etc/init.d/apache2 restart
That’s it. You’ve now added one more domain and IP-address that Stichting Brein will need to get blocked to fulfil their role as the first Dutch Ministry of Censorship. Do be mindful that they may in fact do that, regardless of the collateral damage they will cause by blocking more than just The Pirate Bay. Other sites hosted on the same IP-address may also end up being blocked. But if you can bear that risk, this gives us another strong argument against blocking on the internet: there will always be collateral damage, and it’s just a matter of time until this aspect is used to someone’s advantage.

IPv4 exhaustion

February 3rd, 2011

Today IANA has allocated the last 5 remaining IPv4 address blocks to the 5 regional registries. The global pool of unused internet addresses has been exhausted. What remains are two or three levels of buffers between that pool and the end-user. The first level is the regional registries themselves, but they are projected to burn through their stock in somewhere between 3 and 6 months. After that, there’s a buffer with the various internet access providers. Under the current policy, they were allowed to request IP-addresses for a projected use of up to 2 years. So if they play their cards right, they may have leeway for up to 2.5 years from now.

But it’s not a matter of who holds out the longest, it’s a matter of who runs out first. Because as soon as someone does, there will be, out of necessity, computers and servers on the internet that can only talk IPv6. So if you want to talk to those machines, even though you’re secure with your existing IPv4 allocation, you will need to have IPv6 as well, or some form of translation in between. The first IPv6-only machines will probably come online in the Asia/Pacific area, where growth of the number of internet-connected machines is the largest. Whether this immediately impacts you or not depends on who you talk to and/or do business with. But it won’t be much longer before the other regions of the world will have to follow suit. If you want to continue to enjoy an open internet where you can talk to anyone and get information from anywhere, then you are going to need IPv6 connectivity. And that’s where the silver lining to this global problem appears.

See, the internet was built as an open, single-tiered network, where everyone could talk to everyone else if they wanted to. We call it the end-to-end principle. In recent times this has more often been talked about as ‘peer to peer’ (p2p), but it encompasses much more than just being able to share files. The fact that nowadays the term ‘p2p’ is used for a few particular activities, rather than all the rest, indicates that the times have changed. Most of our internet use has changed from an end-to-end model to a client-server model, or a ‘content provider’ and ‘content consumer’ model. While this has brought us a number of great services, such as YouTube, webmail and social networks, it has also made us heavily dependent on a small number of commercial entities, with our privacy most precariously at stake. The upcoming shortage of IP-addresses, which has been foreseen for decades, has only made this worse. To save addresses, every house and most businesses are issued only 1 IP-address. Some clever tricks were devised to allow you to connect more than one computer in your house, but because you have only one address that can be reached from elsewhere on the internet, it’s impossible to make end-to-end connections between any combination of computers connected that way. In other words, you are dependent on central servers, and as such often dependent on a third party, to get in contact with those other people hidden away behind their single address.

And that’s where the new IP-protocol IPv6 comes in. It gives us so ridiculously many addresses that there really is no conceivable reason to give you only one address. Due to the way the protocol works, you will be given *at least* 2^64 addresses to use at home. If you want to have multiple separate networks at home, which in the future you very probably will, then you should be given 2^72 addresses (256 networks of 2^64 addresses) or even 2^80 (65536 networks of 2^64 addresses). My preference would be the latter, because it’s basically the one-size-fits-all approach. No matter whether you are a large corporation or a lowly home user, everyone will get the same size of address-range. It’ll certainly make administration a lot easier, and it will support incremental growth.
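
The address counts are easy to check with a few lines of Python and the standard ipaddress module. This is only a sketch: the 2001:db8:: prefix is the range reserved for documentation, and the function name home_allocation is made up for this example.

```python
import ipaddress


def home_allocation(prefix: str) -> tuple[int, int]:
    """Return (number of /64 networks, total addresses) for an IPv6 prefix."""
    net = ipaddress.IPv6Network(prefix)
    networks = 2 ** (64 - net.prefixlen)  # how many /64 subnets fit inside
    return networks, net.num_addresses


# Example prefixes from the documentation range, not real allocations.
for prefix in ("2001:db8::/64", "2001:db8::/56", "2001:db8::/48"):
    networks, addresses = home_allocation(prefix)
    print(f"{prefix}: {networks} network(s) of 2^64 addresses, {addresses} in total")
```

A /64 is a single network of 2^64 addresses, a /56 holds 2^72 addresses and a /48 holds 2^80.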

So there it is. IPv4 is dead, long live IPv6! And hurray for the return of end-to-end, barrier-free internet for everyone!

Name & Shame

March 16th, 2010

One of the advantages of running your own mailserver is the ability to easily and limitlessly add mail aliases. I’ve been doing that for years now for every website I have to sign up for with an email address. When I get spam on any of these aliases, this gives me two advantages:

  1. I know which website leaked my address, and
  2. I can stop the spam coming in by simply removing the alias.

Point 1 unfortunately doesn’t tell you whether the website willingly sold their list of email-addresses to spammers or whether they just had a security breach that they didn’t tell us about. But it matters little: either way, it tells me that I can’t trust that particular site.

I don’t see a point in writing to them, since in either case they’re not likely to admit to me what happened. But naming and shaming them here at least puts this information out in the open. If enough bad press is generated, business websites at least may be pushed into respecting our right to privacy more.

So here they are, the websites that disclosed the email-address I entrusted to them:

  • Opus Supplies [ www.opussupplies.nl ]
  • Teamdrive [ www.teamdrive.com ]
  • Longtail Video [ www.longtailvideo.com ] – the makers of JW Player (surprised me)
  • Perry Marshall / Cosmic Fingerprints [ www.cosmicfingerprints.com ] – okay, I could’ve expected this one
  • PC Megastore [ www.pcmegastore.nl ] – went bankrupt last August, so they won’t bother anyone anymore

Drobo

June 2nd, 2009

Some time ago, I came across an intriguing device online, called Drobo. I watched the demonstration video and was immediately impressed. Basically the device is an external, 4-bay SATA enclosure that internally uses a RAID-like mechanism that allows you to hot-swap drives in and out while also allowing you to mix drive sizes and brands. I found this very attractive as it gives you the security of RAID without the usual hassle. Since I was working on small servers with backup-facilities at the time, I was interested in using Drobo to deliver a large and flexible pool of storage. And, as my servers run Debian Linux in text-mode only, I was interested to see whether I could make Drobo work that way.

Linux is not officially supported by the producer of Drobo, Data Robotics Inc. They did however add support for the ext3 filesystem, though this is still in beta. Luckily, an excellent management interface for Linux called Drobo-Utils was developed by Peter Silva. It features both a commandline interface and a GUI, though I have only used the former.

The design-choices made for Drobo do raise some issues, however. Firstly, the current 4-bay Drobo models are connected through USB2 or Firewire. This limits the possible throughput speed in such a way that any of the usual gains from striping in RAID are probably irrelevant. The peak sequential throughput rates using Firewire800 that are published on their website (52 MB/s read and 34 MB/s write) are beaten by most modern single 3.5″ SATA drives. So Drobo is useful for secure bulk storage, not for high performance.

Secondly, to avoid the user having to grow or shrink their filesystem when the drives in Drobo are changed, it uses thin-provisioning to present more diskspace to the OS than is (initially) available. This defaults to 2TB, which is also the maximum most stock Linux kernels will support. If more than 2TB of usable diskspace is put into Drobo, it will present the OS with a second volume. Because this more or less defeats the idea of ‘one big pool of storage’, I spent a lot of time looking for a way around this. As it turns out, Drobo does have a method to increase the size of the logical units (LUNs) that it presents to the OS. Drobo-Utils also supports this through the ‘setlunsize’ command. However, it is somewhat uncharted ground. After a lot of Googling, I found the proper kernel configuration settings to make Linux support these larger LUN-sizes:

File Systems
   Partition Types
     [*] Advanced partition selection
     [*] EFI GUID Partition support (NEW)
Enable the block layer
   [*] Support for Large Block Devices

After compiling such a kernel, it is possible to connect a Drobo with larger LUN-sizes. Because ext3 with the default of 4kB block-size doesn’t support more than 8TB, I settled on this LUN-size value. Following the instructions in the Drobo-Utils README, I was able to partition Drobo to a virtual 8TB and create an ext3 filesystem on it. Initial tests with copying and removing a few tens of gigabytes worked very well. Due to reports about the capacity lights not turning off after removing the data, I filled it up to 15%, at which point the second of the ten capacity lights came on (so they actually indicate 5-15-25% etc). After removing this data, the lights went off over the course of about half an hour. So, I figured reclaiming free space was functional, though with a delay.
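
For the curious, both size limits can be reproduced with simple arithmetic. The sketch below is a back-of-the-envelope check, not something taken from the Drobo or ext3 documentation. The 2TB ceiling follows from 32-bit sector numbers with 512-byte sectors. For the 8TB figure the code assumes that only 2^31 block numbers are usable (block numbers treated as signed 32-bit values), which is one plausible origin of that limit and is labelled as an assumption.

```python
SECTOR_BYTES = 512        # classic disk sector size
EXT3_BLOCK_BYTES = 4096   # the default ext3 block size mentioned above
TIB = 2 ** 40


def max_size_tib(units: int, unit_bytes: int) -> float:
    """Largest addressable size, in TiB, for a given number of units."""
    return units * unit_bytes / TIB


# 32-bit sector numbers: the ceiling without large block device support.
print(max_size_tib(2 ** 32, SECTOR_BYTES))

# ASSUMPTION: only 2^31 block numbers usable (signed 32-bit arithmetic).
print(max_size_tib(2 ** 31, EXT3_BLOCK_BYTES))
```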

Unfortunately, after testing Drobo in the actual production environment, where it is filled up to 60-70% of its effective capacity and large amounts of data are added and deleted daily, it turns out that the device does lose track of the available free space on the volume. It regularly happens that I delete 50-60 GB and Drobo shows no drop in used space, not even 12 hours later. Reported space usage now hovers between 82% and 87%, giving a yellow alert whenever at 85% or more. It appears the situation is stable like this, but I don’t really trust it very much. It seems the only solution is to go back to a 2TB LUN-size. That’s being reported as fully functional. Apparently the beta support for ext3 fails at these large volume sizes.

My alternative idea to solve the multiple volume issue was to use LVM to bring multiple Drobo LUNs together in a single logical volume. However, because LVM uses a different disk layout than ext3, support for this would need to be added to Drobo’s firmware. Since Data Robotics explicitly states that LVM is not supported, there is little reason to expect that that will happen anytime soon.

Concluding, I’d say that Drobo is great for flexible bulk storage for your desktop or laptop computer. Its form-factor and connectivity support this notion. Maybe it was my mistake, trying to coerce Drobo into providing high-performance, highly-reliable storage for servers. It’s just not very well suited for that task. Coincidentally, as I was struggling with Drobo on Linux, I learned quite a lot about software RAID and LVM2. This made me realise it’s quite possible to create a very flexible and secure storage setup using those, with little more than a case with hotswap SATA-bays and my favourite Linux distro. So there will be more about this soon!

BartOS

February 23rd, 2009

I have many ambitious projects in mind, most of which will never see the light of day, and programming my own operating system is certainly way off the realism scale. But nevertheless, it doesn’t hurt thinking about what such an OS would look like. So, here goes nothing, the main features of BartOS 1.0:

  • it will dedicate at least one core of the (no doubt many) available cores on the host system to handle user interaction. That means you will always be able to control the system, even if it is extremely overloaded or has locked up processes. The reason for this is that, in my experience, responsiveness of the user interface is the primary factor that makes the difference between ‘a pleasure to use’ and ‘excruciatingly masochistic to use’.
  • it will follow its user’s actions and try to predict what the next action will be. Based on this it can:
    • prepare the action, giving the impression of being faster
    • suggest the action, saving the user the time to specify the command
    • auto-execute the action, if it’s been given authorisation to do so previously

    The reason for this is that I think we build computers to do the hard and repetitive work for us, not to have them make us do it. Given a sufficiently overpowered computer for the tasks at hand (which most of us already have), it’s fine for the computer to prepare operations that may never be executed or gather data that may never be used. Doing so allows it to make our life easy whenever possible and spares us the repetitive work computers do so much better.

  • it will present a user-interface based on icons, separated into categories like input, output and functions. That way, operations become clear, easy to specify and flexible. They can also be stored to be executed later, or at a certain interval. The reason for this is that it may, hopefully, bring a measure of intuitiveness to more complex operations than just moving files around. We got the latter down pretty well, using the graphical user interface found in most current operating systems, but anything beyond that (converting between file formats, converting a datastream to a file, manipulating data in basic ways, etc…) still requires specialised knowledge and software.
  • a running foreground process will always have some visual representation in the GUI; it must never vanish, even for a second, making the user wonder whether it is still running or not.

Another important thing I’d like to see an operating system have is something that I call a ‘bottleneck indicator’. I’m sure we all regularly wonder ‘what am I waiting for?’ when a computer is busy with something. If it were able to tell you whether you are low on free physical memory, your CPU is 100% busy or the system is waiting on disk I/O, that would give you an idea of which part of your system would benefit most from an upgrade.
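
As a very rough illustration of the idea, the sketch below picks a bottleneck from three measurements. Everything in it is hypothetical: the function name, the thresholds and the order of the checks are assumptions chosen for the example, and a real indicator would have to sample these values from the operating system over time.

```python
def bottleneck(cpu_busy: float, mem_free: float, io_wait: float) -> str:
    """Name the most likely bottleneck.

    All arguments are fractions between 0 and 1. The thresholds are
    illustrative guesses, not measured values.
    """
    if mem_free < 0.05:    # almost no free physical memory left
        return "memory"
    if io_wait > 0.30:     # a large share of time spent waiting on disk
        return "disk I/O"
    if cpu_busy > 0.95:    # CPU effectively 100% busy
        return "CPU"
    return "none"


print(bottleneck(cpu_busy=0.99, mem_free=0.40, io_wait=0.02))
```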

One last observation: I fully expect speech recognition to be the next big jump in computing. Clicking with a mouse or typing on a keyboard will never be as fast as simply saying something. Any future OS will do well by being set up to be easily controlled by speech commands. In this light, it is important to realise that there are two major ways of getting a computer to do something for you. Either you are starting an existing function that has previously been programmed in, or you are programming/scripting one on the fly. The latter will be particularly difficult at first using speech recognition. This is because all programming/scripting-languages so far require a special syntax that is not normal speech. To program using speech, you will have to speak in such a special syntax, which will be very difficult and require a lot of corrections. If an OS managed to provide a scripting language that is based on normal speech, it would completely blow away all others in a speech-controlled computer age.

The Two Servers

January 7th, 2009

After moving our email facilities to the new server back in July, we finally got around to moving the various websites we host in November. Having moved all mission-critical services away from the old machine enabled us to give it a serious update. Since it still ran the by-now unsupported Debian Sarge OS, with a 2.4 Linux kernel, this was somewhat overdue. With a fresh Debian Lenny install, a bit more RAM and some improvements to the cooling system, it’s now ready for action again. The package selection and configuration were mostly copied from the new server, so we now have, as far as software is concerned, two identical machines, which makes the management a lot easier.

The next step is configuring the old machine as a proper backup server. Using it as a secondary (fallback) email server is of course fairly easy, though it needs to be able to filter spam just as well as the primary, or else our excellent spam defense will break. Providing failover capabilities for other services is not always that straightforward. Dynamic websites need to have their databases in sync and mailboxes need to be consistent between (potentially concurrent) sessions as well. Because our servers are in separate datacentres, making this synchronisation work is a bit tricky. We’re determined to make this work, however, because we want to have failover capabilities beyond a single datacentre. Modern datacentres have very robust power and network infrastructures, but still it is not inconceivable that whole racks or even cages suffer an outage. The only way to avoid those problems and the risk of administrative mistakes is to spread your equipment across different locations and hosting companies.

Doing automatic failover between locations is difficult, but not impossible. Lacking hardware load-balancers, we need to use DNS to spread the load across machines and also to move clients away from failing servers. There will always be some delay there that cannot be avoided. Keeping the TTL low in your DNS zones helps, but values below 1 hour are not always honoured. The opposite problem, clients arriving at inactive backup servers, we intend to solve using tunnels between the servers. That way, the client can be served from the active machine running the authoritative database, without the need to notify or otherwise redirect the client.

Micro-topographical survey

March 31st, 2008

This January’s expedition to Koroneia hill has been quite successful. Working with the high-precision DGPS set turned out to be fairly easy. It also was very fast to work with. In our 11 days of fieldwork we’ve taken almost 9000 point measurements. Creating a digital elevation model (DEM) of a hill like this in less than 2 weeks is, as far as I know, revolutionary. Because we were specifically looking for anomalies in the hill’s topography, and because the hill has both fairly even areas and highly irregular ones, we decided to select the points to measure in the field, based on what we saw, rather than trying to follow any kind of regular grid. This was to avoid a grid smoothing out small ridges that may be quite important to our research. Below you can see an overview of the measurements we took, overlaid on the aerial photographs we have, and a preliminary 3D result.

[Figure: Koroneia hill: surface map using Kriging (2.5m grid)]
[Figure: Koroneia hill: DGPS measurements on aerial photo]

MMVI Spam Filters

February 14th, 2008

I never thought that I would one day involve myself with the fight against spam. In general I don’t like the idea of computers examining the content of our email and trying to decide for us whether we want to receive that email or not. Computers are appallingly bad at interpreting human writing and images, especially so if the data at hand was created with the specific purpose of fooling them.

The change came with the realisation that the vast majority of spam these days is sent from virus-infested home-computers. I have extensive experience with these drones (as we call them) from my activities as an IRC operator and it soon came to me that it must be possible to differentiate between these end-user computers and proper email-servers. That way, it is possible to accept or deny email based on its source, rather than its content.

Besides allowing you to deny the email even before the actual body text is sent, it also allows you to do so with the sender still on the line. This is a very important benefit. If your filters decide to deny the delivery of the message, your server can tell this directly to the sending party. Since the From and Sender header lines of spam are usually faked, this is the only time you know for sure that you are talking back to the spammer. This way, if it was truly spam, the spammer may notice that you’re not buying it, but more importantly, if it wasn’t spam (a so-called false-positive), then the legitimate sender is also notified, instantly. It may be bothersome for a legitimate sender if their email didn’t make it through, but it’s much worse if their message was just silently discarded, for it may take them a long while to realise that it didn’t arrive. Not sending notifications of failed delivery to the faked From-address is very good practice. If you do, you’re just adding to the problem and run the risk of ending up on blacklists like backscatterers.org.

I’ve been developing the MMVI spam filters for 2 years now, so the ruleset has become quite complicated. But the 3 basic principles it is based on are:

  • denying mail from hosts with obvious generic hostnames or consumer-identifying tokens in their hostnames (for instance 123-123-123-123.isp.com or 294a7g2.adsl.isp.com).
  • denying mail from hosts that send a very wrong HELO/EHLO name in the SMTP transaction. Many home-computers are behind a NAT device, making them unable to know their external IP-address. As such they have difficulties with properly introducing themselves. Also, a lot of spamming software is a hastily hacked-together mess, that tends to mess up things like this.
  • denying mail from hosts that have no or incorrectly configured reverse DNS. These tend to be hosts on poorly configured networks, which seems to go hand in hand with being poorly secured as well. In many cases, you don’t want anything to do with these.
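
The three principles above can be sketched in Python roughly as follows. This is not the actual MMVI ruleset, which is far more elaborate; the patterns, token list and function names are all assumptions made up for illustration, the HELO check only understands IPv4 address literals, and the reverse DNS check needs a working resolver to do anything useful.

```python
import ipaddress
import re
import socket

# Hypothetical patterns: an embedded dotted or dashed IPv4 address, or a
# token that usually identifies consumer access lines.
EMBEDDED_IP = re.compile(r"(?:\d{1,3}[-.]){3}\d{1,3}")
ACCESS_TOKEN = re.compile(
    r"(?:^|[.-])(adsl|dsl|cable|dialup|dyn|dynamic|pool|ppp)(?:[.-]|\d|$)"
)


def looks_generic(hostname: str) -> bool:
    """Principle 1: generic or consumer-identifying hostname."""
    host = hostname.lower().rstrip(".")
    return bool(EMBEDDED_IP.search(host) or ACCESS_TOKEN.search(host))


def helo_is_suspicious(helo: str, client_ip: str) -> bool:
    """Principle 2: a very wrong HELO/EHLO name."""
    name = helo.strip().lower()
    if name.startswith("[") and name.endswith("]"):
        # An address literal is legal, but should be the client's own address.
        return name[1:-1] != client_ip
    if "." not in name or name in ("localhost", "localhost.localdomain"):
        return True   # not a fully qualified name
    try:
        ipaddress.ip_address(name)
        return True   # bare IP address without brackets
    except ValueError:
        return False


def has_matching_reverse_dns(client_ip: str) -> bool:
    """Principle 3: reverse DNS exists and resolves back to the same IP."""
    try:
        hostname, _, _ = socket.gethostbyaddr(client_ip)
        _, _, addresses = socket.gethostbyname_ex(hostname)
    except OSError:
        return False
    return client_ip in addresses
```

A real filter would combine these checks with exceptions for known-good mailservers instead of rejecting on any single hit.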

These rules are not black-and-white on MMVI: I’ve made a lot of attempts to redeem proper mailservers that have slight misconfigurations. The only thing I’ve not yet discussed is the lonely few who run their own mailserver off of their ADSL line at home. Being one of those, I do feel sympathy for them. To make it possible for them to keep doing so, each denial is sent with a specifically generated whitelist address that can be used by a legitimate sender to whitelist their own hostname. Since both the rejected message and the whitelist request must be delivered by the same system (as they will be in the above scenario), this is very difficult to do for a spammer, who utilises thousands of systems but rebuilds the target list of email-addresses centrally.
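
One way such a whitelist address could be tied to the sending system is sketched below. How the real addresses are generated is not described here, so this is purely an assumption: it uses an HMAC over the sending hostname, with a made-up secret, a made-up address format and example.org as a placeholder domain.

```python
import hashlib
import hmac

SECRET = b"change-me"  # hypothetical server-side secret


def whitelist_address(sending_host: str, domain: str = "example.org") -> str:
    """Build a whitelist address that is only valid for one sending host."""
    digest = hmac.new(SECRET, sending_host.lower().encode(), hashlib.sha256)
    return f"whitelist-{digest.hexdigest()[:16]}@{domain}"


def accepts_whitelist_request(address: str, delivering_host: str) -> bool:
    """Accept only if the request arrives from the host the address was made for."""
    domain = address.rpartition("@")[2]
    expected = whitelist_address(delivering_host, domain)
    return hmac.compare_digest(address.encode(), expected.encode())


addr = whitelist_address("mail.sender.example")
print(addr, accepts_whitelist_request(addr, "mail.sender.example"))
```

A request delivered by any other host fails the check, which matches the requirement that the rejected message and the whitelist request come from the same system.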

Admittedly, this system would not hold up if spammers were to adapt their software specifically against our defenses. At that point though, we could switch the whitelisting procedure to something that properly ensures human-intervention, for instance a Captcha-like mechanism. For now though, small as we are, working with cheap hardware and free software, we enjoy being 99+% spam-free.

WordPress Mobile Edition

February 12th, 2008

As a minor update, I’ve used the wonderful “WordPress Mobile Edition” plugin by Alex King (found here) to make this site accessible from mobile devices as well. Just navigating to the normal site address with a mobile browser should give you the lightweight version. If it doesn’t for you, let me know.

Dell PowerEdge R200

January 23rd, 2008

The Dell PowerEdge R200, a 1U rackserver, has been delivered. It’s wonderfully engineered. The case opens with just one hand-tightened screw. Inside is room for two 3.5″ harddrives, which can be serviced by just pulling out one pin and then taking out the HDD frame. Cooling is done by two sideways-blowing fans, taking the air through the tunnel-shaped CPU heatsink and a separate flow past the memory DIMMs. A single smaller fan takes care of the cooling for the PSU. The system comes configured with an advanced diagnostics program, available as a boot option from the BIOS. This program does take up 2.2 GB on two primary partitions of the first harddisk.

Both the harddrive and the slimline DVD/CDRW combo drive are SATA. This gave us a little trouble, since the Linux version we intended to use on it (Debian Etch, the current stable branch) does not properly support these. The installer could not read packages from the cdrom drive, nor did the harddisk appear in the partitioning menu. Trying the various related boot options we could find online did not solve the problem. In the end, we downloaded the Debian testing version (Lenny). This one does support both types of SATA drives, right out of the box. Lenny already seems rather stable and also offers the benefit of the most modern software versions, so we’ve decided to go ahead with it. With luck, by the time we’re ready to go live with the machine, Lenny may have reached stable.