Most people in Australia know that the night of August the 9th was census night. What was new this time was that it was the first census that could be filled out online.
What most people also now know is that the online census was a complete failure. Tens of thousands of people were unable to complete the census, and the ABS website went down, leaving people locked out for hours.
This was no surprise to me, or most likely to anyone else who has a decent understanding of how to engineer network services for heavy loads.
The loads we are talking about are astronomical compared to the normal loads experienced by most systems.
A newspaper report suggested that the ABS had load tested the system (as they should) prior to launch, and that it could handle 1,000,000 hits an hour. Yeah, that's great. Unless your system gets a million hits a second. That is the most likely scenario, as virtually every family in Australia said after dinner on Tuesday night, "let's get this out of the way".
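To see why an hourly figure can hide the real problem, here's a rough back-of-the-envelope sketch. The tested capacity is the figure from the newspaper report; the household count and the length of the after-dinner rush are my own illustrative assumptions, not real ABS numbers.

```python
# The reported load-test figure, converted to a per-second rate.
tested_per_hour = 1_000_000
tested_per_second = tested_per_hour / 3600
print(f"Tested capacity: ~{tested_per_second:.0f} hits/second")

# Assumption: roughly nine million households, most trying to file
# within the same half-hour window after dinner. Both numbers are
# illustrative only.
households = 9_000_000
window_seconds = 30 * 60
peak_per_second = households / window_seconds
print(f"Hypothetical peak: ~{peak_per_second:.0f} hits/second")
```

Even with generous assumptions, the hypothetical peak comes out more than an order of magnitude above the tested per-second rate. An hourly capacity figure tells you very little about surviving a synchronised spike.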
I used to work for Tabcorp, and year after year their systems crashed on Melbourne Cup day (this was a dozen years ago, so it's most likely much better engineered now). The problem was that the spike in load in the brief time leading up to the cup was far more than the system's regular load. We're not talking double here, we're talking 50 times your normal traffic. The point is, it's very hard to engineer for such peak loads.
The ABS chiefs had an excuse. It wasn't the load; we'd tested for that. It was a denial of service (DoS) attack. Well, having worked in the public service before (like many Australians), I could see that one for what it most likely was: a bureaucrat's excuse. I may be cynical, but I wouldn't be surprised if they had held meetings beforehand, planning excuses in case the system didn't work. After all, may as well be prepared.
I call that excuse for what it most likely is: bull. Calling on past experience as a computer security professional, I find it very hard to believe that the systems implemented by the ABS wouldn't have had DoS protection. Why? Because they have things called firewalls and intrusion prevention systems that detect these attacks and stop them.
But what is a denial of service attack? It's a fairly basic attack. The attacker sends a large number of requests to a site in an attempt to overwhelm it.
One way to do this is to open thousands and thousands of half-open connections. A PC connects to a server via what is known as a three-way handshake. The PC requests a connection. The server responds, saying "I have a connection for you", and the PC responds back, saying "I'll take that connection". If the PC never sends that final response, the server holds the half-open connection for a time, waiting. Do this thousands of times and the server runs out of connections.
That is the description of a simple attack. To the best of my knowledge, modern systems are essentially immune to simple attacks like this, but there are other, similar attacks that are more effective.
Anyway, the bottom line is that a DoS attack either isn't likely, or shows that they weren't very prepared, as this article in the Sydney Morning Herald says.
Update: What we are seeing now is typical public service shenanigans as those responsible for the debacle duck for cover. All very predictable, as public service heads blame IBM. This is why they hire external consultants: so that when heads roll, it's not theirs. IBM will be doing the same thing, seeking a scapegoat. Most likely a senior contractor. Definitely not management. Either that, or they'll be looking at the contract and pointing out that they delivered as per its specifications.
What has also come out in the press is that other computing experts have said exactly what I have said: that a DoS attack is unlikely, and that the load experienced was simply beyond the capacity of the system to cope.