June 2005
Andrew Pollock


Sunday, 26 June 2005

There's no outage like an unscheduled outage

At about 21:15 on Friday night daedalus seems to have shat itself. It was still pingable, and attempts to make a TCP connection on port 22 resulted in a connection being made and then unceremoniously closed before an SSH banner was sent. HTTP requests just timed out. I had an SSH connection open, and attempts to get it to do anything resulted in the packets being acknowledged, but no actual response. In other words, the kernel was still answering, but nothing in userland was getting any work done. It looked like a good bit of resource starvation to me.
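
For what it's worth, the "connects, then gets dropped before the banner" symptom is easy to check for from a remote machine. What follows is only a rough sketch, not what I actually ran on the night: the function name `probe_banner` and its return strings are made up for illustration, and it assumes the service speaks first after the connection opens, as SSH does.

```python
import socket


def probe_banner(host, port, timeout=5.0):
    """Classify how a TCP service responds to a bare connection.

    Hypothetical helper for illustration. Returns one of:
      "no-connect"    the TCP connection could not be made (refused or timed out)
      "closed-early"  the connection was made, then closed before any data arrived
      "silent"        the connection stayed open but nothing arrived in time
      "banner: ..."   the first bytes the service sent
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers refused connections, unreachable hosts and connect timeouts.
        return "no-connect"

    with sock:
        sock.settimeout(timeout)
        try:
            data = sock.recv(256)
        except socket.timeout:
            return "silent"
        except OSError:
            # For example, the peer reset the connection.
            return "closed-early"

    if not data:
        # An empty read means the peer closed without sending anything.
        return "closed-early"
    return "banner: " + data.decode("ascii", "replace").strip()
```

Calling `probe_banner("daedalus.example.org", 22)` (a placeholder hostname) against a box in the state described above would presumably come back as "closed-early", whereas a healthy SSH daemon would give back something starting with "banner: SSH-".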

Friday outages always suck because generally the earliest someone can reboot the box is Monday morning. Fortunately, Ben was kind enough to go in for me on Sunday morning and kick it in the guts. Ah, the joys of being 1000 kilometres from my box... I think it might have also got wind that I was thinking of replacing it with something a bit newer and gruntier, and got offended or something.

There's no good evidence of what actually happened. It looks as if it was mostly dead from around 21:15. No cron jobs ran, no log entries, nothing. These are the worst "crashes" to try to diagnose. I know for a fact it's short on RAM, and a UDMA-66 IDE cable would help reduce I/O bottlenecks.

