Unscheduled downtime
Yesterday evening at 23:40, half of the cluster and a switch went offline due to a power-supply failure in one of our cabinets at the hosting center.
At 01:00 we rebooted the cluster at half capacity, which was enough to service the players at that time.
Power was restored to the cabinet at 05:15, and the rest of the nodes were promptly started so they could rejoin the cluster.
Apart from the total outage of 1 hour 20 minutes between 23:40 and 01:00, the incident should not have been noticeable.
The cause of the power failure is being investigated, and our hosting team is working with Cable & Wireless on measures to prevent similar problems in the future.