A module on an network switch connecting our servers to the internet failed at 950pm EST. After several attempts to get the switch to operate normally, the switch module was replaced. Phanfare was available at 10:59pm,
Upon analysis, we could have avoided downtown by further replicating some of our network and firewall infrastructure to provide for multiple network paths through redundancy. We are evaluating the cost effectiveness of adding the required hardware to avoid this type of outage in the future.
To give some perspective, being down one hour per month provides us with 99.86% uptime, not the 99.999 that Ma Bell used to provide, but pretty good. It is believed that Amazon, for example, strives for 99.99% uptime for their systems.
Sorry for the disruption in service!