Erlang’s 99.9999999% (nine nines) reliability

The reliability figure does not measure the total time that any part of the AXD301 (the project in question) was shut down over 20 years. It measures the total time over that period during which the service provided by the AXD301 system was offline. It is a subtle difference. As Joe Armstrong says here:

The AXD301 has achieved a NINE nines reliability (yes, you read that right, 99.9999999%). Let’s put this in context: 5 nines is reckoned to be good (5.2 minutes of downtime/year). 7 nines almost unachievable … but we did 9.

Why is this? No shared state, plus a sophisticated error recovery model.
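To put the quoted figures side by side, here is a rough sketch of the arithmetic behind "N nines". It assumes a 365-day year, and the function name is only illustrative; nothing here comes from the AXD301 study itself.

```python
# Assumption: a non-leap year of 365 days.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def downtime_seconds_per_year(nines: int) -> float:
    """Downtime per year allowed by an availability of `nines` nines.

    For example, 5 nines means 99.999% availability, i.e. an
    unavailable fraction of 10^-5.
    """
    unavailability = 10.0 ** -nines
    return SECONDS_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in (5, 7, 9):
        print(f"{n} nines: {downtime_seconds_per_year(n):.4f} seconds/year")
```

Under that assumption, five nines comes to roughly 5.3 minutes a year (in line with the figure in the quote), seven nines to about 3 seconds, and nine nines to about 32 milliseconds.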

If you dig a bit deeper into the PhD thesis written by Joe, the original author of Erlang (the thesis includes a case study of the AXD301), you read:

One of the projects studied in this chapter is the Ericsson AXD301, a high-performance highly-reliable ATM switch.

So, as long as the network that the switch belonged to kept running without downtime, the author could claim “nine nines reliability” for the AXD301 (which is all he ever said, avoiding specifics). That does not necessarily mean Erlang was the only cause of such high reliability.

EDIT: In fact, “20 years” itself seems like a misinterpretation. Joe mentions a figure of 20 years in the same article, but it’s not actually connected to the nine-nines reliability figure, which potentially came out of a much shorter study (as others have mentioned).
