Erlang is awesome. Really. Syntax aside the language has a lot going for it, like most good products it is highly opinionated. Pure process oriented message passing semantics with per-process mailboxes and garbage collection is a very clean approach. These fundamentals make it an excellent choice for distributed and concurrent programming, and barring Java with Akka there are few options for language-based distributed operation.

The importance of this is very much up for debate, given that the message-queue has become a ubiquitous proxy for language support. Discrete components written in multiple languages hooked up to some agnostic bus makes for a simple model for distributed computation and insulates programmers from many of the complexities of “real” distributed programming. Light-weight libraries like 0mq / and google protocol buffers have blurred the lines between the traditional message queue and full inter-process RPC. Despite these advantages some major software messaging platforms have been written in Erlang, including WhatsApp, and portions of major software infrastructure like Github. I’ve also heard that it has a role at Heroku, and frankly it makes sense.

Erlang has three absolutely killer features that make it a serious contender for any environment trying to have massive uptime. First, is the OTP platform itself. The official language is called Erlang/OTP for a reason, and thats because the Open Telecom Platform bakes in a lot of critical software components necessary to build resilient software. Not using OTP and building distributed code is like reinventing the wheel all over again. The second is hot code reload. You can upgrade code on a per module basis as the system is running, largely due to the enforced immutable state, but also a lot of back end plumbing. Doing live upgrades while a system is running is an extremely tricky business, but if uptime is absolutely critical it is a very powerful capability to have baked in to the core libraries. Finally, Erlang has a built in database. Mnesia is an acid-compliant distributed data store, capable of multi-node replication, sharding, and a host of other features that interface seamlessly with the language itself. Being able to store data in an active-active state across your application nodes, Mnesia makes it possible.

I do have one major gripe, and that is the real subject of this post. The open Telecom platform is fantastic for what it is, clearly probably the best framework out there for developing a 2 or 3 node high availability application. Takeover capabilities, distributed in memory operation, and hot code reload make for a potent mix when trying to design something capable of throughput and uptime. The problem is that it really falls short when we talk about large distributed applications. Once you want to run your code on ten nodes, or fifty, or a hundred the paradigms that OTP introduces kind of fall apart. The built in tools don’t handle network partitions very well if at all, they don’t have dynamo style consistent hashing, paxos support, or multi-cast and distributed messaging constructs between processes, nor can you expect to find built in service discovery (beyond PG2/Gproc). If you want these things (Ulf Wiger’s excellent Gproc library aside) you end up rolling your own. As any experienced programmer knows, rolling your own anything is difficult, and ultimately prone to error. When you are building a new application you generally want battle tested components, but in Erlang your options are limited.

Basho has done an excellent job with Riak, Riak Core is a really nice dynamo-style library, but its just one component in the toolbox. It is also low on examples, and although there are some interesting presentations on using it for service oriented architecture there is little in the way of open source code demonstrating that usage. I can understand why, certainly after building a nice distributed framework around Riak core many companies would simply keep it internal as part of their competitive arsenal. I think what Erlang needs is an Open Cloud Platform. A set of rock solid components that leverage Erlang’s robust distributed features, builds on OTP as a foundation, but really extends it’s capabilities for building applications that are highly distributed. Instead of having to cobble together a bunch of third party libraries, a single uniform set of interfaces around service discovery, messaging, and partition recovery would go a long way to keeping Erlang competitive in the long term.

Despite it’s success in certain industries Erlang will probably remain a niche language. Certainly some of it’s best features are being emulated in the JVM, and co-opted by languages like Scala, but it seems so clear with a serious effort on the tooling side it could cement itself as a viable alternative for a wide range of applications.