What tools do distributed programmers lack?

OK, let me start.

A distributed logger with a high-precision global time axis: one that can register events from different machines in a distributed system with high precision, independent of clock offset and drift, and that scales to the load of several hundred machines and several thousand logging processes. Such a logger lets you find transport-level latency bottlenecks in a distributed system by showing, for example, how many milliseconds a message actually takes to travel from the publisher to the subscriber through a message queue.
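To make "independent of clock offset" a bit more concrete, here is a minimal sketch of one common way to map a machine's local timestamps onto a shared time axis: an NTP-style four-timestamp exchange with the log collector. This is an illustration only, not necessarily how any particular logger does it; the names `Exchange`, `estimate_offset`, `best_offset` and `to_global` are made up for this example, and the sketch assumes network delay is roughly symmetric and ignores drift between calibrations.

```python
from dataclasses import dataclass


@dataclass
class Exchange:
    """One calibration round trip (hypothetical structure for this sketch)."""
    t0: float  # client sends request   (client clock, seconds)
    t1: float  # server receives it     (server clock, seconds)
    t2: float  # server sends reply     (server clock, seconds)
    t3: float  # client receives reply  (client clock, seconds)


def estimate_offset(x: Exchange) -> float:
    """Estimate (server clock - client clock), assuming symmetric delay."""
    return ((x.t1 - x.t0) + (x.t2 - x.t3)) / 2.0


def round_trip(x: Exchange) -> float:
    """Time spent on the wire, excluding server processing time."""
    return (x.t3 - x.t0) - (x.t2 - x.t1)


def best_offset(exchanges: list[Exchange]) -> float:
    """Trust the exchange with the shortest round trip: the less time a
    sample spent in transit, the smaller its possible asymmetry error."""
    return estimate_offset(min(exchanges, key=round_trip))


def to_global(client_ts: float, offset: float) -> float:
    """Convert a client-clock timestamp to the server's (global) time axis."""
    return client_ts + offset
```

A client would repeat the exchange periodically, so that drift is absorbed by fresh offset estimates, and the collector would apply `to_global` to every record from that client.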

Syslog is not ok because it’s not scalable enough – 50000 logging events per second will be too much for it, and timestamp precision will suffer greatly under such load.

Facebook’s Scribe is not ok because it doesn’t provide a global time axis.

Actually, both syslog and Scribe register events under arrival timestamps, not under occurrence timestamps.
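The difference matters as soon as records are batched or delayed on their way to the collector. The sketch below is a toy model, not the behavior of syslog or Scribe themselves: `Record`, `order_by_arrival`, `order_by_occurrence` and the per-machine `offsets` table are all hypothetical names, and the offsets are assumed to be already known.

```python
from typing import NamedTuple


class Record(NamedTuple):
    machine: str
    occurred_local: float  # stamped at the source, on its own clock
    arrived: float         # stamped by the collector on receipt
    message: str


def global_time(rec: Record, offsets: dict[str, float]) -> float:
    """Occurrence time on the collector's axis (offset = collector - source)."""
    return rec.occurred_local + offsets[rec.machine]


def order_by_arrival(records: list[Record]) -> list[Record]:
    """Ordering you get when only the collector's receipt time is recorded."""
    return sorted(records, key=lambda r: r.arrived)


def order_by_occurrence(records: list[Record],
                        offsets: dict[str, float]) -> list[Record]:
    """Ordering on the global time axis, using corrected source timestamps."""
    return sorted(records, key=lambda r: global_time(r, offsets))


# Assumed scenario: the publisher's clock runs 2 s ahead of the collector,
# and its log record is delayed in a batch, so it arrives after the
# subscriber's record even though the send happened first.
offsets = {"publisher": -2.0, "subscriber": 0.0}
records = [
    Record("publisher", 1002.000, 1000.500, "sent"),
    Record("subscriber", 1000.004, 1000.100, "received"),
]
```

With these records, ordering by arrival puts "received" before "sent", while ordering by corrected occurrence time restores the real sequence and lets you read the queue latency off as the difference between the two global times.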

Honestly, I don’t lack such a tool – I’ve written one for myself, I’m greatly pleased with it and I’m going to open-source it. But others might.

P.S. I’ve open-sourced it: http://code.google.com/p/greg
