Vladimir Sedach

Have Emacs - Will Hack

August 18, 2007

Memory doubts

Topic: Distributed systems

Patrick Logan recently posted a note expressing misgivings about the promise of transactional memory:

I can see someone making the argument in some domain I don't usually work in that shared memory threads are better than shared nothing message passing for performance reasons. Some hard-real time scenario (I think I was real tired when I wrote that. Must have been thinking back to the days when I'd get into "Is automatic garbage collection practical?" debates. Nobody has those anymore, do they? Because I've got some real good arguments for GC.), some huge number crunching scenario, etc. where every byte and every cycle has to count in the extreme. But the irony then is that STM seems far more suited to a language like Haskell, which is also unlikely to be suited for these performance scenarios.

My only fear is that for the masses including myself, we need *simple* mechanisms, and the fewer of those, the better. Shared nothing messages seem to scale up, out, and down. STM seems more complicated, and an incomplete solution, and only a solution for shared memory. No thanks, at least until bigger brains than my own can show me why I should change my mind. They seem sidetracked on this one.

With regard to transactional memory, the "performance argument" does not even need to be brought up. The second biggest objection to transactional memory (the first being that it is different from what is currently being used) is that even in the best case of no contention, it imposes a large overhead penalty compared to code using manual synchronization, because the runtime has to track every read and write in order to validate and commit each transaction. In the worst case, of course, you have to throw out a whole bunch of transactions' worth of work and start again.
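To make that bookkeeping concrete, here is roughly what the programmer sees in Haskell's STM (a minimal sketch of my own using GHC's Control.Concurrent.STM library; the transfer example is illustrative, not taken from any benchmark):

    import Control.Concurrent.STM

    -- The STM runtime records every TVar read and write in a
    -- per-transaction log, validates the log at commit time, and
    -- re-runs the whole block if another thread committed a
    -- conflicting change in the meantime. The logging is the
    -- best-case overhead; the re-run is the worst case.
    transfer :: TVar Int -> TVar Int -> Int -> IO ()
    transfer from to amount = atomically $ do
      fromBalance <- readTVar from
      toBalance   <- readTVar to
      writeTVar from (fromBalance - amount)
      writeTVar to   (toBalance + amount)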

When it comes to scalability, transactional memory is just as limited as any other kind of transactional system. Published benchmarks already show a performance plateau on some of the larger Sun servers, and those have fewer cores and probably a much lower ratio of CPU speed to memory latency (and bandwidth) than any kind of upcoming 80-core Intel chip.

If anything, out of all the available approaches, message passing permits by far the fastest implementations. Cache-coherency protocols introduce delays and seriously limit how many processors an SMP system can scale to, and even atomic instructions are so expensive that, from what I have heard, some of the lock-free data structures currently being worked on do not perform well compared to the lock-based ones they are supposed to replace.
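For contrast, here is what the shared-nothing style looks like in the same language (again just a sketch, and with an honest caveat: GHC's Chan is itself built on shared memory underneath, so this illustrates the programming model, not the hardware argument). Each piece of state is owned by exactly one thread, and everyone else has to send that thread a message:

    import Control.Concurrent      (forkIO)
    import Control.Concurrent.Chan

    -- Requests that want an answer carry a reply channel.
    data Msg = Increment | Get (Chan Int)

    -- The counter's state lives in this one thread; there are no
    -- locks and no shared variables to fight over.
    counter :: Chan Msg -> Int -> IO ()
    counter inbox n = do
      msg <- readChan inbox
      case msg of
        Increment -> counter inbox (n + 1)
        Get reply -> writeChan reply n >> counter inbox n

    main :: IO ()
    main = do
      inbox <- newChan
      _ <- forkIO (counter inbox 0)
      writeChan inbox Increment
      writeChan inbox Increment
      reply <- newChan
      writeChan inbox (Get reply)
      print =<< readChan reply  -- prints 2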

However, I would not by any means call transactional memory complicated. In fact, it is the easiest approach to concurrency available to those used to traditional imperative programming. This is by no means a good thing: freed from having to understand concurrency, people will inevitably write "concurrent" programs that do not scale, or worse. It is not too hard to imagine a transactional system where performance suddenly drops off a cliff when you add processors, because transactions are now being aborted all the time.
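That cliff is easy to manufacture. In the deliberately pathological sketch below, every transaction reads and writes the same TVar, so no matter how many cores you have, at most one of the transactions running at any moment can commit; the rest get aborted and re-executed. Adding processors adds wasted work, not throughput:

    import Control.Concurrent     (forkIO)
    import Control.Concurrent.STM
    import Control.Monad          (forM_, replicateM_)

    -- All the threads hammer one TVar: the canonical way to get an
    -- STM program to abort transactions all the time.
    hotSpot :: Int -> Int -> IO (TVar Int)
    hotSpot nThreads nIncrements = do
      counter <- newTVarIO 0
      forM_ [1 .. nThreads] $ \_ ->
        forkIO $ replicateM_ nIncrements $
          atomically $ do
            n <- readTVar counter
            writeTVar counter (n + 1)
      return counter

Give each thread its own counter and sum them at the end and the aborts disappear, but nothing in the transactional interface pushes the programmer toward that design.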

A final comment I want to make is to go back to objection #1 to transactional memory. While the state of the art in distributed systems research is advancing well, the state of experience in actually implementing such systems lags much further behind, to the point where many people have a very misguided view of what is hard and what is not in programming distributed systems. This was illustrated by an incident a while ago, when a research professor in the field, whose work and talents I otherwise highly respect, told a class that message passing was too difficult for programmers, that they would prefer to program with locks and semaphores, and then asked how we could implement those in terms of the former. Of course I had to object. I suspect there are and will be a lot of similar arguments repeated over and over, and the familiarity card will be brought out again and again.
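For the record, the professor's question has a very short answer. Here is a sketch of a counting semaphore built out of nothing but message passing (Haskell channels standing in for messages; the Semaphore type and the names are mine):

    import Control.Concurrent.Chan
    import Control.Monad (replicateM_)

    -- A semaphore is just a mailbox pre-loaded with tokens:
    -- acquiring blocks until a token arrives, releasing sends
    -- one back.
    newtype Semaphore = Semaphore (Chan ())

    newSemaphore :: Int -> IO Semaphore
    newSemaphore n = do
      tokens <- newChan
      replicateM_ n (writeChan tokens ())
      return (Semaphore tokens)

    acquire, release :: Semaphore -> IO ()
    acquire (Semaphore tokens) = readChan tokens
    release (Semaphore tokens) = writeChan tokens ()

A mutex is the one-token special case, which rather undercuts the idea that message passing is the harder primitive.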

If someone plays that card on you, point out that, out of all the software in the world, only a very small amount is written with lock-based concurrency, and out of those programs, only an insignificant percentage is actually written to be real parallel software (as opposed to multi-threaded GUIs and network servers). In this case, the familiarity card is really worth far less than it seems.