Why futures don't work - alternatives for parallel programming

The Embrace Of Meh - Why Futures Are Not Safe

The futures model of programming seems like a very stable and reliable way of making multi-threaded development safe and easy. The model is beautifully simple:

1) Create a task
2) Submit it to an executor
3) The executor submits it to a thread pool and returns a Future to you.
4) When you need the result of the task you call get on the future (or whatever the API defines) and wait until the result appears.

The key point being that tasks are queued up. If we submit tasks to the executor faster than they are being finished the queue grows. The executor will have a given number of execution threads. This might be dynamic or fixed. In general though there will be a maximum number of threads.
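As a concrete illustration of those four steps, here is a minimal sketch using Python's standard concurrent.futures module (any pool-backed futures API follows the same shape):

```python
from concurrent.futures import ThreadPoolExecutor

def task(n):                              # 1) create a task
    return n * n

pool = ThreadPoolExecutor(max_workers=4)  # a fixed number of execution threads
future = pool.submit(task, 6)             # 2) and 3) submit; a Future comes back
print(future.result())                    # 4) wait until the result appears -> 36
pool.shutdown()
```

If we submit faster than tasks complete, they simply queue inside the pool, exactly as described above.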

Sadly - there is a problem:

What is 'you' in the above description? I kept saying 'you do this' and 'you do that' but never defined 'you'. Well, 'you' are a thread of execution. Our simple model has a hidden complexity: there are two types of thread, submitters and executors. This is fine for really old fashioned programming, like the sort of stuff we did on single core machines in years gone by. Nowadays we need a more flexible approach to threading. We need to inject threading anywhere we can just to squeeze performance from the ever growing core count of modern machines.

Exponential Parallelism
What we need in modern programming is to go parallel everywhere. Anything which can be parallel should be. We need to avoid 'this is the parallel bit' thinking. Consider that we make a task; why not have that task make more tasks? Why not have those tasks make more tasks? If we make two tasks which each make two tasks which each make two tasks, we have eight tasks at the innermost level. Breaking the outer problem eight ways might have been very tricky. By looking for parallelism everywhere, at least we have hit eight ways for part of the execution path.

Such a 'go parallel anywhere you see a chance' approach fits well with modern iterative software development. We can progressively add parallelism throughout the lifetime of a program. Yes - really! The idea that 'oh dear, this is single threaded and so will remain so unless we re-architect it all' is not good enough. We need to be able to add parallel constructs as and when we see the chance.

Back To That Problem
If we submit tasks from anywhere in our code at any point, without caring about the larger multi-threaded context, then sooner or later we will submit a task from inside a task. This breaks the submitter/executor model. We could have a separate executor and thread pool for each task. That works, but it leads to unbounded thread counts and so breaks the benefit of thread pools (too many threads burdens the system with administration overhead until it grinds to a standstill just managing threads).

A naive understanding could lead one to think that submitting tasks from within tasks is no problem. When I say naive - err, well - I've made that mistake. What is really annoying is that I realised it, built a task model to avoid it, went and did something else for a year and... well, you can guess the rest. So, this time around I am writing a blog post to burn the concepts so firmly into my brain that I never make that mistake again.

The embrace of meh:
If you are over 30 and don't have kids it could well be the case that you have not yet learned the amazing word “meh”. It seems to mean “I cannot be bothered to care about this”. It is a kind of passive aggressive version of giving up.

Now, thread deadlock is sometimes (rather graciously) referred to as the embrace of death. Interlocked threads can make no progress and so processing grinds to a halt. I thought this was what had happened to my Sonic Field process just the other day. I installed some watchdog code which checks for thread deadlock. Sonic Field ground to a halt with tasks still in the queue, but no deadlock was detected. Basically, I asked it to work but it just said “meh”.

What would cause this terminal meh event? Tasks queuing tasks for execution does the trick. Consider a task running in a pool of just one thread (not likely in reality, but it describes the problem). This task then queues another task for execution and waits for the subtask to end. Now we have a task in a waiting thread. It is waiting for a subtask. But there are no free threads in the pool, so the subtask will never get a chance to execute and the outer task will never get out of its wait state. We have meh!
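The single-thread-pool scenario can be reproduced in a few lines. This is a sketch using Python's concurrent.futures, with a timeout standing in for the watchdog so the program can escape; without the timeout the wait inside outer() never returns:

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError

pool = ThreadPoolExecutor(max_workers=1)  # a pool of just one thread

def inner():
    return 42

def outer():
    # The pool's only thread is busy running outer(), so inner() can never start.
    sub = pool.submit(inner)
    try:
        # Without the timeout this wait would block forever: meh.
        return sub.result(timeout=1)
    except TimeoutError:
        sub.cancel()  # give up on the never-started subtask so we can exit
        return "meh"

result = pool.submit(outer).result()
print(result)  # -> meh
pool.shutdown()
```

The process is not deadlocked in the classic interlocked sense, which is exactly why the watchdog saw nothing; it is simply waiting on a queue entry that can never run.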

The Underlying Problem
Tasks and futures (promises, or whatever you call them) make a fundamental error, one which happens a lot in software engineering: the model assumes some resource is infinite when it is actually precious. Threads (pre-emptive threads on a modern system) are very expensive indeed. This is the very reason we have thread pools in the first place - if threads were cheap, why pool them? So we have a “threads are free” model on top of a “threads are precious” model, and someone somewhere is just bound to go meh.

Let us get some ideas straight. Threads are expensive; 1024 threads in one process is ridiculous on many systems. However, if a task creates 8 tasks which all create 8 tasks which each create 8 themselves, we have 8 + 64 + 512 = 584 tasks. One thread per task is just not going to cut it as an approach if we want to make task creation a trivial activity for the programmer.

One Solution
A way around this which has gained a lot of popularity recently is to have lightweight threads. These are not true threads in the pre-emptive sense; they are a form of cooperative multi-tasking in which threads of activity actively relinquish their execution. Let's see how that works in our simple meh model.

Task A queues subtask B. Now A waits for B to finish. By so doing, task A tells the underlying framework it no longer requires execution and so relinquishes its thread of execution. The scheduler can set that thread of execution off doing something else. In effect, threads of task and threads of execution are separated.
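A toy version of this separation can be sketched with Python generators standing in for lightweight tasks (an illustration of the idea only, not how TBB or Fork/Join are implemented): yielding a subtask relinquishes the one real thread, which the scheduler then uses to run that subtask.

```python
# Toy cooperative scheduler: tasks are generators which yield subtasks
# and receive the subtask's result back, all on a single real thread.
def run(task):
    stack = [task]              # chain of tasks waiting on subtasks
    result = None
    while stack:
        try:
            subtask = stack[-1].send(result)
        except StopIteration as stop:
            stack.pop()         # task finished; hand its value to the waiter
            result = stop.value
            continue
        stack.append(subtask)   # task relinquished the thread; run the subtask
        result = None
    return result

def task_b():
    return 42
    yield                       # never reached; just marks this as a generator

def task_a():
    value = yield task_b()      # "wait" for B by relinquishing the thread
    return value + 1

print(run(task_a()))  # -> 43
```

Note that task A never blocks a real thread while B runs; the task and the thread of execution have been decoupled.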

But there are problems. The model only works for truly lightweight operations. Anything which might block (I/O for example) breaks the model. Frameworks based on this approach exist (task stealing in TBB from Intel, or Fork/Join in Java) and work; nevertheless, they are not as easy or robust as would be ideal.

Do Semantic Heuristic Scheduler
This solution is the one I am currently using in Sonic Field. Do semantics are an approach where closures are submitted as tasks: we just “do the task contained in the closure”. The program then waits for a task's completion simply by dereferencing the value returned from task submission.

def myTaskA():
    # code which closes around
    # the current namespace
    ...

def myTaskB():
    # another such closure
    ...

x = do(myTaskA)  # submit the closures as tasks (illustrative 'do' call)
y = do(myTaskB)

print x, y       # dereferencing x and y waits for the tasks to complete

In the above, two tasks are run in parallel and their results printed. We can embed tasks within tasks and so have composability of tasks. In other words, we can build up large parallel tasks from many smaller tasks which also use parallelisation. Clearly, if every task took a real thread we would hit thread exhaustion quickly in this model.

The Do scheduler in Sonic Field uses a thread pool. Before it submits a task to the pool it takes an estimate of the number of free threads in the pool. If that number is below a cutoff then, rather than scheduling the task for execution, it executes the task straight away in the submitting thread.

This model causes a 'soft landing' where more and more threads are scheduled until the limit is reached, at which point scheduling rapidly reduces and a peak level is attained. Note that the peak is above the cutoff due to the estimated nature of the free thread count. Avoiding this would require synchronisation in the task submitter, which is simply not required and would cause a massive bottleneck.
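The heuristic can be sketched as follows (a minimal illustration on top of a Python thread pool; the class and method names are invented here, not Sonic Field's actual API). The free thread count is a deliberately unsynchronised estimate, exactly as described above:

```python
from concurrent.futures import ThreadPoolExecutor

class _Done(object):
    """Mimics a completed future when a task was run inline."""
    def __init__(self, value):
        self._value = value
    def result(self):
        return self._value

class DoScheduler(object):
    def __init__(self, pool_size=8, cutoff=2):
        self._pool = ThreadPoolExecutor(max_workers=pool_size)
        self._pool_size = pool_size
        self._cutoff = cutoff
        self._active = 0  # unsynchronised estimate of busy pool threads

    def do(self, closure):
        free = self._pool_size - self._active  # estimate only; no lock taken
        if free < self._cutoff:
            # Pool looks nearly exhausted: run in the submitting thread,
            # so a task queued from inside a task cannot cause the meh.
            return _Done(closure())
        self._active += 1
        return self._pool.submit(self._run, closure)

    def _run(self, closure):
        try:
            return closure()
        finally:
            self._active -= 1

# Even with a single pool thread, nested submission makes progress:
sched = DoScheduler(pool_size=1, cutoff=1)
inner = lambda: 42
outer = lambda: sched.do(inner).result() + 1
result = sched.do(outer).result()
print(result)  # -> 43
```

The nested call falls below the cutoff and so runs inline in the waiting thread, which is precisely why the single-thread deadlock from earlier cannot occur.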

Which is better?
Neither! I suspect the task-stealing lightweight approach has the potential to be more effective. However, it requires more programmer effort than Do scheduling. The latter approach is so simple to use that it offers a real chance of retrofitting parallelisation to existing code and of letting developers write parallel code as a habit.

Knowing exactly what is happening - the lost art of programming

The comfortable delusion of infinitely increasing computer speed is shattering - time to learn some real programming!

We are often faced with the idea that hand-writing assembler will give the fastest code. Next up the food chain is very low level C, then C++, followed by, maybe, Java and tailed by something akin to Python. Does this all make sense? Is this the real situation, or is it a case of correlation not implying causality? In this post I would like to suggest it does not make sense. I posit that knowledge of the underlying mechanisms employed by a program is a much bigger factor than the choice of language. If we were to read a file one line at a time from disk (a real spinning disk) then the performance of the system would be entirely dominated by the buffering strategy and not by the code reading the lines. Which language (Java, C and such) is used would make no noticeable difference at all. In such a case, hand-writing assembler would be a very bad design choice, as the chances of implementing a sub-optimal buffering strategy would be high; maintaining that choice correctly over many software and hardware updates would be near impossible.
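The buffering point can be made concrete with a small experiment (a sketch in Python; the file name and line count are arbitrary). The same line-counting job is done twice, once forcing a system call per byte and once with default block buffering; the buffering strategy, not the language, dominates the timings:

```python
import os
import time

PATH = "lines.txt"  # throw-away test file (the name is arbitrary)
with open(PATH, "w") as f:
    for i in range(20000):
        f.write("line %d\n" % i)

def count_lines_unbuffered():
    # buffering=0 forces a read() system call for every single byte
    lines = 0
    with open(PATH, "rb", buffering=0) as f:
        byte = f.read(1)
        while byte:
            if byte == b"\n":
                lines += 1
            byte = f.read(1)
    return lines

def count_lines_buffered():
    # the default buffering reads the disk in large blocks
    with open(PATH, "rb") as f:
        return sum(1 for _ in f)

start = time.time()
n_unbuf = count_lines_unbuffered()
t_unbuf = time.time() - start

start = time.time()
n_buf = count_lines_buffered()
t_buf = time.time() - start

print("unbuffered: %d lines in %.3fs" % (n_unbuf, t_unbuf))
print("buffered:   %d lines in %.3fs" % (n_buf, t_buf))
os.remove(PATH)
```

Both functions do identical logical work and return identical answers; only the I/O strategy differs, and that is where the time goes.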

So what am I saying? I guess I am saying that understanding the way the computer's I/O system actually works is more important than the code we write. Understanding the underlying 'exactly what is happening' is more important than the details of the implementation language. The difficulty we face as developers is that software practice has moved towards abstraction. Mathematics gets in the way of computer programmers in so many ways. Some are foolish enough to see computer programming as a branch of mathematics. This makes about as much sense as saying architecture is a branch of physics. Similarly, educators, scared of frightening young people off programming, hide the electronic nature of a computing machine behind interpreted, easy to understand programming techniques. This may well be a valid approach, except that the step to developer maturation - knowing what the computer is actually doing - never happens.

Sadly, the notion that simply writing code in a low level language like C will make it run fast is also misguided. Indeed, writing 'next to the machine' in assembler or C is no guarantee of performance. Humans lack the ability to process the large amounts of related data required to perform some of the best optimisations, at which modern compilers are getting so proficient. We might be brilliant at writing 20 instructions tailored for absolute maximum performance and not notice that these instructions can be elided in the majority of code paths due to branch speculation optimisation.

Nevertheless, we cannot rely on compilers to always arrive at an optimal solution and we definitely cannot expect them to work around poorly written code. What is required is cooperation between compiler and developer. To achieve that, we developers need to start learning more about the dirty mechanics of code execution; we need to stop thinking in highly abstract ways and start to disassemble generated code. We need to look inside library objects like shared_ptr and find out the mechanical details of what they do, not just skim over the high level theory.

Similarly, there is nothing inherently bad about using interpreters; as a high abstraction for pushing large computational tasks around they can be very useful. The moment we find ourselves performing that computation in an interpreted (or, in truth, dynamic) language is the moment we have gone wrong.

Where does this leave abstraction? Do we have to jettison Haskell, Java and C++ in favour of C and Fortran? No, we can still work at abstraction where it helps - but we must not hide behind it.

  1. We need to accept that hot code needs to be fully characterised on the target hardware. Mathematical abstraction will not help.
  2. The free lunch of ever cheaper and faster hardware is over - computer performance has stagnated and costs are rising.
  3. Tooling must evolve beyond simple 'this bit of code is taking the most time' - we need to know WHY.
  4. We as developers need to stop hiding from reality - computers are machines, computation is engineering not mathematics or science.

1, 2 and 4 are starting to sink in to the commercial world at least. I know of a few places where 3 is in its infancy. The future will be a more hardware centric, intellectually challenging place: in my view, all the better for it.


VLog: Cathode Ray - why do supermarkets sell crap wine

I am starting a new project - a simple rant vlog

Episode 1