The futures model of programming seems like a very stable and reliable way of making multi-threaded development safe and easy. The model is beautifully simple:
1) Create a task
2) Submit it to an executor
3) The executor submits it to a thread pool and returns a Future to you.
4) When you need the result of the task you call get on the future (or whatever the API defines) and wait until the result appears.
The key point is that tasks are queued up. If we submit tasks to the executor faster than they finish, the queue grows. The executor will have a given number of execution threads; this might be dynamic or fixed, but in general there will be a maximum number of threads.
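To make those four steps concrete, here is a minimal sketch in Python using the standard concurrent.futures module (the task itself is just a placeholder):

from concurrent.futures import ThreadPoolExecutor

# 1) Create a task (here just a closure) and 2) submit it to an executor.
executor = ThreadPoolExecutor(max_workers=4)
# 3) The executor runs it on a pool thread and returns a Future.
future = executor.submit(lambda: sum(range(1000)))
# 4) Calling result() waits until the task has finished and hands back its value.
print(future.result())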
Sadly - there is a problem:
What is 'you' in the above description? I kept saying ’you do this’ and ’you do that’ but never defined ’you’. Well, ’you’ are a thread of execution. Our simple model has a hidden complexity: there are two types of thread, submitters and executors. This is fine for really old-fashioned programming, like the sort of stuff we did on single-core machines in years gone by. Nowadays we need a more flexible approach to threading. We need to inject threading anywhere we can, just to squeeze performance from the ever-growing core count of modern machines.
What we need in modern programming is to go parallel everywhere. Anything which can be parallel should be. We need to avoid ’this is the parallel bit’ thinking. Consider that we make a task: why not have that task make more tasks? And why not have those tasks make more tasks in turn? If we make two tasks which each make two tasks which each make two more, we have eight tasks at the bottom level. Breaking the outer problem eight ways might have been very tricky; by looking for parallelism everywhere we have at least hit eight-way parallelism for part of the execution path.
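As a sketch of that nesting in Python (the pool size and names are illustrative):

from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=32)   # generous, so the nesting works - for now

def task(depth):
    if depth == 0:
        return 1                            # a leaf: the real work would go here
    # Each task makes two more tasks.
    a = pool.submit(task, depth - 1)
    b = pool.submit(task, depth - 1)
    return a.result() + b.result()          # wait for both subtasks

print(pool.submit(task, 3).result())        # prints 8: eight leaf tasks

Note the generous pool size; why it matters is exactly the problem described below.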
Such a ’go parallel anywhere you see a chance’ approach fits well with modern iterative software development. We can progressively add parallelism throughout the lifetime of a program. Yes - really! The idea that ’oh dear, this is single-threaded and so will remain so unless we re-architect it all’ is not good enough. We need to be able to add parallel constructs as and when we see the chance.
Back To That Problem
If we submit tasks from anywhere in our code at any point, without caring about the larger multi-threaded context, then sooner or later we will submit a task from inside a task. This breaks the submitter/executor model. We could have a separate executor and thread pool for each task. That works, but it leads to unbounded thread counts and so breaks the benefit of thread pools (too many threads burdens the system with administrative overhead until it grinds to a standstill just managing threads).
A naive understanding could lead one to think that submitting tasks from within tasks is no problem. When I say naive - errr, well - I’ve made that mistake. What is really annoying is that I realised it, built a task model to avoid it, went and did something else for a year and.... Well, you can guess the rest. So, this time around I am writing a blog post to burn the concepts so firmly into my brain that I never make that mistake again.
The embrace of meh:
If you are above 30 and don’t have kids, it could well be the case that you have not yet learned the amazing word “meh”. It seems to mean “I cannot be bothered to care about this”. It is a kind of passive-aggressive version of giving up.
Now, thread deadlock is sometimes (rather graciously) referred to as the embrace of death. Interlocked threads can make no progress and so processing grinds to a halt. I thought this was what had happened to my Sonic Field process just the other day. I installed some watchdog code which checks for thread deadlock. Sonic Field ground to a halt with tasks still queued, but no deadlock was detected. Basically, I asked it to work and it just said “meh”.
What would cause this terminal meh event? Tasks queuing tasks for execution does the trick. Consider a task running in a pool of just one thread (not likely in reality, but it illustrates the problem). This task then queues another task for execution and waits for the subtask to finish. Now we have a task in a waiting thread, waiting for a subtask. But there are no free threads in the pool, so the subtask will never get a chance to execute and the outer task will never leave its wait state. We have meh!
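The whole failure fits in a few lines. This Python sketch of the one-thread scenario hangs forever, and there is no lock cycle for a deadlock detector to find:

from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=1)

def outer():
    inner = pool.submit(lambda: 42)   # the subtask is queued behind us...
    return inner.result()             # ...and we wait for it on the pool's only thread

future = pool.submit(outer)
print(future.result())                # never reached: not deadlock, just meh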
The Underlying Problem
Tasks and futures (promises, or whatever you call them) make a fundamental error, one which happens a lot in software engineering: the model assumes some resource is infinite when it is actually precious. Threads (pre-emptive threads on modern systems) are very expensive indeed. This is the very reason we have thread pools in the first place; if threads were cheap, why pool them? So we have a “threads are free” model on top of a “threads are precious” model, and someone somewhere is just bound to go meh.
Let us get some ideas straight. Threads are expensive: 1024 threads in one process is ridiculous on many systems. However, if a task creates 8 tasks, which all create 8 tasks, which all create 8 themselves, we have 584 tasks (8 + 64 + 512). One thread per task is just not going to cut it as an approach if we want to make task creation a trivial activity for the programmer.
A way around this which has gained a lot of popularity recently is to have lightweight threads. These are not true threads in the pre-emptive sense; they are a form of cooperative multitasking, in which threads of activity actively relinquish their execution. Let's see how that works in our simple meh model.
Task A queues subtask B. Now A waits for B to finish. By so doing, task A tells the underlying framework it no longer requires execution and relinquishes the thread of execution. The scheduler can set that thread of execution off doing something else. In effect, threads of task and threads of execution are separated.
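Here is a sketch of the idea using Python's asyncio as a stand-in for any such cooperative framework (TBB and Fork/Join differ in their details):

import asyncio

async def task_b():
    await asyncio.sleep(0.1)           # pretend to work, yielding while we wait
    return 42

async def task_a():
    b = asyncio.create_task(task_b())  # task A queues subtask B
    # Awaiting B hands the one real thread back to the scheduler, which
    # uses it to run B; no extra threads of execution are needed.
    return await b

print(asyncio.run(task_a()))           # prints 42, all on a single thread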
But there are problems. The model only works for truly lightweight operations; anything which might block (I/O, for example) breaks the model. Frameworks based on this approach exist (work stealing in Intel's TBB, or Fork/Join in Java) and work; nevertheless, they are not as easy or robust as would be ideal.
Do Semantic Heuristic Scheduler
This solution is the one I am currently using in Sonic Field. Do semantics are an approach where closures are submitted as tasks: we just “do the task contained in the closure”. The program then waits for the task's completion simply by dereferencing the value returned from task submission.
The closures close around the current namespace. As a minimal sketch of the idea in Python (the do helper is illustrative, not Sonic Field's actual API):
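from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor()

def do(closure):
    # "Do the task contained in the closure": submit it to the pool and
    # hand back a future whose dereference waits for the result.
    return pool.submit(closure)

x = 10
a = do(lambda: x * x)             # the lambdas close around the current namespace...
b = do(lambda: x + x)             # ...so they can both see x
print(a.result(), b.result())     # dereference: wait for completion, then print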
In the above, two tasks are run in parallel and their results printed. We can embed tasks within tasks and so have composability: we can build up large parallel tasks from many smaller tasks which themselves use parallelisation. Clearly, we will hit thread exhaustion quickly in this model.
The Do scheduler in Sonic Field uses a thread pool. Before it submits a task to the pool it takes an estimate of the number of free threads in the pool. If that number is below a cutoff then, rather than scheduling the task for execution, it executes the task straight away in the submitting thread.
This model causes a 'soft landing': more and more threads are scheduled until the limit is reached, at which point scheduling rapidly reduces and a peak level is attained. Note that the peak is above the cutoff due to the estimated nature of the free-thread count. Avoiding this would require synchronisation in the task submitter, which is simply not required and would cause a massive bottleneck.
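A sketch of that heuristic, extending the do helper above (the pool size, cutoff and counter are made up for illustration; Sonic Field's actual scheduler is more involved):

from concurrent.futures import Future, ThreadPoolExecutor

MAX_THREADS = 16   # illustrative pool size
CUTOFF = 2         # illustrative cutoff

pool = ThreadPoolExecutor(max_workers=MAX_THREADS)
active = 0         # estimate of busy threads, deliberately unsynchronised

def _run(closure, future):
    global active
    active += 1    # an estimate is good enough; locking here would bottleneck
    try:
        future.set_result(closure())
    except Exception as e:
        future.set_exception(e)
    finally:
        active -= 1

def do(closure):
    future = Future()
    if MAX_THREADS - active < CUTOFF:
        # Pool looks nearly exhausted: run inline in the submitting
        # thread rather than queuing, avoiding the meh.
        _run(closure, future)
    else:
        pool.submit(_run, closure, future)
    return future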
Which is better?
Neither! I suspect the work-stealing lightweight approach has the potential to be more efficient. However, it requires more programmer effort than Do scheduling. The latter approach is so simple to use that it offers a real chance of retro-fitting parallelisation to existing code, and of letting developers write parallel code as a habit.