
Threads brought multitasking and multiprocessing to the mainstream of software development. More recently, Go brought coroutines into the mainstream with its implementation, called goroutines. Since then goroutines have been touted as the panacea of multiprocessor development. A closer look at both technologies shows that the picture is more nuanced than the common perception that coroutines are simply superior. There are many implementations of threads and coroutines; we’ll use Go goroutines and Java threads as examples of each group.

First, what are Java threads? Java threads are language constructs for executing different blocks of code in the same timeframe, with each block making progress. If we have one processor, execution is interleaved; if we have at least as many processors as code blocks, the blocks execute at the same time, which is called parallelism. If we execute more than one code block “at the same time” on one processor, how is this achieved? A processor can (simplified) only execute one instruction at a time. Language runtimes and operating systems solve this by scheduling: each piece of code is executed for some time, then it loses control and another piece of code is executed, so all code that needs to run is interleaved.

There are many different scheduling strategies; the main categories are cooperative and preemptive scheduling. With cooperative scheduling, the executing code gives control back to the scheduler, for example with a directive that some languages call ‘yield’, or by calling a library or system function. With preemptive scheduling, the scheduler takes control away from the executing code, which can happen after every instruction. Scheduling can happen on different levels, most notably in the operating system or in the language runtime. If the OS does the scheduling, it is easier to schedule alongside other tasks on the processor and easier to integrate with IO. Coroutines are a concurrency mechanism where code execution is scheduled by the runtime. Their scheduling implementations are mostly cooperative.
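To make the idea of cooperative scheduling concrete, here is a minimal sketch of a round-robin scheduler built on Python generators. It is an illustration only, not how Go or any production runtime is implemented; the names `task` and `run_round_robin` are hypothetical.

```python
from collections import deque

def task(name, steps, log):
    """A cooperative task: does one step, then voluntarily yields control."""
    for i in range(steps):
        log.append(f"{name} step {i}")
        yield  # hand control back to the scheduler

def run_round_robin(tasks):
    """Hypothetical minimal scheduler: resume each task in turn until all finish."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # run the task until its next yield
            ready.append(current)  # not finished yet: back of the queue
        except StopIteration:
            pass                   # task finished: drop it

log = []
run_round_robin([task("A", 2, log), task("B", 2, log)])
print(log)  # prints ['A step 0', 'B step 0', 'A step 1', 'B step 1']
```

The scheduler never interrupts a task. A task that did not reach a `yield` would keep the processor until it finished.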

One example of a coroutine in Python would look like this:
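A minimal sketch, assuming Python’s `asyncio` module as the coroutine runtime (the function name `do_things` is illustrative):

```python
import asyncio

async def do_things():
    print("Doing one thing")
    await asyncio.sleep(1)   # suspends this coroutine; the event loop may run others
    print("Doing another thing")

asyncio.run(do_things())
```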

This code prints ‘Doing one thing’, waits for one second, then prints ‘Doing another thing’. While ‘sleep’ executes, control is given back to the runtime’s scheduling mechanism so that it can execute other coroutines. The same code would look like this in Kotlin:

Code execution is not interrupted at arbitrary points, but only when calling ‘delay’ in Kotlin or ‘sleep’ in Python. Control could also be given back at other points, depending on the runtime. Go’s cooperative scheduling takes back control at system calls, which here would be ‘println’. Another point where the Go scheduler can switch to a different goroutine is a function call.

Let’s take a look at some aspects of threads and coroutines.

Coroutines are easier to synchronize

One of the main problems with threads is concurrent write access to data. Suppose we write a countdown where concurrent code decreases a counter.
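A minimal sketch of such a countdown step in Python; the names `counter` and `count_down` are illustrative:

```python
counter = 1

def count_down():
    """Decrease the counter, but never below zero."""
    global counter
    if counter > 0:    # test ...
        counter -= 1   # ... then decrement: two separate steps

# Called one after the other, the second call sees 0 and does nothing.
count_down()
count_down()
print(counter)  # prints 0
```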

With the preemptive scheduling of threads, control can be taken away from this code after every instruction (depending on the runtime, even in the middle of what looks like “one” instruction). Suppose one thread reads the value of ‘counter’, sees that it is 1 and enters the ‘if’ branch. Right after this test another thread is scheduled; it runs the same test, also finds the value 1 and also enters the ‘if’ branch. Now the first thread resumes and decreases the value to 0, then the second thread continues and decreases the value to -1. The counter has dropped below zero, an error known as a race condition. To prevent this kind of error, the test and the decrement must be protected so that only one thread at a time can execute them.
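A minimal sketch of such protection, assuming Python’s `threading` module; the thread count and the number of attempts are arbitrary values chosen for illustration:

```python
import threading

counter = 1000
lock = threading.Lock()

def count_down():
    global counter
    with lock:             # only one thread at a time may enter this block
        if counter > 0:
            counter -= 1

def worker():
    for _ in range(500):
        count_down()

# 8 threads try 500 decrements each: 4000 attempts against a counter of 1000.
threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # prints 0, never a negative number
```

Because the test and the decrement happen while the lock is held, no thread can slip in between them.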

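This protection is usually implemented with locks. In the case of goroutines scheduled cooperatively on a single processor, if we do not execute system calls between the test and the decrement, we do not need to protect the execution of this code, as the runtime doesn’t take away our control. Goroutines that run in parallel on several processors and share data still need a lock or a channel.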
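The same countdown without a lock can be sketched with coroutines. This assumes a single-threaded cooperative runtime, here Python’s `asyncio` event loop; it says nothing about goroutines running in parallel on several processors.

```python
import asyncio

counter = 1000

async def count_down():
    global counter
    if counter > 0:         # no await between test and decrement,
        counter -= 1        # so no other coroutine can run in between
    await asyncio.sleep(0)  # explicit point where control is given back

async def worker():
    for _ in range(500):
        await count_down()

async def main():
    # 8 coroutines, 4000 attempts in total, no lock anywhere
    await asyncio.gather(*(worker() for _ in range(8)))

asyncio.run(main())
print(counter)  # prints 0
```

The guarantee only holds as long as nothing awaits between the test and the decrement.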

Coroutines have lower memory overhead

The scheduler switches execution between different threads or coroutines. To resume execution later, the scheduler needs to store some data, such as local variables (the stack) and the point where execution will resume (the program counter). As a general solution without knowledge of the language runtime, the operating system’s thread scheduler needs to store a relatively large amount of data per thread; besides the thread’s own data this includes operating system context data. In contrast, a runtime scheduler for coroutines has much less memory overhead per coroutine. Go in particular uses compact data structures and small stacks that grow on demand (originally implemented as segmented stacks) to reduce memory usage per goroutine even further.

We can have millions of coroutines

The most important consequence of the low memory usage of coroutines compared to threads: you can have millions of goroutines, while on the same hardware thousands of threads can already exhaust memory or congest the scheduler to a standstill. This makes it possible to architect software around millions of concurrent goroutines.
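As a rough illustration of the scale, the following sketch starts 100,000 coroutines that are all suspended at the same time. It uses Python’s `asyncio` rather than goroutines, so absolute numbers and memory use differ from Go; the task count is an arbitrary value chosen to keep the run short.

```python
import asyncio

async def tiny_task(i):
    await asyncio.sleep(0.01)  # suspended here while the others are started
    return 1

async def main(n):
    results = await asyncio.gather(*(tiny_task(i) for i in range(n)))
    return sum(results)

finished = asyncio.run(main(100_000))
print(finished)  # prints 100000
```

Starting the same number of operating system threads would be far more expensive, since each thread reserves its own stack and kernel resources.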

Thread runtimes are easier to implement

Thread-based language runtimes are easier to implement because you don’t have to write your own scheduler. You mostly write a thin wrapper around operating system threads, which are mapped one-to-one to your threads. Scheduling, execution flow and so on are managed by the operating system. With coroutines you have to write your own scheduler, task switching and restoration of execution state, which, when done naively, leads to bugs and low performance.

Threads can’t starve other execution

The major downside of cooperative scheduling is how one code block can starve other code. For example
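A minimal sketch of such a code block, assuming Python’s `asyncio` as the cooperative runtime; `busy`, `other` and `events` are illustrative names:

```python
import asyncio

events = []

async def busy():
    events.append("busy start")
    total = 0
    for i in range(2_000_000):   # long computation with no await inside
        total += i
    events.append("busy end")

async def other():
    events.append("other ran")   # can only run once busy() is finished

async def main():
    await asyncio.gather(busy(), other())

asyncio.run(main())
print(events)  # prints ['busy start', 'busy end', 'other ran']
```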

This code does not call sleep, make a function call that lets the scheduler step in, or execute a system call. So the cooperative scheduler does not take away control and the whole code block runs until it is finished. During this time no other code is executed. This happens more often than expected, for example when small test data sets are used during development while hundreds of thousands of records are processed in production.

The impact is most severe if you have one processor and one server. Then one request can block all other requests for a long time. Over the years this danger has decreased: with multiprocessing, code is still executed on other processors; with heavy IO in your code, there are many points where the scheduler takes over control; with many servers, there are always other servers to serve requests; and with the prevalent architecture built on Go channels, there are many points where control is given back to the Go scheduler.

For a discussion on making the Go scheduler preemptive, see this GitHub issue. Since Go 1.14 the runtime can also preempt goroutines asynchronously.

Threads often work better with IO

Java started out with green threads, threads scheduled by the runtime instead of the operating system. There were many reasons for this. One reason was that operating system threads weren’t very good at the time, and with its own threading implementation Java’s execution behavior was the same on every operating system. The problem with green threads was IO. If one thread was stuck in (blocking) IO, execution was handed over to the operating system, which performed the IO, and during this time all other threads were blocked from executing. This was one of several reasons for Java to move over to operating system threads. This way the operating system did the scheduling and one thread blocked in IO did not block all other threads.

Today this is less of a problem thanks to intelligent coroutine runtimes. Go moves blocking IO calls off the thread that runs the goroutines onto a separate operating system thread, so other goroutines don’t block. On top of this, IO today is moving from blocking to event-driven, where the scheduling of execution is driven by the IO system instead of the coroutine or thread scheduler. This way coroutines do not block and the number of blocked threads is reduced.
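The idea of moving a blocking call off the coroutine thread can be sketched in Python with `asyncio.to_thread` (available since Python 3.9). This is an analogous technique, not a description of Go’s internals; `blocking_io` and `ticker` are hypothetical names, and `time.sleep` stands in for a blocking read or write.

```python
import asyncio
import time

events = []

def blocking_io():
    time.sleep(0.5)                # stands in for a blocking read or write
    events.append("blocking done")

async def ticker():
    for _ in range(3):
        await asyncio.sleep(0.05)
        events.append("tick")      # keeps running during the blocking call

async def main():
    # to_thread runs blocking_io on a worker thread, so the event loop stays free
    await asyncio.gather(asyncio.to_thread(blocking_io), ticker())

asyncio.run(main())
print(events)
```

If `blocking_io()` were called directly inside a coroutine instead, the ticker could not record anything until the blocking call had returned.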

Conclusion

Threads and coroutines have different tradeoffs, with the balance shifting towards coroutines over the last decade. Many problems coroutines had were mitigated by multi-server deployments and multiprocessing, event-driven IO and intelligent runtimes like Go’s, which schedules blocking IO onto its own thread and supports running coroutines on multiple processors.

Coroutines with an intelligent, state-of-the-art runtime like Go’s offer the classic benefits of coroutines, such as low memory overhead and scalability, without the classical downsides. With goroutines you now get the best of both worlds; other coroutine runtimes might still suffer from the old downsides, as most aren’t as sophisticated as the Go scheduler.