A key feature of the Go language is that concurrency is built in. Concurrency and parallelism are two closely related ideas.

Parallel execution is when two programs execute at exactly the same time. Generally, one CPU core runs one instruction at a time, so if you want actual parallel execution, you need two CPUs or at least two CPU cores. With parallel execution, some tasks may complete more quickly, giving better throughput overall. However, some tasks must still be performed sequentially; they cannot be parallelized.



Programming with parallelism is very hard. How can we achieve speedup without parallelism?

  • Designing faster processors (higher clock rate) is an option, but it has stopped working in recent years, since Moore’s Law (the observation that transistor density doubles roughly every 2 years) has broken down.
  • Pack more cache into processors. Memory is always slower than the processor (the Von Neumann Bottleneck), so even if the processor is super fast, memory still limits the speed of execution. Fast caches alleviate that delay and speed things up.

Power and Temperature Problem

Nowadays, though, everything is portable and runs off a battery, and transistor density has gotten high. The density increases that used to deliver speedups and performance improvements cannot continue: the transistors consume a significant amount of power, and power has become a critical issue, called the power wall. High power leads to high temperature, and if you don’t dissipate the heat, the chip physically melts. The generic equation for dynamic power is below, where:

  • α is the activity factor, the fraction of transistors switching from 0 to 1 or vice versa
  • C is capacitance which relates to the size of transistors
  • F is the clock frequency
  • V is voltage swing (from low to high)
P = α C F V²
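As an illustrative calculation (the numbers here are made up for the example, not from the course): with α and C fixed, doubling the clock frequency F doubles dynamic power, but doubling the voltage V quadruples it, since power scales with V². Conversely, scaling V from 1.0 V down to 0.5 V cuts dynamic power to a quarter, which is why voltage scaling was so valuable while it lasted.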

Dennard Scaling states that voltage should scale with transistor size: as the transistors get smaller and you pack more of them onto a chip, you would also like to scale down the voltage at the same time. You want Dennard Scaling, but like Moore’s Law, it cannot continue forever. Voltage can’t go too low, for physical reasons:

  • The voltage swing between low and high has to stay above the threshold voltage of the transistor (below a certain voltage, transistors cannot switch on).
  • Noise becomes a problem. With a big voltage swing, you can still tell the difference when switching, even in the presence of noise; a small voltage swing makes the circuit less noise-tolerant.
  • There is another kind of power, called leakage power, which a transistor dissipates even when it is not switching. In the past, insulators were thick and leakage was hard, but as everything scales down, these insulators become thinner and thinner and leakage increases.

So, because of the power wall, you can’t increase the frequency without melting things. You have to improve performance in other ways.

Multi-Core Systems

A processor core basically executes a program, and you can have multiple of them. Together they still increase the density of a processor, by replicating hardware on the chip, but they don’t increase the frequency. The thing about a multi-core system is that you need parallel / concurrent execution to exploit it. You have to be able to divide your program amongst a bunch of cores and execute different code concurrently on the different cores.

So that is really the motivation, and it is why concurrency is so important nowadays: the programmer has to tell the program how it is going to be divided amongst cores. That is really what concurrent programming is doing.



Concurrent Execution

Concurrent execution is not necessarily execution at the same time. We say two tasks are concurrent if their start and end times overlap, but they are not necessarily executing at the same instant.

Concurrent (single core):
  Time 1: Task 1 starts, executes, then pauses
  Time 2: Task 2 starts, executes, and ends
  Time 3: Task 1 resumes, executes, and ends

Parallelism (two cores):
  Time 1: Task 1 starts and executes
  Time 2: Task 1 keeps executing; Task 2 starts and executes
  Time 3: Both tasks execute and end

Parallel tasks must be executed on different hardware, while concurrent tasks may or may not be executed on the same hardware. The mapping of tasks to hardware is not directly controlled by the programmer. For instance, if you have tasks 1, 2, 3, …, n to perform on cores 1, 2, 3, 4, determining which core performs which task (the hardware mapping) is not directly controlled by the Golang programmer (nor in most other languages).

The Golang programmer describes what can be executed in parallel, not what will be executed in parallel. The latter depends on how the tasks are mapped to hardware, which is not directly in the control of the programmer but is left to:

  • Operating system
  • Go runtime scheduler

Concurrency improves performance even without parallelism (say, you have only a 1-core processor), because tasks typically have to wait periodically for some slow event, like input/output. If the current task has to wait 100 cycles, it is better to have that processor core executing some other useful code in the meantime. This is called hiding latency: you hide latency by taking another task that is ready to run and executing it while the first task waits on its event.
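Here is a minimal sketch of latency hiding (illustrative, not from the course), using time.Sleep to stand in for a slow I/O event; even on one core, useful computation proceeds while the other Goroutine is blocked:

package main

import (
  "fmt"
  "time"
)

func main() {
  done := make(chan bool)
  go func() {
    time.Sleep(100 * time.Millisecond) // simulated slow I/O wait
    fmt.Println("I/O finished")
    done <- true
  }()
  // Useful work proceeds while the other Goroutine is blocked.
  sum := 0
  for i := 0; i < 1000000; i++ {
    sum += i
  }
  fmt.Println("computed sum:", sum)
  <-done // wait for the I/O Goroutine before exiting
}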

Processes and Scheduling

A process is basically an instance of a running program. The operating system allows many processes to execute concurrently without bumping into each other, and makes sure they get fair use of the processor and other resources. The user has the impression of parallelism even when execution is not parallel. There are many different scheduling algorithms for handling prioritization and meeting deadlines; scheduling is one of the main tasks of an operating system.

The act of switching processes is called a context switch. Basically what’s happening here is:

  1. You’re running a process A, when you want to stop it and to start running process B, you take the current state (context) of process A and save it somewhere for later use.
  2. Then you have to bring in the state (context) of process B, let process B run.
  3. Later when you want go back to A, you take that state (context) of process A that you saved so it can start from where it left off.

Threads

Context switching requires a lot of memory access, so it can be long and expensive. One process can have multiple threads inside it. A thread is like a process, but it has less context: it shares some of its context with the other threads in the process.

  1. Each thread has its own code being executed, its own stack, and its own data registers.
  2. All threads share the same virtual memory space and file descriptors, etc.

What operating systems do nowadays, instead of scheduling whole processes, is schedule thread by thread.

Goroutines

A Goroutine is basically a thread, but in Go. Many Goroutines execute within a single operating system thread: you can have multiple Goroutines all executing in that same thread. From the operating system’s point of view it is scheduling threads, but inside that one real thread, Go switches between these Goroutines, which behave like “threads within a thread”. The scheduling of Goroutines is done by the Go Runtime Scheduler, which is like a mini OS inside a single thread.

The Go Runtime Scheduler uses a logical processor, which is mapped to a thread. By default, you have one logical processor, mapped to the main thread, and the Goroutines run on that logical processor, which means they run in that main thread.



As a programmer, you can increase the number of logical processors, say to 2 or 4, according to how many cores you have. The Go Runtime Scheduler takes the Goroutines and maps them to different logical processors; each logical processor can be mapped to a different operating system thread, and the operating system maps those threads to different cores.

  • Process
    • Thread (scheduled by OS)
      • Logical processor (specified by programmer according to number of cores)
        • Go Runtime Scheduler
          • Goroutines (scheduled by Go)
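As a small sketch (not from the course), the number of logical processors can be queried and set with runtime.GOMAXPROCS:

package main

import (
  "fmt"
  "runtime"
)

func main() {
  fmt.Println("cores available:", runtime.NumCPU())
  prev := runtime.GOMAXPROCS(4) // allow up to 4 logical processors
  fmt.Println("previous GOMAXPROCS setting:", prev)
}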

One Goroutine is created automatically to execute the main() function. Other Goroutines are created using the go keyword.

func main() { // main Goroutine
  a = 1    // (assume a and b are declared at package level)
  go foo() // create a new Goroutine to execute foo()
  b = 1
}

A Goroutine exits when its code completes. When the main Goroutine completes, all other Goroutines are forced to exit, even if they have not finished their jobs. To prevent an early exit, you need formal synchronization constructs.
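A minimal sketch of the early-exit problem (illustrative, not from the course): main may return before the new Goroutine ever runs.

package main

import "fmt"

func foo() {
  fmt.Println("from foo") // may never be printed
}

func main() {
  go foo()
  fmt.Println("from main")
  // main returns here; if foo() has not run yet, its Goroutine is killed
}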

Interleavings

Writing concurrent code is hard, because it is hard to keep a mental model of the program’s state at any particular time. The order of execution within a task is known: say tasks A and B will each execute steps 1, 2, 3 in order. But the order of execution between the concurrent tasks A and B is not known, and the interleaving of instructions can be different each time you run the program. For example, some possible interleavings of tasks A and B:

A1, B1, A2, B2, A3, B3
A1, B1, B2, A2, A3, B3
A1, B1, A2, A3, B2, B3
...

Interleaving is even harder to deal with because it does not happen at the Go source code level; it happens at the machine code level. That makes it even harder to get a handle on all the possibilities.

Race Conditions

A race condition is a problem that can happen as a result of all these possible interleavings you have to consider. Remember that the interleavings are non-deterministic: you don’t know which interleaving will occur. However, you want a program whose result is always deterministic, so you have to make sure its outcome does not depend on these interleavings. If it does, that is a race condition. For example, in the code below, the outcome depends on the non-deterministic interleaving:

Interleaving 1 (prints 1):
  Task A: x = 1
  Task A: print(x)
  Task B: x = x + 1

Interleaving 2 (prints 2):
  Task A: x = 1
  Task B: x = x + 1
  Task A: print(x)

Race conditions occur due to communication between tasks or Goroutines; in the example, it is the shared variable x. If there were no communication between the two tasks, you could never have a race condition: the outcome of the tasks would be completely independent of the interleaving.

Synchronization

Synchronization is when multiple threads agree on the timing of global events. Typically, Goroutines are largely (though not completely) independent of each other. One Goroutine doesn’t know anything about the timing of its fellow Goroutines: if a Goroutine is executing some code, it knows what line of code it is at, but it has no idea what line the other Goroutines are at, and it doesn’t care.

Synchronization introduces global events that every Goroutine sees at the same time. These are important for restricting ordering, restricting some of the interleavings that are possible; in other words, making certain interleavings (the ones we don’t want) impossible.

Interleaving with synchronization:
  Task A: x = 1
  Task A: x = x + 1
  Task A: executes GLOBAL EVENT
  Task B: waits until GLOBAL EVENT happens
  Task B: print(x)   // now always prints 2

Every time we use synchronization, we reduce the amount of possible concurrency and the options for the scheduler, so the scheduler won’t be able to use the hardware as effectively. There might be times when nothing can execute because everything is waiting on synchronization events.

The sync package contains functions to synchronize between Goroutines.

sync.WaitGroup

sync.WaitGroup is a common type of synchronization: it forces a Goroutine to wait for other Goroutines. A WaitGroup is basically a group of Goroutines that your Goroutine is waiting on; your Goroutine will not continue until they have all finished. The WaitGroup contains an internal counter:

  1. Increment the counter for each Goroutine to wait for.
  2. Decrement the counter when each Goroutine completes.
  3. The waiting Goroutine cannot continue until the counter reaches 0.
// main Goroutine
func main() {
  var wg sync.WaitGroup
  wg.Add(1)   // increment the counter
  go foo(&wg) // create a new Goroutine to execute foo()
  wg.Wait()   // block until the counter becomes zero
}

// the new Goroutine
func foo(wg *sync.WaitGroup) {
  ...
  wg.Done() // decrement the counter
  ...
}
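A common Go idiom (an aside, not from the course snippet) is to defer the Done() call at the top of the Goroutine, so the counter is decremented on every return path:

func foo(wg *sync.WaitGroup) {
  defer wg.Done() // runs when foo returns, on any path
  ...
}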


Goroutine Communication

Goroutines work together to perform a bigger task, and they communicate, sending data to collaborate. For instance, the main Goroutine needs to send data to sub-Goroutines to do some calculation, and when the calculation is done, the sub-Goroutines need to send the results back to the main Goroutine.

main Goroutine ---- data ----> sub-Goroutines
main Goroutine <--- results -- sub-Goroutines

The communication might be more complicated: it doesn’t just have to happen at the start and the end of a Goroutine; it can happen right in the middle of execution. This is what channels are used for.

Communications on Channels

Channels are used to transfer data between Goroutines. Channels are typed and transfer typed data: when you create a channel, you create it with a certain type. For instance, the code below creates a channel of type int, which transfers integer data. Once you have the channel, you can send and receive data using the arrow operator <-.

// use make() to create a channel
c := make(chan int)

// send data to a channel
c <- 3

// receive data from a channel
x := <- c

Certainly, when you create a Goroutine, you can pass data in using arguments to the function. But that only works at the start; any time after the Goroutine has started, if you want to send data back and forth, you have to use a channel. Here is an example:

func product(v1 int, v2 int, c chan int) {
  c <- v1 * v2
}

func main() {
  c := make(chan int)
  go product(1, 2, c)
  go product(3, 4, c)
  a := <- c
  b := <- c
  fmt.Println(a * b) // prints 24, whichever product arrives first
}

Unbuffered Channels

By default, a channel is unbuffered: it cannot hold data in transit. An obvious consequence, since we don’t want to lose data, is that:

  1. the sending has to block until the data is sent (received on the receiving end), and
  2. the receiving also has to block until the data is received (sent on the sending end).

For instance, task 1 sits (blocks) at the sending instruction until task 2 reaches its receiving instruction.

Task 1: c <- 3      // blocks here
        // long after... data can be transmitted
Task 2: x := <- c   // only when this line is executed
        // Task 1 can now continue

The blocking also happens if Task 2 hits its receive first, if there is nothing to receive, it just blocks there until Task 1 sends data.

Task 2: x := <- c   // blocks here
        // long after... data can be transmitted
Task 1: c <- 3      // only when this line is executed
        // Task 2 can now continue

NOTE: when you use a channel this way, it is also doing synchronization, because Task 1 has to wait for Task 2 to receive, or Task 2 has to wait for Task 1 to send. This means you can use channel communication purely for synchronization and throw away the received result.
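A minimal sketch of channel-based synchronization (illustrative, not from the course), where the received value is discarded:

package main

import "fmt"

func worker(done chan bool) {
  fmt.Println("working...")
  done <- true // signal completion; the value itself does not matter
}

func main() {
  done := make(chan bool)
  go worker(done)
  <-done // block until the worker signals; the result is thrown away
}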



Buffered Channels

Channels can have some capacity: a channel can contain a limited number of objects in transit if you want. The capacity of a channel is the number of objects it can hold in transit, and you define it with an optional argument to the make() function.

c := make(chan int, 3) // the buffer is size 3

Now, the send and receives block under different conditions:

  • Sending blocks if the buffer is full.
  • Receiving blocks if the buffer is empty.

The main reason to use buffering is so the sender and the receiver don’t need to operate at exactly the same speed. Without a buffer (an unbuffered channel), the sender and the receiver are in lockstep: the sender blocks until the receive happens, and the receiver blocks until the send happens, which reduces the amount of concurrency.
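A small sketch (illustrative) of a buffered channel letting the sender run ahead of the receiver:

package main

import "fmt"

func main() {
  c := make(chan int, 3)
  c <- 1 // does not block: the buffer has room
  c <- 2
  c <- 3
  // a fourth send would block here until a receive frees a slot
  fmt.Println(<-c, <-c, <-c) // prints 1 2 3
}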

Iterating Through Channels

A common operation on a channel is to iterate through it. This happens in a producer / consumer scenario: the consumer wants to continually receive data from the channel and process it in some way. The for loop below quits when the sender calls close() on the channel; the receiver detects the close, and the loop ends.

// use range keyword to continually read from channel
// until a close happens
for i := range c {
  fmt.Println(i)
}
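A small producer / consumer sketch (not from the course) showing the close() that ends the range loop:

package main

import "fmt"

func main() {
  c := make(chan int)
  go func() { // producer
    for i := 1; i <= 3; i++ {
      c <- i
    }
    close(c) // no more data; ends the consumer's range loop
  }()
  for i := range c { // consumer
    fmt.Println(i) // prints 1, 2, 3, then the loop exits
  }
}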

Multiple channels may be used to receive from multiple sources, and maybe you need data from all of them:

a := <- c1
b := <- c2
fmt.Println(a * b)

Sometimes you don’t have to read from all of them, you just needs one of them. In this scenario, whichever one comes first, it is first served. You don’t want to read from both channel 1 and channel 2, since if you read from both of them then you’re blocked on one of them. We use a select statement like this, which allows you to wait for the first data from a set of channels. Whichever one of those c1 and c2 happens first, that’s the one that’s going to get executed.

select {
  case a := <- c1:
    fmt.Println(a)
  case b := <- c2:
    fmt.Println(b)
}


We can also block on sending data: select works with either send or receive operations.

select {
  case a = <- input_channel:
    fmt.Println("Received a")
  case output_channel <- b:
    fmt.Println("Sent b")
}

One common use of a select is to have a separate abort channel. You may want to receive data until an abort signal is received.

for {
  select {
    case a := <- channel:
      fmt.Println(a)
    case <- abort_channel:
      return
  }
}
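Here is a self-contained sketch (illustrative; the Goroutine structure is an assumption, not from the course) of how the abort might be signaled: another Goroutine closes abort_channel, and since a receive on a closed channel returns immediately, the loop exits.

package main

import (
  "fmt"
  "time"
)

func main() {
  c := make(chan int)
  abort_channel := make(chan int)

  go func() { // producer keeps sending data
    for i := 0; ; i++ {
      c <- i
      time.Sleep(10 * time.Millisecond)
    }
  }()
  go func() { // aborter decides to stop after a while
    time.Sleep(50 * time.Millisecond)
    close(abort_channel) // unblocks the `case <- abort_channel` below
  }()

  for {
    select {
      case a := <- c:
        fmt.Println(a)
      case <- abort_channel:
        return
    }
  }
}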

You may want a default operation to avoid blocking.

select {
  case a = <- channel_1:
    fmt.Println(a)
  case b = <- channel_2:
    fmt.Println(b)
  default:
    fmt.Println("nop")
}

Mutual Exclusion

Sharing variables between Goroutines can cause problems: two Goroutines writing to a shared variable can interfere with each other. A Goroutine is said to be concurrency-safe if it can be executed concurrently with other Goroutines without improperly interfering with them.

var i int = 0
var wg sync.WaitGroup

func inc() {
  i = i + 1 // this is not concurrency-safe
  wg.Done()
}

func main() {
  wg.Add(2)
  go inc()
  go inc()
  wg.Wait()
  fmt.Println(i) // may print 2, or 1 if the increments interleave badly
}

Recall that each Golang source code instruction may correspond to a sequence of machine code instructions, and concurrency operates at the machine code level, not the Golang source code level. So the interleaving is not an interleaving of Golang source code instructions; it is actually the underlying machine code instructions that get interleaved.

Golang source code    Corresponding machine code (example)
i = i + 1             read i
                      increment
                      write i
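To see why this matters, consider one illustrative interleaving, starting from i = 0: both Goroutines read i and get 0, both increment their private copy to 1, and both write 1 back. Two increments executed, but i ends up at 1 instead of 2.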

You need to restrict the possible interleavings of these Goroutines so that accesses to the shared variable cannot be interleaved. Using sync.Mutex, you as the programmer declare code segments in different Goroutines that cannot execute concurrently, so they cannot be interleaved.

  • The Lock() method puts the flag up, meaning the shared variable is in use. If the flag has already been put up by another Goroutine, Lock() blocks until the flag is put down.
  • The Unlock() method puts the flag down, meaning the shared variable is available. When Unlock() is called, a Goroutine blocked in Lock() can proceed.

In this way, as long as every Goroutine calls Lock() at the beginning of its mutually exclusive region and Unlock() at the end, it is guaranteed that only one Goroutine at a time can be inside that mutually exclusive region.



This is how we could fix this incrementing problem.

var i int = 0
var mut sync.Mutex

func inc() {
  mut.Lock()
  i = i + 1
  mut.Unlock()
}
...

Synchronous Initialization

By definition, initialization is something that should happen exactly once, and before everything else. Often you don’t know which Goroutine to put the initialization in, because you can’t guarantee the order of execution of multiple Goroutines. One way is to perform the initialization before starting the Goroutines, say inside main().

Another facility that allows this, part of the sync package, is sync.Once, which has one method, Do(some_function), taking a function as its argument. You can call this in lots of different Goroutines, and the function some_function will only be executed one time. It also guarantees that all calls to Do(...) in any Goroutine block until the first call actually returns, which ensures the initialization completes before anything else can proceed. So you can put Do(...) at the beginning of all your Goroutines.

var wg sync.WaitGroup
var on sync.Once

func setup() { // should be executed only once
  fmt.Println("Init")
}

func dostuff() {
  on.Do(setup)
  fmt.Println("Hello")
  wg.Done()
}

func main() {
  wg.Add(2)
  go dostuff()
  go dostuff()
  wg.Wait()
}

// Execution result
Init
Hello
Hello

Deadlock

Deadlock comes from synchronization dependencies: with multiple Goroutines, synchronization can make one Goroutine’s execution depend on another’s. Circular dependencies cause all of the involved Goroutines to block. This can be caused by:

  1. waiting on channels
  2. waiting on mutexes

Here is an example of deadlock; suppose both Goroutines execute this function:

var wg sync.WaitGroup

func dostuff(c1 chan int, c2 chan int) {
  <- c1    // read from the first channel: waits for somebody else to write
  c2 <- 1  // write to the second channel: waits for somebody else to read
  wg.Done()
}

func main() {
  ch1 := make(chan int)
  ch2 := make(chan int)
  wg.Add(2)
  go dostuff(ch1, ch2)
  go dostuff(ch2, ch1) // argument order is swapped: each Goroutine waits on the other
  wg.Wait()            // deadlock: every Goroutine is blocked
}

The Golang runtime automatically detects the deadlock situation in which all Goroutines are stopped ("fatal error: all goroutines are asleep - deadlock!"). However, it cannot detect when only a subset of the Goroutines is deadlocked. You won’t be told in any obvious way; your program just won’t operate correctly.



My Certificate

For more on Concurrency in Golang, please refer to the wonderful course here https://www.coursera.org/learn/golang-concurrency


I am Kesler Zhu, thank you for visiting my website. Check out more course reviews at https://KZHU.ai
