Atomics
Atomics are typically compiler intrinsics; behavior and cost are platform/hardware dependent (x86, RISC-V, ARM, etc.) and OS dependent.
Atomic types are typically supported at least up to pointer width – in Rust, usize/AtomicUsize.
Use cases / Patterns
- stop flag / locking primitive, aka naive mutex + busy wait (see the sketch after this list)
- thread progress reporting
- synchronization/barrier fence etc
A SeqCst fence is both a release fence and an acquire fence (just like AcqRel), but also part of the single total order of sequentially consistent operations. However, only the fence is part of the total order, but not necessarily the atomic operations before or after it.
- lazy initialization
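A minimal sketch of the stop-flag pattern, assuming a worker loop that polls the flag between units of work; Relaxed suffices because only the flag itself is communicated:
use std::sync::atomic::{AtomicBool, Ordering::Relaxed};
use std::thread;
use std::time::Duration;

static STOP: AtomicBool = AtomicBool::new(false);

fn main() {
    // Worker polls the stop flag between units of work.
    let worker = thread::spawn(|| {
        while !STOP.load(Relaxed) {
            // ... do a unit of work ...
            thread::sleep(Duration::from_millis(10));
        }
    });

    // Main thread decides when to stop.
    thread::sleep(Duration::from_millis(100));
    STOP.store(true, Relaxed);
    worker.join().unwrap();
}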
Orderings
- Relaxed: no synchronization, only a guaranteed total modification order per atomic variable
- Release & Acquire: happens-before between thread A (release-store) & thread B (acquire-load)
- AcqRel: both Acquire and Release in one read-modify-write; SeqCst: Acquire/Release plus membership in a single total order over all SeqCst operations
- *Consume: a data-dependency-only weakening of Acquire; not exposed in Rust, and compilers generally promote it to Acquire
A happens-before relationship is formed when an acquire-load operation observes the result of a release-store operation. In this case, the store and everything before it, happened before the load and everything after it.
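A minimal sketch of that release/acquire pairing (names are illustrative):
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering::{Acquire, Relaxed, Release}};
use std::thread;

static DATA: AtomicU64 = AtomicU64::new(0);
static READY: AtomicBool = AtomicBool::new(false);

fn main() {
    thread::spawn(|| {
        DATA.store(123, Relaxed);
        READY.store(true, Release); // everything before this store..
    });
    while !READY.load(Acquire) { // ..is visible once this load sees true
        std::hint::spin_loop();
    }
    assert_eq!(DATA.load(Relaxed), 123); // cannot fail
}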
- go mem: https://research.swtch.com/gomm
- overview: https://doc.rust-lang.org/nightly/nomicon/atomics.html
- super helpful summary(formalism): https://gist.github.com/kprotty/bb26b963441baf2ab3486a07fbf4762e
- c++ mem: https://en.cppreference.com/w/cpp/atomic/memory_order
Load and Store
impl AtomicI32 {
    pub fn load(&self, ordering: Ordering) -> i32;
    pub fn store(&self, value: i32, ordering: Ordering);
}
Building a lock with load/store:
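A minimal sketch of the flawed approach (orderings are not the problem here; the check and the set are two separate atomic operations):
use std::sync::atomic::{AtomicBool, Ordering::Relaxed};

static FLAG: AtomicBool = AtomicBool::new(false);

fn broken_lock() {
    while FLAG.load(Relaxed) {} // spin until the flag looks free..
    FLAG.store(true, Relaxed);  // ..but another thread may get here too
}

fn broken_unlock() {
    FLAG.store(false, Relaxed);
}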
As you can see from this interleaving, with timely (untimely?) interrupts, we can easily produce a case where both threads set the flag to 1 and both threads are thus able to enter the critical section. This behavior is what professionals call “bad” – we have obviously failed to provide the most basic requirement: providing mutual exclusion.
The performance problem, which we will address more later on, is the fact that the way a thread waits to acquire a lock that is already held: it endlessly checks the value of flag, a technique known as spin-waiting. Spin-waiting wastes time waiting for another thread to release a lock. The waste is exceptionally high on a uniprocessor, where the thread that the waiter is waiting for cannot even run (at least, until a context switch occurs)! Thus, as we move forward and develop more sophisticated solutions, we should also consider ways to avoid this kind of waste.
Fetch and Modify (test-and-set, fetch-add, etc)
- can impose an ordering / establish a happens-before
- e.g. can build a binary mutex (test-and-set)
- can track thread progress (see the example after the signatures below)
- statistics
impl AtomicI32 {
    pub fn fetch_add(&self, v: i32, ordering: Ordering) -> i32;
    pub fn fetch_sub(&self, v: i32, ordering: Ordering) -> i32;
    pub fn fetch_or(&self, v: i32, ordering: Ordering) -> i32;
    pub fn fetch_and(&self, v: i32, ordering: Ordering) -> i32;
    pub fn fetch_nand(&self, v: i32, ordering: Ordering) -> i32;
    pub fn fetch_xor(&self, v: i32, ordering: Ordering) -> i32;
    pub fn fetch_max(&self, v: i32, ordering: Ordering) -> i32;
    pub fn fetch_min(&self, v: i32, ordering: Ordering) -> i32;
    pub fn swap(&self, v: i32, ordering: Ordering) -> i32; // "fetch_store"
}
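A minimal sketch of progress reporting with fetch_add (the item counts and sleeps are placeholder workload):
use std::sync::atomic::{AtomicUsize, Ordering::Relaxed};
use std::thread;
use std::time::Duration;

fn main() {
    let num_done = AtomicUsize::new(0);

    thread::scope(|s| {
        // Four workers each process 25 items and bump the shared counter.
        for _ in 0..4 {
            s.spawn(|| {
                for _ in 0..25 {
                    thread::sleep(Duration::from_millis(5)); // stand-in for real work
                    num_done.fetch_add(1, Relaxed);
                }
            });
        }

        // Main thread reports progress until all 100 items are done.
        loop {
            let n = num_done.load(Relaxed);
            if n == 100 {
                break;
            }
            println!("Working.. {n}/100 done");
            thread::sleep(Duration::from_millis(20));
        }
    });

    println!("Done!");
}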
Compare and Swap/Compare and Exchange
CAS is a very flexible primitive – most other atomic operations can be built from it. The ABA problem bites you though: the comparison only checks physical equality of the current value, not a logical happens-before, so a value that went from A to B and back to A still passes.
impl AtomicI32 {
    pub fn compare_exchange(&self, expected: i32, new: i32) -> Result<i32, i32> {
        // In reality, the load, comparison and store
        // all happen as a single atomic operation.
        let v = self.load();
        if v == expected {
            // Value is as expected.
            // Replace it and report success.
            self.store(new);
            Ok(v)
        } else {
            // The value was not as expected.
            // Leave it untouched and report failure.
            Err(v)
        }
    }
}
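The actual std signature additionally takes separate success and failure orderings (the failure ordering applies to the load when the comparison fails), and there is a compare_exchange_weak variant that may fail spuriously but can be cheaper inside a retry loop:
impl AtomicI32 {
    pub fn compare_exchange(
        &self,
        current: i32,
        new: i32,
        success: Ordering,
        failure: Ordering,
    ) -> Result<i32, i32>;

    pub fn compare_exchange_weak(
        &self,
        current: i32,
        new: i32,
        success: Ordering,
        failure: Ordering,
    ) -> Result<i32, i32>;
}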
CAS loop: lock-free, but not wait-free. The example below is from a distributed-systems context (a Maelstrom KV) rather than a single process, but the principle is the same, and the lack of wait-freedom shows up as an availability/liveness concern:
func (l *replicatedLog) acquireLease(kv *maelstrom.KV) int {
	l.global.Lock()
	defer l.global.Unlock()

	ctx, cancel := context.WithDeadline(context.Background(), time.Now().Add(400*time.Millisecond))
	defer cancel()

	var count int
	err := errors.New("busy wait")
	for err != nil {
		previous, _ := kv.Read(ctx, "monotonic-counter")
		if previous == nil {
			previous = 0
			count = 1
		} else {
			count = previous.(int) + 1
		}
		err = kv.CompareAndSwap(ctx, "monotonic-counter", previous, count, true)
	}
	return count
}
Monotonic counter via a CAS loop, relaxed ordering:
fn increment(a: &AtomicU32) {
    let mut current = a.load(Relaxed);
    loop {
        let new = current + 1;
        match a.compare_exchange(current, new, Relaxed, Relaxed) {
            Ok(_) => return,
            Err(v) => current = v,
        }
    }
}
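std also provides fetch_update, which wraps this kind of compare-and-exchange retry loop; a sketch of the same counter:
use std::sync::atomic::{AtomicU32, Ordering::Relaxed};

fn increment(a: &AtomicU32) {
    // The closure is re-run with the freshly observed value until the update sticks.
    a.fetch_update(Relaxed, Relaxed, |current| Some(current + 1))
        .unwrap(); // only errors if the closure returns None, which it never does here
}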
Locks - Mutexes / RwLock etc
design axes:
- fairness (FIFO) vs starvation
- correctness (mutual exclusion)
- performance (syscall overhead, space etc)
- priority inversion
mutex lock/unlock with an atomic (note this version only tries the lock once rather than spinning):
use std::sync::atomic::AtomicBool;
use std::sync::atomic::Ordering::{Acquire, Relaxed, Release};
use std::thread;

static mut DATA: String = String::new();
static LOCKED: AtomicBool = AtomicBool::new(false);

fn mu() {
    if LOCKED
        .compare_exchange(false, true, Acquire, Relaxed)
        .is_ok()
    {
        // Safety: We hold the exclusive lock, so nothing else is accessing DATA.
        unsafe { DATA.push('!') };
        LOCKED.store(false, Release);
    }
}

fn main() {
    // Many threads race for the lock; only those that win the compare_exchange touch DATA.
    thread::scope(|s| {
        for _ in 0..100 {
            s.spawn(mu);
        }
    });
}
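To turn this try-lock into an actual spin lock, keep retrying until we are the thread that flips the flag, and hint the CPU while busy-waiting; a minimal sketch (swap-based, assuming unlock is only called by the lock holder):
use std::hint::spin_loop;
use std::sync::atomic::{AtomicBool, Ordering::{Acquire, Release}};

pub struct SpinLock {
    locked: AtomicBool,
}

impl SpinLock {
    pub const fn new() -> Self {
        Self { locked: AtomicBool::new(false) }
    }

    pub fn lock(&self) {
        // swap returns the previous value: keep going until we are the one
        // who changed it from false to true.
        while self.locked.swap(true, Acquire) {
            spin_loop(); // tell the CPU we are busy-waiting
        }
    }

    pub fn unlock(&self) {
        self.locked.store(false, Release);
    }
}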