Barrier Synchronization
Barrier Synchronization Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit
Simple Video Game
Art of Multiprocessor Programming 2 Simple Video Game Prepare frame for display By graphics coprocessor “soft real-time” application Need at least 35 frames/second OK to mess up rarely
Simple Video Game
Art of Multiprocessor Programming 3 Simple Video Game while ( true ) { frame.prepare(); frame.display(); }
Simple Video Game
Art of Multiprocessor Programming 4 Simple Video Game while ( true ) { frame.prepare(); frame.display(); } What about overlapping work? 1 st thread displays frame 2 nd prepares next frame
Two-Phase Rendering
Art of Multiprocessor Programming 5 Two-Phase Rendering while ( true ) { if (phase) { frame[0].display(); } else { frame[1].display(); } phase = !phase; } while ( true ) { if (phase) { frame[1].prepare(); } else { frame[0].prepare(); } phase = !phase; }
Two-Phase Rendering
Art of Multiprocessor Programming 6 Two-Phase Rendering while (true) { if (phase) { frame[0].display(); } else { frame[1].display(); } phase = !phase; } while (true) { if (phase) { frame[1].prepare(); } else { frame[0].prepare(); } phase = !phase; } even phases
Two-Phase Rendering
Art of Multiprocessor Programming 7 Two-Phase Rendering while (true) { if (phase) { frame[0].display(); } else { frame[1].display(); } phase = !phase; } while (true) { if (phase) { frame[1].prepare(); } else { frame[0].prepare(); } phase = !phase; } odd phases
Synchronization Problems
Art of Multiprocessor Programming 8 Synchronization Problems How do threads stay in phase? Too early? “we render no frame before its time” Too late? Recycle memory before frame is displayed
Ideal Parallel Computation
Art of Multiprocessor Programming 9 Ideal Parallel Computation 0 0 0 1 1 1
Ideal Parallel Computation
Art of Multiprocessor Programming 10 Ideal Parallel Computation 2 2 2 1 1 1
Real-Life Parallel Computation
Art of Multiprocessor Programming 11 Real-Life Parallel Computation 0 0 0 1 1 zzz…
Real-Life Parallel Computation
Art of Multiprocessor Programming 12 Real-Life Parallel Computation 2 1 1 zzz… Uh, oh 0
Barrier Synchronization
Art of Multiprocessor Programming 13 Barrier Synchronization 0 0 0 barrier
Barrier Synchronization
Art of Multiprocessor Programming 14 Barrier Synchronization 1 1 1 barrier
Barrier Synchronization
Art of Multiprocessor Programming 15 Barrier Synchronization No thread enters here Until every thread has left here barrier
Why Do We Care?
Art of Multiprocessor Programming 16 Why Do We Care? Mostly of interest to Scientific & numeric computation Elsewhere Garbage collection Less common in systems programming Still important topic
Duality
Art of Multiprocessor Programming 17 Duality Dual to mutual exclusion Include others, not exclude them Same implementation issues Interaction with caches … Invalidation? Local spinning?
Example: Parallel Prefix
Art of Multiprocessor Programming 18 Example: Parallel Prefix a b c d a a+b a+b+c a+b+c +d before after
Parallel Prefix
Art of Multiprocessor Programming 19 Parallel Prefix a b c d One thread Per entry
Parallel Prefix: Phase 1
Art of Multiprocessor Programming 20 Parallel Prefix: Phase 1 a b c d a a+ b b +c c +d
Parallel Prefix: Phase 2
Art of Multiprocessor Programming 21 Parallel Prefix: Phase 2 a b c d a a+b a +b+c a+b +c +d
Parallel Prefix
Art of Multiprocessor Programming 22 Parallel Prefix N threads can compute Parallel prefix Of N entries In log 2 N rounds What if system is asynchronous? Why we need barriers
Prefix
Art of Multiprocessor Programming 23 Prefix class Prefix extends Thread { int [] a; int i; Barrier b; void Prefix( int [] a, Barrier b, int i) { a = a; b = b; i = i; }
Prefix
class Prefix extends Thread { int [] a; int i; Barrier b; void Prefix(int[] a, Barrier b, int i) { a = a; b = b; i = i; } Art of Multiprocessor Programming 24 Prefix Array of input values
Prefix
class Prefix extends Thread { int[] a; int i; Barrier b; void Prefix(int[] a, Barrier b, int i) { a = a; b = b; i = i; } Art of Multiprocessor Programming 25 Prefix Thread index
Prefix
class Prefix extends Thread { int[] a; int i; Barrier b; void Prefix(int[] a, Barrier b, int i) { a = a; b = b; i = i; } Art of Multiprocessor Programming 26 Prefix Shared barrier
Prefix
class Prefix extends Thread { int[] a; int i; Barrier b; void Prefix(int[] a, Barrier b, int i) { a = a; b = b; i = i; } Art of Multiprocessor Programming 27 Prefix Initialize fields
Where Do the Barriers Go?
Art of Multiprocessor Programming 28 Where Do the Barriers Go? public void run() { int d = 1, sum = 0; while (d < N) { if (i >= d) sum = a[i-d]; if (i >= d) a[i] += sum; d = d * 2; }}}
Where Do the Barriers Go?
Art of Multiprocessor Programming 29 Where Do the Barriers Go? public void run() { int d = 1, sum = 0; while (d < N) { if (i >= d) sum = a[i-d]; b.await(); if (i >= d) a[i] += sum; d = d * 2; }}} Make sure everyone reads before anyone writes
Where Do the Barriers Go?
Art of Multiprocessor Programming 30 Where Do the Barriers Go? public void run() { int d = 1, sum = 0; while (d < N) { if (i >= d) sum = a[i-d]; b.await(); if (i >= d) a[i] += sum; b.await(); d = d * 2; }}} Make sure everyone writes before anyone reads Make sure everyone reads before anyone writes
Barrier Implementations
Art of Multiprocessor Programming 31 Barrier Implementations Cache coherence Spin on locally-cached locations? Spin on statically-defined locations? Latency How many steps? Symmetry Do all threads do the same thing?
Barriers
Art of Multiprocessor Programming 32 Barriers public class Barrier { AtomicInteger count; int size; public Barrier( int n){ count = AtomicInteger(n); size = n; } public void await() { if (count.getAndDecrement()==1) { count.set(size); } else { while (count.get() != 0); }}}}
Barriers
public class Barrier { AtomicInteger count; int size; public Barrier(int n){ count = AtomicInteger(n); size = n; } public void await() { if (count.getAndDecrement()==1) { count.set(size); } else { while (count.get() != 0); }}}} Art of Multiprocessor Programming 33 Barriers Number of threads not yet arrived
Barriers
public class Barrier { AtomicInteger count; int size; public Barrier(int n){ count = AtomicInteger(n); size = n; } public void await() { if (count.getAndDecrement()==1) { count.set(size); } else { while (count.get() != 0); }}}} Art of Multiprocessor Programming 34 Barriers Number of threads participating
Barriers
public class Barrier { AtomicInteger count; int size; public Barrier(int n){ count = AtomicInteger(n); size = n; } public void await() { if (count.getAndDecrement()==1) { count.set(size); } else { while (count.get() != 0); }}}} Art of Multiprocessor Programming 35 Barriers Initialization
Barriers
public class Barrier { AtomicInteger count; int size; public Barrier(int n){ count = AtomicInteger(n); size = n; } public void await() { if (count.getAndDecrement()==1) { count.set(size); } else { while (count.get() != 0); }}}} Art of Multiprocessor Programming 36 Barriers Principal method
Barriers
public class Barrier { AtomicInteger count; int size; public Barrier(int n){ count = AtomicInteger(n); size = n; } public void await() { if (count.getAndDecrement()==1) { count.set(size); } else { while (count.get() != 0); }}}} Art of Multiprocessor Programming 37 Barriers If I’m last, reset fields for next time
Barriers
public class Barrier { AtomicInteger count; int size; public Barrier(int n){ count = AtomicInteger(n); size = n; } public void await() { if (count.getAndDecrement()==1) { count.set(size); } else { while (count.get() != 0); }}}} Art of Multiprocessor Programming 38 Barriers Otherwise, wait for everyone else
Barriers
public class Barrier { AtomicInteger count; int size; public Barrier( int n){ count = AtomicInteger(n); size = n; } public void await() { if (count.getAndDecrement()==1) { count.set(size); } else { while (count.get() != 0); }}}} Art of Multiprocessor Programming 39 Barriers What’s wrong with this protocol?
Reuse
Art of Multiprocessor Programming 40 Reuse Barrier b = new Barrier(n); while ( mumble() ) { work(); b.await() } do work synchronize repeat
Barriers
public class Barrier { AtomicInteger count; int size; public Barrier( int n){ count = AtomicInteger(n); size = n; } public void await() { if (count.getAndDecrement()==1) { count.set(size); } else { while (count.get() != 0); }}}} Art of Multiprocessor Programming 41 Barriers
Barriers
public class Barrier { AtomicInteger count; int size; public Barrier(int n){ count = AtomicInteger(n); size = n; } public void await() { if (count.getAndDecrement()==1) { count.set(size); } else { while (count.get() != 0); }}}} Art of Multiprocessor Programming 42 Barriers Waiting for Phase 1 to finish
Barriers
public class Barrier { AtomicInteger count; int size; public Barrier(int n){ count = AtomicInteger(n); size = n; } public void await() { if (count.getAndDecrement()==1) { count.set(size); } else { while (count.get() != 0); }}}} Art of Multiprocessor Programming 43 Barriers Waiting for Phase 1 to finish Phase 1 is so over
Barriers
public class Barrier { AtomicInteger count; int size; public Barrier(int n){ count = AtomicInteger(n); size = n; } public void await() { if (count.getAndDecrement()==1) { count.set(size); } else { while (count.get() != 0); }}}} Art of Multiprocessor Programming 44 Barriers ZZZZZ …. Prepare for phase 2
Uh-Oh
public class Barrier { AtomicInteger count; int size; public Barrier(int n){ count = AtomicInteger(n); size = n; } public void await() { if (count.getAndDecrement()==1) { count.set(size); } else { while (count.get() != 0); }}}} Art of Multiprocessor Programming Uh-Oh Waiting for Phase 1 to finish Waiting for Phase 2 to finish
Basic Problem
Art of Multiprocessor Programming 46 Basic Problem One thread “wraps around” to start phase 2 While another thread is still waiting for phase 1 One solution: Always use two barriers
Sense-Reversing Barriers
Art of Multiprocessor Programming 47 Sense-Reversing Barriers public class Barrier { AtomicInteger count; int size; boolean sense = false ; threadSense = new ThreadLocal< boolean >… public void await { boolean mySense = threadSense.get(); if (count.getAndDecrement()==1) { count.set(size); sense = mySense } else { while (sense != mySense) {} } threadSense.set(!mySense)}}}
Sense-Reversing Barriers
public class Barrier { AtomicInteger count; int size; boolean sense = false ; threadSense = new ThreadLocal<boolean>… public void await { boolean mySense = threadSense.get(); if (count.getAndDecrement()==1) { count.set(size); sense = mySense } else { while (sense != mySense) {} } threadSense.set(!mySense)}}} Art of Multiprocessor Programming 48 Sense-Reversing Barriers Completed odd or even-numbered phase?
Sense-Reversing Barriers
public class Barrier { AtomicInteger count; int size; boolean sense = false; threadSense = new ThreadLocal< boolean >… public void await { boolean mySense = threadSense.get(); if (count.getAndDecrement()==1) { count.set(size); sense = mySense } else { while (sense != mySense) {} } threadSense.set(!mySense)}}} Art of Multiprocessor Programming 49 Sense-Reversing Barriers Store sense for next phase
Sense-Reversing Barriers
public class Barrier { AtomicInteger count; int size; boolean sense = false; threadSense = new ThreadLocal<boolean>… public void await { boolean mySense = threadSense.get(); if (count.getAndDecrement()==1) { count.set(size); sense = mySense } else { while (sense != mySense) {} } threadSense.set(!mySense)}}} Art of Multiprocessor Programming 50 Sense-Reversing Barriers Get new sense determined by last phase
Sense-Reversing Barriers
public class Barrier { AtomicInteger count; int size; boolean sense = false; threadSense = new ThreadLocal<boolean>… public void await { boolean mySense = threadSense.get(); if (count.getAndDecrement()==1) { count.set(size); sense = mySense } else { while (sense != mySense) {} } threadSense.set(!mySense)}}} Art of Multiprocessor Programming 51 Sense-Reversing Barriers If I’m last, reverse sense for next time
Sense-Reversing Barriers
public class Barrier { AtomicInteger count; int size; boolean sense = false; threadSense = new ThreadLocal<boolean>… public void await { boolean mySense = threadSense.get(); if (count.getAndDecrement()==1) { count.set(size); sense = mySense } else { while (sense != mySense) {} } threadSense.set(!mySense)}}} Art of Multiprocessor Programming 52 Sense-Reversing Barriers Otherwise, wait for sense to flip
Sense-Reversing Barriers
public class Barrier { AtomicInteger count; int size; boolean sense = false; threadSense = new ThreadLocal<boolean>… public void await { boolean mySense = threadSense.get(); if (count.getAndDecrement()==1) { count.set(size); sense = mySense } else { while (sense != mySense) {} } threadSense.set(!mySense)}}} Art of Multiprocessor Programming 53 Sense-Reversing Barriers Prepare sense for next phase
Combining Tree Barriers
Art of Multiprocessor Programming 54 2-barrier 2-barrier 2-barrier Combining Tree Barriers
Combining Tree Barriers
Art of Multiprocessor Programming 55 2-barrier 2-barrier 2-barrier Combining Tree Barriers
Combining Tree Barrier
Art of Multiprocessor Programming 56 Combining Tree Barrier public class Node{ AtomicInteger count; int size; Node parent; volatile boolean sense; public void await() {… if (count.getAndDecrement()==1) { if (parent != null ) parent.await() count.set(size); sense = mySense } else { while (sense != mySense) {} }…}}}
Combining Tree Barrier
public class Node{ AtomicInteger count; int size; Node parent; volatile boolean sense; public void await() {… if (count.getAndDecrement()==1) { if (parent != null) parent.await() count.set(size); sense = mySense } else { while (sense != mySense) {} }…}}} Art of Multiprocessor Programming 57 Combining Tree Barrier Parent barrier in tree
Combining Tree Barrier
public class Node{ AtomicInteger count; int size; Node parent; volatile boolean sense; public void await() {… if (count.getAndDecrement()==1) { if (parent != null) parent.await() count.set(size); sense = mySense } else { while (sense != mySense) {} }…}}} Art of Multiprocessor Programming 58 Combining Tree Barrier Am I last?
Combining Tree Barrier
public class Node{ AtomicInteger count; int size; Node parent; volatile boolean sense; public void await() {… if (count.getAndDecrement()==1) { if (parent != null ) parent.await(); count.set(size); sense = mySense } else { while (sense != mySense) {} }…}}} Art of Multiprocessor Programming 59 Combining Tree Barrier Proceed to parent barrier
Combining Tree Barrier
public class Node{ AtomicInteger count; int size; Node parent; volatile boolean sense; public void await() {… if (count.getAndDecrement()==1) { if (parent != null) parent.await() count.set(size); sense = mySense } else { while (sense != mySense) {} }…}}} Art of Multiprocessor Programming 60 Combining Tree Barrier Prepare for next phase
Combining Tree Barrier
public class Node{ AtomicInteger count; int size; Node parent; volatile boolean sense; public void await() {… if (count.getAndDecrement()==1) { if (parent != null) parent.await() count.set(size); sense = mySense } else { while (sense != mySense) {} }…}}} Art of Multiprocessor Programming 61 Combining Tree Barrier Notify others at this node
Combining Tree Barrier
public class Node{ AtomicInteger count; int size; Node parent; volatile boolean sense; public void await() {… if (count.getAndDecrement()==1) { if (parent != null) { parent.await() count.set(size); sense = mySense } else { while (sense != mySense) {} }…}}} Art of Multiprocessor Programming 62 Combining Tree Barrier I’m not last, so wait for notification
Combining Tree Barrier
Art of Multiprocessor Programming 63 Combining Tree Barrier No sequential bottleneck Parallel getAndDecrement() calls Low memory contention Same reason Cache behavior Local spinning on bus-based architecture Not so good for NUMA
Remarks
Art of Multiprocessor Programming 64 Remarks Everyone spins on sense field Local spinning on bus-based (good) Network hot-spot on distributed architecture (bad) Not really scalable
Tournament Tree Barrier
Art of Multiprocessor Programming 65 Tournament Tree Barrier If tree nodes have fan-in 2 Don’t need to call getAndDecrement() Winner chosen statically At level i If i -th bit of id is 0 , move up Otherwise keep back
Tournament Tree Barriers
Art of Multiprocessor Programming 66 Tournament Tree Barriers winner loser winner loser winner loser root
Tournament Tree Barriers
Art of Multiprocessor Programming 67 Tournament Tree Barriers All flags blue
Tournament Tree Barriers
Art of Multiprocessor Programming 68 Tournament Tree Barriers Loser thread sets winner’s flag
Tournament Tree Barriers
Art of Multiprocessor Programming 69 Tournament Tree Barriers Loser spins on own flag
Tournament Tree Barriers
Art of Multiprocessor Programming 70 Tournament Tree Barriers Winner spins on own flag
Tournament Tree Barriers
Art of Multiprocessor Programming 71 Tournament Tree Barriers Winner sees own flag, moves up, spins
Tournament Tree Barriers
Art of Multiprocessor Programming 72 Tournament Tree Barriers Bingo!
Tournament Tree Barriers
Art of Multiprocessor Programming 73 Tournament Tree Barriers Sense-reversing: next time use blue flags
Tournament Barrier
Art of Multiprocessor Programming 74 Tournament Barrier class TBarrier { boolean flag; TBarrier partner; TBarrier parent; boolean top; }
Tournament Barrier
Art of Multiprocessor Programming 75 Tournament Barrier class TBarrier { boolean flag; TBarrier partner; TBarrier parent; boolean top; } Notifications delivered here
Tournament Barrier
Art of Multiprocessor Programming 76 Tournament Barrier class TBarrier { boolean flag; TBarrier partner; TBarrier parent; boolean top; } Other thead at same level
Tournament Barrier
Art of Multiprocessor Programming 77 Tournament Barrier class TBarrier { boolean flag; TBarrier partner; TBarrier parent; boolean top; } Parent (winner) or null (loser)
Tournament Barrier
Art of Multiprocessor Programming 78 Tournament Barrier class TBarrier { boolean flag; TBarrier partner; TBarrier parent; boolean top; } Am I the root?
Tournament Barrier
Art of Multiprocessor Programming 79 Tournament Barrier void await( boolean mySense) { if (top) { return ; } else if (parent != null ) { while (flag != mySense) {}; parent.await(mySense); partner.flag = mySense; } else { partner.flag = mySense; while (flag != mySense) {}; }}}
Tournament Barrier
Art of Multiprocessor Programming 80 Tournament Barrier void await ( boolean mySense) { if (top) { return ; } else if (parent != null) { while (flag != mySense) {}; parent.await(mySense); partner.flag = mySense; } else { partner.flag = mySense; while (flag != mySense) {}; }}} Le root, c’est moi Current sense
Tournament Barrier
Art of Multiprocessor Programming 81 Tournament Barrier void await(boolean mySense) { if (top) { return; } else if (parent != null ) { while (flag != mySense) {}; parent.await(mySense); partner.flag = mySense; } else { partner.flag = mySense; while (flag != mySense) {}; }}} I am already a winner
Tournament Barrier
Art of Multiprocessor Programming 82 Tournament Barrier void await(boolean mySense) { if (top) { return; } else if (parent != null) { while (flag != mySense) {}; parent.await(mySense); partner.flag = mySense; } else { partner.flag = mySense; while (flag != mySense) {}; }}} Wait for partner
Tournament Barrier
Art of Multiprocessor Programming 83 Tournament Barrier void await(boolean mySense) { if (top) { return; } else if (parent != null) { while (flag != mySense) {}; parent.await(mySense); partner.flag = mySense; } else { partner.flag = mySense; while (flag != mySense) {}; }}} Synchronize upstairs
Tournament Barrier
Art of Multiprocessor Programming 84 Tournament Barrier void await(boolean mySense) { if (top) { return; } else if (parent != null) { while (flag != mySense) {}; parent.await(mySense); partner.flag = mySense; } else { partner.flag = mySense; while (flag != mySense) {}; }}} Inform partner
Tournament Barrier
Art of Multiprocessor Programming 85 Tournament Barrier void await(boolean mySense) { if (top) { return; } else if (parent != null) { while (flag != mySense) {}; parent.await(mySense); partner.flag = mySense; } else { partner.flag = mySense; while (flag != mySense) {}; }}} Inform partner Order is important (why?)
Tournament Barrier
Art of Multiprocessor Programming 86 Tournament Barrier void await(boolean mySense) { if (top) { return; } else if (parent != null) { while (flag != mySense) {}; parent.await(mySense); partner.flag = mySense; } else { partner.flag = mySense; while (flag != mySense) {}; }}} Natural-born loser
Tournament Barrier
Art of Multiprocessor Programming 87 Tournament Barrier void await(boolean mySense) { if (top) { return; } else if (parent != null) { while (flag != mySense) {}; parent.await(mySense); partner.flag = mySense; } else { partner.flag = mySense; while (flag != mySense) {}; }}} Tell partner I’m here
Tournament Barrier
Art of Multiprocessor Programming 88 Tournament Barrier void await(boolean mySense) { if (top) { return; } else if (parent != null) { while (flag != mySense) {}; parent.await(mySense); partner.flag = mySense; } else { partner.flag = mySense; while (flag != mySense) {}; }}} Wait for notification from partner
Remarks
Art of Multiprocessor Programming 89 Remarks No need for read-modify-write calls Each thread spins on fixed location Good for bus-based architectures Good for NUMA architectures
Dissemination Barrier
Art of Multiprocessor Programming 90 Dissemination Barrier At round i Thread A notifies thread A+2 i (mod n) Requires log n rounds
Dissemination Barrier
Art of Multiprocessor Programming 91 Dissemination Barrier +1 +2 +4
Remarks
Art of Multiprocessor Programming 92 Remarks Elegant Good source of homework problems Not cache-friendly
Ideas So Far
Art of Multiprocessor Programming 93 Ideas So Far Sense-reversing Reuse without reinitializing Combining tree Like counters, locks … Tournament tree Optimized combining tree Dissemination barrier Intellectually Pleasing (matter of taste)
Which is best for Multicore?
Art of Multiprocessor Programming 94 Which is best for Multicore? On a cache coherent multicore chip: perhaps none of the above… Here is another (arguably) better algorithm …
Static Tree Barrier
Art of Multiprocessor Programming 95 Static Tree Barrier One node per thread, statically assigned
Static Tree Barrier
Art of Multiprocessor Programming 96