Shared Counters and Parallelism
Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit
A Shared Pool
Art of Multiprocessor Programming 2 A Shared Pool Put Inserts object blocks if full Remove Removes & returns an object blocks if empty public interface Pool { public void put(Object x); public Object remove(); } Unordered set of objects
A Shared Pool
Art of Multiprocessor Programming 3 A Shared Pool Put Insert item block if full Remove Remove & return item block if empty public interface Pool<T> { public void put(T x); public T remove(); }
Simple Locking Implementation
Art of Multiprocessor Programming 4 put Simple Locking Implementation put
Simple Locking Implementation
5 put Simple Locking Implementation put Problem: hot- spot contention
Simple Locking Implementation
6 put Simple Locking Implementation put Problem: hot- spot contention Problem: sequential bottleneck
Simple Locking Implementation
Art of Multiprocessor Programming 7 put Simple Locking Implementation put Problem: hot- spot contention Problem: sequential bottleneck Solution: Queue Lock
Simple Locking Implementation
Art of Multiprocessor Programming 8 put Simple Locking Implementation put Problem: hot- spot contention Problem: sequential bottleneck Solution: Queue Lock Solution :?
Counting Implementation
Art of Multiprocessor Programming 9 Counting Implementation 19 20 21 remove put 19 20 21
Counting Implementation
Art of Multiprocessor Programming 10 Counting Implementation 19 20 21 Only the counters are sequential remove put 19 20 21
Shared Counter
Art of Multiprocessor Programming 11 Shared Counter 3 2 1 0 1 2 3
Shared Counter
Art of Multiprocessor Programming 12 Shared Counter 3 2 1 0 1 2 3 No duplication
Shared Counter
Art of Multiprocessor Programming 13 Shared Counter 3 2 1 0 1 2 3 No duplication No Omission
Shared Counter
Art of Multiprocessor Programming 14 Shared Counter 3 2 1 0 1 2 3 Not necessarily linearizable No duplication No Omission
Shared Counters
Art of Multiprocessor Programming 15 Shared Counters Can we build a shared counter with Low memory contention, and Real parallelism? Locking Can use queue locks to reduce contention No help with parallelism issue …
Software Combining Tree
Art of Multiprocessor Programming 16 Software Combining Tree 4 Contention: All spinning local Parallelism: Potential n/log n speedup
Combining Trees
Art of Multiprocessor Programming 17 Combining Trees 0
Combining Trees
Art of Multiprocessor Programming 18 Combining Trees 0 +3
Combining Trees
Art of Multiprocessor Programming 19 Combining Trees 0 +3 +2
Combining Trees
Art of Multiprocessor Programming 20 Combining Trees 0 +3 +2 Two threads meet, combine sums
Combining Trees
Art of Multiprocessor Programming 21 Combining Trees 0 +3 +2 Two threads meet, combine sums +5
Combining Trees
Art of Multiprocessor Programming 22 Combining Trees 5 +3 +2 +5 Combined sum added to root
Combining Trees
Art of Multiprocessor Programming 23 Combining Trees 5 +3 +2 0 Result returned to children
Combining Trees
Art of Multiprocessor Programming 24 Combining Trees 5 0 0 3 0 Results returned to threads
What if?
Art of Multiprocessor Programming 25 What if? Threads don’t arrive together? Should I stay or should I go? How long to wait? Waiting times add up … Idea: Use multi-phase algorithm Where threads wait in parallel
Combining Status
Art of Multiprocessor Programming 26 Combining Status enum CStatus{ IDLE, FIRST, SECOND, RESULT, ROOT };
Combining Status
enum CStatus{ IDLE, FIRST, SECOND, RESULT, ROOT }; Art of Multiprocessor Programming 27 Combining Status Nothing going on
Combining Status
enum CStatus{ IDLE, FIRST, SECOND, RESULT, ROOT }; Art of Multiprocessor Programming 28 Combining Status 1 st thread is a partner for combining, will return to check for 2 nd thread
Combining Status
enum CStatus{ IDLE, FIRST, SECOND, RESULT, ROOT }; Art of Multiprocessor Programming 29 Combining Status 2 nd thread has arrived with value for combining
Combining Status
enum CStatus{ IDLE, FIRST, SECOND, RESULT, ROOT }; Art of Multiprocessor Programming 30 Combining Status 1 st thread has deposited result for 2 nd thread
Combining Status
enum CStatus{ IDLE, FIRST, SECOND, RESULT, ROOT }; Art of Multiprocessor Programming 31 Combining Status Special case: root node
Node Synchronization
Art of Multiprocessor Programming 32 Node Synchronization Use “Meta Locking:” Short-term Synchronized methods Consistency during method call Long-term Boolean locked field Consistency across calls
Phases
Art of Multiprocessor Programming 33 Phases Precombining Set up combining rendez-vous
Phases
Art of Multiprocessor Programming 34 Phases Precombining Set up combining rendez-vous Combining Collect and combine operations
Phases
Art of Multiprocessor Programming 35 Phases Precombining Set up combining rendez-vous Combining Collect and combine operations Operation Hand off to higher thread
Phases
Art of Multiprocessor Programming 36 Phases Precombining Set up combining rendez-vous Combining Collect and combine operations Operation Hand off to higher thread Distribution Distribute results to waiting threads
Precombining Phase
Art of Multiprocessor Programming 37 Precombining Phase 0 Examine status IDLE
Precombining Phase
Art of Multiprocessor Programming 38 Precombining Phase 0 0 If IDLE, promise to return to look for partner FIRST
Precombining Phase
Art of Multiprocessor Programming 39 Precombining Phase At ROOT,turn back FIRST 0
Precombining Phase
Art of Multiprocessor Programming 40 Precombining Phase 0 FIRST
Precombining Phase
Art of Multiprocessor Programming 41 Precombining Phase 0 0 SECOND If FIRST, I’m willing to combine, but lock for now
Code
Art of Multiprocessor Programming 42 Code Tree class In charge of navigation Node class Combining state Synchronization state Bookkeeping
Precombining Navigation
Art of Multiprocessor Programming 43 Precombining Navigation Node node = myLeaf; while (node.precombine()) { node = node.parent; } Node stop = node;
Precombining Navigation
Art of Multiprocessor Programming 44 Precombining Navigation Node node = myLeaf; while (node.precombine()) { node = node.parent; } Node stop = node; Start at leaf
Precombining Navigation
Art of Multiprocessor Programming 45 Precombining Navigation Node node = myLeaf; while (node.precombine()) { node = node.parent; } Node stop = node; Move up while instructed to do so
Precombining Navigation
Art of Multiprocessor Programming 46 Precombining Navigation Node node = myLeaf; while (node.precombine()) { node = node.parent; } Node stop = node; Remember where we stopped
Precombining Node
Art of Multiprocessor Programming 47 Precombining Node synchronized boolean precombine() { while (locked) wait(); switch (cStatus) { case IDLE: cStatus = CStatus.FIRST; return true; case FIRST: locked = true; cStatus = CStatus.SECOND; return false; case ROOT: return false; default : throw new PanicException() } }
Precombining Node
Art of Multiprocessor Programming 48 synchronized boolean precombine() { while (locked) wait(); switch (cStatus) { case IDLE: cStatus = CStatus.FIRST; return true; case FIRST: locked = true; cStatus = CStatus.SECOND; return false; case ROOT: return false; default: throw new PanicException() } } Precombining Node Short-term synchronization
Synchronization
Art of Multiprocessor Programming 49 synchronized boolean precombine() { while (locked) wait(); switch (cStatus) { case IDLE: cStatus = CStatus.FIRST; return true; case FIRST: locked = true; cStatus = CStatus.SECOND; return false; case ROOT: return false; default: throw new PanicException() } } Synchronization Wait while node is locked (in use by earlier combining phase)
Precombining Node
Art of Multiprocessor Programming 50 synchronized boolean precombine() { while (locked) wait(); switch (cStatus) { case IDLE: cStatus = CStatus.FIRST; return true; case FIRST: locked = true; cStatus = CStatus.SECOND; return false; case ROOT: return false; default: throw new PanicException() } } Precombining Node Check combining status
Node was IDLE
Art of Multiprocessor Programming 51 Node was IDLE synchronized boolean precombine() { while (locked) {wait();} switch (cStatus) { case IDLE: cStatus = CStatus.FIRST; return true; case FIRST: locked = true; cStatus = CStatus.SECOND; return false; case ROOT: return false; default: throw new PanicException() } } I will return to look for 2 nd thread’s input value
Precombining Node
Art of Multiprocessor Programming 52 Precombining Node synchronized boolean precombine() { while (locked) {wait();} switch (cStatus) { case IDLE: cStatus = CStatus.FIRST; return true; case FIRST: locked = true; cStatus = CStatus.SECOND; return false; case ROOT: return false; default: throw new PanicException() } } Continue up the tree
I’m the 2nd Thread
Art of Multiprocessor Programming 53 I’m the 2 nd Thread synchronized boolean precombine() { while (locked) {wait();} switch (cStatus) { case IDLE: cStatus = CStatus.FIRST; return true; case FIRST: locked = true; cStatus = CStatus.SECOND; return false; case ROOT: return false; default: throw new PanicException() } } If 1 st thread has promised to return, lock node so it won’t leave without me
Precombining Node
Art of Multiprocessor Programming 54 Precombining Node synchronized boolean precombine() { while (locked) {wait();} switch (cStatus) { case IDLE: cStatus = CStatus.FIRST; return true; case FIRST: locked = true; cStatus = CStatus.SECOND; return false; case ROOT: return false; default: throw new PanicException() } } Prepare to deposit 2 nd thread’s input value
Precombining Node
Art of Multiprocessor Programming 55 Precombining Node synchronized boolean phase1() { while (sStatus==SStatus.BUSY) {wait();} switch (cStatus) { case IDLE: cStatus = CStatus.FIRST; return true; case FIRST: locked = true; cStatus = CStatus.SECOND; return false; case ROOT: return false; default: throw new PanicException() } } End of precombining phase, don’t continue up tree
Node is the Root
Art of Multiprocessor Programming 56 Node is the Root synchronized boolean phase1() { while (sStatus==SStatus.BUSY) {wait();} switch (cStatus) { case IDLE: cStatus = CStatus.FIRST; return true; case FIRST: locked = true; cStatus = CStatus.SECOND; return false; case ROOT: return false; default: throw new PanicException() } } If root, precombining phase ends, don’t continue up tree
Precombining Node
Art of Multiprocessor Programming 57 Precombining Node synchronized boolean precombine() { while (locked) {wait();} switch (cStatus) { case IDLE: cStatus = CStatus.FIRST; return true; case FIRST: locked = true; cStatus = CStatus.SECOND; return false; case ROOT: return false; default : throw new PanicException() } } Always check for unexpected values!
Combining Phase
Art of Multiprocessor Programming 58 Combining Phase 0 0 SECOND 2 nd thread locks out 1 st until 2 nd returns with value +3
Combining Phase
Art of Multiprocessor Programming 59 Combining Phase 0 0 SECOND 2 nd thread deposits value to be combined, unlocks node, & waits 2 +3 zzz
Combining Phase
Art of Multiprocessor Programming 60 Combining Phase +3 +2 +5 SECOND 2 0 1 st thread moves up the tree with combined value … zzz
Combining (reloaded)
Art of Multiprocessor Programming 61 Combining (reloaded) 0 0 2 nd thread has not yet deposited value … FIRST
Combining (reloaded)
Art of Multiprocessor Programming 62 Combining (reloaded) 0 +3 FIRST 1 st thread is alone, locks out late partner
Combining (reloaded)
Art of Multiprocessor Programming 63 Combining (reloaded) 0 +3 +3 FIRST Stop at root
Combining (reloaded)
Art of Multiprocessor Programming 64 Combining (reloaded) 0 +3 +3 FIRST 2 nd thread’s late precombining phase visit locked out
Combining Navigation
Art of Multiprocessor Programming 65 Combining Navigation node = myLeaf; int combined = 1; while (node != stop) { combined = node.combine(combined); stack.push(node); node = node.parent; }
Combining Navigation
Art of Multiprocessor Programming 66 Combining Navigation node = myLeaf; int combined = 1; while (node != stop) { combined = node.combine(combined); stack.push(node); node = node.parent; } Start at leaf
Combining Navigation
Art of Multiprocessor Programming 67 Combining Navigation node = myLeaf; int combined = 1; while (node != stop) { combined = node.combine(combined); stack.push(node); node = node.parent; } Add 1
Combining Navigation
Art of Multiprocessor Programming 68 Combining Navigation node = myLeaf; int combined = 1; while (node != stop) { combined = node.combine(combined); stack.push(node); node = node.parent; } Revisit nodes visited in precombining
Combining Navigation
Art of Multiprocessor Programming 69 Combining Navigation node = myLeaf; int combined = 1; while (node != stop) { combined = node.combine(combined); stack.push(node); node = node.parent; } Accumulate combined values, if any
Combining Navigation
Art of Multiprocessor Programming 70 node = myLeaf; int combined = 1; while (node != stop) { combined = node.combine(combined); stack.push(node); node = node.parent; } Combining Navigation We will retraverse path in reverse order …
Combining Navigation
Art of Multiprocessor Programming 71 Combining Navigation node = myLeaf; int combined = 1; while (node != stop) { combined = node.combine(combined); stack.push(node); node = node.parent; } Move up the tree
Combining Phase Node
Art of Multiprocessor Programming 72 Combining Phase Node synchronized int combine(int combined) { while (locked) wait(); locked = true ; firstValue = combined; switch (cStatus) { case FIRST: return firstValue; case SECOND: return firstValue + secondValue; default : … } }
Combining Phase Node
Art of Multiprocessor Programming 73 synchronized int combine(int combined) { while (locked) wait(); locked = true; firstValue = combined; switch (cStatus) { case FIRST: return firstValue; case SECOND: return firstValue + secondValue; default: … } } Combining Phase Node If node is locked by the 2 nd thread, wait until it deposits its value
Combining Phase Node
Art of Multiprocessor Programming 74 synchronized int combine(int combined) { while (locked) wait(); locked = true ; firstValue = combined; switch (cStatus) { case FIRST: return firstValue; case SECOND: return firstValue + secondValue; default: … } } Combining Phase Node How do we know that no thread acquires the lock between the two lines? Because the methods are synchronized
Combining Phase Node
Art of Multiprocessor Programming 75 synchronized int combine(int combined) { while (locked) wait(); locked = true ; firstValue = combined; switch (cStatus) { case FIRST: return firstValue; case SECOND: return firstValue + secondValue; default: … } } Combining Phase Node Lock out late attempts to combine (by threads still in precombining)
Combining Phase Node
Art of Multiprocessor Programming 76 synchronized int combine(int combined) { while (locked) wait(); locked = true; firstValue = combined; switch (cStatus) { case FIRST: return firstValue; case SECOND: return firstValue + secondValue; default: … } } Combining Phase Node Remember my (1 st thread) contribution
Combining Phase Node
Art of Multiprocessor Programming 77 synchronized int combine(int combined) { while (locked) wait(); locked = true; firstValue = combined; switch (cStatus) { case FIRST: return firstValue; case SECOND: return firstValue + secondValue; default: … } } Combining Phase Node Check status
Combining Phase Node
Art of Multiprocessor Programming 78 synchronized int combine(int combined) { while (locked) wait(); locked = true; firstValue = combined; switch (cStatus) { case FIRST: return firstValue; case SECOND: return firstValue + secondValue; default: … } } Combining Phase Node I (1 st thread) am alone
Combining Node
Art of Multiprocessor Programming 79 synchronized int combine(int combined) { while (locked) wait(); locked = true; firstValue = combined; switch (cStatus) { case FIRST: return firstValue; case SECOND: return firstValue + secondValue; default: … } } Combining Node Not alone: combine with 2 nd thread
Operation Phase
Art of Multiprocessor Programming 80 Operation Phase 5 +3 +2 +5 Add combined value to root, start back down zzz
Operation Phase (reloaded)
Art of Multiprocessor Programming 81 Operation Phase (reloaded) 5 Leave value to be combined … SECOND 2
Operation Phase (reloaded)
Art of Multiprocessor Programming 82 Operation Phase (reloaded) 5 +2 Unlock, and wait SECOND 2 zzz
Operation Phase Navigation
Art of Multiprocessor Programming 83 Operation Phase Navigation prior = stop.op(combined);
Operation Phase Navigation
Art of Multiprocessor Programming 84 Operation Phase Navigation prior = stop.op(combined); The node where we stopped. Provide collected sum and wait for combining result
Operation on Stopped Node
Art of Multiprocessor Programming 85 Operation on Stopped Node synchronized int op(int combined) { switch (cStatus) { case ROOT: int prior = result; result += combined; return prior; case SECOND: secondValue = combined; locked = false; notifyAll(); while (cStatus != CStatus.RESULT) wait(); locked = false; notifyAll(); cStatus = CStatus.IDLE; return result; default : …
Op States of Stop Node
Art of Multiprocessor Programming 86 synchronized int op(int combined) { switch (cStatus) { case ROOT: int prior = result; result += combined; return prior; case SECOND: secondValue = combined; locked = false; notifyAll(); while (cStatus != CStatus.RESULT) wait(); locked = false; notifyAll(); cStatus = CStatus.IDLE; return result; default: … Op States of Stop Node Only ROOT and SECOND possible. Why?
At Root
Art of Multiprocessor Programming 87 synchronized int op(int combined) { switch (cStatus) { case ROOT: int prior = result; result += combined; return prior; case SECOND: secondValue = combined; locked = false; notifyAll(); while (cStatus != CStatus.RESULT) wait(); locked = false; notifyAll(); cStatus = CStatus.IDLE; return result; default: … At Root Add sum to root, return prior value
Intermediate Node
Art of Multiprocessor Programming 88 synchronized int op(int combined) { switch (cStatus) { case ROOT: int prior = result; result += combined; return prior; case SECOND: secondValue = combined; locked = false; notifyAll(); while (cStatus != CStatus.RESULT) wait(); locked = false; notifyAll(); cStatus = CStatus.IDLE; return result; default: … Intermediate Node Deposit value for later combining …
Intermediate Node
Art of Multiprocessor Programming 89 synchronized int op(int combined) { switch (cStatus) { case ROOT: int prior = result; result += combined; return prior; case SECOND: secondValue = combined; locked = false; notifyAll(); while (cStatus != CStatus.RESULT) wait(); locked = false; notifyAll(); cStatus = CStatus.IDLE; return result; default: … Intermediate Node Unlock node (locked in precombining ), then notify 1 st thread
Intermediate Node
Art of Multiprocessor Programming 90 synchronized int op(int combined) { switch (cStatus) { case ROOT: int prior = result; result += combined; return prior; case SECOND: secondValue = combined; locked = false; notifyAll(); while (cStatus != CStatus.RESULT) wait(); locked = false; notifyAll(); cStatus = CStatus.IDLE; return result; default: … Intermediate Node Wait for 1 st thread to deliver results
Intermediate Node
Art of Multiprocessor Programming 91 synchronized int op(int combined) { switch (cStatus) { case ROOT: int prior = result; result += combined; return prior; case SECOND: secondValue = combined; locked = false; notifyAll(); while (cStatus != CStatus.RESULT) wait(); locked = false; notifyAll(); cStatus = CStatus.IDLE; return result; default: … Intermediate Node Unlock node (locked by 1 st thread in combining phase) & return
Distribution Phase
Art of Multiprocessor Programming 92 Distribution Phase 5 0