/*
 * Written by Doug Lea, Bill Scherer, and Michael Scott with
 * assistance from members of JCP JSR-166 Expert Group and released to
 * the public domain, as explained at
 * http://creativecommons.org/licenses/publicdomain
 */

package java.util.concurrent;
import java.util.concurrent.atomic.*;
import java.util.concurrent.locks.LockSupport;

/**
 * A synchronization point at which threads can pair and swap elements
 * within pairs. Each thread presents some object on entry to the
 * {@link #exchange exchange} method, matches with a partner thread,
 * and receives its partner's object on return. An Exchanger may be
 * viewed as a bidirectional form of a {@link SynchronousQueue}.
 * Exchangers may be useful in applications such as genetic algorithms
 * and pipeline designs.
 *
 * <p><b>Sample Usage:</b>
 * Here are the highlights of a class that uses an {@code Exchanger}
 * to swap buffers between threads so that the thread filling the
 * buffer gets a freshly emptied one when it needs it, handing off the
 * filled one to the thread emptying the buffer.
 * <pre>{@code
 * class FillAndEmpty {
 *   Exchanger<DataBuffer> exchanger = new Exchanger<DataBuffer>();
 *   DataBuffer initialEmptyBuffer = ... a made-up type
 *   DataBuffer initialFullBuffer = ...
 *
 *   class FillingLoop implements Runnable {
 *     public void run() {
 *       DataBuffer currentBuffer = initialEmptyBuffer;
 *       try {
 *         while (currentBuffer != null) {
 *           addToBuffer(currentBuffer);
 *           if (currentBuffer.isFull())
 *             currentBuffer = exchanger.exchange(currentBuffer);
 *         }
 *       } catch (InterruptedException ex) { ... handle ... }
 *     }
 *   }
 *
 *   class EmptyingLoop implements Runnable {
 *     public void run() {
 *       DataBuffer currentBuffer = initialFullBuffer;
 *       try {
 *         while (currentBuffer != null) {
 *           takeFromBuffer(currentBuffer);
 *           if (currentBuffer.isEmpty())
 *             currentBuffer = exchanger.exchange(currentBuffer);
 *         }
 *       } catch (InterruptedException ex) { ... handle ...}
 *     }
 *   }
 *
 *   void start() {
 *     new Thread(new FillingLoop()).start();
 *     new Thread(new EmptyingLoop()).start();
 *   }
 * }
 * }</pre>
 *
 * <p>Memory consistency effects: For each pair of threads that
 * successfully exchange objects via an {@code Exchanger}, actions
 * prior to the {@code exchange()} in each thread
 * <a href="package-summary.html#MemoryVisibility"><i>happen-before</i></a>
 * those subsequent to a return from the corresponding {@code exchange()}
 * in the other thread.
 *
 * @since 1.5
 * @author Doug Lea and Bill Scherer and Michael Scott
 * @param <V> The type of objects that may be exchanged
 */
public class Exchanger<V> {
    /*
     * Algorithm Description:
     *
     * The basic idea is to maintain a "slot", which is a reference to
     * a Node containing both an item to offer and a "hole" waiting to
     * get filled in. If an incoming "occupying" thread sees that the
     * slot is null, it CASes (compareAndSets) a Node there and waits
     * for another to invoke exchange. That second "fulfilling" thread
     * sees that the slot is non-null, and so CASes it back to null,
     * also exchanging items by CASing the hole, plus waking up the
     * occupying thread if it is blocked. In each case CASes may
     * fail because a slot at first appears non-null but is null upon
     * CAS, or vice versa. So threads may need to retry these
     * actions.
     *
     * This simple approach works great when there are only a few
     * threads using an Exchanger, but performance rapidly
     * deteriorates due to CAS contention on the single slot when
     * there are lots of threads using an exchanger. So instead we use
     * an "arena": basically a kind of hash table with a dynamically
     * varying number of slots, any one of which can be used by
     * threads performing an exchange. Incoming threads pick slots
     * based on a hash of their Thread ids. If an incoming thread
     * fails to CAS in its chosen slot, it picks an alternative slot
     * instead, and similarly from there. If a thread successfully
     * CASes into a slot but no other thread arrives, it tries
     * another, heading toward the zero slot, which always exists even
     * if the table shrinks. The particular mechanics controlling this
     * are as follows:
     *
     * Waiting: Slot zero is special in that it is the only slot that
     * exists when there is no contention. A thread occupying slot
     * zero will block if no thread fulfills it after a short spin.
     * In other cases, occupying threads eventually give up and try
     * another slot. Waiting threads spin for a while (a period that
     * should be a little less than a typical context-switch time)
     * before either blocking (if slot zero) or giving up (if other
     * slots) and restarting. There is no reason for threads to block
     * unless it is unlikely that any other threads are present.
     * Occupants are mainly avoiding memory contention, so they sit
     * there quietly polling for a shorter period than it would take
     * to block and then unblock them. Non-slot-zero waits that elapse
     * because of lack of other threads waste around one extra
     * context-switch time per try, which is still on average much
     * faster than alternative approaches.
     *
     * Sizing: Usually, using only a few slots suffices to reduce
     * contention. Especially with small numbers of threads, using
     * too many slots can lead to just as poor performance as using
     * too few of them, and there's not much room for error. The
     * variable "max" holds the highest slot index actually in use
     * (so max + 1 slots are in use). It is increased when a thread
     * sees too many CAS failures. (This is analogous to resizing a
     * regular hash table based on a target load factor, except here,
     * growth steps are just one-by-one rather than proportional.)
     * Growth requires contention failures in each of three tried
     * slots. Requiring multiple failures for expansion copes with the
     * fact that some failed CASes are not due to contention but
     * instead to simple races between two threads or thread
     * pre-emptions occurring between reading and CASing. Also, very
     * transient peak contention can be much higher than the average
     * sustainable levels. The max limit is decreased on average 50%
     * of the times that a non-slot-zero wait elapses without being
     * fulfilled. Threads experiencing elapsed waits move closer to
     * zero, so they eventually find existing (or future) threads even
     * if the table has been shrunk due to inactivity. The chosen
     * mechanics and thresholds for growing and shrinking are
     * intrinsically entangled with indexing and hashing inside the
     * exchange code, and can't be nicely abstracted out.
     *
     * Hashing: Each thread picks its initial slot to use in accord
     * with a simple hashcode. The sequence is the same on each
     * encounter by any given thread, but effectively random across
     * threads. Using arenas encounters the classic cost vs quality
     * tradeoffs of all hash tables. Here, we use a one-step FNV-1a
     * hash code based on the current thread's Thread.getId(), along
     * with a cheap approximation to a mod operation to select an
     * index. The downside of optimizing index selection in this way
     * is that the code is hardwired to use a maximum table size of
     * 32. But this value more than suffices for known platforms and
     * applications.
     *
     * Probing: On sensed contention of a selected slot, we probe
     * sequentially through the table, analogously to linear probing
     * after collision in a hash table. (We move circularly, in
     * reverse order, to mesh best with table growth and shrinkage
     * rules.) To minimize the effects of false alarms and cache
     * thrashing, however, we try the first selected slot twice
     * before moving.
     *
     * Padding: Even with contention management, slots are heavily
     * contended, so they use cache padding to avoid poor memory
     * performance. Because of this, slots are lazily constructed
     * only when used, to avoid wasting this space unnecessarily.
     * While isolation of locations is not much of an issue at first
     * in an application, as time goes on and garbage collectors
     * perform compaction, slots are very likely to be moved adjacent
     * to each other, which can cause much thrashing of cache lines on
     * multiprocessors unless padding is employed.
     *
     * This is an improvement of the algorithm described in the paper
     * "A Scalable Elimination-based Exchange Channel" by William
     * Scherer, Doug Lea, and Michael Scott in Proceedings of the
     * SCOOL05 workshop. Available at: http://hdl.handle.net/1802/2104
     */
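
    /*
     * Illustrative sketch (not part of this class) of the single-slot
     * protocol described at the start of the comment above, reduced to
     * its two CASes. The names SingleSlotExchanger and HoleNode are
     * hypothetical. Waiting is a yielding spin rather than a park, and
     * there is no cancellation or arena. Items must be non-null
     * because a null hole means "not yet filled" (Exchanger itself
     * handles null through the NULL_ITEM sentinel).
     *
```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical single-slot exchanger showing only the occupy/fulfill CASes.
final class SingleSlotExchanger<V> {
    // A node's own AtomicReference value is its "hole" for the partner's item.
    static final class HoleNode<T> extends AtomicReference<T> {
        final T item;
        HoleNode(T item) { this.item = item; }
    }

    private final AtomicReference<HoleNode<V>> slot = new AtomicReference<>();

    V exchange(V item) {
        if (item == null)
            throw new NullPointerException("item"); // null would look like an unfilled hole
        HoleNode<V> me = new HoleNode<>(item);
        for (;;) {
            HoleNode<V> you = slot.get();
            if (you != null) {
                // Fulfill: take the waiting node out of the slot, then fill its hole.
                if (slot.compareAndSet(you, null)) {
                    you.set(item);
                    return you.item;
                }
            } else if (slot.compareAndSet(null, me)) {
                // Occupy: wait until a fulfilling thread fills our hole.
                V v;
                while ((v = me.get()) == null)
                    Thread.yield();
                return v;
            }
            // A CAS lost a race (the slot changed after we read it); retry.
        }
    }

    // Runs one exchange between a helper thread and the calling thread.
    static String[] demo() {
        SingleSlotExchanger<String> ex = new SingleSlotExchanger<>();
        String[] got = new String[2];
        Thread helper = new Thread(() -> got[0] = ex.exchange("from-helper"));
        helper.start();
        got[1] = ex.exchange("from-caller");
        try {
            helper.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return got;
    }

    public static void main(String[] args) {
        String[] got = demo();
        System.out.println(got[0] + " " + got[1]); // from-caller from-helper
    }
}
```
     *
     * No node is ever cancelled in this sketch, so a fulfiller that
     * wins the slot CAS can fill the hole with a plain set. Exchanger
     * needs a CAS there because the occupant may cancel first.
     */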

    /** The number of CPUs, for sizing and spin control */
    private static final int NCPU = Runtime.getRuntime().availableProcessors();

    /**
     * The capacity of the arena. Set to a value that provides more
     * than enough space to handle contention. On small machines
     * most slots won't be used, but it is still not wasted because
     * the extra space provides some machine-level address padding
     * to minimize interference with heavily CAS'ed Slot locations.
     * And on very large machines, performance eventually becomes
     * bounded by memory bandwidth, not numbers of threads/CPUs.
     * This constant cannot be changed without also modifying
     * indexing and hashing algorithms.
     */
    private static final int CAPACITY = 32;

    /**
     * The value of "max" that will hold all threads without
     * contention. When this value is less than CAPACITY, some
     * otherwise wasted expansion can be avoided.
     */
    private static final int FULL =
        Math.max(0, Math.min(CAPACITY, NCPU / 2) - 1);
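
    /*
     * Worked example (not part of this class). Slot indices run from 0
     * to max, and max never grows past FULL, so at most
     * FULL + 1 = max(1, min(CAPACITY, NCPU / 2)) slots are used. For
     * instance, an 8-CPU machine gets FULL = 3 (slots 0 through 3). The
     * helper class FullSizing is hypothetical. It re-evaluates the same
     * formula for a few CPU counts.
     *
```java
// Hypothetical helper evaluating the FULL formula for a given CPU count.
final class FullSizing {
    static final int CAPACITY = 32;

    static int full(int ncpu) {
        return Math.max(0, Math.min(CAPACITY, ncpu / 2) - 1);
    }

    public static void main(String[] args) {
        for (int ncpu : new int[] {1, 2, 8, 64, 128})
            System.out.println(ncpu + " CPUs: FULL = " + full(ncpu)
                               + ", slots 0.." + full(ncpu));
    }
}
```
     */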

    /**
     * The number of times to spin (doing nothing except polling a
     * memory location) before blocking or giving up while waiting to
     * be fulfilled. Should be zero on uniprocessors. On
     * multiprocessors, this value should be large enough so that two
     * threads exchanging items as fast as possible block only when
     * one of them is stalled (due to GC or preemption), but not much
     * longer, to avoid wasting CPU resources. Seen differently, this
     * value is a little over half the number of cycles of an average
     * context switch time on most systems. The value here is
     * approximately the average of those across a range of tested
     * systems.
     */
    private static final int SPINS = (NCPU == 1) ? 0 : 2000;

    /**
     * The number of times to spin before blocking in timed waits.
     * Timed waits spin more slowly because checking the time takes
     * time. The best value relies mainly on the relative rate of
     * System.nanoTime vs memory accesses. The value is empirically
     * derived to work well across a variety of systems.
     */
    private static final int TIMED_SPINS = SPINS / 20;

    /**
     * Sentinel item representing cancellation of a wait due to
     * interruption, timeout, or elapsed spin-waits. This value is
     * placed in holes on cancellation, and used as a return value
     * from waiting methods to indicate failure to set or get hole.
     */
    private static final Object CANCEL = new Object();

    /**
     * Value representing null arguments/returns from public
     * methods. This disambiguates from internal requirement that
     * holes start out as null to mean they are not yet set.
     */
    private static final Object NULL_ITEM = new Object();
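
    /*
     * Illustrative sketch (not part of this class) of how the NULL_ITEM
     * and CANCEL sentinels let callers exchange null while a null hole
     * still means "unset". SentinelCodec, encode, and decode are
     * hypothetical names. The public exchange methods decode inline and
     * throw InterruptedException or TimeoutException on CANCEL. This
     * sketch throws IllegalStateException instead, as a simplification.
     *
```java
// Hypothetical illustration of the NULL_ITEM / CANCEL sentinel scheme.
final class SentinelCodec {
    static final Object NULL_ITEM = new Object(); // stands in for a null argument
    static final Object CANCEL = new Object();    // marks a failed wait

    // Public null becomes NULL_ITEM, so a null hole can keep meaning "unset".
    static Object encode(Object x) {
        return (x == null) ? NULL_ITEM : x;
    }

    // Maps an internal result back to the caller's value.
    static Object decode(Object v) {
        if (v == NULL_ITEM)
            return null;
        if (v == CANCEL)
            throw new IllegalStateException("wait cancelled");
        return v;
    }

    public static void main(String[] args) {
        System.out.println(decode(encode(null))); // prints null
        System.out.println(decode(encode("a")));  // prints a
    }
}
```
     */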

    /**
     * Nodes hold partially exchanged data. This class
     * opportunistically subclasses AtomicReference to represent the
     * hole. So get() returns hole, and compareAndSet CASes value
     * into hole. This class cannot be parameterized as "V" because
     * of the use of non-V CANCEL sentinels.
     */
    private static final class Node extends AtomicReference<Object> {
        /** The element offered by the Thread creating this node. */
        public final Object item;

        /** The Thread waiting to be signalled; null until waiting. */
        public volatile Thread waiter;

        /**
         * Creates node with given item and empty hole.
         * @param item the item
         */
        public Node(Object item) {
            this.item = item;
        }
    }

    /**
     * A Slot is an AtomicReference with heuristic padding to lessen
     * cache effects of this heavily CAS'ed location. While the
     * padding adds noticeable space, all slots are created only on
     * demand, and there will be more than one of them only when it
     * would improve throughput more than enough to outweigh using
     * extra space.
     */
    private static final class Slot extends AtomicReference<Object> {
        // Improve likelihood of isolation on <= 64 byte cache lines
        long q0, q1, q2, q3, q4, q5, q6, q7, q8, q9, qa, qb, qc, qd, qe;
    }

    /**
     * Slot array. Elements are lazily initialized when needed.
     * Declared volatile to enable double-checked lazy construction.
     */
    private volatile Slot[] arena = new Slot[CAPACITY];

    /**
     * The maximum slot index being used. The value sometimes
     * increases when a thread experiences too many CAS contentions,
     * and sometimes decreases when a spin-wait elapses. Changes
     * are performed only via compareAndSet, to avoid stale values
     * when a thread happens to stall right before setting.
     */
    private final AtomicInteger max = new AtomicInteger();
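
    /*
     * Illustrative sketch (not part of this class) of the two updates
     * to "max" made in doExchange, in a hypothetical single-threaded
     * model. The class name ArenaSizing, its method names, and the
     * "full" constructor parameter standing in for FULL are
     * assumptions. The model tracks only the max counter, not the
     * slots.
     *
```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical single-threaded model of the "max" sizing rules.
final class ArenaSizing {
    final AtomicInteger max = new AtomicInteger(); // highest slot index in use
    private final int full;                        // upper bound for max (FULL)

    ArenaSizing(int full) { this.full = full; }

    // A wait in slot `index` (index > 0) elapsed unfulfilled: halve the
    // index, and shrink max by one if it exceeds the new index.
    int afterElapsedWait(int index) {
        int m = max.get();
        index >>>= 1;
        if (m > index)
            max.compareAndSet(m, m - 1); // a lost CAS simply skips this shrink
        return index;
    }

    // Called on the fails-th CAS failure. Growth needs fails > 3, i.e.
    // failures on three slots (the first slot is tried twice).
    // Returns the new slot index, or -1 if max did not grow.
    int tryGrow(int fails) {
        int m = max.get();
        if (fails > 3 && m < full && max.compareAndSet(m, m + 1))
            return m + 1;
        return -1;
    }

    public static void main(String[] args) {
        ArenaSizing s = new ArenaSizing(3);
        for (int fails = 3; fails <= 7; fails++)
            System.out.println("fails=" + fails + " grow->" + s.tryGrow(fails)
                               + " max=" + s.max.get());
        System.out.println("after elapsed wait in slot 3: index="
                           + s.afterElapsedWait(3) + " max=" + s.max.get());
    }
}
```
     *
     * In the model, max grows by one per qualifying failure until it
     * reaches the limit. Each elapsed wait halves the index and, when
     * max is above it, trims max by one.
     */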

    /**
     * Main exchange function, handling the different policy variants.
     * Uses Object, not "V", as argument and return value to simplify
     * handling of sentinel values. Callers from public methods decode
     * and cast accordingly.
     *
     * @param item the (non-null) item to exchange
     * @param timed true if the wait is timed
     * @param nanos if timed, the maximum wait time
     * @return the other thread's item, or CANCEL if interrupted or timed out
     */
    private Object doExchange(Object item, boolean timed, long nanos) {
        Node me = new Node(item);                 // Create in case occupying
        int index = hashIndex();                  // Index of current slot
        int fails = 0;                            // Number of CAS failures

        for (;;) {
            Object y;                             // Contents of current slot
            Slot slot = arena[index];
            if (slot == null)                     // Lazily initialize slots
                createSlot(index);                // Continue loop to reread
            else if ((y = slot.get()) != null &&  // Try to fulfill
                     slot.compareAndSet(y, null)) {
                Node you = (Node)y;               // Transfer item
                if (you.compareAndSet(null, item)) {
                    LockSupport.unpark(you.waiter);
                    return you.item;
                }                                 // Else cancelled; continue
            }
            else if (y == null &&                 // Try to occupy
                     slot.compareAndSet(null, me)) {
                if (index == 0)                   // Blocking wait for slot 0
                    return timed? awaitNanos(me, slot, nanos): await(me, slot);
                Object v = spinWait(me, slot);    // Spin wait for non-0
                if (v != CANCEL)
                    return v;
                me = new Node(item);              // Throw away cancelled node
                int m = max.get();
                if (m > (index >>>= 1))           // Decrease index
                    max.compareAndSet(m, m - 1);  // Maybe shrink table
            }
            else if (++fails > 1) {               // Allow 2 fails on 1st slot
                int m = max.get();
                if (fails > 3 && m < FULL && max.compareAndSet(m, m + 1))
                    index = m + 1;                // Grow on 3rd failed slot
                else if (--index < 0)
                    index = m;                    // Circularly traverse
            }
        }
    }
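
    /*
     * Illustrative sketch (not part of this class) of the probing order
     * produced by the "++fails > 1" branch above. It lists the slot
     * indices a thread tries after successive CAS failures, assuming
     * max stays fixed so that no growth occurs. ProbeOrder and
     * indicesAfterFailures are hypothetical names. Starting at slot 2
     * with max 3, the first slot is tried twice. The probe then moves
     * downward and wraps from 0 back to max.
     *
```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: slot indices tried after successive CAS failures
// when max stays fixed at m (growth disabled).
final class ProbeOrder {
    static List<Integer> indicesAfterFailures(int start, int m, int failures) {
        List<Integer> order = new ArrayList<>();
        int index = start;
        for (int fails = 1; fails <= failures; fails++) {
            if (fails > 1 && --index < 0) // first slot is retried once
                index = m;                // wrap around circularly
            order.add(index);
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(indicesAfterFailures(2, 3, 6)); // prints [2, 1, 0, 3, 2, 1]
    }
}
```
     */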

    /**
     * Returns a hash index for the current thread. Uses a one-step
     * FNV-1a hash code (http://www.isthe.com/chongo/tech/comp/fnv/)
     * based on the current thread's Thread.getId(). These hash codes
     * have more uniform distribution properties with respect to small
     * moduli (here 1-31) than do other simple hashing functions.
     *
     * <p>To return an index between 0 and max (inclusive), we use a
     * cheap approximation to a mod operation that also corrects for
     * bias due to non-power-of-2 remaindering (see {@link
     * java.util.Random#nextInt}). The hashcode is masked to its low
     * "nbits" bits, where nbits = ceil(log2(max + 1)) is looked up in
     * a table packed into three ints. If the masked value exceeds
     * max, it is retried after rotating the hash by nbits bits while
     * forcing a new top bit to 0, which guarantees eventual
     * termination (although with a non-random bias). This requires
     * an average of less than 2 tries for all table sizes, and has a
     * maximum 2% difference from perfectly uniform slot probabilities
     * when applied to all possible hash codes for sizes less than 32.
     *
     * @return a per-thread-random index, 0 <= index <= max
     */
    private final int hashIndex() {
        long id = Thread.currentThread().getId();
        int hash = (((int)(id ^ (id >>> 32))) ^ 0x811c9dc5) * 0x01000193;

        int m = max.get();
        int nbits = (((0xfffffc00  >> m) & 4) | // Compute ceil(log2(m+1))
                     ((0x000001f8 >>> m) & 2) | // The constants hold
                     ((0xffff00f2 >>> m) & 1)); // a lookup table
        int index;
        while ((index = hash & ((1 << nbits) - 1)) > m)       // May retry on
            hash = (hash >>> nbits) | (hash << (33 - nbits)); // non-power-2 m
        return index;
    }
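
    /*
     * Illustrative sketch (not part of this class) of the hashIndex
     * arithmetic as a pure function. The thread id and max are passed
     * in as parameters, an assumption made so the function can be
     * exercised directly. HashIndexSketch and its nbits helper are
     * hypothetical names. The main method cross-checks the packed
     * lookup table against 32 - Integer.numberOfLeadingZeros(m), which
     * computes ceil(log2(m + 1)) directly.
     *
```java
// Hypothetical pure-function copy of the hashIndex arithmetic.
final class HashIndexSketch {
    // ceil(log2(m + 1)) for 0 <= m <= 31, from a table packed into three ints.
    static int nbits(int m) {
        return ((0xfffffc00 >> m) & 4) |
               ((0x000001f8 >>> m) & 2) |
               ((0xffff00f2 >>> m) & 1);
    }

    static int hashIndex(long id, int m) {
        // One-step FNV-1a: xor with the offset basis, multiply by the prime.
        int hash = (((int) (id ^ (id >>> 32))) ^ 0x811c9dc5) * 0x01000193;
        int nb = nbits(m);
        int index;
        while ((index = hash & ((1 << nb) - 1)) > m)    // out of range:
            hash = (hash >>> nb) | (hash << (33 - nb)); // rotate and retry
        return index;
    }

    public static void main(String[] args) {
        for (int m = 0; m < 32; m++)
            if (nbits(m) != 32 - Integer.numberOfLeadingZeros(m))
                throw new AssertionError("table mismatch at m=" + m);
        for (long id = 1; id <= 8; id++)
            System.out.print(hashIndex(id, 5) + " ");
        System.out.println();
    }
}
```
     */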

    /**
     * Creates a new slot at given index. Called only when the slot
     * appears to be null. Relies on double-check using builtin
     * locks, since they rarely contend. This in turn relies on the
     * arena array being declared volatile.
     *
     * @param index the index to add slot at
     */
    private void createSlot(int index) {
        // Create slot outside of lock to narrow sync region
        Slot newSlot = new Slot();
        Slot[] a = arena;
        synchronized (a) {
            if (a[index] == null)
                a[index] = newSlot;
        }
    }
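
    /*
     * Illustrative sketch (not part of this class) of the same
     * double-checked pattern outside Exchanger. The lock is taken on
     * the array itself, which is read through a volatile field.
     * LazySlots and slotAt are hypothetical names, and a plain
     * AtomicReference stands in for the padded Slot class.
     *
```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical lazily populated slot table using double-checked creation.
final class LazySlots {
    private volatile AtomicReference<Object>[] arena;

    @SuppressWarnings("unchecked")
    LazySlots(int capacity) {
        arena = (AtomicReference<Object>[]) new AtomicReference<?>[capacity];
    }

    AtomicReference<Object> slotAt(int index) {
        AtomicReference<Object> slot = arena[index];
        if (slot == null) {                                          // first check, no lock
            AtomicReference<Object> fresh = new AtomicReference<>(); // allocate outside lock
            AtomicReference<Object>[] a = arena;
            synchronized (a) {
                if (a[index] == null)                                // second check, under lock
                    a[index] = fresh;
                slot = a[index];
            }
        }
        return slot;
    }

    // Races several threads to create slot 2; true if all got one instance.
    static boolean allThreadsSeeSameSlot(int threads) {
        LazySlots slots = new LazySlots(4);
        Object[] seen = new Object[threads];
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            final int k = i;
            ts[i] = new Thread(() -> seen[k] = slots.slotAt(2));
            ts[i].start();
        }
        try {
            for (Thread t : ts)
                t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        for (Object o : seen)
            if (o == null || o != seen[0])
                return false;
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allThreadsSeeSameSlot(8)); // prints true
    }
}
```
     */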

    /**
     * Tries to cancel a wait for the given node waiting in the given
     * slot and, if successful, helps clear the node from its slot to
     * avoid garbage retention.
     *
     * @param node the waiting node
     * @param slot the slot it is waiting in
     * @return true if successfully cancelled
     */
    private static boolean tryCancel(Node node, Slot slot) {
        if (!node.compareAndSet(null, CANCEL))
            return false;
        if (slot.get() == node) // pre-check to minimize contention
            slot.compareAndSet(node, null);
        return true;
    }
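
    /*
     * Illustrative sketch (not part of this class). A fulfiller and a
     * canceller race to CAS a fresh hole away from null, and exactly
     * one of them can win. That single winner is why tryCancel and
     * doExchange always agree on the outcome of a wait. HoleRace is a
     * hypothetical name, and the slot-clearing step of tryCancel is
     * omitted.
     *
```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo: fulfiller vs. canceller on one hole; the first CAS wins.
final class HoleRace {
    static final Object CANCEL = new Object();

    // Returns {number of successful CASes, final hole value}.
    static Object[] race() {
        AtomicReference<Object> hole = new AtomicReference<>(); // null = not yet set
        AtomicInteger wins = new AtomicInteger();
        Thread fulfiller = new Thread(() -> {
            if (hole.compareAndSet(null, "item")) wins.incrementAndGet();
        });
        Thread canceller = new Thread(() -> {
            if (hole.compareAndSet(null, CANCEL)) wins.incrementAndGet();
        });
        fulfiller.start();
        canceller.start();
        try {
            fulfiller.join();
            canceller.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new Object[] { wins.get(), hole.get() };
    }

    public static void main(String[] args) {
        Object[] r = race();
        System.out.println("wins=" + r[0] + ", cancelled=" + (r[1] == CANCEL));
    }
}
```
     */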

    // Three forms of waiting follow. Each is just different enough
    // that they are not merged into one method.

    /**
     * Spin-waits for hole for a non-0 slot. Fails if spin elapses
     * before hole filled. Does not check interrupt, relying on check
     * in public exchange method to abort if interrupted on entry.
     *
     * @param node the waiting node
     * @param slot the slot it is waiting in
     * @return on success, the hole; on failure, CANCEL
     */
    private static Object spinWait(Node node, Slot slot) {
        int spins = SPINS;
        for (;;) {
            Object v = node.get();
            if (v != null)
                return v;
            else if (spins > 0)
                --spins;
            else
                tryCancel(node, slot);
        }
    }

    /**
     * Waits for (by spinning and/or blocking) and gets the hole
     * filled in by another thread. Fails if interrupted before
     * hole filled.
     *
     * When a node/thread is about to block, it sets its waiter field
     * and then rechecks state at least one more time before actually
     * parking, thus covering race vs fulfiller noticing that waiter
     * is non-null so should be woken.
     *
     * Thread interruption status is checked only surrounding calls to
     * park. The caller is assumed to have checked interrupt status
     * on entry.
     *
     * @param node the waiting node
     * @param slot the slot it is waiting in
     * @return on success, the hole; on failure, CANCEL
     */
    private static Object await(Node node, Slot slot) {
        Thread w = Thread.currentThread();
        int spins = SPINS;
        for (;;) {
            Object v = node.get();
            if (v != null)
                return v;
            else if (spins > 0)                 // Spin-wait phase
                --spins;
            else if (node.waiter == null)       // Set up to block next
                node.waiter = w;
            else if (w.isInterrupted())         // Abort on interrupt
                tryCancel(node, slot);
            else                                // Block
                LockSupport.park(node);
        }
    }
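
    /*
     * Illustrative sketch (not part of this class) of the
     * spin / publish-waiter / re-check / park sequence of await, paired
     * with a fulfiller that fills the hole before reading the waiter
     * field. SpinThenPark, Hole, await, and fill are hypothetical
     * names, and interrupts and cancellation are omitted. Both sides
     * use volatile accesses. So either the waiter's re-check sees the
     * item, or the fulfiller sees the published waiter and unparks it.
     *
```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.LockSupport;

// Hypothetical spin-then-park waiter modeled on await(); no cancellation.
final class SpinThenPark {
    static final class Hole extends AtomicReference<Object> {
        volatile Thread waiter; // set just before the waiter may park
    }

    static Object await(Hole hole, int spins) {
        Thread w = Thread.currentThread();
        for (;;) {
            Object v = hole.get();
            if (v != null)
                return v;
            else if (spins > 0)
                --spins;                // spin phase: just poll
            else if (hole.waiter == null)
                hole.waiter = w;        // publish, then re-check once more
            else
                LockSupport.park(hole); // may wake spuriously; loop re-checks
        }
    }

    // Fulfiller side: fill the hole first, then wake any published waiter.
    static void fill(Hole hole, Object item) {
        if (hole.compareAndSet(null, item))
            LockSupport.unpark(hole.waiter); // unpark(null) has no effect
    }

    static Object demo() {
        Hole hole = new Hole();
        Thread filler = new Thread(() -> {
            try { Thread.sleep(50); } catch (InterruptedException ignored) { }
            fill(hole, "done");
        });
        filler.start();
        return await(hole, 100); // few spins, so this thread will usually park
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints done
    }
}
```
     */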

    /**
     * Waits for (at index 0) and gets the hole filled in by another
     * thread. Fails if timed out or interrupted before hole filled.
     * Same basic logic as untimed version, but a bit messier.
     *
     * @param node the waiting node
     * @param slot the slot it is waiting in
     * @param nanos the wait time
     * @return on success, the hole; on failure, CANCEL
     */
    private Object awaitNanos(Node node, Slot slot, long nanos) {
        int spins = TIMED_SPINS;
        long lastTime = 0;
        Thread w = null;
        for (;;) {
            Object v = node.get();
            if (v != null)
                return v;
            long now = System.nanoTime();
            if (w == null)
                w = Thread.currentThread();
            else
                nanos -= now - lastTime;
            lastTime = now;
            if (nanos > 0) {
                if (spins > 0)
                    --spins;
                else if (node.waiter == null)
                    node.waiter = w;
                else if (w.isInterrupted())
                    tryCancel(node, slot);
                else
                    LockSupport.parkNanos(node, nanos);
            }
            else if (tryCancel(node, slot) && !w.isInterrupted())
                return scanOnTimeout(node);
        }
    }
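
    /*
     * Illustrative sketch (not part of this class) of the remaining-time
     * bookkeeping in awaitNanos. Each pass charges its elapsed time
     * against the budget. Once the budget is spent, the wait cancels by
     * CASing a sentinel into the hole, and a filler that won that race
     * still gets its item delivered. TimedHoleWait and TIMEOUT are
     * hypothetical names. Spinning, waiter registration, interrupts,
     * and scanOnTimeout are all omitted, so in this sketch a filler
     * cannot wake the waiter early.
     *
```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.LockSupport;

// Hypothetical timed wait on a hole, modeled on awaitNanos().
final class TimedHoleWait {
    static final Object TIMEOUT = new Object();

    static Object awaitNanos(AtomicReference<Object> hole, long nanos) {
        long lastTime = System.nanoTime();
        for (;;) {
            Object v = hole.get();
            if (v != null)
                return v;
            long now = System.nanoTime();
            nanos -= now - lastTime; // charge elapsed time to the budget
            lastTime = now;
            if (nanos <= 0)          // cancel; if a filler won the race, take its item
                return hole.compareAndSet(null, TIMEOUT) ? TIMEOUT : hole.get();
            LockSupport.parkNanos(hole, nanos); // may return early; loop re-checks
        }
    }

    public static void main(String[] args) {
        Object r = awaitNanos(new AtomicReference<Object>(), 20_000_000L); // 20 ms
        System.out.println(r == TIMEOUT ? "timed out" : "got " + r); // prints timed out
    }
}
```
     */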

    /**
     * Sweeps through arena checking for any waiting threads. Called
     * only upon return from timeout while waiting in slot 0. When a
     * thread gives up on a timed wait, it is possible that a
     * previously-entered thread is still waiting in some other
     * slot. So we scan to check for any. This is almost always
     * overkill, but decreases the likelihood of timeouts when there
     * are other threads present to far less than that in lock-based
     * exchangers in which earlier-arriving threads may still be
     * waiting on entry locks.
     *
     * @param node the waiting node
     * @return another thread's item, or CANCEL
     */
    private Object scanOnTimeout(Node node) {
        Object y;
        for (int j = arena.length - 1; j >= 0; --j) {
            Slot slot = arena[j];
            if (slot != null) {
                while ((y = slot.get()) != null) {
                    if (slot.compareAndSet(y, null)) {
                        Node you = (Node)y;
                        if (you.compareAndSet(null, node.item)) {
                            LockSupport.unpark(you.waiter);
                            return you.item;
                        }
                    }
                }
            }
        }
        return CANCEL;
    }

    /**
     * Creates a new Exchanger.
     */
    public Exchanger() {
    }

    /**
     * Waits for another thread to arrive at this exchange point (unless
     * the current thread is {@linkplain Thread#interrupt interrupted}),
     * and then transfers the given object to it, receiving its object
     * in return.
     *
     * <p>If another thread is already waiting at the exchange point then
     * it is resumed for thread scheduling purposes and receives the object
     * passed in by the current thread. The current thread returns immediately,
     * receiving the object passed to the exchange by that other thread.
     *
     * <p>If no other thread is already waiting at the exchange then the
     * current thread is disabled for thread scheduling purposes and lies
     * dormant until one of two things happens:
     * <ul>
     * <li>Some other thread enters the exchange; or
     * <li>Some other thread {@linkplain Thread#interrupt interrupts} the current
     * thread.
     * </ul>
     * <p>If the current thread:
     * <ul>
     * <li>has its interrupted status set on entry to this method; or
     * <li>is {@linkplain Thread#interrupt interrupted} while waiting
     * for the exchange,
     * </ul>
     * then {@link InterruptedException} is thrown and the current thread's
     * interrupted status is cleared.
     *
     * @param x the object to exchange
     * @return the object provided by the other thread
     * @throws InterruptedException if the current thread was
     *         interrupted while waiting
     */
    public V exchange(V x) throws InterruptedException {
        if (!Thread.interrupted()) {
            Object v = doExchange(x == null? NULL_ITEM : x, false, 0);
            if (v == NULL_ITEM)
                return null;
            if (v != CANCEL)
                return (V)v;
            Thread.interrupted(); // Clear interrupt status on IE throw
        }
        throw new InterruptedException();
    }
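
    /*
     * Usage sketch (not part of this class): a runnable variant of the
     * FillAndEmpty sample from the class documentation. It uses the
     * public Exchanger API, with List<Integer> buffers in place of the
     * made-up DataBuffer type. BufferSwapDemo and run are hypothetical
     * names. Each list is owned by one thread at a time, and the
     * happens-before guarantee of exchange makes the handed-over
     * contents visible to the receiving thread.
     *
```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Exchanger;

// Hypothetical demo: a filler and an emptier swap List buffers.
final class BufferSwapDemo {
    static List<Integer> run(int rounds, int bufferSize) throws InterruptedException {
        Exchanger<List<Integer>> exchanger = new Exchanger<>();
        List<Integer> drained = new ArrayList<>();

        Thread filler = new Thread(() -> {
            List<Integer> buf = new ArrayList<>();
            int next = 0;
            try {
                for (int r = 0; r < rounds; r++) {
                    while (buf.size() < bufferSize)
                        buf.add(next++);
                    buf = exchanger.exchange(buf); // hand over full, get empty
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        filler.start();

        List<Integer> buf = new ArrayList<>();
        for (int r = 0; r < rounds; r++) {
            buf = exchanger.exchange(buf);         // hand over empty, get full
            drained.addAll(buf);
            buf.clear();
        }
        filler.join();
        return drained;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(3, 4)); // prints [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
    }
}
```
     */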

    /**
     * Waits for another thread to arrive at this exchange point (unless
     * the current thread is {@linkplain Thread#interrupt interrupted} or
     * the specified waiting time elapses), and then transfers the given
     * object to it, receiving its object in return.
     *
     * <p>If another thread is already waiting at the exchange point then
     * it is resumed for thread scheduling purposes and receives the object
     * passed in by the current thread. The current thread returns immediately,
     * receiving the object passed to the exchange by that other thread.
     *
     * <p>If no other thread is already waiting at the exchange then the
     * current thread is disabled for thread scheduling purposes and lies
     * dormant until one of three things happens:
     * <ul>
     * <li>Some other thread enters the exchange; or
     * <li>Some other thread {@linkplain Thread#interrupt interrupts}
     * the current thread; or
     * <li>The specified waiting time elapses.
     * </ul>
     * <p>If the current thread:
     * <ul>
     * <li>has its interrupted status set on entry to this method; or
     * <li>is {@linkplain Thread#interrupt interrupted} while waiting
     * for the exchange,
     * </ul>
     * then {@link InterruptedException} is thrown and the current thread's
     * interrupted status is cleared.
     *
     * <p>If the specified waiting time elapses then {@link
     * TimeoutException} is thrown. If the time is less than or equal
     * to zero, the method will not wait at all.
     *
     * @param x the object to exchange
     * @param timeout the maximum time to wait
     * @param unit the time unit of the <tt>timeout</tt> argument
     * @return the object provided by the other thread
     * @throws InterruptedException if the current thread was
     *         interrupted while waiting
     * @throws TimeoutException if the specified waiting time elapses
     *         before another thread enters the exchange
     */
    public V exchange(V x, long timeout, TimeUnit unit)
        throws InterruptedException, TimeoutException {
        if (!Thread.interrupted()) {
            Object v = doExchange(x == null? NULL_ITEM : x,
                                  true, unit.toNanos(timeout));
            if (v == NULL_ITEM)
                return null;
            if (v != CANCEL)
                return (V)v;
            if (!Thread.interrupted())
                throw new TimeoutException();
        }
        throw new InterruptedException();
    }
}
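
/*
 * Usage sketch (not part of Exchanger): with no partner thread, the
 * timed exchange fails with TimeoutException, and a timeout of zero
 * fails without waiting. TimedExchangeDemo and timesOutAlone are
 * hypothetical names.
 *
```java
import java.util.concurrent.Exchanger;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical demo: a timed exchange with no partner thread times out.
final class TimedExchangeDemo {
    // Returns true if the exchange timed out, false if a partner answered.
    static boolean timesOutAlone(long millis) throws InterruptedException {
        Exchanger<String> exchanger = new Exchanger<>(); // no other thread can reach it
        try {
            exchanger.exchange("lonely", millis, TimeUnit.MILLISECONDS);
            return false;
        } catch (TimeoutException e) {
            return true;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(timesOutAlone(10)); // prints true
        System.out.println(timesOutAlone(0));  // prints true, without waiting
    }
}
```
 */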