[PartitionAlloc] Exponential backoff in SpinningFutex.

The latency of "pause" is unusually high on the Skylake Client architecture. This likely does not affect us, though, as the spinning loop is short. Add a comment noting this (with a link to the Intel optimization manual), and follow the best practice highlighted in the manual, namely exponential backoff with "pause".

Bug: 1125999
Change-Id: Ia233aa5ae4b9cd58b24897537861b60c7d4ca05a
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2465745
Reviewed-by: Egor Pasko <pasko@chromium.org>
Commit-Queue: Benoit L <lizeb@chromium.org>
Cr-Commit-Position: refs/heads/master@{#816665}
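The technique the message describes (spin on a cheap `Try()`, execute an exponentially growing number of "pause" instructions between attempts, cap the burst size, and fall back to a slow path after a fixed spin budget) can be sketched as a standalone lock. This is a minimal illustration under stated assumptions, not the PartitionAlloc code: `BackoffSpinLock` and `CpuRelax()` are hypothetical names, `kSpinCount = 1000` is an assumed value (the diff does not show it), the test-and-test-and-set `TryAcquire()` is this sketch's own choice, and the slow path yields to the OS scheduler instead of sleeping on a futex as the real `LockSlow()` does.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <immintrin.h>
// On x86, _mm_pause() emits the "pause" instruction discussed above.
inline void CpuRelax() { _mm_pause(); }
#elif defined(__aarch64__)
// Rough ARM64 analogue (assumption for this sketch).
inline void CpuRelax() { __asm__ __volatile__("yield"); }
#else
inline void CpuRelax() {}
#endif

constexpr int kSpinCount = 1000;  // Assumed spin budget, counted in pauses.
constexpr int kMaxBackoff = 64;   // Same cap as in the diff.

class BackoffSpinLock {  // Hypothetical name.
 public:
  void Acquire() {
    int tries = 0;
    int backoff = 1;
    do {
      if (TryAcquire()) return;
      // Burst of `backoff` pauses; each one counts against the budget.
      for (int yields = 0; yields < backoff; yields++) {
        CpuRelax();
        tries++;
      }
      // Double the burst size, capped at kMaxBackoff.
      backoff = std::min(kMaxBackoff, backoff << 1);
    } while (tries < kSpinCount);
    // Slow path (substitute): the real code sleeps on a futex instead.
    while (!TryAcquire()) std::this_thread::yield();
  }

  bool TryAcquire() {
    // Read first so waiters do not keep writing to the shared cache line.
    return !locked_.load(std::memory_order_relaxed) &&
           !locked_.exchange(true, std::memory_order_acquire);
  }

  void Release() { locked_.store(false, std::memory_order_release); }

 private:
  std::atomic<bool> locked_{false};
};
```

Usage: several threads incrementing a shared counter under `Acquire()`/`Release()` should end with an exact total, and `TryAcquire()` must fail while the lock is held.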

fb2fea16 · Benoit Lize · Commit Bot · c2d62d36 · fb2fea16
Commit fb2fea16 authored Oct 13, 2020 by Benoit Lize; committed by Commit Bot Oct 13, 2020.

Showing with 18 additions and 2 deletions

base/allocator/partition_allocator/spinning_futex_linux.h +18 -2

--- a/base/allocator/partition_allocator/spinning_futex_linux.h
+++ b/base/allocator/partition_allocator/spinning_futex_linux.h
@@ -5,6 +5,7 @@
 #ifndef BASE_ALLOCATOR_PARTITION_ALLOCATOR_SPINNING_FUTEX_LINUX_H_
 #define BASE_ALLOCATOR_PARTITION_ALLOCATOR_SPINNING_FUTEX_LINUX_H_
+#include <algorithm>
 #include <atomic>
 #include "base/allocator/partition_allocator/yield_processor.h"
@@ -65,13 +66,28 @@ class BASE_EXPORT SpinningFutex {
 ALWAYS_INLINE void SpinningFutex::Acquire() {
  int tries = 0;
+  int backoff = 1;
  // Busy-waiting is inlined, which is fine as long as we have few callers. This
  // is only used for the partition lock, so this is the case.
  do {
    if (LIKELY(Try()))
      return;
-    YIELD_PROCESSOR;
-    tries++;
+    // Note: Per the intel optimization manual
+    // (https://software.intel.com/content/dam/develop/public/us/en/documents/64-ia-32-architectures-optimization-manual.pdf),
+    // the "pause" instruction is more costly on Skylake Client than on previous
+    // (and subsequent?) architectures. The latency is found to be 141 cycles
+    // there. This is not a big issue here as we don't spin long enough for this
+    // to become a problem, as we spend a maximum of ~141k cycles ~= 47us at
+    // 3GHz in "pause".
+    //
+    // Also, loop several times here, following the guidelines in section 2.3.4
+    // of the manual, "Pause latency in Skylake Client Microarchitecture".
+    for (int yields = 0; yields < backoff; yields++) {
+      YIELD_PROCESSOR;
+      tries++;
+    }
+    constexpr int kMaxBackoff = 64;
+    backoff = std::min(kMaxBackoff, backoff << 1);
  } while (tries < kSpinCount);
  LockSlow();
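The comment's budget arithmetic and the effect of the backoff schedule can be checked by replaying the loop with every `Try()` failing. This sketch assumes `kSpinCount = 1000`, a value the diff does not show but which matches the comment's "~141k cycles" at 141 cycles per "pause"; `SpinSchedule`, `ComputeSchedule()` and `PauseMicros()` are hypothetical helpers for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct SpinSchedule {
  int rounds;  // Outer iterations, i.e. Try() attempts before LockSlow().
  int pauses;  // Total YIELD_PROCESSOR executions ("tries" in the diff).
};

// Mirrors the structure of the new SpinningFutex::Acquire() loop, assuming
// Try() never succeeds, and counts what happens.
SpinSchedule ComputeSchedule(int spin_count, int max_backoff) {
  int tries = 0;
  int backoff = 1;
  int rounds = 0;
  do {
    rounds++;  // One Try() per outer iteration.
    for (int yields = 0; yields < backoff; yields++) tries++;
    backoff = std::min(max_backoff, backoff << 1);
  } while (tries < spin_count);
  return {rounds, tries};
}

// Time spent in "pause": pauses * cycles_per_pause cycles at `ghz` GHz,
// returned in microseconds.
double PauseMicros(int pauses, int cycles_per_pause, double ghz) {
  return pauses * cycles_per_pause / (ghz * 1000.0);
}
```

With a cap of 64, bursts grow 1, 2, 4, ..., 64 and then stay at 64, so the budget is exhausted after 21 `Try()` attempts and 1023 pauses (the last burst can overshoot `kSpinCount` by up to 63). A cap of 1 reproduces the old one-pause-per-attempt loop: 1000 attempts and 1000 pauses. At 141 cycles per pause and 3 GHz, 1000 pauses take 47 us and 1023 pauses about 48 us, consistent with the comment's estimate. The practical gain is far fewer atomic attempts on the contended lock word while spinning for roughly the same time.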