Commit 8f9fa594 authored by Benoit Lize, committed by Commit Bot

Reland "base/allocator: Add a thread cache to PartitionAlloc."

This reverts commit 91b2c272.

Reason for reland: Don't use `thread_local`

Original change's description:
> base/allocator: Add a thread cache to PartitionAlloc.
>
> This CL adds a thread cache to PartitionAlloc. It is optional, only
> applies to thread-safe partitions, and uses the same freelist encoding
> and bucketing as the main allocator.
>
> The thread cache is added "in the middle" of the main allocator, that is:
> - After all the cookie/tag management
> - Before the "raw" allocator.
>
> That is, the general allocation flow is (see the toy sketch after this
> message):
> 1. Adjustment of requested size to make room for tags / cookies
> 2. Allocation:
>   a. Call into the thread cache; if it succeeds, return.
>   b. Otherwise, call the "raw" allocator <-- Locking
>
> On the deallocation side, the process is reversed:
> 1. Check cookies / tags, adjust the pointer
> 2. Deallocation
>   a. Try to return it to the thread cache. If that succeeds, return.
>   b. Otherwise, call the "raw" allocator <-- Locking
>
> The thread cache maintains an array of buckets, the same as the parent
> allocator. A single thread cache instance is only used by a single
> partition. Each bucket is a linked list of allocations, capped to a set
> maximum size. Elements in this "freelist" are encoded the same way they
> are for the main allocator.
> Only the smallest buckets are eligible for caching, to reduce the
> memory impact.
>
> There are several limitations:
> - Only a single partition is allowed to have a thread cache
> - No periodic purging of thread caches is done
> - No statistics are collected
>
> The last two limitations will be addressed in subsequent CLs. Regarding
> the first one, it is not possible to use Chrome's native thread local
> storage support, as it allocates. It is also desirable to use
> thread_local to improve performance.
>
> Bug: 998048
> Change-Id: Ia771f507d9dd1c2c26a4668c76da220fb0c65dd4
> Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2375206
> Commit-Queue: Benoit L <lizeb@chromium.org>
> Reviewed-by: Kentaro Hara <haraken@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#805697}
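
Concretely, the flow above can be sketched as a self-contained toy. All names
below are invented for illustration (this is not the CL's code), and the toy
uses thread_local for brevity, which this reland deliberately avoids:

#include <array>
#include <cstddef>
#include <cstdlib>
#include <mutex>

constexpr size_t kToyBuckets = 8;
constexpr size_t kToyMaxPerBucket = 100;

struct ToyThreadCache {
  struct Entry { Entry* next; };
  struct Bucket { Entry* head = nullptr; size_t count = 0; };

  // Allocation step 2a: lock-free, per-thread fast path.
  void* Get(size_t index) {
    Bucket& b = buckets[index];
    if (!b.head)
      return nullptr;
    Entry* e = b.head;
    b.head = e->next;
    b.count--;
    return e;
  }

  // Deallocation step 2a: cache the block unless the bucket is full.
  bool MaybePut(void* ptr, size_t index) {
    Bucket& b = buckets[index];
    if (b.count >= kToyMaxPerBucket)
      return false;
    Entry* e = static_cast<Entry*>(ptr);
    e->next = b.head;
    b.head = e;
    b.count++;
    return true;
  }

  std::array<Bucket, kToyBuckets> buckets;
};

thread_local ToyThreadCache g_toy_cache;
std::mutex g_raw_lock;  // Stands in for the partition lock.

// Assumes the caller maps sizes to bucket indices consistently, and that
// blocks are at least sizeof(Entry) bytes.
void* ToyAlloc(size_t bucket_index, size_t size) {
  // 1. Size adjustment for cookies/tags would happen here.
  if (void* ptr = g_toy_cache.Get(bucket_index))
    return ptr;  // 2a. Thread cache hit: no lock taken.
  std::lock_guard<std::mutex> guard(g_raw_lock);  // 2b. "Raw" allocator.
  void* ptr = std::malloc(size);
  // 3. Cookies/tags would be written, and memory zeroed, here.
  return ptr;
}

void ToyFree(void* ptr, size_t bucket_index) {
  // 1. Cookie/tag checks and pointer adjustment would happen here.
  if (g_toy_cache.MaybePut(ptr, bucket_index))
    return;  // 2a. Returned to the thread cache.
  std::lock_guard<std::mutex> guard(g_raw_lock);  // 2b. "Raw" free, locked.
  std::free(ptr);
}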

Bug: 998048
Change-Id: If7fa5c2e1e10bc7dd1d41cdb188840668aad888f
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2410126
Reviewed-by: Kentaro Hara <haraken@chromium.org>
Commit-Queue: Benoit L <lizeb@chromium.org>
Cr-Commit-Position: refs/heads/master@{#807852}
parent e2891284
base/BUILD.gn:

@@ -1789,8 +1789,11 @@ component("base") {
     "allocator/partition_allocator/partition_ref_count.h",
     "allocator/partition_allocator/partition_tag.h",
     "allocator/partition_allocator/partition_tag_bitmap.h",
+    "allocator/partition_allocator/partition_tls.h",
     "allocator/partition_allocator/random.cc",
     "allocator/partition_allocator/random.h",
+    "allocator/partition_allocator/thread_cache.cc",
+    "allocator/partition_allocator/thread_cache.h",
   ]
   if (is_win) {
     sources +=
@@ -3214,6 +3217,7 @@ test("base_unittests") {
     "allocator/partition_allocator/memory_reclaimer_unittest.cc",
     "allocator/partition_allocator/page_allocator_unittest.cc",
     "allocator/partition_allocator/partition_alloc_unittest.cc",
+    "allocator/partition_allocator/thread_cache_unittest.cc",
   ]
 }

base/allocator/allocator_shim_default_dispatch_to_partition_alloc.cc:

@@ -5,6 +5,7 @@
 #include "base/allocator/allocator_shim.h"
 #include "base/allocator/allocator_shim_internals.h"
 #include "base/allocator/partition_allocator/partition_alloc.h"
+#include "base/allocator/partition_allocator/partition_alloc_constants.h"
 #include "base/bits.h"
 #include "base/no_destructor.h"
 #include "build/build_config.h"
@@ -74,8 +75,8 @@ base::ThreadSafePartitionRoot& Allocator() {
     return *root;
   }
-  auto* new_root = new (g_allocator_buffer)
-      base::ThreadSafePartitionRoot(false /* enforce_alignment */);
+  auto* new_root = new (g_allocator_buffer) base::ThreadSafePartitionRoot(
+      false /* enforce_alignment */, true /* enable_thread_cache */);
   g_root_.store(new_root, std::memory_order_release);
   // Semantically equivalent to base::Lock::Release().
@@ -100,8 +101,9 @@ void* PartitionMemalign(const AllocatorDispatch*,
                         size_t alignment,
                         size_t size,
                         void* context) {
+  // Since the general-purpose allocator uses the thread cache, this one cannot.
   static base::NoDestructor<base::ThreadSafePartitionRoot> aligned_allocator{
-      true /* enforce_alignment */};
+      true /* enforce_alignment */, false /* enable_thread_cache */};
   return aligned_allocator->AlignedAllocFlags(base::PartitionAllocNoHooks,
                                               alignment, size);
 }

base/allocator/partition_allocator/partition_alloc.cc:

@@ -202,7 +202,8 @@ void PartitionAllocGlobalUninitForTesting() {
 }
 template <bool thread_safe>
-void PartitionRoot<thread_safe>::Init(bool enforce_alignment) {
+void PartitionRoot<thread_safe>::Init(bool enforce_alignment,
+                                      bool enable_thread_cache) {
   ScopedGuard guard{lock_};
   if (initialized)
     return;
@@ -216,6 +217,15 @@ void PartitionRoot<thread_safe>::Init(bool enforce_alignment) {
   // If alignment needs to be enforced, disallow adding cookies and/or tags at
   // the beginning of the slot.
   allow_extras = !enforce_alignment;
+#if !defined(OS_LINUX)
+  // Linux only, for now.
+  with_thread_cache = false;
+#else
+  with_thread_cache = enable_thread_cache;
+  if (with_thread_cache)
+    internal::ThreadCache::Init(this);
+#endif
   // We mark the sentinel bucket/page as free to make sure it is skipped by our
   // logic to find a new active page.
@@ -281,6 +291,9 @@ void PartitionRoot<thread_safe>::Init(bool enforce_alignment) {
   initialized = true;
 }
+template <bool thread_safe>
+PartitionRoot<thread_safe>::~PartitionRoot() = default;
 template <bool thread_safe>
 bool PartitionRoot<thread_safe>::ReallocDirectMappedInPlace(
     internal::PartitionPage<thread_safe>* page,
@@ -620,6 +633,10 @@ void PartitionRoot<thread_safe>::PurgeMemory(int flags) {
       PartitionPurgeBucket(bucket);
     }
   }
+  // Purges only this thread's cache.
+  if (with_thread_cache && internal::ThreadCache::Get())
+    internal::ThreadCache::Get()->Purge();
 }
 template <bool thread_safe>
@@ -807,7 +824,8 @@ void PartitionAllocator<thread_safe>::init(
     PartitionAllocatorAlignment alignment) {
   partition_root_.Init(
       alignment ==
-          PartitionAllocatorAlignment::kAlignedAlloc /* enforce_alignment */);
+          PartitionAllocatorAlignment::kAlignedAlloc /* enforce_alignment */,
+      false);
   PartitionAllocMemoryReclaimer::Instance()->RegisterPartition(
       &partition_root_);
 }

base/allocator/partition_allocator/partition_alloc_perftest.cc:

@@ -2,7 +2,10 @@
 // Use of this source code is governed by a BSD-style license that can be
 // found in the LICENSE file.
+#include <algorithm>
 #include <atomic>
+#include <limits>
+#include <memory>
 #include <vector>
 #include "base/allocator/partition_allocator/partition_alloc.h"
@@ -30,7 +33,7 @@ namespace {
 // Change kTimeLimit to something higher if you need more time to capture a
 // trace.
 constexpr base::TimeDelta kTimeLimit = base::TimeDelta::FromSeconds(2);
-constexpr int kWarmupRuns = 5;
+constexpr int kWarmupRuns = 10000;
 constexpr int kTimeCheckInterval = 100000;
 // Size constants are mostly arbitrary, but try to simulate something like CSS
@@ -78,12 +81,10 @@ class PartitionAllocator : public Allocator {
   void* Alloc(size_t size) override {
     return alloc_.AllocFlagsNoHooks(0, size);
   }
-  void Free(void* data) override {
-    base::ThreadSafePartitionRoot::FreeNoHooks(data);
-  }
+  void Free(void* data) override { ThreadSafePartitionRoot::FreeNoHooks(data); }
  private:
-  base::ThreadSafePartitionRoot alloc_{false};
+  ThreadSafePartitionRoot alloc_{false, false};
 };
 class TestLoopThread : public PlatformThread::Delegate {

base/allocator/partition_allocator/partition_tls.h (new file):

// Copyright 2020 The Chromium Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file.
#ifndef BASE_ALLOCATOR_PARTITION_ALLOCATOR_PARTITION_TLS_H_
#define BASE_ALLOCATOR_PARTITION_ALLOCATOR_PARTITION_TLS_H_
#include "base/allocator/partition_allocator/partition_alloc_check.h"
#include "build/build_config.h"
#if defined(OS_POSIX)
#include <pthread.h>
#endif
// Barebones TLS implementation for use in PartitionAlloc. This doesn't use the
// general chromium TLS handling to avoid dependencies, but more importantly
// because it allocates memory.
namespace base {
namespace internal {
#if defined(OS_POSIX)
typedef pthread_key_t PartitionTlsKey;
inline bool PartitionTlsCreate(PartitionTlsKey* key,
void (*destructor)(void*)) {
return !pthread_key_create(key, destructor);
}
inline void* PartitionTlsGet(PartitionTlsKey key) {
return pthread_getspecific(key);
}
inline void PartitionTlsSet(PartitionTlsKey key, void* value) {
int ret = pthread_setspecific(key, value);
PA_DCHECK(!ret);
}
#else
// Not implemented.
typedef int PartitionTlsKey;
inline bool PartitionTlsCreate(PartitionTlsKey* key,
void (*destructor)(void*)) {
// Cannot use NOTIMPLEMENTED() as it may allocate.
IMMEDIATE_CRASH();
}
inline void* PartitionTlsGet(PartitionTlsKey key) {
IMMEDIATE_CRASH();
}
inline void PartitionTlsSet(PartitionTlsKey key, void* value) {
IMMEDIATE_CRASH();
}
#endif // defined(OS_POSIX)
} // namespace internal
} // namespace base
#endif // BASE_ALLOCATOR_PARTITION_ALLOCATOR_PARTITION_TLS_H_
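
A hypothetical usage sketch of this API (not from the CL; |g_example_key|,
|Destroy|, and |GetPerThreadValue| are invented names). The destructor runs at
thread exit for threads that set a non-null value:

#include "base/allocator/partition_allocator/partition_tls.h"

namespace {

base::internal::PartitionTlsKey g_example_key;

void Destroy(void* ptr) {
  // Runs at thread exit with this thread's stored value. Note that the real
  // thread cache frees with RawFreeStatic() instead of delete, since it may
  // itself be the malloc() implementation.
  delete static_cast<int*>(ptr);
}

int* GetPerThreadValue() {
  static const bool created =
      base::internal::PartitionTlsCreate(&g_example_key, Destroy);
  if (!created)
    return nullptr;
  void* value = base::internal::PartitionTlsGet(g_example_key);
  if (!value) {
    value = new int(0);
    base::internal::PartitionTlsSet(g_example_key, value);
  }
  return static_cast<int*>(value);
}

}  // namespace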

base/allocator/partition_allocator/thread_cache.cc (new file):

// Copyright 2020 The Chromium Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file.
#include "base/allocator/partition_allocator/thread_cache.h"
#include <sys/types.h>
#include <atomic>
#include <vector>
#include "base/allocator/partition_allocator/partition_alloc.h"
namespace base {
namespace internal {
BASE_EXPORT PartitionTlsKey g_thread_cache_key;
namespace {
void DeleteThreadCache(void* tcache_ptr) {
reinterpret_cast<ThreadCache*>(tcache_ptr)->~ThreadCache();
PartitionRoot<ThreadSafe>::RawFreeStatic(tcache_ptr);
}
// Since |g_thread_cache_key| is shared, make sure that no more than one
// PartitionRoot can use it.
static std::atomic<bool> g_has_instance;
} // namespace
// static
void ThreadCache::Init(PartitionRoot<ThreadSafe>* root) {
bool ok = PartitionTlsCreate(&g_thread_cache_key, DeleteThreadCache);
PA_CHECK(ok);
// Make sure that only one PartitionRoot wants a thread cache.
bool expected = false;
if (!g_has_instance.compare_exchange_strong(expected, true,
std::memory_order_seq_cst,
std::memory_order_seq_cst)) {
PA_CHECK(false)
<< "Only one PartitionRoot is allowed to have a thread cache";
}
}
// static
ThreadCache* ThreadCache::Create(PartitionRoot<internal::ThreadSafe>* root) {
PA_CHECK(root);
// Placement new and RawAlloc() are used, as otherwise when this partition is
// the malloc() implementation, the memory allocated for the new thread cache
// would make this code reentrant.
//
// This also means that deallocation must use RawFreeStatic(), hence the
// operator delete() implementation below.
size_t allocated_size;
bool already_zeroed;
auto* bucket = root->SizeToBucket(sizeof(ThreadCache));
void* buffer =
root->RawAlloc(bucket, PartitionAllocZeroFill, sizeof(ThreadCache),
&allocated_size, &already_zeroed);
ThreadCache* tcache = new (buffer) ThreadCache();
// This may allocate.
PartitionTlsSet(g_thread_cache_key, tcache);
return tcache;
}
ThreadCache::~ThreadCache() {
Purge();
}
void ThreadCache::Purge() {
for (Bucket& bucket : buckets_) {
size_t count = bucket.count;
while (bucket.freelist_head) {
auto* entry = bucket.freelist_head;
bucket.freelist_head = EncodedPartitionFreelistEntry::Decode(entry->next);
PartitionRoot<ThreadSafe>::RawFreeStatic(entry);
count--;
}
CHECK_EQ(0u, count);
bucket.count = 0;
}
}
} // namespace internal
} // namespace base

base/allocator/partition_allocator/thread_cache.h (new file):

// Copyright 2020 The Chromium Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file.
#ifndef BASE_ALLOCATOR_PARTITION_ALLOCATOR_THREAD_CACHE_H_
#define BASE_ALLOCATOR_PARTITION_ALLOCATOR_THREAD_CACHE_H_
#include <cstdint>
#include <memory>
#include "base/allocator/partition_allocator/partition_alloc_forward.h"
#include "base/allocator/partition_allocator/partition_cookie.h"
#include "base/allocator/partition_allocator/partition_freelist_entry.h"
#include "base/allocator/partition_allocator/partition_tls.h"
#include "base/base_export.h"
#include "base/gtest_prod_util.h"
#include "base/partition_alloc_buildflags.h"
namespace base {
namespace internal {
class ThreadCache;
extern BASE_EXPORT PartitionTlsKey g_thread_cache_key;
// Per-thread cache. *Not* threadsafe, must only be accessed from a single
// thread.
//
// In practice, this is easily enforced: the only way to get at a cache is
// through |ThreadCache::Get()|, which returns the pointer stored in the
// current thread's TLS slot (|g_thread_cache_key|), so any call through it is
// necessarily done from a single thread.
class BASE_EXPORT ThreadCache {
public:
// Initializes the thread cache for |root|. May allocate, so should be called
// with the thread cache disabled on the partition side, and without the
// partition lock held.
//
// May only be called by a single PartitionRoot.
static void Init(PartitionRoot<ThreadSafe>* root);
static void Init(PartitionRoot<NotThreadSafe>* root) { IMMEDIATE_CRASH(); }
static ThreadCache* Get() {
return reinterpret_cast<ThreadCache*>(PartitionTlsGet(g_thread_cache_key));
}
// Create a new ThreadCache associated with |root|.
// Must be called without the partition locked, as this may allocate.
static ThreadCache* Create(PartitionRoot<ThreadSafe>* root);
static ThreadCache* Create(PartitionRoot<NotThreadSafe>* root) {
IMMEDIATE_CRASH();
}
~ThreadCache();
// Force placement new.
void* operator new(size_t) = delete;
void* operator new(size_t, void* buffer) { return buffer; }
void operator delete(void* ptr) = delete;
ThreadCache(const ThreadCache&) = delete;
ThreadCache(const ThreadCache&&) = delete;
ThreadCache& operator=(const ThreadCache&) = delete;
// Tries to put a memory block at |address| into the cache.
// The block comes from the bucket at index |bucket_index| from the partition
// this cache is for.
//
// Returns true if the memory was put in the cache, and false otherwise. This
// can happen either because the cache is full or the allocation was too
// large.
ALWAYS_INLINE bool MaybePutInCache(void* address, size_t bucket_index);
// Tries to allocate memory from the cache.
// Returns nullptr for failure.
//
// Has the same behavior as RawAlloc(), that is: no cookie nor tag handling.
ALWAYS_INLINE void* GetFromCache(size_t bucket_index);
// Empties the cache.
void Purge();
size_t bucket_count_for_testing(size_t index) const {
return buckets_[index].count;
}
private:
ThreadCache() = default;
struct Bucket {
size_t count;
PartitionFreelistEntry* freelist_head;
};
// TODO(lizeb): Optimize the threshold, and define it as an allocation size
// rather than a bucket index.
static constexpr size_t kBucketCount = 40;
static_assert(
kBucketCount < kNumBuckets,
"Cannot have more cached buckets than what the allocator supports");
// TODO(lizeb): Tune this constant, and adapt it to the bucket size /
// allocation patterns.
static constexpr size_t kMaxCountPerBucket = 100;
Bucket buckets_[kBucketCount];
FRIEND_TEST_ALL_PREFIXES(ThreadCacheTest, LargeAllocationsAreNotCached);
FRIEND_TEST_ALL_PREFIXES(ThreadCacheTest, MultipleThreadCaches);
};
ALWAYS_INLINE bool ThreadCache::MaybePutInCache(void* address,
size_t bucket_index) {
if (bucket_index >= kBucketCount)
return false;
auto& bucket = buckets_[bucket_index];
if (bucket.count >= kMaxCountPerBucket)
return false;
PA_DCHECK(bucket.count != 0 || bucket.freelist_head == nullptr);
auto* entry = reinterpret_cast<PartitionFreelistEntry*>(address);
entry->next = PartitionFreelistEntry::Encode(bucket.freelist_head);
bucket.freelist_head = entry;
bucket.count++;
return true;
}
ALWAYS_INLINE void* ThreadCache::GetFromCache(size_t bucket_index) {
// Only handle "small" allocations.
if (bucket_index >= kBucketCount)
return nullptr;
auto& bucket = buckets_[bucket_index];
auto* result = bucket.freelist_head;
if (!result) {
PA_DCHECK(bucket.count == 0);
return nullptr;
}
PA_DCHECK(bucket.count != 0);
auto* next = EncodedPartitionFreelistEntry::Decode(result->next);
PA_DCHECK(result != next);
bucket.count--;
PA_DCHECK(bucket.count != 0 || !next);
bucket.freelist_head = next;
return result;
}
} // namespace internal
} // namespace base
#endif // BASE_ALLOCATOR_PARTITION_ALLOCATOR_THREAD_CACHE_H_
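
The call sites in partition_alloc.h are not part of this excerpt. Below is a
hedged sketch of how the fast paths could plug into the API above; the
function names are invented, and direct access to |with_thread_cache| is an
assumption, not the CL's actual code:

using base::ThreadSafePartitionRoot;
using base::internal::ThreadCache;

// Allocation step 2a: consult the cache before taking the lock.
void* AllocFastPath(ThreadSafePartitionRoot* root, size_t bucket_index) {
  if (!root->with_thread_cache)
    return nullptr;  // Caller proceeds to the locked RawAlloc() path.
  ThreadCache* tcache = ThreadCache::Get();
  if (!tcache)
    tcache = ThreadCache::Create(root);  // First allocation on this thread.
  return tcache->GetFromCache(bucket_index);  // nullptr on a miss.
}

// Deallocation step 2a: returns true if the cache absorbed the slot.
bool FreeFastPath(ThreadSafePartitionRoot* root,
                  void* slot,
                  size_t bucket_index) {
  ThreadCache* tcache = ThreadCache::Get();
  return root->with_thread_cache && tcache &&
         tcache->MaybePutInCache(slot, bucket_index);
}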

base/allocator/partition_allocator/thread_cache_unittest.cc (new file):

// Copyright 2020 The Chromium Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file.
#include "base/allocator/partition_allocator/thread_cache.h"
#include <vector>
#include "base/allocator/buildflags.h"
#include "base/allocator/partition_allocator/partition_alloc.h"
#include "base/bind.h"
#include "base/callback.h"
#include "base/synchronization/lock.h"
#include "base/test/bind_test_util.h"
#include "base/threading/platform_thread.h"
#include "build/build_config.h"
#include "testing/gtest/include/gtest/gtest.h"
// Only a single partition can have a thread cache at a time. When
// PartitionAlloc is malloc(), it is already in use.
//
// With *SAN, PartitionAlloc is replaced in partition_alloc.h by ASAN, so we
// cannot test the thread cache.
//
// Finally, the thread cache no longer uses `thread_local` (it caused issues
// on Windows 7, at least), and its minimal TLS support is only wired up on
// Linux for now, so disable the cache (and these tests) elsewhere.
#if !BUILDFLAG(USE_PARTITION_ALLOC_AS_MALLOC) && \
!defined(MEMORY_TOOL_REPLACES_ALLOCATOR) && defined(OS_LINUX)
namespace base {
namespace internal {
namespace {
class LambdaThreadDelegate : public PlatformThread::Delegate {
public:
explicit LambdaThreadDelegate(OnceClosure f) : f_(std::move(f)) {}
void ThreadMain() override { std::move(f_).Run(); }
private:
OnceClosure f_;
};
// Need to be a global object without a destructor, because the cache is a
// global object with a destructor (to handle thread destruction), and the
// PartitionRoot has to outlive it.
//
// Forbid extras, since they make finding out which bucket is used harder.
NoDestructor<ThreadSafePartitionRoot> g_root{true, true};
size_t BucketIndexForSize(size_t size) {
auto* bucket = g_root->SizeToBucket(size);
return bucket - g_root->buckets;
}
size_t FillThreadCacheAndReturnIndex(size_t size, size_t count = 1) {
size_t bucket_index = BucketIndexForSize(size);
std::vector<void*> allocated_data;
for (size_t i = 0; i < count; ++i) {
allocated_data.push_back(g_root->Alloc(size, ""));
}
for (void* ptr : allocated_data) {
g_root->Free(ptr);
}
return bucket_index;
}
} // namespace
class ThreadCacheTest : public ::testing::Test {
protected:
void SetUp() override {
auto* tcache = g_root->thread_cache_for_testing();
if (tcache)
tcache->Purge();
}
void TearDown() override {}
};
TEST_F(ThreadCacheTest, Simple) {
const size_t kTestSize = 12;
void* ptr = g_root->Alloc(kTestSize, "");
ASSERT_TRUE(ptr);
// There is a cache.
auto* tcache = g_root->thread_cache_for_testing();
EXPECT_TRUE(tcache);
size_t index = BucketIndexForSize(kTestSize);
EXPECT_EQ(0u, tcache->bucket_count_for_testing(index));
g_root->Free(ptr);
// Freeing fills the thread cache.
EXPECT_EQ(1u, tcache->bucket_count_for_testing(index));
void* ptr2 = g_root->Alloc(kTestSize, "");
EXPECT_EQ(ptr, ptr2);
// Allocated from the thread cache.
EXPECT_EQ(0u, tcache->bucket_count_for_testing(index));
}
TEST_F(ThreadCacheTest, InexactSizeMatch) {
const size_t kTestSize = 12;
void* ptr = g_root->Alloc(kTestSize, "");
ASSERT_TRUE(ptr);
// There is a cache.
auto* tcache = g_root->thread_cache_for_testing();
EXPECT_TRUE(tcache);
size_t index = BucketIndexForSize(kTestSize);
EXPECT_EQ(0u, tcache->bucket_count_for_testing(index));
g_root->Free(ptr);
// Freeing fills the thread cache.
EXPECT_EQ(1u, tcache->bucket_count_for_testing(index));
void* ptr2 = g_root->Alloc(kTestSize + 1, "");
EXPECT_EQ(ptr, ptr2);
// Allocated from the thread cache.
EXPECT_EQ(0u, tcache->bucket_count_for_testing(index));
}
TEST_F(ThreadCacheTest, MultipleObjectsCachedPerBucket) {
size_t bucket_index = FillThreadCacheAndReturnIndex(100, 10);
auto* tcache = g_root->thread_cache_for_testing();
EXPECT_EQ(10u, tcache->bucket_count_for_testing(bucket_index));
}
TEST_F(ThreadCacheTest, ObjectsCachedCountIsLimited) {
size_t bucket_index = FillThreadCacheAndReturnIndex(100, 1000);
auto* tcache = g_root->thread_cache_for_testing();
EXPECT_LT(tcache->bucket_count_for_testing(bucket_index), 1000u);
}
TEST_F(ThreadCacheTest, Purge) {
size_t bucket_index = FillThreadCacheAndReturnIndex(100, 10);
auto* tcache = g_root->thread_cache_for_testing();
EXPECT_EQ(10u, tcache->bucket_count_for_testing(bucket_index));
tcache->Purge();
EXPECT_EQ(0u, tcache->bucket_count_for_testing(bucket_index));
}
TEST_F(ThreadCacheTest, NoCrossPartitionCache) {
const size_t kTestSize = 12;
ThreadSafePartitionRoot root{true, false};
size_t bucket_index = FillThreadCacheAndReturnIndex(kTestSize);
void* ptr = root.Alloc(kTestSize, "");
ASSERT_TRUE(ptr);
auto* tcache = g_root->thread_cache_for_testing();
EXPECT_EQ(1u, tcache->bucket_count_for_testing(bucket_index));
ThreadSafePartitionRoot::Free(ptr);
EXPECT_EQ(1u, tcache->bucket_count_for_testing(bucket_index));
}
#if ENABLE_THREAD_CACHE_STATISTICS // Required to record hits and misses.
TEST_F(ThreadCacheTest, LargeAllocationsAreNotCached) {
auto* tcache = g_root->thread_cache_for_testing();
size_t hits_before = tcache ? tcache->hits_ : 0;
FillThreadCacheAndReturnIndex(100 * 1024);
tcache = g_root->thread_cache_for_testing();
EXPECT_EQ(hits_before, tcache->hits_);
}
#endif
TEST_F(ThreadCacheTest, MultipleThreadCaches) {
const size_t kTestSize = 100;
FillThreadCacheAndReturnIndex(kTestSize);
auto* parent_thread_tcache = g_root->thread_cache_for_testing();
ASSERT_TRUE(parent_thread_tcache);
LambdaThreadDelegate delegate{BindLambdaForTesting([&]() {
EXPECT_FALSE(g_root->thread_cache_for_testing()); // No allocations yet.
FillThreadCacheAndReturnIndex(kTestSize);
auto* tcache = g_root->thread_cache_for_testing();
EXPECT_TRUE(tcache);
EXPECT_NE(parent_thread_tcache, tcache);
})};
PlatformThreadHandle thread_handle;
PlatformThread::Create(0, &delegate, &thread_handle);
PlatformThread::Join(thread_handle);
}
TEST_F(ThreadCacheTest, ThreadCacheReclaimedWhenThreadExits) {
const size_t kTestSize = 100;
// Make sure that there is always at least one object allocated in the test
// bucket, so that the PartitionPage is not reclaimed.
void* tmp = g_root->Alloc(kTestSize, "");
void* other_thread_ptr;
LambdaThreadDelegate delegate{BindLambdaForTesting([&]() {
EXPECT_FALSE(g_root->thread_cache_for_testing()); // No allocations yet.
other_thread_ptr = g_root->Alloc(kTestSize, "");
g_root->Free(other_thread_ptr);
// |other_thread_ptr| is now in the thread cache.
})};
PlatformThreadHandle thread_handle;
PlatformThread::Create(0, &delegate, &thread_handle);
PlatformThread::Join(thread_handle);
void* this_thread_ptr = g_root->Alloc(kTestSize, "");
// |other_thread_ptr| was returned to the central allocator, and is returned
// here, as it comes from the freelist.
EXPECT_EQ(this_thread_ptr, other_thread_ptr);
g_root->Free(other_thread_ptr);
g_root->Free(tmp);
}
} // namespace internal
} // namespace base
#endif // !BUILDFLAG(USE_PARTITION_ALLOC_AS_MALLOC) &&
// !defined(MEMORY_TOOL_REPLACES_ALLOCATOR) && defined(OS_LINUX)