Fix re-entrancy into Chrome's TLS implementation during thread destruction.

Chrome's TLS implementation is not safe to use after internal structures have been destroyed during thread destruction. On POSIX, doing so will cause extra, unnecessary work as internal structures are recreated, then destroyed later. On Windows, doing so will cause a leak. This CL adds a kDestroyed state that Chrome TLS sets after it finishes destruction. Slot::Get() DCHECKs that state != kDestroyed. There are currently two legitimate cases where an external consumer would need to check the status of kDestroyed. The heap-profiling shim can be invoked by BoringSSL's thread destructor. It's important that this doesn't cause re-entrancy into Chrome TLS. This CL adds a private function ThreadLocalStorage::HasBeenDestroyed which is exposed via friendship for the two implementations of the heap profiler to use. Change-Id: I5bd8dca3d48ee1c9d456bc05fa0824800575ca38 Bug: 825218 Reviewed-on: https://chromium-review.googlesource.com/978393 Commit-Queue: Erik Chen <erikchen@chromium.org> Reviewed-by: Nico Weber <thakis@chromium.org> Reviewed-by: Robert Liao <robliao@chromium.org> Reviewed-by: Dmitry Skiba <dskiba@chromium.org> Cr-Commit-Position: refs/heads/master@{#547949}

Fix re-entrancy into Chrome's TLS implementation during thread destruction.
Chrome's TLS implementation is not safe to use after internal structures have been destroyed during thread destruction. On POSIX, doing so will cause extra, unnecessary work as internal structures are recreated, then destroyed later. On Windows, doing so will cause a leak. This CL adds a kDestroyed state that Chrome TLS sets after it finishes destruction. Slot::Get() DCHECKs that state != kDestroyed. There are currently two legitimate cases where an external consumer would need to check the status of kDestroyed. The heap-profiling shim can be invoked by BoringSSL's thread destructor. It's important that this doesn't cause re-entrancy into Chrome TLS. This CL adds a private function ThreadLocalStorage::HasBeenDestroyed which is exposed via friendship for the two implementations of the heap profiler to use. Change-Id: I5bd8dca3d48ee1c9d456bc05fa0824800575ca38 Bug: 825218 Reviewed-on: https://chromium-review.googlesource.com/978393 Commit-Queue: Erik Chen <erikchen@chromium.org> Reviewed-by: Nico Weber <thakis@chromium.org> Reviewed-by: Robert Liao <robliao@chromium.org> Reviewed-by: Dmitry Skiba <dskiba@chromium.org> Cr-Commit-Position: refs/heads/master@{#547949}
929aad50 · Erik Chen · Commit Bot · 510098ed · 929aad50 · 929aad50
Commit 929aad50 authored Apr 04, 2018 by Erik Chen Committed by Commit Bot Apr 04, 2018
5 changed files
--- a/base/threading/thread_local_storage.cc
+++ b/base/threading/thread_local_storage.cc
@@ -69,6 +69,45 @@ namespace {
 base::subtle::Atomic32 g_native_tls_key =
    PlatformThreadLocalStorage::TLS_KEY_OUT_OF_INDEXES;

+// The OS TLS slot has three states:
+//   * kUninitialized: Any call to Slot::Get()/Set() will create the base
+//     per-thread TLS state. On POSIX, kUninitialized must be 0.
+//   * [Memory Address]: Raw pointer to the base per-thread TLS state.
+//   * kDestroyed: The base per-thread TLS state has been freed.
+//
+// Final States:
+//   * Windows: kDestroyed. Windows does not iterate through the OS TLS to clean
+//     up the values.
+//   * POSIX: kUninitialized. POSIX iterates through TLS until all slots contain
+//     nullptr.
+//
+// More details on this design:
+//   We need some type of thread-local state to indicate that the TLS system has
+//   been destroyed. To do so, we leverage the multi-pass nature of destruction
+//   of pthread_key.
+//
+//    a) After destruction of TLS system, we set the pthread_key to a sentinel
+//       kDestroyed.
+//    b) All calls to Slot::Get() DCHECK that the state is not kDestroyed, and
+//       any system which might potentially invoke Slot::Get() after destruction
+//       of TLS must check ThreadLOcalStorage::ThreadIsBeingDestroyed().
+//    c) After a full pass of the pthread_keys, on the next invocation of
+//       ConstructTlsVector(), we'll then set the key to nullptr.
+//    d) At this stage, the TLS system is back in its uninitialized state.
+//    e) If in the second pass of destruction of pthread_keys something were to
+//       re-initialize TLS [this should never happen! Since the only code which
+//       uses Chrome TLS is Chrome controlled, we should really be striving for
+//       single-pass destruction], then TLS will be re-initialized and then go
+//       through the 2-pass destruction system again. Everything should just
+//       work (TM).
+
+// The consumers of kUninitialized and kDestroyed expect void*, since that's
+// what the API exposes on both POSIX and Windows.
+void* const kUninitialized = nullptr;
+
+// A sentinel value to indicate that the TLS system has been destroyed.
+void* const kDestroyed = reinterpret_cast<void*>(-3);
+
 // The maximum number of slots in our thread local storage stack.
 constexpr int kThreadLocalStorageSize = 256;

@@ -139,7 +178,7 @@ TlsVectorEntry* ConstructTlsVector() {
      key = base::subtle::NoBarrier_Load(&g_native_tls_key);
    }
  }
-  CHECK(!PlatformThreadLocalStorage::GetTLSValue(key));
+  CHECK_EQ(PlatformThreadLocalStorage::GetTLSValue(key), kUninitialized);

  // Some allocators, such as TCMalloc, make use of thread local storage. As a
  // result, any attempt to call new (or malloc) will lazily cause such a system
@@ -162,6 +201,16 @@ TlsVectorEntry* ConstructTlsVector() {
 }

 void OnThreadExitInternal(TlsVectorEntry* tls_data) {
+  // This branch is for POSIX, where this function is called twice. The first
+  // pass calls dtors and sets state to kDestroyed. The second pass sets
+  // kDestroyed to kUninitialized.
+  if (tls_data == kDestroyed) {
+    PlatformThreadLocalStorage::TLSKey key =
+        base::subtle::NoBarrier_Load(&g_native_tls_key);
+    PlatformThreadLocalStorage::SetTLSValue(key, kUninitialized);
+    return;
+  }
+
  DCHECK(tls_data);
  // Some allocators, such as TCMalloc, use TLS. As a result, when a thread
  // terminates, one of the destructor calls we make may be to shut down an
@@ -221,7 +270,7 @@ void OnThreadExitInternal(TlsVectorEntry* tls_data) {
  }

  // Remove our stack allocated vector.
-  PlatformThreadLocalStorage::SetTLSValue(key, nullptr);
+  PlatformThreadLocalStorage::SetTLSValue(key, kDestroyed);
 }

 }  // namespace
@@ -237,8 +286,13 @@ void PlatformThreadLocalStorage::OnThreadExit() {
  if (key == PlatformThreadLocalStorage::TLS_KEY_OUT_OF_INDEXES)
    return;
  void *tls_data = GetTLSValue(key);
+
+  // On Windows, thread destruction callbacks are only invoked once per module,
+  // so there should be no way that this could be invoked twice.
+  DCHECK_NE(tls_data, kDestroyed);
+
  // Maybe we have never initialized TLS for this thread.
-  if (!tls_data)
+  if (tls_data == kUninitialized)
    return;
  OnThreadExitInternal(static_cast<TlsVectorEntry*>(tls_data));
 }
@@ -250,11 +304,19 @@ void PlatformThreadLocalStorage::OnThreadExit(void* value) {

 }  // namespace internal

+bool ThreadLocalStorage::HasBeenDestroyed() {
+  PlatformThreadLocalStorage::TLSKey key =
+      base::subtle::NoBarrier_Load(&g_native_tls_key);
+  if (key == PlatformThreadLocalStorage::TLS_KEY_OUT_OF_INDEXES)
+    return false;
+  return PlatformThreadLocalStorage::GetTLSValue(key) == kDestroyed;
+}
+
 void ThreadLocalStorage::Slot::Initialize(TLSDestructorFunc destructor) {
  PlatformThreadLocalStorage::TLSKey key =
      base::subtle::NoBarrier_Load(&g_native_tls_key);
  if (key == PlatformThreadLocalStorage::TLS_KEY_OUT_OF_INDEXES ||
-      !PlatformThreadLocalStorage::GetTLSValue(key)) {
+      PlatformThreadLocalStorage::GetTLSValue(key) == kUninitialized) {
    ConstructTlsVector();
  }

@@ -300,6 +362,7 @@ void* ThreadLocalStorage::Slot::Get() const {
  TlsVectorEntry* tls_data = static_cast<TlsVectorEntry*>(
      PlatformThreadLocalStorage::GetTLSValue(
          base::subtle::NoBarrier_Load(&g_native_tls_key)));
+  DCHECK_NE(tls_data, kDestroyed);
  if (!tls_data)
    tls_data = ConstructTlsVector();
  DCHECK_NE(slot_, kInvalidSlotValue);
@@ -314,6 +377,7 @@ void ThreadLocalStorage::Slot::Set(void* value) {
  TlsVectorEntry* tls_data = static_cast<TlsVectorEntry*>(
      PlatformThreadLocalStorage::GetTLSValue(
          base::subtle::NoBarrier_Load(&g_native_tls_key)));
+  DCHECK_NE(tls_data, kDestroyed);
  if (!tls_data)
    tls_data = ConstructTlsVector();
  DCHECK_NE(slot_, kInvalidSlotValue);

--- a/base/threading/thread_local_storage.h
+++ b/base/threading/thread_local_storage.h
@@ -18,10 +18,20 @@
 #include <pthread.h>
 #endif

+namespace profiling {
+class MemlogAllocatorShimInternal;
+}  // namespace profiling
+
 namespace base {

+namespace trace_event {
+class MallocDumpProvider;
+}  // namespace trace_event
+
 namespace internal {

+class ThreadLocalStorageTestInternal;
+
 // WARNING: You should *NOT* use this class directly.
 // PlatformThreadLocalStorage is a low-level abstraction of the OS's TLS
 // interface. Instead, you should use one of the following:
@@ -90,7 +100,6 @@ class BASE_EXPORT PlatformThreadLocalStorage {
 // an API for portability.
 class BASE_EXPORT ThreadLocalStorage {
 public:
-
  // Prototype for the TLS destructor function, which can be optionally used to
  // cleanup thread local storage on thread exit.  'value' is the data that is
  // stored in thread local storage.
@@ -134,6 +143,19 @@ class BASE_EXPORT ThreadLocalStorage {
  };

 private:
+  // In most cases, most callers should not need access to HasBeenDestroyed().
+  // If you are working in code that runs during thread destruction, contact the
+  // base OWNERs for advice and then make a friend request.
+  //
+  // Returns |true| if Chrome's implementation of TLS has been destroyed during
+  // thread destruction. Attempting to call Slot::Get() during destruction is
+  // disallowed and will hit a DCHECK. Any code that relies on TLS during thread
+  // destruction must first check this method before calling Slot::Get().
+  friend class base::internal::ThreadLocalStorageTestInternal;
+  friend class base::trace_event::MallocDumpProvider;
+  friend class profiling::MemlogAllocatorShimInternal;
+  static bool HasBeenDestroyed();
+
  DISALLOW_COPY_AND_ASSIGN(ThreadLocalStorage);
 };


--- a/base/threading/thread_local_storage_unittest.cc
+++ b/base/threading/thread_local_storage_unittest.cc
@@ -23,6 +23,22 @@

 namespace base {

+#if defined(OS_POSIX)
+
+namespace internal {
+
+// This class is friended by ThreadLocalStorage.
+class ThreadLocalStorageTestInternal {
+ public:
+  static bool HasBeenDestroyed() {
+    return ThreadLocalStorage::HasBeenDestroyed();
+  }
+};
+
+}  // namespace internal
+
+#endif
+
 namespace {

 const int kInitialTlsValue = 0x5555;
@@ -80,6 +96,97 @@ void ThreadLocalStorageCleanup(void *value) {
  TLSSlot().Set(value);
 }

+#if defined(OS_POSIX)
+constexpr intptr_t kDummyValue = 0xABCD;
+constexpr size_t kKeyCount = 20;
+
+// The order in which pthread keys are destructed is non-deterministic.
+// Hopefully, of the 20 keys we create, some of them should be destroyed
+class UseTLSDuringDestructionThread : public SimpleThread {
+ public:
+  UseTLSDuringDestructionThread() : SimpleThread("prefix") {}
+
+  // The order in which pthread_key destructors are called is not well defined.
+  // Hopefully, by creating 10 both before and after initializing TLS on the
+  // thread, at least 1 will be called after TLS destruction.
+  void Run() override {
+    ASSERT_FALSE(internal::ThreadLocalStorageTestInternal::HasBeenDestroyed());
+
+    // Create 10 pthread keys before initializing TLS on the thread.
+    size_t slot_index = 0;
+    for (; slot_index < 10; ++slot_index) {
+      CreateTlsKeyWithDestructor(slot_index);
+    }
+
+    // Initialize the Chrome TLS system. It's possible that base::Thread has
+    // already initialized Chrome TLS, but we don't rely on that.
+    slot_.Set(reinterpret_cast<void*>(kDummyValue));
+
+    // Create 10 pthread keys after initializing TLS on the thread.
+    for (; slot_index < kKeyCount; ++slot_index) {
+      CreateTlsKeyWithDestructor(slot_index);
+    }
+  }
+
+  bool teardown_works_correctly() { return teardown_works_correctly_; }
+
+ private:
+  struct TLSState {
+    pthread_key_t key;
+    bool* teardown_works_correctly;
+  };
+
+  // The POSIX TLS destruction API takes as input a single C-function, which is
+  // called with the current |value| of a (key, value) pair. We need this
+  // function to do two things: set the |value| to nullptr, which requires
+  // knowing the associated |key|, and update the |teardown_works_correctly_|
+  // state.
+  //
+  // To accomplish this, we set the value to an instance of TLSState, which
+  // contains |key| as well as a pointer to |teardown_works_correctly|.
+  static void ThreadLocalDestructor(void* value) {
+    TLSState* state = static_cast<TLSState*>(value);
+    int result = pthread_setspecific(state->key, nullptr);
+    ASSERT_EQ(result, 0);
+
+    // If this path is hit, then the thread local destructor was called after
+    // the Chrome-TLS destructor and the internal state was updated correctly.
+    // No further checks are necessary.
+    if (internal::ThreadLocalStorageTestInternal::HasBeenDestroyed()) {
+      *(state->teardown_works_correctly) = true;
+      return;
+    }
+
+    // If this path is hit, then the thread local destructor was called before
+    // the Chrome-TLS destructor is hit. The ThreadLocalStorage::Slot should
+    // still function correctly.
+    ASSERT_EQ(reinterpret_cast<intptr_t>(slot_.Get()), kDummyValue);
+  }
+
+  void CreateTlsKeyWithDestructor(size_t index) {
+    ASSERT_LT(index, kKeyCount);
+
+    tls_states_[index].teardown_works_correctly = &teardown_works_correctly_;
+    int result = pthread_key_create(
+        &(tls_states_[index].key),
+        UseTLSDuringDestructionThread::ThreadLocalDestructor);
+    ASSERT_EQ(result, 0);
+
+    result = pthread_setspecific(tls_states_[index].key, &tls_states_[index]);
+    ASSERT_EQ(result, 0);
+  }
+
+  static base::ThreadLocalStorage::Slot slot_;
+  bool teardown_works_correctly_ = false;
+  TLSState tls_states_[kKeyCount];
+
+  DISALLOW_COPY_AND_ASSIGN(UseTLSDuringDestructionThread);
+};
+
+base::ThreadLocalStorage::Slot UseTLSDuringDestructionThread::slot_;
+
+#endif  // defined(OS_POSIX)
+
 }  // namespace

 TEST(ThreadLocalStorageTest, Basics) {
@@ -142,4 +249,17 @@ TEST(ThreadLocalStorageTest, TLSReclaim) {
  }
 }

+#if defined(OS_POSIX)
+// Unlike POSIX, Windows does not iterate through the OS TLS to cleanup any
+// values there. Instead a per-module thread destruction function is called.
+// However, it is not possible to perform a check after this point (as the code
+// is detached from the thread), so this check remains POSIX only.
+TEST(ThreadLocalStorageTest, UseTLSDuringDestruction) {
+  UseTLSDuringDestructionThread thread;
+  thread.Start();
+  thread.Join();
+  EXPECT_TRUE(thread.teardown_works_correctly());
+}
+#endif
+
 }  // namespace base
--- a/base/trace_event/malloc_dump_provider.cc
+++ b/base/trace_event/malloc_dump_provider.cc
@@ -12,6 +12,7 @@
 #include "base/allocator/allocator_shim.h"
 #include "base/allocator/buildflags.h"
 #include "base/debug/profiler.h"
+#include "base/threading/thread_local_storage.h"
 #include "base/trace_event/heap_profiler_allocation_context.h"
 #include "base/trace_event/heap_profiler_allocation_context_tracker.h"
 #include "base/trace_event/heap_profiler_heap_dump_writer.h"
@@ -333,6 +334,9 @@ void MallocDumpProvider::OnHeapProfilingEnabled(bool enabled) {
 }

 void MallocDumpProvider::InsertAllocation(void* address, size_t size) {
+  if (UNLIKELY(base::ThreadLocalStorage::HasBeenDestroyed()))
+    return;
+
  // CurrentId() can be a slow operation (crbug.com/497226). This apparently
  // redundant condition short circuits the CurrentID() calls when unnecessary.
  if (tid_dumping_heap_ != kInvalidThreadId &&
@@ -358,6 +362,9 @@ void MallocDumpProvider::InsertAllocation(void* address, size_t size) {
 }

 void MallocDumpProvider::RemoveAllocation(void* address) {
+  if (UNLIKELY(base::ThreadLocalStorage::HasBeenDestroyed()))
+    return;
+
  // No re-entrancy is expected here as none of the calls below should
  // cause a free()-s (|allocation_register_| does its own heap management).
  if (tid_dumping_heap_ != kInvalidThreadId &&

--- a/chrome/common/profiling/memlog_allocator_shim.cc
+++ b/chrome/common/profiling/memlog_allocator_shim.cc