Add LazilyDeallocatedDeque, a custom deque for TaskQueueManager
Our usage pattern is unfortunate for existing queues such as
base::circular_deque. We tend to fill up an empty queue and then drain
all those tasks until it's empty, so the queue yo-yos in size, which
confuses the memory reclamation schemes of most queues. As an
optimisation we introduce a deque specialised for TaskQueueManager's
usage pattern. Because memory allocation isn't free, we don't
automatically reclaim memory when the queue becomes empty. Instead we
rely on the surrounding code periodically calling MaybeShrinkQueue,
ideally when the queue is empty. We keep track of the maximum recent
queue size and rate-limit how often MaybeShrinkQueue actually shrinks
the buffer, to avoid unnecessary churn. (A sketch of the idea follows
the benchmark tables below.)

This yields a nice win on our microbenchmark:

Patch: us/run for 10000 delayed tasks with N queues

         1 queue             4 queues            8 queues            32 queues
         33448.166666666664  33215.75496688742   33484.34            34018.37414965987
         33972.18243243243   33846.91891891892   34489.737931034484  34727.90277777778
         33367.90666666667   33167.54304635762   33392.96            33906.89864864865
         33392.13333333333   33107.17763157895   33340.18            33718.73825503356
         37921.01515151515   39379.06299212598   38851.27906976744   39366.03125
         38171.564885496184  37401.72388059701   37640.32330827068   37800.51127819549
         34691.2275862069    34359.61643835616   34993.468531468534  35366.795774647886
         35981.20863309353   35089.18881118881   38530.230769230766  39280.3515625
         39262.8671875       36411.384057971016  33576.10067114094   33939.69594594595
         37913.59848484849   38324.076335877864  38061.59848484849   39921.00793650794
Average  35812.1871          35430.24471         35636.02188         36204.63076

ToT: us/run for 10000 delayed tasks with N queues

         1 queue             4 queues            8 queues            32 queues
         40459.540322580644  40536.04838709677   38994.573643410855  38696.2
         39422.149606299216  39299.5             37888.18939393939   37874.74436090225
         38419.70229007633   38025.742424242424  37844.41353383459   38020.469696969696
         35052.72027972028   38147.80303030303   35504.89361702128   34138.02721088436
         37096.77777777778   34942.541666666664  37003.529411764706  37579.60447761194
         38818.67441860465   38233.068702290075  37978.628787878784  37867.57142857143
         38455.49618320611   37903.05303030303   38106.143939393936  38129.5
         40609.33064516129   37721.75187969925   34656.441379310345  34294.33561643836
         35273.704225352114  34646.324137931035  34335.643835616436  34311.82876712329
         35821.41428571429   35362.035211267605  37522.27611940299   35429.281690140844
Average  37942.951           37481.78685         36983.47337         36634.15632

Percentage improvement

         1 queue             4 queues            8 queues            32 queues
         5.61570422          5.473437399         3.643388159         1.172472933

NB the improvement shrinks as the number of queues grows because the
win comes from saving malloc overhead inside each queue, while a
constant total number of tasks is posted across the N queues: the more
queues we have in this test, the less loaded each individual queue is.
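What follows is a minimal sketch of the idea for illustration only, not
the actual implementation: it assumes a simple ring buffer backed by
std::vector, requires T to be default-constructible, and rate-limits
shrinking by call count rather than by time. The class name
LazyDequeSketch and the constants kMinCapacity and kShrinkIntervalTicks
are hypothetical.

#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

template <typename T>
class LazyDequeSketch {
 public:
  void push_back(T value) {
    if (size_ == buffer_.size())
      Grow();
    buffer_[(head_ + size_) % buffer_.size()] = std::move(value);
    ++size_;
    max_recent_size_ = std::max(max_recent_size_, size_);
  }

  // Precondition: !empty(). Note that no memory is reclaimed here,
  // even if this pop leaves the deque empty.
  T pop_front() {
    T value = std::move(buffer_[head_]);
    head_ = (head_ + 1) % buffer_.size();
    --size_;
    return value;
  }

  bool empty() const { return size_ == 0; }
  size_t size() const { return size_; }

  // Called periodically by the owner, ideally when the deque is empty.
  // Rate limited: we only consider shrinking every kShrinkIntervalTicks
  // calls, and then only down to the recent high-water mark, to avoid
  // reallocation churn.
  void MaybeShrinkQueue() {
    if (++calls_since_last_shrink_ < kShrinkIntervalTicks)
      return;
    calls_since_last_shrink_ = 0;
    size_t target = std::max(max_recent_size_, size_);
    max_recent_size_ = size_;  // Start a new observation window.
    if (target < buffer_.size() / 2)
      Reallocate(std::max(target, kMinCapacity));
  }

 private:
  static constexpr size_t kMinCapacity = 4;           // Hypothetical.
  static constexpr size_t kShrinkIntervalTicks = 16;  // Hypothetical.

  void Grow() { Reallocate(std::max(kMinCapacity, buffer_.size() * 2)); }

  // Moves the live elements into a buffer of |new_capacity| and rebases
  // the ring at index 0. Requires new_capacity >= size_.
  void Reallocate(size_t new_capacity) {
    std::vector<T> new_buffer(new_capacity);
    for (size_t i = 0; i < size_; ++i)
      new_buffer[i] = std::move(buffer_[(head_ + i) % buffer_.size()]);
    buffer_ = std::move(new_buffer);
    head_ = 0;
  }

  std::vector<T> buffer_;
  size_t head_ = 0;
  size_t size_ = 0;
  size_t max_recent_size_ = 0;
  size_t calls_since_last_shrink_ = 0;
};

The sketch mirrors the description above: pop_front never frees memory,
and MaybeShrinkQueue only reallocates down to the recent high-water
mark, and only occasionally, so a queue that repeatedly fills and
drains keeps its buffer instead of thrashing the allocator.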
Change-Id: I75be9c1f266700ac76003ae0191ce0a59539298a
Reviewed-on: https://chromium-review.googlesource.com/1080792
Commit-Queue: Alex Clarke <alexclarke@chromium.org>
Reviewed-by: Alexander Timin <altimin@chromium.org>
Reviewed-by: Greg Kraynov <kraynov@chromium.org>
Reviewed-by: Sami Kyöstilä <skyostil@chromium.org>
Cr-Commit-Position: refs/heads/master@{#564939}