gpu: reduce locks in transfer buffer
Currently, gpu::CommandBufferHelper::GetLastState locks a mutex to check the latest token state for every outstanding block in the transfer buffer. Instead, take the lock once and reuse the cached value when looping over all the blocks.

On vmiura's "300 invalidation" repaint test in callgrind on Linux, the number of cycles spent per library changed as follows, filtered to RasterBufferImpl::Playback and CommandBufferService::Flush:

             | gpur   | oopr   | % diff
-------------+--------+--------+-------
total cycles | 256.2m | 235.3m |  8.1%
libbase.so   |  19.9m |  13.1m | 34.3%
libgpu.so    |  16.1m |   8.7m | 45.9%
libpthread   |   5.7m |   1.5m | 72.8%

The call stack this avoids is:

gpu::FencedAllocator::GetLargestFreeSize
 -> gpu::FencedAllocator::FreeUnused
 -> gpu::CommandBufferHelper::HasTokenPassed
 -> gpu::CommandBufferProxyImpl::GetLastState
 -> base::internal::LockImpl::Lock
 -> pthread_mutex_lock

Change-Id: Ic7d7a93ff6b6833ce978318d3ad406afeca76c4c
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/1725030
Reviewed-by: Eric Karl <ericrk@chromium.org>
Reviewed-by: Khushal <khushalsagar@chromium.org>
Commit-Queue: enne <enne@chromium.org>
Cr-Commit-Position: refs/heads/master@{#682870}