gpu/command_buffer/client/fenced_allocator.cc · f0bdd22daa38303d523288589002a0538159ade2 · eriksson monteiro / tangled

gpu: reduce locks in transfer buffer · f0bdd22d

Adrienne Walker authored Jul 31, 2019

Currently, gpu::CommandBufferHelper::GetLastState locks a mutex to check
the latest token state for every outstanding block in the transfer
buffer.  Instead, take the lock once, and reuse the cached value when
looping over all the blocks.
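
The change described above can be sketched roughly as follows. This is a
minimal illustration, not the code from fenced_allocator.cc: TokenSource,
Block, State, FreeUnusedPerBlock and FreeUnusedCached are hypothetical names,
the lock counter exists only to make the difference visible, and token
wraparound is ignored.

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical stand-in for the helper that guards the last-read token
// with a mutex. Not the real CommandBufferHelper API.
class TokenSource {
 public:
  explicit TokenSource(int32_t token) : last_token_(token) {}

  // Acquires the lock on every call.
  int32_t GetLastToken() {
    std::lock_guard<std::mutex> guard(lock_);
    ++lock_count_;
    return last_token_;
  }

  // Simplified check: assumes tokens only increase and never wrap.
  bool HasTokenPassed(int32_t token) { return token <= GetLastToken(); }

  // Illustration only; read from a single thread in this sketch.
  std::size_t lock_count() const { return lock_count_; }

 private:
  std::mutex lock_;
  int32_t last_token_;
  std::size_t lock_count_ = 0;
};

enum class State { kInUse, kFree, kFreePendingToken };

struct Block {
  State state;
  int32_t token;  // Meaningful only when state == kFreePendingToken.
};

// Before: one lock acquisition for every block that is pending on a token.
void FreeUnusedPerBlock(std::vector<Block>& blocks, TokenSource& source) {
  for (Block& block : blocks) {
    if (block.state == State::kFreePendingToken &&
        source.HasTokenPassed(block.token)) {
      block.state = State::kFree;
    }
  }
}

// After: take the lock once, then compare each block to the cached token.
void FreeUnusedCached(std::vector<Block>& blocks, TokenSource& source) {
  const int32_t last_token = source.GetLastToken();
  for (Block& block : blocks) {
    if (block.state == State::kFreePendingToken &&
        block.token <= last_token) {
      block.state = State::kFree;
    }
  }
}
```

With N pending blocks, the first loop acquires the mutex N times and the
second acquires it once. In this sketch the cached token can only be older
than a fresh read, so the second loop may free fewer blocks on a given pass
but never frees one whose token has not passed.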

On vmiura's "300 invalidation" repaint test in callgrind on Linux, the
number of cycles spent per library changed like this, filtered to
RasterBufferImpl::Playback and CommandBufferService::Flush:

             |  gpur  |  oopr  | % diff
-------------+--------+--------+-------
total cycles | 256.2m | 235.3m |  8.1%
libbase.so   |  19.9m |  13.1m | 34.3%
libgpu.so    |  16.1m |   8.7m | 45.9%
libpthread   |   5.7m |   1.5m | 72.8%

The callstack this avoids is:
gpu::FencedAllocator::GetLargestFreeSize ->
gpu::FencedAllocator::FreeUnused ->
gpu::CommandBufferHelper::HasTokenPassed ->
gpu::CommandBufferProxyImpl::GetLastState ->
base::internal::LockImpl::Lock ->
pthread_mutex_lock

Change-Id: Ic7d7a93ff6b6833ce978318d3ad406afeca76c4c
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/1725030
Reviewed-by: Eric Karl <ericrk@chromium.org>
Reviewed-by: Khushal <khushalsagar@chromium.org>
Commit-Queue: enne <enne@chromium.org>
Cr-Commit-Position: refs/heads/master@{#682870}


fenced_allocator.cc 8.51 KB
