gpu: reduce locks in transfer buffer
Currently, gpu::CommandBufferHelper::GetLastState locks a mutex to check the latest token state for every outstanding block in the transfer buffer. Instead, take the lock once and reuse the cached value when looping over all the blocks.

On vmiura's "300 invalidation" repaint test in callgrind on Linux, the number of cycles spent per library changed as follows, filtered to RasterBufferImpl::Playback and CommandBufferService::Flush:

             | gpur   | oopr   | % diff
-------------+--------+--------+-------
total cycles | 256.2m | 235.3m |  8.1%
libbase.so   |  19.9m |  13.1m | 34.3%
libgpu.so    |  16.1m |   8.7m | 45.9%
libpthread   |   5.7m |   1.5m | 72.8%

The call stack this avoids is:

gpu::FencedAllocator::GetLargestFreeSize
 -> gpu::FencedAllocator::FreeUnused
 -> gpu::CommandBufferHelper::HasTokenPassed
 -> gpu::CommandBufferProxyImpl::GetLastState
 -> base::internal::LockImpl::Lock
 -> pthread_mutex_lock

Change-Id: Ic7d7a93ff6b6833ce978318d3ad406afeca76c4c
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/1725030
Reviewed-by: Eric Karl <ericrk@chromium.org>
Reviewed-by: Khushal <khushalsagar@chromium.org>
Commit-Queue: enne <enne@chromium.org>
Cr-Commit-Position: refs/heads/master@{#682870}