    gpu: reduce locks in transfer buffer · f0bdd22d
    Adrienne Walker authored
    Currently, gpu::CommandBufferHelper::GetLastState locks a mutex to check
    the latest token state for every outstanding block in the transfer
    buffer.  Instead, take the lock once, and reuse the cached value when
    looping over all the blocks.
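    A minimal C++ sketch of the before/after pattern described above.
    SharedState, Block, FreeUnusedPerBlock and FreeUnusedCached are
    hypothetical names, not Chromium's classes. The lock counter exists only
    to make the lock traffic visible, and token wraparound is ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical stand-in for state whose latest token is read under a lock
// (the role GetLastState plays in the commit). Illustrative only.
class SharedState {
 public:
  // Reads the latest passed token; takes the mutex on every call.
  int32_t GetLastToken() {
    std::lock_guard<std::mutex> guard(mutex_);
    ++lock_count_;
    return last_token_;
  }
  void SetLastToken(int32_t token) {
    std::lock_guard<std::mutex> guard(mutex_);
    last_token_ = token;
  }
  // Number of times GetLastToken() has taken the lock.
  int lock_count() const {
    std::lock_guard<std::mutex> guard(mutex_);
    return lock_count_;
  }

 private:
  mutable std::mutex mutex_;
  int32_t last_token_ = 0;
  int lock_count_ = 0;
};

// A block freed by the client that cannot be reused until `token` passes.
struct Block {
  bool pending;
  int32_t token;
};

// Before: every pending block triggers its own locked read.
int FreeUnusedPerBlock(std::vector<Block>& blocks, SharedState& state) {
  int freed = 0;
  for (Block& b : blocks) {
    if (b.pending && state.GetLastToken() >= b.token) {  // simplified: no wraparound
      b.pending = false;
      ++freed;
    }
  }
  return freed;
}

// After: take the lock once and reuse the cached token for every block.
int FreeUnusedCached(std::vector<Block>& blocks, SharedState& state) {
  const int32_t last_token = state.GetLastToken();
  int freed = 0;
  for (Block& b : blocks) {
    if (b.pending && last_token >= b.token) {  // simplified: no wraparound
      b.pending = false;
      ++freed;
    }
  }
  return freed;
}
```

    With three pending blocks, the per-block version locks three times and the
    cached version locks once. Assuming tokens only move forward, reusing a
    snapshot is conservative: a stale value can only make a block look still
    in use, never free it too early.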
    
    On vmiura's "300 invalidation" repaint test in callgrind on Linux, the
    number of cycles spent per library changed like this, filtered to
    RasterBufferImpl::Playback and CommandBufferService::Flush:
    
                 |  gpur  |  oopr  | % diff
    -------------+--------+--------+-------
    total cycles | 256.2m | 235.3m |  8.1%
    libbase.so   |  19.9m |  13.1m | 34.3%
    libgpu.so    |  16.1m |   8.7m | 45.9%
    libpthread   |   5.7m |   1.5m | 72.8%
    
    The callstack this avoids is:
    gpu::FencedAllocator::GetLargestFreeSize ->
    gpu::FencedAllocator::FreeUnused ->
    gpu::CommandBufferHelper::HasTokenPassed ->
    gpu::CommandBufferProxyImpl::GetLastState ->
    base::internal::LockImpl::Lock ->
    pthread_mutex_lock
    
    Change-Id: Ic7d7a93ff6b6833ce978318d3ad406afeca76c4c
    Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/1725030
    Reviewed-by: Eric Karl <ericrk@chromium.org>
    Reviewed-by: Khushal <khushalsagar@chromium.org>
    Commit-Queue: enne <enne@chromium.org>
    Cr-Commit-Position: refs/heads/master@{#682870}