media/gpu/vaapi: batch submitting VABufferIDs for VP9 decoding

Decoding a bitstream in VA has two steps: first, submitting the parsed
parameters and the encoded chunk; second, executing the decode. For the
first step, ToT VaapiWrapper submits every piece of data individually,
acquiring and releasing |va_lock_| every time. This is unnecessary, so
this CL refactors the SubmitBuffer() method into a new
SubmitBuffer_Locked() and adds a SubmitBuffers() to bundle several
submissions together.

This is verified via chrome:tracing and codepen.io/full/RwarYvG, which
plays four 1280x572 VP9 videos at the same time. Tracing is captured for
a few seconds; the results are summarised in [1,2]. In short: the total
decode CPU time doesn't change much on either kohaku or Braswell (reks),
but batch-submitting takes less time, especially on BSW, where it goes
from ~3 x 0.089ms = ~0.267ms to ~0.236ms [3], about 10%. Less contention
on the lock also brings an ancillary reduction in Execute_Locked() from
3.295ms to 3.165ms.

The improvements are of course extremely small; the advantages of this
CL lie in reducing lock/unlock churn and the associated contention. This
benefit grows with the number of decodes (e.g. Meet grid scenarios).

Later CLs will migrate the other decoders, and possibly avoid the call
to vaCreateBuffer(), which takes a good 50% of the SubmitBuffer/s() time.

[1] Kohaku w/o patch https://imgur.com/a/nVuE0Nk
[2] Kohaku with patch https://imgur.com/a/xhdbqHn
[3] VP9 ToT calls SubmitBuffer 3 times per incoming encoded buffer.

Bug: b/166646505
Change-Id: I1b8e36bb1d7107b5367b0b41137e2dc6625e1569
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2393629
Commit-Queue: Miguel Casas <mcasas@chromium.org>
Reviewed-by: Hirokazu Honda <hiroh@chromium.org>
Reviewed-by: Andres Calderon Jaramillo <andrescj@chromium.org>
Cr-Commit-Position: refs/heads/master@{#805300}
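
The locking change described above can be sketched roughly as follows. This is a minimal illustration, not the actual VaapiWrapper code: FakeWrapper, FakeBuffer, the lock counter and the method signatures are all hypothetical, and no libva calls are made. It only shows the pattern of moving the per-buffer work into a *_Locked() helper so that a batch entry point can take the lock once.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-in for one parameter or slice-data buffer.
// This is not a real VA-API or Chromium type.
struct FakeBuffer {
  int type;
  std::size_t size;
};

// Hypothetical wrapper that counts lock acquisitions so the
// one-by-one path and the batched path can be compared.
class FakeWrapper {
 public:
  // One-by-one path: the lock is taken once per buffer.
  bool SubmitBuffer(const FakeBuffer& buffer) {
    std::lock_guard<std::mutex> guard(lock_);
    ++lock_acquisitions_;
    return SubmitBuffer_Locked(buffer);
  }

  // Batched path: the lock is taken once for the whole batch.
  bool SubmitBuffers(const std::vector<FakeBuffer>& buffers) {
    assert(!buffers.empty());  // Sketch assumption: empty batches are a caller bug.
    std::lock_guard<std::mutex> guard(lock_);
    ++lock_acquisitions_;
    for (const FakeBuffer& buffer : buffers) {
      if (!SubmitBuffer_Locked(buffer))
        return false;
    }
    return true;
  }

  int lock_acquisitions() const {
    std::lock_guard<std::mutex> guard(lock_);
    return lock_acquisitions_;
  }

  std::size_t pending() const {
    std::lock_guard<std::mutex> guard(lock_);
    return pending_.size();
  }

 private:
  // Caller must already hold lock_. Rejects zero-sized buffers.
  bool SubmitBuffer_Locked(const FakeBuffer& buffer) {
    if (buffer.size == 0)
      return false;
    pending_.push_back(buffer);
    return true;
  }

  mutable std::mutex lock_;
  int lock_acquisitions_ = 0;
  std::vector<FakeBuffer> pending_;
};
```

With three buffers per encoded chunk, as in note [3], the one-by-one path in this sketch takes the lock three times and the batched path takes it once.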