Commit 49dc6464 authored by Miguel Casas's avatar Miguel Casas Committed by Commit Bot

vaapi: decode on client NativePixmaps (reduced)

crrev.com/c/1171566 got reverted due to a mishmash of performance
improvements and regressions.  Instead of relanding it verbatim, the
innocent/innocuous parts were landed in crrev.com/c/1180295; this CL
gets the tough parts, including restricting it to VP9 and Kabylakes.

I also repeated the power consumption results (https://goo.gl/r88qPf),
and in a nutshell I see improvements:
w/ this CL:  1.88   0.63   0.41  0.52  (pkg - pp1 - gfx - dram)
ToT       :  2.72   0.76   1.03  0.57
Savings   : 30.78% 17.51% 59.95% 7.60%

TBR=dcastagna@, hoegsberg@

Original CL description -----------------------------------------------
vaapi: decode on client NativePixmaps

This CL teaches VaVDA to use client VASurfaceIDs to decode onto, saving
a buffer copy and removing a costly blit (DownloadFromSurface) on the
GPU Main thread.  Three groups of changes:

1. In AssignPictureBuffers(): if |vaapi_picture_factory_| Create()s a
 VaapiPicture with a VASurfaceID, we use those to decode onto and set a new
 flag: |decode_using_client_picture_buffers_|

2. When the decoder calls CreateVASurface(), instead of giving it the first
 |available_va_surfaces_|, we need to figure out the first VASurfaceID in
 this |available_va_surfaces_| such that the corresponding VaapiPicture in
 |pictures_| is available (i.e. in |available_picture_buffers_|).
 Reason: libva holds on to some VASurfaceIDs, there's no simple one-to-one
 correspondence like on ToT.

3. When we're ready to OutputPicture() to |client_|, instead of using the first
 |available_picture_buffers_|, we find the corresponding one for the
 passed |va_surface_id| (we could search all over |pictures_| but there's
 always less |available_picture_buffers_|.

Some other notable developments:

- s/output_buffers_/available_picture_buffers_/
- OutputPicture() loses a parameter.
- |decode_using_client_picture_buffers_| is still false for e.g. VP9 Profile 2
 or the ImportBufferForPicture() path.

Bug: 822346, 717265

Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel
Change-Id: Icf1a9a1fa71264a7367fa351e3b7d28fab346744
Reviewed-on: https://chromium-review.googlesource.com/1181464Reviewed-by: default avatarMiguel Casas <mcasas@chromium.org>
Commit-Queue: Miguel Casas <mcasas@chromium.org>
Cr-Commit-Position: refs/heads/master@{#584553}
parent 8fc032a3
...@@ -11,6 +11,7 @@ ...@@ -11,6 +11,7 @@
#include <va/va.h> #include <va/va.h>
#include "base/bind.h" #include "base/bind.h"
#include "base/cpu.h"
#include "base/files/scoped_file.h" #include "base/files/scoped_file.h"
#include "base/logging.h" #include "base/logging.h"
#include "base/macros.h" #include "base/macros.h"
...@@ -71,6 +72,17 @@ void CloseGpuMemoryBufferHandle(const gfx::GpuMemoryBufferHandle& handle) { ...@@ -71,6 +72,17 @@ void CloseGpuMemoryBufferHandle(const gfx::GpuMemoryBufferHandle& handle) {
} }
#endif #endif
// Returns true if the CPU is an Intel Kaby Lake or later.
bool IsKabyLakeOrLater() {
constexpr int kPentiumAndLaterFamily = 0x06;
constexpr int kFirstKabyLakeModelId = 0x8E;
static base::CPU cpuid;
static bool is_kaby_lake_or_later =
cpuid.family() == kPentiumAndLaterFamily &&
cpuid.model() >= kFirstKabyLakeModelId;
return is_kaby_lake_or_later;
}
} // namespace } // namespace
#define RETURN_AND_NOTIFY_ON_FAILURE(result, log, error_code, ret) \ #define RETURN_AND_NOTIFY_ON_FAILURE(result, log, error_code, ret) \
...@@ -135,6 +147,7 @@ VaapiVideoDecodeAccelerator::VaapiVideoDecodeAccelerator( ...@@ -135,6 +147,7 @@ VaapiVideoDecodeAccelerator::VaapiVideoDecodeAccelerator(
input_ready_(&lock_), input_ready_(&lock_),
vaapi_picture_factory_(new VaapiPictureFactory()), vaapi_picture_factory_(new VaapiPictureFactory()),
surfaces_available_(&lock_), surfaces_available_(&lock_),
decode_using_client_picture_buffers_(false),
task_runner_(base::ThreadTaskRunnerHandle::Get()), task_runner_(base::ThreadTaskRunnerHandle::Get()),
decoder_thread_("VaapiDecoderThread"), decoder_thread_("VaapiDecoderThread"),
num_frames_at_client_(0), num_frames_at_client_(0),
...@@ -209,15 +222,36 @@ bool VaapiVideoDecodeAccelerator::Initialize(const Config& config, ...@@ -209,15 +222,36 @@ bool VaapiVideoDecodeAccelerator::Initialize(const Config& config,
void VaapiVideoDecodeAccelerator::OutputPicture( void VaapiVideoDecodeAccelerator::OutputPicture(
const scoped_refptr<VASurface>& va_surface, const scoped_refptr<VASurface>& va_surface,
int32_t input_id, int32_t input_id,
gfx::Rect visible_rect, gfx::Rect visible_rect) {
VaapiPicture* picture) {
DCHECK(task_runner_->BelongsToCurrentThread()); DCHECK(task_runner_->BelongsToCurrentThread());
int32_t output_id = picture->picture_buffer_id(); const VASurfaceID va_surface_id = va_surface->id();
VaapiPicture* picture = nullptr;
{
base::AutoLock auto_lock(lock_);
int32_t picture_buffer_id = available_picture_buffers_.front();
if (decode_using_client_picture_buffers_) {
// Find the |pictures_| entry matching |va_surface_id|.
for (const auto& id_and_picture : pictures_) {
if (id_and_picture.second->va_surface_id() == va_surface_id) {
picture_buffer_id = id_and_picture.first;
break;
}
}
}
picture = pictures_[picture_buffer_id].get();
DCHECK(base::ContainsValue(available_picture_buffers_, picture_buffer_id));
base::Erase(available_picture_buffers_, picture_buffer_id);
}
DCHECK(picture) << " could not find " << va_surface_id << " available";
const int32_t output_id = picture->picture_buffer_id();
DVLOGF(4) << "Outputting VASurface " << va_surface->id() DVLOGF(4) << "Outputting VASurface " << va_surface->id()
<< " into pixmap bound to picture buffer id " << output_id; << " into pixmap bound to picture buffer id " << output_id;
{
if (!decode_using_client_picture_buffers_) {
TRACE_EVENT2("media,gpu", "VAVDA::DownloadFromSurface", "input_id", TRACE_EVENT2("media,gpu", "VAVDA::DownloadFromSurface", "input_id",
input_id, "output_id", output_id); input_id, "output_id", output_id);
RETURN_AND_NOTIFY_ON_FAILURE(picture->DownloadFromSurface(va_surface), RETURN_AND_NOTIFY_ON_FAILURE(picture->DownloadFromSurface(va_surface),
...@@ -237,24 +271,19 @@ void VaapiVideoDecodeAccelerator::OutputPicture( ...@@ -237,24 +271,19 @@ void VaapiVideoDecodeAccelerator::OutputPicture(
} }
} }
void VaapiVideoDecodeAccelerator::TryOutputSurface() { void VaapiVideoDecodeAccelerator::TryOutputPicture() {
DCHECK(task_runner_->BelongsToCurrentThread()); DCHECK(task_runner_->BelongsToCurrentThread());
// Handle Destroy() arriving while pictures are queued for output. // Handle Destroy() arriving while pictures are queued for output.
if (!client_) if (!client_)
return; return;
if (pending_output_cbs_.empty() || output_buffers_.empty()) if (pending_output_cbs_.empty() || available_picture_buffers_.empty())
return; return;
OutputCB output_cb = std::move(pending_output_cbs_.front()); auto output_cb = std::move(pending_output_cbs_.front());
pending_output_cbs_.pop(); pending_output_cbs_.pop();
std::move(output_cb).Run();
DCHECK(base::ContainsKey(pictures_, output_buffers_.front()));
VaapiPicture* picture = pictures_[output_buffers_.front()].get();
output_buffers_.pop();
output_cb.Run(picture);
if (finish_flush_pending_ && pending_output_cbs_.empty()) if (finish_flush_pending_ && pending_output_cbs_.empty())
FinishFlush(); FinishFlush();
...@@ -550,6 +579,8 @@ void VaapiVideoDecodeAccelerator::RecycleVASurfaceID( ...@@ -550,6 +579,8 @@ void VaapiVideoDecodeAccelerator::RecycleVASurfaceID(
available_va_surfaces_.push_back(va_surface_id); available_va_surfaces_.push_back(va_surface_id);
surfaces_available_.Signal(); surfaces_available_.Signal();
TryOutputPicture();
} }
void VaapiVideoDecodeAccelerator::AssignPictureBuffers( void VaapiVideoDecodeAccelerator::AssignPictureBuffers(
...@@ -558,8 +589,7 @@ void VaapiVideoDecodeAccelerator::AssignPictureBuffers( ...@@ -558,8 +589,7 @@ void VaapiVideoDecodeAccelerator::AssignPictureBuffers(
base::AutoLock auto_lock(lock_); base::AutoLock auto_lock(lock_);
DCHECK(pictures_.empty()); DCHECK(pictures_.empty());
while (!output_buffers_.empty()) available_picture_buffers_.clear();
output_buffers_.pop();
RETURN_AND_NOTIFY_ON_FAILURE( RETURN_AND_NOTIFY_ON_FAILURE(
buffers.size() >= requested_num_pics_, buffers.size() >= requested_num_pics_,
...@@ -569,11 +599,6 @@ void VaapiVideoDecodeAccelerator::AssignPictureBuffers( ...@@ -569,11 +599,6 @@ void VaapiVideoDecodeAccelerator::AssignPictureBuffers(
const unsigned int va_format = GetVaFormatForVideoCodecProfile(profile_); const unsigned int va_format = GetVaFormatForVideoCodecProfile(profile_);
std::vector<VASurfaceID> va_surface_ids; std::vector<VASurfaceID> va_surface_ids;
RETURN_AND_NOTIFY_ON_FAILURE(
vaapi_wrapper_->CreateSurfaces(va_format, requested_pic_size_,
buffers.size(), &va_surface_ids),
"Failed creating VA Surfaces", PLATFORM_FAILURE, );
DCHECK_EQ(va_surface_ids.size(), buffers.size());
for (size_t i = 0; i < buffers.size(); ++i) { for (size_t i = 0; i < buffers.size(); ++i) {
DCHECK(requested_pic_size_ == buffers[i].size()); DCHECK(requested_pic_size_ == buffers[i].size());
...@@ -587,16 +612,42 @@ void VaapiVideoDecodeAccelerator::AssignPictureBuffers( ...@@ -587,16 +612,42 @@ void VaapiVideoDecodeAccelerator::AssignPictureBuffers(
RETURN_AND_NOTIFY_ON_FAILURE( RETURN_AND_NOTIFY_ON_FAILURE(
picture->Allocate(vaapi_picture_factory_->GetBufferFormat()), picture->Allocate(vaapi_picture_factory_->GetBufferFormat()),
"Failed to allocate memory for a VaapiPicture", PLATFORM_FAILURE, ); "Failed to allocate memory for a VaapiPicture", PLATFORM_FAILURE, );
output_buffers_.push(buffers[i].id()); available_picture_buffers_.push_back(buffers[i].id());
VASurfaceID va_surface_id = picture->va_surface_id();
if (va_surface_id != VA_INVALID_ID)
va_surface_ids.push_back(va_surface_id);
} }
DCHECK(!base::ContainsKey(pictures_, buffers[i].id())); DCHECK(!base::ContainsKey(pictures_, buffers[i].id()));
pictures_[buffers[i].id()] = std::move(picture); pictures_[buffers[i].id()] = std::move(picture);
available_va_surfaces_.push_back(va_surface_ids[i]);
surfaces_available_.Signal(); surfaces_available_.Signal();
} }
decode_using_client_picture_buffers_ = !va_surface_ids.empty() &&
IsKabyLakeOrLater() &&
profile_ == VP9PROFILE_PROFILE0;
// If we have some |va_surface_ids|, use them for decode, otherwise ask
// |vaapi_wrapper_| to allocate them for us.
if (decode_using_client_picture_buffers_) {
RETURN_AND_NOTIFY_ON_FAILURE(
vaapi_wrapper_->CreateContext(va_format, requested_pic_size_,
va_surface_ids),
"Failed creating VA Context", PLATFORM_FAILURE, );
} else {
va_surface_ids.clear();
RETURN_AND_NOTIFY_ON_FAILURE(
vaapi_wrapper_->CreateSurfaces(va_format, requested_pic_size_,
buffers.size(), &va_surface_ids),
"Failed creating VA Surfaces", PLATFORM_FAILURE, );
}
DCHECK_EQ(va_surface_ids.size(), buffers.size());
for (const auto id : va_surface_ids)
available_va_surfaces_.push_back(id);
// Resume DecodeTask if it is still in decoding state. // Resume DecodeTask if it is still in decoding state.
if (state_ == kDecoding) { if (state_ == kDecoding) {
decoder_thread_task_runner_->PostTask( decoder_thread_task_runner_->PostTask(
...@@ -667,8 +718,11 @@ void VaapiVideoDecodeAccelerator::ReusePictureBuffer( ...@@ -667,8 +718,11 @@ void VaapiVideoDecodeAccelerator::ReusePictureBuffer(
--num_frames_at_client_; --num_frames_at_client_;
TRACE_COUNTER1("media,gpu", "Vaapi frames at client", num_frames_at_client_); TRACE_COUNTER1("media,gpu", "Vaapi frames at client", num_frames_at_client_);
output_buffers_.push(picture_buffer_id); {
TryOutputSurface(); base::AutoLock auto_lock(lock_);
available_picture_buffers_.push_back(picture_buffer_id);
}
TryOutputPicture();
} }
void VaapiVideoDecodeAccelerator::FlushTask() { void VaapiVideoDecodeAccelerator::FlushTask() {
...@@ -884,7 +938,7 @@ void VaapiVideoDecodeAccelerator::VASurfaceReady( ...@@ -884,7 +938,7 @@ void VaapiVideoDecodeAccelerator::VASurfaceReady(
base::Bind(&VaapiVideoDecodeAccelerator::OutputPicture, weak_this_, base::Bind(&VaapiVideoDecodeAccelerator::OutputPicture, weak_this_,
va_surface, bitstream_id, visible_rect)); va_surface, bitstream_id, visible_rect));
TryOutputSurface(); TryOutputPicture();
} }
scoped_refptr<VASurface> VaapiVideoDecodeAccelerator::CreateVASurface() { scoped_refptr<VASurface> VaapiVideoDecodeAccelerator::CreateVASurface() {
...@@ -895,12 +949,32 @@ scoped_refptr<VASurface> VaapiVideoDecodeAccelerator::CreateVASurface() { ...@@ -895,12 +949,32 @@ scoped_refptr<VASurface> VaapiVideoDecodeAccelerator::CreateVASurface() {
return nullptr; return nullptr;
DCHECK(!awaiting_va_surfaces_recycle_); DCHECK(!awaiting_va_surfaces_recycle_);
scoped_refptr<VASurface> va_surface(new VASurface( if (!decode_using_client_picture_buffers_) {
available_va_surfaces_.front(), requested_pic_size_, const VASurfaceID id = available_va_surfaces_.front();
vaapi_wrapper_->va_surface_format(), va_surface_release_cb_)); available_va_surfaces_.pop_front();
available_va_surfaces_.pop_front(); return new VASurface(id, requested_pic_size_,
vaapi_wrapper_->va_surface_format(),
return va_surface; va_surface_release_cb_);
}
// Find the first |available_va_surfaces_| id such that the associated
// |pictures_| entry is marked as |available_picture_buffers_|. In practice,
// we will quickly find an available |va_surface_id|.
for (const VASurfaceID va_surface_id : available_va_surfaces_) {
for (const auto& id_and_picture : pictures_) {
if (id_and_picture.second->va_surface_id() == va_surface_id &&
base::ContainsValue(available_picture_buffers_,
id_and_picture.first)) {
// Remove |va_surface_id| from the list of availables, and use the id
// to return a new VASurface.
base::Erase(available_va_surfaces_, va_surface_id);
return new VASurface(va_surface_id, requested_pic_size_,
vaapi_wrapper_->va_surface_format(),
va_surface_release_cb_);
}
}
}
return nullptr;
} }
// static // static
......
...@@ -159,18 +159,16 @@ class MEDIA_GPU_EXPORT VaapiVideoDecodeAccelerator ...@@ -159,18 +159,16 @@ class MEDIA_GPU_EXPORT VaapiVideoDecodeAccelerator
// or return false on failure. // or return false on failure.
bool InitializeFBConfig(); bool InitializeFBConfig();
// Callback to be executed once we have a |va_surface| to be output and // Callback to be executed once we have a |va_surface| to be output and an
// an available |picture| to use for output. // available VaapiPicture in |available_picture_buffers_| for output. Puts
// Puts contents of |va_surface| into given |picture|, releases the surface // contents of |va_surface| into the latter, releases the surface and passes
// and passes the resulting picture to client to output the given // the resulting picture to |client_| along with |visible_rect|.
// |visible_rect| part of it.
void OutputPicture(const scoped_refptr<VASurface>& va_surface, void OutputPicture(const scoped_refptr<VASurface>& va_surface,
int32_t input_id, int32_t input_id,
gfx::Rect visible_rect, gfx::Rect visible_rect);
VaapiPicture* picture);
// Try to OutputPicture() if we have both a ready surface and picture. // Try to OutputPicture() if we have both a ready surface and picture.
void TryOutputSurface(); void TryOutputPicture();
// Called when a VASurface is no longer in use by the decoder or is not being // Called when a VASurface is no longer in use by the decoder or is not being
// synced/waiting to be synced to a picture. Returns it to available surfaces // synced/waiting to be synced to a picture. Returns it to available surfaces
...@@ -198,12 +196,13 @@ class MEDIA_GPU_EXPORT VaapiVideoDecodeAccelerator ...@@ -198,12 +196,13 @@ class MEDIA_GPU_EXPORT VaapiVideoDecodeAccelerator
kDestroying, kDestroying,
}; };
// Protects input buffer and surface queues and state_. // |lock_| protects |input_buffers_|, |curr_input_buffer_|, |state_| and
// |available_picture_buffers_|.
base::Lock lock_; base::Lock lock_;
State state_; State state_;
Config::OutputMode output_mode_; Config::OutputMode output_mode_;
// Queue of available InputBuffers (picture_buffer_ids). // Queue of available InputBuffers.
base::queue<std::unique_ptr<InputBuffer>> input_buffers_; base::queue<std::unique_ptr<InputBuffer>> input_buffers_;
// Signalled when input buffers are queued onto |input_buffers_| queue. // Signalled when input buffers are queued onto |input_buffers_| queue.
base::ConditionVariable input_ready_; base::ConditionVariable input_ready_;
...@@ -211,9 +210,9 @@ class MEDIA_GPU_EXPORT VaapiVideoDecodeAccelerator ...@@ -211,9 +210,9 @@ class MEDIA_GPU_EXPORT VaapiVideoDecodeAccelerator
// Current input buffer at decoder. // Current input buffer at decoder.
std::unique_ptr<InputBuffer> curr_input_buffer_; std::unique_ptr<InputBuffer> curr_input_buffer_;
// Queue for incoming output buffers (texture ids). // List of PictureBuffer ids available to be sent to |client_| via
using OutputBuffers = base::queue<int32_t>; // OutputPicture() (|client_| returns them via ReusePictureBuffer()).
OutputBuffers output_buffers_; std::list<int32_t> available_picture_buffers_;
std::unique_ptr<VaapiPictureFactory> vaapi_picture_factory_; std::unique_ptr<VaapiPictureFactory> vaapi_picture_factory_;
...@@ -227,11 +226,10 @@ class MEDIA_GPU_EXPORT VaapiVideoDecodeAccelerator ...@@ -227,11 +226,10 @@ class MEDIA_GPU_EXPORT VaapiVideoDecodeAccelerator
// pictures are destroyed before this is destroyed. // pictures are destroyed before this is destroyed.
base::small_map<std::map<int32_t, std::unique_ptr<VaapiPicture>>> pictures_; base::small_map<std::map<int32_t, std::unique_ptr<VaapiPicture>>> pictures_;
// VA Surfaces no longer in use that can be passed back to the decoder for // VASurfaceIDs no longer in use that can be passed back to |decoder_| for
// reuse, once it requests them. // reuse, once it requests them.
std::list<VASurfaceID> available_va_surfaces_; std::list<VASurfaceID> available_va_surfaces_;
// Signalled when output surfaces are queued onto the available_va_surfaces_ // Signalled when output surfaces are queued into |available_va_surfaces_|.
// queue.
base::ConditionVariable surfaces_available_; base::ConditionVariable surfaces_available_;
// Pending output requests from the decoder. When it indicates that we should // Pending output requests from the decoder. When it indicates that we should
...@@ -240,11 +238,13 @@ class MEDIA_GPU_EXPORT VaapiVideoDecodeAccelerator ...@@ -240,11 +238,13 @@ class MEDIA_GPU_EXPORT VaapiVideoDecodeAccelerator
// will put the contents of the surface into the picture and return it to // will put the contents of the surface into the picture and return it to
// the client, releasing the surface as well. // the client, releasing the surface as well.
// If we don't have any available |pictures_| at the time when the decoder // If we don't have any available |pictures_| at the time when the decoder
// requests output, we'll store the request on pending_output_cbs_ queue for // requests output, we'll store the request in this queue for later and run it
// later and run it once the client gives us more textures // once the client gives us more textures via ReusePictureBuffer().
// via ReusePictureBuffer(). base::queue<base::OnceClosure> pending_output_cbs_;
using OutputCB = base::Callback<void(VaapiPicture*)>;
base::queue<OutputCB> pending_output_cbs_; // Under some circumstances, we can pass to libva our own VASurfaceIDs to
// decode onto, which skips one copy.
bool decode_using_client_picture_buffers_;
// ChildThread's task runner. // ChildThread's task runner.
const scoped_refptr<base::SingleThreadTaskRunner> task_runner_; const scoped_refptr<base::SingleThreadTaskRunner> task_runner_;
......
...@@ -233,7 +233,6 @@ class VaapiVideoDecodeAcceleratorTest : public TestWithParam<VideoCodecProfile>, ...@@ -233,7 +233,6 @@ class VaapiVideoDecodeAcceleratorTest : public TestWithParam<VideoCodecProfile>,
ASSERT_TRUE(vda_.curr_input_buffer_) ASSERT_TRUE(vda_.curr_input_buffer_)
<< "QueueInputBuffer() should have been called"; << "QueueInputBuffer() should have been called";
::testing::InSequence s;
base::RunLoop run_loop; base::RunLoop run_loop;
base::Closure quit_closure = run_loop.QuitClosure(); base::Closure quit_closure = run_loop.QuitClosure();
...@@ -249,6 +248,7 @@ class VaapiVideoDecodeAcceleratorTest : public TestWithParam<VideoCodecProfile>, ...@@ -249,6 +248,7 @@ class VaapiVideoDecodeAcceleratorTest : public TestWithParam<VideoCodecProfile>,
MockCreateVaapiPicture(mock_vaapi_wrapper_.get(), picture_size)) MockCreateVaapiPicture(mock_vaapi_wrapper_.get(), picture_size))
.Times(num_pictures); .Times(num_pictures);
::testing::InSequence s;
EXPECT_CALL(*mock_decoder_, Decode()) EXPECT_CALL(*mock_decoder_, Decode())
.WillOnce(Return(AcceleratedVideoDecoder::kRanOutOfStreamData)); .WillOnce(Return(AcceleratedVideoDecoder::kRanOutOfStreamData));
EXPECT_CALL(*this, NotifyEndOfBitstreamBuffer(bitstream_id)) EXPECT_CALL(*this, NotifyEndOfBitstreamBuffer(bitstream_id))
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment