Vulkan video shenanigans – FFmpeg + RADV integration experiments

Vulkan video is finally here and it’s a fierce battle to get things working fully. The leaders of the pack right now with the full release are RADV (Dave Airlie) and FFmpeg (Lynne).

In Granite, I’ve been wanting a solid GPU video decoding solution and I figured I’d work on a Vulkan video implementation over the holidays to help iron out any kinks with real-world application integration. The goal was to achieve everything a 3D engine could potentially want out of video decode:

  • Hardware accelerated
  • GPU decode to RGB without round-trip through system memory (with optional mip generation when placed in a 3D world)
  • Audio decode
  • A/V synchronization

This blog is mostly here to demonstrate the progress in FFmpeg + RADV. I made a neat little sample app that fully uses Vulkan video to do a simple Sponza cinema. It supports A/V sync and seeking, which covers most of what a real media player would need. Ideally, this can be used as a test bench.

Place a video feed as a 3D object inside Sponza, why not?

Introduction blog post – read this first

This blog post by Lynne summarizes the state of Vulkan video at the time it was written. Note that none of this is merged upstream as of writing and APIs are changing rapidly.

Building FFmpeg + RADV + Granite

FFmpeg

Make sure to install the very latest Vulkan headers. On Arch Linux, install vulkan-headers-git from AUR for example.

Check out the branch in the blog and build. Make sure to install it in some throwaway prefix, e.g.

./configure --disable-doc --disable-shared --enable-static --disable-ffplay --disable-ffprobe --enable-vulkan --prefix=$HOME/ffmpeg-vulkan

Mesa

Check out https://gitlab.freedesktop.org/airlied/mesa/-/commits/radv-vulkan-video-decode. Then build with:

mkdir build
cd build
meson setup .. -Dvideo-codecs=h264dec,h265dec --buildtype release
ninja

Granite

git clone https://github.com/Themaister/Granite
cd Granite
git submodule update --init
mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DGRANITE_FFMPEG=ON -DGRANITE_AUDIO=ON -DGRANITE_FFMPEG_VULKAN=ON -G Ninja -DCMAKE_PREFIX_PATH=$HOME/ffmpeg-vulkan
ninja video-player

Running test app

Basic operation is a weird video player where the image is a flat 3D object floating in space. For fun, the video is also mip-mapped and the plane is anisotropically filtered, because why not.

RADV_PERFTEST=video_decode GRANITE_FFMPEG_VULKAN=1 ./tests/video-player /tmp/test.mkv

Controls
  • WASD: move camera
  • Arrow keys: rotate camera
  • Space: Toggle pause
  • HJKL: Vim style for seeking

If you have https://github.com/KhronosGroup/glTF-Sample-Models checked out you can add a glTF scene as well for fun. I hacked it together with Sponza in mind, so:

RADV_PERFTEST=video_decode GRANITE_FFMPEG_VULKAN=1 ./tests/video-player $HOME/git/glTF-Sample-Models/2.0/Sponza/glTF/Sponza.gltf /tmp/test.mkv

and then you get the screenshot above with whatever video you’re using 🙂

Integration API

The Granite implementation can be found in https://github.com/Themaister/Granite/blob/master/video/ffmpeg_decode.cpp. It will probably be different in the final upstreamed version, so beware. I’m not an FFmpeg developer either FWIW, so take this implementation with a few grains of salt.

To integrate with Vulkan video, there are some steps we need to take. This assumes some familiarity with FFmpeg APIs. This is mostly interesting for non-FFmpeg developers. I had to figure this out with help from Lynne, spelunking in mpv and looking over the hardware decode samples in FFmpeg upstream.

Creating shared device

Before opening the decode context with:

avcodec_open2(ctx, codec, nullptr)

we will provide libavcodec with a hardware device context. Use

avcodec_get_hw_config(codec, index)

to scan through the codec’s supported hardware configurations until you find a Vulkan one.
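
A minimal sketch of that scan, where hw.config is simply wherever we stash the chosen configuration for later use:

const AVCodecHWConfig *config = nullptr;
for (int i = 0; (config = avcodec_get_hw_config(codec, i)) != nullptr; i++)
{
   if (config->device_type == AV_HWDEVICE_TYPE_VULKAN &&
       (config->methods & AV_CODEC_HW_CONFIG_METHOD_HW_DEVICE_CTX) != 0)
   {
      hw.config = config; // Remembered for get_format() and frame handling later.
      break;
   }
}

With a configuration in hand, allocate the device context: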

AVBufferRef *hw_dev = av_hwdevice_ctx_alloc(config->device_type);
auto *hwctx = reinterpret_cast<AVHWDeviceContext *>(hw_dev->data);
auto *vk = static_cast<AVVulkanDeviceContext *>(hwctx->hwctx);

hwctx->user_opaque = this; // For callbacks later.

To interoperate with FFmpeg, we have to provide it our own Vulkan device and lots of information about how we created the device.

vk->get_proc_addr = Vulkan::Context::get_instance_proc_addr();
vk->inst = device->get_instance();
vk->act_dev = device->get_device();
vk->phys_dev = device->get_physical_device();
vk->device_features = *device->get_device_features().pdf2;
vk->enabled_inst_extensions =
  device->get_device_features().instance_extensions;
vk->nb_enabled_inst_extensions =
  int(device->get_device_features().num_instance_extensions);
vk->enabled_dev_extensions =
  device->get_device_features().device_extensions;
vk->nb_enabled_dev_extensions =
  int(device->get_device_features().num_device_extensions);

Fortunately, I had most of this query scaffolding in place for Fossilize integration already. Vulkan 1.3 core is required here as well, so I had to bump that too when Vulkan video is enabled.

auto &q = device->get_queue_info();

vk->queue_family_index =
  int(q.family_indices[Vulkan::QUEUE_INDEX_GRAPHICS]);
vk->queue_family_comp_index =
  int(q.family_indices[Vulkan::QUEUE_INDEX_COMPUTE]);
vk->queue_family_tx_index =
  int(q.family_indices[Vulkan::QUEUE_INDEX_TRANSFER]);
vk->queue_family_decode_index =
  int(q.family_indices[Vulkan::QUEUE_INDEX_VIDEO_DECODE]);

vk->nb_graphics_queues = int(q.counts[Vulkan::QUEUE_INDEX_GRAPHICS]);
vk->nb_comp_queues = int(q.counts[Vulkan::QUEUE_INDEX_COMPUTE]);
vk->nb_tx_queues = int(q.counts[Vulkan::QUEUE_INDEX_TRANSFER]);
vk->nb_decode_queues = int(q.counts[Vulkan::QUEUE_INDEX_VIDEO_DECODE]);

vk->queue_family_encode_index = -1;
vk->nb_encode_queues = 0;

We need to let FFmpeg know how it can query queues. This is a close match with Granite’s own queue setup, but I had to add some extra APIs to make it work.

We also need a way to lock Vulkan queues:

vk->lock_queue = [](AVHWDeviceContext *ctx, int, int) {
   auto *self = static_cast<Impl *>(ctx->user_opaque);
   self->device->external_queue_lock();
};

vk->unlock_queue = [](AVHWDeviceContext *ctx, int, int) {
   auto *self = static_cast<Impl *>(ctx->user_opaque);
   self->device->external_queue_unlock();
};

For integration purposes, not making vkQueueSubmit internally synchronized in Vulkan was a mistake, I think. Oh well.

Once we’ve created a hardware context, we can let the codec context borrow it:

av_hwdevice_ctx_init(hw_dev); // Check error.
hw.device = hw_dev; // Unref later.
ctx->hw_device_ctx = av_buffer_ref(hw.device);

We also have to override get_format() and return the hardware pixel format.

ctx->opaque = this;
ctx->get_format = [](
    AVCodecContext *ctx,
    const enum AVPixelFormat *pix_fmts) -> AVPixelFormat {
  auto *self = static_cast<Impl *>(ctx->opaque);
  while (*pix_fmts != AV_PIX_FMT_NONE)
  {
    if (*pix_fmts == self->hw.config->pix_fmt)
      return *pix_fmts;
    pix_fmts++;
  }

  return AV_PIX_FMT_NONE;
};

This will work, but we’re also supposed to create a frames context before returning from get_format(). This also lets us configure how Vulkan images are created.

int ret = avcodec_get_hw_frames_parameters(
      ctx, ctx->hw_device_ctx,
      AV_PIX_FMT_VULKAN, &ctx->hw_frames_ctx);
// Check error.

auto *frames =
  reinterpret_cast<AVHWFramesContext *>(ctx->hw_frames_ctx->data);
auto *vk = static_cast<AVVulkanFramesContext *>(frames->hwctx);

vk->img_flags |= VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT;

ret = av_hwframe_ctx_init(ctx->hw_frames_ctx);
// Check error.

The primary motivation for overriding image creation was that I wanted to do YCbCr to RGB conversion in a more unified way, i.e. using individual planes. That would be compatible with non-Vulkan video as well, but taking plane views of an image requires VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT.

Using per-plane views is important, as we’ll see later. YCbCr samplers fall flat when dealing with practical video use cases.

Processing AVFrames

In FFmpeg, decoding works by sending AVPackets to a codec, and it spits out AVFrame objects. If these frames are emitted by a software codec, we just poke at AVFrame::data[] directly, but with hardware decoders, AVFrame::format is an opaque hardware pixel format and the data pointers cannot be dereferenced directly.
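
As a rough sketch of that loop (error handling and end-of-stream draining omitted; pkt is a packet pulled from the demuxer and process_video_frame() is a hypothetical callback):

if (avcodec_send_packet(ctx, pkt) >= 0)
{
   AVFrame *av_frame = av_frame_alloc();
   while (avcodec_receive_frame(ctx, av_frame) >= 0)
   {
      // With the Vulkan hwaccel active, av_frame->format is AV_PIX_FMT_VULKAN here.
      process_video_frame(av_frame);
      av_frame_unref(av_frame);
   }
   av_frame_free(&av_frame);
}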

There are two ways we can deal with this. For non-Vulkan hardware decoders, we just read back to system memory and upload the planes to a VkBuffer staging buffer later, ewwww.

AVFrame *sw_frame = av_frame_alloc();

if (av_hwframe_transfer_data(sw_frame, av_frame, 0) < 0)
{
   LOGE("Failed to transfer HW frame.\n");
   av_frame_free(&sw_frame);
   av_frame_free(&av_frame);
}
else
{
   sw_frame->pts = av_frame->pts;
   av_frame_free(&av_frame);
   av_frame = sw_frame;
}

Each hardware pixel format lets you reinterpret AVFrame::data[] in a “magical” way if you’re willing to poke into low-level data structures. For VAAPI, VDPAU and APIs like that there are ways to use buffer sharing somehow, but the details are extremely hairy and are best left to experts. For Vulkan, we don’t even need external memory!

First, we need to extract the decode format:

auto *frames =
  reinterpret_cast<AVHWFramesContext *>(ctx->hw_frames_ctx->data);
active_upload_pix_fmt = frames->sw_format;

Then we can query the VkFormat if we want to stay multi-plane.

auto *hwdev =
  reinterpret_cast<AVHWDeviceContext *>(hw.device->data);
const VkFormat *fmts = nullptr;
VkImageAspectFlags aspects;
VkImageUsageFlags usage;
int nb_images;

int ret = av_vkfmt_from_pixfmt2(hwdev, active_upload_pix_fmt,
                                VK_IMAGE_USAGE_SAMPLED_BIT, &fmts,
                                &nb_images, &aspects, &usage);

However, this has some pitfalls in practice. Video frames tend to be aligned to a macroblock size or similar, meaning that the VkImage dimensions might not be equal to the actual size we’re supposed to display. Even 1080p falls into this category, since 1080 does not cleanly divide into 16×16 macroblocks. The only way to resolve this without extra copies is to view the planes separately with VK_IMAGE_ASPECT_PLANE_n_BIT and do texture coordinate clamping manually. This way we avoid sampling garbage when converting to RGB. av_vkfmt_from_pixfmt can help here to deduce the per-plane Vulkan formats, but I just did it manually either way.

// Real output size.
ubo.resolution = uvec2(video.av_ctx->width, video.av_ctx->height);

if (video.av_ctx->hw_frames_ctx && hw.config &&
    hw.config->device_type == AV_HWDEVICE_TYPE_VULKAN)
{
   // Frames (VkImages) may be padded.
   auto *frames = reinterpret_cast<AVHWFramesContext *>(
       video.av_ctx->hw_frames_ctx->data);
   ubo.inv_resolution = vec2(
       1.0f / float(frames->width),
       1.0f / float(frames->height));
}
else
{
   ubo.inv_resolution = vec2(1.0f / float(video.av_ctx->width),
                             1.0f / float(video.av_ctx->height));
}

// Have to emulate CLAMP_TO_EDGE to avoid filtering against garbage.
ubo.chroma_clamp =
  (vec2(ubo.resolution) - 0.5f * float(1u << plane_subsample_log2[1])) *
  ubo.inv_resolution;

Processing the frame itself starts with magic casts:

auto *frames =
  reinterpret_cast<AVHWFramesContext *>(ctx->hw_frames_ctx->data);
auto *vk = static_cast<AVVulkanFramesContext *>(frames->hwctx);
auto *vk_frame = reinterpret_cast<AVVkFrame *>(av_frame->data[0]);

We have to lock the frame while accessing it, since FFmpeg is threaded.

vk->lock_frame(frames, vk_frame);
// Do stuff
vk->unlock_frame(frames, vk_frame);

Now, we have to wait on the timeline semaphore (note that Vulkan 1.3 is required, so this is guaranteed to be supported).

// Acquire the image from FFmpeg.
if (vk_frame->sem[0] != VK_NULL_HANDLE && vk_frame->sem_value[0])
{
   // vkQueueSubmit(wait = sem[0], value = sem_value[0])
}
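
Granite routes this through its own submission machinery, but as a bare Vulkan 1.3 sketch the acquire is just a timeline semaphore wait in vkQueueSubmit2, with queue being whichever queue does the conversion work:

VkSemaphoreSubmitInfo wait_info = { VK_STRUCTURE_TYPE_SEMAPHORE_SUBMIT_INFO };
wait_info.semaphore = vk_frame->sem[0];
wait_info.value = vk_frame->sem_value[0];
wait_info.stageMask = VK_PIPELINE_STAGE_2_COMPUTE_SHADER_BIT;

VkSubmitInfo2 submit = { VK_STRUCTURE_TYPE_SUBMIT_INFO_2 };
submit.waitSemaphoreInfoCount = 1;
submit.pWaitSemaphoreInfos = &wait_info;
// Command buffers doing the actual conversion work go in the same submit.
vkQueueSubmit2(queue, 1, &submit, VK_NULL_HANDLE);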

Create a VkImageView from the provided image. Based on av_vkfmt_from_pixfmt2 or per-plane formats from earlier, we know the appropriate Vulkan format to use when creating a view.
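
As a rough sketch, a per-plane view of the luma plane could look like this, assuming the frame is a single multi-planar image (nb_images == 1 from the earlier query) and an 8-bit format like NV12. This is exactly what the earlier VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT enables:

VkImageViewCreateInfo view_info = { VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO };
view_info.image = vk_frame->img[0];
view_info.viewType = VK_IMAGE_VIEW_TYPE_2D;
view_info.format = VK_FORMAT_R8_UNORM; // Plane 0 of NV12; plane 1 would be R8G8_UNORM.
view_info.subresourceRange.aspectMask = VK_IMAGE_ASPECT_PLANE_0_BIT;
view_info.subresourceRange.levelCount = 1;
view_info.subresourceRange.layerCount = 1;

VkImageView luma_view;
vkCreateImageView(device, &view_info, nullptr, &luma_view);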

Queue family ownership transfer is not needed. FFmpeg uses VK_SHARING_MODE_CONCURRENT for the sake of our sanity.

Transition the layout:

cmd->image_barrier(
    *wrapped_image,
    vk_frame->layout[0],
    VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL,
    VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT /* sem wait stage */, 0,
    VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,
    VK_ACCESS_2_SHADER_SAMPLED_READ_BIT);

vk_frame->layout[0] = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;

Now, we can convert this to RGB as we desire. I went with an async compute formulation. If this were a pure video player we could probably blit this directly to screen with some fancy scaling filters.
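
For reference, the conversion itself is just the standard YCbCr matrix for the stream’s colorspace. Here’s a minimal sketch of the BT.709 limited-range math, written as plain C++ for illustration; the real version lives in a compute shader, and the coefficients and range should come from the stream’s colorspace metadata rather than being hard-coded:

// Hypothetical helper: one BT.709 limited-range YCbCr sample to RGB.
// y, cb, cr are normalized [0, 1] values sampled from the per-plane views.
static inline void ycbcr_bt709_to_rgb(float y, float cb, float cr,
                                      float &r, float &g, float &b)
{
   // Expand the limited range (16..235 luma, 16..240 chroma for 8-bit content).
   float Y  = (y  * 255.0f - 16.0f)  / 219.0f;
   float Cb = (cb * 255.0f - 128.0f) / 224.0f;
   float Cr = (cr * 255.0f - 128.0f) / 224.0f;

   r = Y + 1.5748f * Cr;
   g = Y - 0.1873f * Cb - 0.4681f * Cr;
   b = Y + 1.8556f * Cb;
}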

When we’re done, we have to “release” the image back to FFmpeg.

// Release the image back to FFmpeg.
if (vk_frame->sem[0] != VK_NULL_HANDLE)
{
   vk_frame->sem_value[0] += 1;
   // vkQueueSubmit(signal = sem[0], value = sem_value[0]);
}

And that’s it!

Test results

I tried various codec configurations to see the state of things.

RADV
  • H.264 – 8bit: Works
  • H.264 – 10bit: Not supported by hardware
  • H.265 – 8bit: Works
  • H.265 – 10bit: Works
nvidia
  • H.264: Broken
  • H.265: Seems to work
ANV

There’s a preliminary branch by Airlie again, but it doesn’t seem to have been updated for the final spec yet.

Conclusion

Exciting times for Vulkan video. The API is ridiculously low level and way too complicated for mere graphics programming mortals, which is why having first-class support in FFmpeg and friends will be so important to make the API usable.