Important Vulkan tips

April 12, 2016

After a week of playing around with Vulkan, here are four tips I've learned. They may not be very obvious (some aren't clearly documented), but I think they're important to know.

1. Use buffers as staging areas to initialize images

The crux of this is that we need a function, vkCmdCopyBufferToImage, to implement staging images in Vulkan. It's almost impossible to do any graphics work with Vulkan without doing this -- but at first glance it might seem a bit counter-intuitive. First, a bit of background.

Vulkan images have a "tiling" property, which can be either "linear" or "optimal." This property relates to how texels are arranged in memory within a single mip level.

In linear tiling, texels are arranged row by row, and column by column within each row. So a texel's linear address follows this pattern:

address = (y*rowPitch) + x*bitsPerPixel/8

When we have images on disk, we usually expect them to have this tiling.
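
As a minimal sketch of the formula above (the function name is made up for illustration, and it assumes texels occupy a whole number of bytes), the address calculation looks like this in C++:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Byte offset of texel (x, y) inside one linearly tiled mip level.
// rowPitch is the distance in bytes between the starts of two consecutive
// rows; it can be larger than width * bytesPerTexel because of padding.
// NOTE: linearTexelOffset is a hypothetical helper, not a Vulkan function.
std::size_t linearTexelOffset(std::uint32_t x, std::uint32_t y,
                              std::size_t rowPitch, std::uint32_t bitsPerPixel)
{
    assert(bitsPerPixel % 8 == 0); // this sketch only handles whole-byte texels
    return static_cast<std::size_t>(y) * rowPitch
         + static_cast<std::size_t>(x) * (bitsPerPixel / 8);
}
```

With a 32 bit format and a rowPitch of 1024 bytes, stepping one texel to the right moves 4 bytes, while stepping one texel down moves 1024 bytes.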

However, this has a big problem. Texels in the same row will be near each other in memory. But texels in the same column will be quite separate in memory.

Given that images can be mapped arbitrarily onto the final 2D triangles, we can't control the order in which the GPU will access the texels. So this is a big issue for the memory cache.

So all GPUs are able to rearrange the texels into a "swizzled" order that tries to guarantee that nearby texels in 2D (or 3D) image space will be nearby in linear memory. This is called "optimal" tiling in Vulkan.

Optimal tiling is going to be much faster in almost all cases. Some GPUs have limited support for reading linear images in shaders -- but this will rarely be a good idea. The main advantage of linear tiling is that it's easy to write the CPU code that reads and writes image memory. Also, optimal tiling isn't defined as any particular pattern -- it's vendor specific. I have a feeling that it might actually be the same pattern for all hardware (that is, hierarchically packing into 2x2 or 2x2x2 blocks), but I don't know that for sure.
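
To make "hierarchically packing into 2x2 blocks" concrete, here is a sketch of Morton (Z-order) indexing, which is one well-known pattern of that kind. This is purely an illustration: Vulkan doesn't define what optimal tiling looks like, and nothing here claims that any real GPU uses exactly this layout. The function names are made up.

```cpp
#include <cassert>
#include <cstdint>

// Spread the low 16 bits of v so that bit i moves to bit 2*i.
std::uint32_t spreadBits(std::uint32_t v)
{
    v &= 0x0000FFFFu;
    v = (v | (v << 8)) & 0x00FF00FFu;
    v = (v | (v << 4)) & 0x0F0F0F0Fu;
    v = (v | (v << 2)) & 0x33333333u;
    v = (v | (v << 1)) & 0x55555555u;
    return v;
}

// Texel index in Morton order: the bits of x go to the even positions and
// the bits of y go to the odd positions, so every aligned 2x2 block of
// texels ends up contiguous, as does every aligned 4x4 block, and so on.
// ASSUMPTION: illustrative only; real "optimal" tiling is vendor specific.
std::uint32_t mortonIndex(std::uint32_t x, std::uint32_t y)
{
    return spreadBits(x) | (spreadBits(y) << 1);
}
```

In this ordering the texel directly below (0, 0) sits at index 2 rather than a whole row away, which is the kind of 2D locality the swizzle is after.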

Because optimal tiling isn't defined, we can't pass the initial image data in that tiling. We must initialize the image data in linear tiling.

This is why staging images are required. We initialize a linear staging image in "host visible" memory first. Then we issue a command, and the GPU will copy from the staging image into the final image, swizzling into optimal layout as it goes. It's been like this for many, many years -- but to PC programmers, it may seem new because DirectX hides this behaviour.

So, I said "staging image" above. But what I really meant to say was "staging buffer." Here's the problem -- even though there are "linear" tiling images that can be positioned in "host visible" memory, these are only guaranteed to support a single mip level and a single array layer. So how do we initialize textures with multiple mip levels and multiple array layers?

It seems that we're intended to use a VkBuffer for this. That's the trick -- and every Vulkan programmer needs to know it :).

It may seem strange to do this, but it does make sense... In Vulkan, a VkImage contains functionality for driving the "sample" shader operations. All these concepts of tiling, mip levels, pixel formats, etc, are all related to shader sampling. But a staging texture will never be sampled by a shader. So, in effect, the VkImage concepts are redundant. We just want an area of video memory with no special restrictions (and VkBuffer fits that requirement better).

It does mean we have to implement our own functions for arranging mip levels and array layers within the buffer space. In one sense, this means writing a custom implementation of vkGetImageSubresourceLayout.
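
Here's a rough sketch of what such a layout function could look like. Everything in it is an assumption made for illustration: the names are invented, it handles only uncompressed 2D textures, it packs each subresource tightly, and it assumes bytesPerTexel is a power of two so that every offset can be rounded up to a multiple of both 4 and the texel size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Where one subresource (one mip level of one array layer) lives in the
// staging buffer. SubresourceSlot is a hypothetical type, not a Vulkan one.
struct SubresourceSlot
{
    std::uint32_t mipLevel, arrayLayer;
    std::uint32_t width, height; // dimensions of this mip level
    std::size_t offset;          // byte offset from the start of the buffer
    std::size_t size;            // tightly packed size in bytes
};

// Lays out subresources layer by layer, with the mip chain of each layer
// stored from largest to smallest. totalSize receives the buffer size needed.
std::vector<SubresourceSlot> layoutStagingBuffer(
    std::uint32_t width, std::uint32_t height,
    std::uint32_t mipLevels, std::uint32_t arrayLayers,
    std::uint32_t bytesPerTexel, std::size_t& totalSize)
{
    assert(mipLevels >= 1 && mipLevels <= 32 && arrayLayers >= 1);
    // ASSUMPTION: bytesPerTexel is a power of two (1, 2, 4, 8 or 16).
    const std::size_t alignment = std::max<std::size_t>(4, bytesPerTexel);

    std::vector<SubresourceSlot> slots;
    std::size_t offset = 0;
    for (std::uint32_t layer = 0; layer < arrayLayers; ++layer) {
        for (std::uint32_t mip = 0; mip < mipLevels; ++mip) {
            // Each mip level halves the dimensions, clamped to 1 texel.
            const std::uint32_t w = std::max(1u, width >> mip);
            const std::uint32_t h = std::max(1u, height >> mip);
            const std::size_t size =
                static_cast<std::size_t>(w) * h * bytesPerTexel;
            slots.push_back({mip, layer, w, h, offset, size});
            // Round the next offset up to the chosen alignment.
            offset = (offset + size + alignment - 1) / alignment * alignment;
        }
    }
    totalSize = offset;
    return slots;
}
```

For a 4x4 RGBA8 texture with 3 mip levels and 2 array layers, this places the three mips of layer 0 at offsets 0, 64 and 80, and starts layer 1 at offset 84.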

So, first we create the staging buffer, and initialize the device memory. Then we can issue the copy command with vkCmdCopyBufferToImage. This allows us to copy into the image subresources from arbitrary memory within the buffer. I find the interface for this function a little awkward (and there are some limitations) but it's not too bad.
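
The copy itself is described by an array of VkBufferImageCopy regions, one per subresource. The sketch below fills in a local stand-in struct that mirrors those fields in flattened form, so that it compiles without the Vulkan headers; CopyRegion and makeWholeSubresourceRegion are hypothetical names. In real code you would fill VkBufferImageCopy directly (including imageSubresource.aspectMask) and pass the array to vkCmdCopyBufferToImage.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-in mirroring the fields of VkBufferImageCopy (flattened).
// ASSUMPTION: used only so the sketch builds without <vulkan/vulkan.h>.
struct CopyRegion
{
    std::uint64_t bufferOffset;      // where the texels start in the buffer
    std::uint32_t bufferRowLength;   // 0 means rows are tightly packed
    std::uint32_t bufferImageHeight; // 0 means slices are tightly packed
    std::uint32_t mipLevel;          // imageSubresource.mipLevel
    std::uint32_t baseArrayLayer;    // imageSubresource.baseArrayLayer
    std::uint32_t layerCount;        // imageSubresource.layerCount
    std::int32_t  imageOffsetX, imageOffsetY, imageOffsetZ;
    std::uint32_t extentWidth, extentHeight, extentDepth;
};

// Describes a copy that fills one whole mip level of one array layer of a
// 2D image from tightly packed texels at bufferOffset.
CopyRegion makeWholeSubresourceRegion(std::uint64_t bufferOffset,
                                      std::uint32_t mipLevel,
                                      std::uint32_t arrayLayer,
                                      std::uint32_t mipWidth,
                                      std::uint32_t mipHeight)
{
    // One of the limitations: the offset must be a multiple of 4
    // (and of the texel size, which this sketch does not check).
    assert(bufferOffset % 4 == 0);
    CopyRegion r{};                  // zero-initializes offsets and packing
    r.bufferOffset = bufferOffset;
    r.mipLevel = mipLevel;
    r.baseArrayLayer = arrayLayer;
    r.layerCount = 1;
    r.extentWidth = mipWidth;
    r.extentHeight = mipHeight;
    r.extentDepth = 1;               // 2D image
    return r;
}
```

Leaving bufferRowLength and bufferImageHeight at zero tells Vulkan that the data in the buffer is tightly packed according to the image extent, which matches a staging buffer we laid out ourselves.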

There's a thread on NVIDIA's site that seems to verify that this was intended: https://devtalk.nvidia.com/default/topic/926085/texture-memory-management/.

Anyway, it's an important trick, because (as far as I can tell) this is the only way to create textures with multiple mip levels or multiple array layers in Vulkan!

2. Write custom reflection for SPIR-V bytecode

The SPIR-V bytecode is a very simple format, and it's also an open standard. The bytecode also contains many of the variable and type names from the original source code (in debug instructions such as OpName), along with "decorations" that carry things like binding numbers. The names aren't used during execution -- but they are useful for debugging and reflection.

I've implemented a little bit of code to read SPIR-V bytecode and extract layout bindings and other useful information. This is similar to the ID3DShaderReflection interface in DirectX. But since it's custom coded, it's crazy efficient.
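
To give an idea of how little code this takes, here's a minimal sketch of that kind of reflection (not the actual implementation mentioned above). It walks the instruction stream and records OpName, plus the Binding and DescriptorSet decorations from OpDecorate. ResourceInfo and reflectBindings are made-up names, and the sketch assumes the words are already in host byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

struct ResourceInfo // hypothetical type for this sketch
{
    std::string name;
    int descriptorSet = -1; // -1 means "no such decoration seen"
    int binding = -1;
};

// SPIR-V literal strings are nul-terminated UTF-8, packed four bytes per
// word with the first character in the lowest-order byte.
std::string readLiteralString(const std::uint32_t* words, std::size_t count)
{
    std::string s;
    for (std::size_t i = 0; i < count; ++i) {
        for (int shift = 0; shift < 32; shift += 8) {
            const char c = static_cast<char>((words[i] >> shift) & 0xFFu);
            if (c == '\0') return s;
            s.push_back(c);
        }
    }
    return s;
}

// Maps result ids to whatever names and binding decorations were found.
// Ids that only have a name (functions, locals) keep binding == -1.
std::map<std::uint32_t, ResourceInfo> reflectBindings(
    const std::vector<std::uint32_t>& code)
{
    const std::uint32_t kMagic = 0x07230203u;
    const std::uint32_t kOpName = 5, kOpDecorate = 71;
    const std::uint32_t kBinding = 33, kDescriptorSet = 34;

    std::map<std::uint32_t, ResourceInfo> result;
    if (code.size() < 5 || code[0] != kMagic) return result;

    std::size_t i = 5; // skip the 5-word header
    while (i < code.size()) {
        // First word of each instruction: word count (high 16), opcode (low 16).
        const std::uint32_t wordCount = code[i] >> 16;
        const std::uint32_t opcode = code[i] & 0xFFFFu;
        if (wordCount == 0 || i + wordCount > code.size()) break; // malformed
        const std::uint32_t* ops = code.data() + i + 1;

        if (opcode == kOpName && wordCount >= 3) {
            result[ops[0]].name = readLiteralString(ops + 1, wordCount - 2);
        } else if (opcode == kOpDecorate && wordCount >= 4) {
            if (ops[1] == kBinding)
                result[ops[0]].binding = static_cast<int>(ops[2]);
            else if (ops[1] == kDescriptorSet)
                result[ops[0]].descriptorSet = static_cast<int>(ops[2]);
        }
        i += wordCount;
    }
    return result;
}
```

A fuller version would also follow OpVariable and the type instructions to work out what kind of resource each binding is, but the walking loop stays the same.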

I recommend checking out the following files in the Vulkan SDK as a starting point for working with SPIR-V bytecode:

glslang/SPIRV/disassemble.cpp
glslang/SPIRV/doc.cpp

3. Set the VK_LAYER_PATH variable!

The Vulkan SDK has a bunch of debugging "layers" built in. These are really useful!

But to get them working, you need to set the VK_LAYER_PATH environment variable to your SDK "bin" directory (e.g. C:/VulkanSDK/1.0.5.0/Bin), and maybe do a bunch of other things. This may not be documented anywhere...?!
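
One cheap safeguard is to have the application report at startup whether the variable is visible to the process at all. This is only a sketch with made-up function names; it checks that the variable is set, not that the directory actually contains the layers.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Returns a short diagnostic for a VK_LAYER_PATH value (nullptr = unset).
// describeLayerPath is a hypothetical helper, not part of the Vulkan SDK.
std::string describeLayerPath(const char* value)
{
    if (value == nullptr || *value == '\0')
        return "VK_LAYER_PATH is not set; validation layers may not be found";
    return std::string("VK_LAYER_PATH = ") + value;
}

// Convenience wrapper that reads the real process environment.
std::string checkLayerPath()
{
    return describeLayerPath(std::getenv("VK_LAYER_PATH"));
}
```

Logging the result of checkLayerPath() before creating the Vulkan instance makes it obvious when the program was launched from an environment where the variable is missing.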

When things go wrong with Vulkan, usually you'll just get a program crash (that is, if the video card driver doesn't crash and bluescreen). You won't get much debugging information normally. To get error and warning messages, you'll need the layers installed.

If you read the Vulkan specification, you'll notice that all functions have a set of rules about input parameters -- written in contract style. These are the kinds of things the layers check for. But they also check for threading access and other frequent usage problems.

It's really helpful, believe me! Get it working early on. Play with the "enable_validation_with_callback" sample until it's working.

4. Download the source code for RenderDoc

Compile a debug build of RenderDoc for yourself, and run it in the debugger. You'll get asserts and debugging information in those cases where your code is so screwy that even RenderDoc can't handle it.

Just go to renderdoc.org -- it'll redirect to GitHub!