April 08, 2016

So, the Vulkan prototype is progressing... But I'm running into many problems working with the drivers and the associated tools. Here are some examples of the problems I'm finding.

RenderDoc crashing

RenderDoc is the best tool for debugging Vulkan at the moment... But every time I tried to capture a log, it just crashed! The crash report didn't contain any useful information. All I could do was guess at the problem.

Fortunately, RenderDoc has one really great feature... It's open-source! So, I just downloaded the code and ran it from Visual Studio (it compiled first time).

RenderDoc is still very unstable for Vulkan. But now that I have my own compiled version, that's not really a big issue. I can just debug any crashes and make changes as required. All of the other GPU debugging tools I've ever used (PIX, console tools, GPA, Nsight, etc) have been unstable, as well. But they were all closed source. So whenever I got an error, my only choices were to either use another debugging tool, or guess at the problem. With this in mind, (open-source + very unstable) is probably better than (closed-source + mostly stable).

My issue was related to "binding" decorations in SPIR-V. RenderDoc requires that all resources have binding decorations. I found this was also an issue for run-time XLE code. Even though the GLSL compiler is capable of generating SPIR-V code without binding decorations, it seems like all practical uses require them.

My shaders weren't getting "bindings" for any resources, and this was the cause of RenderDoc's crashes!
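
As a rough illustration of how missing bindings could be detected before a module reaches RenderDoc, here is a minimal sketch that scans SPIR-V words for resource variables without a "Binding" decoration. The opcode and decoration numbers come from the SPIR-V specification; the function name FindUnboundResources is hypothetical and is not part of RenderDoc or XLE. It only considers the UniformConstant and Uniform storage classes, which is an assumption about where textures and constant buffers end up.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Numeric values taken from the SPIR-V specification.
constexpr uint32_t kSpirvMagic             = 0x07230203;
constexpr uint32_t kOpVariable             = 59;
constexpr uint32_t kOpDecorate             = 71;
constexpr uint32_t kDecorationBinding      = 33;
constexpr uint32_t kStorageUniformConstant = 0;  // textures, samplers
constexpr uint32_t kStorageUniform         = 2;  // uniform (constant) buffers

// Hypothetical helper: returns the result ids of resource variables
// that have no Binding decoration attached.
std::vector<uint32_t> FindUnboundResources(const std::vector<uint32_t>& words)
{
    // A module starts with a 5-word header; the first word is the magic number.
    if (words.size() < 5 || words[0] != kSpirvMagic) return {};

    std::set<uint32_t> resources, bound;
    for (size_t i = 5; i < words.size();) {
        const uint32_t wordCount = words[i] >> 16;      // high 16 bits
        const uint32_t opcode    = words[i] & 0xFFFFu;  // low 16 bits
        if (wordCount == 0 || i + wordCount > words.size()) break;  // malformed

        // OpDecorate: <target id> <decoration> <literals...>
        if (opcode == kOpDecorate && wordCount >= 3 &&
            words[i + 2] == kDecorationBinding)
            bound.insert(words[i + 1]);

        // OpVariable: <result type> <result id> <storage class> ...
        if (opcode == kOpVariable && wordCount >= 4 &&
            (words[i + 3] == kStorageUniformConstant ||
             words[i + 3] == kStorageUniform))
            resources.insert(words[i + 2]);

        i += wordCount;
    }

    std::vector<uint32_t> unbound;
    for (uint32_t id : resources)
        if (bound.count(id) == 0) unbound.push_back(id);
    return unbound;
}
```

Running a check like this on the compiled shader would at least turn a crash inside the tool into a list of ids to look up in the disassembly.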

HLSL cross compiler and "bindings"

Part of the issue is related to the HLSL cross compiler. In some cases, the cross compiler can attach "location" values, but it never attaches "binding" values for textures or constant buffers.

Fortunately, the HLSL cross compiler is also open-source... So I can create a fork with the modifications I need. That seems to be required in this case. I could try to attach the binding information later in the pipeline (eg, by modifying the output GLSL, or by inserting instructions into the SPIR-V bytecode)... But changing and improving the cross compiler seems like the best option.

We ideally also want to specify the "descriptor set" in the GLSL code. Unfortunately, HLSL 5 doesn't have an equivalent concept. That is going to require some more effort.

Cross compiler incorrect translation

The next problem was that some HLSL instructions were generating incorrect GLSL code. In particular, I was using an expression like "float4(localPosition.xy, 0, 1)" to generate a 4D vector. But this was being treated as "float4(localPosition.xy, 0, 0)".

There are probably a number of other instructions that are incorrectly translated by the HLSL cross compiler, and I don't know how many... So it looks like I'll need to do some work improving the code.

Vulkan validation layer

It took me a while to figure out how to enable the Vulkan validation layer. This is really important for finding usage errors! But it's not really documented and it's very unclear how to get it working.

Eventually, I found out I needed to set the VK_LAYER_PATH environment variable. It seems like this should be set by the SDK installer, but maybe it was an oversight.

Anyway, I also needed to use the VK_EXT_DEBUG_REPORT_EXTENSION_NAME extension to install a message handler. It looks like there are other ways to use the validation layers. But I still don't know how to get them working. For now, I'm just catching errors and warnings and pushing them into the XLE logging system.
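
The debug report callback receives (among other things) a flags bitmask, a layer prefix string and a message string. A minimal sketch of routing those into a logging system might look like the following. The constants repeat the values of VkDebugReportFlagBitsEXT so that the sketch compiles without the Vulkan headers; LogLevel, ClassifyReport and FormatReport are hypothetical names and do not reflect the actual XLE logging interface.

```cpp
#include <cstdint>
#include <string>

// Mirrors VkDebugReportFlagBitsEXT (VK_EXT_debug_report); repeated here
// only so this sketch builds without <vulkan/vulkan.h>.
constexpr uint32_t kReportInformation        = 0x01;
constexpr uint32_t kReportWarning            = 0x02;
constexpr uint32_t kReportPerformanceWarning = 0x04;
constexpr uint32_t kReportError              = 0x08;
constexpr uint32_t kReportDebug              = 0x10;

// Hypothetical severity levels for the receiving logging system.
enum class LogLevel { Verbose, Warning, Error };

// Pick the most severe level present in the flags bitmask.
LogLevel ClassifyReport(uint32_t flags)
{
    if (flags & kReportError) return LogLevel::Error;
    if (flags & (kReportWarning | kReportPerformanceWarning))
        return LogLevel::Warning;
    return LogLevel::Verbose;  // information and debug messages
}

const char* LevelName(LogLevel level)
{
    switch (level) {
    case LogLevel::Error:   return "Error";
    case LogLevel::Warning: return "Warning";
    default:                return "Verbose";
    }
}

// Build a single log line from the fields the callback receives.
std::string FormatReport(uint32_t flags, const char* layerPrefix,
                         const char* message)
{
    return std::string(LevelName(ClassifyReport(flags))) +
           " [" + layerPrefix + "] " + message;
}
```

In a real handler, the function registered through vkCreateDebugReportCallbackEXT would call something like FormatReport with its flags, pLayerPrefix and pMessage arguments and pass the result on to the log.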

Binding samplers and images together

HLSL separates samplers and textures, but Vulkan seems to prefer to combine them together into one. This is going to cause a bit of an issue with the way XLE shaders use samplers... Probably to start with, I'll just use a single point filtering sampler for all textures.

vkCmdBindDescriptorSets ending command buffers

For some reason, vkCmdBindDescriptorSets is silently "ending" the command buffer. It's not clear why -- there must be some error. But I haven't found it yet. Even the validation layers aren't much help in this case.

"Image layouts" concept confusing

There seems to be some confusion in the API over a concept called "image layouts." This appears to be related to how images are stored in memory. The exact details are very implementation specific and opaque. We need to instruct the GPU (rather than the CPU side of the API) to change the layout of an image. So changing the layout involves appending commands to the command buffer.

But there are multiple different ways to do this... It's not really clear how to best handle this currently. The samples have their own way of dealing with image layouts. But that doesn't look like the optimal approach -- and anyway, it's architecturally awkward (because it mixes unsynchronised initialisation functions with synchronised command buffer functions).

So, I'll need to do some experimentation to find the best way!

Now rendering geometry!

But, I've finally got some basic geometry rendering! It's just a few 2D triangles, but it's something!

In short, there are a lot of problems and difficulties with using Vulkan currently.

I think I've found problems with every step in the chain so far. But many of the tools and libraries are open-source, and that is helping a lot. If (for example) RenderDoc had been closed source, I would just be making guesses now, and probably not getting very far.

It would be nice if everything was stable and polished... But for now, as long as I can identify the particular cause of each problem, I think I can make educated decisions about how to navigate through the minefield.
