January 31, 2017

More on spherical harmonics is coming, but this is a slight intermission about Vulkan. Yesterday, I got a chance to attend a Vulkan workshop (called Vulkan DevU) in Vancouver, Canada. It was a short conference featuring talks by members of the Vulkan working group, with a mixture of advanced and beginner sessions. You can get the slides here.

Unfortunately, it's been a while since I've touched the Vulkan implementation in XLE, so it wasn't fresh in my mind -- but I got a chance to meet some of the working group members and ask a bunch of random questions. Lately it's been difficult to find enough time to properly focus on low level graphics in XLE; I've had to prioritize other things.

My impressions of Vulkan's strong points were re-affirmed by this conference. Vulkan has a fundamentally open nature and is naturally industry driven -- and the working group showed a desire to become even more open, both by inviting contributions and by suggesting they would change their NDA structure so they can speak about upcoming drafts earlier in the process.

Vulkan strikes a practical balance between compatibility and performance, and the team addressed that trade-off directly. They also spoke about their desire to keep Vulkan thin and free of heuristics -- another great property of the library.

So, they gave me every reason to think that Vulkan was in good hands. However, during the conversations we also started to touch upon some of the potential risks of the Vulkan concept. They were upfront about the desire to create a successor to OpenGL, which implies a very broad usage of the API (awesome!) but, in my opinion, there are some possible risks:

DirectX 12

After what seemed like an uncertain start, DX12 looks like it could be very strong. The fact that it shares many fundamental properties with Vulkan makes the two natural competitors.

Part of DirectX's strength is that it has for many years worked hand in hand with GPU hardware development. Generally, important (game-oriented) hardware features need to be exposed by the DX API before they are "real" -- so Nvidia, AMD & Intel must fight it out to guide the API to best suit whatever hardware features they want to emphasize. Over the years, both Nvidia and AMD have attempted to lessen their dependence on DX (with GL extensions, Mantle, etc) but it hasn't worked so well. If you want to make a major change (eg, multicore GPUs, bindless, etc), game developers have a tendency to ignore it until it's in DX.

The problem for Vulkan is that it risks having to just trot along after DX, or becoming too bound to whichever hardware vendor feels left out in the cold by the DX council (or, alternatively, to mobile-oriented hardware vendors that don't have as much skin in DirectX).

Shifting responsibilities onto engine teams

Vulkan shifts some of the responsibilities that were previously handled by GPU driver teams onto the game engine team. This is particularly obvious in areas such as memory management, scheduling and coherency... but it's a general principle, sometimes referred to as the "explicit" principle. For me, this is Vulkan's greatest attribute; but there are also risks associated with it.
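To make the "explicit" principle concrete, here's a minimal sketch of what buffer creation looks like when the application owns memory management. The Vulkan calls are standard, but the surrounding helper and the pre-existing `device` and `physicalDevice` handles are just assumptions for illustration (error handling omitted):

```cpp
#include <vulkan/vulkan.h>

// Hypothetical helper: create a host-visible staging buffer. In the old
// driver-managed model, everything below the vkCreateBuffer call would
// have happened invisibly inside the driver.
VkBuffer CreateStagingBuffer(
    VkDevice device, VkPhysicalDevice physicalDevice,
    VkDeviceSize size, VkDeviceMemory* memoryOut)
{
    VkBufferCreateInfo bufferInfo = {};
    bufferInfo.sType = VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO;
    bufferInfo.size = size;
    bufferInfo.usage = VK_BUFFER_USAGE_TRANSFER_SRC_BIT;
    bufferInfo.sharingMode = VK_SHARING_MODE_EXCLUSIVE;

    VkBuffer buffer = VK_NULL_HANDLE;
    vkCreateBuffer(device, &bufferInfo, nullptr, &buffer);

    // The driver only *reports* requirements; choosing a memory type,
    // allocating, and binding are the application's job in Vulkan.
    VkMemoryRequirements reqs;
    vkGetBufferMemoryRequirements(device, buffer, &reqs);

    VkPhysicalDeviceMemoryProperties memProps;
    vkGetPhysicalDeviceMemoryProperties(physicalDevice, &memProps);

    uint32_t typeIndex = 0;
    for (uint32_t i = 0; i < memProps.memoryTypeCount; ++i) {
        const bool allowed = (reqs.memoryTypeBits & (1u << i)) != 0;
        const bool hostVisible =
            (memProps.memoryTypes[i].propertyFlags
                & VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT) != 0;
        if (allowed && hostVisible) { typeIndex = i; break; }
    }

    VkMemoryAllocateInfo allocInfo = {};
    allocInfo.sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO;
    allocInfo.allocationSize = reqs.size;
    allocInfo.memoryTypeIndex = typeIndex;
    vkAllocateMemory(device, &allocInfo, nullptr, memoryOut);
    vkBindBufferMemory(device, buffer, *memoryOut, 0);
    return buffer;
}
```

All of these decisions -- which heap, how to sub-allocate, when memory is coherent -- now live in engine code rather than the driver.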

For example, what happens next time there's a big change in GPU architecture? What happens if there's a mythical "paradigm shift"? In the old model, it would just be up to the GPU driver developers to write a new driver and bind the old API to the new hardware.

The working group have said that major revisions of Vulkan will be compatibility breaks. That would open the door to deep re-architecting of the API after a hardware shift. We already see that with DX every 10 years or so. But is that going to be enough to solve this problem?

For major hardware changes, I guess time will tell. But what about minor hardware changes -- those that are not enough to require big API changes, but are significant enough to change the performance profile of applications? What happens if a certain API usage pattern, which was previously a best practice, suddenly becomes extremely slow on some specific hardware...?

To hazard an example, what happens if the "descriptor set" model for shader inputs becomes inefficient for new hardware?
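For context, here's roughly what that model looks like from the application side -- shader inputs declared up front as a fixed layout. The calls are standard Vulkan; the two bindings are hypothetical, and `device` is assumed to exist already:

```cpp
// Two example shader inputs: a uniform buffer for the vertex stage and a
// combined image/sampler for the fragment stage.
VkDescriptorSetLayoutBinding bindings[2] = {};
bindings[0].binding         = 0;
bindings[0].descriptorType  = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER;
bindings[0].descriptorCount = 1;
bindings[0].stageFlags      = VK_SHADER_STAGE_VERTEX_BIT;

bindings[1].binding         = 1;
bindings[1].descriptorType  = VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER;
bindings[1].descriptorCount = 1;
bindings[1].stageFlags      = VK_SHADER_STAGE_FRAGMENT_BIT;

VkDescriptorSetLayoutCreateInfo layoutInfo = {};
layoutInfo.sType        = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO;
layoutInfo.bindingCount = 2;
layoutInfo.pBindings    = bindings;

VkDescriptorSetLayout layout = VK_NULL_HANDLE;
vkCreateDescriptorSetLayout(device, &layoutInfo, nullptr, &layout);
```

If future hardware wanted a different binding model, every engine with code like this baked in would need rework -- or the driver would have to quietly emulate the old model.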

In the Vulkan model, if we want to keep the "explicit" principle, then this would require game engine teams going in and adapting their code to suit the new hardware. But there are hundreds more game engine teams than there are GPU driver teams! And game engine teams don't frequently maintain older games for newer hardware innovations. So there is a kind of "moral hazard" here for the GPU driver team to just go in and modify the explicit API path, replacing it with some driver magic that is actually doing something else. But if that happens, we're all screwed in the long term -- because Vulkan is so explicit, it must do exactly what it says it's doing.

Something tells me that the "explicit" principle is both Vulkan's greatest attribute and its greatest long-term test.

It's impossible to test for the correctness of a Vulkan application

This came up in the context of scheduling; but there's more to it than that. Scheduling in Vulkan is incredibly difficult. People were really shocked by the difficulty of the PS3 (and still are today!), but in some ways it's more difficult in Vulkan. We've got to specify, down to the exact pipeline stage, exactly when an operation is required to be completed.
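As a small taste of that, here's a sketch of a pipeline barrier that makes a transfer write visible to fragment shader reads. The structures and flags are standard Vulkan; the `image` and `cmd` handles are assumed to exist already:

```cpp
// Make a just-copied image readable from the fragment shader, and
// transition its layout. Pick the wrong stage or access flags here and
// many GPUs will still render correctly -- but not all of them.
VkImageMemoryBarrier barrier = {};
barrier.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
barrier.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;
barrier.dstAccessMask = VK_ACCESS_SHADER_READ_BIT;
barrier.oldLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL;
barrier.newLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;
barrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
barrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
barrier.image = image;   // assumed: an image just filled by a copy
barrier.subresourceRange = { VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 };

vkCmdPipelineBarrier(
    cmd,                                    // assumed: a recording command buffer
    VK_PIPELINE_STAGE_TRANSFER_BIT,         // wait for the copy to finish...
    VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT,  // ...before fragment shading starts
    0, 0, nullptr, 0, nullptr, 1, &barrier);
```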

The problem with this is that most hardware will work fine, even if there are major scheduling errors. But, as Tom said, for every scheduling mistake that can be made, there's at least one GPU that it will cause problems with (otherwise that API feature wouldn't exist!)

The difficulty is that there is no practical way to test our code, to see if it is correct. All we can do is run it on as much hardware as possible, and see if it breaks. But how do we know what hardware is the most unusual, the most likely to break? While there are only a few Vulkan games, the GPU manufacturers will test for them directly -- but for the second wave, it's going to take some time to figure those things out.

Vulkan needs high quality layers on top of it

Vulkan is too difficult for many game engine teams to approach. Some early games can maybe get through it with help from the working group and driver developers; and experienced console developers will be fine. But for the second wave, many engine teams risk just falling down an endless rabbit hole.

Furthermore, the benefit in many cases will be limited. I've seen many game engines that have turned to technologies like Vulkan or Apple Metal for performance improvements. But they've not found what they sought, because they didn't understand their performance profile well enough to realize that their existing API overhead was already low relative to the overhead in the rest of their engine code. I've seen that over and over again; it's just common for game engine code to be much less optimized than GPU driver code. In some cases this may mean that replacing driver code with engine code results in a net decrease in performance.

For the aspirational industry leaders, the opposite is going to be true (given that the generality of a driver makes it more difficult to optimize). But there are few teams that fall into that category.

The solution is to build reusable mini-engine layers on top of Vulkan. Layers that provide the same kinds of guarantees and ease of use that DirectX does. Ideally these layers should be open-source, and they should be written to shield less experienced engine developers from the major pitfalls and vulgarities of low level coding.

Cross-platform gaps

One of Vulkan's greatest attributes is its cross-platform nature. Windows, Linux, Android, etc, etc -- all in one very thin package. It's fantastic.

But there are some gaps here. Lack of support on Apple platforms is a big one. There's another odd one that also came up in the conference -- and that is WebGL. Obviously Vulkan doesn't quite seem right for the web; but this is an odd case for me, because I was thinking about exactly the same thing when the questioner asked about WebGL.

I actually have a really good reason for wanting a single solution that can work across iOS, Android, Windows, Mac & the web. But it still seems like an odd combination of platforms!

It turns out that OpenGLES actually works on all of these platforms (and it's the only solution that does). But OpenGLES is near end-of-life, and all of those platforms have much better solutions, except for the web (and, arguably, Android).

The other possibility is to use Vulkan for Windows and Android, Apple Metal for iOS and Mac, and OpenGLES for the web and as an Android fallback. In some ways that's a ridiculous combination of very different APIs (but what else can I do??!) -- see the sketch below.
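Concretely, that might end up as a compile-time backend switch along these lines. None of these headers or types come from a real library; they're hypothetical stand-ins for an engine's own graphics abstraction layer:

```cpp
// Hypothetical platform-to-backend mapping, matching the combination
// described above. Android is checked first because it also defines
// __linux__.
#if defined(__ANDROID__)
    // Android: prefer Vulkan where the driver supports it, and keep an
    // OpenGLES path as a runtime fallback for older devices.
    #include "backend_vulkan.h"
    #include "backend_gles.h"
#elif defined(_WIN32) || defined(__linux__)
    #include "backend_vulkan.h"         // Vulkan on desktop
    using GfxBackend = VulkanBackend;
#elif defined(__APPLE__)
    #include "backend_metal.h"          // Metal on iOS & Mac
    using GfxBackend = MetalBackend;
#elif defined(__EMSCRIPTEN__)
    #include "backend_gles.h"           // OpenGLES / WebGL on the web
    using GfxBackend = GLESBackend;
#endif
```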

My take-aways

So, I mostly represented myself in my professional capacity as a mobile games developer (though perhaps I should have put XLE first in this context?). My studio has both the technical and art capabilities to use Vulkan very effectively on mobile, and the willingness to invest in best-in-class solutions. We're also facing some significant internal engine refactoring in the near future, which requires that we consider future trends.

So I guess in part I wanted to find the answer to the question: how high should Vulkan be on our list of priorities? That list is deep with important stuff, so there's plenty of competition.

I very much want Vulkan to succeed; but I'm also aware that I'm a little biased in that way. It feels like we have to plan for Vulkan (because we don't want to refactor towards a direction that will be incompatible with Vulkan in the long run). At the moment, I think that Vulkan can bring improvements to our internal day-to-day development; but it's unclear if there will be any great benefit to the final product (ie, we have better options for performance improvement currently, and we still need an OpenGLES fallback for low end hardware anyway).

It feels like Vulkan still has a path to tread before it can establish itself fully. Vulkan must convince game engine programmers to change the way they work; to take on more responsibilities, to understand more of the pipeline, and to publish more code as open-source. In some ways, Vulkan feels like a reaction to Sony console hardware of the past. Right at the moment, my feeling is that Vulkan can only succeed if there are a sufficient number of serious low-level graphics coders spread throughout the industry (such as those trained on the PS2 & PS3) that can help address the risks and demonstrate the advantages of the explicit model. Otherwise, the temptation is to just fall back on the likes of Microsoft, Nvidia & AMD, and let them shoulder the load for awhile longer.


