My personal hell of translating DXIL to SPIR-V – part 2

In the previous blog post, I began a long form exploration of the DXIL shader format, and how it translates to SPIR-V. In this part, we’ll look more closely at the LLVM format, how it is parsed, and how to interpret the parsed result.

The LLVM IR binary format is mostly undocumented. Very early on we have to dig through the source to understand what is going on. LLVM IR was never intended to be used as a “standard” format that is shipped between different software stacks. It’s clearly an ad-hoc serialization format that serves the purpose of LLVM internals. The IR format is backwards compatible at the very least, which is why we can parse DXIL LLVM 3.7 modules with modern LLVM versions.

As we’ll see, LLVM IR is very complex to parse compared to SPIR-V. There are some interesting similarities however, as SPIR-V shares some DNA with LLVM.

Layered architecture

LLVM IR is parsed in multiple layers. At the lower level is a compression scheme which feels somewhat like LZ compression. The bit-stream teaches the decoder how to decode the stream, by emitting “templates” (or “code book entries”), and these templates can then be instantiated to form complete records.

The low-level bit-stream parser

The initial part of the LLVM IR puzzle is pulled from RenderDoc’s implementation. The basic gist of it is documented here. To summarize, a module consists of one top-level “block”. A block contains nested blocks and records. A record has an ID and an array of uint64_t operands (quite similar to SPIR-V, except SPIR-V uses an array of 32-bit words).

Storing full uint64_t operands is of course very wasteful, and this is where the primitive types of LLVM IR come in. We can express primitive types compactly with:

Variable length integers (configurable chunk size)
Fixed width integers (configurable bit width)
6-bit chars (useful for C-style identifiers, i.e. a-z, A-Z, 0-9, . and _)

Variable length integers are encoded in a scheme where we look at N bits at a time, N – 1 bits contain useful data, and the MSB marks whether to keep looking at N more bits.
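To make the primitive encodings concrete, here is a minimal sketch of a reader for fixed width integers, VBR integers and char6. BitReader and decode_char6 are hypothetical names made up for this illustration, not the RenderDoc or dxil-spirv classes. It assumes bits are consumed starting from the least significant bit of each byte, and it treats bits past the end of the buffer as zero instead of reporting an error.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper for illustration only.
class BitReader
{
public:
   explicit BitReader(const std::vector<uint8_t> &data_) : data(data_) {}

   // Fixed width integer: read `width` bits (width <= 64), LSB first.
   uint64_t read_fixed(unsigned width)
   {
      uint64_t result = 0;
      for (unsigned i = 0; i < width; i++, bit_offset++)
      {
         size_t byte = bit_offset / 8;
         // Sketch behaviour: out-of-range bits read as zero.
         uint64_t bit = byte < data.size() ? (data[byte] >> (bit_offset % 8)) & 1u : 0u;
         result |= bit << i;
      }
      return result;
   }

   // Variable length integer: chunks of `width` bits (width >= 2).
   // The MSB of each chunk says "keep going", the low bits are payload.
   uint64_t read_vbr(unsigned width)
   {
      const uint64_t continue_bit = 1ull << (width - 1);
      uint64_t result = 0;
      unsigned shift = 0;
      for (;;)
      {
         uint64_t chunk = read_fixed(width);
         if (shift < 64)
            result |= (chunk & (continue_bit - 1)) << shift;
         if ((chunk & continue_bit) == 0)
            return result;
         shift += width - 1;
      }
   }

   size_t bit_offset = 0;

private:
   const std::vector<uint8_t> &data;
};

// char6 maps 6-bit values onto [a-zA-Z0-9._].
char decode_char6(uint64_t value)
{
   static const char table[] =
      "abcdefghijklmnopqrstuvwxyz"
      "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
      "0123456789._";
   return table[value & 63];
}
```

For example, the value 27 as a vbr4 is the two chunks 1011 and 0011, which pack into the single byte 0x3B: the first chunk contributes 3 with the continue bit set, and the second contributes 3 << 3 = 24.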

Blocks and records are invoked in an esoteric way, which is where abbreviations come in. When we parse, we read abbreviation IDs one at a time, and each ID triggers one of the following actions:

0 – END_BLOCK – Ends block scope
1 – ENTER_SUBBLOCK – Begins a new scope, can nest arbitrarily
2 – DEFINE_ABBREV – Defines a template for how to build new records. For example, we can specify that a record is [vbr4, char6, literal constant] or something. That abbreviation implicitly gets a new ID, starting with 4, which can be invoked when parsing new abbreviations. When decoding this abbreviation, the parser knows ahead of time how to decode the bits into arguments. For char6 strings in particular, it’s also possible to specify an array abbreviation.
3 – UNABBREV_RECORD – YOLO mode, directly decodes a record with a bunch of variable length integers, fairly inefficient. DXC seems to love to emit these 🙂
4+ – Invoke user abbreviations.

In typical LLVM IR fashion, the number of bits used to encode the abbreviation ID is variable. It starts at 2 bits (since there are no user abbreviations to worry about yet), but can grow as needed. Fun!
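As a rough illustration of the dispatch, the sketch below reads one abbreviation ID with the current ID width and then decodes an UNABBREV_RECORD. The record layout used here ([code, numops, op0, op1, ...], every field a vbr6) is taken from the LLVM bitstream documentation. BitCursor, Record and read_unabbrev_record are hypothetical names, and error handling is reduced to stopping at the end of the buffer.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Built-in abbreviation IDs from the list above.
enum AbbrevID : uint32_t
{
   END_BLOCK = 0,
   ENTER_SUBBLOCK = 1,
   DEFINE_ABBREV = 2,
   UNABBREV_RECORD = 3,
   FIRST_USER_ABBREV = 4
};

struct Record
{
   uint32_t id = 0;
   std::vector<uint64_t> ops;
};

// Hypothetical minimal cursor over an LSB-first bit stream.
struct BitCursor
{
   const uint8_t *data;
   size_t size_bytes;
   size_t bit_offset;

   uint64_t read_fixed(unsigned width)
   {
      uint64_t result = 0;
      for (unsigned i = 0; i < width; i++, bit_offset++)
      {
         size_t byte = bit_offset / 8;
         uint64_t bit = byte < size_bytes ? (data[byte] >> (bit_offset % 8)) & 1u : 0u;
         result |= bit << i;
      }
      return result;
   }

   uint64_t read_vbr(unsigned width)
   {
      const uint64_t continue_bit = 1ull << (width - 1);
      uint64_t result = 0;
      for (unsigned shift = 0;; shift += width - 1)
      {
         uint64_t chunk = read_fixed(width);
         if (shift < 64)
            result |= (chunk & (continue_bit - 1)) << shift;
         if ((chunk & continue_bit) == 0)
            return result;
      }
   }
};

// Called after the abbreviation ID UNABBREV_RECORD has been consumed.
Record read_unabbrev_record(BitCursor &cursor)
{
   Record record;
   record.id = uint32_t(cursor.read_vbr(6));
   uint64_t num_ops = cursor.read_vbr(6);
   // Stop early on truncated input rather than trusting num_ops blindly.
   for (uint64_t i = 0; i < num_ops && cursor.bit_offset < cursor.size_bytes * 8; i++)
      record.ops.push_back(cursor.read_vbr(6));
   return record;
}
```

A caller would first do cursor.read_fixed(abbrev_width), where abbrev_width starts at 2 for the top level, and only call read_unabbrev_record() when the result equals UNABBREV_RECORD.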

The details are not super interesting for this post, but suffice to say, there’s a decent amount of detail that goes into parsing this.

The calling code ends up looking something like this:

LLVMBC::BitcodeReader reader(static_cast<const uint8_t *>(data), size);
LLVMBC::BlockOrRecord toplevel = reader.ReadToplevelBlock();

// The top-level block must be MODULE_BLOCK.
if (KnownBlocks(toplevel.id) != KnownBlocks::MODULE_BLOCK)
   return nullptr;

// We should have consumed all bits, only one top-level block.
if (!reader.AtEndOfStream())
   return nullptr;

The BlockOrRecord struct is fairly straightforward, simplified here as:

struct BlockOrRecord
{
  uint32_t id; // What kind of record or block is this?
  Type type; // block or record
  dxil_spv::Vector<BlockOrRecord> children; // If block
  dxil_spv::Vector<uint64_t> ops; // If record
};
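Once the bit-stream is decoded, everything else is a plain tree walk over this struct. The sketch below restates the struct with std::vector in place of dxil_spv::Vector so it compiles on its own. NodeKind and count_records are names invented for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

enum class NodeKind { Block, Record };

// Standalone restatement of the simplified struct above.
struct BlockOrRecord
{
   uint32_t id = 0;
   NodeKind type = NodeKind::Record;
   std::vector<BlockOrRecord> children; // If block
   std::vector<uint64_t> ops;           // If record

   bool IsBlock() const { return type == NodeKind::Block; }
};

// Counts all records reachable from `node`, recursing into nested blocks.
size_t count_records(const BlockOrRecord &node)
{
   if (!node.IsBlock())
      return 1;

   size_t count = 0;
   for (const BlockOrRecord &child : node.children)
      count += count_records(child);
   return count;
}
```

The same recursive shape works for dumping a module for debugging, which is handy when cross-referencing against what the reference LLVM parser sees.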

Higher level parser

Now, we’re at a level where we have recovered structure from the bit-stream, and now we need to turn the BlockOrRecord structs into actual API objects, llvm::Module, llvm::Function, llvm::Value, etc, etc … dxil-spirv implements an LLVM C++ API drop-in replacement to be able to cross-reference our implementation against the reference implementation at any time (which has saved me many times). The implementation only implements exactly what is needed for DXIL however, so don’t expect too much of it. 🙂

Refer to objects by ID

Very similar to SPIR-V, types and values are referred to by a uint64_t ID. The annoying part however is that types and values implicitly allocate their own IDs, meaning that forgetting to parse something can be fatal, since every later ID is then off. On top of this, an ID may be encoded either as an absolute value or as a delta relative to the current value ID. Which one applies is somewhat context sensitive, which gets quite annoying to deal with.

Decoding llvm::Type

LLVM IR has a type hierarchy similar to SPIR-V. You start by declaring fundamental types like ints and floats, and then build them up into vectors, arrays, pointers or structs. While parsing, the top-level block can contain TYPE blocks, which contain a bunch of records.

for (auto &child : toplevel.children)
{
   if (child.IsBlock())
   {
      switch (KnownBlocks(child.id))
      {
      case KnownBlocks::TYPE_BLOCK:
         for (auto &entry : child.children)
             parse_type(entry);
         break;
      }
   }
}

bool ModuleParseContext::parse_type(const BlockOrRecord &child)
{
   Type *type = nullptr;
   switch (TypeRecord(child.id))
   {
   case TypeRecord::VOID_TYPE:
   case TypeRecord::HALF:
   case TypeRecord::INTEGER:
   case TypeRecord::POINTER:
   case TypeRecord::ARRAY:
   case TypeRecord::FUNCTION:
   // you get the idea, each case constructs the matching Type
   default:
      break;
   }
   // Types implicitly allocate their IDs in declaration order.
   types.push_back(type);
   return true;
}

Integers deserve special mention, because they are somewhat wacky in LLVM. First, they have no signedness associated with them. This kinda makes sense, since signedness only actually matters in certain opcodes, like signed min/max, signed compare, arithmetic vs logical right shift, signed vs unsigned float <-> int conversion, etc. SPIR-V maintains sign for its integer type, but we can ignore it in most scenarios. (There is an esoteric exception to this however where DXIL kinda breaks down, once we dig into relaxed precision signed integers!) Another annoying exception we have to deal with all the time is stage IO and resource variables which are explicitly signed or unsigned in DXIL.

As the grizzled C programmer will know, signed overflow is undefined, but unsigned overflow is not. Does LLVM just not care? Well, it does. LLVM can mark operations as being “no signed wrap”, or “no unsigned wrap” for optimization purposes, but we don’t have to care about those at all fortunately.

Booleans are expressed as 1-bit integers, which kind of makes sense, but at the same time feels like a very LLVM thing to do … Logical operations reduce to simple arithmetic operations on 1-bit values instead.

The final wacky part is that you can declare non-POT integer sizes. There are shaders in the wild which declare 11-bit integers and rely on wrapping on these values to work! (dear lord … <_<) I even tried to compile this to x86_64 and yes, it does actually deal with it correctly. I’m kind of amazed, and scared at the same time.
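One way to picture what “relying on wrapping” means for an odd width is to emulate it with a wider register and a mask. This is only a sketch of the arithmetic under that assumption, not a description of how LLVM’s x86_64 backend or dxil-spirv handles it. The helper names are made up.

```cpp
#include <cstdint>

// Keeps only the low `bits` bits of `value` (1 <= bits <= 64).
uint64_t wrap_to_width(uint64_t value, unsigned bits)
{
   if (bits >= 64)
      return value;
   return value & ((1ull << bits) - 1);
}

// Addition on an N-bit integer: add in 64 bits, then wrap.
uint64_t add_wrapped(uint64_t a, uint64_t b, unsigned bits)
{
   return wrap_to_width(a + b, bits);
}

// Reinterprets the low `bits` bits as a signed two's complement value.
int64_t sign_extend(uint64_t value, unsigned bits)
{
   value = wrap_to_width(value, bits);
   const uint64_t sign_bit = 1ull << (bits - 1);
   // Flipping and subtracting the sign bit propagates it upwards.
   return int64_t((value ^ sign_bit) - sign_bit);
}
```

With 11 bits the largest unsigned value is 2047, so add_wrapped(2047, 1, 11) wraps around to 0, and sign_extend(2047, 11) reads the same bit pattern as -1.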

Overall though, type declaration in LLVM IR is pretty easy to understand if you understand SPIR-V.

Decoding constants

Similar to types, constants are records within a block. They can appear at function scope or global scope.

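bool ModuleParseContext::parse_constants_record(
    const BlockOrRecord &entry)
{
    llvm::Constant *value = nullptr;
    switch (ConstantsRecord(entry.id))
    {
        case ConstantsRecord::SETTYPE:
            // Only switches the current constant type,
            // no value ID is allocated for it.
            // ...
            return true;
        case ConstantsRecord::CONST_NULL:
        case ConstantsRecord::UNDEF:
        case ConstantsRecord::INTEGER:
        // ...
        default:
            break;
    }
    // Constants implicitly allocate value IDs in declaration order.
    values.push_back(value);
    return true;
}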

Roughly speaking, this looks very similar, with some quirks. SETTYPE informs subsequent constant records which type is actually used. CONST_NULL is an example of a fully context sensitive constant, similar to OpConstantNull in SPIR-V.

INTEGERs are decoded through sign rotation. Negative numbers would be horribly inefficient to encode with VBR otherwise, since every high bit is set in two’s complement. Instead, the least significant bit is the sign bit and the remaining bits hold the magnitude, i.e. a sign-magnitude scheme. -0 is interpreted as INT64_MIN.
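The sign rotation is small enough to show in full. This sketch assumes the sign lives in the least significant bit, which matches how LLVM’s own bitcode reader decodes these values. The function names are made up for the example.

```cpp
#include <cstdint>

// Decodes a sign-rotated operand: bit 0 is the sign, the rest is magnitude.
int64_t decode_sign_rotated(uint64_t v)
{
   if ((v & 1) == 0)
      return int64_t(v >> 1);
   if (v != 1)
      return -int64_t(v >> 1);
   // "-0" is reused to represent the one value that has no positive twin.
   return INT64_MIN;
}

// Inverse operation, done in unsigned arithmetic to avoid signed overflow.
uint64_t encode_sign_rotated(int64_t value)
{
   uint64_t v = uint64_t(value);
   if (value >= 0)
      return v << 1;
   return ((0 - v) << 1) | 1;
}
```

With this scheme -1 encodes as 3 and fits in a single small VBR chunk, whereas the raw two’s complement pattern would need all 64 bits.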

Where LLVM constants get disgusting however, is the pseudo-specialization constant operation support. It is possible to encode a constant cast operation, or constant access chain into a global object (wtf?) this way. I don’t understand the motivation behind this, but there are lots of super weird edge cases here that took some time to iron out.

AGGREGATE is the first time we start to see how value IDs are referenced.

Vector<Value *> constants;
constants.reserve(entry.ops.size());

for (auto &op : entry.ops)
{
   constants.push_back(get_value(op, element_type,
      /* force absolute IDs */ true));
}

Value *value;
// Ah, yes. Why have VECTOR and ARRAY types when you can
// have a context sensitive one instead.
if (current_constant_type_is_vector)
{
   value = context->construct<ConstantDataVector>(
      get_constant_type(), std::move(constants));
}
else
{
   value = context->construct<ConstantDataArray>(
      get_constant_type(), std::move(constants));
}

get_value() is quite sneaky. In LLVM IR, it is valid to forward reference an ID, as long as the type is known. This leads to ValueProxy objects being created, which are resolved later once the real value has been parsed. get_value() can be relatively indexed, or absolutely indexed depending on the context, which is always fun.
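The forward reference mechanism can be sketched as a value table that hands out placeholder objects and patches them later. This is a simplified illustration, not the dxil-spirv implementation: ValueTable and resolve_proxies are hypothetical names, type tracking is omitted, and only absolute IDs are handled.

```cpp
#include <cstdint>
#include <memory>
#include <vector>

struct Value
{
   virtual ~Value() = default;
};

// Stand-in for a value whose defining record has not been parsed yet.
struct ValueProxy : Value
{
   explicit ValueProxy(uint64_t id) : target_id(id) {}
   uint64_t target_id;
   Value *resolved = nullptr;
};

struct ValueTable
{
   std::vector<Value *> values; // Index is the implicit value ID.
   std::vector<std::unique_ptr<ValueProxy>> pending;

   // Returns the real value if it is already known, otherwise a proxy.
   Value *get_value(uint64_t id)
   {
      if (id < values.size())
         return values[id];
      pending.push_back(std::unique_ptr<ValueProxy>(new ValueProxy(id)));
      return pending.back().get();
   }

   // Call once the enclosing function has been parsed.
   // Fails if a proxy refers to an ID that never got defined.
   bool resolve_proxies()
   {
      for (auto &proxy : pending)
      {
         if (proxy->target_id >= values.size())
            return false;
         proxy->resolved = values[proxy->target_id];
      }
      return true;
   }
};
```

The proxies stay alive after resolution on purpose, since instructions parsed earlier still hold pointers to them.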

Global variables

Global variables are declared in top-level records. Typically these are only used for groupshared variables. In DXIL, a special pointer address space is reserved for this purpose. Global variables can also be used for global look-up tables. Global variables can also have optional initializers. This is very similar to SPIR-V overall. The equivalent is an OpVariable with either Workgroup or Private storage class.

Resource handles are declared in a completely different way … unless we’re DXR (more on that later, sigh v_v …)

Function prototypes

We also get to declare function prototypes at this stage. Some functions only have prototypes, and the common case here is various prototypes which declare dx.op intrinsic functions. If a prototype is declared to also have a body, we place that in a queue.

We also have to parse parameter attribute lists (surprisingly tricky!), just in case the function declares LLVM attributes. The only case we have to care about here is FP32 denorm handling. Why that isn’t a metadata entry, I’ll never know. DXIL really likes splitting its implementation across two completely different systems for no good reason …

Parsing functions

A function body is a block, consisting of records (which express normal opcodes), and other blocks (e.g. constant blocks). The first record we’ll typically see is the DECLAREBLOCKS one, which specifies the number of basic blocks in the function.

Basic blocks

A basic block is a fundamental building block of SSA-based IRs. A basic block enters execution at the first instruction, and executes in a straight line fashion until a terminator instruction executes. A terminator instruction can be anything which transfers control like a direct branch, conditional branch, switch statement, return, etc. If you know SPIR-V, this is nothing new. It’s the exact same concept.

Unlike SPIR-V, where we have explicit OpLabel opcodes to begin a new block, LLVM makes this implicit. When we observe a terminator, the next instruction will be added to the next basic block.
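The implicit block boundaries can be reconstructed with a single flag while walking the instruction records. The sketch below uses a toy Opcode enum invented for the example. A real parser would additionally check the resulting count against the number announced by DECLAREBLOCKS.

```cpp
#include <cstddef>
#include <vector>

// Toy opcode set for illustration only.
enum class Opcode { Add, Mul, Br, CondBr, Switch, Ret, Unreachable };

bool is_terminator(Opcode op)
{
   switch (op)
   {
   case Opcode::Br:
   case Opcode::CondBr:
   case Opcode::Switch:
   case Opcode::Ret:
   case Opcode::Unreachable:
      return true;
   default:
      return false;
   }
}

// Splits a linear instruction stream into basic blocks.
// There is no label opcode: a new block starts right after a terminator.
std::vector<std::vector<Opcode>> split_into_basic_blocks(
   const std::vector<Opcode> &instructions)
{
   std::vector<std::vector<Opcode>> blocks;
   bool start_new_block = true;

   for (Opcode op : instructions)
   {
      if (start_new_block)
      {
         blocks.emplace_back();
         start_new_block = false;
      }
      blocks.back().push_back(op);
      if (is_terminator(op))
         start_new_block = true;
   }
   return blocks;
}
```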

Context sensitive parsing

The opcodes in the IR closely match the C++ class hierarchy of LLVM. Let’s look at parsing llvm::BinaryOperator. A binary operation is any c = op(a, b) kind of instruction, not necessarily just bitwise and/or/xor. This is a catch-all for FAdd, IMul, IAdd, Xor, etc.

case FunctionRecord::INST_BINOP:
{
   unsigned index = 0;
   // The left operand also tells us the type of the whole operation.
   auto lhs = get_value_and_type(entry.ops, index);
   if (!lhs.first)
      return false;
   auto *rhs = get_value(entry.ops, index, lhs.second);
   if (!rhs)
      return false;
   // The opcode itself comes after both operands.
   if (index == entry.ops.size())
      return false;
   auto op = BinOp(entry.ops[index++]);
   auto *value = context->construct<BinaryOperator>(lhs.first, rhs, translate_binop(op, lhs.second));
   if (index < entry.ops.size())
   {
      // Only relevant for FP math,
      // but we only look at fast math state for
      // FP operations anyways.
      auto fast_math_flags = entry.ops[index];
      bool fast = (fast_math_flags &
          (FAST_MATH_UNSAFE_ALGEBRA_BIT |
           FAST_MATH_ALLOW_CONTRACT_BIT)) != 0;
      value->set_fast_math(fast);
   }
   if (!add_instruction(value))
      return false;
   break;
}

In SPIR-V a binary operation would be encoded as:

%id = OpMyBinOp %type %operand_a %operand_b

Very explicit and understandable. LLVM IR on the other hand is more clever, for better or worse.

First, the result %id is implicit, and is allocated linearly as new opcodes come in. The type of an instruction is context sensitive. First, we parse %operand_a. If we have seen this ID already, %type is deduced directly from the operand. If it is a forward reference, the type of %operand_a is encoded in the record explicitly.

IDs in most opcodes are encoded with a relative scheme. Since SSA requires that the declaration of an ID dominates all uses of it, the common case is that uses of an ID come after the declaration of it, so this is a decent compression scheme. The implementation of get_value_and_type() is something like:

std::pair<Value *, Type *> ModuleParseContext::get_value_and_type(
   const Vector<uint64_t> &ops, unsigned &index)
{
   if (index >= ops.size())
      return {};

   uint64_t op = ops[index++];
   // Context sensitive, for backwards compat mostly, but
   // modules can choose to use absolute or relative encoding.
   if (use_relative_id)
      op = uint32_t(values.size() - op);

   if (op < values.size())
   {
      // Normal reference.
      return { values[op], values[op]->getType() };
   }
   else
   {
      // Forward reference, the type is encoded in the next element.
      if (index >= ops.size())
         return {};

      auto *type = get_type(ops[index++]);
      auto *proxy = context->construct<ValueProxy>(type, *this, op);
      pending_forward_references.push_back(proxy);
      return { proxy, type };
   }
}

I had tons of bugs where I didn’t handle the possible forward references. Very awkward. I was of the impression that only PHI instructions could possibly have forward references, but of course, it’s never that simple.

Speaking of PHI, get_value() changes here to a signed-aware variant, where the relative value ID is encoded with sign_rotation, just like for INTEGER constants. This is because we expect that PHI inputs are forward referenced just as often as backwards referenced.
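A small worked example of the two relative schemes, side by side. This is a sketch under stated assumptions: next_value_id plays the role of values.size() in the code above, a positive PHI delta points backwards, and the INT64_MIN corner case of sign rotation is ignored. The function names are hypothetical.

```cpp
#include <cstdint>

// Regular operands: unsigned distance back from the next value ID.
// The 32-bit wrap mirrors the uint32_t cast used for forward references.
uint64_t resolve_operand(uint64_t encoded, uint64_t next_value_id,
                         bool use_relative_id)
{
   if (!use_relative_id)
      return encoded;
   return uint32_t(next_value_id - encoded);
}

// PHI operands: the distance is sign-rotated (bit 0 is the sign), so
// forward references are as cheap to encode as backward references.
uint64_t resolve_phi_operand(uint64_t encoded, uint64_t next_value_id,
                             bool use_relative_id)
{
   if (!use_relative_id)
      return encoded;

   uint64_t magnitude = encoded >> 1;
   bool negative = (encoded & 1) != 0;
   // A negative distance means the value is defined later in the function.
   return negative ? next_value_id + magnitude : next_value_id - magnitude;
}
```

With next_value_id equal to 10, a regular operand of 2 refers to value 8. For PHI, the encoded operand 4 (delta +2) also refers to value 8, while 7 (delta -3) refers to value 13, which has not been parsed yet.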

Overall, it’s just a grind to implement all relevant opcodes. DXIL only uses a subset, but it’s not well documented which subset of LLVM IR is actually used. We just have to implement new stuff as it comes in. DXIL.rst has a list of which LLVM instructions are supported, but this list cannot be trusted: in DXR, DXC emits various vector instructions (so much for being a scalar IR format) as well as the unreachable terminator, which is missing from the table.

Metadata

Metadata lives in its own block hierarchy and has a completely different set of types, llvm::MDNode, llvm::MDOperand, llvm::ConstantAsMetadata, etc.

At the top of the hierarchy we can declare NamedMDNodes, which we see in the LLVM assembly as:

!llvm.ident = !{!0}
!dx.version = !{!1}
!dx.valver = !{!2}
!dx.shaderModel = !{!3}
!dx.resources = !{!4}
!dx.entryPoints = !{!7}

NamedMDNodes contain a list of MDNodes, which nest into more MDNodes, or terminate in constant values. These correspond to NAMED_NODE, NODE and VALUE record types, not too many surprises here.
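The nesting can be modelled with a very small tree. The types below are deliberately simplified stand-ins that only borrow the names MDNode and NamedMDNode. They are not the llvm:: classes, and the sketch assumes the metadata graph is acyclic.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Simplified stand-in: a node is either a list of operands,
// an integer constant, or a string.
struct MDNode
{
   enum class Kind { Node, Constant, String };
   Kind kind = Kind::Node;
   std::vector<const MDNode *> operands; // Kind::Node
   uint64_t constant = 0;                // Kind::Constant
   std::string string;                   // Kind::String
};

struct NamedMDNode
{
   std::string name; // e.g. "dx.shaderModel"
   std::vector<const MDNode *> operands;
};

// Depth-first walk that gathers every integer constant below `node`.
void collect_constants(const MDNode &node, std::vector<uint64_t> &out)
{
   if (node.kind == MDNode::Kind::Constant)
      out.push_back(node.constant);
   for (const MDNode *operand : node.operands)
      if (operand)
         collect_constants(*operand, out);
}

std::vector<uint64_t> collect_constants(const NamedMDNode &named)
{
   std::vector<uint64_t> out;
   for (const MDNode *operand : named.operands)
      if (operand)
         collect_constants(*operand, out);
   return out;
}
```

Real consumers would of course look at specific operand positions rather than flattening everything, but the traversal has the same shape.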

Emitting SPIR-V opcodes

After we get through the parsing step, LLVM IR and SPIR-V have many similarities, and translating opcodes isn’t particularly difficult. For each LLVM basic block, we emit a basic block in SPIR-V and translate the opcodes one by one. We preserve the SSA nature as-is. There are of course a lot of details here, but they’re not very interesting, and too many details to enumerate. The two biggest problems we need to focus on are:

Resource access

Accessing textures, constant buffers, structured buffers and the weird and wonderful zoo of resource types is insanely intricate, and I’ll need to dedicate an entire blog post to this.

Control flow structurization

Another massive issue is the control flow. In LLVM, there is no structurization information whatsoever, and we’ll have to reconstruct this somehow. This is at least one more blog post. After emitting SPIR-V code into basic blocks, we need to rewrite the control flow and annotate the basic blocks with merge information, which then allows us to emit a final SPIR-V module, ready for driver consumption.

Conclusion

This was a rough overview of LLVM IR and how it is parsed from scratch. I think it’s safe to say it’s far more difficult to parse than SPIR-V, which is literally just a stream of [N x uint32_t] opcodes. Parsing IR is not the most exciting part of DXIL to SPIR-V conversion, but it had to be done. On the other hand, it might be useful starting knowledge for other projects.

For the next post, we’ll look at how to translate the D3D12 binding model into Vulkan.