In part 1 we parsed a MIPS ELF file and set up a stack, we should now consider how to translate the MIPS machine code to LLVM IR. Before we get there, we need to learn how to use the LLVM APIs and how code-gen to LLVM IR works.
If you know LLVM well already, this post is not for you, wait for part 3 when we start considering the design of the recompiler. If you don’t, this post should give you all understanding needed to be able to generate some code in LLVM yourself and understand the implementation.
This turned out rather dry, but hey, it’s limited how exciting writing C++ to generate IR can be. 😀 At least it’s very educational.
Which LLVM version to target?
The LLVM JIT APIs are unfortunately rather unstable. The APIs appear to break between major versions. A common workaround to this seems to be to simply stick to a version and depend on that unless you’re ready to update. I got it working on the current LLVM 7, and also the upcoming LLVM 8, which seems to shuffle a lot of internal APIs around. For the sample here, we’re going to consider LLVM 7, which is the latest stable major version. The API changes as far as I know are only related to the JIT runtime, not codegen, so we can hide away the ugliness behind a clean interface at least.
LLVM IR basics
To get us started, let’s make a trivial function, which adds two arguments together and returns it. In C, we would represent this as:
int LLVMAdd(int a, int b);
To do this, we need to create an LLVMContext, an LLVM module (sort of like a translation unit) and a function to that module. We’ll need to create some types as well.
// We'll need a ton of headers for later, might as well just add it now.
#include "llvm/ExecutionEngine/Orc/CompileUtils.h"
#include "llvm/ExecutionEngine/Orc/Core.h"
#include "llvm/ExecutionEngine/Orc/ExecutionUtils.h"
#include "llvm/ExecutionEngine/Orc/IRCompileLayer.h"
#include "llvm/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/Target/TargetMachine.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/ExecutionEngine/SectionMemoryManager.h"
void test()
{
llvm::LLVMContext context;
llvm::Module module("mymodule", context);
llvm::Type *int32 = llvm::Type::getInt32Ty(context);
llvm::Type *result_type = int32;
llvm::Type *argument_types[] = { int32, int32 };
bool vararg = false;
llvm::FunctionType *function_type =
llvm::FunctionType::get(result_type, argument_types, vararg);
llvm::Function *function =
llvm::Function::Create(function_type,
llvm::Function::ExternalLinkage,
"LLVMAdd", module);
llvm::errs() << module << "\n";
}
; ModuleID = 'mymodule'
source_filename = "mymodule"
declare i32 @LLVMAdd(i32, i32)
For now, we have simply declared the function. If we do nothing more, this is equivalent to using “extern” in C. We can call it, but the linker needs to find the function elsewhere. Let’s expand it with some code.
Basic blocks
Basic blocks are the foundation of LLVM and a function consists of any number of basic blocks. Basic blocks represent linear control flow within a function. At the end of a basic block, the block must be terminated. Here, and only here can we do things like:
- Branch to another basic block, conditionally or not
- Return
- And some other special cases
For our first test, we can get away with one basic block.
Single static assignment (SSA)
Another core principle of LLVM is the use of single-static assignment. In short, it basically means that once you assign a value to a variable, it cannot ever change. If a variable is modified multiple times, each time it is assigned, it needs to be assigned to a new SSA value. You will see this when we start generating some code. This has various benefits for optimization and code-gen later in the pipeline, but that’s LLVM’s problem. Let’s implement the add function.
llvm::BasicBlock *bb = llvm::BasicBlock::Create(context, "entry", function);
llvm::IRBuilder<> builder(bb);
llvm::Value *arg0 = &function->arg_begin()[0];
llvm::Value *arg1 = &function->arg_begin()[1];
llvm::Value *value = builder.CreateAdd(arg0, arg1, "added_value");
builder.CreateRet(value);
Here we add a basic block to our function. The first basic block is where the function starts executing. The IRBuilder is the class which lets you build actual code. Here we create an add instruction, and return it. This ends the basic block, and we’re done. Not too bad! If you print it using the existing line of code, you’ll now see:
; ModuleID = 'mymodule'
source_filename = "mymodule"
define i32 @LLVMAdd(i32, i32) {
entry:
%added_value = add i32 %0, %1
ret i32 %added_value
}
Compiling IR to native code
We could just dump this directly into the JIT engine now, but let’s try compiling this as a dynamic library, using the offline tools. Copy this IR code into a file, e.g. test.ll. Let’s compile this to native code.
$ llc -relocation-model=pic -filetype obj -o test.o test.ll -O2
$ ld -shared -o test.so test.o
$ objdump -d test.so
0000000000001000 <LLVMAdd>:
1000: 8d 04 37 lea (%rdi,%rsi,1),%eax
1003: c3 retq
LLVM can usually target other architectures with one binary, so try adding -mtriple=mips to llc, disassemble with the MIPS binutils, and you’ll get:
00000000 <LLVMAdd>:
0: 03e00008 jr ra
4: 00851021 addu v0,a0,a1
This code might confuse you, it’s returning before the add? We’ll get to that later, it’s one of the weirder parts of MIPS, delay slots.
Branches
We’ll need some control flow. Let’s try implementing this function:
int foo(int a, int b, int c)
{
if (c > 0)
return a * b;
else
return a + b;
}
using namespace llvm;
// ...
// Remember to update this to take 3 arguments!
Type *argument_types[] = { int32, int32, int32 };
// ...
BasicBlock *bb = BasicBlock::Create(context, "entry", function);
BasicBlock *true_path = BasicBlock::Create(context, "true", function);
BasicBlock *false_path = BasicBlock::Create(context, "false", function);
IRBuilder<> builder(bb);
Value *arg0 = &function->arg_begin()[0];
Value *arg1 = &function->arg_begin()[1];
Value *arg2 = &function->arg_begin()[2];
Value *cmp = builder.CreateICmpSGT(
arg0,
ConstantInt::get(Type::getInt32Ty(context), 0),
"cmp");
BranchInst::Create(true_path, false_path, cmp, bb);
builder.SetInsertPoint(true_path);
Value *mul = builder.CreateMul(arg1, arg2, "multiplied");
builder.CreateRet(mul);
builder.SetInsertPoint(false_path);
Value *add = builder.CreateMul(arg1, arg2, "added");
builder.CreateRet(add);
Notice that the compare instruction is “signed greater than”. Normally, LLVM does not care about signedness, except for operations where this matters. To branch, we create a branch instruction, which ends the first basic block.
; ModuleID = 'mymodule'
source_filename = "mymodule"
define i32 @LLVMAdd(i32, i32, i32) {
entry:
%cmp = icmp sgt i32 %0, 0
br i1 %cmp, label %true, label %false
true: ; preds = %entry
%multiplied = mul i32 %1, %2
ret i32 %multiplied
false: ; preds = %entry
%added = add i32 %1, %2
ret i32 %added
}
And this becomes in x86-64 (sorry for the AT&T syntax):
0000000000000000 <LLVMAdd>:
0: 85 ff test %edi,%edi
2: 7e 06 jle a <LLVMAdd+0xa>
4: 0f af f2 imul %edx,%esi
7: 89 f0 mov %esi,%eax
9: c3 retq
a: 01 d6 add %edx,%esi
c: 89 f0 mov %esi,%eax
e: c3 retq
Memory and pointers
We can’t always use pure SSA values, and we have to deal with good old memory. LLVM has an “interesting” approach to pointers, but let’s start easy. Loading and storing memory in LLVM is usually done in two stages:
- Compute address with “get element pointer”
- Load/store with CreateLoad/CreateStore
The backend is responsible for translating this according to the memory addressing the CPU can deal with. Get element pointer translates very naturally to C-like pointer arithmetic. We don’t need to access in byte offsets and all sorts of gunk, but array elements and struct members. Let’s try an example with structs.
struct Foo
{
int a;
float b;
int c;
float d;
};
int LLVMAdd(Foo *foo)
{
foo[1].c += 10;
}
Type *int32 = Type::getInt32Ty(context);
Type *float32 = Type::getFloatTy(context);
Type *struct_types[] = {
int32,
float32,
int32,
float32
};
Type *struct_type = StructType::get(context, struct_types, false);
Type *p_struct_type = PointerType::get(struct_type, 0);
Type *result_type = int32;
Type *argument_types[] = { p_struct_type };
// ...
// ...
BasicBlock *bb = BasicBlock::Create(context, "entry", function);
IRBuilder<> builder(bb);
Value *arg0 = &function->arg_begin()[0];
Value *indices[] = {
ConstantInt::get(int32, 1), // pointer -> array index
ConstantInt::get(int32, 2), // struct -> member index
};
Value *ptr = builder.CreateInBoundsGEP(arg0, indices, "ptr");
Value *loaded = builder.CreateLoad(ptr, "loaded");
Value *added = builder.CreateAdd(
loaded,
ConstantInt::get(int32, 10), "added");
builder.CreateStore(added, ptr);
builder.CreateRet(added);
; ModuleID = 'mymodule'
source_filename = "mymodule"
define i32 @LLVMAdd({ i32, float, i32, float }*) {
entry:
%ptr = getelementptr inbounds { i32, float, i32, float }, { i32, float, i32, float }* %0, i32 1, i32 2
%loaded = load i32, i32* %ptr
%added = add i32 %loaded, 10
store i32 %added, i32* %ptr
ret i32 %added
}
0000000000000000 <LLVMAdd>:
0: 8b 47 18 mov 0x18(%rdi),%eax
3: 83 c0 0a add $0xa,%eax
6: 89 47 18 mov %eax,0x18(%rdi)
9: c3 retq
The mystical PHI node
I can’t really talk about SSA without mentioning PHI nodes. SSA values do not live in memory, so what happens when the control flow changes? Let’s look at an example function:
int foo(int a, int b, int c)
{
int v;
if (a > 0)
v = b + c;
else
v = b - c;
return v + a;
}
If we translate this directly to IR, we’ll see that v suddenly has two versions of it, but SSA says *single* static assignment. How is this resolved? A PHI node is used. The purpose of this node is to pull together multiple versions of a value into one, and you specify all basic blocks which can branch into your block. This is rather bizarre, since it’s kinda like an inverse goto.
Of course, we could translate this to memory allocated on the stack and just store to v rather than deal with this, and have LLVM optimize it. For completion, let’s try that first.
BasicBlock *bb = BasicBlock::Create(context, "entry", function);
BasicBlock *true_path = BasicBlock::Create(context, "true", function);
BasicBlock *false_path = BasicBlock::Create(context, "false", function);
BasicBlock *merge = BasicBlock::Create(context, "merge", function);
IRBuilder<> builder(bb);
Value *arg0 = &function->arg_begin()[0];
Value *arg1 = &function->arg_begin()[1];
Value *arg2 = &function->arg_begin()[2];
Value *v = builder.CreateAlloca(int32, ConstantInt::get(int32, 1),
"alloca");
Value *cmp = builder.CreateICmpSGT(arg0, ConstantInt::get(int32, 0),
"cmp");
BranchInst::Create(true_path, false_path, cmp, bb);
builder.SetInsertPoint(true_path);
Value *bc_add = builder.CreateAdd(arg1, arg2, "b_plus_c");
builder.CreateStore(bc_add, v);
BranchInst::Create(merge, true_path);
builder.SetInsertPoint(false_path);
Value *bc_sub = builder.CreateSub(arg1, arg2, "b_minus_c");
builder.CreateStore(bc_sub, v);
BranchInst::Create(merge, false_path);
builder.SetInsertPoint(merge);
Value *loaded = builder.CreateLoad(v, "loaded");
Value *added = builder.CreateAdd(loaded, arg0, "added");
builder.CreateRet(added);
source_filename = "mymodule"
define i32 @LLVMAdd(i32, i32, i32) {
entry:
%alloca = alloca i32
%cmp = icmp sgt i32 %0, 0
br i1 %cmp, label %true, label %false
true: ; preds = %entry
%b_plus_c = add i32 %1, %2
store i32 %b_plus_c, i32* %alloca
br label %merge
false: ; preds = %entry
%b_minus_c = sub i32 %1, %2
store i32 %b_minus_c, i32* %alloca
br label %merge
merge: ; preds = %false, %true
%loaded = load i32, i32* %alloca
%added = add i32 %loaded, %0
ret i32 %added
}
If we just give this to llc we see that it is sadly using stack space.
0000000000000000 <LLVMAdd>:
0: 85 ff test %edi,%edi
2: 7e 0d jle 11 <LLVMAdd+0x11>
4: 01 d6 add %edx,%esi
6: 89 74 24 fc mov %esi,-0x4(%rsp)
a: 03 7c 24 fc add -0x4(%rsp),%edi
e: 89 f8 mov %edi,%eax
10: c3 retq
11: 29 d6 sub %edx,%esi
13: 89 74 24 fc mov %esi,-0x4(%rsp)
17: 03 7c 24 fc add -0x4(%rsp),%edi
1b: 89 f8 mov %edi,%eax
1d: c3 retq
What’s going on, we gave llc the -O3 option? Well, we didn’t optimize the IR first. This problem is extremely common when compiling C and C++ code of course, so we can optimize in the IR domain. Here we can do the classic “mem2reg” optimization, which replaces stack space with SSA values, which is much easier for the backend to deal with.
$ opt -O3 -o test.bc test.ll
$ llvm-dis test.bc -o optimized.ll
$ cat optimized.ll
; ModuleID = 'test.bc'
source_filename = "mymodule"
; Function Attrs: norecurse nounwind readnone
define i32 @LLVMAdd(i32, i32, i32) local_unnamed_addr #0 {
entry:
%cmp = icmp sgt i32 %0, 0
%3 = sub i32 0, %2
%alloca.0.p = select i1 %cmp, i32 %2, i32 %3
%alloca.0 = add i32 %1, %0
%added = add i32 %alloca.0, %alloca.0.p
ret i32 %added
}
attributes #0 = { norecurse nounwind readnone }
0000000000000000 <LLVMAdd>:
0: 89 d1 mov %edx,%ecx
2: f7 d9 neg %ecx
4: 85 ff test %edi,%edi
6: 0f 4f ca cmovg %edx,%ecx
9: 8d 04 3e lea (%rsi,%rdi,1),%eax
c: 01 c8 add %ecx,%eax
e: c3 retq
That’s pretty nifty, and it even replaced the branch with select, but it also rewrote the entire function because it found some other optimizations. Let’s not rely on opt and rewrite this without stack space using PHI.
IRBuilder<> builder(bb);
Value *arg0 = &function->arg_begin()[0];
Value *arg1 = &function->arg_begin()[1];
Value *arg2 = &function->arg_begin()[2];
Value *cmp = builder.CreateICmpSGT(arg0, ConstantInt::get(int32, 0), "cmp");
BranchInst::Create(true_path, false_path, cmp, bb);
builder.SetInsertPoint(true_path);
Value *bc_add = builder.CreateAdd(arg1, arg2, "b_plus_c");
BranchInst::Create(merge, true_path);
builder.SetInsertPoint(false_path);
Value *bc_sub = builder.CreateSub(arg1, arg2, "b_minus_c");
BranchInst::Create(merge, false_path);
builder.SetInsertPoint(merge);
PHINode *phi = builder.CreatePHI(int32, 2, "phi");
phi->addIncoming(bc_add, true_path);
phi->addIncoming(bc_sub, false_path);
Value *added = builder.CreateAdd(phi, arg0, "added");
builder.CreateRet(added);
; ModuleID = 'mymodule'
source_filename = "mymodule"
define i32 @LLVMAdd(i32, i32, i32) {
entry:
%cmp = icmp sgt i32 %0, 0
br i1 %cmp, label %true, label %false
true: ; preds = %entry
%b_plus_c = add i32 %1, %2
br label %merge
false: ; preds = %entry
%b_minus_c = sub i32 %1, %2
br label %merge
merge: ; preds = %false, %true
%phi = phi i32 [ %b_plus_c, %true ], [ %b_minus_c, %false ]
%added = add i32 %phi, %0
ret i32 %added
}
0000000000000000 <LLVMAdd>:
0: 85 ff test %edi,%edi
2: 7e 07 jle b <LLVMAdd+0xb>
4: 01 d6 add %edx,%esi
6: 01 fe add %edi,%esi
8: 89 f0 mov %esi,%eax
a: c3 retq
b: 29 d6 sub %edx,%esi
d: 01 fe add %edi,%esi
f: 89 f0 mov %esi,%eax
11: c3 retq
This is a very pure translation of our IR, nice.
Calling functions
This is quite straight forward fortunately. Let’s replace the two branches with actual function calls. You can probably tell by now what the C code was supposed to look like.
FunctionType *function_type = FunctionType::get(result_type, argument_types, vararg);
Function *function = Function::Create(
function_type,
Function::ExternalLinkage,
"LLVMAdd",
module);
Function *true_function = Function::Create(
function_type,
Function::ExternalLinkage,
"LLVMTrue",
module);
Function *false_function = Function::Create(
function_type,
Function::ExternalLinkage,
"LLVMFalse", module);
BasicBlock *bb = BasicBlock::Create(context, "entry", function);
BasicBlock *true_path = BasicBlock::Create(context, "true", function);
BasicBlock *false_path = BasicBlock::Create(context, "false", function);
BasicBlock *merge = BasicBlock::Create(context, "merge", function);
IRBuilder<> builder(bb);
Value *arg0 = &function->arg_begin()[0];
Value *arg1 = &function->arg_begin()[1];
Value *arg2 = &function->arg_begin()[2];
Value *call_args[] = { arg0, arg1, arg2 };
Value *cmp = builder.CreateICmpSGT(arg0, ConstantInt::get(int32, 0), "cmp");
BranchInst::Create(true_path, false_path, cmp, bb);
builder.SetInsertPoint(true_path);
Value *true_call = builder.CreateCall(true_function, call_args);
BranchInst::Create(merge, true_path);
builder.SetInsertPoint(false_path);
Value *false_call = builder.CreateCall(false_function, call_args);
BranchInst::Create(merge, false_path);
builder.SetInsertPoint(merge);
PHINode *phi = builder.CreatePHI(int32, 2, "phi");
phi->addIncoming(true_call, true_path);
phi->addIncoming(false_call, false_path);
Value *added = builder.CreateAdd(phi, arg0, "added");
builder.CreateRet(added);
; ModuleID = 'mymodule'
source_filename = "mymodule"
define i32 @LLVMAdd(i32, i32, i32) {
entry:
%cmp = icmp sgt i32 %0, 0
br i1 %cmp, label %true, label %false
true: ; preds = %entry
%3 = call i32 @LLVMTrue(i32 %0, i32 %1, i32 %2)
br label %merge
false: ; preds = %entry
%4 = call i32 @LLVMFalse(i32 %0, i32 %1, i32 %2)
br label %merge
merge: ; preds = %false, %true
%phi = phi i32 [ %3, %true ], [ %4, %false ]
%added = add i32 %phi, %0
ret i32 %added
}
declare i32 @LLVMTrue(i32, i32, i32)
declare i32 @LLVMFalse(i32, i32, i32)
Here we’re trying to call functions which haven’t been defined in the module. It is up to the linker to resolve this later. Let’s see what happens if we build this as a dynamic library.
0000000000001000 <.plt>:
1000: ff 35 02 30 00 00 pushq 0x3002(%rip) # 4008 <_GLOBAL_OFFSET_TABLE_+0x8>
1006: ff 25 04 30 00 00 jmpq *0x3004(%rip) # 4010 <_GLOBAL_OFFSET_TABLE_+0x10>
100c: 0f 1f 40 00 nopl 0x0(%rax)
0000000000001010 <LLVMFalse@plt>:
1010: ff 25 02 30 00 00 jmpq *0x3002(%rip) # 4018 <LLVMFalse>
1016: 68 00 00 00 00 pushq $0x0
101b: e9 e0 ff ff ff jmpq 1000 <.plt>
0000000000001020 <LLVMTrue@plt>:
1020: ff 25 fa 2f 00 00 jmpq *0x2ffa(%rip) # 4020 <LLVMTrue>
1026: 68 01 00 00 00 pushq $0x1
102b: e9 d0 ff ff ff jmpq 1000 <.plt>
Disassembly of section .text:
0000000000001030 <LLVMAdd>:
1030: 53 push %rbx
1031: 89 fb mov %edi,%ebx
1033: 85 ff test %edi,%edi
1035: 7e 0b jle 1042 <LLVMAdd+0x12>
1037: 89 df mov %ebx,%edi
1039: e8 e2 ff ff ff callq 1020 <LLVMTrue@plt>
103e: 01 d8 add %ebx,%eax
1040: 5b pop %rbx
1041: c3 retq
1042: 89 df mov %ebx,%edi
1044: e8 c7 ff ff ff callq 1010 <LLVMFalse@plt>
1049: 01 d8 add %ebx,%eax
104b: 5b pop %rbx
104c: c3 retq
The noise before the function is dynamic library gunk. If we had linked with –no-undefined, we would see:
ld: test.o: in function `LLVMAdd':
mymodule:(.text+0xa): undefined reference to `LLVMTrue'
ld: mymodule:(.text+0x15): undefined reference to `LLVMFalse'
This is basically how we will call into our emulator host code to do various things. Handle syscalls, deal with function calls to other emulated code, etc. The possibilities are endless now. When we JIT, we can pass function pointers of our own host code into the symbol resolver and the JIT can patch in straight function calls into the code. Nifty!
At this point, with some searching around, you should be able to figure out how to generate the IR you want. We’ve covered the basics I think.
JIT compilation
JIT-ing these llvm::Modules is in theory quite straight forward, there’s just a lot of API noise to go through. We create:
- llvm::LLVMContext
- llvm::orc::ExecutionSession
- llvm::orc::RTDyldObjectLinkingLayer
- llvm::orc::IRCompileLayer<>
- llvm::TargetMachine
- llvm::orc::MangleAndInterner
- llvm::DataLayout
There is too much code to deal with, just have a look at the gists I made instead for now 😀 This should be reusable if you want to play around with it.
For LLVM 7, you’ll want to define JITTER_LLVM_VERSION_LEGACY. For LLVM 8, don’t.
The API usage should be fairly simple.
int my_add(int a, int b) { return a + b };
// ...
using namespace llvm;
JITTIR::Jitter jitter;
// Let JIT link against this symbol.
jitter.add_external_symbol("my_add", my_add);
// Allocate a module.
auto module = jitter.create_module("mymodule");
auto &context = module->getContext();
// Build the IR
Type *int32 = Type::getInt32Ty(context);
Type *result_type = int32;
Type *argument_types[] = { int32, int32 };
bool vararg = false;
FunctionType *function_type = FunctionType::get(
result_type,
argument_types,
vararg);
Function *function = Function::Create(
function_type,
Function::ExternalLinkage,
"LLVMAdd",
*module);
Function *my_add_function = Function::Create(
function_type,
Function::ExternalLinkage,
"my_add",
*module);
BasicBlock *bb = BasicBlock::Create(context, "entry", function);
IRBuilder<> builder(bb);
Value *arg0 = &function->arg_begin()[0];
Value *arg1 = &function->arg_begin()[1];
Value *args[] = { arg0, arg1 };
Value *added = builder.CreateCall(my_add_function, args, "added");
builder.CreateRet(added);
// Add the module to the JIT, it's immediately compiled.
jitter.add_module(std::move(module));
// Get function pointer to symbol and execute it.
auto *fn_ptr =
reinterpret_cast<int (*)(int, int)>(jitter.get_symbol_address("LLVMAdd"));
printf("10 + 20 = %d\n", fn_ptr(10, 20));
Which should print 30.
(gdb) disas fn_ptr,fn_ptr+20
Dump of assembler code from 0x7fb9d7bcc000 to 0x7fb9d7bcc014:
0x00007fb9d7bcc000: push %rax
0x00007fb9d7bcc001: movabs $0x55d0584c33ea,%rax
0x00007fb9d7bcc00b: callq *%rax
0x00007fb9d7bcc00d: pop %rcx
0x00007fb9d7bcc00e: retq
0x00007fb9d7bcc00f: add %al,(%rax) <- zero memory
0x00007fb9d7bcc011: add %al,(%rax)
0x00007fb9d7bcc013: add %al,(%rax)
This is just the bare minimum, simplest approach we can take to JIT compilation in LLVM, but it works. The code which calls external functions seems a bit strange though, but it might be this way so it’s easy to patch in other call addresses later, I’m not sure.
Conclusion
Hopefully this was informative at least. In part 3, we will use these tools at our disposal and bring up a MIPS.