LLVM Backend

Mono now includes an experimental backend which compiles methods to native code using LLVM instead of the built-in JIT.

Usage

The LLVM backend can be enabled by passing --enable-llvm=yes to configure. LLVM 2.6 or later is required.

Architecture

The backend works as follows:

  • first, normal mono JIT IR is generated from the IL code
  • the IR is transformed to SSA form
  • the IR is converted to the LLVM IR
  • the LLVM IR is compiled by LLVM into native code

LLVM is accessed through the LLVM C binding.

The backend doesn't currently support all IL features, like exception handlers. Methods using such features are compiled using the normal mono JIT. Thus LLVM compiled and JITted code can coexist in the same process.
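
The per-method fallback described above can be pictured with a small sketch. It is only an illustration under stated assumptions: ToyMethod, ToyBackend and the toy_* functions are hypothetical names invented here, not Mono data structures or APIs, and the single "has exception clauses" flag stands in for whatever checks the real backend performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-method summary; not a Mono data structure. */
typedef struct {
    const char *name;
    bool has_exception_clauses;   /* an IL feature the LLVM backend may reject */
} ToyMethod;

typedef enum { BACKEND_LLVM, BACKEND_JIT } ToyBackend;

/* Pretend LLVM compilation: it fails for methods using unsupported features. */
static bool toy_try_llvm(const ToyMethod *m)
{
    return !m->has_exception_clauses;
}

/* Try LLVM first and fall back to the regular JIT for this one method only. */
ToyBackend toy_choose_backend(const ToyMethod *m)
{
    assert(m != NULL);
    return toy_try_llvm(m) ? BACKEND_LLVM : BACKEND_JIT;
}
```

Because the decision is made method by method, a process can end up with a mix of LLVM compiled and JITted methods, which is the coexistence mentioned above.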

Sources

The backend is in the files mini-llvm.c and mini-llvm-cpp.c. The former contains the bulk of the backend, while the latter contains C++ code which is needed because of deficiencies in the LLVM C binding which the backend uses.

The LLVM Mono Branch

We have a branch of LLVM with various modifications to work around LLVM restrictions. The branch is named 'mono', and it is in a Git repo at:

http://github.com/mono/llvm

This is the recommended version of LLVM which should be used with mono. When using this version, the LLVM backend can compile about 99% of mscorlib methods.

The branch currently contains the following changes:

  • additional mono specific calling conventions.
  • support for loads/stores which can fault using LLVM intrinsics.
  • support for saving the stack locations of some variables into the exception handling info emitted by LLVM.
  • support for stores into TLS on x86.
  • the LLVM version string is changed to signal that this is a branch, i.e. it looks like "2.8svn-mono".
  • workarounds to force LLVM to generate direct calls on amd64.
  • support for passing a blockaddress value as a parameter.

The Git repo is forked from the unofficial LLVM git mirror at:

http://github.com/earl/llvm-mirror

To merge changes from llvm-mirror to this repo, do:

git remote add llvm-mirror http://github.com/earl/llvm-mirror.git
git pull llvm-mirror master:master
git merge master

The Git repo contains the following branches:

  • 'master' is the original LLVM code
  • 'mono' is branched off master and contains our changes
  • 'mono-2-8' is a version which works with mono 2.8

To view all changes, use:

git diff origin/master..mono

Restrictions

There are a number of constructs that are not supported by the LLVM backend. In those cases the Mono code generation engine will fall back to Mono's default compilation engine.

Exception Handlers

These are currently not supported when using stock LLVM, mainly because LLVM doesn't support implicit exceptions thrown by the execution of instructions.

An implicit exception is, for example, the NullReferenceException that is raised when code accesses an invalid memory location, which in Mono and .NET typically means going through an uninitialized (null) pointer.

Generics sharing

The main problem here is the hidden rgctx argument passed to/received by generic shared methods. We can't force LLVM to pass this argument, which is passed in an extra non-ABI register in mono.

Implementation details

Virtual calls

The problem here is that the trampoline handling virtual calls needs to be able to obtain the vtable address and the offset. This is currently done by an arch specific function named mono_arch_get_vcall_slot_addr (), which works by disassembling the calling code to find out which register contains the vtable address. This doesn't work for LLVM since we can't control the format of the generated code, so disassembly would be very hard. Also, sometimes the code generated by LLVM is such that the vtable address cannot be obtained at all, e.g.:

 mov <offset>(%rax), %rax
 call *%rax

To work around these problems, we use a separate vtable trampoline for each vtable slot index. The trampoline obtains the 'this' argument from the registers/stack, whose location is dictated by the calling convention. The 'this' argument plus the slot index can be used to compute the vtable slot and the called method.
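
As a rough illustration of that last step, the sketch below computes a vtable slot from a 'this' pointer and a slot index. The object and vtable layouts (ToyObject, ToyVTable) and all toy_* names are assumptions made up for this example; Mono's real layouts and trampoline code differ.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_VTABLE_SLOTS 4

typedef int (*ToyMethodPtr)(void *self);

/* Hypothetical layouts: every object starts with a pointer to its vtable. */
typedef struct {
    ToyMethodPtr slots[TOY_VTABLE_SLOTS];
} ToyVTable;

typedef struct {
    ToyVTable *vtable;
    int value;
} ToyObject;

/* Locate the vtable slot from 'this' plus the slot index that is fixed
 * for a given trampoline. No disassembly of the caller is needed. */
ToyMethodPtr *toy_vcall_slot_addr(void *this_arg, size_t slot_index)
{
    ToyObject *obj = this_arg;
    assert(obj != NULL && slot_index < TOY_VTABLE_SLOTS);
    return &obj->vtable->slots[slot_index];
}

/* Two example virtual methods to populate a vtable with. */
int toy_get_value(void *self)  { return ((ToyObject *)self)->value; }
int toy_get_double(void *self) { return 2 * ((ToyObject *)self)->value; }
```

Having the slot address, rather than just the method, is what lets a trampoline patch the slot once the target method has been compiled.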

Interface calls

The problem here is that these calls receive a hidden argument called the IMT argument which is passed in a non-ABI register by the JIT, which cannot be done with LLVM. So we call a trampoline instead, which sets the IMT argument, then makes the virtual call.
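
A toy model of this arrangement is sketched below. A plain static variable stands in for the non-ABI register, and ToyInterfaceMethod and the toy_* functions are hypothetical names; a real trampoline is generated machine code, not a C function taking the IMT argument as a parameter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of the interface method being called. */
typedef struct {
    int id;
} ToyInterfaceMethod;

/* Stand-in for the non-ABI register that carries the IMT argument. */
static const ToyInterfaceMethod *toy_imt_register;

typedef int (*ToyTarget)(void *self);

/* Example callee: reads the hidden argument, as an IMT thunk would. */
int toy_imt_dispatch(void *self)
{
    (void)self;
    return toy_imt_register->id;
}

/* The trampoline sets the hidden argument, then makes the ordinary call. */
int toy_interface_call_trampoline(const ToyInterfaceMethod *imt_arg,
                                  ToyTarget target, void *self)
{
    assert(imt_arg != NULL && target != NULL);
    toy_imt_register = imt_arg;   /* what LLVM compiled code cannot do itself */
    return target(self);
}
```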

Unwind info

The JIT needs unwind info to unwind through LLVM generated methods. This is solved by obtaining the exception handling info generated by LLVM, then extracting the unwind info from it.

Exception Handling

There is some support for compiling methods with exception clauses, but it is not enabled yet. LLVM uses the platform specific exception handling ABI, which is the C++ EH ABI on Linux, while we use our home grown exception handling system. To make these two work together, we use only one LLVM EH intrinsic, llvm.eh.selector. This forces LLVM to generate exception handling tables. We decode those tables in mono_unwind_decode_fde () to obtain the addresses of the try-catch clauses, and save those to MonoJitInfo, just as with JIT compiled code.

Finally clauses are handled differently than with JITted code. Instead of calling them from mono_handle_exception (), we save the exception handling state in TLS, then branch to them the same way we would branch to a catch handler. The code generated from ENDFINALLY calls mono_resume_unwind (), which resumes exception handling from the information saved in TLS.
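
The finally-clause flow can be modelled in a few lines of C. This is a simplified sketch, not the runtime's code: ToyEHState and the toy_* functions are invented for illustration, a function call stands in for the branch to the handler, and the saved state is reduced to two integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical unwinder state; the real runtime saves far more than this. */
typedef struct {
    int exception_id;   /* which exception is in flight */
    int resume_frame;   /* frame where unwinding should continue */
    bool pending;       /* true while a finally clause is running */
} ToyEHState;

/* Per-thread storage, standing in for the TLS slot mentioned above. */
static _Thread_local ToyEHState toy_eh_state;

/* Runtime side: save the state, then transfer control to the clause. */
void toy_run_finally(int exception_id, int resume_frame,
                     void (*finally_clause)(void))
{
    assert(finally_clause != NULL);
    toy_eh_state = (ToyEHState){ exception_id, resume_frame, true };
    finally_clause();   /* stands in for the branch to the handler */
}

/* Called where ENDFINALLY would be: pick the saved state back up. */
ToyEHState toy_resume_unwind(void)
{
    assert(toy_eh_state.pending);
    toy_eh_state.pending = false;
    return toy_eh_state;
}

/* Example finally clause: do the cleanup, then resume unwinding. */
static ToyEHState toy_last_resumed;
void toy_example_finally(void)
{
    /* ... cleanup code would run here ... */
    toy_last_resumed = toy_resume_unwind();
}
```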

Since LLVM has no support for implicit exceptions, LLVM is disabled for methods with clauses whose bodies contain such instructions.

The LLVM mono branch supports implicit exceptions by adding a bunch of LLVM intrinsics to do loads/stores, and calling them using the LLVM 'invoke' instruction.

It would be useful for MonoTouch to support AOT+LLVM on ARM. This is currently not possible because LLVM doesn't generate unwind info on ARM. There are multiple ways to fix it:

  • Have LLVM generate DWARF unwind info. The problem here is that the .eh_frame_hdr section, which is needed for fast lookup of unwind info, is only generated by the GNU linker, and the Apple linker probably doesn't support it. Also, at least some versions of the GNU linker removed the .eh_frame section when linking.
  • Have LLVM generate the ARM unwind info described in the ARM EHABI docs. The problem here is that the GAS directives which emit this unwind info are probably not supported by Apple's assembler. It would probably need linker support as well.
  • Generate unwind info in mono specific sections/tables. This is not upstream-able to LLVM, but would solve the problems above.

Generic Sharing

Generic Sharing is only supported when using the LLVM mono branch.

There are two problems here: passing/receiving the hidden rgctx argument passed to some shared methods, and obtaining its value/the value of 'this' during exception handling.

The former is implemented by adding two new mono specific calling conventions which pass the 'rgctx' argument in the non-ABI register where mono expects it, i.e. R10 on amd64. The latter is implemented by marking the variables where these are stored with a mono specific LLVM custom metadata, and modifying LLVM to emit the final stack location of these variables into the exception handling info, where the runtime can retrieve it.

AOT Support

There is some support for using LLVM for AOT compilation. This is implemented by emitting the LLVM IR into an LLVM bitcode file, then compiling that file with the LLVM llc compiler to produce a .s file. We then append our normal AOT data structures to this file, plus the code for methods not supported by LLVM.

A runtime which is not configured with --enable-llvm=yes can be made to use LLVM compiled AOT modules by using the --llvm command line argument: mono --llvm hello.exe

Porting the backend to new architectures

The following changes have to be made to port the LLVM backend to a new architecture:

  • Define MONO_ARCH_LLVM_SUPPORTED in mini-<ARCH>.h.
  • Implement mono_arch_get_llvm_call_info () in mini-<ARCH>.h. This function is a variant of the arch specific get_call_info () function; it should return calling convention information for a signature.
  • Define MONO_CONTEXT_SET_LLVM_EXC_REG() in mini-<ARCH>.h to the register used to pass the exception object to LLVM compiled landing pads. This is usually defined by the platform ABI.
  • Make sure the 'this' argument is passed as the first argument to methods, even those that return a valuetype by passing a hidden argument. Define MONO_ARCH_THIS_AS_FIRST_ARG in mini-<ARCH>.h.
  • Implement the LLVM exception throwing trampolines in exceptions-<ARCH>.c. These trampolines differ from the normal ones because they receive the PC address of the throw site, instead of a displacement from the start of the method. See exceptions-amd64.c for an example.
  • Implement the resume_unwind () trampoline, which is similar to the throw trampolines, but instead of throwing an exception, it should call mono_resume_unwind () with the constructed MonoContext.
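
To give a feel for the kind of definition the checklist asks for, here is a hedged sketch of an exception-register macro. Everything in it is hypothetical: ToyContext is not MonoContext, the TOY_* macros only mimic the shape of MONO_ARCH_LLVM_SUPPORTED and MONO_CONTEXT_SET_LLVM_EXC_REG(), and the (ctx, exc) argument list is an assumption rather than the documented signature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy register context; the real MonoContext is architecture specific. */
typedef struct {
    uintptr_t exc_reg;   /* stand-in for the ABI's exception-object register */
    uintptr_t pc;        /* where execution continues */
} ToyContext;

/* Hypothetical per-architecture definitions, shaped like the checklist items. */
#define TOY_ARCH_LLVM_SUPPORTED 1
#define TOY_CONTEXT_SET_LLVM_EXC_REG(ctx, exc) \
    do { (ctx)->exc_reg = (uintptr_t)(exc); } while (0)

/* Prepare a context so that a landing pad at 'landing_pad' receives 'exc'. */
void toy_prepare_landing_pad(ToyContext *ctx, void *exc, uintptr_t landing_pad)
{
    assert(ctx != NULL);
    TOY_CONTEXT_SET_LLVM_EXC_REG(ctx, exc);
    ctx->pc = landing_pad;
}
```

Which register plays the role of exc_reg on a given port comes from the platform ABI, as noted in the checklist.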

LLVM problems

Here is a list of problems whose solution would probably require changes to LLVM itself. Some of these problems are solved in various ways by changes on the LLVM Mono Branch.

  • the llvm.sqrt intrinsic doesn't work with NaNs, even though the underlying C function/machine instruction probably works with them. Worse, an optimization pass transforms sqrt(NaN) to 0.0, changing program behaviour, and masking the problem.
  • there is no fabs intrinsic. Instead, llc seems to replace calls to functions named 'fabs' with the corresponding assembly, even if they are not the fabs from libm.
  • There is no way to tell LLVM that the result of a load is constant, e.g. in a loop like this:
  for (int i = 0; i < arr.Length; ++i)
     arr [i] = 0;

The arr.Length load cannot be moved outside the loop, since the store inside the loop can alias it. There is a llvm.invariant.start/end intrinsic, but that seems to be only useful for marking a memory area as invariant inside a basic block, so it cannot be used to mark a load globally invariant.

http://hlvm.llvm.org/bugs/show_bug.cgi?id=5441

  • LLVM has no support for implicit exceptions:
http://llvm.org/bugs/show_bug.cgi?id=1269
  • LLVM thinks that loads from a NULL address lead to undefined behaviour, while it is quite well defined on most unices (SIGSEGV signal being sent). If an optimization pass determines that the source address of a load is NULL, it changes it to undef/unreachable, changing program behaviour. The only way to work around this seems to be marking all loads as volatile, which probably doesn't help optimizations.
  • There seems to be no way to disable specific optimizations when running 'opt', e.g. to run -std-compile-opts without tailcallelim.
  • The x86 JIT seems to generate normal calls as
  mov reg, imm
  call *reg

This makes it hard/impossible to patch the calling address after the called method has been compiled.

http://lists.cs.uiuc.edu/pipermail/llvmdev/2009-December/027999.html

  • llc generates invalid C++ LSDA tables when compiling with -relocation-model=pic:
.byte	0x9B  # @TType format (indirect pcrel sdata4)
and later:
.quad	type_info_1  # TypeInfo

http://llvm.org/bugs/show_bug.cgi?id=5977

  • LLVM doesn't emit unwind info on ARM, neither the DWARF style nor the ARM style.

ARM Exception Handling ABI (http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ihi0038a/index.html)

GAS Directives to emit ARM unwind info (http://sourceware.org/binutils/docs-2.20/as/ARM-Directives.html#ARM-Directives)

  • LLVM Bugs: [1] (http://llvm.org/bugs/show_bug.cgi?id=6102)
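
The array-length item in the list above is easier to see with the two loop shapes side by side. The sketch below writes them out in C for a toy array type; ToyArray and the toy_* functions are illustrative names only. A C compiler may well be able to hoist the load by itself here, so the two functions merely spell out the "before" and "after" that the backend would like LLVM to derive for managed arrays.

```c
#include <assert.h>
#include <stddef.h>

/* Toy managed array: a length field followed by the elements. */
typedef struct {
    size_t length;
    int data[8];
} ToyArray;

/* The length is re-read on every iteration, which is what must happen
 * when the store inside the loop is assumed to alias it. */
void toy_clear_reload(ToyArray *arr)
{
    assert(arr != NULL && arr->length <= 8);
    for (size_t i = 0; i < arr->length; ++i)
        arr->data[i] = 0;
}

/* The length is read once; valid only because it never changes
 * while the loop runs. */
void toy_clear_hoisted(ToyArray *arr)
{
    assert(arr != NULL && arr->length <= 8);
    const size_t n = arr->length;
    for (size_t i = 0; i < n; ++i)
        arr->data[i] = 0;
}
```

Both functions produce the same result; the second simply performs one load of the length instead of one per iteration.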