MLIR & JLCS Dialect
For Julia developers: why this page matters
Julia doesn't ship DWARF tools, an IR sanitizer, or a way to call llvm-as from a package. RepliBuild fills that gap — and this page documents the piece that handles the cases ccall can't: packed structs, virtual method dispatch, strided array views, and unions. If your wrapped function uses any of those, its generated code goes through the MLIR pipeline described here (Tier 2). You don't need to understand MLIR to use RepliBuild — tier selection is automatic — but this page explains what happens under the hood when ccall isn't safe.
Background: what is MLIR?
MLIR (Multi-Level Intermediate Representation) is a compiler infrastructure developed as part of the LLVM project. Unlike traditional compilers that operate on a single IR (e.g., LLVM IR), MLIR supports multiple levels of abstraction through user-defined dialects — each dialect defines its own types, operations, and semantics. Dialects can be progressively lowered from high-level domain-specific operations down to LLVM IR and then to native machine code.
MLIR is used in production by TensorFlow (MHLO dialect), PyTorch (Torch-MLIR), and hardware compilers (CIRCT). In the Julia ecosystem, Enzyme's Reactant uses MLIR to optimize IR. RepliBuild uses MLIR differently — not for optimization, but for safe ABI marshalling. C++ ABI interop involves operations (struct field access at byte offsets, vtable-based virtual dispatch, strided array views) that are error-prone to express directly as LLVM IR but natural to represent as structured, typed MLIR operations.
Reference: MLIR Language Reference, Defining Dialects
Why a custom dialect?
When RepliBuild's cross-verification detects that a struct's DWARF size doesn't match Julia's alignment calculation (i.e., the struct is packed), or encounters virtual methods or unions, it can't emit a safe ccall. These cases need machine code that respects the exact byte offsets from DWARF. That's what JLCS does.
Concretely, calling a C++ virtual method from Julia requires:
- Reading the vtable pointer from the object at a known byte offset
- Indexing into the vtable to get the function pointer for the correct slot
- Calling that function pointer with the correct calling convention (sret for struct returns, pointer-to-value for arguments)
Encoding this as raw LLVM IR is possible but fragile — byte offsets must be manually computed, pointer casts must be correct, and struct return conventions vary by platform. A single mistake produces silent memory corruption.
The JLCS dialect expresses these operations as typed, verifiable IR that the MLIR framework can validate, optimize, and lower to correct LLVM IR automatically. The dialect also carries ABI metadata (field offsets, packing flags, struct sizes) that would be lost if emitted directly as LLVM IR.
JLCS dialect specification
JLCS (Julia C-Struct) is a custom MLIR dialect that models C-ABI-compatible struct layout and foreign function execution. It is the core of Tier 2 dispatch.
Source files:
| File | Role |
|---|---|
| src/mlir/JLCSDialect.td | Dialect registration and namespace (jlcs) |
| src/mlir/JLCSOps.td | Operation definitions |
| src/mlir/Types.td | Type definitions |
| src/mlir/JLInterfaces.td | Interface definitions |
| src/mlir/impl/ | C++ implementations for operation verification and lowering |
Type system
The JLCS dialect defines two custom types.
!jlcs.c_struct — C-ABI-compatible struct
Defined in: src/mlir/Types.td
Models a C struct with explicit field types, byte offsets, and a packing flag. This type carries the full ABI contract — the MLIR lowering uses these offsets to generate correct getelementptr instructions regardless of platform alignment rules.
TableGen definition:
```tablegen
def CStructType : JLCS_Type<"CStruct", "c_struct"> {
  let parameters = (ins
    "StringAttr":$juliaTypeName,
    ArrayRefParameter<"Type", "field types">:$fieldTypes,
    "ArrayAttr":$fieldOffsets,
    "bool":$isPacked
  );
}
```

Parameters:
| Parameter | Type | Description |
|---|---|---|
| juliaTypeName | StringAttr | Julia-side type name (e.g., "MyModule.Outer") |
| fieldTypes | Type[] | Ordered list of MLIR types for each field |
| fieldOffsets | ArrayAttr of i64 | Byte offset of each field from struct base |
| isPacked | bool | Whether the struct uses __attribute__((packed)) layout |
MLIR syntax:
```mlir
!jlcs.c_struct<"MyStruct", [i32, i64, f64], [0 : i64, 4 : i64, 12 : i64], packed = true>
```

This declares a struct MyStruct with three fields: an i32 at byte offset 0, an i64 at offset 4, and an f64 at offset 12. The packed = true flag is required here: under standard alignment rules the i64 field would land at offset 8, so offsets 4 and 12 are only possible in a packed layout.
!jlcs.array_view — strided multi-dimensional array descriptor
Defined in: src/mlir/Types.td
A universal array descriptor for zero-copy interop with Julia arrays, NumPy ndarrays, and C++ containers. The rank (number of dimensions) is a compile-time constant; the actual dimensions and strides are runtime values.
TableGen definition:
```tablegen
def ArrayViewType : JLCS_Type<"ArrayView", "array_view"> {
  let parameters = (ins
    "Type":$elementType,
    "unsigned":$rank
  );
}
```

Runtime memory layout:
```c
struct ArrayView {
    T* data_ptr;          // offset 0:  pointer to element data
    int64_t* dims_ptr;    // offset 8:  pointer to dimension sizes
    int64_t* strides_ptr; // offset 16: pointer to stride values (in elements)
    int64_t rank;         // offset 24: number of dimensions
};
```

MLIR syntax:
```mlir
!jlcs.array_view<f64, 3>   // 3D array of float64
```

This layout is compatible with Julia's Array (column-major strides), NumPy's ndarray (arbitrary strides), and C++ row-major arrays, enabling zero-copy data sharing across language boundaries.
Operations
The JLCS dialect defines seven operations, all specified in src/mlir/JLCSOps.td.
jlcs.type_info — register struct type and layout
Declares a CStruct type and its C++ base class mapping. Placed in the module's top-level region as a module-scope declaration.
```mlir
jlcs.type_info "Base",
    !jlcs.c_struct<"Base", [!llvm.ptr, i32, i32],
                   [0 : i64, 8 : i64, 12 : i64], packed = false>, ""
```

| Argument | Type | Description |
|---|---|---|
| typeName | StrAttr | Julia-side type name |
| structType | TypeAttr | Must be a CStructType |
| superType | StrAttr | Base class name (empty string if none) |
The superType field enables the MLIR lowering to handle C++ inheritance chains — base class members are flattened into the derived struct at their correct offsets.
jlcs.get_field — read a struct field
Read a field at a byte offset from a C struct pointer.
```mlir
%value = jlcs.get_field %struct_ref { fieldOffset = 4 : i64 } : (!llvm.ptr) -> i32
```

Lowers to a getelementptr + load sequence with the correct byte offset. The field type is carried in the operation's result type, ensuring type safety through the lowering pipeline.
jlcs.set_field — write a struct field
Write a value at a byte offset into a C struct pointer.
```mlir
jlcs.set_field %struct_ref, %new_value { fieldOffset = 4 : i64 } : (!llvm.ptr, i32) -> ()
```

Lowers to a getelementptr + store sequence.
jlcs.vcall — virtual method dispatch
Call a C++ virtual method via vtable lookup. This is the operation that makes Tier 2 dispatch possible for polymorphic C++ classes.
```mlir
%result = jlcs.vcall @Base::foo(%obj) {vtable_offset = 0 : i64, slot = 0 : i64}
    : (!llvm.ptr) -> i32
```

| Argument | Type | Description |
|---|---|---|
| class_name | SymbolRefAttr | Class name for the vtable |
| args | Variadic<AnyType> | Arguments (first is always the object pointer) |
| vtable_offset | I64Attr | Byte offset of the vptr within the object (usually 0) |
| slot | I64Attr | Index into the vtable function pointer array |
Lowering semantics:
- Load the vtable pointer from the object at vtable_offset
- Load the function pointer from vtable[slot]
- Call the function pointer with the object pointer + remaining arguments
jlcs.load_array_element — strided array read
Read an element from a multi-dimensional strided array.
```mlir
%elem = jlcs.load_array_element %view[%i, %j, %k] : !jlcs.array_view<f64, 3> -> f64
```

Index computation: linear_offset = sum(index_i * stride_i) for each dimension. This supports both row-major and column-major layouts depending on the stride values.
jlcs.store_array_element — strided array write
Write an element to a multi-dimensional strided array.
```mlir
jlcs.store_array_element %value, %view[%i, %j] : f64, !jlcs.array_view<f64, 2>
```

jlcs.ffe_call — foreign function execution
Call an external C function using FFE (Foreign Function Execution) metadata.
```mlir
%result = jlcs.ffe_call(%arg0, %arg1) : (i32, !llvm.ptr) -> i32
```

This is a general-purpose foreign call operation used for non-virtual C functions that still require MLIR-level ABI handling (e.g., struct return conventions).
IR generation pipeline
The path from compiled C++ binary to executable MLIR thunks involves three stages.
Stage 1: DWARF to structured metadata
Module: src/DWARFParser.jl
llvm-dwarfdump is invoked on the compiled binary. The parser extracts ClassInfo, VtableInfo, and VirtualMethod structs from the DWARF tags (DW_TAG_class_type, DW_TAG_subprogram, DW_TAG_inheritance, etc.).
Stage 2: metadata to MLIR IR text
Module: src/JLCSIRGenerator.jl, src/ir_gen/ submodules
The IR generator transforms parsed DWARF metadata into MLIR source text. Each submodule handles a specific concern:
| Submodule | Input | Output |
|---|---|---|
| ir_gen/TypeUtils.jl | C++ type string | MLIR type string |
| ir_gen/StructGen.jl | ClassInfo + members | jlcs.type_info operation |
| ir_gen/FunctionGen.jl | VirtualMethod | func.func @thunk_... wrapper |
| ir_gen/STLContainerGen.jl | STL method metadata | Accessor thunks for size(), data(), etc. |
Type mapping (src/ir_gen/TypeUtils.jl):
| C++ Type | MLIR Type |
|---|---|
| double | f64 |
| float | f32 |
| int, unsigned int | i32 |
| long, long long | i64 |
| char, int8_t | i8 |
| void | none |
| T*, T& | !llvm.ptr |
| std::vector<T> | !llvm.ptr (opaque) |
| Unknown | !llvm.ptr (fallback) |
Complete generated module example:
For a C++ class Base with virtual methods foo() and bar(int), the IR generator produces:
```mlir
module {
  // 1. External dispatch declarations (resolved by the JIT linker)
  llvm.func @_ZN4Base3fooEv(!llvm.ptr) -> i32
  llvm.func @_ZN4Base3barEi(!llvm.ptr, i32) -> i32

  // 2. Type info (registers struct layout with the dialect)
  jlcs.type_info "Base",
      !jlcs.c_struct<"Base", [!llvm.ptr, i32, i32],
                     [0 : i64, 8 : i64, 12 : i64], packed = false>, ""

  // 3. Thunk wrappers (bridge Julia calling convention to C++ ABI)
  func.func @thunk__ZN4Base3fooEv(%arg0: !llvm.ptr) -> i32 {
    %result = llvm.call @_ZN4Base3fooEv(%arg0) : (!llvm.ptr) -> i32
    return %result : i32
  }
  func.func @thunk__ZN4Base3barEi(%arg0: !llvm.ptr, %arg1: i32) -> i32 {
    %result = llvm.call @_ZN4Base3barEi(%arg0, %arg1) : (!llvm.ptr, i32) -> i32
    return %result : i32
  }
}
```

The llvm.func declarations at the top tell the JIT execution engine to resolve these symbols from the loaded shared library at link time. The func.func thunk wrappers provide the MLIR ciface entry points that the Julia-side JITManager.invoke() calls into.
Stage 3: MLIR to machine code
Module: src/MLIRNative.jl
The generated MLIR text is:
- Parsed into an MLIR module via MLIRNative.parse_module()
- Lowered through the MLIR pass pipeline: jlcs dialect → func dialect → llvm dialect → LLVM IR
- JIT-compiled to native machine code by MLIRExecutionEngine
- Symbol-resolved: external symbols (llvm.func declarations) are linked against the loaded shared library
The lower_to_llvm() function in MLIRNative drives the full lowering pass pipeline. MLIR dependencies used:
| MLIR Component | Role |
|---|---|
| MLIRExecutionEngine | JIT compilation and execution |
| MLIRTargetLLVMIRExport | MLIR module to LLVM IR translation |
| MLIRLLVMToLLVMIRTranslation | LLVM dialect lowering to native LLVM IR |
JIT manager
Module: src/JITManager.jl
The JIT manager provides the runtime execution path for Tier 2 functions. It is a singleton (GLOBAL_JIT) that manages the MLIR context, JIT execution engine, and compiled symbol cache.
Architecture
```text
+---------------------------------------------------+
| GLOBAL_JIT (singleton)                             |
|                                                    |
|   mlir_ctx         -> Ptr{Cvoid} (MLIR context)    |
|   jit_engine       -> Ptr{Cvoid} (execution engine)|
|   compiled_symbols -> Dict{String, Ptr{Cvoid}}     |
|   vtable_info      -> VtableInfo                   |
|   lock             -> ReentrantLock                |
+---------------------------------------------------+
```

Lock-free lookup (double-check pattern)
```text
invoke("_mlir_ciface_foo_thunk", RetType, args...)
        |
        v
_lookup_cached(func_name)
        |
        +-- FAST PATH: Dict read (no lock) --> cache hit -> return Ptr
        |
        +-- SLOW PATH: lock -> double-check -> MLIRNative.lookup() -> cache -> return Ptr
```

- Hot path (cached): a single Dict read with no synchronization. Julia's Dict is safe for concurrent reads under a single-writer pattern.
- Cold path (first call): lock acquisition, JIT symbol resolution via MLIRNative.lookup(), and cache insertion. This happens only once per symbol over the lifetime of the process.
Calling convention
All Tier 2 functions use a unified calling convention for MLIR ciface thunks:
| Return type | Signature |
|---|---|
| Scalar | T ciface(void** args_ptr) |
| Struct | void ciface(T* sret_buf, void** args_ptr) |
| Void | void ciface(void** args_ptr) |
Arguments are passed as pointers to values via Ref{T} conversion:

```julia
inner_ptrs = [ptr_to_arg1, ptr_to_arg2, ..., ptr_to_argN]
```

Arity specialization
To avoid heap-allocating Any[] for common small argument counts, the JIT manager provides hand-specialized invoke methods for 0 through 4 arguments. Each creates stack-allocated Refs and a fixed-size Ptr{Cvoid}[], avoiding all boxing:
```julia
function invoke(func_name::String, ::Type{T}, a1, a2) where T
    fptr = _lookup_cached(func_name)
    r1 = Ref(a1); r2 = Ref(a2)
    inner_ptrs = Ptr{Cvoid}[
        Base.unsafe_convert(Ptr{Cvoid}, r1),
        Base.unsafe_convert(Ptr{Cvoid}, r2)
    ]
    GC.@preserve r1 r2 begin
        return _invoke_call(fptr, T, inner_ptrs)
    end
end
```

A variadic fallback handles 5+ arguments with dynamic allocation.
Return type dispatch is resolved at compile time via @generated:

- isprimitivetype(T) → direct ccall return
- Otherwise → sret buffer allocation, ccall with out-pointer, dereference
Building the dialect
The JLCS MLIR dialect is built as a shared library (libJLCS.so) via CMake with TableGen code generation.
Prerequisites: LLVM 21+ development headers, CMake 3.20+, mlir-tblgen
```shell
cd src/mlir
./build.sh
# Produces: src/mlir/build/libJLCS.so
```

The build configuration (src/mlir/CMakeLists.txt) processes the .td TableGen definitions to generate C++ header and source files, then links the dialect implementation with whole-archive semantics so the JIT execution engine can discover and register the dialect at runtime.
Build dependencies:
| MLIR Library | Role |
|---|---|
| MLIRExecutionEngine | JIT compilation engine |
| MLIRTargetLLVMIRExport | MLIR to LLVM IR export |
| MLIRLLVMToLLVMIRTranslation | LLVM dialect to native IR |
libJLCS.so is only required for Tier 2 dispatch. If it is not built, Tier 1 (ccall / llvmcall) still works for all POD-safe functions. Run RepliBuild.check_environment() to verify which tiers are available on your system.
MLIRNative API reference
RepliBuild.MLIRNative provides the low-level Julia bindings to the MLIR C API.
Context and modules
RepliBuild.MLIRNative.create_context — Function
```julia
create_context() -> MlirContext
```

Create a new MLIR context and register the JLCS dialect.
The context must be destroyed with destroy_context() when done.
RepliBuild.MLIRNative.destroy_context — Function
```julia
destroy_context(ctx::MlirContext)
```

Destroy an MLIR context and free its resources.
RepliBuild.MLIRNative.@with_context — Macro
```julia
@with_context(body)
```

Execute body with an MLIR context, automatically cleaning up afterwards.
Example:
```julia
@with_context begin
    mod = create_module(ctx)
    print_module(mod)
end
```

RepliBuild.MLIRNative.create_module — Function
```julia
create_module(ctx::MlirContext, location::MlirLocation) -> MlirModule
```

Create an empty MLIR module in the given context.

```julia
create_module(ctx::MlirContext) -> MlirModule
```

Create an empty MLIR module with an unknown location.
RepliBuild.MLIRNative.parse_module — Function
```julia
parse_module(ctx::MlirContext, source::String) -> MlirModule
```

Parse an MLIR module from a string.
RepliBuild.MLIRNative.clone_module — Function
```julia
clone_module(mod::MlirModule) -> MlirModule
```

Clone an MLIR module.
RepliBuild.MLIRNative.print_module — Function
```julia
print_module(mlir_module::MlirModule)
```

Print an MLIR module to stdout.
JIT execution
RepliBuild.MLIRNative.create_jit — Function
```julia
create_jit(module::MlirModule; opt_level=2, dump_object=false, shared_libs=String[]) -> MlirExecutionEngine
```

Create a JIT execution engine for the module. Automatically attaches the host data layout. Pass shared_libs to register shared libraries for symbol resolution (e.g., the C++ library whose functions are called by JIT thunks).
RepliBuild.MLIRNative.destroy_jit — Function
```julia
destroy_jit(jit::MlirExecutionEngine)
```

Destroy the JIT execution engine.
RepliBuild.MLIRNative.register_symbol — Function
```julia
register_symbol(jit::MlirExecutionEngine, name::String, addr::Ptr{Cvoid})
```

Register a runtime address (symbol) with the JIT. Call this before invoking JIT functions that rely on external symbols.
RepliBuild.MLIRNative.lookup — Function
```julia
lookup(jit::MlirExecutionEngine, name::String) -> Ptr{Cvoid}
```

Look up a function address in the JIT.
RepliBuild.MLIRNative.jit_invoke — Function
```julia
jit_invoke(jit::MlirExecutionEngine, name::String, args::Vector{Any})
```

Invoke a JIT function with arguments. Note: arguments must be pointers to the actual values (double indirection).
RepliBuild.MLIRNative.invoke_safe — Function
```julia
invoke_safe(jit::MlirExecutionEngine, mod::MlirModule, name::String, args...)
```

Safely invoke a JIT function by verifying argument types against the MLIR module signature.
Transformations
RepliBuild.MLIRNative.lower_to_llvm — Function
```julia
lower_to_llvm(module::MlirModule) -> Bool
```

Run standard lowering passes (Func -> LLVM, Arith -> LLVM) on the module. Returns true on success.
Diagnostics
RepliBuild.MLIRNative.test_dialect — Function
```julia
test_dialect()
```

Test that the JLCS dialect loads and works correctly.
This creates a context, loads the dialect, and verifies basic functionality.