RepliBuild Internals

This section documents the internal modules that power RepliBuild. These are generally not needed for standard usage but are valuable for contributors, advanced integration, or understanding the system's behavior. For the high-level architecture see Architecture.

Wrapper

Source: src/Wrapper.jl, src/Wrapper/

The Wrapper module generates Julia FFI modules from DWARF metadata and binary symbol tables. It is structured as a two-track system: a C generator and a C++ generator, selected automatically via config.wrap.language.

Module layout

Module | Source | Role
Wrapper.Generator | src/Wrapper/Generator.jl | Top-level wrap_library() entry point; dispatches to C or C++ generator
Wrapper.TypeRegistry | src/Wrapper/TypeRegistry.jl | TypeRegistry and TypeStrictness: shared type-resolution context
Wrapper.Symbols | src/Wrapper/Symbols.jl | ParamInfo / SymbolInfo structs for structured symbol data
Wrapper.FunctionPointers | src/Wrapper/FunctionPointers.jl | DWARF function_ptr(...) signature to Julia @cfunction type string
Wrapper.Utils | src/Wrapper/Utils.jl | Keyword escaping, identifier sanitization shared between generators
Wrapper.C.GeneratorC | src/Wrapper/C/GeneratorC.jl | Full C wrapper generator (structs, enums, functions, LTO, thunks)
Wrapper.C.TypesC | src/Wrapper/C/TypesC.jl | C type heuristics and base type map
Wrapper.Cpp.GeneratorCpp | src/Wrapper/Cpp/GeneratorCpp.jl | Full C++ wrapper generator (same feature set + virtual dispatch)
Wrapper.Cpp.TypesCpp | src/Wrapper/Cpp/TypesCpp.jl | C++ type map including STL, templates, references
Wrapper.Cpp.IdentifiersCpp | src/Wrapper/Cpp/IdentifiersCpp.jl | Namespace stripping, operator sanitization

Language selection

wrap.language is an extensible dispatch key — "c" and "cpp" are the first two targets, with additional language generators planned:

[wrap]
language = "cpp"   # selects C++ generator + clang++ toolchain (default)
# language = "c"   # alternative: selects C generator + clang toolchain

discover() sets this automatically based on the scanned source files. Adding a new language means adding a generator under src/Wrapper/<Lang>/ and registering it in Wrapper/Generator.jl.

Tier selection logic

The function is_ccall_safe() in src/Wrapper.jl is the core dispatch decision. It inspects each function's DWARF metadata and returns true (Tier 1 / ccall) or false (Tier 2 / MLIR).

Checks performed:

  1. STL container types — Any STL type in parameters or return forces Tier 2
  2. Return type safety:
    • Template returns (contains <) → Tier 2 (unpredictable ABI)
    • Struct return by value > 16 bytes → Tier 2 (too large for ccall sret)
    • Non-POD class return → Tier 2
    • Packed struct return (DWARF size != Julia aligned size) → Tier 2
  3. Parameter type safety:
    • Union parameters → Tier 2
    • Packed struct parameters → Tier 2

Functions routed to Tier 2 are further divided between JIT dispatch (JITManager.invoke()) and AOT thunks (ccall to _thunks.so), controlled by the aot_thunks config flag.
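
The checks above can be sketched as a single predicate. This is a minimal illustration, not the actual is_ccall_safe() implementation: the FnMeta struct, its field names, and the "std::" substring test for STL types are all assumptions introduced here for the example.

```julia
# Hypothetical per-function metadata; the real DWARF-derived structure may differ.
Base.@kwdef struct FnMeta
    return_type::String = "void"
    return_is_struct::Bool = false
    return_size::Int = 0            # bytes, as reported by DWARF
    return_is_pod::Bool = true
    return_is_packed::Bool = false  # DWARF size != Julia aligned size
    param_types::Vector{String} = String[]
    has_union_param::Bool = false
    has_packed_param::Bool = false
end

# Assumption: an STL type is recognized by its "std::" prefix.
is_stl(t::AbstractString) = occursin("std::", t)

# Returns true for Tier 1 (ccall) and false for Tier 2 (MLIR).
function ccall_safe_sketch(m::FnMeta)
    any(is_stl, m.param_types) && return false           # 1. STL in parameters
    is_stl(m.return_type) && return false                # 1. STL in return
    occursin('<', m.return_type) && return false         # 2. template return
    m.return_is_struct && m.return_size > 16 && return false  # 2. large struct
    m.return_is_pod || return false                      # 2. non-POD class
    m.return_is_packed && return false                   # 2. packed struct return
    m.has_union_param && return false                    # 3. union parameter
    m.has_packed_param && return false                   # 3. packed struct parameter
    return true
end
```

Ordering the checks from cheapest to most specific keeps the common case (plain scalar signatures) on a short path.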

Idiomatic wrapper generation

Beyond raw ccall bindings, the wrapper generator clusters related C++ functions by class name to produce idiomatic Julia types:

  1. Factory detection: Functions matching create_X, new_X, make_X, alloc_X, init_X, or returning X* are identified as constructors.
  2. Destructor detection: Functions matching delete_X, destroy_X, free_X, dealloc_X, or X_destroy are identified as destructors.
  3. Method clustering: Functions taking X* as their first parameter and associated with the same DWARF class are grouped as instance methods.

The result is a mutable struct ManagedX with a raw Ptr{Cvoid} handle, a registered finalizer calling the C++ destructor, and multiple-dispatch method proxies that pass the pointer via Base.unsafe_convert.
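
The shape of such a managed type can be sketched as follows. This is an illustration under stated assumptions: ManagedBuffer and release! are hypothetical names, and Libc.malloc / Libc.free stand in for the C++ factory and destructor that a generated wrapper would call.

```julia
# Hypothetical managed handle; Libc.malloc/Libc.free stand in for a
# detected create_X / destroy_X pair.
mutable struct ManagedBuffer
    handle::Ptr{Cvoid}

    function ManagedBuffer(nbytes::Integer)
        obj = new(Libc.malloc(nbytes))   # "factory" call
        finalizer(release!, obj)         # run the "destructor" at GC time
        return obj
    end
end

# Idempotent destructor proxy: safe to call manually and again from the finalizer.
function release!(obj::ManagedBuffer)
    if obj.handle != C_NULL
        Libc.free(obj.handle)
        obj.handle = C_NULL
    end
    return nothing
end

# Lets a ManagedBuffer be passed wherever ccall expects a Ptr{Cvoid}.
Base.unsafe_convert(::Type{Ptr{Cvoid}}, b::ManagedBuffer) = b.handle
```

Nulling the handle after release guards against a double free when a user calls the destructor explicitly before the garbage collector runs the finalizer.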

RepliBuild.Wrapper.extract_symbols (Method)
extract_symbols(binary_path::String, registry::TypeRegistry; demangle::Bool=true, method::Symbol=:nm)

Extract symbols from compiled binary using specified method.

Methods

  • :nm - Fast, basic symbol extraction
  • :objdump - Detailed with debug info (TODO)
  • :all - Try all methods and merge (TODO)

Returns vector of SymbolInfo objects.

RepliBuild.Wrapper.is_enum_like (Method)
is_enum_like(cpp_type::String)::Bool

Check whether a C++ type looks like an enum based on naming conventions. Always returns false, because enum detection by name alone is unreliable (uppercase names are more likely structs or classes). Real enum identification uses DWARF metadata via _is_enum_type() in DispatchLogic.jl, which checks __enum__-prefixed keys.

RepliBuild.Wrapper.is_struct_like (Method)
is_struct_like(cpp_type::String)::Bool

Check if a C++ type looks like a struct/class based on naming conventions. Structs typically: start with uppercase, contain only alphanumeric + underscore.

RepliBuild.Wrapper.wrap_basic (Method)
wrap_basic(config::RepliBuildConfig, library_path::String; generate_docs::Bool=true)

Generate basic Julia wrapper from binary symbols only (no headers required).

Quality: ~40% - Conservative types, placeholder signatures, requires manual refinement.

Use when: Headers not available, quick prototyping, binary-only distribution.

RepliBuild.Wrapper.wrap_library (Method)
wrap_library(config::RepliBuildConfig, library_path::String;
             headers::Vector{String}=String[],
             generate_tests::Bool=false,
             generate_docs::Bool=true)

Generate Julia wrapper for compiled library.

Always uses introspective (DWARF metadata) wrapping when metadata is available, otherwise falls back to basic symbol-only extraction with conservative types.

Arguments

  • config: RepliBuildConfig with wrapper settings
  • library_path: Path to compiled library (.so, .dylib, .dll)
  • headers: Optional header files (currently unused, reserved for future)
  • generate_tests: Generate test file (default: false, TODO)
  • generate_docs: Include comprehensive documentation (default: true)

Returns

Path to generated Julia wrapper file


Compiler

Source: src/Compiler.jl

The Compiler module handles the translation of C/C++ source code into LLVM IR and shared libraries. It oversees the entire build pipeline from dependency management down to IR optimization.

Build pipeline

  1. Auto-discovery and dependency resolution: Scans the project directory, resolving file paths and external git/local dependencies to merge into the build graph.
  2. Pre-processing (shims and templates): Dynamically generates C/C++ shim files for configured macros and explicitly instantiates templates based on replibuild.toml settings. This allows normally invisible constructs to manifest in the final binary and DWARF metadata.
  3. Compilation to LLVM IR: Translates source code into .ll text format via clang/clang++.
  4. IR transformation and sanitization: Strips LLVM 19+ attributes incompatible with Julia's internal LLVM JIT, removes va_start/va_end intrinsics from varargs function bodies (varargs are routed entirely through ccall wrapper generation), and cleans mismatched debug metadata.
  5. Bitcode assembly: The sanitized IR is converted into .bc binary format for zero-cost LTO loading. Uses Clang_unified_jll to guarantee LLVM version compatibility with Julia.
  6. Linking: Object files are linked into the target shared library.

Language-aware compilation

.c files are compiled with clang; .cpp files with clang++. For C projects, create_library() and create_executable() also use clang as the linker driver.
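
A minimal sketch of this extension-based driver choice, combined with the LLVM IR step from the build pipeline. The function name ir_command_sketch is hypothetical, and the exact flag set RepliBuild passes is an assumption; -S -emit-llvm is the standard clang way to emit textual .ll, and -g requests the debug info that DWARF extraction relies on.

```julia
# Build (but do not run) the compiler invocation for one source file.
# Assumption: only the ".c" extension selects the C driver.
function ir_command_sketch(src::AbstractString, out_ll::AbstractString)
    ext = lowercase(splitext(src)[2])
    driver = ext == ".c" ? "clang" : "clang++"
    # -S -emit-llvm writes textual LLVM IR; -g keeps DWARF debug info.
    return `$driver -S -emit-llvm -g $src -o $out_ll`
end
```

Returning a Cmd rather than running it keeps the sketch side-effect free; passing the result to run() would perform the actual compilation.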

Bitcode assembly

Compiler.assemble_bitcode(ll_path, bc_path) converts sanitized LLVM IR text (.ll) to binary bitcode (.bc). It prefers Clang_unified_jll.clang -emit-llvm so the resulting bitcode exactly matches the LLVM version bundled with Julia, maximizing Base.llvmcall compatibility. Falls back to system llvm-as if the JLL is unavailable.

This function is called by both the main LTO pipeline (link_optimize_ir) and the AOT thunks path (_build_aot_thunks).

RepliBuild.Compiler.assemble_bitcode (Method)

Assemble LLVM IR text (.ll) to bitcode (.bc). Uses Julia's own libLLVM via the C API (LLVMParseIRInContext + LLVMWriteBitcodeToFile) to guarantee bitcode version compatibility with Base.llvmcall. Falls back to Clang_unified_jll, then system llvm-as.

RepliBuild.Compiler.cpp_to_julia_type (Function)

Convert C++ type to Julia type. Basic type mapping - can be enhanced with type registry.

Arguments

  • cpp_type: C++ type string
  • struct_names: Set of known struct/class names from DWARF
  • enum_names: Set of known enum names from DWARF

RepliBuild.Compiler.extract_class_name (Method)

Extract class name from method signature (only considers '::' in the function-name prefix).

Examples:

  • "Calculator::compute(int, int)" -> "Calculator"
  • "sum_vector(std::vector<int> const&)" -> "" (free function)

RepliBuild.Compiler.extract_dwarf_return_types (Method)

Extract return types and struct definitions from DWARF debug info. Returns: (return_types_dict, struct_defs_dict)

  • return_types: Dict{mangled_name => {c_type, julia_type, size}}
  • struct_defs: Dict{struct_name => {members: [{name, type, offset}]}}

RepliBuild.Compiler.extract_function_name (Method)

Extract function name from demangled signature.

Examples:

  • "Calculator::compute(int, int, char)" -> "compute"
  • "sum_vector(std::vector<int, std::allocator<int> > const&)" -> "sum_vector"

RepliBuild.Compiler.extract_stl_method_symbols (Method)

Extract STL method symbols from the compiled binary. Uses nm to find mangled names of template-instantiated STL methods. Returns Dict mapping normalized container type -> vector of method info dicts.

RepliBuild.Compiler.sanitize_ir_for_julia (Method)
sanitize_ir_for_julia(ir_text::String) -> String

Sanitize LLVM IR text for compatibility with Julia's internal LLVM (18). Strips LLVM 19–22 attributes/instructions, debug metadata, and converts varargs function bodies to extern declarations (va_start/va_end can't be JIT-compiled). Uses inlinehint (not alwaysinline) to avoid recursive inliner explosion on large modules.

Coverage by LLVM version:

  • 19: GEP nuw, #dbg_* records, inrange(), captures(), dead_on_unwind, initializes(), allocptr, icmp samesign, range(), trunc nuw/nsw, zext/uitofp nneg
  • 20: GEP nusw, or disjoint, fptrunc/fpext fast-math flags, ptrtoaddr→ptrtoint
  • 21: (covered by 19 patterns; captures/dead_on_unwind/allocptr landed here)
  • 22: ptrtoaddr instruction

This is the single source of truth for IR compatibility — used by both the C source LTO pipeline and the MLIR AOT thunks pipeline.
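
To make the idea of text-level IR sanitization concrete, here is a small sketch that strips three of the newer flags listed above. It is not the sanitize_ir_for_julia implementation: the function name strip_new_ir_flags and the specific regular expressions are assumptions chosen for illustration, and they cover only a fraction of the listed patterns.

```julia
# Remove a few instruction flags that an older LLVM parser would reject.
function strip_new_ir_flags(ir::AbstractString)
    # getelementptr [inbounds] nusw/nuw ...  ->  getelementptr [inbounds] ...
    out = replace(ir, r"(getelementptr(?: inbounds)?)(?: (?:nuw|nusw))+" => s"\1")
    # icmp samesign <pred> ...  ->  icmp <pred> ...
    out = replace(out, r"\bicmp samesign\b" => "icmp")
    # or disjoint ...  ->  or ...
    out = replace(out, r"\bor disjoint\b" => "or")
    return out
end
```

Dropping these flags only discards optimization hints, so the stripped IR stays semantically valid for an older parser.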


Configuration Manager

Source: src/ConfigurationManager.jl

The single source of truth for all build settings. Handles TOML parsing, validation, and merging into a typed RepliBuildConfig struct.

Discovery

Source: src/Discovery.jl

Scans the filesystem to identify C/C++ source files, headers, and dependencies. Auto-detects project language (:c vs :cpp) from the scanned source extensions and sets wrap.language accordingly in the generated replibuild.toml.
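
A rough sketch of extension-based language detection with directory skipping. The function name, the extension list, and the rule that any C++ source makes the whole project :cpp are assumptions for illustration; the skipped directory names are the ones listed under Safety Features in the discover docstring.

```julia
const SKIPPED_DIRS = (".git", "build", "node_modules", ".cache")
const CPP_EXTS = (".cpp", ".cc", ".cxx", ".hpp")   # assumed extension set

# Returns :cpp, :c, or nothing when no C/C++ sources are found.
function detect_language_sketch(root::AbstractString)
    saw_c = false
    for (dir, _, files) in walkdir(root)
        # Ignore anything located under a skipped directory.
        any(in(SKIPPED_DIRS), splitpath(relpath(dir, root))) && continue
        for f in files
            ext = lowercase(splitext(f)[2])
            ext in CPP_EXTS && return :cpp   # assumption: one C++ file decides
            saw_c |= ext == ".c"
        end
    end
    return saw_c ? :c : nothing
end
```

Because walkdir only descends from root, the scan stays scoped to the project directory and its subdirectories.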

RepliBuild.Discovery.discover (Function)
discover(target_dir::String=pwd(); force::Bool=false, unsafe::Bool=false, build::Bool=false, wrap::Bool=false) -> String

Main discovery pipeline - scans project and generates configuration.

Process:

  1. Check for existing replibuild.toml (project identified by presence of replibuild.toml)
  2. Scan all files and categorize
  3. Detect and analyze binaries
  4. Walk AST dependencies using clang
  5. Generate or update replibuild.toml with discovered data
  6. Optionally run build and wrap pipeline

Arguments

  • target_dir: Project directory (default: current directory)
  • force: Force rediscovery even if replibuild.toml exists
  • unsafe: Bypass safety checks (use with extreme caution)
  • build: Automatically run build() after discovery (default: false)
  • wrap: Automatically run wrap() after build (requires build=true, default: false)

Safety Features

  • Discovery is scoped ONLY to target_dir and subdirectories
  • Will not scan outside the project root
  • Skips .git, build, node_modules, .cache directories

Returns

  • Path to generated replibuild.toml file

Examples

# Discover only
toml_path = RepliBuild.Discovery.discover()

# Discover and build
toml_path = RepliBuild.Discovery.discover(build=true)

# Full pipeline: discover → build → wrap
toml_path = RepliBuild.Discovery.discover(build=true, wrap=true)

# Then use the TOML path:
RepliBuild.build(toml_path)
RepliBuild.wrap(toml_path)

DWARFParser

Source: src/DWARFParser.jl

Parses llvm-dwarfdump output to extract structured type information from compiled binaries. This is the bridge between C++ debug metadata and Julia wrapper generation.

Data structures

Type | Fields | Role
ClassInfo | name, vtable_ptr_offset, base_classes, virtual_methods, members, size | Complete class/struct description with byte-level layout
VtableInfo | classes, vtable_addresses, method_addresses | Aggregate metadata for all classes in a binary
VirtualMethod | name, mangled_name, slot, return_type, parameters | Single virtual method with vtable slot index
MemberInfo | name, type_name, offset | Struct field with byte offset from struct base
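
For orientation, the three per-class records could be declared roughly as below. Only the field names come from the table; the field types, and the members_fit helper, are assumptions made for this sketch and may not match src/DWARFParser.jl.

```julia
# Field names follow the table; all field types are assumed.
struct MemberInfo
    name::String
    type_name::String
    offset::Int            # byte offset from struct base
end

struct VirtualMethod
    name::String
    mangled_name::String
    slot::Int              # vtable slot index
    return_type::String
    parameters::Vector{String}
end

struct ClassInfo
    name::String
    vtable_ptr_offset::Int
    base_classes::Vector{String}
    virtual_methods::Vector{VirtualMethod}
    members::Vector{MemberInfo}
    size::Int              # total byte size
end

# Hypothetical sanity check: every member starts inside the object.
members_fit(c::ClassInfo) = all(m -> 0 <= m.offset < c.size, c.members)
```

For example, a struct with two doubles would carry members at offsets 0 and 8 and a size of 16.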

Extraction targets

DWARF Tag | Extracted Data
DW_TAG_class_type / DW_TAG_structure_type | Class/struct name, byte size, members, virtual methods, inheritance
DW_TAG_member | Field name, type, DW_AT_data_member_location (byte offset)
DW_TAG_subprogram (with virtual flag) | Virtual method name, mangled name, vtable slot
DW_TAG_inheritance | Base class references
DW_TAG_enumeration_type | Enum definitions
DW_TAG_union_type | Union layout
DW_TAG_variable | Global variables
DW_TAG_typedef | Type aliases

RepliBuild.DWARFParser.parse_symbol_table (Method)
parse_symbol_table(nm_output::String) -> Tuple{Dict{String, UInt64}, Dict{String, UInt64}}

Parse nm output to extract vtable and method addresses. Returns (vtable_addresses, method_addresses).
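
The kind of parsing involved can be sketched as follows. This is an illustration, not the actual parse_symbol_table: the function name is hypothetical, and it assumes the default three-column nm format (address, type letter, name) with Itanium-mangled names, where vtable symbols begin with _ZTV.

```julia
# Split nm output into vtable symbols and code symbols.
function parse_nm_sketch(nm_output::AbstractString)
    vtables = Dict{String,UInt64}()
    methods = Dict{String,UInt64}()
    for line in eachline(IOBuffer(nm_output))
        parts = split(line)
        length(parts) == 3 || continue            # undefined symbols have no address
        addr = tryparse(UInt64, parts[1]; base=16)
        addr === nothing && continue
        kind, name = parts[2], String(parts[3])
        if startswith(name, "_ZTV")               # Itanium ABI vtable prefix
            vtables[name] = addr
        elseif kind in ("T", "t", "W", "w")       # text or weak symbols
            methods[name] = addr
        end
    end
    return vtables, methods
end
```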

RepliBuild.DWARFParser.parse_vtables (Method)
parse_vtables(binary_path::String) -> VtableInfo

Extract complete vtable information from a binary using DWARF and symbol table.

Arguments

  • binary_path: Path to compiled binary with debug info

Returns

  • VtableInfo containing classes, vtable addresses, and method addresses

JLCSIRGenerator

Source: src/JLCSIRGenerator.jl, src/ir_gen/

Transforms parsed DWARF metadata (VtableInfo) into MLIR source text in the JLCS dialect. The generated IR is then parsed and JIT-compiled by MLIRNative.

Submodules

Module | Source | Input | Output
TypeUtils | src/ir_gen/TypeUtils.jl | C++ type string | MLIR type string (f64, i32, !llvm.ptr, etc.)
StructGen | src/ir_gen/StructGen.jl | ClassInfo + members | jlcs.type_info operation with field types and offsets
FunctionGen | src/ir_gen/FunctionGen.jl | VirtualMethod | func.func @thunk_... wrapper function
STLContainerGen | src/ir_gen/STLContainerGen.jl | STL method metadata | Accessor thunks for size(), data(), etc.
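
As an illustration of the TypeUtils row, a C++-to-MLIR type mapping might look like the sketch below. The function name, the table contents beyond f64, i32, and !llvm.ptr, and the handling of const and references are assumptions, not the contents of TypeUtils.jl.

```julia
# Assumed scalar mapping; only f64, i32 and !llvm.ptr are named in the docs.
const MLIR_SCALARS = Dict(
    "double" => "f64",
    "float"  => "f32",
    "int"    => "i32",
    "long"   => "i64",   # assumes an LP64 target
)

# Returns an MLIR type string, or nothing for types this sketch cannot map.
function mlir_type_sketch(cpp_type::AbstractString)
    t = String(strip(replace(cpp_type, r"\bconst\b" => "")))
    # Pointers and references are both lowered to an opaque pointer.
    (endswith(t, "*") || endswith(t, "&")) && return "!llvm.ptr"
    return get(MLIR_SCALARS, t, nothing)
end
```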

Generation flow

generate_jlcs_ir(vtinfo::VtableInfo) produces a complete MLIR module:

  1. External dispatch declarations: llvm.func @mangled_name(...) for each method with a resolved address
  2. Type info operations: jlcs.type_info for each class with non-empty members (topological sort for inheritance)
  3. Virtual method thunks: func.func @thunk_... for each virtual method
  4. Function thunks: Thunks for regular functions from compilation metadata
  5. STL container thunks: Accessor thunks for detected STL containers
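
The topological ordering mentioned in step 2 ensures a base class is emitted before any class that derives from it. A minimal depth-first sketch, assuming the inheritance graph is given as a Dict from class name to base-class names (a hypothetical input shape, not the generator's actual data structure):

```julia
# Order classes so that every base class precedes its derived classes.
function topo_sort_classes(bases::Dict{String,Vector{String}})
    ordered = String[]
    visited = Set{String}()
    function visit(name)
        name in visited && return      # already emitted (also stops on cycles)
        push!(visited, name)
        for b in get(bases, name, String[])
            visit(b)                   # emit bases first
        end
        push!(ordered, name)
    end
    foreach(visit, sort!(collect(keys(bases))))   # sorted for deterministic output
    return ordered
end
```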

MLIRNative

Source: src/MLIRNative.jl

Low-level ccall bindings to libJLCS.so, the compiled JLCS MLIR dialect shared library. Provides context management, module parsing, JIT engine creation, LLVM lowering, and symbol lookup.

See the MLIR / JLCS Dialect page for the full API reference.

JITManager

Source: src/JITManager.jl

Singleton runtime (GLOBAL_JIT) for Tier 2 function dispatch. Manages the MLIR context, JIT execution engine, and compiled symbol cache.

Key design points

  • Lock-free hot path: The _lookup_cached() function uses a double-check pattern — cached symbols are read from a Dict without locking. Only first-call misses acquire the lock.
  • Arity specialization: Hand-specialized invoke methods for 0-4 arguments avoid heap allocation of Any[]. Stack-allocated Refs and fixed-size Ptr{Cvoid}[] keep the hot path allocation-free.
  • @generated return dispatch: _invoke_call uses @generated to resolve at compile time whether the return type is a primitive (direct ccall return) or a struct (sret buffer allocation).
  • Variadic fallback: 5+ argument calls use dynamic allocation as a fallback.
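
The double-check pattern from the first bullet can be sketched like this. The names SYMBOL_CACHE, SYMBOL_LOCK, and lookup_cached_sketch are hypothetical, and resolve stands in for whatever performs the JIT symbol lookup; this is a simplified illustration rather than the JITManager code.

```julia
const SYMBOL_CACHE = Dict{String,Ptr{Cvoid}}()
const SYMBOL_LOCK = ReentrantLock()

# resolve(name) is called at most once per symbol.
function lookup_cached_sketch(resolve, name::String)
    ptr = get(SYMBOL_CACHE, name, C_NULL)   # first check: no lock taken
    ptr != C_NULL && return ptr
    lock(SYMBOL_LOCK) do
        # Second check under the lock: another task may have filled the entry.
        get!(SYMBOL_CACHE, name) do
            resolve(name)
        end
    end
end
```

Note that Julia does not document unsynchronized reads of a plain Dict as safe while another task is inserting, so the lock-free read in this sketch is a trade-off to weigh rather than a guarantee.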

Calling convention

All Tier 2 functions use a unified ciface calling convention:

Return | Signature
Scalar | T ciface(void** args_ptr)
Struct | void ciface(T* sret, void** args_ptr)
Void | void ciface(void** args_ptr)
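
The scalar row can be exercised end to end in pure Julia. In this sketch, add_ciface is a hypothetical stand-in for a compiled thunk (exposed through @cfunction), and call_scalar_sketch shows how a caller might pack two arguments into the void** array; neither name comes from RepliBuild.

```julia
# Stand-in for a compiled thunk with signature: int ciface(void** args_ptr)
function add_ciface(args::Ptr{Ptr{Cvoid}})::Cint
    a = unsafe_load(Ptr{Cint}(unsafe_load(args, 1)))   # args[0] points at a
    b = unsafe_load(Ptr{Cint}(unsafe_load(args, 2)))   # args[1] points at b
    return a + b
end

# Pack each argument behind a pointer, then pass the pointer array.
function call_scalar_sketch(fptr::Ptr{Cvoid}, a::Cint, b::Cint)
    ra, rb = Ref(a), Ref(b)
    GC.@preserve ra rb begin            # keep the Refs alive during the call
        argv = Ptr{Cvoid}[Base.unsafe_convert(Ptr{Cvoid}, ra),
                          Base.unsafe_convert(Ptr{Cvoid}, rb)]
        return ccall(fptr, Cint, (Ptr{Ptr{Cvoid}},), argv)
    end
end

fptr = @cfunction(add_ciface, Cint, (Ptr{Ptr{Cvoid}},))
result = call_scalar_sketch(fptr, Cint(2), Cint(3))
```

Because every argument travels as a pointer inside one array, a single calling convention covers any parameter list; only the return handling differs between the three rows.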

BuildBridge

Source: src/BuildBridge.jl

Low-level compiler driver that shells out to clang, clang++, llvm-link, llvm-opt, llvm-as, and nm. All subprocess invocations go through this module, providing a single point of control for toolchain interaction.

LLVMEnvironment

Source: src/LLVMEnvironment.jl

Detects the system LLVM/Clang toolchain by searching standard paths and version-suffixed binaries. Falls back to LLVM_full_jll when no system toolchain is found. Caches results in ~/.replibuild/toolchain.toml with a 24-hour TTL.
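
A 24-hour TTL check of this kind can be written against the cache file's modification time. The function name and keyword arguments below are hypothetical, and using mtime as the freshness signal is an assumption (the real module might store a timestamp inside toolchain.toml instead).

```julia
# True when the cache file exists and was modified less than ttl seconds ago.
function cache_is_fresh(path::AbstractString; ttl::Real=24 * 3600, now::Real=time())
    return isfile(path) && (now - mtime(path)) < ttl
end
```

A stale or missing cache would then trigger a fresh toolchain search and a rewrite of the file.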

EnvironmentDoctor

Source: src/EnvironmentDoctor.jl

check_environment() validates the complete toolchain: LLVM 21+, Clang, mlir-tblgen, CMake 3.20+, and libJLCS.so. Returns a ToolchainStatus struct indicating which tiers are available. Provides OS-specific install instructions for missing components.

DependencyResolver

Source: src/DependencyResolver.jl

Processes the [dependencies] table from replibuild.toml. Supports three dependency types:

Type | Mechanism
git | Shallow clone (--depth 1) into .replibuild_cache/deps/<name>/; re-fetches on tag change
local | Scanned in-place; no copying
system | pkg-config --cflags to inject include paths

The exclude list is applied after scanning. Resolved source files merge into the compilation graph before the compile step.

PackageRegistry

Source: src/PackageRegistry.jl

Global package registry at ~/.replibuild/registry/. Provides:

  • register() — Store a project's build configuration
  • use() — Build + wrap + load, with artifact caching in ~/.replibuild/builds/<hash>/
  • list_registry() — Print all registered packages with hash, source, and build status
  • unregister() — Remove a package and clean cached builds

The REPLIBUILD_HOME environment variable can override the default registry location.

STLWrappers

Source: src/STLWrappers.jl

Detects STL container types (std::vector, std::string, std::map, etc.) in DWARF metadata and generates accessor functions. These are used by the MLIR IR generator (ir_gen/STLContainerGen.jl) to produce JIT thunks for STL container methods.

ASTWalker

Source: src/ASTWalker.jl

Clang.jl-based AST walker for enum extraction. Handles enum class, hex values, namespaces, and other constructs that are difficult to extract reliably from DWARF alone. Replaces the earlier regex-based approach.

ClangJLBridge

Source: src/ClangJLBridge.jl

Integration module for Clang.jl header parsing. Used by the wrapper generator when use_clang_jl = true to supplement DWARF metadata with AST-level information.

Scaffold

Source: src/Scaffold.jl

Generates a distributable Julia package from a registered RepliBuild project. The scaffolded package includes the compiled shared library, generated wrapper module, and a standard Julia Project.toml — ready for Pkg.add().

Introspect

Source: src/Introspect.jl, src/Introspect/

Umbrella module for binary analysis, Julia IR inspection, LLVM pass tooling, benchmarking, and data export. See Introspection Tools for the full API reference.

Submodule | Source | Role
Binary | src/Introspect/Binary.jl | symbols(), dwarf_info(), dwarf_dump(), disassemble(), headers()
Julia | src/Introspect/Julia.jl | code_lowered(), code_typed(), code_llvm(), code_native(), code_warntype(), analysis functions
LLVM | src/Introspect/LLVM.jl | llvm_ir(), optimize_ir(), compare_optimization(), run_passes(), compile_to_asm()
Benchmarking | src/Introspect/Benchmarking.jl | benchmark(), benchmark_suite(), track_allocations()
DataExport | src/Introspect/DataExport.jl | export_json(), export_csv(), export_dataset()
Types | src/Introspect/Types.jl | Shared type definitions for the introspection subsystem