enigma.compile only runs the compiler pipeline; it never launches work on a GPU, so it does not require a Metal device. This lets you develop and inspect kernels on any macOS machine with the Xcode Command Line Tools (CLT) installed, including CI runners without Apple Silicon.

Basic usage

import enigma

@enigma.kernel
def add(A: enigma.f32, B: enigma.f32, C: enigma.f32):
    tid = enigma.thread_position_in_grid  # this thread's index in the launch grid
    C[tid] = A[tid] + B[tid]              # each thread writes one output element

compiled = enigma.compile(add)

# All of these are available without a GPU:
print(compiled.kernel_name)
print(compiled.metallib_path)
print(compiled.metal_source)
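Because the kernel body is plain indexing arithmetic, a pure-Python reference of what `add` computes can be handy for checking GPU results later, on a machine that does have a Metal device. The sketch below is not part of Enigma: `reference_add` is a hypothetical helper that simulates one loop iteration per GPU thread, with `tid` standing in for `thread_position_in_grid`.

```python
from typing import List, Sequence


def reference_add(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """CPU reference for the `add` kernel: C[tid] = A[tid] + B[tid].

    Hypothetical helper, not Enigma API. Each loop iteration stands in for
    one GPU thread; `tid` plays the role of enigma.thread_position_in_grid.
    """
    if len(a) != len(b):
        raise ValueError("A and B must have the same length")
    c = [0.0] * len(a)
    for tid in range(len(a)):  # one iteration per simulated thread
        c[tid] = a[tid] + b[tid]
    return c


print(reference_add([1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))  # [11.0, 22.0, 33.0]
```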

Enabling debug artifacts

Pass debug flags to enigma.compile to capture the output of each pipeline stage (traced IR, MLIR, and Metal source):
compiled = enigma.compile(
    add,
    dump_ir=True,           # print traced IR to stdout
    dump_mlir=True,         # print Enigma-dialect MLIR before MSL emission
    keep_metal_source=True, # write .metal file to work_dir
    work_dir="./build/enigma",
)
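With keep_metal_source=True the .metal file is written under work_dir, but this page does not specify its file name. A small helper like the hypothetical `find_metal_sources` below (plain pathlib, not part of Enigma) can locate whatever .metal files a build produced, for example to upload them as CI artifacts. It assumes only that the emitted files carry a .metal suffix.

```python
from pathlib import Path
from typing import List


def find_metal_sources(work_dir: str) -> List[Path]:
    """Return every .metal file under work_dir, sorted for stable output.

    Hypothetical helper: it only assumes keep_metal_source=True writes files
    with a .metal suffix somewhere inside work_dir.
    """
    root = Path(work_dir)
    if not root.is_dir():
        return []  # nothing built yet
    return sorted(root.rglob("*.metal"))


for path in find_metal_sources("./build/enigma"):
    print(path)
```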

What each flag does

| Flag | Effect |
|---|---|
| dump_ir=True | Prints the traced op graph (loads, stores, arithmetic) to stdout |
| dump_mlir=True | Prints the Enigma dialect MLIR before MSL emission |
| keep_metal_source=True | Writes the generated .metal source to work_dir (a temporary directory if work_dir is not set) |
| work_dir=path | Directory for all intermediate build artifacts |
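In practice you may want these flags on only while debugging. One option, sketched below, is to build the keyword arguments from an environment variable and pass them with `**`. The variable name `ENIGMA_DEBUG` and the helper `debug_compile_kwargs` are hypothetical; nothing here suggests Enigma itself reads any environment variable.

```python
import os
from typing import Dict, Mapping, Union


def debug_compile_kwargs(
    env: Mapping[str, str] = os.environ,
    work_dir: str = "./build/enigma",
) -> Dict[str, Union[bool, str]]:
    """Build debug kwargs for enigma.compile from a hypothetical ENIGMA_DEBUG variable.

    ENIGMA_DEBUG unset, empty, or "0": no debug flags.
    Any other value: enable all four documented debug flags.
    """
    if env.get("ENIGMA_DEBUG", "0") in ("", "0"):
        return {}
    return {
        "dump_ir": True,
        "dump_mlir": True,
        "keep_metal_source": True,
        "work_dir": work_dir,
    }
```

Usage: `compiled = enigma.compile(add, **debug_compile_kwargs())`. Passing an explicit mapping as `env` keeps the helper easy to test without touching the real environment.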

Exporting Metal source

To save the generated source to a specific path for review or diffing:
path = compiled.export_metal("./artifacts/add.metal")
print(path)  # ./artifacts/add.metal
This is useful for:
  • Code review of generated kernels
  • Tracking codegen regressions in CI with git diff
  • Profiling with Metal’s GPU capture tools against the exported source
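For CI regression tracking, one common pattern is a golden-file test: commit a known-good .metal file and fail the build when newly generated source differs from it. The sketch below uses only difflib and pathlib; `metal_source_diff` and the golden path are hypothetical, and in a real test the `generated` argument would come from `compiled.metal_source`.

```python
import difflib
from pathlib import Path


def metal_source_diff(generated: str, golden_path: str) -> str:
    """Return a unified diff between generated MSL and a committed golden file.

    An empty string means the generated source matches the golden copy.
    """
    golden = Path(golden_path).read_text()
    diff = difflib.unified_diff(
        golden.splitlines(keepends=True),
        generated.splitlines(keepends=True),
        fromfile=golden_path,
        tofile="generated",
    )
    return "".join(diff)


# Example pytest usage (requires enigma and the `add` kernel defined earlier):
# def test_add_codegen():
#     compiled = enigma.compile(add)
#     diff = metal_source_diff(compiled.metal_source, "tests/golden/add.metal")
#     assert not diff, diff
```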

Vectorized kernels

Pass vec_width to request vectorized codegen. When alignment allows, the pipeline emits vector loads and stores (for example, float4 for f32 data with vec_width=4):
compiled = enigma.compile(add, vec_width=4, keep_metal_source=True)
print(compiled.metal_source)
# Output will use float4 pointers where possible
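To make the alignment condition concrete: a vector load of `vec_width` f32 values reads `vec_width * 4` bytes (16 bytes for float4), and a vectorizer typically uses it only when the address is a multiple of that size, covering leftover elements with scalar code. The plain-Python sketch below models that split. It illustrates the general technique, not Enigma's actual lowering, and `is_vector_aligned` and `vectorized_add` are hypothetical names.

```python
from typing import List, Sequence

F32_BYTES = 4  # size of one f32 element in bytes


def is_vector_aligned(byte_offset: int, vec_width: int, elem_bytes: int = F32_BYTES) -> bool:
    """True if a vec_width-wide load starting at byte_offset is naturally aligned."""
    return byte_offset % (vec_width * elem_bytes) == 0


def vectorized_add(a: Sequence[float], b: Sequence[float], vec_width: int = 4) -> List[float]:
    """Elementwise add processed in vec_width-sized chunks plus a scalar tail.

    Each chunk stands in for one float4-style load/add/store; the tail loop
    handles the n % vec_width elements that do not fill a whole vector.
    """
    n = len(a)
    c = [0.0] * n
    main = n - n % vec_width  # elements covered by whole vectors
    for i in range(0, main, vec_width):
        chunk_a = a[i:i + vec_width]
        chunk_b = b[i:i + vec_width]
        c[i:i + vec_width] = [x + y for x, y in zip(chunk_a, chunk_b)]
    for i in range(main, n):  # scalar tail
        c[i] = a[i] + b[i]
    return c
```

The vectorized and scalar paths must produce identical results; only the memory access width differs, which is why a compiler can fall back to scalar code whenever the alignment check fails.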

When to use this workflow

| Scenario | Use |
|---|---|
| Developing on a machine without a Metal device | Compile only; no GPU needed |
| CI codegen regression tests | keep_metal_source=True, then diff the artifacts |
| Debugging incorrect output | dump_ir=True + dump_mlir=True |
| Performance tuning with Instruments | Export source with export_metal() |