Compile-Only Workflow

enigma.compile runs entirely in the compiler pipeline and does not require a Metal device. This lets you develop and inspect kernels on any macOS machine with the Xcode Command Line Tools (CLT) installed, including CI runners without Apple Silicon.
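In CI it can help to check up front whether a runner is able to compile at all, so that tests can be skipped cleanly instead of failing. The sketch below is not part of enigma: `can_compile_metal` is a hypothetical helper, and it assumes the toolchain is located through `xcrun`, which this page does not state.

```python
import platform
import shutil


def can_compile_metal(system=None, which=shutil.which):
    """Best-effort check: macOS host with xcrun on PATH (an assumed requirement)."""
    # Both arguments are injectable so the check can be exercised on any OS.
    system = system if system is not None else platform.system()
    return system == "Darwin" and which("xcrun") is not None
```

A test suite could call `can_compile_metal()` once and skip the compile-only tests when it returns `False`.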

Basic usage

import enigma

@enigma.kernel
def add(A: enigma.f32, B: enigma.f32, C: enigma.f32):
    tid = enigma.thread_position_in_grid
    C[tid] = A[tid] + B[tid]

compiled = enigma.compile(add)

# All of these are available without a GPU:
print(compiled.kernel_name)
print(compiled.metallib_path)
print(compiled.metal_source)

Enabling debug artifacts

Use compile flags to capture every stage of the pipeline:

compiled = enigma.compile(
    add,
    dump_ir=True,           # print traced IR to stdout
    dump_mlir=True,         # print MLIR before emission
    keep_metal_source=True, # write .metal file to work_dir
    work_dir="./build/enigma",
)

What each flag does

Flag	Effect
`dump_ir=True`	Prints the traced op graph (loads, stores, arithmetic) to stdout
`dump_mlir=True`	Prints the Enigma dialect MLIR before MSL emission
`keep_metal_source=True`	Writes `.metal` source to `work_dir` (default: temp dir)
`work_dir=path`	Directory for all intermediate build artifacts
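Because `dump_ir` and `dump_mlir` print to stdout, the dumps can be captured as strings for saving or diffing. This is a minimal sketch that assumes the dumps are written through Python's `sys.stdout`; if they are emitted from native code, a file-descriptor-level redirect would be needed instead. `capture_stdout` is a hypothetical helper, and `fake_compile` is a stand-in for `enigma.compile` so the example runs anywhere.

```python
import contextlib
import io


def capture_stdout(fn, *args, **kwargs):
    """Call fn and return (result, everything it printed to sys.stdout)."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        result = fn(*args, **kwargs)
    return result, buffer.getvalue()


# Stand-in for enigma.compile(add, dump_ir=True); not the real API.
def fake_compile(name, dump_ir=False):
    if dump_ir:
        print(f"ir for {name}")
    return name


result, dumped = capture_stdout(fake_compile, "add", dump_ir=True)
```

With the real compiler the call would look like `compiled, ir_text = capture_stdout(enigma.compile, add, dump_ir=True)`.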

Exporting Metal source

To save the generated source to a specific path for review or diffing:

path = compiled.export_metal("./artifacts/add.metal")
print(path)  # ./artifacts/add.metal

This is useful for:

- Code review of generated kernels
- Tracking codegen regressions in CI with git diff
- Profiling with Metal’s GPU capture tools using the original source
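The regression-tracking use case can also run inside a test instead of through git. The sketch below compares a stored "golden" source against freshly generated source with the standard library's `difflib`. `diff_metal_source` is a hypothetical helper, and the sample strings are illustrative rather than real enigma output; in practice `actual` would come from `compiled.metal_source` or the exported file.

```python
import difflib


def diff_metal_source(expected: str, actual: str, name: str = "kernel.metal") -> str:
    """Return a unified diff between two Metal sources, or "" if they match."""
    diff = difflib.unified_diff(
        expected.splitlines(keepends=True),
        actual.splitlines(keepends=True),
        fromfile=f"golden/{name}",
        tofile=f"generated/{name}",
    )
    return "".join(diff)


# Illustrative sources only; a real test would read the golden file from disk.
golden = "kernel void add() {\n    C[tid] = A[tid] + B[tid];\n}\n"
changed = golden.replace("+", "-")
report = diff_metal_source(golden, changed, name="add.metal")
```

An empty return value means the codegen is unchanged; a non-empty one can be used as the assertion message so the CI log shows exactly which lines moved.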

Vectorized kernels

Pass vec_width to request vectorized codegen. The pipeline will emit float4-style loads and stores when alignment allows:

compiled = enigma.compile(add, vec_width=4, keep_metal_source=True)
print(compiled.metal_source)
# Output will use float4 pointers where possible
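Since vectorization only happens when alignment allows, it can be worth checking whether the emitted source really uses vector types. The sketch below searches the source text for the MSL type name (such as `float4`) as a whole word. `uses_vector_type` is a hypothetical helper, the sample strings are illustrative, and a text search is only a heuristic, not a guarantee about the loads and stores the compiler produces.

```python
import re


def uses_vector_type(metal_source: str, base: str = "float", width: int = 4) -> bool:
    """True if the source mentions the vector type (e.g. float4) as a whole word."""
    pattern = rf"\b{re.escape(base)}{width}\b"
    return re.search(pattern, metal_source) is not None


# Illustrative snippets, not real enigma output.
scalar_src = "device const float* A [[buffer(0)]]"
vector_src = "device const float4* A [[buffer(0)]]"
```

With the real compiler the check would be `uses_vector_type(compiled.metal_source)`. The word boundaries keep matrix types such as `float4x4` from counting as a match.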

When to use this workflow

Scenario	Recommended approach
Developing on a machine without a Metal device	Compile-only workflow; no GPU needed
CI codegen regression tests	`keep_metal_source=True` + diff artifacts
Debugging incorrect output	`dump_ir=True` + `dump_mlir=True`
Performance tuning with Instruments	Export source with `export_metal()`
