Step 1: define the kernel
Decorate a Python function with `@enigma.kernel`. The function body is traced, not executed: each expression becomes an IR node rather than a Python value.
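To make "traced, not executed" concrete, here is a toy tracer in plain Python. It is a hypothetical sketch, not Enigma's implementation. `Trace`, `IRValue`, `Buffer`, and `trace_kernel` are invented for illustration, and the thread ID is passed in as an argument rather than obtained from `thread_position_in_grid`. Operator overloads record nodes instead of computing numbers, so calling the body once yields a list of IR nodes.

```python
from dataclasses import dataclass, field


@dataclass
class Trace:
    """Collects IR nodes in the order they are emitted (toy model)."""
    nodes: list = field(default_factory=list)

    def emit(self, op, *args):
        # Each node gets an SSA-style name such as %0, %1, ...
        name = f"%{len(self.nodes)}"
        self.nodes.append((name, op, args))
        return IRValue(self, name)


@dataclass
class IRValue:
    """Symbolic value: arithmetic records a node instead of computing."""
    trace: Trace
    name: str

    def __add__(self, other):
        return self.trace.emit("add", self.name, other.name)


@dataclass
class Buffer:
    """Symbolic buffer parameter: reads emit loads, writes emit stores."""
    trace: Trace
    name: str

    def __getitem__(self, index):
        return self.trace.emit("load", self.name, index.name)

    def __setitem__(self, index, value):
        self.trace.emit("store", self.name, index.name, value.name)


def trace_kernel(fn, *buffer_names):
    """Call fn once with symbolic arguments and return the recorded IR."""
    t = Trace()
    tid = t.emit("thread_position_in_grid")
    fn(*(Buffer(t, n) for n in buffer_names), tid)
    return t.nodes


def vector_add(A, B, C, tid):
    C[tid] = A[tid] + B[tid]


nodes = trace_kernel(vector_add, "A", "B", "C")
for node in nodes:
    print(node)
```

Tracing `vector_add` records a load for each input, an add, and a store, in that order, because Python evaluates the right-hand side of `C[tid] = ...` before calling `__setitem__`.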
- `A`, `B`, `C` are typed buffer parameters (`f32` = `float` in Metal)
- `thread_position_in_grid` returns the global thread ID as an `IRValue`
- Indexing `A[tid]` emits a load; assigning `C[tid] = ...` emits a store
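On the GPU, the traced body runs once per thread, and each thread receives a different value from `thread_position_in_grid`. The CPU loop below is an illustrative model of that per-thread semantics for a 1D vector add (`emulate_1d_dispatch` and `vector_add_body` are hypothetical names). It is handy as a reference result, not a description of how Metal actually schedules threads.

```python
import numpy as np


def emulate_1d_dispatch(body, grid_size, *buffers):
    """Run body(tid, *buffers) once for every global thread ID (sequentially)."""
    for tid in range(grid_size):
        body(tid, *buffers)


def vector_add_body(tid, A, B, C):
    # One thread: two loads, one add, one store at its own index.
    C[tid] = A[tid] + B[tid]


A = np.arange(8, dtype=np.float32)
B = np.full(8, 10.0, dtype=np.float32)
C = np.zeros(8, dtype=np.float32)

# One thread per element, so the grid size equals the element count.
emulate_1d_dispatch(vector_add_body, len(C), A, B, C)
print(C)
```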
Step 2: compile to a Metal library
`enigma.compile` runs the full pipeline:
- Traces the Python function to an IR
- Lowers the IR to the Enigma MLIR dialect
- Emits Metal Shading Language (MSL) source
- Invokes `xcrun metal` and `xcrun metallib`
- Returns a `CompiledKernel` with all artifacts
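For a vector add, the last two pipeline stages might look roughly like the sketch below. It hand-generates MSL for a kernel with two read-only inputs and one output, then builds the standard Apple toolchain commands that turn a `.metal` file into a `.air` file and then a `.metallib`. The emitter (`emit_msl`), the buffer index assignment, and the exact `xcrun` flags are assumptions; Enigma's generated source and invocation may differ. The commands are only printed, so the sketch runs on any platform.

```python
import shlex
from pathlib import Path


def emit_msl(name, inputs, output):
    """Hypothetical emitter: output[tid] = sum of inputs[tid], all float."""
    params = [f"device const float* {b} [[buffer({i})]]" for i, b in enumerate(inputs)]
    params.append(f"device float* {output} [[buffer({len(inputs)})]]")
    params.append("uint tid [[thread_position_in_grid]]")
    expr = " + ".join(f"{b}[tid]" for b in inputs)
    sep = ",\n    "
    return (
        "#include <metal_stdlib>\n"
        "using namespace metal;\n\n"
        f"kernel void {name}(\n    {sep.join(params)})\n"
        "{\n"
        f"    {output}[tid] = {expr};\n"
        "}\n"
    )


def metal_build_commands(metal_src):
    """Standard two-step build: .metal -> .air -> .metallib."""
    air = metal_src.with_suffix(".air")
    lib = metal_src.with_suffix(".metallib")
    return [
        ["xcrun", "-sdk", "macosx", "metal", "-c", str(metal_src), "-o", str(air)],
        ["xcrun", "-sdk", "macosx", "metallib", str(air), "-o", str(lib)],
    ]


msl = emit_msl("vector_add", ["A", "B"], "C")
print(msl)
cmds = metal_build_commands(Path("vector_add.metal"))
for cmd in cmds:
    print(shlex.join(cmd))  # on macOS these could be run with subprocess.run
```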
Inspect the output
Compile with verbose output
Step 3: dispatch
Key parameters
| Parameter | Meaning |
|---|---|
| `inputs` | List of NumPy arrays passed as read-only device buffers |
| `output_size` | Size of the output buffer, in bytes |
| `grid` | `(gx, gy, gz)`: total number of threads in each dimension |
| `threads` | `(tx, ty, tz)`: threads per threadgroup |
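A common 1D setup writes one element per thread. The hypothetical helper below derives the dispatch values from the element count, assuming an `f32` output and 256 threads per threadgroup (an arbitrary choice, not an Enigma default). Because `grid` counts total threads, the implied number of threadgroups per dimension is the ceiling of `grid / threads`; the helper reports it for reference only.

```python
import numpy as np


def dispatch_params(n, dtype=np.float32, threads_per_group=256):
    """Hypothetical helper: 1D dispatch values for one element per thread."""
    if n <= 0:
        raise ValueError("n must be positive")
    output_size = n * np.dtype(dtype).itemsize   # bytes, not elements
    grid = (n, 1, 1)                             # total threads per dimension
    tx = min(threads_per_group, n)
    threads = (tx, 1, 1)                         # threads per threadgroup
    # Integer ceiling division: threadgroups needed to cover the grid.
    groups = tuple((g + t - 1) // t for g, t in zip(grid, threads))
    return {"output_size": output_size, "grid": grid,
            "threads": threads, "groups": groups}


params = dispatch_params(1000)
print(params)
```

For 1,000 `float32` elements this gives `output_size=4000`, `grid=(1000, 1, 1)`, `threads=(256, 1, 1)`, and 4 threadgroups along x.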
Step 4: validate
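Validation usually means decoding the output bytes with the right dtype and comparing them against a NumPy reference. The sketch below assumes the dispatch result is available as raw bytes (the `np.frombuffer` row in the table below suggests this, but the exact return type is not shown here) and uses `expected.tobytes()` as a stand-in for the GPU result. `validate_output` is a hypothetical helper; it uses a relative tolerance because GPU floating-point results can differ from CPU results in the last bits for operations more complex than a plain add.

```python
import numpy as np


def validate_output(raw, expected, dtype=np.float32, rtol=1e-6):
    """Decode raw output bytes and compare them to a 1D NumPy reference."""
    result = np.frombuffer(raw, dtype=dtype)
    if result.shape != expected.shape:
        # Usually a wrong output_size or a wrong dtype.
        raise ValueError(f"expected {expected.size} elements, got {result.size}")
    np.testing.assert_allclose(result, expected, rtol=rtol)
    return result


rng = np.random.default_rng(0)
A = rng.random(1024, dtype=np.float32)
B = rng.random(1024, dtype=np.float32)
expected = A + B

raw = expected.tobytes()  # stand-in: pretend these bytes came back from the GPU
result = validate_output(raw, expected)
print("ok:", result.size, "elements match")
```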
Common first-run mistakes
| Mistake | Symptom |
|---|---|
| `output_size` too small | Partial result or garbage at the tail |
| `grid` smaller than the data domain | Some elements never written |
| Wrong `dtype` in `np.frombuffer` | Nonsensical values |
| Running on a non-Metal machine | `MetalRuntime` dispatch exception |
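Two of these mistakes are easy to reproduce or catch on the CPU. Reinterpreting `float32` bytes as `int32` shows the "nonsensical values" symptom, and a hypothetical pre-dispatch check (`check_dispatch`, not part of Enigma) catches an `output_size` given in elements instead of bytes, as well as a grid too small for a one-element-per-thread kernel.

```python
import numpy as np

values = np.array([1.0, 2.0], dtype=np.float32)
raw = values.tobytes()
print(np.frombuffer(raw, dtype=np.float32))  # [1. 2.]
print(np.frombuffer(raw, dtype=np.int32))    # [1065353216 1073741824]: same bits, wrong type


def check_dispatch(n_elements, dtype, output_size, grid):
    """Hypothetical sanity check before dispatching a 1-element-per-thread kernel."""
    needed = n_elements * np.dtype(dtype).itemsize
    if output_size < needed:
        raise ValueError(f"output_size={output_size} bytes, need {needed}")
    total_threads = grid[0] * grid[1] * grid[2]
    if total_threads < n_elements:
        raise ValueError(f"grid covers {total_threads} threads, need {n_elements}")


check_dispatch(1000, np.float32, output_size=4000, grid=(1000, 1, 1))  # passes
```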
Next steps
- Compile-Only Workflow — iterate without a GPU
- Execution Model — understand grid, threadgroups, and index queries
- Examples — more complete kernel patterns
