Enable debug output at compile time
What to look for in the IR (dump_ir=True)
The IR lists every operation the tracer recorded. Verify:
- Loads and stores appear in the right order
- Thread index ops (thread_position_in_grid, etc.) are present
- Arithmetic ops match your kernel logic
- No unexpected extra ops (can indicate a scoping issue in for_range)
What to look for in MLIR (dump_mlir=True)
The MLIR representation shows the dialect ops before Metal emission. Verify:
- Buffer arguments have the correct types (!enigma.buffer<f32>)
- Control-flow regions (scf.for, scf.if) have the correct structure
- Thread index ops map to the correct Dimension attribute (x/y/z)
Inspecting the Metal source
- Kernel function signature: buffer count and types match kernel parameters
- Index arithmetic: confirm the generated expressions match your intent
- [[thread_position_in_grid]] vs [[thread_position_in_threadgroup]]: correct attribute?
- Barrier placement: threadgroup_barrier appears between write and read of shared memory
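Some of the checks above can be pre-screened mechanically before reading the source by eye. The following is a minimal sketch, assuming the generated Metal source is available as a plain string; summarize_metal_source is a hypothetical helper (not part of the library), and the example source is hand-written for illustration rather than real compiler output.

```python
import re

def summarize_metal_source(source: str) -> dict:
    """Collect a few facts worth eyeballing in generated Metal source.

    Hypothetical helper: a text scan, not a parser.
    """
    lines = source.splitlines()
    return {
        # Thread-index attributes used anywhere in the source.
        "attributes": sorted(
            set(re.findall(r"\[\[(thread_position_in_\w+)\]\]", source))
        ),
        # Number of distinct buffer bindings, e.g. [[buffer(0)]].
        "buffer_count": len(set(re.findall(r"\[\[buffer\((\d+)\)\]\]", source))),
        # 1-based line numbers of every line mentioning a barrier.
        "barrier_lines": [
            i for i, line in enumerate(lines, 1) if "threadgroup_barrier" in line
        ],
    }

# Hand-written example source, not actual compiler output.
example = """\
kernel void add_one(device float* a [[buffer(0)]],
                    device float* out [[buffer(1)]],
                    uint gid [[thread_position_in_grid]]) {
    out[gid] = a[gid] + 1.0f;
}
"""

summary = summarize_metal_source(example)
print(summary)
```

A scan like this only narrows down where to look. Whether a barrier really sits between the write and the read of shared memory still has to be confirmed by reading the lines it reports.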
Saving Metal source for diffing
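Once two versions of the Metal source have been saved, a unified diff makes the effect of a kernel change easy to see. This is a minimal sketch using only Python's standard difflib module. How the source text is obtained from a compiled kernel is not shown here, so the two strings below are hand-written placeholders and diff_metal_sources is a hypothetical helper name.

```python
import difflib

def diff_metal_sources(before: str, after: str) -> str:
    """Return a unified diff between two saved Metal source strings."""
    diff = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile="before.metal",  # labels only; no files are read
        tofile="after.metal",
    )
    return "".join(diff)

# Placeholder sources standing in for two saved compiler outputs.
before = "out[gid] = a[gid] + 1.0f;\n"
after = "out[gid] = a[gid] + 2.0f;\n"

diff_text = diff_metal_sources(before, after)
print(diff_text)
```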
Minimal reproducible inputs
When diagnosing a correctness issue:
- Use the smallest input that reproduces the problem (e.g. n=32)
- Use np.arange(n).astype(np.float32) as input so values are predictable
- Check each element individually with out[i] rather than aggregate assertions
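The three points above can be sketched as follows. The kernel dispatch is not shown: out is a stand-in array with one deliberately injected fault, and report_mismatches is a hypothetical helper, so replace out with the real kernel output when using this.

```python
import numpy as np

def report_mismatches(out, expected, limit=8):
    """Print and return the indices where out[i] differs from expected[i]."""
    bad = [i for i in range(len(expected)) if out[i] != expected[i]]
    for i in bad[:limit]:
        print(f"out[{i}] = {out[i]!r}, expected {expected[i]!r}")
    return bad

n = 32
a = np.arange(n).astype(np.float32)  # predictable input: 0, 1, 2, ...
expected = a + 1.0                   # reference result for an "add one" kernel

# Stand-in for the kernel output, with a simulated bug at index 5.
out = expected.copy()
out[5] = 0.0

bad = report_mismatches(out, expected)
```

Reporting per-element indices shows the pattern of the failure (every 32nd element, only the last tile, and so on), which an aggregate check such as np.allclose hides. Exact comparison works here because the values are small integers stored as float32; kernels that produce rounded results would need a tolerance instead.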
Debugging layout algebra
When a tiling produces unexpected output, print shape and stride at each transform, then check:
- threads × vals/thr = total elements in one tile
- Tiler dimensions fit within the tensor dimensions
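These two invariants are plain arithmetic and can be checked without the library. The sketch below assumes shapes are given as tuples of integers and that "threads" means the number of threads cooperating on one tile; check_tile and its parameter names are hypothetical, not part of the library's API.

```python
from math import prod

def check_tile(tensor_shape, tiler_shape, threads, vals_per_thread):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []

    # Invariant 1: threads * values-per-thread covers exactly one tile.
    tile_elems = prod(tiler_shape)
    if threads * vals_per_thread != tile_elems:
        problems.append(
            f"threads*vals/thr = {threads * vals_per_thread}, "
            f"but one tile has {tile_elems} elements"
        )

    # Invariant 2: the tiler fits inside the tensor along every axis.
    if len(tiler_shape) != len(tensor_shape):
        problems.append("tiler rank differs from tensor rank")
    else:
        for axis, (t, s) in enumerate(zip(tiler_shape, tensor_shape)):
            if t > s:
                problems.append(f"axis {axis}: tiler dim {t} > tensor dim {s}")

    return problems

# 32 threads x 4 values each = 128 = 8 * 16 elements per tile.
ok = check_tile((64, 64), (8, 16), threads=32, vals_per_thread=4)
# Same tile, but only 2 values per thread: half the tile is never covered.
short = check_tile((64, 64), (8, 16), threads=32, vals_per_thread=2)
print(ok, short)
```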
Fast debugging loop
- Lock a minimal reproducible input (small n, predictable values)
- Compile with dump_ir=True, dump_mlir=True, keep_metal_source=True
- Read the Metal source; the bug is usually visible there
- Make one change at a time and re-check the Metal source before dispatching
- Only run dispatch once the Metal source looks correct
