Common Errors - Enigma

Compilation failures

`xcrun: error: unable to find utility "metal"`

xcrun cannot find the Metal compiler in the active developer directory or on your PATH. This almost always means the Xcode Command Line Tools are not installed or need to be re-selected.

# Install or reinstall
xcode-select --install

# Verify
xcrun -sdk macosx metal -v

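If xcrun is present but the metal tool is still missing, the active developer directory may point at an installation that lacks the Metal toolchain. Print the current directory with xcode-select -p, then reset it to the default: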

sudo xcode-select --reset
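
The same check can be scripted so that a build step fails early with a readable message. This is a minimal sketch and not part of Enigma: `find_metal` is a hypothetical helper that shells out to `xcrun --find`, and on a machine without xcrun it simply reports that the tool is missing.

```python
import shutil
import subprocess


def find_metal(sdk="macosx"):
    """Return (found, detail): detail is the tool path or a reason string.

    Hypothetical helper for illustration, not part of Enigma.
    """
    if shutil.which("xcrun") is None:
        return False, "xcrun not found; install Xcode or the Command Line Tools"
    proc = subprocess.run(
        ["xcrun", "-sdk", sdk, "--find", "metal"],
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        # xcrun prints its "unable to find utility" message on stderr.
        return False, proc.stderr.strip() or "xcrun could not locate metal"
    return True, proc.stdout.strip()


found, detail = find_metal()
print("metal:", detail if found else f"missing ({detail})")
```

Running this before compiling kernels separates a toolchain problem from a problem in the kernel itself.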

Tracer TypeError

TypeError: unsupported operand type(s) for +: 'int' and 'IRValue'

You are mixing a Python value with an IRValue in a way the tracer cannot handle. Make sure all kernel-body computation goes through IRValue objects. Integer and float literals are fine, as are Python variables holding IRValue nodes, but NumPy arrays, Python lists, and Python conditionals on IRValue objects are not.

Wrong:

@enigma.kernel
def bad(A: enigma.f32, C: enigma.f32):
    n = 1024            # Python int
    tid = enigma.thread_position_in_grid
    if tid < n:         # Python `if` on an IRValue: not allowed, use enigma.if_ and enigma.cmp_lt
        C[tid] = A[tid]

Correct:

@enigma.kernel
def good(A: enigma.f32, C: enigma.f32):
    tid = enigma.thread_position_in_grid
    with enigma.if_(enigma.cmp_lt(tid, 1024)):
        C[tid] = A[tid]
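
To see where this kind of TypeError comes from, the toy class below mimics a traced value in plain Python. It is only a sketch: `ToyValue` is a made-up stand-in and says nothing about how Enigma's IRValue is actually implemented. It records `value + 1` as an expression, has no reflected-add handler, and refuses to be used as a Python truth value.

```python
class ToyValue:
    """Toy stand-in for a traced value; NOT Enigma's IRValue."""

    def __init__(self, expr):
        self.expr = expr

    def __add__(self, other):
        # ToyValue + something: record the operation instead of computing it.
        rhs = other.expr if isinstance(other, ToyValue) else repr(other)
        return ToyValue(f"({self.expr} + {rhs})")

    # No __radd__ is defined, so `int + ToyValue` has no handler.

    def __bool__(self):
        # A trace has no concrete value yet, so Python cannot pick a branch.
        raise TypeError("cannot branch on a traced value in a Python `if`")


tid = ToyValue("tid")
print((tid + 1).expr)  # (tid + 1)

try:
    1 + tid
except TypeError as exc:
    reflected_error = str(exc)
    print(reflected_error)

try:
    if tid:
        pass
except TypeError as exc:
    branch_error = str(exc)
    print(branch_error)
```

The second print shows the same message shape as the error above: Python tried the left operand's `__add__`, found no reflected handler on the right operand, and gave up.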

`EnigmaError: tiler exceeds tensor in mode N`

A layout division produced a tiler dimension larger than the tensor dimension. Print the shape and stride of the tensor and tiler at each step:

print("tensor:", mA.layout)
print("tiler:", tiler_mn)
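
Once the shapes are printed, the offending mode can be found mechanically. The helper below is a hypothetical sketch that takes the shapes as plain tuples of integers; how to extract those tuples from `mA.layout` and `tiler_mn` is not covered here and depends on the Enigma layout API.

```python
def oversized_modes(tensor_shape, tiler_shape):
    """Return (mode, tile_extent, tensor_extent) for each mode where the tile is too big.

    Hypothetical helper: shapes are plain tuples of ints, one entry per mode.
    """
    if len(tiler_shape) > len(tensor_shape):
        raise ValueError("tiler has more modes than the tensor")
    return [
        (mode, tile, extent)
        for mode, (tile, extent) in enumerate(zip(tiler_shape, tensor_shape))
        if tile > extent
    ]


# Example: a 64x32 tensor tiled by 128x16 is too large in mode 0.
for mode, tile, extent in oversized_modes((64, 32), (128, 16)):
    print(f"mode {mode}: tile extent {tile} > tensor extent {extent}")
```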

Dispatch failures

Wrong output shape or garbage values

The most common causes:

Symptom	Likely cause	Fix
Last N elements are zero	`grid` too small	Set `grid=(n, 1, 1)` to cover all elements
Values land at a 2× or 4× wrong offset	Wrong stride in index math	Check row/column ordering in the kernel index
Values look like uninitialized memory	`output_size` too small	Pass `elements * sizeof(dtype)`, the size in bytes
Values are consistently off by a constant	Wrong buffer bound to a parameter	Verify input array ordering and dtype

`output_size` gotcha

output_size is in bytes, not elements:

# Wrong
rt.execute(compiled, [a, b], output_size=n, ...)

# Correct (f32 = 4 bytes)
rt.execute(compiled, [a, b], output_size=n * 4, ...)

# Correct (f16 = 2 bytes)
rt.execute(compiled, [a, b], output_size=n * 2, ...)
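
A small helper keeps the byte arithmetic in one place instead of scattering `* 4` and `* 2` through the call sites. This is a sketch: `output_size_bytes` and the `ITEMSIZE` table are illustrative names, not Enigma APIs, and the table only covers the two dtypes mentioned above.

```python
# Bytes per element for the two dtypes mentioned above.
ITEMSIZE = {"f32": 4, "f16": 2}


def output_size_bytes(num_elements, dtype):
    """Size in bytes of an output buffer holding num_elements of dtype."""
    if num_elements < 0:
        raise ValueError("num_elements must be non-negative")
    try:
        return num_elements * ITEMSIZE[dtype]
    except KeyError:
        raise ValueError(f"unknown dtype {dtype!r}; add it to ITEMSIZE") from None


n = 1024
print(output_size_bytes(n, "f32"))  # 4096
print(output_size_bytes(n, "f16"))  # 2048
```

If the expected output is already available as a NumPy array, its `nbytes` attribute gives the same number without a lookup table.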

Runtime dispatch exception

When rt.execute() raises, the error message includes the kernel name, grid, threads, and the Metal return code. Check:

Kernel name matches what you compiled
Grid × threadgroup size covers your data range
Input dtypes match kernel parameter types

Try dispatching with a minimal input (e.g. n=32) to isolate the failure.

Correctness issues

Values differ between GPU and CPU

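The GPU may evaluate floating-point operations in a different order than the CPU, and floating-point addition is not associative, so results can differ in the low-order bits. Compare with tolerances rather than exact equality: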

np.testing.assert_allclose(gpu_result, cpu_result, rtol=1e-4, atol=1e-4)

Reductions are especially sensitive: summing a large input array accumulates rounding error that grows with the array size. Compare relative error rather than absolute error.
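
The effect is easy to reproduce on the CPU alone. The sketch below sums the same data left to right and in a tree shape, which is roughly how a parallel reduction groups its additions, and then compares the two by relative error. The helper names are illustrative, and Python floats are 64-bit, so the gap here is much smaller than it would be in f32 or f16.

```python
import math


def sequential_sum(values):
    """Left-to-right accumulation, as a simple CPU loop would do."""
    total = 0.0
    for v in values:
        total += v
    return total


def pairwise_sum(values):
    """Tree-shaped accumulation, similar in spirit to a parallel reduction."""
    if len(values) <= 2:
        return sequential_sum(values)
    mid = len(values) // 2
    return pairwise_sum(values[:mid]) + pairwise_sum(values[mid:])


def relative_error(actual, expected):
    """|actual - expected| / |expected|, or the absolute error if expected is zero."""
    scale = abs(expected)
    return abs(actual - expected) / scale if scale > 0 else abs(actual - expected)


# Floating-point addition is not associative.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False

data = [0.1 * (i % 7) for i in range(100_000)]
seq, tree = sequential_sum(data), pairwise_sum(data)
print("relative error:", relative_error(tree, seq))
print("close enough:", math.isclose(tree, seq, rel_tol=1e-9))
```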

Vectorized kernel produces wrong results

With vec_width=4 each thread handles four elements, so the grid must be divided by the vector width:

# Scalar grid
rt.execute(compiled_scalar, ..., grid=(n, 1, 1), ...)

# vec_width=4 grid — 4× fewer threads, each handles 4 elements
rt.execute(compiled_vec4, ..., grid=(n // 4, 1, 1), ...)
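
Integer division rounds down, so `n // 4` only covers every element when n is a multiple of 4. The sketch below computes the grid and the leftover tail with plain arithmetic. `vec_grid` is a hypothetical helper, and how a given Enigma kernel should deal with the tail (padding the input, a scalar cleanup pass, or something else) is not specified here.

```python
def vec_grid(n, vec_width):
    """Return (full_threads, tail_elements) for n elements at vec_width per thread."""
    if n < 0 or vec_width <= 0:
        raise ValueError("n must be >= 0 and vec_width must be > 0")
    return divmod(n, vec_width)


threads, tail = vec_grid(1030, 4)
print(threads, tail)  # 257 2
if tail:
    print(f"grid=({threads}, 1, 1) leaves the last {tail} elements unprocessed")
```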

2D kernel has transposed output

Enigma uses the Metal convention: x is the fast (column) dimension and y is the slow (row) dimension. Ensure grid and kernel indexing agree:

# Metal convention: grid=(cols, rows, 1)
row = enigma.thread_position_in_grid_xyz("y")
col = enigma.thread_position_in_grid_xyz("x")
idx = row * N + col   # row-major; N is the number of columns

rt.execute(..., grid=(N, M, 1), threads=(16, 16, 1))
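
The indexing convention can be checked on the CPU by emulating the grid with two loops. This is a plain-Python sketch with made-up helper names (`run_grid`, `fill`), not Enigma code: each emulated thread writes its own (row, col) pair into a flat row-major buffer, first with x as the column and then with the axes swapped.

```python
def run_grid(grid, kernel):
    """Call kernel(x, y) once per thread of a (cols, rows, 1) grid."""
    cols, rows, _ = grid
    for y in range(rows):
        for x in range(cols):
            kernel(x, y)


def fill(M, N, swap_axes=False):
    """Write (row, col) into a flat row-major M x N buffer; count out-of-range writes."""
    out = [None] * (M * N)
    out_of_range = 0

    def kernel(x, y):
        nonlocal out_of_range
        row, col = (x, y) if swap_axes else (y, x)
        idx = row * N + col  # row-major index
        if idx < len(out):
            out[idx] = (row, col)
        else:
            out_of_range += 1  # a real GPU would write past the buffer here

    run_grid((N, M, 1), kernel)
    return out, out_of_range


good, _ = fill(2, 3)
print(good)  # [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]

bad, misses = fill(2, 3, swap_axes=True)
print(bad, misses)
```

With the axes swapped on a non-square 2 x 3 problem, two slots are never written and two writes fall outside the buffer, which on a GPU shows up as garbage or transposed-looking output.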
