# Common Errors

> Typical failure modes with concrete diagnostic steps.

## Compilation failures

### `xcrun: error: unable to find utility "metal"`

`xcrun` cannot locate the Metal compiler. This almost always means the Xcode Command Line Tools are not installed or the active developer directory needs to be re-selected.

```bash theme={null}
# Install or reinstall
xcode-select --install

# Verify
xcrun -sdk macosx metal -v
```

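If `xcrun` is present but the `metal` tool is still missing, the active developer directory may point at an install that lacks the Metal toolchain (for example, standalone Command Line Tools rather than a full Xcode). Reset it to the default: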

```bash theme={null}
sudo xcode-select --reset
```

### Tracer TypeError

```
TypeError: unsupported operand type(s) for +: 'int' and 'IRValue'
```

You are mixing a Python value with an `IRValue` in a way the tracer cannot handle. Make sure all kernel-body computation goes through `IRValue` objects. Integer and float literals are fine, as are Python variables holding `IRValue` nodes. NumPy arrays, Python lists, and Python conditionals on `IRValue` objects are not.

**Wrong:**

```python theme={null}
@enigma.kernel
def bad(A: enigma.f32, C: enigma.f32):
    n = 1024            # Python int
    tid = enigma.thread_position_in_grid
    if tid < n:         # Python `if` on an IRValue condition: not allowed
        C[tid] = A[tid]
```

**Correct:**

```python theme={null}
@enigma.kernel
def good(A: enigma.f32, C: enigma.f32):
    tid = enigma.thread_position_in_grid
    with enigma.if_(enigma.cmp_lt(tid, 1024)):
        C[tid] = A[tid]
```

### `EnigmaError: tiler exceeds tensor in mode N`

A layout division produced a tiler dimension larger than the tensor dimension. Print the shape and stride of the tensor and tiler at each step:

```python theme={null}
print("tensor:", mA.layout)
print("tiler:", tiler_mn)
```
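If the printed layouts are hard to compare by eye, a small check can point at the offending mode. The sketch below is a hypothetical helper, not part of Enigma; it assumes you have already copied the tensor and tiler shapes out of the printed layouts as plain tuples of integers.

```python
def find_oversized_modes(tensor_shape, tiler_shape):
    """Return the modes (dimension indices) where the tiler is larger than the tensor."""
    return [
        mode
        for mode, (tensor_dim, tiler_dim) in enumerate(zip(tensor_shape, tiler_shape))
        if tiler_dim > tensor_dim
    ]

# Example shapes (made up for illustration): mode 1 of the tiler is too large.
oversized = find_oversized_modes((64, 32), (16, 64))
print("oversized modes:", oversized)
```

An empty list means every tiler dimension fits inside the corresponding tensor dimension.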

***

## Dispatch failures

### Wrong output shape or garbage values

The most common causes:

| Symptom                                       | Likely cause               | Fix                                        |
| --------------------------------------------- | -------------------------- | ------------------------------------------ |
| Last N elements are zero                      | `grid` too small           | Set `grid=(n, 1, 1)` to cover all elements |
| Values appear at 2× or 4× the expected offset | Wrong stride in index math | Check row/column ordering in kernel index  |
| Values look like memory garbage               | `output_size` too small    | Use `elements * sizeof(dtype)` in bytes    |
| Values are consistently off by a constant     | Buffer bound incorrectly   | Verify input array ordering and dtype      |

### `output_size` gotcha

`output_size` is in **bytes**, not elements:

```python theme={null}
# Wrong
rt.execute(compiled, [a, b], output_size=n, ...)

# Correct (f32 = 4 bytes)
rt.execute(compiled, [a, b], output_size=n * 4, ...)

# Correct (f16 = 2 bytes)
rt.execute(compiled, [a, b], output_size=n * 2, ...)
```
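To avoid hard-coding the multiplier, the byte count can be derived from the element count and dtype. This is a minimal sketch: `output_size_bytes` and the `BYTES_PER_ELEMENT` table are hypothetical names, and only the f32 and f16 sizes stated above are included.

```python
# Element sizes in bytes, taken from the examples above.
BYTES_PER_ELEMENT = {"f32": 4, "f16": 2}

def output_size_bytes(elements: int, dtype: str) -> int:
    """Return the buffer size in bytes for `elements` values of `dtype`."""
    if dtype not in BYTES_PER_ELEMENT:
        raise ValueError(f"unknown dtype: {dtype!r}")
    return elements * BYTES_PER_ELEMENT[dtype]

n = 1024
f32_bytes = output_size_bytes(n, "f32")  # 1024 * 4
f16_bytes = output_size_bytes(n, "f16")  # 1024 * 2
```

The result would then be passed as `output_size=output_size_bytes(n, "f32")`.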

### Runtime dispatch exception

When `rt.execute()` raises, the error message includes the kernel name, grid, threads, and the Metal return code. Check:

1. Kernel name matches what you compiled
2. Grid × threadgroup size covers your data range
3. Input dtypes match kernel parameter types

Try dispatching with a minimal input (e.g. `n=32`) to isolate the failure.

***

## Correctness issues

### Values differ between GPU and CPU

The GPU evaluates floating-point operations in a different order than the CPU, so results rarely match bit for bit. Use loose tolerances:

```python theme={null}
np.testing.assert_allclose(gpu_result, cpu_result, rtol=1e-4, atol=1e-4)
```

For reductions especially, large input arrays accumulate floating-point error. Compare relative error rather than absolute.
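The effect of evaluation order can be reproduced on the CPU alone. The sketch below uses only the Python standard library; `relative_error` is a hypothetical helper written for this illustration, not an Enigma or NumPy function.

```python
# Floating-point addition is not associative: grouping changes the result.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

def relative_error(actual: float, expected: float) -> float:
    """Return |actual - expected| scaled by |expected| (absolute error if expected is 0)."""
    diff = abs(actual - expected)
    scale = abs(expected)
    return diff / scale if scale > 0 else diff

# The two sums differ in the last bit, so an exact comparison fails...
exactly_equal = left_to_right == right_to_left
# ...while a relative-error check with a loose tolerance passes.
close_enough = relative_error(left_to_right, right_to_left) < 1e-4
```

A parallel reduction regroups its additions in much the same way, which is why an exact equality check against a sequential CPU sum tends to fail even when the kernel is correct.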

### Vectorized kernel produces wrong results

When using `vec_width=4`, the grid must be divided by the vector width:

```python theme={null}
# Scalar grid
rt.execute(compiled_scalar, ..., grid=(n, 1, 1), ...)

# vec_width=4 grid — 4× fewer threads, each handles 4 elements
rt.execute(compiled_vec4, ..., grid=(n // 4, 1, 1), ...)
```
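Note that `n // 4` rounds down, so if `n` is not a multiple of the vector width the last few elements get no thread. A hedged sketch of a guard follows; `vec_grid` is a hypothetical helper, and it assumes the kernel has no tail handling of its own.

```python
def vec_grid(n: int, vec_width: int) -> tuple:
    """Return a 1D grid for a vectorized kernel, rejecting sizes that would drop a tail."""
    if vec_width <= 0:
        raise ValueError("vec_width must be positive")
    if n % vec_width != 0:
        raise ValueError(
            f"n={n} is not a multiple of vec_width={vec_width}; "
            f"the last {n % vec_width} element(s) would not be processed"
        )
    return (n // vec_width, 1, 1)

grid = vec_grid(1024, 4)
```

Padding the input up to the next multiple of the vector width is one way to satisfy the check.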

### 2D kernel has transposed output

Enigma uses the Metal convention: `x` is the fast (column) dimension and `y` is the slow (row) dimension. Ensure grid and kernel indexing agree:

```python theme={null}
# Metal convention: grid=(cols, rows, 1)
row = enigma.thread_position_in_grid_xyz("y")
col = enigma.thread_position_in_grid_xyz("x")
idx = row * N + col   # row-major
```

```python theme={null}
rt.execute(..., grid=(N, M, 1), threads=(16, 16, 1))
```
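One way to sanity-check the index math without a GPU is to replay the grid on the CPU and look at which flat indices the threads touch. This is a pure-Python sketch with a hypothetical helper name; it mirrors the `row * N + col` formula from the kernel above and makes no Enigma calls.

```python
def flat_indices(rows: int, cols: int, swap_xy: bool = False) -> list:
    """Replay grid=(cols, rows, 1) and return the flat index each thread touches."""
    touched = []
    for y in range(rows):          # slow dimension
        for x in range(cols):      # fast dimension
            # Correct mapping: row comes from y, col from x. swap_xy simulates the bug.
            row, col = (x, y) if swap_xy else (y, x)
            touched.append(row * cols + col)  # row-major index
    return touched

M, N = 2, 3  # rows, cols
correct = flat_indices(M, N)                 # every element exactly once, in order
swapped = flat_indices(M, N, swap_xy=True)   # skips elements and runs past the end
out_of_range = [i for i in swapped if i >= M * N]
```

With a non-square shape, swapped `x` and `y` show up as out-of-range or missing indices; with a square shape the same bug produces a clean transpose, which is harder to spot.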