> ## Documentation Index
> Fetch the complete documentation index at: https://klyne-research.mintlify.site/llms.txt
> Use this file to discover all available pages before exploring further.

# Kernel Decorators & Grid Queries

> Reference for @enigma.kernel, @enigma.jit, and the thread/grid query intrinsics available inside kernel bodies.

This page documents the decorators that mark a Python function for GPU
compilation, and the built-in queries that return thread and grid indices.

For the conceptual overview, see [Kernels and JIT](/programming-guide/kernels-and-jit).

## `@enigma.kernel`

Marks a function as a GPU compute kernel. The body is traced once at compile
time, producing an IR graph that is lowered to Metal Shading Language.

```python theme={null}
@enigma.kernel
def fn(A: enigma.f32, B: enigma.f32, C: enigma.f32):
    tid = enigma.thread_position_in_grid
    C[tid] = A[tid] + B[tid]
```

### Parameter types

Each parameter must have a type annotation:

| Annotation                                                   | Meaning                                                                   |
| ------------------------------------------------------------ | ------------------------------------------------------------------------- |
| `enigma.f32`, `enigma.f16`, `enigma.bf16`, `enigma.i32`, ... | Typed device buffer (`device T*` in MSL)                                  |
| `enigma.Scalar(dtype)`                                       | Per-dispatch constant, lowered as a 1-element buffer auto-loaded at entry |

Parameters map to MSL `[[buffer(N)]]` bindings in declaration order, starting
at index 0.

See [Data Types](/programming-guide/data-types) for the full type table.

### Returns

A `KernelDef` object. Do not call directly — pass to `enigma.compile()`.

***

## `@enigma.jit`

Marks a host-side function that runs at compile time. Use for layout algebra,
tile partitioning, and multi-kernel orchestration.

```python theme={null}
@enigma.jit
def tiled_add(mA: enigma.Tensor, mB: enigma.Tensor, mC: enigma.Tensor):
    ...
```

Pass `Tensor` arguments to `enigma.compile()`:

```python theme={null}
compiled = enigma.compile(tiled_add, A, B, C)
```

See [Kernels and JIT](/programming-guide/kernels-and-jit#enigma-jit) for the
full pattern.

***

## Thread & grid queries

These return an `IRValue` of dtype `"uint"` representing a thread or group
index. They are valid only inside `@enigma.kernel` bodies.

### Shorthand (x dimension)

```python theme={null}
tid = enigma.thread_position_in_grid   # property — returns x dimension
```

### Per-dimension queries

Each query takes an optional `dim` argument: `"x"` (default), `"y"`, or `"z"`.

| Function                                         | Metal equivalent                           | Description                       |
| ------------------------------------------------ | ------------------------------------------ | --------------------------------- |
| `enigma.thread_position_in_grid_xyz(dim="x")`    | `thread_position_in_grid.{x\|y\|z}`        | Global thread index               |
| `enigma.thread_position_in_threadgroup(dim="x")` | `thread_position_in_threadgroup.{x\|y\|z}` | Index within threadgroup          |
| `enigma.threadgroup_position_in_grid(dim="x")`   | `threadgroup_position_in_grid.{x\|y\|z}`   | Threadgroup index in grid         |
| `enigma.threads_per_threadgroup(dim="x")`        | `threads_per_threadgroup.{x\|y\|z}`        | Threads per threadgroup           |
| `enigma.threads_per_grid(dim="x")`               | `threads_per_grid.{x\|y\|z}`               | Total threads in grid             |
| `enigma.threadgroups_per_grid(dim="x")`          | `threadgroups_per_grid.{x\|y\|z}`          | Threadgroups in grid              |
| `enigma.grid_size(dim="x")`                      | `grid_size.{x\|y\|z}`                      | Alias for `threadgroups_per_grid` |

### Flat queries (no dim parameter)

| Function                                  | Metal equivalent                 | Description                           |
| ----------------------------------------- | -------------------------------- | ------------------------------------- |
| `enigma.thread_index_in_threadgroup()`    | `thread_index_in_threadgroup`    | Flattened 1D index within threadgroup |
| `enigma.thread_index_in_simdgroup()`      | `thread_index_in_simdgroup`      | Lane index within SIMD group (0–31)   |
| `enigma.simdgroup_index_in_threadgroup()` | `simdgroup_index_in_threadgroup` | SIMD group index within threadgroup   |
| `enigma.threads_per_simdgroup()`          | `threads_per_simdgroup`          | Threads per SIMD group (typically 32) |
| `enigma.simdgroups_per_threadgroup()`     | `simdgroups_per_threadgroup`     | SIMD groups per threadgroup           |

### Example: 2D grid

```python theme={null}
@enigma.kernel
def transpose(In: enigma.f32, Out: enigma.f32):
    row = enigma.thread_position_in_grid_xyz("y")
    col = enigma.thread_position_in_grid_xyz("x")
    Out[col * 64 + row] = In[row * 64 + col]
```

***

## Function constants

Metal specialization constants bound at pipeline creation time. Use these for
values that should be compile-time-constant in the pipeline but selectable per
dispatch (e.g. tile sizes, fusion flags).

### `enigma.function_constant(dtype, index) -> IRValue`

| Parameter | Type | Description                                           |
| --------- | ---- | ----------------------------------------------------- |
| `dtype`   | str  | `"float"`, `"int"`, `"uint"`, `"bool"`                |
| `index`   | int  | Function constant index, matched at pipeline creation |

```python theme={null}
@enigma.kernel
def scaled_add(A: enigma.f32, B: enigma.f32, C: enigma.f32):
    scale = enigma.function_constant("float", index=0)
    tid = enigma.thread_position_in_grid
    C[tid] = A[tid] * scale + B[tid]
```

***

## `arch` namespace

Hardware-feature gating helpers. Use these to write kernels that adapt to
the active GPU family.

| Function                                  | Returns | Description                     |
| ----------------------------------------- | ------- | ------------------------------- |
| `enigma.arch.is_apple_silicon()`          | bool    | True on arm64 macOS             |
| `enigma.arch.gpu_family()`                | str     | e.g. `"apple9"`                 |
| `enigma.arch.supports_simdgroup_matrix()` | bool    | 8×8 matrix unit availability    |
| `enigma.arch.supports_async_copy()`       | bool    | AIR async copy intrinsics (M3+) |
| `enigma.arch.simdgroup_size()`            | int     | SIMD group lane count           |

These are host-side helpers (not callable inside `@enigma.kernel`). For the
runtime-side equivalent, see `MetalRuntime.device_capabilities()` in
[Runtime](/api-reference/runtime).
