@enigma.kernel
Marks a function as a GPU compute kernel. The body is traced once at compile
time, producing an IR graph that is lowered to Metal Shading Language.
Parameter types
Each parameter must have a type annotation:

| Annotation | Meaning |
|---|---|
| enigma.f32, enigma.f16, enigma.bf16, enigma.i32, … | Typed device buffer (device T* in MSL) |
| enigma.Scalar(dtype) | Per-dispatch constant, lowered as a 1-element buffer auto-loaded at entry |
Parameters are assigned [[buffer(N)]] bindings in declaration order, starting
at index 0.
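The declaration-order rule can be modelled on the host with the standard library alone. The following is a minimal sketch, not enigma's implementation: `buffer_bindings` and the `saxpy` signature are hypothetical, plain strings stand in for the enigma annotations, and it assumes that an enigma.Scalar parameter occupies a slot in the same index sequence as the buffers (plausible given that it is lowered as a 1-element buffer, but not stated above).

```python
import inspect


def buffer_bindings(fn):
    """Map each parameter name to a buffer index in declaration order.

    Hypothetical helper: it only mimics the documented rule (annotated
    parameters, indices starting at 0) and does not call enigma.
    """
    params = inspect.signature(fn).parameters
    missing = [
        name for name, p in params.items()
        if p.annotation is inspect.Parameter.empty
    ]
    if missing:
        raise TypeError(f"parameters without a type annotation: {missing}")
    return {name: index for index, name in enumerate(params)}


# Hypothetical kernel signature; the strings stand in for enigma.f32
# and enigma.Scalar(...) annotations.
def saxpy(x: "f32", y: "f32", out: "f32", alpha: "Scalar(f32)"):
    pass


bindings = buffer_bindings(saxpy)
print(bindings)  # {'x': 0, 'y': 1, 'out': 2, 'alpha': 3}
```

Under this model, reordering the parameters of a kernel changes every binding index after the moved parameter, so host code that binds buffers by index would need to follow the declaration order.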
See Data Types for the full type table.
Returns
A KernelDef object. Do not call it directly; pass it to enigma.compile().
@enigma.jit
Marks a host-side function that runs at compile time. Use for layout algebra,
tile partitioning, and multi-kernel orchestration.
Tensor arguments to enigma.compile():
Thread & grid queries
These return an IRValue of dtype "uint" representing a thread or group
index. They are valid only inside @enigma.kernel bodies.
Shorthand (x dimension)
Per-dimension queries
Each query takes an optional dim argument: "x" (default), "y", or "z".
| Function | Metal equivalent | Description |
|---|---|---|
| enigma.thread_position_in_grid_xyz(dim="x") | thread_position_in_grid.{x\|y\|z} | Global thread index |
| enigma.thread_position_in_threadgroup(dim="x") | thread_position_in_threadgroup.{x\|y\|z} | Index within threadgroup |
| enigma.threadgroup_position_in_grid(dim="x") | threadgroup_position_in_grid.{x\|y\|z} | Threadgroup index in grid |
| enigma.threads_per_threadgroup(dim="x") | threads_per_threadgroup.{x\|y\|z} | Threads per threadgroup |
| enigma.threads_per_grid(dim="x") | threads_per_grid.{x\|y\|z} | Total threads in grid |
| enigma.threadgroups_per_grid(dim="x") | threadgroups_per_grid.{x\|y\|z} | Threadgroups in grid |
| enigma.grid_size(dim="x") | grid_size.{x\|y\|z} | Alias for threadgroups_per_grid |
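How the per-dimension quantities relate to one another can be checked with a small host-side model. This is a sketch in plain Python, not enigma code: the function below is a hypothetical stand-in that reuses the query name for readability, and it assumes a dispatch made of whole, uniformly sized threadgroups along one dimension.

```python
def thread_position_in_grid(group_pos, threads_per_group, pos_in_group):
    """Global index along one dimension for uniform threadgroups."""
    return group_pos * threads_per_group + pos_in_group


threads_per_threadgroup = 64
threadgroups_per_grid = 4
# Holds only when the grid is made of whole threadgroups.
threads_per_grid = threads_per_threadgroup * threadgroups_per_grid

global_ids = [
    thread_position_in_grid(group, threads_per_threadgroup, local)
    for group in range(threadgroups_per_grid)
    for local in range(threads_per_threadgroup)
]

# Every global index from 0 to threads_per_grid - 1 appears exactly once.
assert global_ids == list(range(threads_per_grid))
```

The same relation applies independently to the "y" and "z" dimensions.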
Flat queries (no dim parameter)
| Function | Metal equivalent | Description |
|---|---|---|
| enigma.thread_index_in_threadgroup() | thread_index_in_threadgroup | Flattened 1D index within threadgroup |
| enigma.thread_index_in_simdgroup() | thread_index_in_simdgroup | Lane index within SIMD group (0–31) |
| enigma.simdgroup_index_in_threadgroup() | simdgroup_index_in_threadgroup | SIMD group index within threadgroup |
| enigma.threads_per_simdgroup() | threads_per_simdgroup | Threads per SIMD group (typically 32) |
| enigma.simdgroups_per_threadgroup() | simdgroups_per_threadgroup | SIMD groups per threadgroup |
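The flat queries can be related to the per-dimension ones with a short host-side model. This is a sketch under two assumptions that the tables above do not state: the flattened index is taken to be x-fastest (z * width * height + y * width + x), and SIMD groups are taken to be consecutive runs of 32 flat indices. Both functions are hypothetical stand-ins, not enigma calls.

```python
import math


def thread_index_in_threadgroup(pos, size):
    """Flatten an (x, y, z) position inside a threadgroup, x-fastest."""
    x, y, z = pos
    width, height, _depth = size
    return z * width * height + y * width + x


def simd_coordinates(flat_index, simd_width=32):
    """Return (SIMD group index, lane index), assuming consecutive packing."""
    return divmod(flat_index, simd_width)


threadgroup_size = (16, 8, 1)  # 128 threads in total
flat = thread_index_in_threadgroup((3, 5, 0), threadgroup_size)
simd_group, lane = simd_coordinates(flat)
simdgroups_per_threadgroup = math.ceil(math.prod(threadgroup_size) / 32)

print(flat, simd_group, lane, simdgroups_per_threadgroup)
```

Code that depends on a particular thread-to-lane mapping should read the SIMD group and lane indices from the queries rather than derive them, since the packing assumed here is a property of the hardware and not something the API description guarantees.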
Example: 2D grid
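As a host-side illustration of the indexing a 2D kernel typically performs, the following sketch emulates a 2D dispatch in plain Python. It does not use the enigma API: `launch_2d` and `scale_kernel` are hypothetical names, and `gx` and `gy` stand in for the "x" and "y" global thread indices. The grid is rounded up to whole threadgroups, so the kernel guards against out-of-range threads.

```python
import math


def launch_2d(kernel, width, height, threadgroup=(8, 8)):
    """Emulate a 2D dispatch by calling `kernel` once per thread."""
    groups_x = math.ceil(width / threadgroup[0])
    groups_y = math.ceil(height / threadgroup[1])
    launched = 0
    for gy in range(groups_y * threadgroup[1]):
        for gx in range(groups_x * threadgroup[0]):
            kernel(gx, gy)
            launched += 1
    return launched


width, height = 10, 6
src = [float(i) for i in range(width * height)]
dst = [0.0] * (width * height)


def scale_kernel(gx, gy):
    # The grid is rounded up to whole threadgroups, so guard the edges.
    if gx >= width or gy >= height:
        return
    i = gy * width + gx  # row-major linear index
    dst[i] = 2.0 * src[i]


launched = launch_2d(scale_kernel, width, height)
print(launched, "threads launched for", width * height, "elements")
```

With these sizes, more threads are launched than there are elements, which is why the bounds check comes before any memory access.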
Function constants
Metal specialization constants bound at pipeline creation time. Use these for values that should be compile-time-constant in the pipeline but selectable per dispatch (e.g. tile sizes, fusion flags).
enigma.function_constant(dtype, index) -> IRValue
| Parameter | Type | Description |
|---|---|---|
| dtype | str | "float", "int", "uint", "bool" |
| index | int | Function constant index, matched at pipeline creation |
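Because function constants are bound when a pipeline is created, each distinct set of values corresponds to its own specialized pipeline. The toy model below sketches that consequence in plain Python. `PipelineCache` is a hypothetical class and says nothing about how enigma's runtime actually caches pipelines; it only illustrates why a value such as a tile size can be constant inside a pipeline yet still vary between dispatches.

```python
class PipelineCache:
    """Toy model: one specialized pipeline per distinct set of constants."""

    def __init__(self):
        self._pipelines = {}
        self.builds = 0

    def get(self, kernel_name, constants):
        # `constants` maps a function-constant index to its value.
        key = (kernel_name, tuple(sorted(constants.items())))
        if key not in self._pipelines:
            self.builds += 1  # stands in for an expensive specialization
            self._pipelines[key] = ("pipeline", key)
        return self._pipelines[key]


cache = PipelineCache()
tile32 = cache.get("matmul", {0: 32, 1: True})
tile32_again = cache.get("matmul", {1: True, 0: 32})  # same values, reused
tile64 = cache.get("matmul", {0: 64, 1: True})        # new specialization

print(cache.builds)  # 2
```

Selecting a value per dispatch then amounts to selecting which specialized pipeline to dispatch.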
arch namespace
Hardware-feature gating helpers. Use these to write kernels that adapt to
the active GPU family.
| Function | Returns | Description |
|---|---|---|
| enigma.arch.is_apple_silicon() | bool | True on arm64 macOS |
| enigma.arch.gpu_family() | str | e.g. "apple9" |
| enigma.arch.supports_simdgroup_matrix() | bool | 8×8 matrix unit availability |
| enigma.arch.supports_async_copy() | bool | AIR async copy intrinsics (M3+) |
| enigma.arch.simdgroup_size() | int | SIMD group lane count |
@enigma.kernel). For the
runtime-side equivalent, see MetalRuntime.device_capabilities() in
Runtime.
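Feature gating of this kind usually reduces to choosing a code path from a few capability values. The sketch below shows the pattern in plain Python without calling enigma. `is_apple_silicon` is one way such a check might be written with the standard library, and `pick_matmul_variant` and its variant names are hypothetical.

```python
import platform


def is_apple_silicon():
    """True when running natively on arm64 macOS (standard-library check)."""
    return platform.system() == "Darwin" and platform.machine() == "arm64"


def pick_matmul_variant(supports_simdgroup_matrix, simdgroup_size):
    """Choose a kernel variant name from capability values.

    Hypothetical selection logic: the arguments stand in for the results
    of the arch helpers listed above.
    """
    if supports_simdgroup_matrix:
        return "simdgroup_matrix_8x8"
    return f"scalar_tiles_{simdgroup_size}"


print(is_apple_silicon())
print(pick_matmul_variant(True, 32))
print(pick_matmul_variant(False, 32))
```

Passing the capability values in as arguments keeps the selection logic testable on machines without a Metal-capable GPU.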