Enigma exposes two families of matrix ops:
- Simdgroup matrix ops: 8×8 tiles distributed across the 32 lanes of a SIMD group, lowered to Apple Silicon’s hardware MMA units.
- Matrix-type ops: operations on regular Metal floatNxM matrix types (matmul, transpose, determinant).
To detect simdgroup matrix support at runtime, check MetalRuntime.device_capabilities().supports_simdgroup_matrix.
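A capability check like this is typically made once, before deciding which kernel to launch. The sketch below only illustrates that pattern: `pick_matrix_path` is a hypothetical helper that is not part of Enigma, and a `SimpleNamespace` stands in for the object returned by MetalRuntime.device_capabilities() so the snippet runs without a GPU.

```python
from types import SimpleNamespace


def pick_matrix_path(caps) -> str:
    """Return "simdgroup" when the device reports support, else "fallback".

    `caps` is assumed to expose a boolean `supports_simdgroup_matrix`
    attribute, as the capabilities object described above does.
    """
    # A missing attribute is treated as "not supported".
    if getattr(caps, "supports_simdgroup_matrix", False):
        return "simdgroup"
    return "fallback"


# Stand-in capability objects (hypothetical). Real code would pass
# MetalRuntime.device_capabilities() instead.
with_mma = SimpleNamespace(supports_simdgroup_matrix=True)
without_mma = SimpleNamespace(supports_simdgroup_matrix=False)

print(pick_matrix_path(with_mma))
print(pick_matrix_path(without_mma))
```

What the "fallback" branch does is up to the caller, for example a scalar kernel that avoids simdgroup matrices entirely.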
Simdgroup matrix ops
These ops operate on simdgroup_float8x8 values (or the equivalent for other element types). Each matrix is held collaboratively in registers across the 32 threads of one SIMD group, so its elements are not addressable per thread.
enigma.simdgroup_matrix_load(buf, elements_per_row, elem="float", rows=8, cols=8) -> IRValue
Load an 8×8 tile from a device buffer into a simdgroup matrix register.
| Parameter | Type | Default | Description |
|---|---|---|---|
| buf | TracingTensor | required | Source buffer (kernel parameter) |
| elements_per_row | int | required | Row stride in the source buffer |
| elem | str | "float" | Element type |
| rows | int | 8 | Tile rows |
| cols | int | 8 | Tile columns |
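The role of elements_per_row is easiest to see in a plain-Python reference model. The sketch below is not Enigma code: `load_tile` is a hypothetical host-side stand-in, and it assumes a row-major buffer with the tile starting at element 0. It shows that the stride can be larger than the tile width.

```python
def load_tile(buf, elements_per_row, rows=8, cols=8):
    """Reference model: tile[r][c] = buf[r * elements_per_row + c]."""
    return [
        [buf[r * elements_per_row + c] for c in range(cols)]
        for r in range(rows)
    ]


# An 8x16 row-major buffer in which every value equals its flat index.
stride = 16
buf = list(range(8 * stride))

tile = load_tile(buf, elements_per_row=stride)
print(tile[0])  # first tile row: flat indices 0..7
print(tile[1])  # second tile row starts one full stride later: 16..23
```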
enigma.simdgroup_matrix_store(matrix, buf, elements_per_row) -> None
Store a simdgroup matrix back to a device buffer.
| Parameter | Type | Description |
|---|---|---|
| matrix | IRValue | Simdgroup matrix to store |
| buf | TracingTensor | Destination buffer |
| elements_per_row | int | Row stride in the destination |
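As a rough host-side illustration of the store direction, the sketch below writes an 8×8 tile into a wider row-major buffer. `store_tile` is a hypothetical reference model, not Enigma's implementation, and it assumes the tile lands at element 0 of the destination.

```python
def store_tile(tile, buf, elements_per_row):
    """Reference model: buf[r * elements_per_row + c] = tile[r][c]."""
    for r, row in enumerate(tile):
        for c, value in enumerate(row):
            buf[r * elements_per_row + c] = value


stride = 12                       # destination rows are wider than the tile
buf = [0.0] * (8 * stride)
tile = [[1.0] * 8 for _ in range(8)]

store_tile(tile, buf, elements_per_row=stride)

written = sum(1 for v in buf if v == 1.0)
print(written)     # 64 elements were written
print(buf[8:12])   # columns 8..11 of row 0 were left untouched
```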
enigma.simdgroup_multiply_accumulate(a, b, c) -> IRValue
Matrix multiply-accumulate: result = a * b + c. All operands and the result
are simdgroup matrices.
| Parameter | Type | Description |
|---|---|---|
| a | IRValue | Left matrix |
| b | IRValue | Right matrix |
| c | IRValue | Accumulator |
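For checking kernel output on the host, the formula result = a * b + c can be written as a small pure-Python reference. This is only a sketch of the arithmetic. `multiply_accumulate` below is a hypothetical helper that works on lists of rows, not on simdgroup matrices.

```python
def multiply_accumulate(a, b, c):
    """Reference model of result = a * b + c for square n x n lists of rows."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) + c[i][j] for j in range(n)]
        for i in range(n)
    ]


n = 8
identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
b = [[float(i * n + j) for j in range(n)] for i in range(n)]
ones = [[1.0] * n for _ in range(n)]

# identity * b + ones adds 1 to every element of b.
result = multiply_accumulate(identity, b, ones)
print(result[0][:3])  # [1.0, 2.0, 3.0]
```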
enigma.make_filled_simdgroup_matrix(value, elem="float", rows=8, cols=8) -> IRValue
Create a simdgroup matrix initialized with a scalar value (typically zero
for accumulator setup).
| Parameter | Type | Default | Description |
|---|---|---|---|
| value | IRValue or number | required | Fill value |
| elem | str | "float" | Element type |
| rows | int | 8 | Rows |
| cols | int | 8 | Columns |
Example: single-tile GEMM
import enigma

@enigma.kernel
def simd_gemm(A: enigma.f32, B: enigma.f32, C: enigma.f32):
    # Load one 8×8 tile from each input; the row stride is 8 elements.
    a_mat = enigma.simdgroup_matrix_load(A, elements_per_row=8)
    b_mat = enigma.simdgroup_matrix_load(B, elements_per_row=8)
    # Build a zero-filled accumulator tile.
    zero = enigma.metal_cast(0, "float")
    c_mat = enigma.make_filled_simdgroup_matrix(zero)
    # result = a_mat * b_mat + c_mat
    result = enigma.simdgroup_multiply_accumulate(a_mat, b_mat, c_mat)
    enigma.simdgroup_matrix_store(result, C, elements_per_row=8)
For multi-tile GEMM, prefer enigma.gemm, which picks the simdgroup path automatically when the tile shape is 8×8×8.
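To make the 8×8×8 tile shape concrete, here is a plain-Python sketch of how a larger GEMM decomposes into repeated multiply-accumulate steps on 8×8 tiles. It does not show how enigma.gemm is implemented. `tiled_gemm` is a hypothetical host-side reference that assumes row-major flat buffers and dimensions that are multiples of 8.

```python
TILE = 8


def tiled_gemm(A, B, M, N, K):
    """Compute C = A @ B on row-major flat lists, one 8x8 output tile at a time."""
    assert M % TILE == 0 and N % TILE == 0 and K % TILE == 0
    C = [0.0] * (M * N)
    for i0 in range(0, M, TILE):          # output tile row
        for j0 in range(0, N, TILE):      # output tile column
            # Plays the role of the zero-filled accumulator tile.
            acc = [[0.0] * TILE for _ in range(TILE)]
            for k0 in range(0, K, TILE):  # walk the shared dimension
                # One 8x8x8 step: acc = A_tile * B_tile + acc
                for i in range(TILE):
                    for j in range(TILE):
                        acc[i][j] += sum(
                            A[(i0 + i) * K + (k0 + k)] * B[(k0 + k) * N + (j0 + j)]
                            for k in range(TILE)
                        )
            # Write the finished tile back with a row stride of N.
            for i in range(TILE):
                for j in range(TILE):
                    C[(i0 + i) * N + (j0 + j)] = acc[i][j]
    return C


M, N, K = 16, 8, 24
A = [float((i + 1) % 5) for i in range(M * K)]
B = [float((i + 2) % 3) for i in range(K * N)]
C = tiled_gemm(A, B, M, N, K)
print(len(C))  # M * N output elements
```

Each pass of the innermost k0 loop corresponds to one load, load, multiply-accumulate sequence from the single-tile example, with the accumulator carried across passes.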
Matrix-type operations
These work on regular Metal matrix types (e.g. float4x4), modeled as
multi-dimensional vector types in MLIR. They are useful for affine transforms
and small dense linear algebra inside compute kernels.
| Function | Description |
|---|---|
| enigma.matmul(a, b, result_dtype=None) | Matrix multiply: a * b |
| enigma.transpose(m, result_dtype=None) | Transpose |
| enigma.determinant(m, scalar_dtype=None) | Determinant (returns scalar) |
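The mathematical meaning of these three operations can be pinned down with a small host-side reference, which is handy for validating kernel results. The functions below are hypothetical pure-Python stand-ins that share only their names with the Enigma ops. They work on lists of rows and make no claim about Metal's in-memory matrix layout.

```python
def matmul(a, b):
    """Matrix product of a (n x k) and b (k x m), both given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


def determinant(m):
    """Laplace expansion along the first row; adequate for small matrices."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0.0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * determinant(minor)
    return total


m = [[2.0, 0.0, 1.0],
     [1.0, 3.0, 0.0],
     [0.0, 1.0, 4.0]]
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

print(determinant(m))           # 25.0
print(transpose(m)[0])          # first column of m as a row
print(matmul(m, identity) == m)
```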
Construction of matrix-typed values is currently blocked on a dialect-side
mat_make op. See docs/blocked-features.md in the repo for status.