Matrix Operations

Enigma exposes two families of matrix ops:

- Simdgroup matrix ops: 8×8 tiles distributed across the 32 lanes of a SIMD group, lowered to Metal's `simdgroup_matrix` operations, which Apple Silicon GPUs accelerate in hardware.
- Matrix-type ops: operations on regular Metal `floatNxM` matrix types (`matmul`, `transpose`, `determinant`).

Support for simdgroup matrices can be detected at runtime via `MetalRuntime.device_capabilities().supports_simdgroup_matrix`.

Simdgroup matrix ops

These ops operate on `simdgroup_float8x8` values (or the equivalent type for another element type, selected with `elem`). Each matrix is held collaboratively in the registers of the 32 threads of one SIMD group, so its elements are not individually addressable from a single thread.

`enigma.simdgroup_matrix_load(buf, elements_per_row, elem="float", rows=8, cols=8) -> IRValue`

Load an 8×8 tile from a device buffer into a simdgroup matrix register.

| Parameter | Type | Default | Description |
|---|---|---|---|
| `buf` | `TracingTensor` | required | Source buffer (kernel parameter) |
| `elements_per_row` | `int` | required | Row stride of the source buffer, in elements |
| `elem` | `str` | `"float"` | Element type |
| `rows` | `int` | `8` | Tile rows |
| `cols` | `int` | `8` | Tile columns |
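The effect of `elements_per_row` is easiest to see in a host-side model. The sketch below is plain Python and not part of Enigma: `load_tile` is a hypothetical helper that assumes the tile starts at element 0 of a row-major buffer (the signature above takes no offset argument) and ignores how the elements are split across lanes.

```python
def load_tile(buf, elements_per_row, rows=8, cols=8):
    """Hypothetical reference model: read a rows x cols tile from the start of a flat row-major buffer."""
    if elements_per_row < cols:
        raise ValueError("elements_per_row must be >= cols")
    # The last element read is at (rows - 1) * stride + (cols - 1).
    needed = (rows - 1) * elements_per_row + cols
    if len(buf) < needed:
        raise ValueError(f"buffer too small: need {needed} elements, got {len(buf)}")
    # Row r of the tile starts at r * elements_per_row in the flat buffer.
    return [[buf[r * elements_per_row + c] for c in range(cols)] for r in range(rows)]


# An 8x16 row-major matrix stored flat; each element encodes row * 100 + col.
wide = [r * 100 + c for r in range(8) for c in range(16)]
tile = load_tile(wide, elements_per_row=16)
print(tile[0][:3])  # [0, 1, 2]
print(tile[7][7])   # 707
```

With `elements_per_row=16`, each tile row skips the 8 trailing elements of its buffer row, which is how an 8×8 tile is read out of a wider matrix.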

`enigma.simdgroup_matrix_store(matrix, buf, elements_per_row) -> None`

Store a simdgroup matrix back to a device buffer.

| Parameter | Type | Description |
|---|---|---|
| `matrix` | `IRValue` | Simdgroup matrix to store |
| `buf` | `TracingTensor` | Destination buffer |
| `elements_per_row` | `int` | Row stride of the destination buffer, in elements |
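Storing is the mirror image of loading. In this hypothetical host-side model (plain Python, not Enigma's implementation, assuming a row-major buffer with the tile at element 0), only the tile's `rows × cols` elements are written; any padding between `cols` and `elements_per_row` keeps its previous contents.

```python
def store_tile(matrix, buf, elements_per_row):
    """Hypothetical reference model: write a tile into the start of a flat row-major buffer."""
    cols = len(matrix[0])
    if elements_per_row < cols:
        raise ValueError("elements_per_row must be >= cols")
    for r, row in enumerate(matrix):
        for c, value in enumerate(row):
            # Only tile elements are written; padding columns are skipped.
            buf[r * elements_per_row + c] = value


# Store an 8x8 tile of ones into a zeroed 8x12 buffer.
out = [0.0] * (8 * 12)
store_tile([[1.0] * 8 for _ in range(8)], out, elements_per_row=12)
print(sum(out))   # 64.0
print(out[8:12])  # [0.0, 0.0, 0.0, 0.0]  (padding of row 0 untouched)
```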

`enigma.simdgroup_multiply_accumulate(a, b, c) -> IRValue`

Matrix multiply-accumulate: `result = a * b + c`, where `*` is the matrix product. All operands and the result are simdgroup matrices.

| Parameter | Type | Description |
|---|---|---|
| `a` | `IRValue` | Left matrix |
| `b` | `IRValue` | Right matrix |
| `c` | `IRValue` | Accumulator |
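A plain-Python reference for the same arithmetic is handy when checking kernel output on the host. `multiply_accumulate` below is a hypothetical helper, not an Enigma function; it computes `a * b + c` with `*` as the matrix product.

```python
def multiply_accumulate(a, b, c):
    """Hypothetical reference model: a (n x k) times b (k x m), plus c (n x m)."""
    n, k, m = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of a and b must match")
    if len(c) != n or any(len(row) != m for row in c):
        raise ValueError("accumulator shape must be n x m")
    return [
        [sum(a[i][p] * b[p][j] for p in range(k)) + c[i][j] for j in range(m)]
        for i in range(n)
    ]


identity = [[1.0 if i == j else 0.0 for j in range(8)] for i in range(8)]
b = [[float(i * 8 + j) for j in range(8)] for i in range(8)]
ones = [[1.0] * 8 for _ in range(8)]
result = multiply_accumulate(identity, b, ones)
print(result[2][3])  # 20.0  (b[2][3] is 19.0, plus 1.0 from the accumulator)
```

When comparing GPU results against a model like this, compare with a tolerance rather than exact equality, since the hardware may round or order the accumulation differently.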

`enigma.make_filled_simdgroup_matrix(value, elem="float", rows=8, cols=8) -> IRValue`

Create a simdgroup matrix initialized with a scalar value (typically zero for accumulator setup).

| Parameter | Type | Default | Description |
|---|---|---|---|
| `value` | `IRValue` or number | required | Fill value |
| `elem` | `str` | `"float"` | Element type |
| `rows` | `int` | `8` | Rows |
| `cols` | `int` | `8` | Columns |

Example: single-tile GEMM

```python
import enigma

@enigma.kernel
def simd_gemm(A: enigma.f32, B: enigma.f32, C: enigma.f32):
    # All 32 lanes of the SIMD group execute these calls; each op works on a shared 8x8 tile.
    a_mat = enigma.simdgroup_matrix_load(A, elements_per_row=8)  # 8x8 tile of A
    b_mat = enigma.simdgroup_matrix_load(B, elements_per_row=8)  # 8x8 tile of B
    zero = enigma.metal_cast(0, "float")
    c_mat = enigma.make_filled_simdgroup_matrix(zero)  # zero-initialized accumulator
    result = enigma.simdgroup_multiply_accumulate(a_mat, b_mat, c_mat)  # A * B + 0
    enigma.simdgroup_matrix_store(result, C, elements_per_row=8)
```

For multi-tile GEMM, prefer `enigma.gemm`: it selects the simdgroup path automatically when the tile shape is 8×8×8.
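For matrices larger than one tile, a common approach is to give each 8×8 output tile its own accumulator and march along K in steps of 8, issuing one multiply-accumulate per step. The sketch below shows that decomposition on the host in plain Python. `tiled_gemm` is hypothetical and does not describe how `enigma.gemm` is implemented; it assumes row-major flat buffers and dimensions that are multiples of 8.

```python
TILE = 8


def tiled_gemm(A, B, M, N, K):
    """Hypothetical illustration of 8x8x8 tiling for C = A @ B.

    A is M x K, B is K x N, C is M x N, all flat and row-major.
    """
    if M % TILE or N % TILE or K % TILE:
        raise ValueError("M, N and K must be multiples of 8")
    C = [0.0] * (M * N)
    for ti in range(0, M, TILE):          # one output tile per (ti, tj) pair;
        for tj in range(0, N, TILE):      # on the GPU, typically one SIMD group each
            acc = [[0.0] * TILE for _ in range(TILE)]  # zero-filled accumulator
            for tk in range(0, K, TILE):  # each step is one 8x8x8 multiply-accumulate
                for i in range(TILE):
                    for j in range(TILE):
                        acc[i][j] += sum(
                            A[(ti + i) * K + tk + p] * B[(tk + p) * N + tj + j]
                            for p in range(TILE)
                        )
            for i in range(TILE):         # store the finished output tile
                for j in range(TILE):
                    C[(ti + i) * N + tj + j] = acc[i][j]
    return C


M, N, K = 16, 8, 24
A = [(i % 5) - 2 for i in range(M * K)]
B = [(i % 3) - 1 for i in range(K * N)]
C = tiled_gemm(A, B, M, N, K)
```

Small integer inputs keep the arithmetic exact, so the tiled result can be compared directly against a naive triple loop.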

Matrix-type operations

These work on regular Metal matrix types (e.g. float4x4), modeled as multi-dimensional vector types in MLIR. They are useful for affine transforms and small dense linear algebra inside compute kernels.

| Function | Description |
|---|---|
| `enigma.matmul(a, b, result_dtype=None)` | Matrix multiply: `a * b` |
| `enigma.transpose(m, result_dtype=None)` | Transpose |
| `enigma.determinant(m, scalar_dtype=None)` | Determinant (returns a scalar) |
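To sanity-check results from these ops on the host, a small plain-Python reference is enough. The functions below are hypothetical stand-ins that mirror the table's names, not Enigma code. They use row-major nested lists, whereas Metal stores matrix types column-major (`m[i]` is column *i*); that changes the memory layout but not the mathematical result.

```python
def matmul(a, b):
    """Matrix product of nested lists (rows of a times columns of b)."""
    return [
        [sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


def determinant(m):
    """Laplace expansion along the first row; adequate for 2x2 to 4x4 matrices."""
    n = len(m)
    if n == 1:
        return m[0][0]
    if n == 2:
        return m[0][0] * m[1][1] - m[0][1] * m[1][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]  # drop row 0 and column j
        total += (-1) ** j * m[0][j] * determinant(minor)
    return total


# Affine transforms in the column-vector convention: scale first, then translate.
scale = [[2, 0, 0, 0], [0, 3, 0, 0], [0, 0, 4, 0], [0, 0, 0, 1]]
translate = [[1, 0, 0, 5], [0, 1, 0, 6], [0, 0, 1, 7], [0, 0, 0, 1]]
model = matmul(translate, scale)
print(determinant(model))       # 24  (translation has determinant 1)
print(transpose(translate)[3])  # [5, 6, 7, 1]
```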

Constructing matrix-typed values is not yet supported: it is blocked on a `mat_make` op on the dialect side. See `docs/blocked-features.md` in the repository for status.
