Enigma exposes two families of matrix ops:
  1. Simdgroup matrix ops — 8×8 tiles distributed across the 32 lanes of a SIMD group, lowered to Apple Silicon’s hardware MMA units.
  2. Matrix-type ops — operations on regular Metal floatNxM matrix types (matmul, transpose, determinant).
To detect simdgroup matrix support at runtime, check MetalRuntime.device_capabilities().supports_simdgroup_matrix.
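A minimal sketch of gating on this flag. The helper name `choose_matrix_path` and the strings it returns are hypothetical; only the `supports_simdgroup_matrix` attribute comes from the text above. The capabilities object is passed in as an argument because the import path of `MetalRuntime` is not shown on this page.

```python
from types import SimpleNamespace


def choose_matrix_path(caps):
    """Return "simdgroup" when the device reports simdgroup matrix support.

    `caps` is assumed to be whatever MetalRuntime.device_capabilities()
    returns; a missing attribute is treated as "not supported".
    """
    if getattr(caps, "supports_simdgroup_matrix", False):
        return "simdgroup"
    return "fallback"


# Stand-in capability objects for illustration (not real Enigma objects).
with_support = SimpleNamespace(supports_simdgroup_matrix=True)
without_support = SimpleNamespace(supports_simdgroup_matrix=False)

print(choose_matrix_path(with_support))
print(choose_matrix_path(without_support))
```

Treating a missing attribute as unsupported keeps the check safe if the capabilities object changes shape; what the fallback path actually does is left to the caller.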

Simdgroup matrix ops

These ops work on simdgroup_float8x8 values (or the equivalent type for other element types). Each matrix is held collaboratively in registers across the 32 threads of one SIMD group, so its elements are not addressable from an individual thread.

enigma.simdgroup_matrix_load(buf, elements_per_row, elem="float", rows=8, cols=8) -> IRValue

Load an 8×8 tile from a device buffer into a simdgroup matrix register.
Parameter         Type           Default   Description
buf               TracingTensor  required  Source buffer (kernel parameter)
elements_per_row  int            required  Row stride in the source buffer
elem              str            "float"   Element type
rows              int            8         Tile rows
cols              int            8         Tile columns
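To make the role of elements_per_row concrete, here is a host-side reference model in plain Python. It is not Enigma code: `load_tile_reference` is a hypothetical helper, and it assumes a row-major buffer with the tile starting at element 0. Enigma's lowering is assumed, not confirmed, to follow the same convention.

```python
def load_tile_reference(buf, elements_per_row, rows=8, cols=8):
    """Gather a rows x cols tile from a flat row-major buffer."""
    return [
        [buf[r * elements_per_row + c] for c in range(cols)]
        for r in range(rows)
    ]


# An 8 x 16 row-major matrix stored flat: element (r, c) holds r * 16 + c.
flat = list(range(8 * 16))
tile = load_tile_reference(flat, elements_per_row=16)

print(tile[0])     # columns 0..7 of source row 0
print(tile[1][0])  # first element of source row 1, i.e. flat[16]
```

With a stride of 16 the tile covers only the left half of each source row, which is why the stride and the tile width are separate parameters.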

enigma.simdgroup_matrix_store(matrix, buf, elements_per_row) -> None

Store a simdgroup matrix back to a device buffer.
Parameter         Type           Description
matrix            IRValue        Simdgroup matrix to store
buf               TracingTensor  Destination buffer
elements_per_row  int            Row stride in the destination
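The store direction can be modeled the same way on the host. This sketch uses a hypothetical helper, `store_tile_reference`, and assumes a row-major destination with the tile written at element 0; it only illustrates which destination elements a strided store touches.

```python
def store_tile_reference(tile, buf, elements_per_row):
    """Scatter a tile into a flat row-major buffer, in place."""
    for r, row in enumerate(tile):
        for c, value in enumerate(row):
            buf[r * elements_per_row + c] = value


dest = [0.0] * (8 * 12)               # 8 rows, 12 elements per row
tile = [[1.0] * 8 for _ in range(8)]  # an 8 x 8 tile of ones
store_tile_reference(tile, dest, elements_per_row=12)

print(dest[:12])  # eight written values, then four untouched ones
```

Elements beyond column 7 of each destination row keep their previous contents, so a wider destination can be filled tile by tile.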

enigma.simdgroup_multiply_accumulate(a, b, c) -> IRValue

Matrix multiply-accumulate: result = a * b + c. All operands and the result are simdgroup matrices.
Parameter  Type     Description
a          IRValue  Left matrix
b          IRValue  Right matrix
c          IRValue  Accumulator
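The arithmetic of result = a * b + c can be checked against a small host-side model. The function below is a hypothetical reference in plain Python, with tiles represented as lists of rows; it says nothing about how the hardware distributes the work across lanes.

```python
def mma_reference(a, b, c):
    """Return a @ b + c for square tiles given as lists of rows."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) + c[i][j] for j in range(n)]
        for i in range(n)
    ]


identity = [[1.0 if i == j else 0.0 for j in range(8)] for i in range(8)]
ones = [[1.0] * 8 for _ in range(8)]
twos = [[2.0] * 8 for _ in range(8)]

out = mma_reference(identity, ones, twos)  # I @ ones + twos
print(out[0][0])  # 3.0
```

Passing the previous result back in as c is what lets a kernel accumulate partial products across several tiles.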

enigma.make_filled_simdgroup_matrix(value, elem="float", rows=8, cols=8) -> IRValue

Create a simdgroup matrix initialized with a scalar value (typically zero for accumulator setup).
Parameter  Type               Default   Description
value      IRValue or number  required  Fill value
elem       str                "float"   Element type
rows       int                8         Rows
cols       int                8         Columns

Example: single-tile GEMM

import enigma

@enigma.kernel
def simd_gemm(A: enigma.f32, B: enigma.f32, C: enigma.f32):
    # Load one 8x8 tile from each input; rows are contiguous, so the stride is 8.
    a_mat = enigma.simdgroup_matrix_load(A, elements_per_row=8)
    b_mat = enigma.simdgroup_matrix_load(B, elements_per_row=8)
    # Zero-filled accumulator tile.
    zero = enigma.metal_cast(0, "float")
    c_mat = enigma.make_filled_simdgroup_matrix(zero)
    # result = a_mat * b_mat + c_mat
    result = enigma.simdgroup_multiply_accumulate(a_mat, b_mat, c_mat)
    # Write the 8x8 result tile back to C.
    enigma.simdgroup_matrix_store(result, C, elements_per_row=8)
For multi-tile GEMM, prefer enigma.gemm — it picks the simdgroup path automatically when the tile shape is 8×8×8.
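For intuition about what a multi-tile GEMM has to do, the sketch below composes 8×8 tiles on the host in plain Python. It is a reference model only, not a description of how enigma.gemm is implemented; `tiled_gemm_reference` is a hypothetical name, and the matrix dimensions are assumed to be multiples of 8.

```python
TILE = 8


def tiled_gemm_reference(a, b, m, n, k):
    """C = A @ B for row-major flat A (m x k) and B (k x n), tile by tile.

    m, n and k are assumed to be multiples of TILE.
    """
    c = [0.0] * (m * n)
    for i0 in range(0, m, TILE):
        for j0 in range(0, n, TILE):
            # Zero accumulator tile, as in the single-tile example.
            acc = [[0.0] * TILE for _ in range(TILE)]
            for k0 in range(0, k, TILE):
                # One multiply-accumulate step per K tile.
                for i in range(TILE):
                    for j in range(TILE):
                        acc[i][j] += sum(
                            a[(i0 + i) * k + k0 + kk] * b[(k0 + kk) * n + j0 + j]
                            for kk in range(TILE)
                        )
            # Write the finished output tile back with row stride n.
            for i in range(TILE):
                for j in range(TILE):
                    c[(i0 + i) * n + j0 + j] = acc[i][j]
    return c


m, n, k = 16, 8, 16
a = [1.0] * (m * k)
b = [2.0] * (k * n)
c = tiled_gemm_reference(a, b, m, n, k)
print(c[0])  # 32.0: sixteen products of 1.0 * 2.0
```

Each output tile needs k / 8 multiply-accumulate steps, with the accumulator carried from one step to the next.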

Matrix-type operations

These work on regular Metal matrix types (e.g. float4x4), modeled as multi-dimensional vector types in MLIR. They are useful for affine transforms and small dense linear algebra inside compute kernels.
Function                                     Description
enigma.matmul(a, b, result_dtype=None)       Matrix multiply: a * b
enigma.transpose(m, result_dtype=None)       Transpose
enigma.determinant(m, scalar_dtype=None)     Determinant (returns scalar)
Construction of matrix-typed values is currently blocked on a dialect-side mat_make op. See docs/blocked-features.md in the repo for status.
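The math behind the three functions can be pinned down with a host-side model. The sketch below is plain Python with hypothetical helper names. It represents a matrix as a list of rows, which sidesteps Metal's column-major storage, and is only meant as a reference for expected values.

```python
def matmul_ref(a, b):
    """Matrix product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose_ref(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


def determinant_ref(m):
    """Laplace expansion along the first row; adequate for 2x2 to 4x4."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, pivot in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * pivot * determinant_ref(minor)
    return total


scale = [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 1.0]]
shear = [[1.0, 1.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

print(determinant_ref(scale))                     # 6.0
print(determinant_ref(matmul_ref(scale, shear)))  # 6.0
print(transpose_ref(shear)[1])                    # [1.0, 1.0, 0.0]
```

The shear has determinant 1, so composing it with the scale leaves the determinant at 6; identities like this make convenient sanity checks for kernel output.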