# Matrix Operations

> Hardware-accelerated 8×8 simdgroup matrix multiply-accumulate, plus the regular Metal matrix-type operations.

Enigma exposes two families of matrix ops:

1. **Simdgroup matrix ops** — 8×8 tiles distributed across the 32 lanes of
   a SIMD group, lowered to Apple Silicon's hardware MMA units.
2. **Matrix-type ops** — operations on regular Metal `floatNxM` matrix types
   (matmul, transpose, determinant).

To detect simdgroup matrix support at runtime, check
`MetalRuntime.device_capabilities().supports_simdgroup_matrix`.

***

## Simdgroup matrix ops

These ops operate on `simdgroup_float8x8` values (or the equivalent type for
another element type). Each matrix is held collaboratively in registers across
the 32 threads of one SIMD group, so a single thread cannot address the
matrix's elements on its own.

### `enigma.simdgroup_matrix_load(buf, elements_per_row, elem="float", rows=8, cols=8) -> IRValue`

Load an 8×8 tile from a device buffer into a simdgroup matrix register.

| Parameter          | Type          | Default   | Description                      |
| ------------------ | ------------- | --------- | -------------------------------- |
| `buf`              | TracingTensor | —         | Source buffer (kernel parameter) |
| `elements_per_row` | int           | —         | Row stride in the source buffer  |
| `elem`             | str           | `"float"` | Element type                     |
| `rows`             | int           | `8`       | Tile rows                        |
| `cols`             | int           | `8`       | Tile columns                     |
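
The `elements_per_row` stride can be pictured with a plain-Python reference. This is a minimal sketch that assumes a flat, row-major buffer in which element `(r, c)` of the tile sits at offset `r * elements_per_row + c`. The `load_tile` helper and its `origin` parameter are hypothetical and are not part of the Enigma API.

```python
def load_tile(buf, elements_per_row, rows=8, cols=8, origin=0):
    """Copy a rows x cols tile out of a flat row-major buffer.

    Hypothetical CPU-side helper for illustration only; `origin` is the
    flat offset of the tile's top-left element (an assumption of this sketch).
    """
    return [
        [buf[origin + r * elements_per_row + c] for c in range(cols)]
        for r in range(rows)
    ]


# An 8-row, 16-column matrix stored flat: element (r, c) lives at r * 16 + c.
flat = list(range(16 * 8))

left_tile = load_tile(flat, elements_per_row=16)             # columns 0..7
right_tile = load_tile(flat, elements_per_row=16, origin=8)  # columns 8..15
```

With a stride of 16, consecutive tile rows are 16 elements apart in the buffer even though each tile row holds only 8 values.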

### `enigma.simdgroup_matrix_store(matrix, buf, elements_per_row) -> None`

Store a simdgroup matrix back to a device buffer.

| Parameter          | Type          | Description                   |
| ------------------ | ------------- | ----------------------------- |
| `matrix`           | IRValue       | Simdgroup matrix to store     |
| `buf`              | TracingTensor | Destination buffer            |
| `elements_per_row` | int           | Row stride in the destination |
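
The store direction can be sketched the same way. This reference assumes a flat, row-major destination buffer where `elements_per_row` is the distance between consecutive rows. The `store_tile` helper and its `origin` parameter are hypothetical names used only for illustration.

```python
def store_tile(tile, buf, elements_per_row, origin=0):
    """Write a 2-D tile into a flat row-major buffer in place.

    Hypothetical CPU-side helper; `origin` is the flat offset of the
    tile's top-left element (an assumption of this sketch).
    """
    for r, row in enumerate(tile):
        for c, value in enumerate(row):
            buf[origin + r * elements_per_row + c] = value


# Destination: 8 rows of 16 columns, zero-initialized.
out = [0.0] * (16 * 8)
ones_tile = [[1.0] * 8 for _ in range(8)]

# Write the tile into columns 8..15 of every row.
store_tile(ones_tile, out, elements_per_row=16, origin=8)
```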

### `enigma.simdgroup_multiply_accumulate(a, b, c) -> IRValue`

Matrix multiply-accumulate: `result = a * b + c`. All operands and the result
are simdgroup matrices.

| Parameter | Type    | Description  |
| --------- | ------- | ------------ |
| `a`       | IRValue | Left matrix  |
| `b`       | IRValue | Right matrix |
| `c`       | IRValue | Accumulator  |
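
When validating kernel output, a CPU-side reference for `a * b + c` can be handy. The following is a minimal plain-Python sketch of the arithmetic only. `mma_reference` is a hypothetical helper, not an Enigma function, and it says nothing about how the hardware distributes the work across lanes.

```python
def mma_reference(a, b, c):
    """Return a * b + c for matrices given as lists of rows."""
    n, k, m = len(a), len(b), len(b[0])
    return [
        [sum(a[i][p] * b[p][j] for p in range(k)) + c[i][j] for j in range(m)]
        for i in range(n)
    ]


identity = [[1.0 if i == j else 0.0 for j in range(8)] for i in range(8)]
ones = [[1.0] * 8 for _ in range(8)]

# identity * ones + ones: every element is 1.0 + 1.0.
result = mma_reference(identity, ones, ones)
```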

### `enigma.make_filled_simdgroup_matrix(value, elem="float", rows=8, cols=8) -> IRValue`

Create a simdgroup matrix with every element set to the same scalar value
(typically zero, to set up an accumulator).

| Parameter | Type              | Default   | Description  |
| --------- | ----------------- | --------- | ------------ |
| `value`   | IRValue or number | —         | Fill value   |
| `elem`    | str               | `"float"` | Element type |
| `rows`    | int               | `8`       | Rows         |
| `cols`    | int               | `8`       | Columns      |

### Example: single-tile GEMM

```python
import enigma


@enigma.kernel
def simd_gemm(A: enigma.f32, B: enigma.f32, C: enigma.f32):
    # Load one 8x8 tile from each input; both buffers have a row stride of 8.
    a_mat = enigma.simdgroup_matrix_load(A, elements_per_row=8)
    b_mat = enigma.simdgroup_matrix_load(B, elements_per_row=8)
    # Zero-filled accumulator, so the result is just a_mat * b_mat.
    zero = enigma.metal_cast(0, "float")
    c_mat = enigma.make_filled_simdgroup_matrix(zero)
    # result = a_mat * b_mat + c_mat
    result = enigma.simdgroup_multiply_accumulate(a_mat, b_mat, c_mat)
    # Write the 8x8 result tile back to C.
    enigma.simdgroup_matrix_store(result, C, elements_per_row=8)
```

For multi-tile GEMM, prefer [`enigma.gemm`](/programming-guide/high-level-ops#gemm),
which picks the simdgroup path automatically when the tile shape is 8×8×8.
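
To see what a multi-tile GEMM has to do, the decomposition into 8×8×8 steps can be sketched in plain Python. This is an illustrative CPU reference under stated assumptions (flat row-major buffers, dimensions that are multiples of 8). `tiled_gemm` is a hypothetical helper and does not describe how `enigma.gemm` is implemented.

```python
TILE = 8


def tiled_gemm(A, B, M, N, K):
    """Compute C = A @ B for flat row-major A (M x K) and B (K x N).

    Assumes M, N and K are multiples of TILE. Each (i0, j0) output tile
    accumulates one TILE x TILE x TILE product per step along K, which
    mirrors one multiply-accumulate per step.
    """
    C = [0.0] * (M * N)
    for i0 in range(0, M, TILE):
        for j0 in range(0, N, TILE):
            acc = [[0.0] * TILE for _ in range(TILE)]  # zero accumulator
            for k0 in range(0, K, TILE):
                for i in range(TILE):
                    for j in range(TILE):
                        acc[i][j] += sum(
                            A[(i0 + i) * K + k0 + p] * B[(k0 + p) * N + j0 + j]
                            for p in range(TILE)
                        )
            # Store the finished tile with a row stride of N.
            for i in range(TILE):
                for j in range(TILE):
                    C[(i0 + i) * N + j0 + j] = acc[i][j]
    return C


# 16 x 16 identity times B leaves B unchanged.
n = 16
eye = [1.0 if r == c else 0.0 for r in range(n) for c in range(n)]
B = [float(v) for v in range(n * n)]
C = tiled_gemm(eye, B, n, n, n)
```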

***

## Matrix-type operations

These work on regular Metal matrix types (e.g. `float4x4`), modeled as
multi-dimensional vector types in MLIR. They are useful for affine transforms
and small dense linear algebra inside compute kernels.

| Function                                   | Description                  |
| ------------------------------------------ | ---------------------------- |
| `enigma.matmul(a, b, result_dtype=None)`   | Matrix multiply: `a * b`     |
| `enigma.transpose(m, result_dtype=None)`   | Transpose                    |
| `enigma.determinant(m, scalar_dtype=None)` | Determinant (returns scalar) |

<Note>
  Construction of matrix-typed values is currently blocked on a dialect-side
  `mat_make` op. See `docs/blocked-features.md` in the repo for status.
</Note>
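
For checking results on the host, the three operations in the table can be mirrored in plain Python. This is a minimal sketch of the mathematics only. The `ref_*` helpers are hypothetical names, operate on lists of rows, and do not call into Enigma or Metal.

```python
def ref_matmul(a, b):
    """Matrix product a * b for matrices given as lists of rows."""
    return [
        [sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
        for row in a
    ]


def ref_transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


def ref_determinant(m):
    """Determinant by Laplace expansion along the first row.

    Fine for the small sizes Metal matrix types cover (up to 4x4).
    """
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, x in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * x * ref_determinant(minor)
    return total


# A 4x4 non-uniform scale, as used in affine transforms.
scale = [
    [2.0, 0.0, 0.0, 0.0],
    [0.0, 3.0, 0.0, 0.0],
    [0.0, 0.0, 4.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
volume_factor = ref_determinant(scale)  # product of the diagonal entries
```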
