Register tensors
enigma.register_tensor creates a small fixed-size tensor backed by per-thread SSA values (registers). Unlike a Tensor, which lives in device or threadgroup memory, a register tensor can only be indexed with compile-time constants.
| Parameter | Type | Default | Description |
|---|---|---|---|
| shape | tuple | required | Dimensions (e.g. (4, 4)) |
| dtype | str | "float" | Element type |
| fill | number | 0 | Initial value for all elements |
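
The compile-time indexing rule can be illustrated with a plain-Python model. This is only a sketch of the semantics described above, not enigma's implementation: `RegisterTensorModel` is a hypothetical class, and it assumes that "compile-time constant" means an ordinary Python int known while the kernel is being traced.

```python
import itertools


class RegisterTensorModel:
    """Hypothetical stand-in for a register tensor: one slot per element."""

    def __init__(self, shape, dtype="float", fill=0):
        self.shape = tuple(shape)
        self.dtype = dtype
        # One independent value per element, analogous to one SSA value each.
        all_indices = itertools.product(*(range(n) for n in self.shape))
        self.values = {idx: fill for idx in all_indices}

    def _check(self, idx):
        idx = idx if isinstance(idx, tuple) else (idx,)
        # Assumption: only plain Python ints count as compile-time constants.
        if len(idx) != len(self.shape) or not all(isinstance(i, int) for i in idx):
            raise TypeError("indices must be compile-time integer constants")
        if not all(0 <= i < n for i, n in zip(idx, self.shape)):
            raise IndexError(f"index {idx} out of range for shape {self.shape}")
        return idx

    def __getitem__(self, idx):
        return self.values[self._check(idx)]

    def __setitem__(self, idx, value):
        self.values[self._check(idx)] = value


# A Python-level loop has constant indices on every iteration, so it is
# effectively unrolled: each acc[i, j] names one fixed register.
acc = RegisterTensorModel((4, 4), fill=0.0)
for i in range(4):
    for j in range(4):
        acc[i, j] += float(i * j)
```

In this model, an index that is only known at run time (represented here by any non-int object) is rejected, because there is no addressable memory to index into.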
Copy
enigma.copy copies elements between buffers (device or threadgroup memory) using a traced for_range loop.
| Parameter | Type | Default | Description |
|---|---|---|---|
| src, dst | Tensor | required | Source and destination buffers |
| count | int | required | Number of elements |
| src_offset, dst_offset | int or IRValue | 0 | Base offsets |
| mask_fn | callable or None | None | Per-element predicate fn(i) -> i1 |
| coalesced_width | int | 1 | Wider loads/stores per iteration (1, 2, or 4) |
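
The way these parameters interact can be sketched as a host-side reference loop over Python lists. `copy_model` is a hypothetical helper, not part of enigma, and it rests on two assumptions the table does not state: `mask_fn` receives the element index relative to the start of the copy, and `count` is a multiple of `coalesced_width`.

```python
def copy_model(src, dst, count, src_offset=0, dst_offset=0,
               mask_fn=None, coalesced_width=1):
    """Reference model of a masked, offset element copy (hypothetical)."""
    if coalesced_width not in (1, 2, 4):
        raise ValueError("coalesced_width must be 1, 2, or 4")
    # Assumption: the element count divides evenly into wide accesses.
    if count % coalesced_width != 0:
        raise ValueError("count must be a multiple of coalesced_width")

    # One outer iteration per wide access, mirroring a for_range loop.
    for base in range(0, count, coalesced_width):
        for lane in range(coalesced_width):
            i = base + lane
            # Assumption: the predicate sees the copy-relative index i.
            if mask_fn is None or mask_fn(i):
                dst[dst_offset + i] = src[src_offset + i]
    return dst


src = list(range(10))
dst = [-1] * 10
# Copy 4 elements from src[2:] into dst[1:], skipping the last one.
copy_model(src, dst, count=4, src_offset=2, dst_offset=1,
           mask_fn=lambda i: i != 3, coalesced_width=2)
```

Masked-off elements leave the destination untouched in this model, which is the usual way to guard the ragged edge of a tile.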
Pipeline
enigma.pipeline creates a multi-stage ring buffer of threadgroup tiles, so that loads for upcoming iterations can overlap with compute on the current one.
| Parameter | Type | Default | Description |
|---|---|---|---|
| dtype | str | required | Element type |
| size | int | required | Elements per tile |
| stages | int | 2 | Number of buffered stages (>= 2) |

| Method | Description |
|---|---|
| pipe.front() | Current iteration’s consume buffer (stage 0) |
| pipe.back() | Most-distant prefetch buffer (stage N-1) |
| pipe.stage(k) | Stage k buffer |
| pipe.advance() | Rotate all stages by one (Python-side, no MSL emitted) |
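
The rotation behind these methods can be modeled on the host with a `collections.deque`. `PipelineModel` is a hypothetical stand-in that uses Python lists for tiles. The rotation direction is an assumption: after `advance()`, the old stage 1 becomes stage 0 and the old stage 0 moves to the back, where it is free for the next prefetch.

```python
from collections import deque


class PipelineModel:
    """Hypothetical host-side model of a multi-stage tile ring buffer."""

    def __init__(self, dtype, size, stages=2):
        if stages < 2:
            raise ValueError("stages must be >= 2")
        self.dtype = dtype
        self.size = size
        # One list per stage stands in for one threadgroup tile.
        self._tiles = deque([0.0] * size for _ in range(stages))

    def front(self):
        return self._tiles[0]

    def back(self):
        return self._tiles[-1]

    def stage(self, k):
        return self._tiles[k]

    def advance(self):
        # Pure bookkeeping: only the stage-to-tile mapping changes.
        # Assumed direction: stage 1 becomes stage 0.
        self._tiles.rotate(-1)


# Double-buffered loop: prefetch the next tile while consuming this one.
tiles = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
pipe = PipelineModel("float", size=2, stages=2)
pipe.front()[:] = tiles[0]             # prologue: load the first tile
consumed = []
for t in range(len(tiles)):
    if t + 1 < len(tiles):
        pipe.back()[:] = tiles[t + 1]  # "load" into the prefetch buffer
    consumed.append(sum(pipe.front())) # "compute" on the current buffer
    pipe.advance()
```

The loop above is written for two stages, where the back buffer holds the tile for the very next iteration. With more stages the prologue would need to fill the intermediate stages as well.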
