Register tensors

enigma.register_tensor creates a small, fixed-size tensor backed by per-thread SSA values (registers). Unlike a Tensor, which lives in device or threadgroup memory, a register tensor can only be indexed with compile-time constants.
enigma.register_tensor(shape, dtype="float", fill=0)
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| shape | tuple | required | Dimensions (e.g. (4, 4)) |
| dtype | str | "float" | Element type |
| fill | number | 0 | Initial value for all elements |
acc = enigma.register_tensor((8, 8), dtype="float", fill=0.0)
with enigma.for_range(0, K) as k:
    for i in enigma.range_constexpr(8):
        for j in enigma.range_constexpr(8):
            acc[i, j] = enigma.fma(a_tile[i, k], b_tile[k, j], acc[i, j])
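
To see why the indices have to be compile-time constants, it can help to picture a register tensor as a set of individually named scalars. The sketch below is plain Python, not enigma code: `RegisterTensorModel`, its slot-naming scheme, and `accumulate_tile` are hypothetical names used only for illustration, and nothing here is claimed to match how enigma is implemented. It uses a 2x2 tile instead of 8x8 to keep the numbers small.

```python
from itertools import product


class RegisterTensorModel:
    """Toy stand-in for a register tensor: one named scalar slot per element.

    Hypothetical illustration only; this is not how enigma is implemented.
    """

    def __init__(self, shape, fill=0.0):
        self.shape = tuple(shape)
        # One slot per element, keyed by a name such as "r_1_2".
        self.slots = {
            self._name(idx): fill
            for idx in product(*(range(n) for n in self.shape))
        }

    def _name(self, idx):
        if not isinstance(idx, tuple):
            idx = (idx,)
        # A slot name can only be chosen if every index is a plain Python int.
        if len(idx) != len(self.shape) or not all(isinstance(i, int) for i in idx):
            raise TypeError("indices must be compile-time integer constants")
        for i, n in zip(idx, self.shape):
            if not 0 <= i < n:
                raise IndexError(f"index {idx} out of range for shape {self.shape}")
        return "r_" + "_".join(str(i) for i in idx)

    def __getitem__(self, idx):
        return self.slots[self._name(idx)]

    def __setitem__(self, idx, value):
        self.slots[self._name(idx)] = value


def accumulate_tile(a_tile, b_tile, k_extent, size=2):
    """Mirror the loop nest above: acc[i, j] += a[i][k] * b[k][j]."""
    acc = RegisterTensorModel((size, size), fill=0.0)
    for k in range(k_extent):      # stands in for the traced for_range loop
        for i in range(size):      # stands in for range_constexpr (unrolled)
            for j in range(size):
                acc[i, j] = a_tile[i][k] * b_tile[k][j] + acc[i, j]
    return acc
```

In this model, an index that is not a plain integer (for example a symbolic loop variable) cannot be mapped to a slot name and raises `TypeError`, which is the same constraint the prose describes for register tensors.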

Copy

enigma.copy moves elements between buffers (device or threadgroup) using a traced for_range loop.
enigma.copy(src, dst, count, src_offset=0, dst_offset=0, mask_fn=None, coalesced_width=1)
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| src, dst | Tensor | required | Source and destination buffers |
| count | int | required | Number of elements |
| src_offset, dst_offset | int or IRValue | 0 | Base offsets |
| mask_fn | callable or None | None | Per-element predicate fn(i) -> i1 |
| coalesced_width | int | 1 | Wider loads/stores per iteration (1, 2, or 4) |
tile = enigma.threadgroup_alloc("float", 256)
enigma.copy(A, tile, count=256, src_offset=block_start)
enigma.barrier()
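
The parameters above can be read as an element-wise loop with optional masking and grouping. The following is a plain-Python reference model on lists, not enigma code; `copy_model` is a hypothetical name. It rests on two assumptions the documentation does not confirm: masked-off elements are skipped (the destination keeps its old value), and `count` is a multiple of `coalesced_width`.

```python
def copy_model(src, dst, count, src_offset=0, dst_offset=0,
               mask_fn=None, coalesced_width=1):
    """Reference semantics for an element-wise copy on plain Python lists.

    Assumptions (not confirmed by the documentation above):
      * masked-off elements are skipped, leaving dst unchanged;
      * count is a multiple of coalesced_width.
    """
    if coalesced_width not in (1, 2, 4):
        raise ValueError("coalesced_width must be 1, 2, or 4")
    if count % coalesced_width != 0:
        raise ValueError("count must be a multiple of coalesced_width")

    # Each outer iteration handles one group of `coalesced_width` elements.
    for base in range(0, count, coalesced_width):
        for lane in range(coalesced_width):
            i = base + lane
            if mask_fn is None or mask_fn(i):
                dst[dst_offset + i] = src[src_offset + i]
    return dst
```

For example, copying 8 elements from offset 4 of a 16-element list with `mask_fn=lambda i: i < 6` fills the first six destination slots and leaves the last two untouched under the skip assumption.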

Pipeline

enigma.pipeline creates a multi-stage ring buffer of threadgroup tiles for compute/load overlap.
enigma.pipeline(dtype, size, stages=2)
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| dtype | str | required | Element type |
| size | int | required | Elements per tile |
| stages | int | 2 | Number of buffered stages (>= 2) |

| Method | Description |
| --- | --- |
| pipe.front() | Current iteration's consume buffer (stage 0) |
| pipe.back() | Most-distant prefetch buffer (stage N-1) |
| pipe.stage(k) | Stage k buffer |
| pipe.advance() | Rotate all stages by one (Python-side, no MSL emitted) |
pipe = enigma.pipeline("float", 256, stages=3)

# Prefill first two stages
for s in range(2):
    enigma.copy(A, pipe.stage(s), count=256, src_offset=s * 256)
enigma.barrier()

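with enigma.for_range(0, NUM_TILES) as t:
    # Consume the tile currently at stage 0
    compute(pipe.front())
    # Prefetch tile t + 2 into the back stage. If A holds exactly NUM_TILES
    # tiles, the last two iterations read past the end unless guarded.
    enigma.copy(A, pipe.back(), count=256, src_offset=(t + 2) * 256)
    enigma.barrier()
    # Rotate stages so the next tile moves to front()
    pipe.advance()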
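
The rotation in the loop above can be traced with a small stand-alone model. This is a plain-Python sketch, not enigma code: `PipelineModel` and `trace_schedule` are hypothetical names, and buffers are represented by integer ids rather than threadgroup tiles. The rotation direction (the old front becomes the new back) is an assumption, chosen because it matches the example, where the tile prefetched into `back()` at iteration `t` is the one consumed two iterations later with three stages.

```python
class PipelineModel:
    """Toy model of a multi-stage ring of tile buffers (hypothetical)."""

    def __init__(self, stages=2):
        if stages < 2:
            raise ValueError("stages must be >= 2")
        # Physical buffer ids, ordered so that order[k] is the buffer at stage k.
        self.order = list(range(stages))

    def stage(self, k):
        return self.order[k]

    def front(self):
        return self.order[0]

    def back(self):
        return self.order[-1]

    def advance(self):
        # Rotate by one: the old front becomes the new back.
        self.order = self.order[1:] + self.order[:1]


def trace_schedule(stages, num_tiles):
    """Return (consumed_buffer, prefetched_buffer) per iteration."""
    pipe = PipelineModel(stages)
    schedule = []
    for _ in range(num_tiles):
        schedule.append((pipe.front(), pipe.back()))
        pipe.advance()
    return schedule
```

With three stages, `trace_schedule(3, 4)` gives `[(0, 2), (1, 0), (2, 1), (0, 2)]`: the buffer written as `back()` in one iteration is read as `front()` two iterations later, so a load and the compute that depends on it are always separated by a full iteration in this model.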