Layout Algebra

Enigma’s layout algebra is inspired by NVIDIA’s CuTe library. A Layout is a (shape, stride) pair that maps multi-dimensional logical coordinates to linear memory offsets. Composing and dividing layouts is how you express tiling, thread-value partitioning, and vectorized access patterns.

The Layout type

L = enigma.Layout(shape, stride)

A layout with shape=(4, 8) and stride=(1, 4) maps coordinate (r, c) to offset r*1 + c*4.

L = enigma.Layout((4, 8), (1, 4))
L((0, 0))   # → 0
L((1, 0))   # → 1
L((0, 1))   # → 4
L((2, 3))   # → 2*1 + 3*4 = 14
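
The mapping above is a dot product of coordinate and stride. As a minimal sketch in plain Python (the helper `layout_offset` is a hypothetical name for illustration, not part of Enigma), the same arithmetic can be written as:

```python
def layout_offset(coord, shape, stride):
    """Map a logical coordinate to a linear offset: sum(c_i * stride_i)."""
    if len(coord) != len(shape):
        raise ValueError("coordinate rank must match layout rank")
    for c, extent in zip(coord, shape):
        if not 0 <= c < extent:
            raise IndexError(f"coordinate {coord} is outside shape {shape}")
    return sum(c * s for c, s in zip(coord, stride))


shape, stride = (4, 8), (1, 4)
# Walk the layout with r varying fastest; a column-major layout visits
# memory in order.
offsets = [layout_offset((r, c), shape, stride) for c in range(8) for r in range(4)]
```

With stride (1, 4), walking rows fastest touches offsets 0 through 31 consecutively, which is what makes this layout column-major.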

Total size

enigma.size(L)           # total elements: 4 * 8 = 32
enigma.size(L, mode=[0]) # size of mode 0: 4
enigma.size(L, mode=[1]) # size of mode 1: 8

Creating layouts

| Function | Description |
| --- | --- |
| `enigma.Layout(shape, stride)` | Explicit shape and stride |
| `enigma.make_layout(shape, stride=None)` | Alias; omit stride for row-major default |
| `enigma.make_ordered_layout(shape, order)` | Custom dimension ordering |
| `enigma.make_identity_layout(shape)` | Column-major (order = `(0, 1, ...)`) |

# Row-major 4×8
row_major = enigma.Layout((4, 8), (8, 1))

# Column-major 4×8
col_major = enigma.Layout((4, 8), (1, 4))

# Custom order: the mode with the lowest order value varies fastest
ordered = enigma.make_ordered_layout((4, 64), order=(1, 0))  # 4 rows, 64 cols, columns vary fastest (row-major)
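
To make the `order` argument concrete, here is a rough sketch of how compact strides could be derived from it. It assumes the CuTe convention, consistent with the table entry stating that order `(0, 1, ...)` is column-major: the mode with order 0 gets stride 1. The helper `ordered_strides` is hypothetical and is not Enigma's implementation.

```python
def ordered_strides(shape, order):
    """Compact strides where the mode with the smallest order value varies fastest."""
    if sorted(order) != list(range(len(shape))):
        raise ValueError("order must be a permutation of the mode indices")
    strides = [0] * len(shape)
    running = 1
    # Visit modes from fastest-varying (order 0) to slowest.
    for mode in sorted(range(len(shape)), key=lambda m: order[m]):
        strides[mode] = running
        running *= shape[mode]
    return tuple(strides)


col_major_strides = ordered_strides((4, 8), (0, 1))   # mode 0 fastest
row_major_strides = ordered_strides((4, 8), (1, 0))   # mode 1 fastest
```

Under this assumption, order `(0, 1)` reproduces the column-major strides `(1, 4)` and order `(1, 0)` reproduces the row-major strides `(8, 1)`.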

Transforming layouts

`coalesce`

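Merges adjacent modes with compatible strides into a single flat mode. Two adjacent modes are compatible when the second stride equals the first mode's shape times its stride, so that stepping past the end of the first mode lands on the next element of the second. Use after composition to simplify the layout.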

L = enigma.Layout(((4, 8),), ((1, 4),))  # nested shape
L2 = enigma.coalesce(L)                   # → Layout((32,), (1,))
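
The merge rule can be sketched in a few lines of plain Python. This is an illustration for flat (non-nested) layouts only; the function below is a hypothetical model and may differ from what `enigma.coalesce` does internally.

```python
def coalesce(shape, stride):
    """Merge adjacent modes (left to right) of a flat layout where possible."""
    out_shape, out_stride = [], []
    for extent, step in zip(shape, stride):
        if extent == 1:
            continue  # size-1 modes contribute nothing
        if out_shape and step == out_shape[-1] * out_stride[-1]:
            out_shape[-1] *= extent  # contiguous with the previous mode: merge
        else:
            out_shape.append(extent)
            out_stride.append(step)
    if not out_shape:  # layout of a single element
        return (1,), (0,)
    return tuple(out_shape), tuple(out_stride)


merged = coalesce((4, 8), (1, 4))      # column-major: modes are contiguous
unmerged = coalesce((4, 8), (8, 1))    # row-major: left-to-right merge fails
```

The column-major layout collapses to a single mode of 32 elements, while the row-major layout keeps both modes because its first stride is not 1.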

`complement`

Returns a layout that reaches the offsets the input layout skips, up to a given total size. Combining the input layout with its complement covers every offset in that range exactly once.

Lc = enigma.complement(L)
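
As an illustration of what a complement contains, the sketch below follows the construction used in CuTe for flat layouts with nonzero, non-overlapping strides. The function and its `cotarget` parameter (the total size to cover) are hypothetical names; how Enigma accepts the total size is not shown in this page.

```python
def complement(shape, stride, cotarget):
    """Complement of a flat layout with nonzero strides, up to cotarget offsets."""
    modes = sorted(zip(stride, shape))  # visit modes by increasing stride
    out_shape, out_stride = [], []
    covered = 1  # the input plus the modes emitted so far tile [0, covered)
    for step, extent in modes:
        if step % covered != 0:
            raise ValueError("layout modes overlap; complement is undefined")
        out_shape.append(step // covered)  # fill the gap below this stride
        out_stride.append(covered)
        covered = step * extent
    out_shape.append(-(-cotarget // covered))  # ceil-divide: repeat up to cotarget
    out_stride.append(covered)
    return tuple(out_shape), tuple(out_stride)


# A layout that touches every other offset: 0, 2, 4, 6.
comp_shape, comp_stride = complement((4,), (2,), 16)
```

Here the input reaches offsets 0, 2, 4, 6 and the complement reaches 0, 1, 8, 9; adding one offset from each produces every value from 0 to 15 exactly once.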

`logical_divide` and `zipped_divide`

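Split a layout into a (tile, rest) pair. logical_divide applies the division per mode and keeps each mode's tile and rest together, giving ((tile_m, rest_m), (tile_n, rest_n)). zipped_divide regroups the result so that the first mode is the whole tile and the second mode is the rest.

L = enigma.Layout((64, 32), (32, 1))  # row-major 64×32, large enough for the tile
tile = (16, 8)
Lt = enigma.zipped_divide(L, tile)
# Lt shape: ((tile_m, tile_n), (rest_m, rest_n)) = ((16, 8), (4, 4))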
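
The strides of a zipped division follow directly from the tile size. The sketch below models the rank-2 case in plain Python for a row-major 64×32 layout and a (16, 8) tile, assuming the tile divides the shape evenly; `zipped_divide_2d` and `nested_offset` are hypothetical helpers, not Enigma functions.

```python
def zipped_divide_2d(shape, stride, tile):
    """Return ((tile modes), (rest modes)) shape and stride for a rank-2 layout."""
    for extent, t in zip(shape, tile):
        if extent % t != 0:
            raise ValueError("this sketch requires the tile to divide the shape evenly")
    rest_shape = tuple(extent // t for extent, t in zip(shape, tile))
    rest_stride = tuple(t * s for t, s in zip(tile, stride))  # step between tiles
    return (tuple(tile), rest_shape), (tuple(stride), rest_stride)


def nested_offset(coord, stride):
    """Offset of a ((i, j), (bi, bj)) coordinate under the nested strides."""
    (i, j), (bi, bj) = coord
    (si, sj), (sbi, sbj) = stride
    return i * si + j * sj + bi * sbi + bj * sbj


div_shape, div_stride = zipped_divide_2d((64, 32), (32, 1), (16, 8))
```

Inside a tile the original strides still apply; moving to the next tile multiplies each stride by the tile extent, so element (3, 5) of tile (2, 1) sits at the same offset as element (35, 13) of the undivided layout.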

`composition`

Compose two layouts: composition(a, b) treats b as a re-indexing of a. The result maps a coordinate c to a(b(c)), so b selects and orders the elements of a.

composed = enigma.composition(a, b)
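
Viewed as functions from a linear index to an offset, composition is ordinary function composition. The sketch below models that in plain Python, assuming the CuTe convention that a linear index is unpacked with mode 0 varying fastest; `layout_fn` is a hypothetical helper.

```python
def layout_fn(shape, stride):
    """Return f(i): linear index -> offset, unpacking i with mode 0 fastest."""
    def f(index):
        total = 0
        for extent, step in zip(shape, stride):
            index, c = divmod(index, extent)  # c is this mode's coordinate
            total += c * step
        return total
    return f


a = layout_fn((4, 8), (8, 1))   # row-major 4×8
b = layout_fn((8,), (4,))       # picks every 4th linear index of a


def composed(i):
    """The composition of a and b: b re-indexes a."""
    return a(b(i))


row0 = [composed(i) for i in range(8)]
```

Every fourth index of `a` walks along its first row, so the composed function behaves like the contiguous layout with shape (8,) and stride (1,).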

`blocked_product`

Compute the blocked outer product of two layouts:

Lb = enigma.blocked_product(a, b)
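
For intuition, the sketch below computes the shape and stride of a blocked product of two rank-2 layouts, following CuTe's definition and assuming `a` is compact (its offsets are exactly 0 to size(a) - 1). Whether Enigma matches this in every detail is an assumption; `blocked_product_2d` is a hypothetical helper.

```python
def blocked_product_2d(a_shape, a_stride, b_shape, b_stride):
    """Shape and stride of the blocked product of two rank-2 layouts.

    Assumes `a` is compact, so each copy of `a` occupies size(a) offsets
    and `b` decides where each copy starts.
    """
    size_a = a_shape[0] * a_shape[1]
    shape = ((a_shape[0], b_shape[0]), (a_shape[1], b_shape[1]))
    stride = (
        (a_stride[0], size_a * b_stride[0]),
        (a_stride[1], size_a * b_stride[1]),
    )
    return shape, stride


# A 2×5 row-major tile repeated over a 3×4 column-major grid of tiles.
bp_shape, bp_stride = blocked_product_2d((2, 5), (5, 1), (3, 4), (1, 3))
```

Each result mode pairs a tile extent with a grid extent, so the product reads as a 6×20 matrix made of 2×5 blocks, with every block stored contiguously.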

`recast_layout`

Rescale a layout for a different element bit width. For example, viewing a float32 layout as a float16 layout:

L_f16 = enigma.recast_layout(new_bits=16, old_bits=32, layout=L_f32)
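
The rescaling is plain byte arithmetic: the same memory holds twice as many 16-bit elements, so the contiguous mode doubles in extent and every other stride doubles. A minimal sketch for flat layouts with exactly one stride-1 mode follows; `recast` is a hypothetical helper, not Enigma's implementation.

```python
def recast(shape, stride, new_bits, old_bits):
    """Rescale a flat layout that has exactly one stride-1 (contiguous) mode."""
    if stride.count(1) != 1:
        raise ValueError("this sketch needs exactly one stride-1 mode")
    if old_bits % new_bits == 0:        # narrower elements: more of them
        k = old_bits // new_bits
        new_shape = tuple(n * k if s == 1 else n for n, s in zip(shape, stride))
        new_stride = tuple(s if s == 1 else s * k for s in stride)
    elif new_bits % old_bits == 0:      # wider elements: fewer of them
        k = new_bits // old_bits
        if any((n if s == 1 else s) % k for n, s in zip(shape, stride)):
            raise ValueError("shape and strides must be divisible by the widening factor")
        new_shape = tuple(n // k if s == 1 else n for n, s in zip(shape, stride))
        new_stride = tuple(s if s == 1 else s // k for s in stride)
    else:
        raise ValueError("bit widths must be integer multiples of each other")
    return new_shape, new_stride


f16_view = recast((4, 8), (1, 4), new_bits=16, old_bits=32)
```

A column-major 4×8 float32 layout becomes an 8×8 float16 layout with column stride 8, and recasting back to 32 bits recovers the original.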

Thread-value layout

make_layout_tv is the central tiling primitive. It takes a thread layout and a value layout and returns:

- A tiler describing the tile shape at each level
- A TV layout mapping (thread_id, value_id) → tile coordinate

thr = enigma.make_ordered_layout((4, 64), order=(1, 0))   # 256 threads in 4×64
val = enigma.make_ordered_layout((4, 4), order=(1, 0))    # 4×4 values per thread

tiler_mn, tv_layout = enigma.make_layout_tv(thr, val)
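
To show what the two return values describe, the sketch below models the same 4×64 thread and 4×4 value arrangement in plain Python. It assumes each thread owns one contiguous 4×4 block of the tile, as in CuTe's raked arrangement; Enigma's actual mapping may differ, and `tv_to_tile_coord` is a hypothetical helper.

```python
THR_SHAPE = (4, 64)   # threads along (m, n)
VAL_SHAPE = (4, 4)    # values per thread along (m, n)

# The tile spans every thread's values in each dimension.
TILER_MN = (THR_SHAPE[0] * VAL_SHAPE[0], THR_SHAPE[1] * VAL_SHAPE[1])


def tv_to_tile_coord(thread_id, value_id):
    """Map (thread_id, value_id) to an (m, n) coordinate inside the tile."""
    # order=(1, 0) makes the n index vary fastest within each id.
    thr_m, thr_n = divmod(thread_id, THR_SHAPE[1])
    val_m, val_n = divmod(value_id, VAL_SHAPE[1])
    # Assumption: each thread owns one contiguous VAL_SHAPE block.
    return thr_m * VAL_SHAPE[0] + val_m, thr_n * VAL_SHAPE[1] + val_n


coords = {tv_to_tile_coord(t, v) for t in range(256) for v in range(16)}
```

The 256 threads with 16 values each cover all 4096 coordinates of the 16×256 tile without overlap, which is the property a TV layout has to guarantee.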

Using TV layouts in `@enigma.jit`

@enigma.jit
def tiled_kernel(mA, mB, mC):
    thr = enigma.make_ordered_layout((4, 64), order=(1, 0))
    val = enigma.make_ordered_layout((4, 4), order=(1, 0))
    tiler_mn, tv_layout = enigma.make_layout_tv(thr, val)

    # Partition tensor into blocks
    gA = enigma.tensor_zipped_divide(mA, tiler_mn)
    block_idx = ...

    # Slice one block
    blkA = gA[((None, None), block_idx)]

    # Per-thread fragment (thread_idx is assumed to be defined alongside block_idx above)
    thrA = enigma.tensor_composition(blkA, tv_layout, tiler_mn)[(thread_idx, None)]

Tensor operations

| Function | Description |
| --- | --- |
| `enigma.tensor_zipped_divide(tensor, tiler)` | Partition tensor into tiles |
| `enigma.tensor_composition(tensor, tv, tiler)` | Map thread-value indices to tensor |
| `tensor.load()` | Vectorized load of a thread fragment |
| `tensor.store(value)` | Vectorized store |

Tiling workflow summary

1. Define thread and value layouts with make_ordered_layout
2. Call make_layout_tv to get tiler and TV layout
3. In @enigma.jit, use tensor_zipped_divide to partition the global tensor
4. Slice per-block with blk = gtensor[((None, None), block_idx)]
5. Compose with TV layout to get per-thread fragment
6. Use .load() and .store() on the fragment
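
The workflow can be emulated on the CPU to check that a given thread and value arrangement visits every element exactly once. The sketch below runs an elementwise add over nested lists in plain Python, block by block and thread by thread. It assumes contiguous per-thread value blocks and a tile that divides the tensor evenly; `tiled_add` is a hypothetical helper that does not use Enigma.

```python
from itertools import product


def tiled_add(a, b, thr_shape=(2, 4), val_shape=(2, 2)):
    """Elementwise a + b on nested lists, visited block by block, thread by thread."""
    rows, cols = len(a), len(a[0])
    tile_m = thr_shape[0] * val_shape[0]
    tile_n = thr_shape[1] * val_shape[1]
    if rows % tile_m or cols % tile_n:
        raise ValueError("this sketch requires the tile to divide the tensor evenly")
    out = [[None] * cols for _ in range(rows)]
    # Partition into blocks and pick one block at a time.
    for blk_m, blk_n in product(range(rows // tile_m), range(cols // tile_n)):
        # Pick one thread inside the block.
        for thr_m, thr_n in product(range(thr_shape[0]), range(thr_shape[1])):
            # Load, add, and store that thread's fragment.
            for val_m, val_n in product(range(val_shape[0]), range(val_shape[1])):
                m = blk_m * tile_m + thr_m * val_shape[0] + val_m
                n = blk_n * tile_n + thr_n * val_shape[1] + val_n
                out[m][n] = a[m][n] + b[m][n]
    return out


mat_a = [[r * 16 + c for c in range(16)] for r in range(8)]
mat_b = [[1] * 16 for _ in range(8)]
mat_c = tiled_add(mat_a, mat_b)
```

If the index arithmetic skipped or repeated elements, some entries of the result would stay `None` or be written twice, so comparing against a direct elementwise add is a quick sanity check before moving to the device.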

Debugging layouts

When a tiling produces unexpected output, print shape and stride at each step:

print(tiler_mn)      # Layout((tile_m, tile_n), ...)
print(tv_layout)     # Layout(((thr_m, thr_n), (val_m, val_n)), ...)
print(enigma.size(tv_layout, mode=[0]))   # number of threads
print(enigma.size(tv_layout, mode=[1]))   # values per thread

Invariant: If any tiler dimension exceeds the corresponding tensor dimension, Enigma raises an EnigmaError rather than silently producing an invalid layout.
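
When debugging on the host, the same condition can be checked before launching anything. The sketch below is a plain-Python pre-flight check; `TilerError` is a hypothetical stand-in for EnigmaError, and `check_tiler` is not an Enigma function.

```python
class TilerError(ValueError):
    """Hypothetical stand-in for EnigmaError in this sketch."""


def check_tiler(tensor_shape, tiler):
    """Raise TilerError if any tiler dimension exceeds the tensor dimension."""
    if len(tensor_shape) != len(tiler):
        raise TilerError(f"rank mismatch: tensor {tensor_shape}, tiler {tiler}")
    for dim, (extent, tile) in enumerate(zip(tensor_shape, tiler)):
        if tile > extent:
            raise TilerError(
                f"tiler dimension {dim} is {tile}, but the tensor extent is {extent}"
            )


check_tiler((64, 512), (16, 256))   # passes silently
```

Calling `check_tiler((4, 8), (16, 8))` raises, because a 16-row tile cannot fit in a 4-row tensor.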
