Enigma’s layout algebra is inspired by NVIDIA’s CuTe library. A Layout is a (shape, stride) pair that maps multi-dimensional logical coordinates to linear memory offsets. Composing and dividing layouts is how you express tiling, thread-value partitioning, and vectorized access patterns.

The Layout type

L = enigma.Layout(shape, stride)
A layout with shape=(4, 8) and stride=(1, 4) maps coordinate (r, c) to offset r*1 + c*4.
L = enigma.Layout((4, 8), (1, 4))
L((0, 0))   # → 0
L((1, 0))   # → 1
L((0, 1))   # → 4
L((2, 3))   # → 2*1 + 3*4 = 14
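
The mapping above is an inner product of the coordinate with the stride. The following pure-Python sketch reproduces the four offsets without importing Enigma; the helper name `offset` is hypothetical and is not part of the Enigma API.

```python
# Illustrative model only; this is not Enigma's implementation.
def offset(coord, stride):
    """Inner product of a flat coordinate with a flat stride tuple."""
    if len(coord) != len(stride):
        raise ValueError("coordinate and stride must have the same rank")
    return sum(c * d for c, d in zip(coord, stride))


stride = (1, 4)  # stride of Layout((4, 8), (1, 4))
coords = [(0, 0), (1, 0), (0, 1), (2, 3)]
offsets = [offset(c, stride) for c in coords]
print(offsets)  # [0, 1, 4, 14]
```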

Total size

enigma.size(L)           # total elements: 4 * 8 = 32
enigma.size(L, mode=[0]) # size of mode 0: 4
enigma.size(L, mode=[1]) # size of mode 1: 8
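
As a rough model of what these calls compute, the size of a layout is the product of its shape extents, and the size of one mode is the product of the extents inside that mode. The sketch below is plain Python with a hypothetical helper named `size`; it assumes nested shapes are tuples of integers or tuples.

```python
from math import prod


def size(shape):
    """Product of all extents in a possibly nested shape (illustrative only)."""
    if isinstance(shape, int):
        return shape
    return prod(size(s) for s in shape)


shape = (4, 8)
total = size(shape)                   # 4 * 8 = 32
per_mode = [size(m) for m in shape]   # [4, 8]
nested_total = size(((4, 8), 2))      # (4 * 8) * 2 = 64
```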

Creating layouts

Function                                    Description
enigma.Layout(shape, stride)                Explicit shape and stride
enigma.make_layout(shape, stride=None)      Alias; omit stride for row-major default
enigma.make_ordered_layout(shape, order)    Custom dimension ordering
enigma.make_identity_layout(shape)          Column-major (order = (0, 1, ...))

# Row-major 4×8
row_major = enigma.Layout((4, 8), (8, 1))

# Column-major 4×8
col_major = enigma.Layout((4, 8), (1, 4))

# Custom order: mode 0 (rows) is the slow dimension, mode 1 (columns) is the fast dimension
ordered = enigma.make_ordered_layout((4, 64), order=(1, 0))  # 4 rows, 64 cols, column index varies fastest
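
One way to think about `order` is as a ranking of modes from fastest to slowest. The sketch below derives compact strides from a shape and an order, assuming (as the table's column-major entry suggests) that the mode with the smallest order value gets stride 1. The function `ordered_strides` is a hypothetical helper, not an Enigma call.

```python
def ordered_strides(shape, order):
    """Compact strides where the mode with the smallest order value varies fastest.

    Assumption: order semantics follow the column-major example order=(0, 1, ...).
    """
    rank = len(shape)
    if sorted(order) != list(range(rank)):
        raise ValueError("order must be a permutation of the mode indices")
    strides = [0] * rank
    running = 1
    # Visit modes from fastest (order 0) to slowest, accumulating extents.
    for mode in sorted(range(rank), key=lambda m: order[m]):
        strides[mode] = running
        running *= shape[mode]
    return tuple(strides)


col_major = ordered_strides((4, 8), (0, 1))    # (1, 4)
row_major = ordered_strides((4, 64), (1, 0))   # (64, 1)
```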

Transforming layouts

coalesce

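Merges adjacent modes into a single flat mode when they are contiguous, that is, when a mode's stride equals the previous mode's shape times its stride. The mapping from linear index to offset is unchanged. Use it after composition to simplify the layout.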
L = enigma.Layout(((4, 8),), ((1, 4),))  # nested shape
L2 = enigma.coalesce(L)                   # → Layout((32,), (1,))
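
The merge rule can be sketched for flat layouts in a few lines of plain Python. This is a simplified model under the assumption that two neighbouring modes merge exactly when the second stride equals the first shape times the first stride; it is not Enigma's implementation and does not handle nested shapes.

```python
def coalesce(shape, stride):
    """Merge contiguous neighbouring modes of a flat layout (illustrative only)."""
    out_shape, out_stride = [], []
    for s, d in zip(shape, stride):
        if s == 1:
            continue  # size-1 modes never affect the offset
        if out_shape and d == out_shape[-1] * out_stride[-1]:
            out_shape[-1] *= s  # contiguous with the previous mode: merge
        else:
            out_shape.append(s)
            out_stride.append(d)
    if not out_shape:
        return (1,), (0,)
    return tuple(out_shape), tuple(out_stride)


merged = coalesce((4, 8), (1, 4))      # ((32,), (1,))
unchanged = coalesce((4, 8), (8, 1))   # ((4, 8), (8, 1)): not contiguous in this order
```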

complement

Returns the layout that covers the elements not covered by the input layout, within a given total size.
Lc = enigma.complement(L)
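
To make the idea concrete, here is a plain-Python sketch of a complement for flat layouts, following the usual CuTe-style construction: sort modes by stride and fill the gaps between them. The function names, the explicit `total` argument, and the restriction to flat, non-overlapping layouts are assumptions of this sketch, not Enigma's documented signature.

```python
from itertools import product


def complement(shape, stride, total):
    """Layout that reaches the offsets in [0, total) the input skips (sketch)."""
    out_shape, out_stride = [], []
    current = 1
    for d, s in sorted(zip(stride, shape)):
        if d == 0 or d % current != 0:
            raise ValueError("sketch requires non-overlapping, divisible strides")
        out_shape.append(d // current)   # gap below this mode
        out_stride.append(current)
        current = s * d
    out_shape.append(-(-total // current))  # ceiling division for the remainder
    out_stride.append(current)
    kept = [(s, d) for s, d in zip(out_shape, out_stride) if s != 1]
    if not kept:
        return (1,), (0,)
    return tuple(s for s, _ in kept), tuple(d for _, d in kept)


def offsets(shape, stride):
    """All offsets a flat layout produces."""
    return [sum(c * d for c, d in zip(coord, stride))
            for coord in product(*(range(s) for s in shape))]


c_shape, c_stride = complement((4,), (2,), 8)   # ((2,), (1,))
covered = sorted(a + b for a in offsets((4,), (2,))
                 for b in offsets(c_shape, c_stride))
```

In this example the input layout touches offsets 0, 2, 4, 6, the complement touches 0 and 1, and every offset in 0..7 is the sum of exactly one pair.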

logical_divide and zipped_divide

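Both functions split a layout into a tile part and a rest part. logical_divide keeps the split inside each mode, giving ((tile_m, rest_m), (tile_n, rest_n)). zipped_divide applies the same per-mode division and then regroups the result so that the first mode is the tile and the second mode is the rest.
L = enigma.Layout((64, 32), (32, 1))   # 64×32 row-major
tile = (16, 8)
Lt = enigma.zipped_divide(L, tile)
# Lt shape: ((tile_m, tile_n), (rest_m, rest_n)) = ((16, 8), (4, 4))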
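
For a two-mode layout whose extents are divisible by the tile, the zipped result can be written down directly. The sketch below is a plain-Python model with a hypothetical function name; it assumes even divisibility and is not how Enigma computes the division in general.

```python
def zipped_divide_2d(shape, stride, tile):
    """Zipped division of a flat 2-D layout by a 2-D tile (sketch)."""
    (m, n), (d_m, d_n), (t_m, t_n) = shape, stride, tile
    if m % t_m or n % t_n:
        raise ValueError("tile must divide the layout shape evenly in this sketch")
    new_shape = ((t_m, t_n), (m // t_m, n // t_n))
    # Inside a tile the strides are unchanged; stepping to the next tile
    # skips a whole tile extent along that mode.
    new_stride = ((d_m, d_n), (d_m * t_m, d_n * t_n))
    return new_shape, new_stride


div_shape, div_stride = zipped_divide_2d((64, 32), (32, 1), (16, 8))
# Element (3, 5) of the tile at rest coordinate (2, 1):
(i, j), (p, q) = (3, 5), (2, 1)
tiled_offset = (i * div_stride[0][0] + j * div_stride[0][1]
                + p * div_stride[1][0] + q * div_stride[1][1])
```

The tiled offset equals the offset of row 3 + 16*2 = 35 and column 5 + 8*1 = 13 in the original row-major layout.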

composition

Compose two layouts: composition(a, b) treats b as a re-indexing of a. The result maps a coordinate c to a(b(c)), so it takes its shape from b and its offsets from a.
composed = enigma.composition(a, b)
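
Viewed as functions, composition is ordinary function composition. The sketch below models a flat layout as a function from a linear index to an offset, assuming the index is unflattened with mode 0 varying fastest (the CuTe convention); `as_function` is a hypothetical helper.

```python
def as_function(shape, stride):
    """Return f(i): linear index -> offset for a flat layout (sketch)."""
    def f(i):
        off = 0
        for s, d in zip(shape, stride):
            off += (i % s) * d   # coordinate in this mode times its stride
            i //= s
        return off
    return f


a = as_function((8,), (2,))   # i -> 2 * i
b = as_function((4,), (2,))   # i -> 2 * i
composed_offsets = [a(b(i)) for i in range(4)]
print(composed_offsets)  # [0, 4, 8, 12]
```

The composed mapping here behaves like a layout with shape (4,) and stride (4,).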

blocked_product

Compute the blocked outer product of two layouts:
Lb = enigma.blocked_product(a, b)
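
A small offset table can help picture the result. The sketch below assumes CuTe-like semantics: a is a compact 2-D tile, b arranges copies of that tile, and each copy is shifted by size(a) times b's offset. The function name and argument layout are hypothetical and the code does not call Enigma.

```python
def blocked_product_table(tile_shape, tile_stride, grid_shape, grid_stride):
    """Offsets of a 2-D tile repeated over a 2-D grid of blocks (sketch)."""
    m, n = tile_shape
    p, q = grid_shape
    tile_size = m * n  # block step; assumes the tile is compact
    table = [[0] * (n * q) for _ in range(m * p)]
    for bi in range(p):
        for bj in range(q):
            base = tile_size * (bi * grid_stride[0] + bj * grid_stride[1])
            for i in range(m):
                for j in range(n):
                    inner = i * tile_stride[0] + j * tile_stride[1]
                    table[bi * m + i][bj * n + j] = base + inner
    return table


# 2×2 row-major tile repeated over a 2×3 row-major grid of blocks.
table = blocked_product_table((2, 2), (2, 1), (2, 3), (3, 1))
for row in table:
    print(row)
```

Each 2×2 block of the printed table holds four consecutive offsets, which is the "blocked" arrangement.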

recast_layout

Rescale a layout for a different element bit width. For example, viewing a float32 layout as a float16 layout:
L_f16 = enigma.recast_layout(new_bits=16, old_bits=32, layout=L_f32)
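
The rescaling can be modelled for flat layouts as follows: when elements get narrower, the stride-1 mode holds proportionally more elements and every other stride grows by the same ratio; widening does the reverse. This is a sketch under those assumptions with a hypothetical function name, not Enigma's implementation.

```python
def recast(shape, stride, new_bits, old_bits):
    """Rescale a flat layout for a different element width (sketch)."""
    if new_bits == old_bits:
        return tuple(shape), tuple(stride)
    if old_bits % new_bits == 0:          # narrower elements
        r = old_bits // new_bits
        return (tuple(s * r if d == 1 else s for s, d in zip(shape, stride)),
                tuple(d if d == 1 else d * r for d in stride))
    if new_bits % old_bits == 0:          # wider elements
        r = new_bits // old_bits
        for s, d in zip(shape, stride):
            if (d == 1 and s % r) or (d != 1 and d % r):
                raise ValueError("layout is not divisible by the width ratio")
        return (tuple(s // r if d == 1 else s for s, d in zip(shape, stride)),
                tuple(d if d == 1 else d // r for d in stride))
    raise ValueError("bit widths must divide one another in this sketch")


f16_shape, f16_stride = recast((4, 8), (8, 1), new_bits=16, old_bits=32)
f32_shape, f32_stride = recast(f16_shape, f16_stride, new_bits=32, old_bits=16)
```

A 4×8 row-major float32 layout becomes a 4×16 row-major float16 layout, and recasting back recovers the original.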

Thread-value layout

make_layout_tv is the central tiling primitive. It takes a thread layout and a value layout and returns:
  1. A tiler describing the tile shape at each level
  2. A TV layout mapping (thread_id, value_id) → tile coordinate
thr = enigma.make_ordered_layout((4, 64), order=(1, 0))   # 256 threads in 4×64
val = enigma.make_ordered_layout((4, 4), order=(1, 0))    # 4×4 values per thread

tiler_mn, tv_layout = enigma.make_layout_tv(thr, val)
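
To see what a (thread_id, value_id) → tile coordinate mapping looks like, the sketch below models one plausible arrangement in plain Python. It assumes the tile shape is the element-wise product of the thread and value shapes, that both layouts are row-major (order=(1, 0)), and that each thread owns one contiguous block of values. Enigma's actual TV layout may differ; the helper name is hypothetical.

```python
def tv_to_tile_coord(tid, vid, thr_shape, val_shape):
    """Map (thread id, value id) to a (row, col) tile coordinate (sketch)."""
    val_m, val_n = val_shape
    t_row, t_col = divmod(tid, thr_shape[1])   # row-major thread grid
    v_row, v_col = divmod(vid, val_n)          # row-major values per thread
    return t_row * val_m + v_row, t_col * val_n + v_col


thr_shape, val_shape = (4, 64), (4, 4)
tile_shape = (thr_shape[0] * val_shape[0], thr_shape[1] * val_shape[1])
coords = {tv_to_tile_coord(t, v, thr_shape, val_shape)
          for t in range(4 * 64) for v in range(4 * 4)}
print(tile_shape, len(coords))  # (16, 256) 4096
```

Every tile coordinate is produced exactly once, so the 256 threads with 16 values each cover the 16×256 tile without overlap.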

Using TV layouts in @enigma.jit

@enigma.jit
def tiled_kernel(mA, mB, mC):
    thr = enigma.make_ordered_layout((4, 64), order=(1, 0))
    val = enigma.make_ordered_layout((4, 4), order=(1, 0))
    tiler_mn, tv_layout = enigma.make_layout_tv(thr, val)

    # Partition tensor into blocks
    gA = enigma.tensor_zipped_divide(mA, tiler_mn)
    block_idx = ...
    thread_idx = ...

    # Slice one block
    blkA = gA[((None, None), block_idx)]

    # Per-thread fragment
    thrA = enigma.tensor_composition(blkA, tv_layout, tiler_mn)[(thread_idx, None)]

Tensor operations

Function                                        Description
enigma.tensor_zipped_divide(tensor, tiler)      Partition tensor into tiles
enigma.tensor_composition(tensor, tv, tiler)    Map thread-value indices to tensor
tensor.load()                                   Vectorized load of a thread fragment
tensor.store(value)                             Vectorized store

Tiling workflow summary

  1. Define thread and value layouts with make_ordered_layout
  2. Call make_layout_tv to get tiler and TV layout
  3. In @enigma.jit, use tensor_zipped_divide to partition the global tensor
  4. Slice per-block with blk = gtensor[((None, None), block_idx)]
  5. Compose with TV layout to get per-thread fragment
  6. Use .load() and .store() on the fragment
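
The steps above can be emulated on the CPU with nested loops, which is sometimes useful for checking index arithmetic before writing a kernel. The sketch below adds two 2-D lists using the same block, thread, and value decomposition. It is plain Python, assumes row-major thread and value layouts with contiguous per-thread blocks, and uses no Enigma calls; `tiled_add` is a hypothetical name.

```python
def tiled_add(A, B, thr_shape=(2, 4), val_shape=(2, 2)):
    """Element-wise A + B computed block by block, thread by thread (sketch)."""
    rows, cols = len(A), len(A[0])
    tile_m = thr_shape[0] * val_shape[0]   # steps 1-2: tile shape from thr/val
    tile_n = thr_shape[1] * val_shape[1]
    if rows % tile_m or cols % tile_n:
        raise ValueError("tensor shape must be a multiple of the tile shape")
    C = [[None] * cols for _ in range(rows)]
    n_threads = thr_shape[0] * thr_shape[1]
    n_values = val_shape[0] * val_shape[1]
    for blk_m in range(rows // tile_m):            # steps 3-4: select one block
        for blk_n in range(cols // tile_n):
            for tid in range(n_threads):           # step 5: per-thread fragment
                t_row, t_col = divmod(tid, thr_shape[1])
                for vid in range(n_values):        # step 6: load, add, store
                    v_row, v_col = divmod(vid, val_shape[1])
                    r = blk_m * tile_m + t_row * val_shape[0] + v_row
                    c = blk_n * tile_n + t_col * val_shape[1] + v_col
                    C[r][c] = A[r][c] + B[r][c]
    return C


A = [[r * 16 + c for c in range(16)] for r in range(8)]
B = [[1] * 16 for _ in range(8)]
C = tiled_add(A, B)
```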

Debugging layouts

When a tiling produces unexpected output, print shape and stride at each step:
print(tiler_mn)      # Layout((tile_m, tile_n), ...)
print(tv_layout)     # Layout(((thr_m, thr_n), (val_m, val_n)), ...)
print(enigma.size(tv_layout, mode=[0]))   # number of threads
print(enigma.size(tv_layout, mode=[1]))   # values per thread
Invariant: If any tiler dimension exceeds the tensor dimension, Enigma raises an EnigmaError rather than silently producing an invalid layout.
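
For small two-mode layouts, printing the full offset table is another way to spot a wrong stride while debugging. The helper names below are hypothetical and the code is plain Python; it only needs the shape and stride you would otherwise print.

```python
def offset_table(shape, stride):
    """Offsets of a flat 2-D layout arranged as rows × columns (sketch)."""
    rows, cols = shape
    d_row, d_col = stride
    return [[r * d_row + c * d_col for c in range(cols)] for r in range(rows)]


def format_table(table):
    """Right-align every offset to the width of the largest one."""
    width = len(str(max(max(row) for row in table)))
    return "\n".join(" ".join(f"{v:>{width}}" for v in row) for row in table)


table = offset_table((4, 8), (1, 4))   # column-major 4×8
print(format_table(table))
```

In a column-major layout the offsets increase by 1 down each column; in a row-major layout they increase by 1 along each row.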