April 2025
New release

v0.1.0 — Initial Release

First public release of Enigma DSL for Apple Metal.

New features

Layout Algebra

CuTe-style primitives: Layout, coalesce, complement, zipped_divide, recast_layout, make_layout_tv, tensor_zipped_divide, tensor_composition.
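For readers new to CuTe-style layouts, the sketch below shows the general idea behind coalesce in plain Python: a layout maps a coordinate to an offset through a shape and matching strides, and coalescing merges adjacent modes that are contiguous without changing that mapping. It is an illustration only. The helper names (layout_index, coalesce, offsets) are hypothetical, it handles flat (non-nested) layouts only, and it is not the Enigma API.

```python
from itertools import product


def layout_index(stride, coord):
    """Map a coordinate to a linear offset: inner product with the strides."""
    return sum(c * d for c, d in zip(coord, stride))


def coalesce(shape, stride):
    """Merge adjacent modes of a flat layout when they are contiguous.

    Mode i+1 can be folded into mode i when stride[i+1] == shape[i] * stride[i].
    Size-1 modes are dropped because they never contribute to the offset.
    """
    out_shape, out_stride = [], []
    for s, d in zip(shape, stride):
        if s == 1:
            continue
        if out_shape and out_shape[-1] * out_stride[-1] == d:
            out_shape[-1] *= s  # contiguous with the previous mode: merge
        else:
            out_shape.append(s)
            out_stride.append(d)
    if not out_shape:
        return (1,), (0,)  # a layout of size 1
    return tuple(out_shape), tuple(out_stride)


def offsets(shape, stride):
    """List all offsets, with the leftmost mode varying fastest."""
    coords = product(*(range(s) for s in reversed(shape)))
    return [layout_index(stride, tuple(reversed(c))) for c in coords]


# Column-major 2x4 collapses to one contiguous mode of 8 elements.
merged = coalesce((2, 4), (1, 2))
# Row-major 4x3 cannot be merged left to right, so it is unchanged.
unmerged = coalesce((4, 3), (3, 1))
```

Coalescing must preserve the offset sequence, so offsets((2, 4), (1, 2)) and offsets(*merged) list the same values in the same order.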

Compiler Pipeline

Trace pipeline for @enigma.kernel and @enigma.jit. MLIR and Metal source emission, with Metal compilation via xcrun metal / xcrun metallib. Vectorized codegen with vec_width (float4 paths).
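As background on the xcrun step, this is a minimal sketch of the standard two-stage Apple toolchain invocation: xcrun metal compiles a .metal source to an .air intermediate, and xcrun metallib packages it into a .metallib. How Enigma invokes these tools internally is not stated here, so the helper name metal_build_commands, the file names, and the default SDK are assumptions for illustration.

```python
from pathlib import Path


def metal_build_commands(source, sdk="macosx"):
    """Return the two xcrun command lines for building a .metallib.

    Hypothetical helper: it only builds argument lists and runs nothing.
    """
    src = Path(source)
    air = src.with_suffix(".air")        # intermediate representation
    lib = src.with_suffix(".metallib")   # loadable Metal library
    compile_cmd = ["xcrun", "-sdk", sdk, "metal", "-c", str(src), "-o", str(air)]
    package_cmd = ["xcrun", "-sdk", sdk, "metallib", str(air), "-o", str(lib)]
    return [compile_cmd, package_cmd]


commands = metal_build_commands("kernel.metal")
```

On a Mac with the Xcode command line tools installed, each list can be passed to subprocess.run(cmd, check=True) in order.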

Metal Runtime

MetalRuntime dispatch via Swift dylib bridge. PreparedKernel for pre-allocated repeated dispatch. dispatch_timed() for benchmarking.

Device Capabilities

DeviceCapabilities reporting including M3+ feature detection. Scalar parameter packing via enigma.Scalar(...).

DSL surface

  • Full set of thread, threadgroup, and SIMD index query ops
  • Extended dtypes: bf16, i64, u64, i8, u8, i16, u16, b1
  • Arithmetic, float math, integer math, and bit manipulation ops
  • SIMD group and quad group reductions, prefix scans, and shuffle ops
  • Atomic operations with full memory ordering: relaxed, acquire, release, acq_rel
  • Threadgroup shared memory via threadgroup_alloc and barriers
  • Simdgroup matrix ops: simdgroup_matrix_load, simdgroup_multiply_accumulate, simdgroup_matrix_store
  • Control-flow tracing: for_range, if_, while_
  • Predicated load/store: load_if, store_if
  • Register/pipeline primitives: register_tensor, copy, pipeline
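
Predicated load/store is typically used to guard the tail of a buffer when more threads are launched than there are elements. The plain-Python sketch below models that behavior on the host. The function names and argument order (predicated_load, predicated_store, scale_tail_safe) are hypothetical and are not the signatures of load_if or store_if, which these notes do not specify.

```python
def predicated_load(buf, idx, pred, other=0.0):
    """Read buf[idx] only when pred is true; otherwise return a fallback."""
    return buf[idx] if pred else other


def predicated_store(buf, idx, value, pred):
    """Write buf[idx] only when pred is true; otherwise do nothing."""
    if pred:
        buf[idx] = value


def scale_tail_safe(src, factor, threads):
    """Emulate a 1-D kernel launched with more threads than elements."""
    out = [0.0] * len(src)
    for tid in range(threads):  # one loop iteration per simulated thread
        in_bounds = tid < len(src)
        x = predicated_load(src, tid, in_bounds)
        predicated_store(out, tid, x * factor, in_bounds)
    return out


result = scale_tail_safe([1.0, 2.0, 3.0], 2.0, threads=8)
```

The five out-of-range threads neither read nor write, so no bounds check is needed around the body of the kernel.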

Export

  • CompiledKernel.export_metal() — export generated MSL source
  • keep_metal_source=True — retain .metal artifact after compilation

Testing

  • Portable CI: tracing, layout algebra, and compile surfaces under tests/portable
  • Metal runtime execution tests under tests/metal