MetalRuntime manages the Metal device, command queue, and buffer lifecycle. It dispatches compiled kernels and returns output data.
Constructor
dylib_path overrides the default path to the Swift runtime dylib. Leave it None in almost all cases.
execute()
One-shot dispatch. Allocates buffers, runs the kernel, reads output, and returns raw bytes.
Parameters
| Parameter | Type | Description |
|---|---|---|
| compiled | CompiledKernel | Compiled kernel from enigma.compile() |
| inputs | list[np.ndarray] | Input buffers in kernel parameter order |
| output_size | int | Output buffer size in bytes |
| grid | tuple | (gx, gy, gz): total threads |
| threads | tuple | (tx, ty, tz): threads per threadgroup |
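Since grid counts total threads while threads counts threads per threadgroup, it can help to work out the launch geometry and output_size before calling execute(). The following is a minimal sketch in plain Python; the helper names threadgroups_per_grid and output_bytes are hypothetical and not part of the library, and the 4-byte element size assumes float32 output.

```python
def threadgroups_per_grid(grid, threads):
    """Threadgroups needed per dimension to cover `grid` total threads."""
    if len(grid) != 3 or len(threads) != 3:
        raise ValueError("grid and threads must both be 3-tuples")
    if any(g <= 0 for g in grid) or any(t <= 0 for t in threads):
        raise ValueError("all dimensions must be positive")
    # Integer ceiling division: a partially filled last threadgroup still counts.
    return tuple((g + t - 1) // t for g, t in zip(grid, threads))


def output_bytes(num_elements, bytes_per_element=4):
    """Value to pass as output_size; 4 bytes per element assumes float32."""
    return num_elements * bytes_per_element


grid = (1000, 1, 1)      # 1000 threads in total
threads = (256, 1, 1)    # 256 threads per threadgroup
groups = threadgroups_per_grid(grid, threads)
size = output_bytes(1000)
print(groups, size)      # (4, 1, 1) 4000
```

Here 1000 threads at 256 per threadgroup need four threadgroups, the last of which is only partly filled.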
Returns
Raw bytes of the output buffer. Convert them to an array with np.frombuffer.
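As a sketch of the conversion step, the snippet below uses a stand-in byte string in place of a real execute() result, so it runs without a Metal device. The float32 dtype and the 2×2 shape are assumptions; in practice they must match what the kernel actually writes.

```python
import numpy as np

# Stand-in for the bytes returned by rt.execute(); a real call would supply these.
raw = np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float32).tobytes()

# Reinterpret the bytes as float32 values (the dtype must match the kernel output).
result = np.frombuffer(raw, dtype=np.float32)

# frombuffer over a bytes object gives a read-only view; copy before modifying.
result_2d = result.reshape(2, 2).copy()
result_2d[0, 0] = 10.0
print(result, result_2d.shape)
```

A mismatched dtype does not raise an error as long as the byte count divides evenly; it silently produces wrong values, so it is worth checking result.size against the expected element count.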
prepare()
Pre-allocate Metal buffers for repeated dispatch. Use this in hot loops to avoid per-call allocation overhead.
device_capabilities()
Query hardware features of the active Metal device.
Returns a DeviceCapabilities object; see DeviceCapabilities below.
PreparedKernel
Returned by rt.prepare(...). Holds pre-allocated Metal buffers.
dispatch()
Call read_output() afterward to retrieve the output.
dispatch_timed()
read_output()
Example: benchmarking loop
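A minimal sketch of a host-side timing loop follows. It takes any zero-argument callable, so a stand-in workload is used here instead of a real PreparedKernel; the benchmark helper and fake_dispatch are hypothetical names. With the library, the callable could wrap the prepared kernel's dispatch(), assuming it needs no arguments.

```python
import statistics
import time


def benchmark(dispatch, warmup=3, iters=20):
    """Time a zero-argument callable; returns per-iteration times in ms."""
    # Warm-up runs are discarded so one-time setup costs do not skew results.
    for _ in range(warmup):
        dispatch()
    times_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        dispatch()
        times_ms.append((time.perf_counter() - start) * 1e3)
    return times_ms


def fake_dispatch():
    # Stand-in workload; swap in a callable that dispatches the prepared kernel.
    sum(range(10_000))


times = benchmark(fake_dispatch)
print(f"median {statistics.median(times):.3f} ms, min {min(times):.3f} ms")
```

This measures wall-clock time on the host, which includes submission overhead as well as GPU execution. Reporting the median rather than the mean keeps a single slow iteration from dominating the result.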
DeviceCapabilities
Fields
| Field | Type | Description |
|---|---|---|
| gpu_family | str | Human-readable GPU family, e.g. "apple9" |
| gpu_family_raw | int | Metal GPU family integer |
| is_m3_or_newer | bool | True if the device is an M3 chip or newer |
| supports_async_copy | bool | Async threadgroup copy support (M3+) |
| supports_simdgroup_matrix | bool | 8×8 simdgroup matrix multiply |
| simdgroup_size | int | SIMD group size (typically 32) |
| max_threadgroup_memory | int | Max threadgroup memory in bytes |
| max_threads_per_threadgroup | int | Max threads per threadgroup (typically 1024) |
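These limits can be used to sanity-check a launch configuration before dispatching. The sketch below uses a hypothetical Caps dataclass that mirrors three of the fields above, so it runs without a device; with the library, the object returned by device_capabilities() would take its place. The default values are placeholders, not measured limits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Caps:
    """Hypothetical stand-in mirroring three DeviceCapabilities fields."""
    max_threads_per_threadgroup: int = 1024
    max_threadgroup_memory: int = 32768  # placeholder value, in bytes
    simdgroup_size: int = 32


def check_launch(caps, threads, threadgroup_bytes=0):
    """Return a list of problems with a launch configuration (empty if none)."""
    problems = []
    total = threads[0] * threads[1] * threads[2]
    if total > caps.max_threads_per_threadgroup:
        problems.append(
            f"{total} threads per threadgroup exceeds "
            f"{caps.max_threads_per_threadgroup}"
        )
    if threadgroup_bytes > caps.max_threadgroup_memory:
        problems.append(
            f"{threadgroup_bytes} bytes of threadgroup memory exceeds "
            f"{caps.max_threadgroup_memory}"
        )
    if total % caps.simdgroup_size != 0:
        # Not an error, but the last SIMD group would be partially idle.
        problems.append(
            f"{total} threads is not a multiple of SIMD group size "
            f"{caps.simdgroup_size}"
        )
    return problems


print(check_launch(Caps(), (32, 32, 1)))   # 1024 threads: no problems
print(check_launch(Caps(), (64, 32, 1)))   # 2048 threads: over the limit
```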
require_m3()
Raises RuntimeError if the device is older than M3. Pass a description of what you’re gating.
Example
Dispatch errors
When dispatch fails, rt.execute() raises a RuntimeError containing:
- Kernel name
- grid and threads used
- The underlying Metal error code and description
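Because the failure surfaces as an ordinary RuntimeError, callers can catch it and log the message, which already carries the kernel name and launch geometry. The sketch below is one possible pattern; try_dispatch and fake_execute are hypothetical, and the error text is an invented placeholder rather than the library's actual message format.

```python
import logging

logger = logging.getLogger("dispatch")


def try_dispatch(execute, label):
    """Return (output, None) on success or (None, message) if dispatch raised."""
    try:
        return execute(), None
    except RuntimeError as exc:
        # The exception text is expected to name the kernel, grid, and threads.
        logger.error("%s failed: %s", label, exc)
        return None, str(exc)


def fake_execute():
    # Hypothetical stand-in for rt.execute(); the message format is invented.
    raise RuntimeError(
        "kernel 'vec_add' grid=(1024, 1, 1) threads=(2048, 1, 1): "
        "placeholder Metal error"
    )


output, error = try_dispatch(fake_execute, "vec_add run")
print(output, error)
```

Returning the message instead of re-raising suits batch runs where one failed kernel should not abort the rest; for interactive use, letting the exception propagate is usually simpler.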
