# Atomic Operations

> Atomic load, store, and RMW operations with memory ordering semantics.

Enigma provides atomic load, store, exchange, fetch-and-modify, and compare-and-swap operations for safe concurrent access to shared device memory. All atomic operations map directly to Metal's `atomic_*` functions.

## Memory ordering

Every atomic operation accepts a `memory_order` argument, except compare-and-swap, which takes separate `success_order` and `failure_order` arguments:

| Order       | Description                                                                                  |
| ----------- | -------------------------------------------------------------------------------------------- |
| `"relaxed"` | Atomicity only; no ordering relative to other memory accesses                                |
| `"acquire"` | Loads and stores after this operation see all writes made before the matching release        |
| `"release"` | Writes made before this operation become visible to a thread that acquires the same location |
| `"acq_rel"` | Combines acquire and release semantics; intended for read-modify-write operations            |

The default is `"relaxed"` unless otherwise specified.

## Atomic load and store

```python theme={null}
val = enigma.atomic_load(ptr, index, memory_order="acquire")
enigma.atomic_store(ptr, index, value, memory_order="release")
```

## Atomic exchange

Returns the old value and stores `value` atomically:

```python theme={null}
old = enigma.atomic_exchange(ptr, index, value, memory_order="acq_rel")
```

## Atomic fetch-and-modify (RMW)

Each operation atomically reads the current value at `ptr[idx]`, applies the operation, stores the result, and returns the **old** value (the value before the update, not the result).

| Function                                        | Operation                       |
| ----------------------------------------------- | ------------------------------- |
| `enigma.atomic_fetch_add(ptr, idx, val, order)` | `ptr[idx] += val`               |
| `enigma.atomic_fetch_sub(ptr, idx, val, order)` | `ptr[idx] -= val`               |
| `enigma.atomic_fetch_min(ptr, idx, val, order)` | `ptr[idx] = min(ptr[idx], val)` |
| `enigma.atomic_fetch_max(ptr, idx, val, order)` | `ptr[idx] = max(ptr[idx], val)` |
| `enigma.atomic_fetch_and(ptr, idx, val, order)` | `ptr[idx] &= val`               |
| `enigma.atomic_fetch_or(ptr, idx, val, order)`  | `ptr[idx] \|= val`              |
| `enigma.atomic_fetch_xor(ptr, idx, val, order)` | `ptr[idx] ^= val`               |
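
The return-the-old-value convention is easy to misread, so here is a minimal host-side model in plain Python. `fetch_op` is a hypothetical helper, not part of Enigma, and a list stands in for the device buffer. It only illustrates which value each call returns; it is not atomic itself.

```python
import operator

def fetch_op(buf, idx, val, op):
    """Model of a fetch-and-modify: apply op, store the result, return the OLD value.

    Hypothetical helper for illustration only. A real device performs these
    three steps as one indivisible operation.
    """
    old = buf[idx]
    buf[idx] = op(old, val)
    return old

buf = [10]
old_add = fetch_op(buf, 0, 5, operator.add)        # buf[0]: 10 -> 15
old_max = fetch_op(buf, 0, 12, max)                # buf[0]: 15 -> 15 (12 is smaller)
old_and = fetch_op(buf, 0, 0b0110, operator.and_)  # buf[0]: 15 -> 6
```

Each call returns the value that was in the slot before the update, which is why a fetch-add can double as a way to reserve a unique index.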

### Example: global counter

```python theme={null}
@enigma.kernel
def count_positives(A: enigma.f32, Counter: enigma.i32):
    tid = enigma.thread_position_in_grid
    val = A[tid]
    with enigma.if_(enigma.cmp_gt(val, enigma.f32(0.0))):
        enigma.atomic_fetch_add(Counter, 0, 1, memory_order="relaxed")
```

### Example: parallel histogram

```python theme={null}
@enigma.kernel
def histogram(A: enigma.u32, Hist: enigma.u32):
    tid = enigma.thread_position_in_grid
    bin_idx = A[tid] % 256
    enigma.atomic_fetch_add(Hist, bin_idx, 1, memory_order="relaxed")
```
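
When testing a kernel like this, it can help to compare the device result against a sequential reference computed on the host. The sketch below assumes the same 256-bin modulo rule as the kernel above; `reference_histogram` is a hypothetical name, and how `Hist` is read back from the device is outside its scope.

```python
from collections import Counter

def reference_histogram(values, num_bins=256):
    """Sequential reference: count how many values fall into each bin."""
    counts = Counter(v % num_bins for v in values)
    return [counts.get(b, 0) for b in range(num_bins)]

# 257 % 256 == 1 and 511 % 256 == 255, so bin 1 receives three values.
hist = reference_histogram([0, 1, 257, 511, 1])
```

Because relaxed atomic adds commute, the device histogram should match this reference exactly regardless of thread scheduling.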

## Compare-and-swap (CAS)

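`atomic_compare_exchange_weak` compares the value at `ptr[index]` with `expected` and, if they are equal, replaces it with `desired`. It returns `True` if the swap succeeded. As with weak compare-exchange in C++ and Metal, the operation may fail spuriously even when the values match, so it is normally called in a retry loop: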

```python theme={null}
swapped = enigma.atomic_compare_exchange_weak(
    ptr, index,
    expected,
    desired,
    success_order="acq_rel",
    failure_order="acquire",
)
```
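
The compare-then-store rule and the retry loop around it can be modeled on the host. This is a sketch under stated assumptions: `AtomicCell` and `atomic_max` are hypothetical names, not Enigma API, and a lock stands in for the hardware's atomicity. Unlike a weak CAS on the device, this model never fails spuriously.

```python
import threading

class AtomicCell:
    """Hypothetical lock-based stand-in for one atomic slot (not Enigma API)."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self._value

    def compare_exchange(self, expected, desired):
        """Store desired only if the current value equals expected."""
        with self._lock:
            if self._value != expected:
                return False
            self._value = desired
            return True

def atomic_max(cell, candidate):
    """CAS retry loop: raise the cell to candidate unless it is already larger."""
    while True:
        current = cell.load()
        if candidate <= current:
            return  # nothing to do, another thread already stored a larger value
        if cell.compare_exchange(current, candidate):
            return  # our value went in
        # Another thread changed the cell between load and CAS: retry.

cell = AtomicCell(0)
threads = [threading.Thread(target=atomic_max, args=(cell, v)) for v in [3, 9, 4, 7]]
for t in threads:
    t.start()
for t in threads:
    t.join()
result = cell.load()
```

The loop re-reads the current value on every failure, so a lost race costs one extra iteration but can never overwrite a larger value with a smaller one.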

### Example: lock-free maximum

```python theme={null}
@enigma.kernel
def atomic_max_f32(A: enigma.f32, Max: enigma.u32):
    tid = enigma.thread_position_in_grid
    val_bits = enigma.as_type(A[tid], enigma.u32)
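    # CAS loop to update the running maximum using the bit representation.
    # Comparing bits as u32 matches float ordering only for non-negative,
    # non-NaN values; negative inputs would need an order-preserving transform.
    with enigma.while_(lambda: ...):
        old_bits = enigma.atomic_load(Max, 0, memory_order="relaxed")
        # Stop once the stored maximum is already at least our value.
        with enigma.if_(enigma.cmp_le(val_bits, old_bits)):
            enigma.break_()
        # On failure the loop reloads and retries; after a successful swap the
        # next iteration sees val_bits <= old_bits and breaks.
        swapped = enigma.atomic_compare_exchange_weak(
            Max, 0, old_bits, val_bits,
            success_order="release", failure_order="relaxed",
        )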
```
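
The bit-representation trick above relies on IEEE 754 floats sorting like unsigned integers, which holds only for non-negative values. The host-side sketch below checks that with Python's `struct` module and shows one common order-preserving key for the general case (set the sign bit for non-negative values, invert all bits for negative ones). `f32_bits` and `ordered_key` are hypothetical helpers, not Enigma functions, and NaN is not handled.

```python
import struct

def f32_bits(x):
    """Reinterpret a float32 as its unsigned 32-bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def ordered_key(x):
    """Map a float32 to a u32 whose unsigned order matches float order."""
    bits = f32_bits(x)
    if bits & 0x80000000:
        return ~bits & 0xFFFFFFFF  # negative: invert all bits
    return bits | 0x80000000       # non-negative: set the sign bit

# Raw bits agree with float order for non-negative values...
raw_ok = f32_bits(1.0) < f32_bits(2.0)
# ...but a negative value compares as larger than a positive one.
raw_broken = f32_bits(-1.0) > f32_bits(1.0)
# The transformed key sorts correctly across the sign boundary.
keys = [ordered_key(v) for v in (-2.0, -1.0, 0.0, 1.0)]
```

If inputs can be negative, applying a transform like `ordered_key` before the CAS loop (and inverting it when reading the result) keeps the unsigned comparison valid.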

## Ordering guidelines

| Use case                       | Recommended order |
| ------------------------------ | ----------------- |
| Independent counter increments | `"relaxed"`       |
| Producer side of a flag        | `"release"`       |
| Consumer side of a flag        | `"acquire"`       |
| RMW used as a mutex            | `"acq_rel"`       |

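Using `"relaxed"` where ordering matters is a common source of subtle bugs, because other threads may observe the atomic update before the data it is meant to publish. When in doubt, choose the stronger order for the operation: `"acquire"` for loads, `"release"` for stores, and `"acq_rel"` for read-modify-write operations.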
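
The producer and consumer rows in the table describe a flag handshake: write the data, then publish the flag; wait for the flag, then read the data. The sketch below models only the shape of that protocol with host threads. It is an illustration, not Enigma code: CPython does not reorder these accesses, so it cannot demonstrate what goes wrong under `"relaxed"`, and the names `payload`, `flag`, and `seen` are hypothetical.

```python
import threading
import time

payload = [0]  # ordinary shared data
flag = [0]     # stands in for the atomic flag
seen = []

def producer():
    payload[0] = 42  # 1. write the data
    flag[0] = 1      # 2. publish; on the device this store would use "release"

def consumer():
    while flag[0] == 0:  # on the device this load would use "acquire"
        time.sleep(0)    # yield so the producer can run
    seen.append(payload[0])  # safe to read only after the flag is observed

c = threading.Thread(target=consumer)
p = threading.Thread(target=producer)
c.start()
p.start()
p.join()
c.join()
```

The release on the flag store is what guarantees the earlier data write is visible, and the acquire on the flag load is what guarantees the later data read happens after it.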
