[PATCH bpf-next] bpf, docs: add LOAD_ACQUIRE and STORE_RELEASE instructions

Alexis Lothoré (eBPF Foundation) posted 1 patch 4 days, 8 hours ago
.../bpf/standardization/instruction-set.rst         | 21 ++++++++++++++-------
1 file changed, 14 insertions(+), 7 deletions(-)
[PATCH bpf-next] bpf, docs: add LOAD_ACQUIRE and STORE_RELEASE instructions
Posted by Alexis Lothoré (eBPF Foundation) 4 days, 8 hours ago
Commit 880442305a39 ("bpf: Introduce load-acquire and store-release
instructions") introduced the LOAD_ACQUIRE and STORE_RELEASE atomic
instruction modifiers. They are currently not described in the
documentation, despite being used in the verifier and in the various JIT
compilers that support them.

Add the missing entries in the instruction set documentation.

Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
---
 .../bpf/standardization/instruction-set.rst         | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/Documentation/bpf/standardization/instruction-set.rst b/Documentation/bpf/standardization/instruction-set.rst
index 39c74611752b..4f10bcd03150 100644
--- a/Documentation/bpf/standardization/instruction-set.rst
+++ b/Documentation/bpf/standardization/instruction-set.rst
@@ -695,22 +695,24 @@ arithmetic operations in the 'imm' field to encode the atomic operation:
   *(u64 *)(dst + offset) += src
 
 In addition to the simple atomic operations, there also is a modifier and
-two complex atomic operations:
+four complex atomic operations:
 
 .. table:: Complex atomic operations
 
   ===========  ================  ===========================
   imm          value             description
   ===========  ================  ===========================
-  FETCH        0x01              modifier: return old value
-  XCHG         0xe0 | FETCH      atomic exchange
-  CMPXCHG      0xf0 | FETCH      atomic compare and exchange
+  FETCH        0x0001            modifier: return old value
+  XCHG         0x00e0 | FETCH    atomic exchange
+  CMPXCHG      0x00f0 | FETCH    atomic compare and exchange
+  LOAD_ACQ     0x0100            atomic load with barrier
+  STORE_REL    0x0110            atomic store with barrier
   ===========  ================  ===========================
 
 The ``FETCH`` modifier is optional for simple atomic operations, and
-always set for the complex atomic operations.  If the ``FETCH`` flag
-is set, then the operation also overwrites ``src`` with the value that
-was in memory before it was modified.
+always set for the ``XCHG`` and ``CMPXCHG`` complex atomic operations.  If
+the ``FETCH`` flag is set, then the operation also overwrites ``src`` with
+the value that was in memory before it was modified.
 
 The ``XCHG`` operation atomically exchanges ``src`` with the value
 addressed by ``dst + offset``.
@@ -721,6 +723,11 @@ The ``CMPXCHG`` operation atomically compares the value addressed by
 value that was at ``dst + offset`` before the operation is zero-extended
 and loaded back to ``R0``.
 
+The ``LOAD_ACQ`` and ``STORE_REL`` operations implement lighter LOAD and
+STORE memory barriers than full barriers. The corresponding accesses must
+be aligned, but are allowed for any access size (8-bit up to 64-bit
+operations).
+
 64-bit immediate instructions
 -----------------------------
 

---
base-commit: ceeb3aa37bff895116944acf4347fcded0b7692d
change-id: 20260520-bpf-insn-doc-756b369ca328

Best regards,
--  
Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>

Re: [PATCH bpf-next] bpf, docs: add LOAD_ACQUIRE and STORE_RELEASE instructions
Posted by bot+bpf-ci@kernel.org 4 days, 7 hours ago
> diff --git a/Documentation/bpf/standardization/instruction-set.rst b/Documentation/bpf/standardization/instruction-set.rst
> --- a/Documentation/bpf/standardization/instruction-set.rst
> +++ b/Documentation/bpf/standardization/instruction-set.rst
> @@ -695,22 +695,24 @@
>    *(u64 *)(dst + offset) += src
>
>  In addition to the simple atomic operations, there also is a modifier and
> -two complex atomic operations:
> +four complex atomic operations:
>
>  .. table:: Complex atomic operations
>
>    ===========  ================  ===========================
>    imm          value             description
>    ===========  ================  ===========================
> -  FETCH        0x01              modifier: return old value
> -  XCHG         0xe0 | FETCH      atomic exchange
> -  CMPXCHG      0xf0 | FETCH      atomic compare and exchange
> +  FETCH        0x0001            modifier: return old value
> +  XCHG         0x00e0 | FETCH    atomic exchange
> +  CMPXCHG      0x00f0 | FETCH    atomic compare and exchange
> +  LOAD_ACQ     0x0100            atomic load with barrier
> +  STORE_REL    0x0110            atomic store with barrier
>    ===========  ================  ===========================
>
>  The ``FETCH`` modifier is optional for simple atomic operations, and
> -always set for the complex atomic operations.  If the ``FETCH`` flag
> -is set, then the operation also overwrites ``src`` with the value that
> -was in memory before it was modified.
> +always set for the ``XCHG`` and ``CMPXCHG`` complex atomic operations.  If
> +the ``FETCH`` flag is set, then the operation also overwrites ``src`` with
> +the value that was in memory before it was modified.
>
>  The ``XCHG`` operation atomically exchanges ``src`` with the value
>  addressed by ``dst + offset``.

[ ... ]

> @@ -721,6 +723,11 @@
>  value that was at ``dst + offset`` before the operation is zero-extended
>  and loaded back to ``R0``.
>
> +The ``LOAD_ACQ`` and ``STORE_REL`` operations implement lighter LOAD and
> +STORE memory barriers than full barriers. The corresponding accesses must
> +be aligned, but are allowed for any access size (8-bit up to 64-bit
> +operations).

Does the LOAD_ACQ documentation need to clarify its non-standard register
semantics? BPF atomic operations are encoded as STX instructions, which
conventionally use dst as the memory base pointer and src as the value
register. However, LOAD_ACQ reverses this pattern by using src as the
memory base pointer and dst as the destination register for the fetched
value, acting like an LDX instruction.

Additionally, should the documentation clarify that the loaded value is
zero-extended into the 64-bit destination register for 8, 16, and 32-bit
accesses? Omitting these details might lead to incorrect implementations
based on standard STX rules.
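
The register roles described above can be sketched in C. This is only an
illustrative model, not the verifier or JIT code: `struct bpf_regs`,
`load_acq32()` and `store_rel32()` are hypothetical names, mapping the
barriers onto C11 acquire/release orderings is an assumption, and the
zero-extension follows the behaviour suggested in this review rather than
text already in the document.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical register file for illustration: R0..R10, 64 bits each. */
struct bpf_regs {
	uint64_t r[11];
};

/*
 * Sketch of a 32-bit LOAD_ACQ: dst = zero_extend(*(u32 *)(src + off)).
 * Unlike the other atomic operations, src supplies the base pointer.
 */
static void load_acq32(struct bpf_regs *regs, int dst, int src, int16_t off)
{
	uintptr_t addr = (uintptr_t)(regs->r[src] + (uint64_t)(int64_t)off);
	uint32_t val;

	assert(addr % sizeof(uint32_t) == 0);	/* access must be aligned */
	val = atomic_load_explicit((_Atomic uint32_t *)addr,
				   memory_order_acquire);
	regs->r[dst] = val;			/* upper 32 bits become zero */
}

/*
 * Sketch of a 32-bit STORE_REL: *(u32 *)(dst + off) = (u32)src.
 * This follows the usual STX convention: dst is the base pointer.
 */
static void store_rel32(struct bpf_regs *regs, int dst, int src, int16_t off)
{
	uintptr_t addr = (uintptr_t)(regs->r[dst] + (uint64_t)(int64_t)off);

	assert(addr % sizeof(uint32_t) == 0);	/* access must be aligned */
	atomic_store_explicit((_Atomic uint32_t *)addr,
			      (uint32_t)regs->r[src], memory_order_release);
}
```

In this model, storing a register whose upper half is non-zero and loading
it back with `load_acq32()` leaves only the low 32 bits in the destination
register, which is the detail an implementer following plain STX rules
could get wrong.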

There is also a contradiction that creates specification ambiguity: the
documentation states that LOAD_ACQ and STORE_REL "are allowed for any
access size (8-bit up to 64-bit operations)", but earlier in the document
(line 671) explicitly states "8-bit and 16-bit wide atomic operations are
not supported."

The instruction encoding section (lines 667-671) only defines two size
encodings for ATOMIC mode:
- {ATOMIC, W, STX} for 32-bit operations
- {ATOMIC, DW, STX} for 64-bit operations

No encoding is shown for B (byte/8-bit) or H (half-word/16-bit) with
ATOMIC mode.

If LOAD_ACQ/STORE_REL truly support 8-bit and 16-bit sizes, shouldn't the
encoding section document how B and H size modifiers work with ATOMIC
mode, and line 671 clarify the exception?

Alternatively, if LOAD_ACQ/STORE_REL only support 32-bit and 64-bit sizes
like other atomic operations, should line 728 be corrected to state
"32-bit and 64-bit operations" instead of "8-bit up to 64-bit operations"?

For a standardized instruction set specification used by multiple
implementations (verifier, JIT compilers, potentially other BPF runtimes),
internal contradictions can lead to implementation divergence and
interoperability issues.
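
To make the encoding question concrete, here is a minimal sketch of how the
opcode byte would be composed for each size. The constants are defined
locally and mirror the class, mode and size values of the instruction set
plus the `imm` values from the patch; `atomic_opcode()` and
`atomic_size_ok()` are hypothetical helpers, and the B/H branch encodes the
patch's wording as an assumption, not a rule the document currently states.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the ISA values, for illustration only. */
enum { BPF_STX = 0x03, BPF_ATOMIC = 0xc0 };
enum { BPF_W = 0x00, BPF_H = 0x08, BPF_B = 0x10, BPF_DW = 0x18 };
enum { BPF_LOAD_ACQ = 0x0100, BPF_STORE_REL = 0x0110 };

/* Opcode byte of an atomic instruction: mode | size | class. */
static uint8_t atomic_opcode(uint8_t size)
{
	return (uint8_t)(BPF_ATOMIC | size | BPF_STX);
}

/*
 * Hypothetical size check. W and DW are what the encoding section lists
 * today; B and H are accepted only for LOAD_ACQ/STORE_REL, which is the
 * exception the new paragraph implies but the document does not spell out.
 */
static bool atomic_size_ok(uint8_t size, int32_t imm)
{
	if (size == BPF_W || size == BPF_DW)
		return true;
	if (size == BPF_B || size == BPF_H)
		return imm == BPF_LOAD_ACQ || imm == BPF_STORE_REL;
	return false;
}
```

Under these assumptions the B and H variants would use opcode bytes 0xd3
and 0xcb, which the encoding section does not mention at all.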

> +
>  64-bit immediate instructions
>  -----------------------------
>


---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/26170628668
Re: [PATCH bpf-next] bpf, docs: add LOAD_ACQUIRE and STORE_RELEASE instructions
Posted by Alexis Lothoré 4 days, 6 hours ago
On Wed May 20, 2026 at 5:18 PM CEST, bot+bpf-ci wrote:
>> diff --git a/Documentation/bpf/standardization/instruction-set.rst b/Documentation/bpf/standardization/instruction-set.rst
>> --- a/Documentation/bpf/standardization/instruction-set.rst
>> +++ b/Documentation/bpf/standardization/instruction-set.rst
>> @@ -695,22 +695,24 @@
>>    *(u64 *)(dst + offset) += src
>>
>>  In addition to the simple atomic operations, there also is a modifier and
>> -two complex atomic operations:
>> +four complex atomic operations:
>>
>>  .. table:: Complex atomic operations
>>
>>    ===========  ================  ===========================
>>    imm          value             description
>>    ===========  ================  ===========================
>> -  FETCH        0x01              modifier: return old value
>> -  XCHG         0xe0 | FETCH      atomic exchange
>> -  CMPXCHG      0xf0 | FETCH      atomic compare and exchange
>> +  FETCH        0x0001            modifier: return old value
>> +  XCHG         0x00e0 | FETCH    atomic exchange
>> +  CMPXCHG      0x00f0 | FETCH    atomic compare and exchange
>> +  LOAD_ACQ     0x0100            atomic load with barrier
>> +  STORE_REL    0x0110            atomic store with barrier
>>    ===========  ================  ===========================
>>
>>  The ``FETCH`` modifier is optional for simple atomic operations, and
>> -always set for the complex atomic operations.  If the ``FETCH`` flag
>> -is set, then the operation also overwrites ``src`` with the value that
>> -was in memory before it was modified.
>> +always set for the ``XCHG`` and ``CMPXCHG`` complex atomic operations.  If
>> +the ``FETCH`` flag is set, then the operation also overwrites ``src`` with
>> +the value that was in memory before it was modified.
>>
>>  The ``XCHG`` operation atomically exchanges ``src`` with the value
>>  addressed by ``dst + offset``.
>
> [ ... ]
>
>> @@ -721,6 +723,11 @@
>>  value that was at ``dst + offset`` before the operation is zero-extended
>>  and loaded back to ``R0``.
>>
>> +The ``LOAD_ACQ`` and ``STORE_REL`` operations implement lighter LOAD and
>> +STORE memory barriers than full barriers. The corresponding accesses must
>> +be aligned, but are allowed for any access size (8-bit up to 64-bit
>> +operations).
>
> Does the LOAD_ACQ documentation need to clarify its non-standard register
> semantics? BPF atomic operations are encoded as STX instructions, which
> conventionally use dst as the memory base pointer and src as the value
> register. However, LOAD_ACQ reverses this pattern by using src as the
> memory base pointer and dst as the destination register for the fetched
> value, acting like an LDX instruction.

True. I can specify that both operations are nonetheless encoded as STX
instructions.

> Additionally, should the documentation clarify that the loaded value is
> zero-extended into the 64-bit destination register for 8, 16, and 32-bit
> accesses? Omitting these details might lead to incorrect implementations
> based on standard STX rules.

I guess I can add this detail, yes.

> This contradiction creates specification ambiguity: the documentation
> states that LOAD_ACQ and STORE_REL "are allowed for any access size
> (8-bit up to 64-bit operations)", but earlier in the document (line 671)
> explicitly states "8-bit and 16-bit wide atomic operations are not
> supported."
>
> The instruction encoding section (lines 667-671) only defines two size
> encodings for ATOMIC mode:
> - {ATOMIC, W, STX} for 32-bit operations
> - {ATOMIC, DW, STX} for 64-bit operations
>
> No encoding is shown for B (byte/8-bit) or H (half-word/16-bit) with
> ATOMIC mode.
>
> If LOAD_ACQ/STORE_REL truly support 8-bit and 16-bit sizes, shouldn't the
> encoding section document how B and H size modifiers work with ATOMIC
> mode, and line 671 clarify the exception?

This point, and the corresponding mentions of the "atomic32 conformance
group" and "atomic64 conformance group", made me realize that the kernel
doc seems to be in sync with the eBPF ISA RFC
(https://www.rfc-editor.org/rfc/rfc9669.html). It makes me wonder whether
it's really OK to add those LOAD_ACQUIRE/STORE_RELEASE mentions to the
kernel doc only?

> Alternatively, if LOAD_ACQ/STORE_REL only support 32-bit and 64-bit sizes
> like other atomic operations, should line 728 be corrected to state
> "32-bit and 64-bit operations" instead of "8-bit up to 64-bit operations"?
>
> For a standardized instruction set specification used by multiple
> implementations (verifier, JIT compilers, potentially other BPF runtimes),
> internal contradictions can lead to implementation divergence and
> interoperability issues.
>
>> +
>>  64-bit immediate instructions
>>  -----------------------------
>>
>
>
> ---
> AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
> See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
>
> CI run summary: https://github.com/kernel-patches/bpf/actions/runs/26170628668




-- 
Alexis Lothoré, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
Re: [PATCH bpf-next] bpf, docs: add LOAD_ACQUIRE and STORE_RELEASE instructions
Posted by Alexei Starovoitov 4 days, 6 hours ago
On Wed, May 20, 2026 at 5:46 PM Alexis Lothoré
<alexis.lothore@bootlin.com> wrote:
>
> On Wed May 20, 2026 at 5:18 PM CEST, bot+bpf-ci wrote:
> >> diff --git a/Documentation/bpf/standardization/instruction-set.rst b/Documentation/bpf/standardization/instruction-set.rst
> >> --- a/Documentation/bpf/standardization/instruction-set.rst
> >> +++ b/Documentation/bpf/standardization/instruction-set.rst
> >> @@ -695,22 +695,24 @@
> >>    *(u64 *)(dst + offset) += src
> >>
> >>  In addition to the simple atomic operations, there also is a modifier and
> >> -two complex atomic operations:
> >> +four complex atomic operations:
> >>
> >>  .. table:: Complex atomic operations
> >>
> >>    ===========  ================  ===========================
> >>    imm          value             description
> >>    ===========  ================  ===========================
> >> -  FETCH        0x01              modifier: return old value
> >> -  XCHG         0xe0 | FETCH      atomic exchange
> >> -  CMPXCHG      0xf0 | FETCH      atomic compare and exchange
> >> +  FETCH        0x0001            modifier: return old value
> >> +  XCHG         0x00e0 | FETCH    atomic exchange
> >> +  CMPXCHG      0x00f0 | FETCH    atomic compare and exchange
> >> +  LOAD_ACQ     0x0100            atomic load with barrier
> >> +  STORE_REL    0x0110            atomic store with barrier
> >>    ===========  ================  ===========================
> >>
> >>  The ``FETCH`` modifier is optional for simple atomic operations, and
> >> -always set for the complex atomic operations.  If the ``FETCH`` flag
> >> -is set, then the operation also overwrites ``src`` with the value that
> >> -was in memory before it was modified.
> >> +always set for the ``XCHG`` and ``CMPXCHG`` complex atomic operations.  If
> >> +the ``FETCH`` flag is set, then the operation also overwrites ``src`` with
> >> +the value that was in memory before it was modified.
> >>
> >>  The ``XCHG`` operation atomically exchanges ``src`` with the value
> >>  addressed by ``dst + offset``.
> >
> > [ ... ]
> >
> >> @@ -721,6 +723,11 @@
> >>  value that was at ``dst + offset`` before the operation is zero-extended
> >>  and loaded back to ``R0``.
> >>
> >> +The ``LOAD_ACQ`` and ``STORE_REL`` operations implement lighter LOAD and
> >> +STORE memory barriers than full barriers. The corresponding accesses must
> >> +be aligned, but are allowed for any access size (8-bit up to 64-bit
> >> +operations).
> >
> > Does the LOAD_ACQ documentation need to clarify its non-standard register
> > semantics? BPF atomic operations are encoded as STX instructions, which
> > conventionally use dst as the memory base pointer and src as the value
> > register. However, LOAD_ACQ reverses this pattern by using src as the
> > memory base pointer and dst as the destination register for the fetched
> > value, acting like an LDX instruction.
>
> True. I can specify that both operations are nonetheless encoded as STX
> instructions.
>
> > Additionally, should the documentation clarify that the loaded value is
> > zero-extended into the 64-bit destination register for 8, 16, and 32-bit
> > accesses? Omitting these details might lead to incorrect implementations
> > based on standard STX rules.
>
> I guess I can add this detail, yes.
>
> > This contradiction creates specification ambiguity: the documentation
> > states that LOAD_ACQ and STORE_REL "are allowed for any access size
> > (8-bit up to 64-bit operations)", but earlier in the document (line 671)
> > explicitly states "8-bit and 16-bit wide atomic operations are not
> > supported."
> >
> > The instruction encoding section (lines 667-671) only defines two size
> > encodings for ATOMIC mode:
> > - {ATOMIC, W, STX} for 32-bit operations
> > - {ATOMIC, DW, STX} for 64-bit operations
> >
> > No encoding is shown for B (byte/8-bit) or H (half-word/16-bit) with
> > ATOMIC mode.
> >
> > If LOAD_ACQ/STORE_REL truly support 8-bit and 16-bit sizes, shouldn't the
> > encoding section document how B and H size modifiers work with ATOMIC
> > mode, and line 671 clarify the exception?
>
> This point, and the corresponding mentions of the "atomic32 conformance
> group" and "atomic64 conformance group", made me realize that the kernel
> doc seems to be in sync with the eBPF ISA RFC
> (https://www.rfc-editor.org/rfc/rfc9669.html). It makes me wonder whether
> it's really OK to add those LOAD_ACQUIRE/STORE_RELEASE mentions to the
> kernel doc only?

It's OK. It has already diverged a bit. Eventually we will do an RFC update.