This commit adds four litmus tests showing that a failing cmpxchg()
operation is unordered unless followed by an smp_mb__after_atomic()
operation.
Suggested-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Will Deacon <will@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Jade Alglave <j.alglave@ucl.ac.uk>
Cc: Luc Maranget <luc.maranget@inria.fr>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Akira Yokosawa <akiyks@gmail.com>
Cc: Daniel Lustig <dlustig@nvidia.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: <linux-arch@vger.kernel.org>
Cc: <linux-doc@vger.kernel.org>
---
Documentation/litmus-tests/README | 16 +++++++++
.../atomic/cmpxchg-fail-ordered-1.litmus | 34 +++++++++++++++++++
.../atomic/cmpxchg-fail-ordered-2.litmus | 30 ++++++++++++++++
.../atomic/cmpxchg-fail-unordered-1.litmus | 33 ++++++++++++++++++
.../atomic/cmpxchg-fail-unordered-2.litmus | 30 ++++++++++++++++
5 files changed, 143 insertions(+)
create mode 100644 Documentation/litmus-tests/atomic/cmpxchg-fail-ordered-1.litmus
create mode 100644 Documentation/litmus-tests/atomic/cmpxchg-fail-ordered-2.litmus
create mode 100644 Documentation/litmus-tests/atomic/cmpxchg-fail-unordered-1.litmus
create mode 100644 Documentation/litmus-tests/atomic/cmpxchg-fail-unordered-2.litmus
diff --git a/Documentation/litmus-tests/README b/Documentation/litmus-tests/README
index 26ca56df02125..6c666f3422ea3 100644
--- a/Documentation/litmus-tests/README
+++ b/Documentation/litmus-tests/README
@@ -21,6 +21,22 @@ Atomic-RMW-ops-are-atomic-WRT-atomic_set.litmus
Test that atomic_set() cannot break the atomicity of atomic RMWs.
NOTE: Require herd7 7.56 or later which supports "(void)expr".
+cmpxchg-fail-ordered-1.litmus
+ Demonstrate that a failing cmpxchg() operation acts as a full barrier
+ when followed by smp_mb__after_atomic().
+
+cmpxchg-fail-ordered-2.litmus
+ Demonstrate that a failing cmpxchg() operation acts as an acquire
+ operation when followed by smp_mb__after_atomic().
+
+cmpxchg-fail-unordered-1.litmus
+ Demonstrate that a failing cmpxchg() operation does not act as a
+ full barrier.
+
+cmpxchg-fail-unordered-2.litmus
+ Demonstrate that a failing cmpxchg() operation does not act as an
+ acquire operation.
+
locking (/locking directory)
----------------------------
diff --git a/Documentation/litmus-tests/atomic/cmpxchg-fail-ordered-1.litmus b/Documentation/litmus-tests/atomic/cmpxchg-fail-ordered-1.litmus
new file mode 100644
index 0000000000000..3df1d140b189b
--- /dev/null
+++ b/Documentation/litmus-tests/atomic/cmpxchg-fail-ordered-1.litmus
@@ -0,0 +1,34 @@
+C cmpxchg-fail-ordered-1
+
+(*
+ * Result: Never
+ *
+ * Demonstrate that a failing cmpxchg() operation will act as a full
+ * barrier when followed by smp_mb__after_atomic().
+ *)
+
+{}
+
+P0(int *x, int *y, int *z)
+{
+ int r0;
+ int r1;
+
+ WRITE_ONCE(*x, 1);
+ r1 = cmpxchg(z, 1, 0);
+ smp_mb__after_atomic();
+ r0 = READ_ONCE(*y);
+}
+
+P1(int *x, int *y, int *z)
+{
+ int r0;
+ int r1;
+
+ WRITE_ONCE(*y, 1);
+ r1 = cmpxchg(z, 1, 0);
+ smp_mb__after_atomic();
+ r0 = READ_ONCE(*x);
+}
+
+locations[0:r1;1:r1]
+exists (0:r0=0 /\ 1:r0=0)
diff --git a/Documentation/litmus-tests/atomic/cmpxchg-fail-ordered-2.litmus b/Documentation/litmus-tests/atomic/cmpxchg-fail-ordered-2.litmus
new file mode 100644
index 0000000000000..54146044a16f6
--- /dev/null
+++ b/Documentation/litmus-tests/atomic/cmpxchg-fail-ordered-2.litmus
@@ -0,0 +1,30 @@
+C cmpxchg-fail-ordered-2
+
+(*
+ * Result: Never
+ *
+ * Demonstrate use of smp_mb__after_atomic() to make a failing cmpxchg
+ * operation have acquire ordering.
+ *)
+
+{}
+
+P0(int *x, int *y)
+{
+ int r0;
+ int r1;
+
+ WRITE_ONCE(*x, 1);
+ r1 = cmpxchg(y, 0, 1);
+}
+
+P1(int *x, int *y)
+{
+ int r1;
+ int r2;
+
+ r1 = cmpxchg(y, 0, 1);
+ smp_mb__after_atomic();
+ r2 = READ_ONCE(*x);
+}
+
+exists (0:r1=0 /\ 1:r1=1 /\ 1:r2=0)
diff --git a/Documentation/litmus-tests/atomic/cmpxchg-fail-unordered-1.litmus b/Documentation/litmus-tests/atomic/cmpxchg-fail-unordered-1.litmus
new file mode 100644
index 0000000000000..a727ce23b1a6e
--- /dev/null
+++ b/Documentation/litmus-tests/atomic/cmpxchg-fail-unordered-1.litmus
@@ -0,0 +1,33 @@
+C cmpxchg-fail-unordered-1
+
+(*
+ * Result: Sometimes
+ *
+ * Demonstrate that a failing cmpxchg() operation does not act as a
+ * full barrier. (In contrast, a successful cmpxchg() does act as a
+ * full barrier.)
+ *)
+
+{}
+
+P0(int *x, int *y, int *z)
+{
+ int r0;
+ int r1;
+
+ WRITE_ONCE(*x, 1);
+ r1 = cmpxchg(z, 1, 0);
+ r0 = READ_ONCE(*y);
+}
+
+P1(int *x, int *y, int *z)
+{
+ int r0;
+ int r1;
+
+ WRITE_ONCE(*y, 1);
+ r1 = cmpxchg(z, 1, 0);
+ r0 = READ_ONCE(*x);
+}
+
+locations[0:r1;1:r1]
+exists (0:r0=0 /\ 1:r0=0)
diff --git a/Documentation/litmus-tests/atomic/cmpxchg-fail-unordered-2.litmus b/Documentation/litmus-tests/atomic/cmpxchg-fail-unordered-2.litmus
new file mode 100644
index 0000000000000..a245bac55b578
--- /dev/null
+++ b/Documentation/litmus-tests/atomic/cmpxchg-fail-unordered-2.litmus
@@ -0,0 +1,30 @@
+C cmpxchg-fail-unordered-2
+
+(*
+ * Result: Sometimes
+ *
+ * Demonstrate that a failing cmpxchg() operation does not act as either
+ * an acquire or a release operation. (In contrast, a successful cmpxchg()
+ * does act as both an acquire and a release operation.)
+ *)
+
+{}
+
+P0(int *x, int *y)
+{
+ int r0;
+ int r1;
+
+ WRITE_ONCE(*x, 1);
+ r1 = cmpxchg(y, 0, 1);
+}
+
+P1(int *x, int *y)
+{
+ int r1;
+ int r2;
+
+ r1 = cmpxchg(y, 0, 1);
+ r2 = READ_ONCE(*x);
+}
+
+exists (0:r1=0 /\ 1:r1=1 /\ 1:r2=0)
--
2.40.1
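For anyone who wants to reproduce the Result: claims above, these tests
can be run with herd7 against the LKMM in the usual way.  A sketch,
assuming the command is issued from tools/memory-model/ in a tree with
this patch applied (exact output formatting varies by herd7 version):

  $ herd7 -conf linux-kernel.cfg \
          ../../Documentation/litmus-tests/atomic/cmpxchg-fail-unordered-1.litmus

The Observation line of the output should read "Sometimes" for the two
cmpxchg-fail-unordered tests and "Never" for the two cmpxchg-fail-ordered
tests, matching the Result: comments in the files.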
On 5/2/2024 1:21 AM, Paul E. McKenney wrote:
> This commit adds four litmus tests showing that a failing cmpxchg()
> operation is unordered unless followed by an smp_mb__after_atomic()
> operation.
So far, my understanding was that all RMW operations without suffix
(xchg(), cmpxchg(), ...) will be interpreted as F[Mb];...;F[Mb].
I guess this shows again how important it is to model these full
barriers explicitly inside the cat model, instead of relying on implicit
conversions internal to herd.
I'd like to propose a patch to this effect.
What is the intended behavior of a failed cmpxchg()? Is it the same as a
relaxed one?
My suggestion would be in the direction of marking read and write events
of these operations as Mb, and then defining
(* full barrier events that appear in non-failing RMW *)
let RMW_MB = Mb & (dom(rmw) | range(rmw))
let mb =
[M] ; fencerel(Mb) ; [M]
| [M] ; (po \ si ; rmw) ; [RMW_MB] ; po^? ; [M]
| [M] ; po^? ; [RMW_MB] ; (po \ rmw ; si) ; [M]
| ...
The po \ si;rmw is because ordering is not provided internally of the
rmw, although I suspect that after we added release sequences it could
perhaps be simplified to
let mb =
[M] ; fencerel(Mb) ; [M]
| [M] ; po ; [RMW_MB] ; po^? ; [M]
| [M] ; po^? ; [RMW_MB] ; po ; [M]
| ...
or
let mb =
[M] ; fencerel(Mb) ; [M]
| [M] ; po & (po^? ; [RMW_MB] ; po^?) ; [M]
| ...
(the po & is necessary to avoid trivial hb cycles of an RMW event
happening before itself)
Any interest?
Have fun,
jonas
On 5/6/2024 12:05 PM, Jonas Oberhauser wrote:
>
>
> On 5/2/2024 1:21 AM, Paul E. McKenney wrote:
>> This commit adds four litmus tests showing that a failing cmpxchg()
>> operation is unordered unless followed by an smp_mb__after_atomic()
>> operation.
>
> So far, my understanding was that all RMW operations without suffix
> (xchg(), cmpxchg(), ...) will be interpreted as F[Mb];...;F[Mb].
>
> I guess this shows again how important it is to model these full
> barriers explicitly inside the cat model, instead of relying on implicit
> conversions internal to herd.
>
> I'd like to propose a patch to this effect.
>
> What is the intended behavior of a failed cmpxchg()? Is it the same as a
> relaxed one?
>
> My suggestion would be in the direction of marking read and write events
> of these operations as Mb, and then defining
>
> (* full barrier events that appear in non-failing RMW *)
> let RMW_MB = Mb & (dom(rmw) | range(rmw))
>
>
> let mb =
> [M] ; fencerel(Mb) ; [M]
> | [M] ; (po \ rmw) ; [RMW_MB] ; po^? ; [M]
> | [M] ; po^? ; [RMW_MB] ; (po \ rmw) ; [M]
> | ...
>
> The po \ rmw is because ordering is not provided internally of the
> rmw
(removed the unnecessary si since LKMM is still non-mixed-accesses)
This could also be written with a single rule:
| [M] ; (po \ rmw) & (po^?; [RMW_MB] ; po^?) ; [M]
> I suspect that after we added [rmw] sequences it could
> perhaps be simplified [...]
>
No, my suspicion is wrong - this would incorrectly let full-barrier RMWs
act like strong fences when they appear in an rmw sequence.
if (z==1)  ||  x = 2;       ||  xchg(&y,2)  ||  if (y==2)
    x = 1; ||  y =_rel 1;   ||              ||      z=1;
right now, we allow x=2 overwriting x=1 (in case the last thread does
not propagate x=2 along with z=1) because on power, the xchg might be
implemented with a sync that doesn't get executed until the very end
of the program run.
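For concreteness, one possible transcription of the four columns above
into a litmus test is sketched below; the test name, register names and
the exists clause are reconstructed here, and the stated result simply
restates the currently-allowed outcome described above:

C rmw-sequence-xchg-sketch

(*
 * Result: Sometimes (expected, per the discussion above)
 *
 * The cited outcome is x=2 overwriting x=1 even though P3 observed
 * the xchg() and P0 observed P3's write to z.
 *)

{}

P0(int *x, int *z)
{
	int r0;

	r0 = READ_ONCE(*z);
	if (r0 == 1)
		WRITE_ONCE(*x, 1);
}

P1(int *x, int *y)
{
	WRITE_ONCE(*x, 2);
	smp_store_release(y, 1);
}

P2(int *y)
{
	int r1;

	r1 = xchg(y, 2);
}

P3(int *y, int *z)
{
	int r0;

	r0 = READ_ONCE(*y);
	if (r0 == 2)
		WRITE_ONCE(*z, 1);
}

exists (0:r0=1 /\ 2:r1=1 /\ 3:r0=2 /\ x=2)

(The 2:r1=1 clause restricts attention to executions in which the xchg()
reads from the release store, which is the rmw-sequence case at issue.)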
Instead of its negative form (everything other than inside the rmw),
it could also be rewritten positively. Here's a somewhat short form:
let mb =
[M] ; fencerel(Mb) ; [M]
(* everything across a full barrier RMW is ordered. This includes up
to one event inside the RMW. *)
| [M] ; po ; [RMW_MB] ; po ; [M]
(* full barrier RMW writes are ordered with everything behind the RMW *)
| [W & RMW_MB] ; po ; [M]
(* full barrier RMW reads are ordered with everything before the RMW *)
| [M] ; po ; [R & RMW_MB]
| ...
Good luck,
jonas
On Mon, May 06, 2024 at 06:30:45PM +0200, Jonas Oberhauser wrote:
> On 5/6/2024 12:05 PM, Jonas Oberhauser wrote:
> > On 5/2/2024 1:21 AM, Paul E. McKenney wrote:
> > > This commit adds four litmus tests showing that a failing cmpxchg()
> > > operation is unordered unless followed by an smp_mb__after_atomic()
> > > operation.
> >
> > So far, my understanding was that all RMW operations without suffix
> > (xchg(), cmpxchg(), ...) will be interpreted as F[Mb];...;F[Mb].
> > [ . . . ]
> > What is the intended behavior of a failed cmpxchg()? Is it the same as a
> > relaxed one?

Yes, and unless I am too confused, LKMM currently does implement this.
Please let me know if I am missing something.

> > My suggestion would be in the direction of marking read and write events
> > of these operations as Mb, and then defining
> > [ . . . ]
> > The po \ rmw is because ordering is not provided internally of the rmw
>
> (removed the unnecessary si since LKMM is still non-mixed-accesses)

Addition of mixed-access support would be quite welcome!

> This could also be written with a single rule:
>
> | [M] ; (po \ rmw) & (po^?; [RMW_MB] ; po^?) ; [M]
>
> > I suspect that after we added [rmw] sequences it could perhaps be
> > simplified [...]
>
> No, my suspicion is wrong - this would incorrectly let full-barrier RMWs
> act like strong fences when they appear in an rmw sequence.
> [ . . . ]
> Instead of its negative form (everything other than inside the rmw),
> it could also be rewritten positively. Here's a somewhat short form:
>
> let mb =
>   [M] ; fencerel(Mb) ; [M]
>   (* everything across a full barrier RMW is ordered. This includes up to
>      one event inside the RMW. *)
>   | [M] ; po ; [RMW_MB] ; po ; [M]
>   (* full barrier RMW writes are ordered with everything behind the RMW *)
>   | [W & RMW_MB] ; po ; [M]
>   (* full barrier RMW reads are ordered with everything before the RMW *)
>   | [M] ; po ; [R & RMW_MB]
>   | ...

Does this produce the results expected by the litmus tests in the Linux
kernel source tree and also those at https://github.com/paulmckrcu/litmus?

							Thanx, Paul
On 5/6/2024 8:00 PM, Paul E. McKenney wrote:
> On Mon, May 06, 2024 at 06:30:45PM +0200, Jonas Oberhauser wrote:
>> On 5/6/2024 12:05 PM, Jonas Oberhauser wrote:
>>> On 5/2/2024 1:21 AM, Paul E. McKenney wrote:
>>>> This commit adds four litmus tests showing that a failing cmpxchg()
>>>> operation is unordered unless followed by an smp_mb__after_atomic()
>>>> operation.
>> [ . . . ]
>> Instead of its negative form (everything other than inside the rmw),
>> it could also be rewritten positively. Here's a somewhat short form:
>> [ . . . ]
>
> Does this produce the results expected by the litmus tests in the Linux
> kernel source tree and also those at https://github.com/paulmckrcu/litmus?
>
> 						Thanx, Paul

I implemented in the dartagnan tool the changes proposed by Jonas (i.e.
changing the mb definition in the cat model and removing the fences that
were added programmatically).

I ran this using the ~5K litmus tests I have (it should include
everything from the source tree + the non-LISA ones from your repo).
I also checked with the version of qspinlock discussed in [1].

I do get the expected results.

Hernan

[1] https://lkml.org/lkml/2022/8/26/597
On Wed, May 15, 2024 at 08:44:30AM +0200, Hernan Ponce de Leon wrote:
> On 5/6/2024 8:00 PM, Paul E. McKenney wrote:
> > [ . . . ]
> > Does this produce the results expected by the litmus tests in the Linux
> > kernel source tree and also those at https://github.com/paulmckrcu/litmus?
>
> I implemented in the dartagnan tool the changes proposed by Jonas (i.e.
> changing the mb definition in the cat model and removing the fences that
> were added programmatically).
>
> I ran this using the ~5K litmus tests I have (it should include
> everything from the source tree + the non-LISA ones from your repo).
> I also checked with the version of qspinlock discussed in [1].
>
> I do get the expected results.

Thank you very much, Hernan!!!

And good show, Puranjay!!!

							Thanx, Paul

> Hernan
>
> [1] https://lkml.org/lkml/2022/8/26/597
On 5/6/2024 8:00 PM, Paul E. McKenney wrote:
> On Mon, May 06, 2024 at 06:30:45PM +0200, Jonas Oberhauser wrote:
>> On 5/6/2024 12:05 PM, Jonas Oberhauser wrote:
>>> [ . . . ]
>>> What is the intended behavior of a failed cmpxchg()? Is it the same as a
>>> relaxed one?
>
> Yes, and unless I am too confused, LKMM currently does implement this.
> Please let me know if I am missing something.

At least the herd and Dat3M implementations seem to be doing that, at
least according to this thread sent to me by Hernan.

https://github.com/herd/herdtools7/issues/384#issue-1243049709

>> (removed the unnecessary si since LKMM is still non-mixed-accesses)
>
> Addition of mixed-access support would be quite welcome!

:P

>> [ . . . ]
>> Instead of its negative form (everything other than inside the rmw),
>> it could also be rewritten positively. Here's a somewhat short form:
>> [ . . . ]
>
> Does this produce the results expected by the litmus tests in the Linux
> kernel source tree and also those at https://github.com/paulmckrcu/litmus?

I suspect that it doesn't work out of the box because of some of the
implicit magic herd is doing that could get in the way, so I'd need some
help from Luc to actually turn this into a patch that can be tested.

(or at least confirmation that just by changing a few things in the .def
& .bell files we can sidestep the implicit behaviors)

But at least in my proofs it seems to be equivalent.

(there may still be differences in opinion on what some herd things mean,
so what I/Viktor have formalized as the semantics of the herd model may
not be exactly the behavior of LKMM in herd.  hence testing is necessary
too as a sanity check)

best wishes,
jonas
On Mon, May 06, 2024 at 11:00:42AM -0700, Paul E. McKenney wrote:
> On Mon, May 06, 2024 at 06:30:45PM +0200, Jonas Oberhauser wrote:
> > On 5/6/2024 12:05 PM, Jonas Oberhauser wrote:
> > > On 5/2/2024 1:21 AM, Paul E. McKenney wrote:
> > > > This commit adds four litmus tests showing that a failing cmpxchg()
> > > > operation is unordered unless followed by an smp_mb__after_atomic()
> > > > operation.
> > >
> > > So far, my understanding was that all RMW operations without suffix
> > > (xchg(), cmpxchg(), ...) will be interpreted as F[Mb];...;F[Mb].

It's more accurate to say that RMW operations without a suffix that
return a value will be interpreted that way.  So for example,
atomic_inc() doesn't imply any ordering, because it doesn't return a
value.

> > > barriers explicitly inside the cat model, instead of relying on implicit
> > > conversions internal to herd.

Don't the annotations in linux-kernel.def and linux-kernel.bell (like
"noreturn") already make this explicit?

I guess the part that is still implicit is that herd7 doesn't regard
failed RMW operations as actual RMWs (they don't have a store part).

Alan
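A quick sketch of the atomic_inc() point above (the test name and the
shared atomic_t are ad hoc; the stated results reflect the behavior
described above rather than quoted tool output): a store-buffering test
whose only intervening operation is a non-value-returning atomic_inc()
should still allow the SB outcome, whereas switching to
atomic_inc_return() should forbid it.

C SB+atomic_inc-sketch

(*
 * Result: Sometimes (expected)
 *
 * atomic_inc() does not return a value, so it is not treated as a full
 * barrier and the store-buffering outcome below remains allowed.
 * Replacing atomic_inc() with atomic_inc_return() should change the
 * result to Never.
 *)

{}

P0(int *x, int *y, atomic_t *u)
{
	int r0;

	WRITE_ONCE(*x, 1);
	atomic_inc(u);
	r0 = READ_ONCE(*y);
}

P1(int *x, int *y, atomic_t *u)
{
	int r0;

	WRITE_ONCE(*y, 1);
	atomic_inc(u);
	r0 = READ_ONCE(*x);
}

exists (0:r0=0 /\ 1:r0=0)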
On 5/6/2024 9:21 PM, Alan Stern wrote:
> On Mon, May 06, 2024 at 11:00:42AM -0700, Paul E. McKenney wrote:
>> On Mon, May 06, 2024 at 06:30:45PM +0200, Jonas Oberhauser wrote:
>>> On 5/6/2024 12:05 PM, Jonas Oberhauser wrote:
>>>> On 5/2/2024 1:21 AM, Paul E. McKenney wrote:
>>>>> This commit adds four litmus tests showing that a failing cmpxchg()
>>>>> operation is unordered unless followed by an smp_mb__after_atomic()
>>>>> operation.
>>>>
>>>> So far, my understanding was that all RMW operations without suffix
>>>> (xchg(), cmpxchg(), ...) will be interpreted as F[Mb];...;F[Mb].
>
> It's more accurate to say that RMW operations without a suffix that
> return a value will be interpreted that way.  So for example,
> atomic_inc() doesn't imply any ordering, because it doesn't return a
> value.

I see, thanks.

>>>> barriers explicitly inside the cat model, instead of relying on implicit
>>>> conversions internal to herd.
>
> Don't the annotations in linux-kernel.def and linux-kernel.bell (like
> "noreturn") already make this explicit?

Not that I'm aware.  All I can see there is that according to .bell, RMW
don't have an mb mode, but according to .def they do.

How this mb disappears between parsing the code (.def) and interpreting
it (.bell) is totally implicit.  Including how noreturn affects this
disappearance.

In fact most tool developers that support LKMM (Viktor, Hernan, and Luc)
were at least once confused about it.  And I think they read those files
more carefully than I.

https://github.com/herd/herdtools7/issues/384#issuecomment-1132859904

Note that while there's no explicit annotation of noreturn in the .def
file, at least I can guess based on context that it should be annotated
on all the functions that don't have _return and for which also a
version with _return exists.

have fun,
jonas
> > Don't the annotations in linux-kernel.def and linux-kernel.bell (like
> > "noreturn") already make this explicit?
>
> Not that I'm aware. All I can see there is that according to .bell RMW don't
> have an mb mode, but according to .def they do.
>
> How this mb disappears between parsing the code (.def) and interpreting it
> (.bell) is totally implicit. Including how noreturn affects this
> disappearance.
IIRC, that's more or less implicit ;-) in the herd macros of the .def file;
for example,
(noreturn)
- atomic_add(V,X) { __atomic_op(X,+,V); }
Generates a pair of R*[noreturn] and W*[once] events
(w/ return)
- atomic_add_return_relaxed(V,X) __atomic_op_return{once}(X,+,V)
Generates a pair of R*[once] and W*[once] events
- atomic_add_return_acquire(V,X) __atomic_op_return{acquire}(X,+,V)
Generates a pair of R*[acquire] and W*[once] events
- atomic_add_return_release(V,X) __atomic_op_return{release}(X,+,V)
Generates a pair of R*[once] and W*[release] events
- atomic_add_return(V,X) __atomic_op_return{mb}(X,+,V)
Generates a pair of R*[once] and W*[once] events plus two F[mb] events
(possibly failing)
- atomic_cmpxchg_relaxed(X,V,W) __cmpxchg{once}(X,V,W)
Generates a pair of R*[once] and W*[once] events if successful;
a single R*[once] event otherwise.
- atomic_cmpxchg_acquire(X,V,W) __cmpxchg{acquire}(X,V,W)
Generates a pair of R*[acquire] and W*[once] events if successful;
a single R*[once] event otherwise.
- atomic_cmpxchg_release(X,V,W) __cmpxchg{release}(X,V,W)
Generates a pair of R*[once] and W*[release] events if successful;
a single R*[once] event otherwise.
- atomic_cmpxchg(X,V,W) __cmpxchg{mb}(X,V,W)
Generates a pair of R*[once] and W*[once] events plus two F[mb] events
if successful; a single R*[once] event otherwise.
The line
instructions RMW[{'once,'acquire,'release}]
in the .bell file seems to be effectively redundant (perhaps a LISA
backward-compatibility?): consider
$ cat rmw.litmus
C rmw

{}

P0(atomic_t *x)
{
	int r0;

	r0 = atomic_inc_return_release(x);
}

exists (x=1)
Some experiments:
- Upon removing 'release from "(instructions) RMW"
$ herd7 -conf linux-kernel.cfg rmw.litmus
Test rmw Allowed
States 1
[x]=1;
Ok
Witnesses
Positive: 1 Negative: 0
Condition exists ([x]=1)
Observation rmw Always 1 0
Time rmw 0.00
Hash=3a2dd354c250206d993d31f05f3f595c
- Upon restoring 'release in RMW and removing it from W
$ herd7 -conf linux-kernel.cfg rmw.litmus
Test rmw Allowed
States 1
[x]=1;
Ok
Witnesses
Positive: 1 Negative: 0
Condition exists ([x]=1)
Observation rmw Always 1 0
Time rmw 0.00
Hash=3a2dd354c250206d993d31f05f3f595c
- Upon removing 'release from both W and RMW
$ herd7 -conf linux-kernel.cfg rmw.litmus (herd complains... )
File "./linux-kernel.bell", line 76, characters 32-39: unbound var: Release
But I'd have to defer to Luc/Jade about herd internals and/or recommended style
on this matter.
Andrea