[PATCH memory-model 2/4] Documentation/litmus-tests: Demonstrate unordered failing cmpxchg

Posted by Paul E. McKenney 1 year, 7 months ago
This commit adds four litmus tests showing that a failing cmpxchg()
operation is unordered unless followed by an smp_mb__after_atomic()
operation.
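
For example (an illustrative sketch only, not part of this patch, and
with made-up variable names), code that needs a failed cmpxchg() to be
ordered before a later read has to supply that ordering explicitly:

	r1 = cmpxchg(&y, 0, 1);
	if (r1 != 0) {
		/*
		 * The cmpxchg() failed, and a failing cmpxchg() does
		 * not order the later READ_ONCE() on its own, hence
		 * the explicit barrier.
		 */
		smp_mb__after_atomic();
		r2 = READ_ONCE(x);
	}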

Suggested-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Will Deacon <will@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Jade Alglave <j.alglave@ucl.ac.uk>
Cc: Luc Maranget <luc.maranget@inria.fr>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Akira Yokosawa <akiyks@gmail.com>
Cc: Daniel Lustig <dlustig@nvidia.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: <linux-arch@vger.kernel.org>
Cc: <linux-doc@vger.kernel.org>
---
 Documentation/litmus-tests/README             | 16 +++++++++
 .../atomic/cmpxchg-fail-ordered-1.litmus      | 34 +++++++++++++++++++
 .../atomic/cmpxchg-fail-ordered-2.litmus      | 30 ++++++++++++++++
 .../atomic/cmpxchg-fail-unordered-1.litmus    | 33 ++++++++++++++++++
 .../atomic/cmpxchg-fail-unordered-2.litmus    | 30 ++++++++++++++++
 5 files changed, 143 insertions(+)
 create mode 100644 Documentation/litmus-tests/atomic/cmpxchg-fail-ordered-1.litmus
 create mode 100644 Documentation/litmus-tests/atomic/cmpxchg-fail-ordered-2.litmus
 create mode 100644 Documentation/litmus-tests/atomic/cmpxchg-fail-unordered-1.litmus
 create mode 100644 Documentation/litmus-tests/atomic/cmpxchg-fail-unordered-2.litmus

diff --git a/Documentation/litmus-tests/README b/Documentation/litmus-tests/README
index 26ca56df02125..6c666f3422ea3 100644
--- a/Documentation/litmus-tests/README
+++ b/Documentation/litmus-tests/README
@@ -21,6 +21,22 @@ Atomic-RMW-ops-are-atomic-WRT-atomic_set.litmus
     Test that atomic_set() cannot break the atomicity of atomic RMWs.
     NOTE: Require herd7 7.56 or later which supports "(void)expr".
 
+cmpxchg-fail-ordered-1.litmus
+    Demonstrate that a failing cmpxchg() operation acts as a full barrier
+    when followed by smp_mb__after_atomic().
+
+cmpxchg-fail-ordered-2.litmus
+    Demonstrate that a failing cmpxchg() operation acts as an acquire
+    operation when followed by smp_mb__after_atomic().
+
+cmpxchg-fail-unordered-1.litmus
+    Demonstrate that a failing cmpxchg() operation does not act as a
+    full barrier.
+
+cmpxchg-fail-unordered-2.litmus
+    Demonstrate that a failing cmpxchg() operation does not act as an
+    acquire operation.
+
 
 locking (/locking directory)
 ----------------------------
diff --git a/Documentation/litmus-tests/atomic/cmpxchg-fail-ordered-1.litmus b/Documentation/litmus-tests/atomic/cmpxchg-fail-ordered-1.litmus
new file mode 100644
index 0000000000000..3df1d140b189b
--- /dev/null
+++ b/Documentation/litmus-tests/atomic/cmpxchg-fail-ordered-1.litmus
@@ -0,0 +1,34 @@
+C cmpxchg-fail-ordered-1
+
+(*
+ * Result: Never
+ *
+ * Demonstrate that a failing cmpxchg() operation will act as a full
+ * barrier when followed by smp_mb__after_atomic().
+ *)
+
+{}
+
+P0(int *x, int *y, int *z)
+{
+	int r0;
+	int r1;
+
+	WRITE_ONCE(*x, 1);
+	r1 = cmpxchg(z, 1, 0);
+	smp_mb__after_atomic();
+	r0 = READ_ONCE(*y);
+}
+
+P1(int *x, int *y, int *z)
+{
+	int r0;
+	int r1;
+
+	WRITE_ONCE(*y, 1);
+	r1 = cmpxchg(z, 1, 0);
+	smp_mb__after_atomic();
+	r0 = READ_ONCE(*x);
+}
+
+locations [0:r1;1:r1]
+exists (0:r0=0 /\ 1:r0=0)
diff --git a/Documentation/litmus-tests/atomic/cmpxchg-fail-ordered-2.litmus b/Documentation/litmus-tests/atomic/cmpxchg-fail-ordered-2.litmus
new file mode 100644
index 0000000000000..54146044a16f6
--- /dev/null
+++ b/Documentation/litmus-tests/atomic/cmpxchg-fail-ordered-2.litmus
@@ -0,0 +1,30 @@
+C cmpxchg-fail-ordered-2
+
+(*
+ * Result: Never
+ *
+ * Demonstrate use of smp_mb__after_atomic() to make a failing cmpxchg
+ * operation have acquire ordering.
+ *)
+
+{}
+
+P0(int *x, int *y)
+{
+	int r1;
+
+	WRITE_ONCE(*x, 1);
+	r1 = cmpxchg(y, 0, 1);
+}
+
+P1(int *x, int *y)
+{
+	int r1;
+	int r2;
+
+	r1 = cmpxchg(y, 0, 1);
+	smp_mb__after_atomic();
+	r2 = READ_ONCE(*x);
+}
+
+exists (0:r1=0 /\ 1:r1=1 /\ 1:r2=0)
diff --git a/Documentation/litmus-tests/atomic/cmpxchg-fail-unordered-1.litmus b/Documentation/litmus-tests/atomic/cmpxchg-fail-unordered-1.litmus
new file mode 100644
index 0000000000000..a727ce23b1a6e
--- /dev/null
+++ b/Documentation/litmus-tests/atomic/cmpxchg-fail-unordered-1.litmus
@@ -0,0 +1,33 @@
+C cmpxchg-fail-unordered-1
+
+(*
+ * Result: Sometimes
+ *
+ * Demonstrate that a failing cmpxchg() operation does not act as a
+ * full barrier.  (In contrast, a successful cmpxchg() does act as a
+ * full barrier.)
+ *)
+
+{}
+
+P0(int *x, int *y, int *z)
+{
+	int r0;
+	int r1;
+
+	WRITE_ONCE(*x, 1);
+	r1 = cmpxchg(z, 1, 0);
+	r0 = READ_ONCE(*y);
+}
+
+P1(int *x, int *y, int *z)
+{
+	int r0;
+	int r1;
+
+	WRITE_ONCE(*y, 1);
+	r1 = cmpxchg(z, 1, 0);
+	r0 = READ_ONCE(*x);
+}
+
+locations [0:r1;1:r1]
+exists (0:r0=0 /\ 1:r0=0)
diff --git a/Documentation/litmus-tests/atomic/cmpxchg-fail-unordered-2.litmus b/Documentation/litmus-tests/atomic/cmpxchg-fail-unordered-2.litmus
new file mode 100644
index 0000000000000..a245bac55b578
--- /dev/null
+++ b/Documentation/litmus-tests/atomic/cmpxchg-fail-unordered-2.litmus
@@ -0,0 +1,30 @@
+C cmpxchg-fail-unordered-2
+
+(*
+ * Result: Sometimes
+ *
+ * Demonstrate that a failing cmpxchg() operation does not act as either
+ * an acquire or a release operation.  (In contrast, a successful cmpxchg()
+ * does act as both an acquire and a release operation.)
+ *)
+
+{}
+
+P0(int *x, int *y)
+{
+	int r1;
+
+	WRITE_ONCE(*x, 1);
+	r1 = cmpxchg(y, 0, 1);
+}
+
+P1(int *x, int *y)
+{
+	int r1;
+	int r2;
+
+	r1 = cmpxchg(y, 0, 1);
+	r2 = READ_ONCE(*x);
+}
+
+exists (0:r1=0 /\ 1:r1=1 /\ 1:r2=0)
-- 
2.40.1
Re: [PATCH memory-model 2/4] Documentation/litmus-tests: Demonstrate unordered failing cmpxchg
Posted by Jonas Oberhauser 1 year, 7 months ago

Am 5/2/2024 um 1:21 AM schrieb Paul E. McKenney:
> This commit adds four litmus tests showing that a failing cmpxchg()
> operation is unordered unless followed by an smp_mb__after_atomic()
> operation.

So far, my understanding was that all RMW operations without suffix 
(xchg(), cmpxchg(), ...) will be interpreted as F[Mb];...;F[Mb].

I guess this shows again how important it is to model these full 
barriers explicitly inside the cat model, instead of relying on implicit 
conversions internal to herd.

I'd like to propose a patch to this effect.

What is the intended behavior of a failed cmpxchg()? Is it the same as a 
relaxed one?

My suggestion would be in the direction of marking read and write events 
of these operations as Mb, and then defining

(* full barrier events that appear in non-failing RMW *)
let RMW_MB = Mb & (dom(rmw) | range(rmw))


let mb =
     [M] ; fencerel(Mb) ; [M]
   | [M] ; (po \ si ; rmw) ; [RMW_MB] ; po^? ; [M]
   | [M] ; po^? ; [RMW_MB] ; (po \ rmw ; si) ; [M]
   | ...

The po \ si;rmw is because ordering is not provided internally of the 
rmw, although I suspect that after we added release sequences it could 
perhaps be simplified to


let mb =
     [M] ; fencerel(Mb) ; [M]
   | [M] ; po ; [RMW_MB] ; po^? ; [M]
   | [M] ; po^? ; [RMW_MB] ; po ; [M]
   | ...

or

let mb =
     [M] ; fencerel(Mb) ; [M]
   | [M] ; po & (po^? ; [RMW_MB] ; po^?) ; [M]
   | ...

(the po & is necessary to avoid trivial hb cycles of an RMW event 
happening before itself)


Any interest?

Have fun,
jonas
Re: [PATCH memory-model 2/4] Documentation/litmus-tests: Demonstrate unordered failing cmpxchg
Posted by Jonas Oberhauser 1 year, 7 months ago

Am 5/6/2024 um 12:05 PM schrieb Jonas Oberhauser:
> 
> 
> Am 5/2/2024 um 1:21 AM schrieb Paul E. McKenney:
>> This commit adds four litmus tests showing that a failing cmpxchg()
>> operation is unordered unless followed by an smp_mb__after_atomic()
>> operation.
> 
> So far, my understanding was that all RMW operations without suffix 
> (xchg(), cmpxchg(), ...) will be interpreted as F[Mb];...;F[Mb].
> 
> I guess this shows again how important it is to model these full 
> barriers explicitly inside the cat model, instead of relying on implicit 
> conversions internal to herd.
> 
> I'd like to propose a patch to this effect.
> 
> What is the intended behavior of a failed cmpxchg()? Is it the same as a 
> relaxed one?
> 
> My suggestion would be in the direction of marking read and write events 
> of these operations as Mb, and then defining
> 
> (* full barrier events that appear in non-failing RMW *)
> let RMW_MB = Mb & (dom(rmw) | range(rmw))
> 
> 
> let mb =
>      [M] ; fencerel(Mb) ; [M]
>    | [M] ; (po \ rmw) ; [RMW_MB] ; po^? ; [M]
>    | [M] ; po^? ; [RMW_MB] ; (po \ rmw) ; [M]
>    | ...
> 
> The po \ rmw is because ordering is not provided internally of the 
> rmw

(removed the unnecessary si since LKMM is still non-mixed-accesses)


This could also be written with a single rule:

      | [M] ; (po \ rmw) & (po^?; [RMW_MB] ; po^?) ; [M]


> I suspect that after we added [rmw] sequences it could 
> perhaps be simplified [...]
> 

No, my suspicion is wrong - this would incorrectly let full-barrier RMWs
act like strong fences when they appear in an rmw sequence.

  if (z==1)  ||  x = 2;     ||  xchg(&y,2)  || if (y==2)
    x = 1;   ||  y =_rel 1; ||              ||    z=1;


right now, we allow x=2 overwriting x=1 (in case the last thread does 
not propagate x=2 along with z=1) because on power, the xchg might be
implemented with a sync that doesn't get executed until the very end
of the program run.


Instead of its negative form (everything other than inside the rmw),
it could also be rewritten positively. Here's a somewhat short form:

let mb =
      [M] ; fencerel(Mb) ; [M]
    (* everything across a full barrier RMW is ordered. This includes up 
to one event inside the RMW. *)
    | [M] ; po ; [RMW_MB] ; po ; [M]
    (* full barrier RMW writes are ordered with everything behind the RMW *)
    | [W & RMW_MB] ; po ; [M]
    (* full barrier RMW reads are ordered with everything before the RMW *)
    | [M] ; po ; [R & RMW_MB]
    | ...



Good luck,
jonas

Re: [PATCH memory-model 2/4] Documentation/litmus-tests: Demonstrate unordered failing cmpxchg
Posted by Paul E. McKenney 1 year, 7 months ago
On Mon, May 06, 2024 at 06:30:45PM +0200, Jonas Oberhauser wrote:
> Am 5/6/2024 um 12:05 PM schrieb Jonas Oberhauser:
> > Am 5/2/2024 um 1:21 AM schrieb Paul E. McKenney:
> > > This commit adds four litmus tests showing that a failing cmpxchg()
> > > operation is unordered unless followed by an smp_mb__after_atomic()
> > > operation.
> > 
> > So far, my understanding was that all RMW operations without suffix
> > (xchg(), cmpxchg(), ...) will be interpreted as F[Mb];...;F[Mb].
> > 
> > I guess this shows again how important it is to model these full
> > barriers explicitly inside the cat model, instead of relying on implicit
> > conversions internal to herd.
> > 
> > I'd like to propose a patch to this effect.
> > 
> > What is the intended behavior of a failed cmpxchg()? Is it the same as a
> > relaxed one?

Yes, and unless I am too confused, LKMM currently does implement this.
Please let me know if I am missing something.

> > My suggestion would be in the direction of marking read and write events
> > of these operations as Mb, and then defining
> > 
> > (* full barrier events that appear in non-failing RMW *)
> > let RMW_MB = Mb & (dom(rmw) | range(rmw))
> > 
> > 
> > let mb =
> >      [M] ; fencerel(Mb) ; [M]
> >    | [M] ; (po \ rmw) ; [RMW_MB] ; po^? ; [M]
> >    | [M] ; po^? ; [RMW_MB] ; (po \ rmw) ; [M]
> >    | ...
> > 
> > The po \ rmw is because ordering is not provided internally of the rmw
> 
> (removed the unnecessary si since LKMM is still non-mixed-accesses)

Addition of mixed-access support would be quite welcome!

> This could also be written with a single rule:
> 
>      | [M] ; (po \ rmw) & (po^?; [RMW_MB] ; po^?) ; [M]
> 
> > I suspect that after we added [rmw] sequences it could perhaps be
> > simplified [...]
> 
> No, my suspicion is wrong - this would incorrectly let full-barrier RMWs
> act like strong fences when they appear in an rmw sequence.
> 
>  if (z==1)  ||  x = 2;     ||  xchg(&y,2)  || if (y==2)
>    x = 1;   ||  y =_rel 1; ||              ||    z=1;
> 
> 
> right now, we allow x=2 overwriting x=1 (in case the last thread does not
> propagate x=2 along with z=1) because on power, the xchg might be
> implemented with a sync that doesn't get executed until the very end
> of the program run.
> 
> 
> Instead of its negative form (everything other than inside the rmw),
> it could also be rewritten positively. Here's a somewhat short form:
> 
> let mb =
>      [M] ; fencerel(Mb) ; [M]
>    (* everything across a full barrier RMW is ordered. This includes up to
> one event inside the RMW. *)
>    | [M] ; po ; [RMW_MB] ; po ; [M]
>    (* full barrier RMW writes are ordered with everything behind the RMW *)
>    | [W & RMW_MB] ; po ; [M]
>    (* full barrier RMW reads are ordered with everything before the RMW *)
>    | [M] ; po ; [R & RMW_MB]
>    | ...

Does this produce the results expected by the litmus tests in the Linux
kernel source tree and also those at https://github.com/paulmckrcu/litmus?

							Thanx, Paul
Re: [PATCH memory-model 2/4] Documentation/litmus-tests: Demonstrate unordered failing cmpxchg
Posted by Hernan Ponce de Leon 1 year, 7 months ago
On 5/6/2024 8:00 PM, Paul E. McKenney wrote:
> On Mon, May 06, 2024 at 06:30:45PM +0200, Jonas Oberhauser wrote:
>> Am 5/6/2024 um 12:05 PM schrieb Jonas Oberhauser:
>>> Am 5/2/2024 um 1:21 AM schrieb Paul E. McKenney:
>>>> This commit adds four litmus tests showing that a failing cmpxchg()
>>>> operation is unordered unless followed by an smp_mb__after_atomic()
>>>> operation.
>>>
>>> So far, my understanding was that all RMW operations without suffix
>>> (xchg(), cmpxchg(), ...) will be interpreted as F[Mb];...;F[Mb].
>>>
>>> I guess this shows again how important it is to model these full
>>> barriers explicitly inside the cat model, instead of relying on implicit
>>> conversions internal to herd.
>>>
>>> I'd like to propose a patch to this effect.
>>>
>>> What is the intended behavior of a failed cmpxchg()? Is it the same as a
>>> relaxed one?
> 
> Yes, and unless I am too confused, LKMM currently does implement this.
> Please let me know if I am missing something.
> 
>>> My suggestion would be in the direction of marking read and write events
>>> of these operations as Mb, and then defining
>>>
>>> (* full barrier events that appear in non-failing RMW *)
>>> let RMW_MB = Mb & (dom(rmw) | range(rmw))
>>>
>>>
>>> let mb =
>>>       [M] ; fencerel(Mb) ; [M]
>>>     | [M] ; (po \ rmw) ; [RMW_MB] ; po^? ; [M]
>>>     | [M] ; po^? ; [RMW_MB] ; (po \ rmw) ; [M]
>>>     | ...
>>>
>>> The po \ rmw is because ordering is not provided internally of the rmw
>>
>> (removed the unnecessary si since LKMM is still non-mixed-accesses)
> 
> Addition of mixed-access support would be quite welcome!
> 
>> This could also be written with a single rule:
>>
>>       | [M] ; (po \ rmw) & (po^?; [RMW_MB] ; po^?) ; [M]
>>
>>> I suspect that after we added [rmw] sequences it could perhaps be
>>> simplified [...]
>>
>> No, my suspicion is wrong - this would incorrectly let full-barrier RMWs
>> act like strong fences when they appear in an rmw sequence.
>>
>>   if (z==1)  ||  x = 2;     ||  xchg(&y,2)  || if (y==2)
>>     x = 1;   ||  y =_rel 1; ||              ||    z=1;
>>
>>
>> right now, we allow x=2 overwriting x=1 (in case the last thread does not
>> propagate x=2 along with z=1) because on power, the xchg might be
>> implemented with a sync that doesn't get executed until the very end
>> of the program run.
>>
>>
>> Instead of its negative form (everything other than inside the rmw),
>> it could also be rewritten positively. Here's a somewhat short form:
>>
>> let mb =
>>       [M] ; fencerel(Mb) ; [M]
>>     (* everything across a full barrier RMW is ordered. This includes up to
>> one event inside the RMW. *)
>>     | [M] ; po ; [RMW_MB] ; po ; [M]
>>     (* full barrier RMW writes are ordered with everything behind the RMW *)
>>     | [W & RMW_MB] ; po ; [M]
>>     (* full barrier RMW reads are ordered with everything before the RMW *)
>>     | [M] ; po ; [R & RMW_MB]
>>     | ...
> 
> Does this produce the results expected by the litmus tests in the Linux
> kernel source tree and also those at https://github.com/paulmckrcu/litmus?
> 
> 							Thanx, Paul

I implemented the changes proposed by Jonas in the dartagnan tool (i.e. 
changing the mb definition in the cat model and removing the fences that 
were added programmatically).

I ran this using the ~5K litmus tests I have (it should include 
everything from the source tree + the non-LISA ones from your repo). I 
also checked with the version of qspinlock discussed in [1].

I do get the expected results.

Hernan

[1] https://lkml.org/lkml/2022/8/26/597

Re: [PATCH memory-model 2/4] Documentation/litmus-tests: Demonstrate unordered failing cmpxchg
Posted by Paul E. McKenney 1 year, 7 months ago
On Wed, May 15, 2024 at 08:44:30AM +0200, Hernan Ponce de Leon wrote:
> On 5/6/2024 8:00 PM, Paul E. McKenney wrote:
> > On Mon, May 06, 2024 at 06:30:45PM +0200, Jonas Oberhauser wrote:
> > > Am 5/6/2024 um 12:05 PM schrieb Jonas Oberhauser:
> > > > Am 5/2/2024 um 1:21 AM schrieb Paul E. McKenney:
> > > > > This commit adds four litmus tests showing that a failing cmpxchg()
> > > > > operation is unordered unless followed by an smp_mb__after_atomic()
> > > > > operation.
> > > > 
> > > > So far, my understanding was that all RMW operations without suffix
> > > > (xchg(), cmpxchg(), ...) will be interpreted as F[Mb];...;F[Mb].
> > > > 
> > > > I guess this shows again how important it is to model these full
> > > > barriers explicitly inside the cat model, instead of relying on implicit
> > > > conversions internal to herd.
> > > > 
> > > > I'd like to propose a patch to this effect.
> > > > 
> > > > What is the intended behavior of a failed cmpxchg()? Is it the same as a
> > > > relaxed one?
> > 
> > Yes, and unless I am too confused, LKMM currently does implement this.
> > Please let me know if I am missing something.
> > 
> > > > My suggestion would be in the direction of marking read and write events
> > > > of these operations as Mb, and then defining
> > > > 
> > > > (* full barrier events that appear in non-failing RMW *)
> > > > let RMW_MB = Mb & (dom(rmw) | range(rmw))
> > > > 
> > > > 
> > > > let mb =
> > > >       [M] ; fencerel(Mb) ; [M]
> > > >     | [M] ; (po \ rmw) ; [RMW_MB] ; po^? ; [M]
> > > >     | [M] ; po^? ; [RMW_MB] ; (po \ rmw) ; [M]
> > > >     | ...
> > > > 
> > > > The po \ rmw is because ordering is not provided internally of the rmw
> > > 
> > > (removed the unnecessary si since LKMM is still non-mixed-accesses)
> > 
> > Addition of mixed-access support would be quite welcome!
> > 
> > > This could also be written with a single rule:
> > > 
> > >       | [M] ; (po \ rmw) & (po^?; [RMW_MB] ; po^?) ; [M]
> > > 
> > > > I suspect that after we added [rmw] sequences it could perhaps be
> > > > simplified [...]
> > > 
> > > No, my suspicion is wrong - this would incorrectly let full-barrier RMWs
> > > act like strong fences when they appear in an rmw sequence.
> > > 
> > >   if (z==1)  ||  x = 2;     ||  xchg(&y,2)  || if (y==2)
> > >     x = 1;   ||  y =_rel 1; ||              ||    z=1;
> > > 
> > > 
> > > right now, we allow x=2 overwriting x=1 (in case the last thread does not
> > > propagate x=2 along with z=1) because on power, the xchg might be
> > > implemented with a sync that doesn't get executed until the very end
> > > of the program run.
> > > 
> > > 
> > > Instead of its negative form (everything other than inside the rmw),
> > > it could also be rewritten positively. Here's a somewhat short form:
> > > 
> > > let mb =
> > >       [M] ; fencerel(Mb) ; [M]
> > >     (* everything across a full barrier RMW is ordered. This includes up to
> > > one event inside the RMW. *)
> > >     | [M] ; po ; [RMW_MB] ; po ; [M]
> > >     (* full barrier RMW writes are ordered with everything behind the RMW *)
> > >     | [W & RMW_MB] ; po ; [M]
> > >     (* full barrier RMW reads are ordered with everything before the RMW *)
> > >     | [M] ; po ; [R & RMW_MB]
> > >     | ...
> > 
> > Does this produce the results expected by the litmus tests in the Linux
> > kernel source tree and also those at https://github.com/paulmckrcu/litmus?
> > 
> > 							Thanx, Paul
> 
> I implemented the changes proposed by Jonas in the dartagnan tool (i.e.
> changing the mb definition in the cat model and removing the fences that
> were added programmatically).
> 
> I ran this using the ~5K litmus tests I have (it should include everything
> from the source tree + the non-LISA ones from your repo). I also checked
> with the version of qspinlock discussed in [1].
> 
> I do get the expected results.

Thank you very much, Hernan!!!  And good show, Puranjay!!!

							Thanx, Paul

> Hernan
> 
> [1] https://lkml.org/lkml/2022/8/26/597
> 
Re: [PATCH memory-model 2/4] Documentation/litmus-tests: Demonstrate unordered failing cmpxchg
Posted by Jonas Oberhauser 1 year, 7 months ago

Am 5/6/2024 um 8:00 PM schrieb Paul E. McKenney:
> On Mon, May 06, 2024 at 06:30:45PM +0200, Jonas Oberhauser wrote:
>> Am 5/6/2024 um 12:05 PM schrieb Jonas Oberhauser:
>>> Am 5/2/2024 um 1:21 AM schrieb Paul E. McKenney:
>>>> This commit adds four litmus tests showing that a failing cmpxchg()
>>>> operation is unordered unless followed by an smp_mb__after_atomic()
>>>> operation.
>>>
>>> So far, my understanding was that all RMW operations without suffix
>>> (xchg(), cmpxchg(), ...) will be interpreted as F[Mb];...;F[Mb].
>>>
>>> I guess this shows again how important it is to model these full
>>> barriers explicitly inside the cat model, instead of relying on implicit
>>> conversions internal to herd.
>>>
>>> I'd like to propose a patch to this effect.
>>>
>>> What is the intended behavior of a failed cmpxchg()? Is it the same as a
>>> relaxed one?
> 
> Yes, and unless I am too confused, LKMM currently does implement this.
> Please let me know if I am missing something.

At least the herd and Dat3M implementations seem to be doing that,
according to this thread sent to me by Hernan.

https://github.com/herd/herdtools7/issues/384#issue-1243049709


> 
>>> My suggestion would be in the direction of marking read and write events
>>> of these operations as Mb, and then defining
>>>
>>> (* full barrier events that appear in non-failing RMW *)
>>> let RMW_MB = Mb & (dom(rmw) | range(rmw))
>>>
>>>
>>> let mb =
>>>       [M] ; fencerel(Mb) ; [M]
>>>     | [M] ; (po \ rmw) ; [RMW_MB] ; po^? ; [M]
>>>     | [M] ; po^? ; [RMW_MB] ; (po \ rmw) ; [M]
>>>     | ...
>>>
>>> The po \ rmw is because ordering is not provided internally of the rmw
>>
>> (removed the unnecessary si since LKMM is still non-mixed-accesses)
> 
> Addition of mixed-access support would be quite welcome!

:P


>> This could also be written with a single rule:
>>
>>       | [M] ; (po \ rmw) & (po^?; [RMW_MB] ; po^?) ; [M]
>>
>>> I suspect that after we added [rmw] sequences it could perhaps be
>>> simplified [...]
>>
>> No, my suspicion is wrong - this would incorrectly let full-barrier RMWs
>> act like strong fences when they appear in an rmw sequence.
>>
>>   if (z==1)  ||  x = 2;     ||  xchg(&y,2)  || if (y==2)
>>     x = 1;   ||  y =_rel 1; ||              ||    z=1;
>>
>>
>> right now, we allow x=2 overwriting x=1 (in case the last thread does not
>> propagate x=2 along with z=1) because on power, the xchg might be
>> implemented with a sync that doesn't get executed until the very end
>> of the program run.
>>
>>
>> Instead of its negative form (everything other than inside the rmw),
>> it could also be rewritten positively. Here's a somewhat short form:
>>
>> let mb =
>>       [M] ; fencerel(Mb) ; [M]
>>     (* everything across a full barrier RMW is ordered. This includes up to
>> one event inside the RMW. *)
>>     | [M] ; po ; [RMW_MB] ; po ; [M]
>>     (* full barrier RMW writes are ordered with everything behind the RMW *)
>>     | [W & RMW_MB] ; po ; [M]
>>     (* full barrier RMW reads are ordered with everything before the RMW *)
>>     | [M] ; po ; [R & RMW_MB]
>>     | ...
> 
> Does this produce the results expected by the litmus tests in the Linux
> kernel source tree and also those at https://github.com/paulmckrcu/litmus?

I suspect that it doesn't work out of the box because of some of the 
implicit magic herd is doing that could get in the way, so I'd need some 
help from Luc to actually turn this into a patch that can be tested.
(or at least confirmation that just by changing a few things in the .def 
& .bell files we can sidestep the implicit behaviors).
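
For reference, once such a patch exists, an individual litmus test can
be checked against the modified model with something along these lines,
run from tools/memory-model/ (paths are illustrative):

  herd7 -conf linux-kernel.cfg \
      ../../Documentation/litmus-tests/atomic/cmpxchg-fail-unordered-2.litmus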

But at least in my proofs it seems to be equivalent.
(there may still be differences in opinion on what some herd things 
mean, so what I/Viktor have formalized as the semantics of the herd 
model may not be exactly the behavior of LKMM in herd. hence testing is 
necessary too as a sanity check)

best wishes,
   jonas

Re: [PATCH memory-model 2/4] Documentation/litmus-tests: Demonstrate unordered failing cmpxchg
Posted by Alan Stern 1 year, 7 months ago
On Mon, May 06, 2024 at 11:00:42AM -0700, Paul E. McKenney wrote:
> On Mon, May 06, 2024 at 06:30:45PM +0200, Jonas Oberhauser wrote:
> > Am 5/6/2024 um 12:05 PM schrieb Jonas Oberhauser:
> > > Am 5/2/2024 um 1:21 AM schrieb Paul E. McKenney:
> > > > This commit adds four litmus tests showing that a failing cmpxchg()
> > > > operation is unordered unless followed by an smp_mb__after_atomic()
> > > > operation.
> > > 
> > > So far, my understanding was that all RMW operations without suffix
> > > (xchg(), cmpxchg(), ...) will be interpreted as F[Mb];...;F[Mb].

It's more accurate to say that RMW operations without a suffix that 
return a value will be interpreted that way.  So for example, 
atomic_inc() doesn't imply any ordering, because it doesn't return a 
value.
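
For instance, in the following sketch (illustrative only, not something
taken from the LKMM test suite), the outcome in which both reads return
zero is allowed precisely because atomic_inc() provides no ordering;
replacing the atomic_inc() calls with, say, r1 = atomic_inc_return(u)
would change the result to Never:

C atomic-inc-no-ordering-sketch

{}

P0(int *x, int *y, atomic_t *u)
{
	int r0;

	WRITE_ONCE(*x, 1);
	atomic_inc(u);
	r0 = READ_ONCE(*y);
}

P1(int *x, int *y, atomic_t *u)
{
	int r0;

	WRITE_ONCE(*y, 1);
	atomic_inc(u);
	r0 = READ_ONCE(*x);
}

exists (0:r0=0 /\ 1:r0=0)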

> > > barriers explicitly inside the cat model, instead of relying on implicit
> > > conversions internal to herd.

Don't the annotations in linux-kernel.def and linux-kernel.bell (like 
"noreturn") already make this explicit?  

I guess the part that is still implicit is that herd7 doesn't regard 
failed RMW operations as actual RMWs (they don't have a store part).

Alan
Re: [PATCH memory-model 2/4] Documentation/litmus-tests: Demonstrate unordered failing cmpxchg
Posted by Jonas Oberhauser 1 year, 7 months ago

Am 5/6/2024 um 9:21 PM schrieb Alan Stern:
> On Mon, May 06, 2024 at 11:00:42AM -0700, Paul E. McKenney wrote:
>> On Mon, May 06, 2024 at 06:30:45PM +0200, Jonas Oberhauser wrote:
>>> Am 5/6/2024 um 12:05 PM schrieb Jonas Oberhauser:
>>>> Am 5/2/2024 um 1:21 AM schrieb Paul E. McKenney:
>>>>> This commit adds four litmus tests showing that a failing cmpxchg()
>>>>> operation is unordered unless followed by an smp_mb__after_atomic()
>>>>> operation.
>>>>
>>>> So far, my understanding was that all RMW operations without suffix
>>>> (xchg(), cmpxchg(), ...) will be interpreted as F[Mb];...;F[Mb].
> 
> It's more accurate to say that RMW operations without a suffix that
> return a value will be interpreted that way.  So for example,
> atomic_inc() doesn't imply any ordering, because it doesn't return a
> value.
> 

I see, thanks.

>>>> barriers explicitly inside the cat model, instead of relying on implicit
>>>> conversions internal to herd.
> 
> Don't the annotations in linux-kernel.def and linux-kernel.bell (like
> "noreturn") already make this explicit?

Not that I'm aware. All I can see there is that according to .bell RMW 
don't have an mb mode, but according to .def they do.

How this mb disappears between parsing the code (.def) and interpreting 
it (.bell) is totally implicit. Including how noreturn affects this 
disappearance.

In fact most tool developers that support LKMM (Viktor, Hernan, and Luc) 
were at least once confused about it. And I think they read those files 
more carefully than I.

https://github.com/herd/herdtools7/issues/384#issuecomment-1132859904

Note that while there's no explicit annotation of noreturn in the .def 
file, I can at least guess from context that it should be annotated on 
all the functions that don't have _return and for which a _return 
version also exists.


have fun,
   jonas
Re: [PATCH memory-model 2/4] Documentation/litmus-tests: Demonstrate unordered failing cmpxchg
Posted by Andrea Parri 1 year, 7 months ago
> > Don't the annotations in linux-kernel.def and linux-kernel.bell (like
> > "noreturn") already make this explicit?
> 
> Not that I'm aware. All I can see there is that according to .bell RMW don't
> have an mb mode, but according to .def they do.
> 
> How this mb disappears between parsing the code (.def) and interpreting it
> (.bell) is totally implicit. Including how noreturn affects this
> disappearance.

IIRC, that's more or less implicit  ;-) in the herd macros of the .def file;
for example,


  (noreturn)

  - atomic_add(V,X) { __atomic_op(X,+,V); }

    Generates a pair of R*[noreturn] and W*[once] events


  (w/ return)

  - atomic_add_return_relaxed(V,X) __atomic_op_return{once}(X,+,V)

    Generates a pair of R*[once] and W*[once] events

  - atomic_add_return_acquire(V,X) __atomic_op_return{acquire}(X,+,V)

    Generates a pair of R*[acquire] and W*[once] events

  - atomic_add_return_release(V,X) __atomic_op_return{release}(X,+,V)

    Generates a pair of R*[once] and W*[release] events

  - atomic_add_return(V,X) __atomic_op_return{mb}(X,+,V)

    Generates a pair of R*[once] and W*[once] events plus two F[mb] events


  (possibly failing)

  - atomic_cmpxchg_relaxed(X,V,W) __cmpxchg{once}(X,V,W)

    Generates a pair of R*[once] and W*[once] events if successful;
    a single R*[once] event otherwise.

  - atomic_cmpxchg_acquire(X,V,W) __cmpxchg{acquire}(X,V,W)

    Generates a pair of R*[acquire] and W*[once] events if successful;
    a single R*[once] event otherwise.

  - atomic_cmpxchg_release(X,V,W) __cmpxchg{release}(X,V,W)

    Generates a pair of R*[once] and W*[release] events if successful;
    a single R*[once] event otherwise.

  - atomic_cmpxchg(X,V,W) __cmpxchg{mb}(X,V,W)

    Generates a pair of R*[once] and W*[once] events plus two F[mb] events
    if successful; a single R*[once] event otherwise.


The line

  instructions RMW[{'once,'acquire,'release}]

in the .bell file seems to be effectively redundant (perhaps for LISA
backward-compatibility?): consider

  $ cat rmw.litmus
  C rmw
    
  {}
    
  P0(atomic_t *x)
  {
	int r0;

	r0 = atomic_inc_return_release(x);
  }

  exists (x=1)

Some experiments:

  - Upon removing 'release from "(instructions) RMW"

    $ herd7 -conf linux-kernel.cfg rmw.litmus 
    Test rmw Allowed
    States 1
    [x]=1;
    Ok
    Witnesses
    Positive: 1 Negative: 0
    Condition exists ([x]=1)
    Observation rmw Always 1 0
    Time rmw 0.00
    Hash=3a2dd354c250206d993d31f05f3f595c

  - Upon restoring 'release in RMW and removing it from W

    $ herd7 -conf linux-kernel.cfg rmw.litmus 
    Test rmw Allowed
    States 1
    [x]=1;
    Ok
    Witnesses
    Positive: 1 Negative: 0
    Condition exists ([x]=1)
    Observation rmw Always 1 0
    Time rmw 0.00
    Hash=3a2dd354c250206d993d31f05f3f595c

  - Upon removing 'release from both W and RMW

    $ herd7 -conf linux-kernel.cfg rmw.litmus (herd complains... )
    File "./linux-kernel.bell", line 76, characters 32-39: unbound var: Release

But I'd have to defer to Luc/Jade about herd internals and/or recommended style
on this matter.

  Andrea