[PATCH] tools: memory-model: Add rmw-sequences to the LKMM

Alan Stern posted 1 patch 3 years, 4 months ago
There is a newer version of this series
tools/memory-model/Documentation/explanation.txt |   28 +++++++++++++++++++++++
tools/memory-model/linux-kernel.cat              |    5 ++--
2 files changed, 31 insertions(+), 2 deletions(-)
[PATCH] tools: memory-model: Add rmw-sequences to the LKMM
Posted by Alan Stern 3 years, 4 months ago
Jonas has pointed out a weakness in the Linux Kernel Memory Model.
Namely, the memory ordering properties of atomic operations are not
monotonic: An atomic op with full-barrier semantics does not always
provide ordering as strong as one with release-barrier semantics.

The following litmus test illustrates the problem:

--------------------------------------------------
C atomics-not-monotonic

{}

P0(int *x, atomic_t *y)
{
	WRITE_ONCE(*x, 1);
	smp_wmb();
	atomic_set(y, 1);
}

P1(atomic_t *y)
{
	int r1;

	r1 = atomic_inc_return(y);
}

P2(int *x, atomic_t *y)
{
	int r2;
	int r3;

	r2 = atomic_read(y);
	smp_rmb();
	r3 = READ_ONCE(*x);
}

exists (2:r2=2 /\ 2:r3=0)
--------------------------------------------------

The litmus test is allowed as shown with atomic_inc_return(), which
has full-barrier semantics.  But if the operation is changed to
atomic_inc_return_release(), which only has release-barrier semantics,
the litmus test is forbidden.  Clearly this violates monotonicity.

The reason is because the LKMM treats full-barrier atomic ops as if
they were written:

	mb();
	load();
	store();
	mb();

(where the load() and store() are the two parts of an atomic RMW op),
whereas it treats release-barrier atomic ops as if they were written:

	load();
	release_barrier();
	store();

The difference is that here the release barrier orders the load part
of the atomic op before the store part with A-cumulativity, whereas
the mb()'s above do not.  This means that release-barrier atomics can
effectively extend the cumul-fence relation but full-barrier atomics
cannot.

To resolve this problem we introduce the rmw-sequence relation,
representing an arbitrarily long sequence of atomic RMW operations in
which each operation reads from the previous one, and explicitly allow
it to extend cumul-fence.  This modification of the memory model is
sound; it holds for PPC because of B-cumulativity, it holds for TSO
and ARM64 because of other-multicopy atomicity, and we can assume that
atomic ops on all other architectures will be implemented so as to
make it hold for them.

For similar reasons we also allow rmw-sequence to extend the
w-post-bounded relation, which is analogous to cumul-fence in some
ways.

Suggested-by: Jonas Oberhauser <jonas.oberhauser@huawei.com>
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>

---

 tools/memory-model/Documentation/explanation.txt |   28 +++++++++++++++++++++++
 tools/memory-model/linux-kernel.cat              |    5 ++--
 2 files changed, 31 insertions(+), 2 deletions(-)

Index: usb-devel/tools/memory-model/linux-kernel.cat
===================================================================
--- usb-devel.orig/tools/memory-model/linux-kernel.cat
+++ usb-devel/tools/memory-model/linux-kernel.cat
@@ -74,8 +74,9 @@ let ppo = to-r | to-w | fence | (po-unlo
 
 (* Propagation: Ordering from release operations and strong fences. *)
 let A-cumul(r) = (rfe ; [Marked])? ; r
+let rmw-sequence = (rf ; rmw)*
 let cumul-fence = [Marked] ; (A-cumul(strong-fence | po-rel) | wmb |
-	po-unlock-lock-po) ; [Marked]
+	po-unlock-lock-po) ; [Marked] ; rmw-sequence
 let prop = [Marked] ; (overwrite & ext)? ; cumul-fence* ;
 	[Marked] ; rfe? ; [Marked]
 
@@ -174,7 +175,7 @@ let vis = cumul-fence* ; rfe? ; [Marked]
 let w-pre-bounded = [Marked] ; (addr | fence)?
 let r-pre-bounded = [Marked] ; (addr | nonrw-fence |
 	([R4rmb] ; fencerel(Rmb) ; [~Noreturn]))?
-let w-post-bounded = fence? ; [Marked]
+let w-post-bounded = fence? ; [Marked] ; rmw-sequence
 let r-post-bounded = (nonrw-fence | ([~Noreturn] ; fencerel(Rmb) ; [R4rmb]))? ;
 	[Marked]
 
Index: usb-devel/tools/memory-model/Documentation/explanation.txt
===================================================================
--- usb-devel.orig/tools/memory-model/Documentation/explanation.txt
+++ usb-devel/tools/memory-model/Documentation/explanation.txt
@@ -1006,6 +1006,34 @@ order.  Equivalently,
 where the rmw relation links the read and write events making up each
 atomic update.  This is what the LKMM's "atomic" axiom says.
 
+Atomic rmw updates play one more role in the LKMM: They can form "rmw
+sequences".  An rmw sequence is simply a bunch of atomic updates where
+each update reads from the previous one.  Written using events, it
+looks like this:
+
+	Z0 ->rf Y1 ->rmw Z1 ->rf ... ->rf Yn ->rmw Zn,
+
+where Z0 is some store event and n can be any number (even 0, in the
+degenerate case).  We write this relation as: Z0 ->rmw-sequence Zn.
+Note that this implies Z0 and Zn are stores to the same variable.
+
+Rmw sequences have a special property in the LKMM: They can extend the
+cumul-fence relation.  That is, if we have:
+
+	U ->cumul-fence X -> rmw-sequence Y
+
+then also U ->cumul-fence Y.  Thinking about this in terms of the
+operational model, U ->cumul-fence X says that the store U propagates
+to each CPU before the store X does.  Then the fact that X and Y are
+linked by an rmw sequence means that U also propagates to each CPU
+before Y does.
+
+(The notion of rmw sequences in the LKMM is similar to, but not quite
+the same as, that of release sequences in the C11 memory model.  They
+were added to the LKMM to fix an obscure bug; without them, atomic
+updates with full-barrier semantics did not always guarantee ordering
+at least as strong as atomic updates with release-barrier semantics.)
+
 
 THE PRESERVED PROGRAM ORDER RELATION: ppo
 -----------------------------------------
RE: [PATCH] tools: memory-model: Add rmw-sequences to the LKMM
Posted by Jonas Oberhauser 3 years, 4 months ago


	
> -----Original Message-----
> From: Alan Stern [mailto:stern@rowland.harvard.edu] 
> Sent: Monday, November 14, 2022 6:26 PM

Hi Alan,
thanks for preparing this!

> Jonas has pointed out a weakness in the Linux Kernel Memory Model.
> Namely, the memory ordering properties of atomic operations are not
> monotonic: An atomic op with full-barrier semantics does not always provide ordering as strong as one with release-barrier semantics.

Note that I believe it was Viktor who originally pointed out this weakness to me
in private communication. My contribution (besides chatting with you) is to
check that the solution does indeed restore the monotonicity (not just on some
litmus tests but in general).

So I would change the wording to "Viktor has pointed out a weakness in the Linux
Kernel Memory Model."


> +let rmw-sequence = (rf ; rmw)*

I would perhaps suggest to only consider external read-from in rmw-sequences, as
below:
+let rmw-sequence = (rfe ; rmw)*

The reason I (slightly) prefer this is that this is sufficient to imply
monotonicity.
Also there is some minor concern that the patch that results in the stricter
model (i.e., rmw-sequence = (rf ; rmw)*) might be incorrect on some hypothetical
future architecture in which RMWs can be merged in the store coalescing queue
with earlier stores to the same location. This is exemplified in the following
litmus test:

C atomics-not-monotonic-2

{}

P0(int *x, atomic_t *y)
{
    int r1;
	WRITE_ONCE(*x, 1);
	smp_store_release(y, 0);
	r1 = atomic_inc_return_relaxed(y);
}

P1(atomic_t *y)
{
	int r1;

	r1 = atomic_inc_return(y);
}

P2(int *x, atomic_t *y)
{
	int r2;
	int r3;

	r2 = atomic_read(y);
	smp_rmb();
	r3 = READ_ONCE(*x);
}

exists (2:r2=2 /\ 2:r3=0)

Here such a hypothetical future architecture could merge the operations to *y by
P0 into a single store, effectively turning the code of P0 into

P0(int *x, atomic_t *y)
{
    int r1;
	WRITE_ONCE(*x, 1);
	WRITE_ONCE(*y, 1);
	r1 = 0;
}

The stricter patch would not be sound with this hypothetical architecture, while
the more relaxed patch should be.

I don't think such a future architecture is likely since I don't expect there to
be any practical performance impact. At the same time I also don't currently see
any advantage of the stricter model.

For this reason I would slightly prefer the more relaxed model.



> +Rmw sequences have a special property in the LKMM: They can extend the 
> +cumul-fence relation.  That is, if we have:
> +
> +	U ->cumul-fence X -> rmw-sequence Y
> +
> +then also U ->cumul-fence Y.  Thinking about this in terms of the 
> +operational model, U ->cumul-fence X says that the store U propagates 
> +to each CPU before the store X does.  Then the fact that X and Y are 
> +linked by an rmw sequence means that U also propagates to each CPU 
> +before Y does.
> +

Here I would add that the rmw sequences also play a similar role in the
w-post-bounded relation. For example as follows:

+Rmw sequences have a special property in the LKMM: They can extend the 
+cumul-fence and w-post-bounded relations.  That is, if we have:
+
+	U ->cumul-fence X -> rmw-sequence Y
+
+then also U ->cumul-fence Y, and analogously if we have
+
+	U ->w-post-bounded X -> rmw-sequence Y
+
+then also U ->w-post-bounded Y. Thinking about this in terms of the 
+operational model, U ->cumul-fence X says that the store U propagates 
+to each CPU before the store X does.  Then the fact that X and Y are 
+linked by an rmw sequence means that U also propagates to each CPU 
+before Y does.
+


Best wishes,
jonas
Re: [PATCH] tools: memory-model: Add rmw-sequences to the LKMM
Posted by Alan Stern 3 years, 4 months ago
On Tue, Nov 15, 2022 at 02:05:39PM +0000, Jonas Oberhauser wrote:
> 
> 
> 
> 	
> > -----Original Message-----
> > From: Alan Stern [mailto:stern@rowland.harvard.edu] 
> > Sent: Monday, November 14, 2022 6:26 PM
> 
> Hi Alan,
> thanks for preparing this!
> 
> > Jonas has pointed out a weakness in the Linux Kernel Memory Model.
> > Namely, the memory ordering properties of atomic operations are not
> > monotonic: An atomic op with full-barrier semantics does not always provide ordering as strong as one with release-barrier semantics.
> 
> Note that I believe it was Viktor who originally pointed out this weakness to me
> in private communication. My contribution (besides chatting with you) is to
> check that the solution does indeed restore the monotonicity (not just on some
> litmus tests but in general).
> 
> So I would change the wording to "Viktor has pointed out a weakness in the Linux
> Kernel Memory Model."

People will wonder who Viktor is.  I don't have his full name or email 
address.  In fact, shouldn't he have been CC'ed during this entire 
discussion?

> > +let rmw-sequence = (rf ; rmw)*
> 
> I would perhaps suggest to only consider external read-from in rmw-sequences, as
> below:
> +let rmw-sequence = (rfe ; rmw)*

We discussed the matter earlier, and I don't recall any mention of this 
objection.

> The reason I (slightly) prefer this is that this is sufficient to imply
> monotonicity.
> Also there is some minor concern that the patch that results in the stricter
> model (i.e., rmw-sequence = (rf ; rmw)*) might be incorrect on some hypothetical
> future architecture in which RMWs can be merged in the store coalescing queue
> with earlier stores to the same location. This is exemplified in the following
> litmus test:
> 
> C atomics-not-monotonic-2
> 
> {}
> 
> P0(int *x, atomic_t *y)
> {
>     int r1;
> 	WRITE_ONCE(*x, 1);
> 	smp_store_release(y, 0);
> 	r1 = atomic_inc_return_relaxed(y);
> }
> 
> P1(atomic_t *y)
> {
> 	int r1;
> 
> 	r1 = atomic_inc_return(y);
> }
> 
> P2(int *x, atomic_t *y)
> {
> 	int r2;
> 	int r3;
> 
> 	r2 = atomic_read(y);
> 	smp_rmb();
> 	r3 = READ_ONCE(*x);
> }
> 
> exists (2:r2=2 /\ 2:r3=0)
> 
> Here such a hypothetical future architecture could merge the operations to *y by
> P0 into a single store, effectively turning the code of P0 into
> 
> P0(int *x, atomic_t *y)
> {
>     int r1;
> 	WRITE_ONCE(*x, 1);
> 	WRITE_ONCE(*y, 1);
> 	r1 = 0;
> }
> 
> The stricter patch would not be sound with this hypothetical architecture, while
> the more relaxed patch should be.
> 
> I don't think such a future architecture is likely since I don't expect there to
> be any practical performance impact. At the same time I also don't currently see
> any advantage of the stricter model.
> 
> For this reason I would slightly prefer the more relaxed model.

I don't see any point in worrying about hypothetical future 
architectures that might use a questionable design.

Also, given that this test is forbidden:

P0				P1		P2
-------------------------	--------------	----------------------------
WRITE_ONCE(*x, 1);		atomic_inc(y);	r1 = atomic_read_acquire(y);
atomic_set_release(y, 1);			r2 = READ_ONCE(*x);
exists (2:r1=2 /\ 2:r2=0)

shouldn't the following also be forbidden?

P0				P1		P2
-------------------------	--------------	----------------------------
WRITE_ONCE(*x, 1);		atomic_inc(y);	r1 = atomic_read_acquire(y);
atomic_set_release(y, 1);	atomic_inc(y);	r2 = READ_ONCE(*x);
exists (2:r1=3 /\ 2:r2=0)

> > +Rmw sequences have a special property in the LKMM: They can extend the 
> > +cumul-fence relation.  That is, if we have:
> > +
> > +	U ->cumul-fence X -> rmw-sequence Y
> > +
> > +then also U ->cumul-fence Y.  Thinking about this in terms of the 
> > +operational model, U ->cumul-fence X says that the store U propagates 
> > +to each CPU before the store X does.  Then the fact that X and Y are 
> > +linked by an rmw sequence means that U also propagates to each CPU 
> > +before Y does.
> > +
> 
> Here I would add that the rmw sequences also play a similar role in the
> w-post-bounded relation. For example as follows:
> 
> +Rmw sequences have a special property in the LKMM: They can extend the 
> +cumul-fence and w-post-bounded relations.  That is, if we have:
> +
> +	U ->cumul-fence X -> rmw-sequence Y
> +
> +then also U ->cumul-fence Y, and analogously if we have
> +
> +	U ->w-post-bounded X -> rmw-sequence Y
> +
> +then also U ->w-post-bounded Y. Thinking about this in terms of the 
> +operational model, U ->cumul-fence X says that the store U propagates 
> +to each CPU before the store X does.  Then the fact that X and Y are 
> +linked by an rmw sequence means that U also propagates to each CPU 
> +before Y does.
> +

I considered this and specifically decided against it, because the 
w-post-bounded relation has not yet been introduced at this point in the 
document.  It doesn't show up until much later.  (Also, there didn't 
seem to be any graceful way of mentioning this fact at the point where 
w-post-bounded does get introduced, and on the whole the matter didn't 
seem to be all that important.)

Instead of your suggested change, I suppose it would be okay to say, at 
the end of the paragraph:

	(In an analogous way, rmw sequences can extend the 
	w-post-bounded relation defined below in the PLAIN ACCESSES
	AND DATA RACES section.)

Or something like this could be added to the ODDS AND ENDS section at 
the end of the document.

Alan
Re: [PATCH] tools: memory-model: Add rmw-sequences to the LKMM
Posted by Paul E. McKenney 3 years, 4 months ago
On Tue, Nov 15, 2022 at 11:13:12AM -0500, Alan Stern wrote:
> On Tue, Nov 15, 2022 at 02:05:39PM +0000, Jonas Oberhauser wrote:
> > 
> > 
> > 
> > 	
> > > -----Original Message-----
> > > From: Alan Stern [mailto:stern@rowland.harvard.edu] 
> > > Sent: Monday, November 14, 2022 6:26 PM
> > 
> > Hi Alan,
> > thanks for preparing this!
> > 
> > > Jonas has pointed out a weakness in the Linux Kernel Memory Model.
> > > Namely, the memory ordering properties of atomic operations are not
> > > monotonic: An atomic op with full-barrier semantics does not always provide ordering as strong as one with release-barrier semantics.
> > 
> > Note that I believe it was Viktor who originally pointed out this weakness to me
> > in private communication. My contribution (besides chatting with you) is to
> > check that the solution does indeed restore the monotonicity (not just on some
> > litmus tests but in general).
> > 
> > So I would change the wording to "Viktor has pointed out a weakness in the Linux
> > Kernel Memory Model."
> 
> People will wonder who Viktor is.  I don't have his full name or email 
> address.  In fact, shouldn't he have been CC'ed during this entire 
> discussion?

Viktor Vafeiadis <viktor@mpi-sws.org>

But I defer to Jonas on CCing, just in case Viktor needs to be provided
context on this discussion.

							Thanx, Paul

> > > +let rmw-sequence = (rf ; rmw)*
> > 
> > I would perhaps suggest to only consider external read-from in rmw-sequences, as
> > below:
> > +let rmw-sequence = (rfe ; rmw)*
> 
> We discussed the matter earlier, and I don't recall any mention of this 
> objection.
> 
> > The reason I (slightly) prefer this is that this is sufficient to imply
> > monotonicity.
> > Also there is some minor concern that the patch that results in the stricter
> > model (i.e., rmw-sequence = (rf ; rmw)*) might be incorrect on some hypothetical
> > future architecture in which RMWs can be merged in the store coalescing queue
> > with earlier stores to the same location. This is exemplified in the following
> > litmus test:
> > 
> > C atomics-not-monotonic-2
> > 
> > {}
> > 
> > P0(int *x, atomic_t *y)
> > {
> >     int r1;
> > 	WRITE_ONCE(*x, 1);
> > 	smp_store_release(y, 0);
> > 	r1 = atomic_inc_return_relaxed(y);
> > }
> > 
> > P1(atomic_t *y)
> > {
> > 	int r1;
> > 
> > 	r1 = atomic_inc_return(y);
> > }
> > 
> > P2(int *x, atomic_t *y)
> > {
> > 	int r2;
> > 	int r3;
> > 
> > 	r2 = atomic_read(y);
> > 	smp_rmb();
> > 	r3 = READ_ONCE(*x);
> > }
> > 
> > exists (2:r2=2 /\ 2:r3=0)
> > 
> > Here such a hypothetical future architecture could merge the operations to *y by
> > P0 into a single store, effectively turning the code of P0 into
> > 
> > P0(int *x, atomic_t *y)
> > {
> >     int r1;
> > 	WRITE_ONCE(*x, 1);
> > 	WRITE_ONCE(*y, 1);
> > 	r1 = 0;
> > }
> > 
> > The stricter patch would not be sound with this hypothetical architecture, while
> > the more relaxed patch should be.
> > 
> > I don't think such a future architecture is likely since I don't expect there to
> > be any practical performance impact. At the same time I also don't currently see
> > any advantage of the stricter model.
> > 
> > For this reason I would slightly prefer the more relaxed model.
> 
> I don't see any point in worrying about hypothetical future 
> architectures that might use a questionable design.
> 
> Also, given that this test is forbidden:
> 
> P0				P1		P2
> -------------------------	--------------	----------------------------
> WRITE_ONCE(*x, 1);		atomic_inc(y);	r1 = atomic_read_acquire(y);
> atomic_set_release(y, 1);			r2 = READ_ONCE(*x);
> exists (2:r1=2 /\ 2:r2=0)
> 
> shouldn't the following also be forbidden?
> 
> P0				P1		P2
> -------------------------	--------------	----------------------------
> WRITE_ONCE(*x, 1);		atomic_inc(y);	r1 = atomic_read_acquire(y);
> atomic_set_release(y, 1);	atomic_inc(y);	r2 = READ_ONCE(*x);
> exists (2:r1=3 /\ 2:r2=0)
> 
> > > +Rmw sequences have a special property in the LKMM: They can extend the 
> > > +cumul-fence relation.  That is, if we have:
> > > +
> > > +	U ->cumul-fence X -> rmw-sequence Y
> > > +
> > > +then also U ->cumul-fence Y.  Thinking about this in terms of the 
> > > +operational model, U ->cumul-fence X says that the store U propagates 
> > > +to each CPU before the store X does.  Then the fact that X and Y are 
> > > +linked by an rmw sequence means that U also propagates to each CPU 
> > > +before Y does.
> > > +
> > 
> > Here I would add that the rmw sequences also play a similar role in the
> > w-post-bounded relation. For example as follows:
> > 
> > +Rmw sequences have a special property in the LKMM: They can extend the 
> > +cumul-fence and w-post-bounded relations.  That is, if we have:
> > +
> > +	U ->cumul-fence X -> rmw-sequence Y
> > +
> > +then also U ->cumul-fence Y, and analogously if we have
> > +
> > +	U ->w-post-bounded X -> rmw-sequence Y
> > +
> > +then also U ->w-post-bounded Y. Thinking about this in terms of the 
> > +operational model, U ->cumul-fence X says that the store U propagates 
> > +to each CPU before the store X does.  Then the fact that X and Y are 
> > +linked by an rmw sequence means that U also propagates to each CPU 
> > +before Y does.
> > +
> 
> I considered this and specifically decided against it, because the 
> w-post-bounded relation has not yet been introduced at this point in the 
> document.  It doesn't show up until much later.  (Also, there didn't 
> seem to be any graceful way of mentioning this fact at the point where 
> w-post-bounded does get introduced, and on the whole the matter didn't 
> seem to be all that important.)
> 
> Instead of your suggested change, I suppose it would be okay to say, at 
> the end of the paragraph:
> 
> 	(In an analogous way, rmw sequences can extend the 
> 	w-post-bounded relation defined below in the PLAIN ACCESSES
> 	AND DATA RACES section.)
> 
> Or something like this could be added to the ODDS AND ENDS section at 
> the end of the document.
> 
> Alan
RE: [PATCH] tools: memory-model: Add rmw-sequences to the LKMM
Posted by Jonas Oberhauser 3 years, 4 months ago

> -----Original Message-----
> From: Paul E. McKenney [mailto:paulmck@kernel.org] 
> Sent: Wednesday, November 16, 2022 5:59 AM
> > On Tue, Nov 15, 2022 at 11:13:12AM -0500, Alan Stern wrote:
> > > On Tue, Nov 15, 2022 at 02:05:39PM +0000, Jonas Oberhauser wrote:
> > > So I would change the wording to "Viktor has pointed out a weakness 
> > > in the Linux Kernel Memory Model."
> > 
> > People will wonder who Viktor is.  I don't have his full name or email 
> > address.  In fact, shouldn't he have been CC'ed during this entire 
> > discussion?
> 
> [...] I defer to Jonas on CCing, just in case Viktor needs to be provided context on this discussion.


When I noticed Viktor wasn't in CC I asked him in person if we should add him,
but he said it's not necessary. I think he is too busy to catch up with the
complete discussion anyways.

I gave him a brief summary of our discussions so far though.

> > > +let rmw-sequence = (rf ; rmw)*
> > 
> > I would perhaps suggest to only consider external read-from in 
> > rmw-sequences, as
> > below:
> > +let rmw-sequence = (rfe ; rmw)*
> 
> We discussed the matter earlier, and I don't recall any mention of 
> this objection.

Is it possible that you did not receive some messages I sent? I mentioned it at
least twice (and I just double checked that I didn't accidentally drop you from
CC -- one of them from 9/30 including the string "C arm-behavior" if you want to
search).
Paul McKenney says some of my messages frequently land in his spam folder... :( 
 -- which doesn't say anything positive about my messaging patterns I suppose :P


> > > I don't think such a future architecture is likely since I don't 
> > > expect there to be any practical performance impact. At the same 
> > > time I also don't currently see any advantage of the stricter model.
> > > 
> > > For this reason I would slightly prefer the more relaxed model.
> > 
> > I don't see any point in worrying about hypothetical future 
> > architectures that might use a questionable design.

I usually would agree and not use as an argument absurd designs I might imagine.
Now I'm personally definitely not aware of any specific aims to implement such a
design, but the reason I brought it up was that  this kind of design is not my
invention either; as I had (perhaps incorrectly) remembered someone had told me
there was concrete interest in doing a design along these lines and hence the 
C11 definition of data races might need to be weakened.

A quick google search only turns up the following C11 proposal (+ its ancestors)
   https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0982r1.html
which only considers relaxing the store part of the release sequence definition
(but keeps the RMW part like in the proposal you have given), and they don't
mention specific hardware optimizations that might necessitate such a change.

(They do mention specifically compiler optimizations that might do such a merge,
which is of no direct relevance to LKMM (since all rmw accesses are compiler
barriers, this  kind of optimization can not happen in Linux anyways).
They also raise a concern about how adding another thread creates some weird
behavior with C11 release sequences, but that concern is also not relevant to 
RMWs.)

So on third/fourth thought it's entirely possible that I just misremembered the
details, and perhaps this design was just given to me by someone trying to
explain to me why people might be concerned that this kind of definition might
needlessly restrict some design space of architects.

> > Also, given that this test is forbidden:
> > 
> > P0				P1		P2
> > -------------------------	--------------	----------------------------
> > WRITE_ONCE(*x, 1);		atomic_inc(y);	r1 = atomic_read_acquire(y);
> > atomic_set_release(y, 1);			r2 = READ_ONCE(*x);
> > exists (2:r1=2 /\ 2:r2=0)
> > 
> > shouldn't the following also be forbidden?
> > 
> > P0				P1		P2
> > -------------------------	--------------	----------------------------
> > WRITE_ONCE(*x, 1);		atomic_inc(y);	r1 = atomic_read_acquire(y);
> > atomic_set_release(y, 1);	atomic_inc(y);	r2 = READ_ONCE(*x);
> > exists (2:r1=3 /\ 2:r2=0)

I think so, and in fact I've considered before a third option where only the
first rf must be external like so
+let rmw-sequence = (rfe ; rmw ; (rf ; rmw)*)?
but it is quite ugly. Thus imho it is the least desirable option right now,
given that it combines providing an unused feature with worrying about an unused 
"optimization".

(The reason why allowing rfi after the initial cumul-fence store should be sound 
is that all subsequent rf's read from rmw operations, which after merging will
still be rmw operations (just doing two updates in a single rmw). E.g. the two
atomic_inc in the litmus test you provided might be turned into a single 
atomic_add(..., 2), but this does not affect the overall rmw sequence.)

Anyways, this is only a slight preference. In fact, given that probably no code
relies on this extra ordering it should also be easy enough to adapt to unlikely 
changes at the hardware side. Another thought I've had is that it might be 
possible to "inherit" the release ordering across such merges in such a hardware
implementation, in which case your definition should still be correct even if a
merge happened.

So if nobody else has any opinions on this, I'm ok with following your
suggestion.

> > 
> > > > +Rmw sequences have a special property in the LKMM: They can 
> > > > +extend the cumul-fence relation.  That is, if we have:
> > > > +
> > > > +	U ->cumul-fence X -> rmw-sequence Y
> > > > +
> > > > +then also U ->cumul-fence Y.  Thinking about this in terms of the 
> > > > +operational model, U ->cumul-fence X says that the store U 
> > > > +propagates to each CPU before the store X does.  Then the fact 
> > > > +that X and Y are linked by an rmw sequence means that U also 
> > > > +propagates to each CPU before Y does.
> > > > +
> > > 
> > > Here I would add that the rmw sequences also play a similar role in 
> > > the w-post-bounded relation.
> > 
> > I considered this and specifically decided against it, because the 
> > w-post-bounded relation has not yet been introduced at this point in 
> > the document.  It doesn't show up until much later.  (Also, there 
> > didn't seem to be any graceful way of mentioning this fact at the 
> > point where w-post-bounded does get introduced, and on the whole the 
> > matter didn't seem to be all that important.)
> > 
> > Instead of your suggested change, I suppose it would be okay to say, 
> > at the end of the paragraph:
> > 
> > 	(In an analogous way, rmw sequences can extend the 
> > 	w-post-bounded relation defined below in the PLAIN ACCESSES
> > 	AND DATA RACES section.)

Great point! I'm in favor of this solution.

Best wishes,
jonas
Re: [PATCH] tools: memory-model: Add rmw-sequences to the LKMM
Posted by Boqun Feng 3 years, 4 months ago
On Mon, Nov 14, 2022 at 12:26:23PM -0500, Alan Stern wrote:
> Jonas has pointed out a weakness in the Linux Kernel Memory Model.
> Namely, the memory ordering properties of atomic operations are not
> monotonic: An atomic op with full-barrier semantics does not always
> provide ordering as strong as one with release-barrier semantics.
> 
> The following litmus test illustrates the problem:
> 
> --------------------------------------------------
> C atomics-not-monotonic
> 
> {}
> 
> P0(int *x, atomic_t *y)
> {
> 	WRITE_ONCE(*x, 1);
> 	smp_wmb();
> 	atomic_set(y, 1);
> }
> 
> P1(atomic_t *y)
> {
> 	int r1;
> 
> 	r1 = atomic_inc_return(y);
> }
> 
> P2(int *x, atomic_t *y)
> {
> 	int r2;
> 	int r3;
> 
> 	r2 = atomic_read(y);
> 	smp_rmb();
> 	r3 = READ_ONCE(*x);
> }
> 
> exists (2:r2=2 /\ 2:r3=0)
> --------------------------------------------------
> 
> The litmus test is allowed as shown with atomic_inc_return(), which
> has full-barrier semantics.  But if the operation is changed to
> atomic_inc_return_release(), which only has release-barrier semantics,
> the litmus test is forbidden.  Clearly this violates monotonicity.
> 
> The reason is because the LKMM treats full-barrier atomic ops as if
> they were written:
> 
> 	mb();
> 	load();
> 	store();
> 	mb();
> 
> (where the load() and store() are the two parts of an atomic RMW op),
> whereas it treats release-barrier atomic ops as if they were written:
> 
> 	load();
> 	release_barrier();
> 	store();
> 
> The difference is that here the release barrier orders the load part
> of the atomic op before the store part with A-cumulativity, whereas
> the mb()'s above do not.  This means that release-barrier atomics can
> effectively extend the cumul-fence relation but full-barrier atomics
> cannot.
> 
> To resolve this problem we introduce the rmw-sequence relation,
> representing an arbitrarily long sequence of atomic RMW operations in
> which each operation reads from the previous one, and explicitly allow
> it to extend cumul-fence.  This modification of the memory model is
> sound; it holds for PPC because of B-cumulativity, it holds for TSO
> and ARM64 because of other-multicopy atomicity, and we can assume that
> atomic ops on all other architectures will be implemented so as to
> make it hold for them.
> 
> For similar reasons we also allow rmw-sequence to extend the
> w-post-bounded relation, which is analogous to cumul-fence in some
> ways.
> 
> Suggested-by: Jonas Oberhauser <jonas.oberhauser@huawei.com>
> Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
> 

Reviewed-by: Boqun Feng <boqun.feng@gmail.com>

Thanks!

Regards,
Boqun

> ---
> 
>  tools/memory-model/Documentation/explanation.txt |   28 +++++++++++++++++++++++
>  tools/memory-model/linux-kernel.cat              |    5 ++--
>  2 files changed, 31 insertions(+), 2 deletions(-)
> 
> Index: usb-devel/tools/memory-model/linux-kernel.cat
> ===================================================================
> --- usb-devel.orig/tools/memory-model/linux-kernel.cat
> +++ usb-devel/tools/memory-model/linux-kernel.cat
> @@ -74,8 +74,9 @@ let ppo = to-r | to-w | fence | (po-unlo
>  
>  (* Propagation: Ordering from release operations and strong fences. *)
>  let A-cumul(r) = (rfe ; [Marked])? ; r
> +let rmw-sequence = (rf ; rmw)*
>  let cumul-fence = [Marked] ; (A-cumul(strong-fence | po-rel) | wmb |
> -	po-unlock-lock-po) ; [Marked]
> +	po-unlock-lock-po) ; [Marked] ; rmw-sequence
>  let prop = [Marked] ; (overwrite & ext)? ; cumul-fence* ;
>  	[Marked] ; rfe? ; [Marked]
>  
> @@ -174,7 +175,7 @@ let vis = cumul-fence* ; rfe? ; [Marked]
>  let w-pre-bounded = [Marked] ; (addr | fence)?
>  let r-pre-bounded = [Marked] ; (addr | nonrw-fence |
>  	([R4rmb] ; fencerel(Rmb) ; [~Noreturn]))?
> -let w-post-bounded = fence? ; [Marked]
> +let w-post-bounded = fence? ; [Marked] ; rmw-sequence
>  let r-post-bounded = (nonrw-fence | ([~Noreturn] ; fencerel(Rmb) ; [R4rmb]))? ;
>  	[Marked]
>  
> Index: usb-devel/tools/memory-model/Documentation/explanation.txt
> ===================================================================
> --- usb-devel.orig/tools/memory-model/Documentation/explanation.txt
> +++ usb-devel/tools/memory-model/Documentation/explanation.txt
> @@ -1006,6 +1006,34 @@ order.  Equivalently,
>  where the rmw relation links the read and write events making up each
>  atomic update.  This is what the LKMM's "atomic" axiom says.
>  
> +Atomic rmw updates play one more role in the LKMM: They can form "rmw
> +sequences".  An rmw sequence is simply a bunch of atomic updates where
> +each update reads from the previous one.  Written using events, it
> +looks like this:
> +
> +	Z0 ->rf Y1 ->rmw Z1 ->rf ... ->rf Yn ->rmw Zn,
> +
> +where Z0 is some store event and n can be any number (even 0, in the
> +degenerate case).  We write this relation as: Z0 ->rmw-sequence Zn.
> +Note that this implies Z0 and Zn are stores to the same variable.
> +
> +Rmw sequences have a special property in the LKMM: They can extend the
> +cumul-fence relation.  That is, if we have:
> +
> +	U ->cumul-fence X -> rmw-sequence Y
> +
> +then also U ->cumul-fence Y.  Thinking about this in terms of the
> +operational model, U ->cumul-fence X says that the store U propagates
> +to each CPU before the store X does.  Then the fact that X and Y are
> +linked by an rmw sequence means that U also propagates to each CPU
> +before Y does.
> +
> +(The notion of rmw sequences in the LKMM is similar to, but not quite
> +the same as, that of release sequences in the C11 memory model.  They
> +were added to the LKMM to fix an obscure bug; without them, atomic
> +updates with full-barrier semantics did not always guarantee ordering
> +at least as strong as atomic updates with release-barrier semantics.)
> +
>  
>  THE PRESERVED PROGRAM ORDER RELATION: ppo
>  -----------------------------------------
>