[PATCH memory-model 2/8] tools/memory-model: Unify UNLOCK+LOCK pairings to po-unlock-lock-po

Paul E. McKenney posted 8 patches 2 years, 10 months ago
[PATCH memory-model 2/8] tools/memory-model: Unify UNLOCK+LOCK pairings to po-unlock-lock-po
Posted by Paul E. McKenney 2 years, 10 months ago
From: Jonas Oberhauser <jonas.oberhauser@huaweicloud.com>

LKMM uses two relations for talking about UNLOCK+LOCK pairings:

	1) po-unlock-lock-po, which handles UNLOCK+LOCK pairings
	   on the same CPU or immediate lock handovers on the same
	   lock variable

	2) po;[UL];(co|po);[LKW];po, which handles UNLOCK+LOCK pairs
	   literally as described in rcupdate.h#L1002, i.e., even
	   after a sequence of handovers on the same lock variable.

The latter relation is used only once, to provide the guarantee
defined in rcupdate.h#L1002 by smp_mb__after_unlock_lock(), which
makes any UNLOCK+LOCK pair followed by the fence behave like a full
barrier.

This patch drops this use in favor of using po-unlock-lock-po
everywhere, which unifies the way the model talks about UNLOCK+LOCK
pairings.  At first glance this seems to weaken the guarantee given
by LKMM: When considering a long sequence of lock handovers
such as below, where P0 hands the lock to P1, which hands it to P2,
which finally executes such an after_unlock_lock fence, the mb
relation currently links any stores in the critical section of P0
to instructions P2 executes after its fence, but not so after the
patch.

P0(int *x, int *y, spinlock_t *mylock)
{
        spin_lock(mylock);
        WRITE_ONCE(*x, 2);
        spin_unlock(mylock);
        WRITE_ONCE(*y, 1);
}

P1(int *y, int *z, spinlock_t *mylock)
{
        int r0 = READ_ONCE(*y); // reads 1
        spin_lock(mylock);
        spin_unlock(mylock);
        WRITE_ONCE(*z,1);
}

P2(int *z, int *d, spinlock_t *mylock)
{
        int r1 = READ_ONCE(*z); // reads 1
        spin_lock(mylock);
        spin_unlock(mylock);
        smp_mb__after_unlock_lock();
        WRITE_ONCE(*d,1);
}

P3(int *x, int *d)
{
        WRITE_ONCE(*d,2);
        smp_mb();
        WRITE_ONCE(*x,1);
}

exists (1:r0=1 /\ 2:r1=1 /\ x=2 /\ d=2)

Nevertheless, the ordering guarantee given in rcupdate.h is actually
not weakened.  This is because the unlock operations along the
sequence of handovers are A-cumulative fences.  They ensure that any
stores that propagate to the CPU performing the first unlock
operation in the sequence must also propagate to every CPU that
performs a subsequent lock operation in the sequence.  Therefore any
such stores will also be ordered correctly by the fence even if only
the final handover is considered a full barrier.

Indeed this patch does not affect the behaviors allowed by LKMM at
all.  The mb relation is used to define ordering through:
1) mb/.../ppo/hb, where the ordering is subsumed by hb+ where the
   lock-release, rfe, and unlock-acquire orderings each provide hb
2) mb/strong-fence/cumul-fence/prop, where the rfe and A-cumulative
   lock-release orderings simply add more fine-grained cumul-fence
   edges to substitute a single strong-fence edge provided by a long
   lock handover sequence
3) mb/strong-fence/pb and various similar uses in the definition of
   data races, where as discussed above any long handover sequence
   can be turned into a sequence of cumul-fence edges that provide
   the same ordering.

Signed-off-by: Jonas Oberhauser <jonas.oberhauser@huaweicloud.com>
Reviewed-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 tools/memory-model/linux-kernel.cat | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/tools/memory-model/linux-kernel.cat b/tools/memory-model/linux-kernel.cat
index 07f884f9b2bf..6e531457bb73 100644
--- a/tools/memory-model/linux-kernel.cat
+++ b/tools/memory-model/linux-kernel.cat
@@ -37,8 +37,19 @@ let mb = ([M] ; fencerel(Mb) ; [M]) |
 	([M] ; fencerel(Before-atomic) ; [RMW] ; po? ; [M]) |
 	([M] ; po? ; [RMW] ; fencerel(After-atomic) ; [M]) |
 	([M] ; po? ; [LKW] ; fencerel(After-spinlock) ; [M]) |
-	([M] ; po ; [UL] ; (co | po) ; [LKW] ;
-		fencerel(After-unlock-lock) ; [M])
+(*
+ * Note: The po-unlock-lock-po relation only passes the lock to the direct
+ * successor, perhaps giving the impression that the ordering of the
+ * smp_mb__after_unlock_lock() fence only affects a single lock handover.
+ * However, in a longer sequence of lock handovers, the implicit
+ * A-cumulative release fences of lock-release ensure that any stores that
+ * propagate to one of the involved CPUs before it hands over the lock to
+ * the next CPU will also propagate to the final CPU handing over the lock
+ * to the CPU that executes the fence.  Therefore, all those stores are
+ * also affected by the fence.
+ *)
+	([M] ; po-unlock-lock-po ;
+		[After-unlock-lock] ; po ; [M])
 let gp = po ; [Sync-rcu | Sync-srcu] ; po?
 let strong-fence = mb | gp
 
-- 
2.40.0.rc2
Re: [PATCH memory-model 2/8] tools/memory-model: Unify UNLOCK+LOCK pairings to po-unlock-lock-po
Posted by Andrea Parri 2 years, 10 months ago
On Mon, Mar 20, 2023 at 06:02:40PM -0700, Paul E. McKenney wrote:
> From: Jonas Oberhauser <jonas.oberhauser@huaweicloud.com>
> 
> LKMM uses two relations for talking about UNLOCK+LOCK pairings:
> 
> 	1) po-unlock-lock-po, which handles UNLOCK+LOCK pairings
> 	   on the same CPU or immediate lock handovers on the same
> 	   lock variable
> 
> 	2) po;[UL];(co|po);[LKW];po, which handles UNLOCK+LOCK pairs
> 	   literally as described in rcupdate.h#L1002, i.e., even
> 	   after a sequence of handovers on the same lock variable.
> 
> The latter relation is used only once, to provide the guarantee
> defined in rcupdate.h#L1002 by smp_mb__after_unlock_lock(), which
> makes any UNLOCK+LOCK pair followed by the fence behave like a full
> barrier.
> 
> This patch drops this use in favor of using po-unlock-lock-po
> everywhere, which unifies the way the model talks about UNLOCK+LOCK
> pairings.  At first glance this seems to weaken the guarantee given
> by LKMM: When considering a long sequence of lock handovers
> such as below, where P0 hands the lock to P1, which hands it to P2,
> which finally executes such an after_unlock_lock fence, the mb
> relation currently links any stores in the critical section of P0
> to instructions P2 executes after its fence, but not so after the
> patch.
> 
> P0(int *x, int *y, spinlock_t *mylock)
> {
>         spin_lock(mylock);
>         WRITE_ONCE(*x, 2);
>         spin_unlock(mylock);
>         WRITE_ONCE(*y, 1);
> }
> 
> P1(int *y, int *z, spinlock_t *mylock)
> {
>         int r0 = READ_ONCE(*y); // reads 1
>         spin_lock(mylock);
>         spin_unlock(mylock);
>         WRITE_ONCE(*z,1);
> }
> 
> P2(int *z, int *d, spinlock_t *mylock)
> {
>         int r1 = READ_ONCE(*z); // reads 1
>         spin_lock(mylock);
>         spin_unlock(mylock);
>         smp_mb__after_unlock_lock();
>         WRITE_ONCE(*d,1);
> }
> 
> P3(int *x, int *d)
> {
>         WRITE_ONCE(*d,2);
>         smp_mb();
>         WRITE_ONCE(*x,1);
> }
> 
> exists (1:r0=1 /\ 2:r1=1 /\ x=2 /\ d=2)
> 
> Nevertheless, the ordering guarantee given in rcupdate.h is actually
> not weakened.  This is because the unlock operations along the
> sequence of handovers are A-cumulative fences.  They ensure that any
> stores that propagate to the CPU performing the first unlock
> operation in the sequence must also propagate to every CPU that
> performs a subsequent lock operation in the sequence.  Therefore any
> such stores will also be ordered correctly by the fence even if only
> the final handover is considered a full barrier.
> 
> Indeed this patch does not affect the behaviors allowed by LKMM at
> all.  The mb relation is used to define ordering through:
> 1) mb/.../ppo/hb, where the ordering is subsumed by hb+ where the
>    lock-release, rfe, and unlock-acquire orderings each provide hb
> 2) mb/strong-fence/cumul-fence/prop, where the rfe and A-cumulative
>    lock-release orderings simply add more fine-grained cumul-fence
>    edges to substitute a single strong-fence edge provided by a long
>    lock handover sequence
> 3) mb/strong-fence/pb and various similar uses in the definition of
>    data races, where as discussed above any long handover sequence
>    can be turned into a sequence of cumul-fence edges that provide
>    the same ordering.
> 
> Signed-off-by: Jonas Oberhauser <jonas.oberhauser@huaweicloud.com>
> Reviewed-by: Alan Stern <stern@rowland.harvard.edu>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Looks like after-unlock-lock has just won the single fattest inline comment
in linux-kernel.cat.  :-)

Acked-by: Andrea Parri <parri.andrea@gmail.com>

  Andrea


> ---
>  tools/memory-model/linux-kernel.cat | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/memory-model/linux-kernel.cat b/tools/memory-model/linux-kernel.cat
> index 07f884f9b2bf..6e531457bb73 100644
> --- a/tools/memory-model/linux-kernel.cat
> +++ b/tools/memory-model/linux-kernel.cat
> @@ -37,8 +37,19 @@ let mb = ([M] ; fencerel(Mb) ; [M]) |
>  	([M] ; fencerel(Before-atomic) ; [RMW] ; po? ; [M]) |
>  	([M] ; po? ; [RMW] ; fencerel(After-atomic) ; [M]) |
>  	([M] ; po? ; [LKW] ; fencerel(After-spinlock) ; [M]) |
> -	([M] ; po ; [UL] ; (co | po) ; [LKW] ;
> -		fencerel(After-unlock-lock) ; [M])
> +(*
> + * Note: The po-unlock-lock-po relation only passes the lock to the direct
> + * successor, perhaps giving the impression that the ordering of the
> + * smp_mb__after_unlock_lock() fence only affects a single lock handover.
> + * However, in a longer sequence of lock handovers, the implicit
> + * A-cumulative release fences of lock-release ensure that any stores that
> + * propagate to one of the involved CPUs before it hands over the lock to
> + * the next CPU will also propagate to the final CPU handing over the lock
> + * to the CPU that executes the fence.  Therefore, all those stores are
> + * also affected by the fence.
> + *)
> +	([M] ; po-unlock-lock-po ;
> +		[After-unlock-lock] ; po ; [M])
>  let gp = po ; [Sync-rcu | Sync-srcu] ; po?
>  let strong-fence = mb | gp
>  
> -- 
> 2.40.0.rc2
>
Re: [PATCH memory-model 2/8] tools/memory-model: Unify UNLOCK+LOCK pairings to po-unlock-lock-po
Posted by Paul E. McKenney 2 years, 10 months ago
On Wed, Mar 22, 2023 at 01:59:00AM +0100, Andrea Parri wrote:
> On Mon, Mar 20, 2023 at 06:02:40PM -0700, Paul E. McKenney wrote:
> > From: Jonas Oberhauser <jonas.oberhauser@huaweicloud.com>
> > 
> > LKMM uses two relations for talking about UNLOCK+LOCK pairings:
> > 
> > 	1) po-unlock-lock-po, which handles UNLOCK+LOCK pairings
> > 	   on the same CPU or immediate lock handovers on the same
> > 	   lock variable
> > 
> > 	2) po;[UL];(co|po);[LKW];po, which handles UNLOCK+LOCK pairs
> > 	   literally as described in rcupdate.h#L1002, i.e., even
> > 	   after a sequence of handovers on the same lock variable.
> > 
> > The latter relation is used only once, to provide the guarantee
> > defined in rcupdate.h#L1002 by smp_mb__after_unlock_lock(), which
> > makes any UNLOCK+LOCK pair followed by the fence behave like a full
> > barrier.
> > 
> > This patch drops this use in favor of using po-unlock-lock-po
> > everywhere, which unifies the way the model talks about UNLOCK+LOCK
> > pairings.  At first glance this seems to weaken the guarantee given
> > by LKMM: When considering a long sequence of lock handovers
> > such as below, where P0 hands the lock to P1, which hands it to P2,
> > which finally executes such an after_unlock_lock fence, the mb
> > relation currently links any stores in the critical section of P0
> > to instructions P2 executes after its fence, but not so after the
> > patch.
> > 
> > P0(int *x, int *y, spinlock_t *mylock)
> > {
> >         spin_lock(mylock);
> >         WRITE_ONCE(*x, 2);
> >         spin_unlock(mylock);
> >         WRITE_ONCE(*y, 1);
> > }
> > 
> > P1(int *y, int *z, spinlock_t *mylock)
> > {
> >         int r0 = READ_ONCE(*y); // reads 1
> >         spin_lock(mylock);
> >         spin_unlock(mylock);
> >         WRITE_ONCE(*z,1);
> > }
> > 
> > P2(int *z, int *d, spinlock_t *mylock)
> > {
> >         int r1 = READ_ONCE(*z); // reads 1
> >         spin_lock(mylock);
> >         spin_unlock(mylock);
> >         smp_mb__after_unlock_lock();
> >         WRITE_ONCE(*d,1);
> > }
> > 
> > P3(int *x, int *d)
> > {
> >         WRITE_ONCE(*d,2);
> >         smp_mb();
> >         WRITE_ONCE(*x,1);
> > }
> > 
> > exists (1:r0=1 /\ 2:r1=1 /\ x=2 /\ d=2)
> > 
> > Nevertheless, the ordering guarantee given in rcupdate.h is actually
> > not weakened.  This is because the unlock operations along the
> > sequence of handovers are A-cumulative fences.  They ensure that any
> > stores that propagate to the CPU performing the first unlock
> > operation in the sequence must also propagate to every CPU that
> > performs a subsequent lock operation in the sequence.  Therefore any
> > such stores will also be ordered correctly by the fence even if only
> > the final handover is considered a full barrier.
> > 
> > Indeed this patch does not affect the behaviors allowed by LKMM at
> > all.  The mb relation is used to define ordering through:
> > 1) mb/.../ppo/hb, where the ordering is subsumed by hb+ where the
> >    lock-release, rfe, and unlock-acquire orderings each provide hb
> > 2) mb/strong-fence/cumul-fence/prop, where the rfe and A-cumulative
> >    lock-release orderings simply add more fine-grained cumul-fence
> >    edges to substitute a single strong-fence edge provided by a long
> >    lock handover sequence
> > 3) mb/strong-fence/pb and various similar uses in the definition of
> >    data races, where as discussed above any long handover sequence
> >    can be turned into a sequence of cumul-fence edges that provide
> >    the same ordering.
> > 
> > Signed-off-by: Jonas Oberhauser <jonas.oberhauser@huaweicloud.com>
> > Reviewed-by: Alan Stern <stern@rowland.harvard.edu>
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> 
> Looks like after-unlock-lock has just won the single fattest inline comment
> in linux-kernel.cat.  :-)
> 
> Acked-by: Andrea Parri <parri.andrea@gmail.com>

Thank you!  I will apply these tags (1, 2, and 4) on my next rebase.

							Thanx, Paul

>   Andrea
> 
> 
> > ---
> >  tools/memory-model/linux-kernel.cat | 15 +++++++++++++--
> >  1 file changed, 13 insertions(+), 2 deletions(-)
> > 
> > diff --git a/tools/memory-model/linux-kernel.cat b/tools/memory-model/linux-kernel.cat
> > index 07f884f9b2bf..6e531457bb73 100644
> > --- a/tools/memory-model/linux-kernel.cat
> > +++ b/tools/memory-model/linux-kernel.cat
> > @@ -37,8 +37,19 @@ let mb = ([M] ; fencerel(Mb) ; [M]) |
> >  	([M] ; fencerel(Before-atomic) ; [RMW] ; po? ; [M]) |
> >  	([M] ; po? ; [RMW] ; fencerel(After-atomic) ; [M]) |
> >  	([M] ; po? ; [LKW] ; fencerel(After-spinlock) ; [M]) |
> > -	([M] ; po ; [UL] ; (co | po) ; [LKW] ;
> > -		fencerel(After-unlock-lock) ; [M])
> > +(*
> > + * Note: The po-unlock-lock-po relation only passes the lock to the direct
> > + * successor, perhaps giving the impression that the ordering of the
> > + * smp_mb__after_unlock_lock() fence only affects a single lock handover.
> > + * However, in a longer sequence of lock handovers, the implicit
> > + * A-cumulative release fences of lock-release ensure that any stores that
> > + * propagate to one of the involved CPUs before it hands over the lock to
> > + * the next CPU will also propagate to the final CPU handing over the lock
> > + * to the CPU that executes the fence.  Therefore, all those stores are
> > + * also affected by the fence.
> > + *)
> > +	([M] ; po-unlock-lock-po ;
> > +		[After-unlock-lock] ; po ; [M])
> >  let gp = po ; [Sync-rcu | Sync-srcu] ; po?
> >  let strong-fence = mb | gp
> >  
> > -- 
> > 2.40.0.rc2
> >