From nobody Tue Apr  7 03:54:23 2026
From: "Paul E. McKenney" <paulmck@kernel.org>
To: linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
        kernel-team@fb.com, mingo@kernel.org
Cc: stern@rowland.harvard.edu, parri.andrea@gmail.com, will@kernel.org,
        peterz@infradead.org, boqun.feng@gmail.com, npiggin@gmail.com,
        dhowells@redhat.com, j.alglave@ucl.ac.uk, luc.maranget@inria.fr,
        akiyks@gmail.com, "Michael S. Tsirkin" <mst@redhat.com>,
        "Paul E. McKenney" <paulmck@kernel.org>,
        Daniel Lustig <dlustig@nvidia.com>,
        Joel Fernandes <joel@joelfernandes.org>,
        Jonathan Corbet <corbet@lwn.net>
Subject: [PATCH memory-model 1/3] docs/memory-barriers.txt: Fix confusing name
 of 'data dependency barrier'
Date: Wed, 31 Aug 2022 11:23:51 -0700
Message-Id: <20220831182353.2699262-1-paulmck@kernel.org>
X-Mailer: git-send-email 2.31.1.189.g2e36527f23
In-Reply-To: <20220831182350.GA2698943@paulmck-ThinkPad-P17-Gen-1>
References: <20220831182350.GA2698943@paulmck-ThinkPad-P17-Gen-1>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

From: Akira Yokosawa <akiyks@gmail.com>

The term "data dependency barrier", which has been in
memory-barriers.txt ever since it was first authored by David Howells,
has become confusing because LKMM's explanations.txt and other
documents use "data dependency" mostly for load-to-store data
dependencies.

To prevent further confusion, make the changes listed below:

  - replace "data dependency barrier" with "address-dependency
    barrier";
  - add a note on the removal of kernel APIs for explicit
    address-dependency barriers in kernel release v5.9;
  - note that address-dependency barriers are not necessary for
    load-to-store situations;
  - use READ_ONCE_OLD() for pre-4.15 READ_ONCE() (no implicit
    address-dependency barrier);
  - fix the count of kernel memory barrier APIs;
  - and make a few more context adjustments.

Note: Cleanup of long lines is deferred to a follow-up patch.

Reported-by: "Michael S. Tsirkin" <mst@redhat.com>
Link: https://lore.kernel.org/r/20211011064233-mutt-send-email-mst@kernel.o=
rg/
Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Will Deacon <will@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Daniel Lustig <dlustig@nvidia.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
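For readers less familiar with the pattern, here is a minimal
userspace sketch of the load-to-load case described above.  It
assumes C11 atomics as a stand-in for the kernel primitives:
publisher() and reader() are hypothetical names, memory_order_release
plays the role of <write barrier> plus WRITE_ONCE(), and
memory_order_consume plays the role of a READ_ONCE() that carries an
address dependency.  This is an analogy, not the kernel
implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace analogue; not kernel code. */
static int A = 1;
static int B = 2;
static _Atomic(int *) P = &A;

/* CPU 1: update B, then publish its address. */
void publisher(void)
{
	B = 4;
	/* Stands in for <write barrier> + WRITE_ONCE(P, &B). */
	atomic_store_explicit(&P, &B, memory_order_release);
}

/* CPU 2: load the pointer, then dereference it. */
int reader(void)
{
	/*
	 * Stands in for Q = READ_ONCE(P).  The load of *q below
	 * depends on the address held in q.  Current compilers
	 * treat consume as acquire, which is stronger than needed.
	 */
	int *q = atomic_load_explicit(&P, memory_order_consume);

	assert(q != NULL);
	return *q;	/* D = *Q */
}
```

Called in sequence, reader() returns 1 before publisher() runs and 4
afterwards.  When the two run on different threads, the intent is
that a reader which observes &B also observes B == 4.
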
 Documentation/memory-barriers.txt | 116 ++++++++++++++++--------------
 1 file changed, 64 insertions(+), 52 deletions(-)

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barri=
ers.txt
index 832b5d36e279c..b16767cb6d31d 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -52,7 +52,7 @@ CONTENTS
=20
      - Varieties of memory barrier.
      - What may not be assumed about memory barriers?
-     - Data dependency barriers (historical).
+     - Address-dependency barriers (historical).
      - Control dependencies.
      - SMP barrier pairing.
      - Examples of memory barrier sequences.
@@ -187,7 +187,7 @@ As a further example, consider this sequence of events:
 	B =3D 4;		Q =3D P;
 	P =3D &B;		D =3D *Q;
=20
-There is an obvious data dependency here, as the value loaded into D depen=
ds on
+There is an obvious address dependency here, as the value loaded into D de=
pends on
 the address retrieved from P by CPU 2.  At the end of the sequence, any of=
 the
 following results are possible:
=20
@@ -391,49 +391,53 @@ Memory barriers come in four basic varieties:
      memory system as time progresses.  All stores _before_ a write barrier
      will occur _before_ all the stores after the write barrier.
=20
-     [!] Note that write barriers should normally be paired with read or d=
ata
-     dependency barriers; see the "SMP barrier pairing" subsection.
+     [!] Note that write barriers should normally be paired with read or
+     address-dependency barriers; see the "SMP barrier pairing" subsection.
=20
=20
- (2) Data dependency barriers.
+ (2) Address-dependency barriers (historical).
=20
-     A data dependency barrier is a weaker form of read barrier.  In the c=
ase
+     An address-dependency barrier is a weaker form of read barrier.  In t=
he case
      where two loads are performed such that the second depends on the res=
ult
      of the first (eg: the first load retrieves the address to which the s=
econd
-     load will be directed), a data dependency barrier would be required to
+     load will be directed), an address-dependency barrier would be requir=
ed to
      make sure that the target of the second load is updated after the add=
ress
      obtained by the first load is accessed.
=20
-     A data dependency barrier is a partial ordering on interdependent loa=
ds
+     An address-dependency barrier is a partial ordering on interdependent=
 loads
      only; it is not required to have any effect on stores, independent lo=
ads
      or overlapping loads.
=20
      As mentioned in (1), the other CPUs in the system can be viewed as
      committing sequences of stores to the memory system that the CPU being
-     considered can then perceive.  A data dependency barrier issued by th=
e CPU
+     considered can then perceive.  An address-dependency barrier issued b=
y the CPU
      under consideration guarantees that for any load preceding it, if that
      load touches one of a sequence of stores from another CPU, then by the
      time the barrier completes, the effects of all the stores prior to th=
at
-     touched by the load will be perceptible to any loads issued after the=
 data
+     touched by the load will be perceptible to any loads issued after the=
 address-
      dependency barrier.
=20
      See the "Examples of memory barrier sequences" subsection for diagrams
      showing the ordering constraints.
=20
-     [!] Note that the first load really has to have a _data_ dependency a=
nd
+     [!] Note that the first load really has to have an _address_ dependen=
cy and
      not a control dependency.  If the address for the second load is depe=
ndent
      on the first load, but the dependency is through a conditional rather=
 than
      actually loading the address itself, then it's a _control_ dependency=
 and
      a full read barrier or better is required.  See the "Control dependen=
cies"
      subsection for more information.
=20
-     [!] Note that data dependency barriers should normally be paired with
+     [!] Note that address-dependency barriers should normally be paired w=
ith
      write barriers; see the "SMP barrier pairing" subsection.
=20
+     [!] Kernel release v5.9 removed kernel APIs for explicit address-
+     dependency barriers.  Nowadays, APIs for marking loads from shared
+     variables such as READ_ONCE() and rcu_dereference() provide implicit
+     address-dependency barriers.
=20
  (3) Read (or load) memory barriers.
=20
-     A read barrier is a data dependency barrier plus a guarantee that all=
 the
+     A read barrier is an address-dependency barrier plus a guarantee that=
 all the
      LOAD operations specified before the barrier will appear to happen be=
fore
      all the LOAD operations specified after the barrier with respect to t=
he
      other components of the system.
@@ -441,7 +445,7 @@ Memory barriers come in four basic varieties:
      A read barrier is a partial ordering on loads only; it is not require=
d to
      have any effect on stores.
=20
-     Read memory barriers imply data dependency barriers, and so can subst=
itute
+     Read memory barriers imply address-dependency barriers, and so can su=
bstitute
      for them.
=20
      [!] Note that read barriers should normally be paired with write barr=
iers;
@@ -550,17 +554,21 @@ There are certain things that the Linux kernel memory=
 barriers do not guarantee:
 	    Documentation/core-api/dma-api.rst
=20
=20
-DATA DEPENDENCY BARRIERS (HISTORICAL)
--------------------------------------
+ADDRESS-DEPENDENCY BARRIERS (HISTORICAL)
+----------------------------------------
=20
 As of v4.15 of the Linux kernel, an smp_mb() was added to READ_ONCE() for
 DEC Alpha, which means that about the only people who need to pay attention
 to this section are those working on DEC Alpha architecture-specific code
 and those working on READ_ONCE() itself.  For those who need it, and for
 those who are interested in the history, here is the story of
-data-dependency barriers.
+address-dependency barriers.
+
+[!] While address dependencies are observed in both load-to-load and
+load-to-store relations, address-dependency barriers are not necessary
+for load-to-store situations.
=20
-The usage requirements of data dependency barriers are a little subtle, and
+The requirement of address-dependency barriers is a little subtle, and
 it's not always obvious that they're needed.  To illustrate, consider the
 following sequence of events:
=20
@@ -570,10 +578,13 @@ following sequence of events:
 	B =3D 4;
 	<write barrier>
 	WRITE_ONCE(P, &B);
-			      Q =3D READ_ONCE(P);
+			      Q =3D READ_ONCE_OLD(P);
 			      D =3D *Q;
=20
-There's a clear data dependency here, and it would seem that by the end of=
 the
+[!] READ_ONCE_OLD() corresponds to READ_ONCE() of pre-4.15 kernel, which
+doesn't imply an address-dependency barrier.
+
+There's a clear address dependency here, and it would seem that by the end=
 of the
 sequence, Q must be either &A or &B, and that:
=20
 	(Q =3D=3D &A) implies (D =3D=3D 1)
@@ -588,8 +599,8 @@ While this may seem like a failure of coherency or caus=
ality maintenance, it
 isn't, and this behaviour can be observed on certain real CPUs (such as th=
e DEC
 Alpha).
=20
-To deal with this, a data dependency barrier or better must be inserted
-between the address load and the data load:
+To deal with this, READ_ONCE() provides an implicit address-dependency
+barrier since kernel release v4.15:
=20
 	CPU 1		      CPU 2
 	=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D	      =3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D
@@ -598,7 +609,7 @@ between the address load and the data load:
 	<write barrier>
 	WRITE_ONCE(P, &B);
 			      Q =3D READ_ONCE(P);
-			      <data dependency barrier>
+			      <implicit address-dependency barrier>
 			      D =3D *Q;
=20
 This enforces the occurrence of one of the two implications, and prevents =
the
@@ -615,7 +626,7 @@ odd-numbered bank is idle, one can see the new value of=
 the pointer P (&B),
 but the old value of the variable B (2).
=20
=20
-A data-dependency barrier is not required to order dependent writes
+An address-dependency barrier is not required to order dependent writes
 because the CPUs that the Linux kernel supports don't do writes
 until they are certain (1) that the write will actually happen, (2)
 of the location of the write, and (3) of the value to be written.
@@ -629,12 +640,12 @@ break dependencies in a great many highly creative wa=
ys.
 	B =3D 4;
 	<write barrier>
 	WRITE_ONCE(P, &B);
-			      Q =3D READ_ONCE(P);
+			      Q =3D READ_ONCE_OLD(P);
 			      WRITE_ONCE(*Q, 5);
=20
-Therefore, no data-dependency barrier is required to order the read into
+Therefore, no address-dependency barrier is required to order the read into
 Q with the store into *Q.  In other words, this outcome is prohibited,
-even without a data-dependency barrier:
+even without an implicit address-dependency barrier of modern READ_ONCE():
=20
 	(Q =3D=3D &B) && (B =3D=3D 4)
=20
@@ -645,12 +656,12 @@ can be used to record rare error conditions and the l=
ike, and the CPUs'
 naturally occurring ordering prevents such records from being lost.
=20
=20
-Note well that the ordering provided by a data dependency is local to
+Note well that the ordering provided by an address dependency is local to
 the CPU containing it.  See the section on "Multicopy atomicity" for
 more information.
=20
=20
-The data dependency barrier is very important to the RCU system,
+The address-dependency barrier is very important to the RCU system,
 for example.  See rcu_assign_pointer() and rcu_dereference() in
 include/linux/rcupdate.h.  This permits the current target of an RCU'd
 pointer to be replaced with a new modified target, without the replacement
@@ -667,16 +678,17 @@ not understand them.  The purpose of this section is =
to help you prevent
 the compiler's ignorance from breaking your code.
=20
 A load-load control dependency requires a full read memory barrier, not
-simply a data dependency barrier to make it work correctly.  Consider the
+simply an (implicit) address-dependency barrier to make it work correctly.=
  Consider the
 following bit of code:
=20
 	q =3D READ_ONCE(a);
+	<implicit address-dependency barrier>
 	if (q) {
-		<data dependency barrier>  /* BUG: No data dependency!!! */
+		/* BUG: No address dependency!!! */
 		p =3D READ_ONCE(b);
 	}
=20
-This will not have the desired effect because there is no actual data
+This will not have the desired effect because there is no actual address
 dependency, but rather a control dependency that the CPU may short-circuit
 by attempting to predict the outcome in advance, so that other CPUs see
 the load from b as having happened before the load from a.  In such a
@@ -927,9 +939,9 @@ General barriers pair with each other, though they also=
 pair with most
 other types of barriers, albeit without multicopy atomicity.  An acquire
 barrier pairs with a release barrier, but both may also pair with other
 barriers, including of course general barriers.  A write barrier pairs
-with a data dependency barrier, a control dependency, an acquire barrier,
+with an address-dependency barrier, a control dependency, an acquire barri=
er,
 a release barrier, a read barrier, or a general barrier.  Similarly a
-read barrier, control dependency, or a data dependency barrier pairs
+read barrier, control dependency, or an address-dependency barrier pairs
 with a write barrier, an acquire barrier, a release barrier, or a
 general barrier:
=20
@@ -948,7 +960,7 @@ Or:
 	a =3D 1;
 	<write barrier>
 	WRITE_ONCE(b, &a);    x =3D READ_ONCE(b);
-			      <data dependency barrier>
+			      <implicit address-dependency barrier>
 			      y =3D *x;
=20
 Or even:
@@ -968,7 +980,7 @@ Basically, the read barrier always has to be there, eve=
n though it can be of
 the "weaker" type.
=20
 [!] Note that the stores before the write barrier would normally be expect=
ed to
-match the loads after the read barrier or the data dependency barrier, and=
 vice
+match the loads after the read barrier or the address-dependency barrier, =
and vice
 versa:
=20
 	CPU 1                               CPU 2
@@ -1021,7 +1033,7 @@ STORE B, STORE C } all occurring before the unordered=
 set of { STORE D, STORE E
 	                   V
=20
=20
-Secondly, data dependency barriers act as partial orderings on data-depend=
ent
+Secondly, address-dependency barriers act as partial orderings on address-=
dependent
 loads.  Consider the following sequence of events:
=20
 	CPU 1			CPU 2
@@ -1067,7 +1079,7 @@ effectively random order, despite the write barrier i=
ssued by CPU 1:
 In the above example, CPU 2 perceives that B is 7, despite the load of *C
 (which would be B) coming after the LOAD of C.
=20
-If, however, a data dependency barrier were to be placed between the load =
of C
+If, however, an address-dependency barrier were to be placed between the l=
oad of C
 and the load of *C (ie: B) on CPU 2:
=20
 	CPU 1			CPU 2
@@ -1078,7 +1090,7 @@ and the load of *C (ie: B) on CPU 2:
 	<write barrier>
 	STORE C =3D &B		LOAD X
 	STORE D =3D 4		LOAD C (gets &B)
-				<data dependency barrier>
+				<address-dependency barrier>
 				LOAD *C (reads B)
=20
 then the following will occur:
@@ -1101,7 +1113,7 @@ then the following will occur:
 	                               |        +-------+       |       |
 	                               |        | X->9  |------>|       |
 	                               |        +-------+       |       |
-	  Makes sure all effects --->   \   ddddddddddddddddd   |       |
+	  Makes sure all effects --->   \   aaaaaaaaaaaaaaaaa   |       |
 	  prior to the store of C        \      +-------+       |       |
 	  are perceptible to              ----->| B->2  |------>|       |
 	  subsequent loads                      +-------+       |       |
@@ -1292,7 +1304,7 @@ Which might appear as this:
 	LOAD with immediate effect              :       :       +-------+
=20
=20
-Placing a read barrier or a data dependency barrier just before the second
+Placing a read barrier or an address-dependency barrier just before the se=
cond
 load:
=20
 	CPU 1			CPU 2
@@ -1816,20 +1828,20 @@ which may then reorder things however it wishes.
 CPU MEMORY BARRIERS
 -------------------
=20
-The Linux kernel has eight basic CPU memory barriers:
+The Linux kernel has seven basic CPU memory barriers:
=20
-	TYPE		MANDATORY		SMP CONDITIONAL
-	=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D	=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D	=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
-	GENERAL		mb()			smp_mb()
-	WRITE		wmb()			smp_wmb()
-	READ		rmb()			smp_rmb()
-	DATA DEPENDENCY				READ_ONCE()
+	TYPE			MANDATORY	SMP CONDITIONAL
+	=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D	=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D	=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D
+	GENERAL			mb()		smp_mb()
+	WRITE			wmb()		smp_wmb()
+	READ			rmb()		smp_rmb()
+	ADDRESS DEPENDENCY			READ_ONCE()
=20
=20
-All memory barriers except the data dependency barriers imply a compiler
-barrier.  Data dependencies do not impose any additional compiler ordering.
+All memory barriers except the address-dependency barriers imply a compiler
+barrier.  Address dependencies do not impose any additional compiler order=
ing.
=20
-Aside: In the case of data dependencies, the compiler would be expected
+Aside: In the case of address dependencies, the compiler would be expected
 to issue the loads in the correct order (eg. `a[b]` would have to load
 the value of b before loading a[b]), however there is no guarantee in
 the C specification that the compiler may not speculate the value of b
@@ -2889,7 +2901,7 @@ AND THEN THERE'S THE ALPHA
 The DEC Alpha CPU is one of the most relaxed CPUs there is.  Not only that,
 some versions of the Alpha CPU have a split data cache, permitting them to=
 have
 two semantically-related cache lines updated at separate times.  This is w=
here
-the data dependency barrier really becomes necessary as this synchronises =
both
+the address-dependency barrier really becomes necessary as this synchronis=
es both
 caches with the memory coherence system, thus making it seem like pointer
 changes vs new data occur in the right order.
=20
--=20
2.31.1.189.g2e36527f23
From nobody Tue Apr  7 03:54:23 2026
From: "Paul E. McKenney" <paulmck@kernel.org>
To: linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
        kernel-team@fb.com, mingo@kernel.org
Cc: stern@rowland.harvard.edu, parri.andrea@gmail.com, will@kernel.org,
        peterz@infradead.org, boqun.feng@gmail.com, npiggin@gmail.com,
        dhowells@redhat.com, j.alglave@ucl.ac.uk, luc.maranget@inria.fr,
        akiyks@gmail.com, "Paul E. McKenney" <paulmck@kernel.org>,
        Daniel Lustig <dlustig@nvidia.com>,
        Joel Fernandes <joel@joelfernandes.org>,
        "Michael S. Tsirkin" <mst@redhat.com>,
        Jonathan Corbet <corbet@lwn.net>
Subject: [PATCH memory-model 2/3] docs/memory-barriers.txt: Fixup long lines
Date: Wed, 31 Aug 2022 11:23:52 -0700
Message-Id: <20220831182353.2699262-2-paulmck@kernel.org>
X-Mailer: git-send-email 2.31.1.189.g2e36527f23
In-Reply-To: <20220831182350.GA2698943@paulmck-ThinkPad-P17-Gen-1>
References: <20220831182350.GA2698943@paulmck-ThinkPad-P17-Gen-1>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

From: Akira Yokosawa <akiyks@gmail.com>

Substituting "address-dependency barrier" for "data dependency
barrier" left many lines exceeding 80 columns.

Reflow those lines, as well as a few short ones unrelated to the
substitution.

No change to the documentation text.

Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Will Deacon <will@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Daniel Lustig <dlustig@nvidia.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 Documentation/memory-barriers.txt | 93 ++++++++++++++++---------------
 1 file changed, 47 insertions(+), 46 deletions(-)

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barri=
ers.txt
index b16767cb6d31d..06f80e3785c5d 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -187,9 +187,9 @@ As a further example, consider this sequence of events:
 	B =3D 4;		Q =3D P;
 	P =3D &B;		D =3D *Q;
=20
-There is an obvious address dependency here, as the value loaded into D de=
pends on
-the address retrieved from P by CPU 2.  At the end of the sequence, any of=
 the
-following results are possible:
+There is an obvious address dependency here, as the value loaded into D de=
pends
+on the address retrieved from P by CPU 2.  At the end of the sequence, any=
 of
+the following results are possible:
=20
 	(Q =3D=3D &A) and (D =3D=3D 1)
 	(Q =3D=3D &B) and (D =3D=3D 2)
@@ -397,25 +397,25 @@ Memory barriers come in four basic varieties:
=20
  (2) Address-dependency barriers (historical).
=20
-     An address-dependency barrier is a weaker form of read barrier.  In t=
he case
-     where two loads are performed such that the second depends on the res=
ult
-     of the first (eg: the first load retrieves the address to which the s=
econd
-     load will be directed), an address-dependency barrier would be requir=
ed to
-     make sure that the target of the second load is updated after the add=
ress
-     obtained by the first load is accessed.
+     An address-dependency barrier is a weaker form of read barrier.  In t=
he
+     case where two loads are performed such that the second depends on the
+     result of the first (eg: the first load retrieves the address to which
+     the second load will be directed), an address-dependency barrier would
+     be required to make sure that the target of the second load is updated
+     after the address obtained by the first load is accessed.
=20
-     An address-dependency barrier is a partial ordering on interdependent=
 loads
-     only; it is not required to have any effect on stores, independent lo=
ads
-     or overlapping loads.
+     An address-dependency barrier is a partial ordering on interdependent
+     loads only; it is not required to have any effect on stores, independ=
ent
+     loads or overlapping loads.
=20
      As mentioned in (1), the other CPUs in the system can be viewed as
      committing sequences of stores to the memory system that the CPU being
-     considered can then perceive.  An address-dependency barrier issued b=
y the CPU
-     under consideration guarantees that for any load preceding it, if that
-     load touches one of a sequence of stores from another CPU, then by the
-     time the barrier completes, the effects of all the stores prior to th=
at
-     touched by the load will be perceptible to any loads issued after the=
 address-
-     dependency barrier.
+     considered can then perceive.  An address-dependency barrier issued by
+     the CPU under consideration guarantees that for any load preceding it,
+     if that load touches one of a sequence of stores from another CPU, th=
en
+     by the time the barrier completes, the effects of all the stores prio=
r to
+     that touched by the load will be perceptible to any loads issued after
+     the address-dependency barrier.
=20
      See the "Examples of memory barrier sequences" subsection for diagrams
      showing the ordering constraints.
@@ -437,16 +437,16 @@ Memory barriers come in four basic varieties:
=20
  (3) Read (or load) memory barriers.
=20
-     A read barrier is an address-dependency barrier plus a guarantee that=
 all the
-     LOAD operations specified before the barrier will appear to happen be=
fore
-     all the LOAD operations specified after the barrier with respect to t=
he
-     other components of the system.
+     A read barrier is an address-dependency barrier plus a guarantee that=
 all
+     the LOAD operations specified before the barrier will appear to happen
+     before all the LOAD operations specified after the barrier with respe=
ct to
+     the other components of the system.
=20
      A read barrier is a partial ordering on loads only; it is not require=
d to
      have any effect on stores.
=20
-     Read memory barriers imply address-dependency barriers, and so can su=
bstitute
-     for them.
+     Read memory barriers imply address-dependency barriers, and so can
+     substitute for them.
=20
      [!] Note that read barriers should normally be paired with write barr=
iers;
      see the "SMP barrier pairing" subsection.
@@ -584,8 +584,8 @@ following sequence of events:
 [!] READ_ONCE_OLD() corresponds to READ_ONCE() of pre-4.15 kernel, which
 doesn't imply an address-dependency barrier.
=20
-There's a clear address dependency here, and it would seem that by the end=
 of the
-sequence, Q must be either &A or &B, and that:
+There's a clear address dependency here, and it would seem that by the end=
 of
+the sequence, Q must be either &A or &B, and that:
=20
 	(Q =3D=3D &A) implies (D =3D=3D 1)
 	(Q =3D=3D &B) implies (D =3D=3D 4)
@@ -599,8 +599,8 @@ While this may seem like a failure of coherency or caus=
ality maintenance, it
 isn't, and this behaviour can be observed on certain real CPUs (such as th=
e DEC
 Alpha).
=20
-To deal with this, READ_ONCE() provides an implicit address-dependency
-barrier since kernel release v4.15:
+To deal with this, READ_ONCE() provides an implicit address-dependency bar=
rier
+since kernel release v4.15:
=20
 	CPU 1		      CPU 2
 	=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D	      =3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D
@@ -627,12 +627,12 @@ but the old value of the variable B (2).
=20
=20
 An address-dependency barrier is not required to order dependent writes
-because the CPUs that the Linux kernel supports don't do writes
-until they are certain (1) that the write will actually happen, (2)
-of the location of the write, and (3) of the value to be written.
+because the CPUs that the Linux kernel supports don't do writes until they
+are certain (1) that the write will actually happen, (2) of the location of
+the write, and (3) of the value to be written.
 But please carefully read the "CONTROL DEPENDENCIES" section and the
-Documentation/RCU/rcu_dereference.rst file:  The compiler can and does
-break dependencies in a great many highly creative ways.
+Documentation/RCU/rcu_dereference.rst file:  The compiler can and does bre=
ak
+dependencies in a great many highly creative ways.
=20
 	CPU 1		      CPU 2
 	=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D	      =3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D
@@ -678,8 +678,8 @@ not understand them.  The purpose of this section is to=
 help you prevent
 the compiler's ignorance from breaking your code.
=20
 A load-load control dependency requires a full read memory barrier, not
-simply an (implicit) address-dependency barrier to make it work correctly.=
  Consider the
-following bit of code:
+simply an (implicit) address-dependency barrier to make it work correctly.
+Consider the following bit of code:
=20
 	q =3D READ_ONCE(a);
 	<implicit address-dependency barrier>
@@ -691,8 +691,8 @@ following bit of code:
 This will not have the desired effect because there is no actual address
 dependency, but rather a control dependency that the CPU may short-circuit
 by attempting to predict the outcome in advance, so that other CPUs see
-the load from b as having happened before the load from a.  In such a
-case what's actually required is:
+the load from b as having happened before the load from a.  In such a case
+what's actually required is:
=20
 	q =3D READ_ONCE(a);
 	if (q) {
@@ -980,8 +980,8 @@ Basically, the read barrier always has to be there, eve=
n though it can be of
 the "weaker" type.
=20
 [!] Note that the stores before the write barrier would normally be expect=
ed to
-match the loads after the read barrier or the address-dependency barrier, =
and vice
-versa:
+match the loads after the read barrier or the address-dependency barrier, =
and
+vice versa:
=20
 	CPU 1                               CPU 2
 	=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D                =
 =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
@@ -1033,8 +1033,8 @@ STORE B, STORE C } all occurring before the unordered=
 set of { STORE D, STORE E
 	                   V
=20
=20
-Secondly, address-dependency barriers act as partial orderings on address-=
dependent
-loads.  Consider the following sequence of events:
+Secondly, address-dependency barriers act as partial orderings on address-
+dependent loads.  Consider the following sequence of events:
=20
 	CPU 1			CPU 2
 	=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D	=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
@@ -1079,8 +1079,8 @@ effectively random order, despite the write barrier i=
ssued by CPU 1:
 In the above example, CPU 2 perceives that B is 7, despite the load of *C
 (which would be B) coming after the LOAD of C.
=20
-If, however, an address-dependency barrier were to be placed between the l=
oad of C
-and the load of *C (ie: B) on CPU 2:
+If, however, an address-dependency barrier were to be placed between the l=
oad
+of C and the load of *C (ie: B) on CPU 2:
=20
 	CPU 1			CPU 2
 	=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D	=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
@@ -2761,7 +2761,8 @@ is discarded from the CPU's cache and reloaded.  To d=
eal with this, the
 appropriate part of the kernel must invalidate the overlapping bits of the
 cache on each CPU.
=20
-See Documentation/core-api/cachetlb.rst for more information on cache mana=
gement.
+See Documentation/core-api/cachetlb.rst for more information on cache
+management.
=20
=20
 CACHE COHERENCY VS MMIO
@@ -2901,8 +2902,8 @@ AND THEN THERE'S THE ALPHA
 The DEC Alpha CPU is one of the most relaxed CPUs there is.  Not only that,
 some versions of the Alpha CPU have a split data cache, permitting them to=
 have
 two semantically-related cache lines updated at separate times.  This is w=
here
-the address-dependency barrier really becomes necessary as this synchronis=
es both
-caches with the memory coherence system, thus making it seem like pointer
+the address-dependency barrier really becomes necessary as this synchronis=
es
+both caches with the memory coherence system, thus making it seem like poi=
nter
 changes vs new data occur in the right order.
=20
 The Alpha defines the Linux kernel's memory model, although as of v4.15
--=20
2.31.1.189.g2e36527f23
From nobody Tue Apr  7 03:54:23 2026
Return-Path: <linux-kernel-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 994EEECAAD3
	for <linux-kernel@archiver.kernel.org>; Wed, 31 Aug 2022 18:28:58 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S232177AbiHaS2z (ORCPT <rfc822;linux-kernel@archiver.kernel.org>);
        Wed, 31 Aug 2022 14:28:55 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57120 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S231775AbiHaS2O (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Wed, 31 Aug 2022 14:28:14 -0400
Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D1421E9279;
        Wed, 31 Aug 2022 11:23:56 -0700 (PDT)
Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140])
        (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
        (No client certificate requested)
        by dfw.source.kernel.org (Postfix) with ESMTPS id 6C65861CD5;
        Wed, 31 Aug 2022 18:23:56 +0000 (UTC)
Received: by smtp.kernel.org (Postfix) with ESMTPSA id C3FF0C433D6;
        Wed, 31 Aug 2022 18:23:55 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
        s=k20201202; t=1661970235;
        bh=RH117eB+xIPb+d+OTRKvRd6566cj7SbPEdnpE0kcJI4=;
        h=From:To:Cc:Subject:Date:In-Reply-To:References:From;
        b=rKzzifM2IzFzPoYqwuHyb7KUze9j1tsaAPHuoTM5xD6cjWD15Br+PtwVZfQc6ByxH
         4PtHUKu4Y3zj4gv587QhE1thcigp4sir8uM6jxn2Bwg2blw63ns2OIbwDoaqfY5Te7
         MdKx3BgKVNqihDrZeSwNjRbVJ/O/h7SqqjtJknUpVOyK+WEb5F2VE0A1g/cW09Nj0p
         sqrX5r1sbZ7GN/0HlqJe60P/lsCCCF7Ml+l2GmvHnmUiu+953wEApZN6r1EW0iD9rI
         VGlTWuQFtvHR4WFhNTre7ElWAJdTdT1em6coaOTilIn0BeMHfrMDEWegNT1JWlET2S
         m6vKv/RVFfnfw==
Received: by paulmck-ThinkPad-P17-Gen-1.home (Postfix, from userid 1000)
        id 822DC5C02A9; Wed, 31 Aug 2022 11:23:55 -0700 (PDT)
From: "Paul E. McKenney" <paulmck@kernel.org>
To: linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
        kernel-team@fb.com, mingo@kernel.org
Cc: stern@rowland.harvard.edu, parri.andrea@gmail.com, will@kernel.org,
        peterz@infradead.org, boqun.feng@gmail.com, npiggin@gmail.com,
        dhowells@redhat.com, j.alglave@ucl.ac.uk, luc.maranget@inria.fr,
        akiyks@gmail.com,
        =?UTF-8?q?Paul=20Heidekr=C3=BCger?= <paul.heidekrueger@in.tum.de>,
        Marco Elver <elver@google.com>,
        Joel Fernandes <joel@joelfernandes.org>,
        Charalampos Mainas <charalampos.mainas@gmail.com>,
        Pramod Bhatotia <pramod.bhatotia@in.tum.de>,
        Soham Chakraborty <s.s.chakraborty@tudelft.nl>,
        Martin Fink <martin.fink@in.tum.de>,
        "Paul E . McKenney" <paulmck@kernel.org>
Subject: [PATCH memory-model 3/3] tools/memory-model: Clarify LKMM's
 limitations in litmus-tests.txt
Date: Wed, 31 Aug 2022 11:23:53 -0700
Message-Id: <20220831182353.2699262-3-paulmck@kernel.org>
X-Mailer: git-send-email 2.31.1.189.g2e36527f23
In-Reply-To: <20220831182350.GA2698943@paulmck-ThinkPad-P17-Gen-1>
References: <20220831182350.GA2698943@paulmck-ThinkPad-P17-Gen-1>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Paul Heidekr=C3=BCger <paul.heidekrueger@in.tum.de>

As discussed, clarify that LKMM does not recognize certain kinds of
orderings.  In particular, highlight the fact that LKMM might
deliberately make weaker guarantees than compilers and architectures.

[ paulmck: Fix whitespace issue noted by checkpatch.pl. ]

Link: https://lore.kernel.org/all/YpoW1deb%2FQeeszO1@ethstick13.dse.in.tum.=
de/T/#u
Co-developed-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Paul Heidekr=C3=BCger <paul.heidekrueger@in.tum.de>
Reviewed-by: Marco Elver <elver@google.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Charalampos Mainas <charalampos.mainas@gmail.com>
Cc: Pramod Bhatotia <pramod.bhatotia@in.tum.de>
Cc: Soham Chakraborty <s.s.chakraborty@tudelft.nl>
Cc: Martin Fink <martin.fink@in.tum.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 .../Documentation/litmus-tests.txt            | 37 ++++++++++++++-----
 1 file changed, 27 insertions(+), 10 deletions(-)

diff --git a/tools/memory-model/Documentation/litmus-tests.txt b/tools/memo=
ry-model/Documentation/litmus-tests.txt
index 8a9d5d2787f9e..26554b1c5575e 100644
--- a/tools/memory-model/Documentation/litmus-tests.txt
+++ b/tools/memory-model/Documentation/litmus-tests.txt
@@ -946,22 +946,39 @@ Limitations of the Linux-kernel memory model (LKMM) i=
nclude:
 	carrying a dependency, then the compiler can break that dependency
 	by substituting a constant of that value.
=20
-	Conversely, LKMM sometimes doesn't recognize that a particular
-	optimization is not allowed, and as a result, thinks that a
-	dependency is not present (because the optimization would break it).
-	The memory model misses some pretty obvious control dependencies
-	because of this limitation.  A simple example is:
+	Conversely, LKMM will sometimes overestimate the amount of
+	reordering compilers and CPUs can carry out, leading it to miss
+	some pretty obvious cases of ordering.  A simple example is:
=20
 		r1 =3D READ_ONCE(x);
 		if (r1 =3D=3D 0)
 			smp_mb();
 		WRITE_ONCE(y, 1);
=20
-	There is a control dependency from the READ_ONCE to the WRITE_ONCE,
-	even when r1 is nonzero, but LKMM doesn't realize this and thinks
-	that the write may execute before the read if r1 !=3D 0.  (Yes, that
-	doesn't make sense if you think about it, but the memory model's
-	intelligence is limited.)
+	The WRITE_ONCE() does not depend on the READ_ONCE(), and as a
+	result, LKMM does not claim ordering.  However, even though no
+	dependency is present, the WRITE_ONCE() will not be executed before
+	the READ_ONCE().  There are two reasons for this:
+
+		The presence of the smp_mb() in one of the branches
+		prevents the compiler from moving the WRITE_ONCE()
+		up before the "if" statement, since the compiler has
+		to assume that r1 will sometimes be 0 (but see the
+		comment below);
+
+		CPUs do not execute stores before po-earlier conditional
+		branches, even in cases where the store occurs after the
+		two arms of the branch have recombined.
+
+	It is not at all dangerous for LKMM to make weaker guarantees than
+	architectures do.  In fact, it is desirable, as it gives compilers
+	room for making optimizations.
+	For instance, suppose that a 0 value in r1 would trigger undefined
+	behavior elsewhere.  Then a clever compiler might deduce that r1
+	can never be 0 in the if condition.  As a result, said clever
+	compiler might deem it safe to optimize away the smp_mb(),
+	eliminating the branch and any ordering an architecture would
+	guarantee otherwise.
=20
 2.	Multiple access sizes for a single variable are not supported,
 	and neither are misaligned or partially overlapping accesses.
--=20
2.31.1.189.g2e36527f23