The PGDAT_RECLAIM_LOCKED bit is used to provide mutual exclusion of
node reclaim for struct pglist_data using a single bit.

It is "locked" with test_and_set_bit() (similarly to a trylock), which
provides full ordering with respect to loads and stores done within
__node_reclaim().

It is "unlocked" with clear_bit(), which does not provide any ordering
with respect to loads and stores done before clearing the bit.

The lack of clear_bit() memory ordering with respect to stores within
__node_reclaim() can cause a subsequent CPU to fail to observe stores
from a prior node reclaim. This is not an issue in practice on TSO (e.g.
x86), but it is an issue on weakly-ordered architectures (e.g. arm64).

Fix this with the following changes:

A) Use clear_bit_unlock() rather than clear_bit() to clear
   PGDAT_RECLAIM_LOCKED with release memory ordering semantics.

   This provides stronger memory ordering (release rather than relaxed).

B) Use test_and_set_bit_lock() rather than test_and_set_bit() to
   test-and-set PGDAT_RECLAIM_LOCKED with acquire memory ordering
   semantics.

   This changes the "lock" acquisition from a full barrier to an acquire
   memory ordering, which is weaker. The acquire semi-permeable barrier
   paired with the release on unlock is sufficient for this mutual
   exclusion use-case.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Jade Alglave <j.alglave@ucl.ac.uk>
Cc: Luc Maranget <luc.maranget@inria.fr>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: linux-mm@kvack.org
---
mm/vmscan.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index c22175120f5d..021b25bdba91 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -7567,11 +7567,11 @@ int node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned int order)
 	if (node_state(pgdat->node_id, N_CPU) && pgdat->node_id != numa_node_id())
 		return NODE_RECLAIM_NOSCAN;
 
-	if (test_and_set_bit(PGDAT_RECLAIM_LOCKED, &pgdat->flags))
+	if (test_and_set_bit_lock(PGDAT_RECLAIM_LOCKED, &pgdat->flags))
 		return NODE_RECLAIM_NOSCAN;
 
 	ret = __node_reclaim(pgdat, gfp_mask, order);
-	clear_bit(PGDAT_RECLAIM_LOCKED, &pgdat->flags);
+	clear_bit_unlock(PGDAT_RECLAIM_LOCKED, &pgdat->flags);
 
 	if (ret)
 		count_vm_event(PGSCAN_ZONE_RECLAIM_SUCCESS);
--
2.25.1
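
For readers less familiar with the kernel bitops, the acquire/release
pairing described in the commit message can be approximated in userspace
with C11 atomics. This is only a sketch under stated assumptions: the
sketch_* helpers, struct node_state_sketch and RECLAIM_LOCKED_BIT are
hypothetical stand-ins and are not the kernel implementation of
test_and_set_bit_lock() or clear_bit_unlock().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for PGDAT_RECLAIM_LOCKED; not a kernel symbol. */
#define RECLAIM_LOCKED_BIT 0UL

/* Hypothetical stand-in for the relevant part of struct pglist_data. */
struct node_state_sketch {
	atomic_ulong flags;	/* bit 0 acts as the try-lock bit */
	unsigned long scanned;	/* plain data protected by that bit */
};

/*
 * Userspace analogue of test_and_set_bit_lock(): atomically set the bit
 * with acquire ordering and return whether it was already set.
 */
static bool sketch_test_and_set_bit_lock(unsigned long nr, atomic_ulong *addr)
{
	unsigned long mask = 1UL << nr;

	return (atomic_fetch_or_explicit(addr, mask, memory_order_acquire) & mask) != 0;
}

/*
 * Userspace analogue of clear_bit_unlock(): clear the bit with release
 * ordering, so stores made while holding the bit are visible to the
 * next thread that acquires it.
 */
static void sketch_clear_bit_unlock(unsigned long nr, atomic_ulong *addr)
{
	atomic_fetch_and_explicit(addr, ~(1UL << nr), memory_order_release);
}

/* Mirrors the shape of node_reclaim(): give up if the bit is already held. */
static bool sketch_node_reclaim(struct node_state_sketch *n)
{
	if (sketch_test_and_set_bit_lock(RECLAIM_LOCKED_BIT, &n->flags))
		return false;	/* someone else is reclaiming this node */

	n->scanned++;		/* stands in for the work done by __node_reclaim() */

	sketch_clear_bit_unlock(RECLAIM_LOCKED_BIT, &n->flags);
	return true;
}
```

With a relaxed clear in place of the release, nothing would order the
store to `scanned` before the bit is cleared, which is the hazard the
commit message describes for weakly-ordered architectures.
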
On Fri, Mar 07, 2025 at 02:30:47PM -0500, Mathieu Desnoyers wrote:
> The PGDAT_RECLAIM_LOCKED bit is used to provide mutual exclusion of
> node reclaim for struct pglist_data using a single bit.
>
> It is "locked" with a test_and_set_bit (similarly to a try lock) which
> provides full ordering with respect to loads and stores done within
> __node_reclaim().
>
> It is "unlocked" with clear_bit(), which does not provide any ordering
> with respect to loads and stores done before clearing the bit.
>
> The lack of clear_bit() memory ordering with respect to stores within
> __node_reclaim() can cause a subsequent CPU to fail to observe stores
> from a prior node reclaim. This is not an issue in practice on TSO (e.g.
> x86), but it is an issue on weakly-ordered architectures (e.g. arm64).
>
> Fix this with following changes:
>
> A) Use clear_bit_unlock rather than clear_bit to clear PGDAT_RECLAIM_LOCKED
> with a release memory ordering semantic.
>
> This provides stronger memory ordering (release rather than relaxed).
>
> B) Use test_and_set_bit_lock rather than test_and_set_bit to test-and-set
> PGDAT_RECLAIM_LOCKED with an acquire memory ordering semantic.
>
> This changes the "lock" acquisition from a full barrier to an acquire
> memory ordering, which is weaker. The acquire semi-permeable barrier
> paired with the release on unlock is sufficient for this mutual
> exclusion use-case.

FWIW, this aligns with my understanding. Is (A) intended to be (submitted
separately and) backported?

  Andrea

> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Alan Stern <stern@rowland.harvard.edu>
> Cc: Andrea Parri <parri.andrea@gmail.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Boqun Feng <boqun.feng@gmail.com>
> Cc: Nicholas Piggin <npiggin@gmail.com>
> Cc: David Howells <dhowells@redhat.com>
> Cc: Jade Alglave <j.alglave@ucl.ac.uk>
> Cc: Luc Maranget <luc.maranget@inria.fr>
> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> Cc: linux-mm@kvack.org
> ---
> mm/vmscan.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index c22175120f5d..021b25bdba91 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -7567,11 +7567,11 @@ int node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned int order)
>  	if (node_state(pgdat->node_id, N_CPU) && pgdat->node_id != numa_node_id())
>  		return NODE_RECLAIM_NOSCAN;
>
> -	if (test_and_set_bit(PGDAT_RECLAIM_LOCKED, &pgdat->flags))
> +	if (test_and_set_bit_lock(PGDAT_RECLAIM_LOCKED, &pgdat->flags))
>  		return NODE_RECLAIM_NOSCAN;
>
>  	ret = __node_reclaim(pgdat, gfp_mask, order);
> -	clear_bit(PGDAT_RECLAIM_LOCKED, &pgdat->flags);
> +	clear_bit_unlock(PGDAT_RECLAIM_LOCKED, &pgdat->flags);
>
>  	if (ret)
>  		count_vm_event(PGSCAN_ZONE_RECLAIM_SUCCESS);
> --
> 2.25.1
>