include/linux/refcount.h | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-)
On Wed, Jan 15, 2025 at 12:13:34PM +0100, Peter Zijlstra wrote:
> Notably, it means refcount_t is entirely unsuitable for anything
> SLAB_TYPESAFE_BY_RCU, since they all will need secondary validation
> conditions after the refcount succeeds.
>
> And this is probably fine, but let me ponder this all a little more.
Even though SLAB_TYPESAFE_BY_RCU is relatively rare, I think we'd better
fix this, these things are hard enough as they are.
Will, others, wdyt?
---
Subject: refcount: Strengthen inc_not_zero()
For speculative lookups where a successful inc_not_zero() pins the
object, but where we still need to double check if the object acquired
is indeed the one we set out to aquire, needs this validation to happen
*after* the increment.
Notably SLAB_TYPESAFE_BY_RCU is one such an example.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
include/linux/refcount.h | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/include/linux/refcount.h b/include/linux/refcount.h
index 35f039ecb272..340e7ffa445e 100644
--- a/include/linux/refcount.h
+++ b/include/linux/refcount.h
@@ -69,9 +69,10 @@
* its the lock acquire, for RCU/lockless data structures its the dependent
* load.
*
- * Do note that inc_not_zero() provides a control dependency which will order
- * future stores against the inc, this ensures we'll never modify the object
- * if we did not in fact acquire a reference.
+ * Do note that inc_not_zero() does provide acquire order, which will order
+ * future load and stores against the inc, this ensures all subsequent accesses
+ * are from this object and not anything previously occupying this memory --
+ * consider SLAB_TYPESAFE_BY_RCU.
*
* The decrements will provide release order, such that all the prior loads and
* stores will be issued before, it also provides a control dependency, which
@@ -144,7 +145,7 @@ bool __refcount_add_not_zero(int i, refcount_t *r, int *oldp)
do {
if (!old)
break;
- } while (!atomic_try_cmpxchg_relaxed(&r->refs, &old, old + i));
+ } while (!atomic_try_cmpxchg_acquire(&r->refs, &old, old + i));
if (oldp)
*oldp = old;
@@ -225,9 +226,9 @@ static inline __must_check bool __refcount_inc_not_zero(refcount_t *r, int *oldp
* Similar to atomic_inc_not_zero(), but will saturate at REFCOUNT_SATURATED
* and WARN.
*
- * Provides no memory ordering, it is assumed the caller has guaranteed the
- * object memory to be stable (RCU, etc.). It does provide a control dependency
- * and thereby orders future stores. See the comment on top.
+ * Provides acquire ordering, such that subsequent accesses are after the
+ * increment. This is important for the cases where secondary validation is
+ * required, eg. SLAB_TYPESAFE_BY_RCU.
*
* Return: true if the increment was successful, false otherwise
*/
On Wed, Jan 15, 2025 at 05:00:11PM +0100, Peter Zijlstra wrote:
> Subject: refcount: Strengthen inc_not_zero()
>
> For speculative lookups where a successful inc_not_zero() pins the
> object, but where we still need to double check if the object acquired
> is indeed the one we set out to aquire, needs this validation to happen
> *after* the increment.
>
> Notably SLAB_TYPESAFE_BY_RCU is one such an example.
While you're looking at inc_not_zero(), have you already thought about
doing something like this? ie failing rather than saturating since
all users of this already have to check for failure. It looks like
two comparisons per call rather than three.
diff --git a/include/linux/refcount.h b/include/linux/refcount.h
index 35f039ecb272..3ef7d316e870 100644
--- a/include/linux/refcount.h
+++ b/include/linux/refcount.h
@@ -142,16 +142,13 @@ bool __refcount_add_not_zero(int i, refcount_t *r, int *oldp)
int old = refcount_read(r);
do {
- if (!old)
+ if (old <= 0 || old + i < 0)
break;
} while (!atomic_try_cmpxchg_relaxed(&r->refs, &old, old + i));
if (oldp)
*oldp = old;
- if (unlikely(old < 0 || old + i < 0))
- refcount_warn_saturate(r, REFCOUNT_ADD_NOT_ZERO_OVF);
-
return old;
}
$ ./scripts/bloat-o-meter before.o after.o
add/remove: 0/0 grow/shrink: 0/4 up/down: 0/-91 (-91)
Function old new delta
io_wq_for_each_worker.isra 162 158 -4
io_worker_handle_work 1403 1387 -16
io_wq_activate_free_worker 187 158 -29
io_queue_worker_create 367 325 -42
Total: Before=10068, After=9977, chg -0.90%
(that's io_uring/io-wq.o as an example user of refcount_inc_not_zero())
On Wed, Jan 15, 2025 at 05:00:11PM +0100, Peter Zijlstra wrote:
> On Wed, Jan 15, 2025 at 12:13:34PM +0100, Peter Zijlstra wrote:
>
> > Notably, it means refcount_t is entirely unsuitable for anything
> > SLAB_TYPESAFE_BY_RCU, since they all will need secondary validation
> > conditions after the refcount succeeds.
> >
> > And this is probably fine, but let me ponder this all a little more.
>
> Even though SLAB_TYPESAFE_BY_RCU is relatively rare, I think we'd better
> fix this, these things are hard enough as they are.
>
> Will, others, wdyt?
We should also update the Documentation (atomic_t.txt and
refcount-vs-atomic.rst) if we strengthen this.
> ---
> Subject: refcount: Strengthen inc_not_zero()
>
> For speculative lookups where a successful inc_not_zero() pins the
> object, but where we still need to double check if the object acquired
> is indeed the one we set out to aquire, needs this validation to happen
> *after* the increment.
>
> Notably SLAB_TYPESAFE_BY_RCU is one such an example.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
> include/linux/refcount.h | 15 ++++++++-------
> 1 file changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/include/linux/refcount.h b/include/linux/refcount.h
> index 35f039ecb272..340e7ffa445e 100644
> --- a/include/linux/refcount.h
> +++ b/include/linux/refcount.h
> @@ -69,9 +69,10 @@
> * its the lock acquire, for RCU/lockless data structures its the dependent
> * load.
> *
> - * Do note that inc_not_zero() provides a control dependency which will order
> - * future stores against the inc, this ensures we'll never modify the object
> - * if we did not in fact acquire a reference.
> + * Do note that inc_not_zero() does provide acquire order, which will order
> + * future load and stores against the inc, this ensures all subsequent accesses
> + * are from this object and not anything previously occupying this memory --
> + * consider SLAB_TYPESAFE_BY_RCU.
> *
> * The decrements will provide release order, such that all the prior loads and
> * stores will be issued before, it also provides a control dependency, which
> @@ -144,7 +145,7 @@ bool __refcount_add_not_zero(int i, refcount_t *r, int *oldp)
> do {
> if (!old)
> break;
> - } while (!atomic_try_cmpxchg_relaxed(&r->refs, &old, old + i));
> + } while (!atomic_try_cmpxchg_acquire(&r->refs, &old, old + i));
Hmm, do the later memory accesses need to be ordered against the store
part of the increment or just the read? If it's the former, then I don't
think that _acquire is sufficient -- accesses can still get in-between
the read and write parts of the RmW.
Will
On Fri, Jan 17, 2025 at 03:41:36PM +0000, Will Deacon wrote:
> On Wed, Jan 15, 2025 at 05:00:11PM +0100, Peter Zijlstra wrote:
> > On Wed, Jan 15, 2025 at 12:13:34PM +0100, Peter Zijlstra wrote:
> >
> > > Notably, it means refcount_t is entirely unsuitable for anything
> > > SLAB_TYPESAFE_BY_RCU, since they all will need secondary validation
> > > conditions after the refcount succeeds.
> > >
> > > And this is probably fine, but let me ponder this all a little more.
> >
> > Even though SLAB_TYPESAFE_BY_RCU is relatively rare, I think we'd better
> > fix this, these things are hard enough as they are.
> >
> > Will, others, wdyt?
>
> We should also update the Documentation (atomic_t.txt and
> refcount-vs-atomic.rst) if we strengthen this.
>
> > ---
> > Subject: refcount: Strengthen inc_not_zero()
> >
> > For speculative lookups where a successful inc_not_zero() pins the
> > object, but where we still need to double check if the object acquired
> > is indeed the one we set out to aquire, needs this validation to happen
> > *after* the increment.
> >
> > Notably SLAB_TYPESAFE_BY_RCU is one such an example.
> >
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> > include/linux/refcount.h | 15 ++++++++-------
> > 1 file changed, 8 insertions(+), 7 deletions(-)
> >
> > diff --git a/include/linux/refcount.h b/include/linux/refcount.h
> > index 35f039ecb272..340e7ffa445e 100644
> > --- a/include/linux/refcount.h
> > +++ b/include/linux/refcount.h
> > @@ -69,9 +69,10 @@
> > * its the lock acquire, for RCU/lockless data structures its the dependent
> > * load.
> > *
> > - * Do note that inc_not_zero() provides a control dependency which will order
> > - * future stores against the inc, this ensures we'll never modify the object
> > - * if we did not in fact acquire a reference.
> > + * Do note that inc_not_zero() does provide acquire order, which will order
> > + * future load and stores against the inc, this ensures all subsequent accesses
> > + * are from this object and not anything previously occupying this memory --
> > + * consider SLAB_TYPESAFE_BY_RCU.
> > *
> > * The decrements will provide release order, such that all the prior loads and
> > * stores will be issued before, it also provides a control dependency, which
> > @@ -144,7 +145,7 @@ bool __refcount_add_not_zero(int i, refcount_t *r, int *oldp)
> > do {
> > if (!old)
> > break;
> > - } while (!atomic_try_cmpxchg_relaxed(&r->refs, &old, old + i));
> > + } while (!atomic_try_cmpxchg_acquire(&r->refs, &old, old + i));
>
> Hmm, do the later memory accesses need to be ordered against the store
> part of the increment or just the read? If it's the former, then I don't
> think that _acquire is sufficient -- accesses can still get in-between
> the read and write parts of the RmW.
I dug some more into this at the end of last week. For the
SLAB_TYPESAFE_BY_RCU where we're racing inc_not_zero() with
dec_and_test(), then I think using _acquire() above is correct as the
later references can only move up into the critical section in the case
that we successfully obtained a reference.
However, if we're going to make the barriers implicit in the refcount
operations here then I think we also need to do something on the producer
side for when the object is re-initialised after being destroyed and
allocated again. I think that would necessitate release ordering for
refcount_set() so that whatever allows the consumer to validate the
object (e.g. sequence number) is published *before* the refcount.
Will
On Mon, Jan 27, 2025 at 6:09 AM Will Deacon <will@kernel.org> wrote:
>
> On Fri, Jan 17, 2025 at 03:41:36PM +0000, Will Deacon wrote:
> > On Wed, Jan 15, 2025 at 05:00:11PM +0100, Peter Zijlstra wrote:
> > > On Wed, Jan 15, 2025 at 12:13:34PM +0100, Peter Zijlstra wrote:
> > >
> > > > Notably, it means refcount_t is entirely unsuitable for anything
> > > > SLAB_TYPESAFE_BY_RCU, since they all will need secondary validation
> > > > conditions after the refcount succeeds.
> > > >
> > > > And this is probably fine, but let me ponder this all a little more.
> > >
> > > Even though SLAB_TYPESAFE_BY_RCU is relatively rare, I think we'd better
> > > fix this, these things are hard enough as they are.
> > >
> > > Will, others, wdyt?
> >
> > We should also update the Documentation (atomic_t.txt and
> > refcount-vs-atomic.rst) if we strengthen this.
> >
> > > ---
> > > Subject: refcount: Strengthen inc_not_zero()
> > >
> > > For speculative lookups where a successful inc_not_zero() pins the
> > > object, but where we still need to double check if the object acquired
> > > is indeed the one we set out to aquire, needs this validation to happen
> > > *after* the increment.
> > >
> > > Notably SLAB_TYPESAFE_BY_RCU is one such an example.
> > >
> > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > > ---
> > > include/linux/refcount.h | 15 ++++++++-------
> > > 1 file changed, 8 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/include/linux/refcount.h b/include/linux/refcount.h
> > > index 35f039ecb272..340e7ffa445e 100644
> > > --- a/include/linux/refcount.h
> > > +++ b/include/linux/refcount.h
> > > @@ -69,9 +69,10 @@
> > > * its the lock acquire, for RCU/lockless data structures its the dependent
> > > * load.
> > > *
> > > - * Do note that inc_not_zero() provides a control dependency which will order
> > > - * future stores against the inc, this ensures we'll never modify the object
> > > - * if we did not in fact acquire a reference.
> > > + * Do note that inc_not_zero() does provide acquire order, which will order
> > > + * future load and stores against the inc, this ensures all subsequent accesses
> > > + * are from this object and not anything previously occupying this memory --
> > > + * consider SLAB_TYPESAFE_BY_RCU.
> > > *
> > > * The decrements will provide release order, such that all the prior loads and
> > > * stores will be issued before, it also provides a control dependency, which
> > > @@ -144,7 +145,7 @@ bool __refcount_add_not_zero(int i, refcount_t *r, int *oldp)
> > > do {
> > > if (!old)
> > > break;
> > > - } while (!atomic_try_cmpxchg_relaxed(&r->refs, &old, old + i));
> > > + } while (!atomic_try_cmpxchg_acquire(&r->refs, &old, old + i));
> >
> > Hmm, do the later memory accesses need to be ordered against the store
> > part of the increment or just the read? If it's the former, then I don't
> > think that _acquire is sufficient -- accesses can still get in-between
> > the read and write parts of the RmW.
>
> I dug some more into this at the end of last week. For the
> SLAB_TYPESAFE_BY_RCU where we're racing inc_not_zero() with
> dec_and_test(), then I think using _acquire() above is correct as the
> later references can only move up into the critical section in the case
> that we successfully obtained a reference.
>
> However, if we're going to make the barriers implicit in the refcount
> operations here then I think we also need to do something on the producer
> side for when the object is re-initialised after being destroyed and
> allocated again. I think that would necessitate release ordering for
> refcount_set() so that whatever allows the consumer to validate the
> object (e.g. sequence number) is published *before* the refcount.
Thanks Will!
I would like to expand on your answer to provide an example of the
race that would happen without release ordering in the producer. To
save reader's time I provide a simplified flow and reasoning first.
More detailed code of what I'm considering a typical
SLAB_TYPESAFE_BY_RCU refcounting example is added at the end of my
reply (Addendum).
Simplified flow looks like this:
consumer:
obj = lookup(collection, key);
if (!refcount_inc_not_zero(&obj->ref))
return;
smp_rmb(); /* Peter's new acquire fence */
if (READ_ONCE(obj->key) != key) {
put_ref(obj);
return;
}
use(obj->value);
producer:
old_key = obj->key;
remove(collection, old_key);
if (!refcount_dec_and_test(&obj->ref))
return;
obj->key = KEY_INVALID;
free(objj);
...
obj = malloc(); /* obj is reused */
obj->key = new_key;
obj->value = new_value;
smp_wmb(); /* Will's proposed release fence */
refcount_set(obj->ref, 1);
insert(collection, key, obj);
Let's consider a case when new_key == old_key. Will call both of them
"key". Without WIll's proposed fence the following reordering is
possible:
consumer:
obj = lookup(collection, key);
producer:
key = obj->key
remove(collection, key);
if (!refcount_dec_and_test(&obj->ref))
return;
obj->key = KEY_INVALID;
free(objj);
obj = malloc(); /* obj is reused */
refcount_set(obj->ref, 1);
obj->key = key; /* same key */
if (!refcount_inc_not_zero(&obj->ref))
return;
smp_rmb();
if (READ_ONCE(obj->key) != key) {
put_ref(obj);
return;
}
use(obj->value);
obj->value = new_value; /* reordered store */
add(collection, key, obj);
So, the consumer finds the old object, successfully takes a refcount
and validates the key. It succeeds because the object is allocated and
has the same key, which is fine. However it proceeds to use stale
obj->value. Will's proposed release ordering would prevent that.
The example in https://elixir.bootlin.com/linux/v6.12.6/source/include/linux/slab.h#L102
omits all these ordering issues for SLAB_TYPESAFE_BY_RCU.
I think it would be better to introduce two new functions:
refcount_add_not_zero_acquire() and refcount_set_release(), clearly
document that they should be used when a freed object can be recycled
and reused, like in SLAB_TYPESAFE_BY_RCU case. refcount_set_release()
should also clarify that once it's called, the object can be accessed
by consumers even if it was not added yet into the collection used for
object lookup (like in the example above). SLAB_TYPESAFE_BY_RCU
comment at https://elixir.bootlin.com/linux/v6.12.6/source/include/linux/slab.h#L102
then can explicitly use these new functions in the example code,
further clarifying their purpose and proper use.
WDYT?
ADDENDUM.
Detailed code for typical use of refcounting with SLAB_TYPESAFE_BY_RCU:
struct object {
refcount_t ref;
u64 key;
u64 value;
};
void init(struct object *obj, u64 key, u64 value)
{
obj->key = key;
obj->value = value;
smp_wmb(); /* Will's proposed release fence */
refcount_set(obj->ref, 1);
}
bool get_ref(struct object *obj, u64 key)
{
if (!refcount_inc_not_zero(&obj->ref))
return false;
smp_rmb(); /* Peter's new acquire fence */
if (READ_ONCE(obj->key) != key) {
put_ref(obj);
return false;
}
return true;
}
void put_ref(struct object *obj)
{
if (!refcount_dec_and_test(&obj->ref))
return;
obj->key = KEY_INVALID;
free(obj);
}
consumer:
obj = lookup(collection, key);
if (!get_ref(obj, key)
return;
use(obj->value);
producer:
remove(collection, old_obj->key);
put_ref(old_obj);
new_obj = malloc();
init(new_obj, new_key, new_value);
insert(collection, new_key, new_obj);
With SLAB_TYPESAFE_BY_RCU old_obj in the producer can be reused and be
equal to new_obj.
>
> Will
On Mon, Jan 27, 2025 at 11:21 AM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Mon, Jan 27, 2025 at 6:09 AM Will Deacon <will@kernel.org> wrote:
> >
> > On Fri, Jan 17, 2025 at 03:41:36PM +0000, Will Deacon wrote:
> > > On Wed, Jan 15, 2025 at 05:00:11PM +0100, Peter Zijlstra wrote:
> > > > On Wed, Jan 15, 2025 at 12:13:34PM +0100, Peter Zijlstra wrote:
> > > >
> > > > > Notably, it means refcount_t is entirely unsuitable for anything
> > > > > SLAB_TYPESAFE_BY_RCU, since they all will need secondary validation
> > > > > conditions after the refcount succeeds.
> > > > >
> > > > > And this is probably fine, but let me ponder this all a little more.
> > > >
> > > > Even though SLAB_TYPESAFE_BY_RCU is relatively rare, I think we'd better
> > > > fix this, these things are hard enough as they are.
> > > >
> > > > Will, others, wdyt?
> > >
> > > We should also update the Documentation (atomic_t.txt and
> > > refcount-vs-atomic.rst) if we strengthen this.
> > >
> > > > ---
> > > > Subject: refcount: Strengthen inc_not_zero()
> > > >
> > > > For speculative lookups where a successful inc_not_zero() pins the
> > > > object, but where we still need to double check if the object acquired
> > > > is indeed the one we set out to aquire, needs this validation to happen
> > > > *after* the increment.
> > > >
> > > > Notably SLAB_TYPESAFE_BY_RCU is one such an example.
> > > >
> > > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > > > ---
> > > > include/linux/refcount.h | 15 ++++++++-------
> > > > 1 file changed, 8 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/include/linux/refcount.h b/include/linux/refcount.h
> > > > index 35f039ecb272..340e7ffa445e 100644
> > > > --- a/include/linux/refcount.h
> > > > +++ b/include/linux/refcount.h
> > > > @@ -69,9 +69,10 @@
> > > > * its the lock acquire, for RCU/lockless data structures its the dependent
> > > > * load.
> > > > *
> > > > - * Do note that inc_not_zero() provides a control dependency which will order
> > > > - * future stores against the inc, this ensures we'll never modify the object
> > > > - * if we did not in fact acquire a reference.
> > > > + * Do note that inc_not_zero() does provide acquire order, which will order
> > > > + * future load and stores against the inc, this ensures all subsequent accesses
> > > > + * are from this object and not anything previously occupying this memory --
> > > > + * consider SLAB_TYPESAFE_BY_RCU.
> > > > *
> > > > * The decrements will provide release order, such that all the prior loads and
> > > > * stores will be issued before, it also provides a control dependency, which
> > > > @@ -144,7 +145,7 @@ bool __refcount_add_not_zero(int i, refcount_t *r, int *oldp)
> > > > do {
> > > > if (!old)
> > > > break;
> > > > - } while (!atomic_try_cmpxchg_relaxed(&r->refs, &old, old + i));
> > > > + } while (!atomic_try_cmpxchg_acquire(&r->refs, &old, old + i));
> > >
> > > Hmm, do the later memory accesses need to be ordered against the store
> > > part of the increment or just the read? If it's the former, then I don't
> > > think that _acquire is sufficient -- accesses can still get in-between
> > > the read and write parts of the RmW.
> >
> > I dug some more into this at the end of last week. For the
> > SLAB_TYPESAFE_BY_RCU where we're racing inc_not_zero() with
> > dec_and_test(), then I think using _acquire() above is correct as the
> > later references can only move up into the critical section in the case
> > that we successfully obtained a reference.
> >
> > However, if we're going to make the barriers implicit in the refcount
> > operations here then I think we also need to do something on the producer
> > side for when the object is re-initialised after being destroyed and
> > allocated again. I think that would necessitate release ordering for
> > refcount_set() so that whatever allows the consumer to validate the
> > object (e.g. sequence number) is published *before* the refcount.
>
> Thanks Will!
> I would like to expand on your answer to provide an example of the
> race that would happen without release ordering in the producer. To
> save reader's time I provide a simplified flow and reasoning first.
> More detailed code of what I'm considering a typical
> SLAB_TYPESAFE_BY_RCU refcounting example is added at the end of my
> reply (Addendum).
> Simplified flow looks like this:
>
> consumer:
> obj = lookup(collection, key);
> if (!refcount_inc_not_zero(&obj->ref))
> return;
> smp_rmb(); /* Peter's new acquire fence */
> if (READ_ONCE(obj->key) != key) {
> put_ref(obj);
> return;
> }
> use(obj->value);
>
> producer:
> old_key = obj->key;
> remove(collection, old_key);
> if (!refcount_dec_and_test(&obj->ref))
> return;
> obj->key = KEY_INVALID;
> free(objj);
> ...
> obj = malloc(); /* obj is reused */
> obj->key = new_key;
> obj->value = new_value;
> smp_wmb(); /* Will's proposed release fence */
> refcount_set(obj->ref, 1);
> insert(collection, key, obj);
>
> Let's consider a case when new_key == old_key. Will call both of them
> "key". Without WIll's proposed fence the following reordering is
> possible:
>
> consumer:
> obj = lookup(collection, key);
>
> producer:
> key = obj->key
> remove(collection, key);
> if (!refcount_dec_and_test(&obj->ref))
> return;
> obj->key = KEY_INVALID;
> free(objj);
> obj = malloc(); /* obj is reused */
> refcount_set(obj->ref, 1);
> obj->key = key; /* same key */
>
> if (!refcount_inc_not_zero(&obj->ref))
> return;
> smp_rmb();
> if (READ_ONCE(obj->key) != key) {
> put_ref(obj);
> return;
> }
> use(obj->value);
>
> obj->value = new_value; /* reordered store */
> add(collection, key, obj);
>
> So, the consumer finds the old object, successfully takes a refcount
> and validates the key. It succeeds because the object is allocated and
> has the same key, which is fine. However it proceeds to use stale
> obj->value. Will's proposed release ordering would prevent that.
>
> The example in https://elixir.bootlin.com/linux/v6.12.6/source/include/linux/slab.h#L102
> omits all these ordering issues for SLAB_TYPESAFE_BY_RCU.
> I think it would be better to introduce two new functions:
> refcount_add_not_zero_acquire() and refcount_set_release(), clearly
> document that they should be used when a freed object can be recycled
> and reused, like in SLAB_TYPESAFE_BY_RCU case. refcount_set_release()
> should also clarify that once it's called, the object can be accessed
> by consumers even if it was not added yet into the collection used for
> object lookup (like in the example above). SLAB_TYPESAFE_BY_RCU
> comment at https://elixir.bootlin.com/linux/v6.12.6/source/include/linux/slab.h#L102
> then can explicitly use these new functions in the example code,
> further clarifying their purpose and proper use.
> WDYT?
Hi Peter,
Should I take a stab at preparing a patch to add the two new
refcounting functions suggested above with updates to the
documentation and comments?
If you disagree with that or need more time to decide then I'll wait.
Please let me know.
Thanks,
Suren.
>
> ADDENDUM.
> Detailed code for typical use of refcounting with SLAB_TYPESAFE_BY_RCU:
>
> struct object {
> refcount_t ref;
> u64 key;
> u64 value;
> };
>
> void init(struct object *obj, u64 key, u64 value)
> {
> obj->key = key;
> obj->value = value;
> smp_wmb(); /* Will's proposed release fence */
> refcount_set(obj->ref, 1);
> }
>
> bool get_ref(struct object *obj, u64 key)
> {
> if (!refcount_inc_not_zero(&obj->ref))
> return false;
> smp_rmb(); /* Peter's new acquire fence */
> if (READ_ONCE(obj->key) != key) {
> put_ref(obj);
> return false;
> }
> return true;
> }
>
> void put_ref(struct object *obj)
> {
> if (!refcount_dec_and_test(&obj->ref))
> return;
> obj->key = KEY_INVALID;
> free(obj);
> }
>
> consumer:
> obj = lookup(collection, key);
> if (!get_ref(obj, key)
> return;
> use(obj->value);
>
> producer:
> remove(collection, old_obj->key);
> put_ref(old_obj);
> new_obj = malloc();
> init(new_obj, new_key, new_value);
> insert(collection, new_key, new_obj);
>
> With SLAB_TYPESAFE_BY_RCU old_obj in the producer can be reused and be
> equal to new_obj.
>
>
> >
> > Will
On Tue, Jan 28, 2025 at 3:51 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Mon, Jan 27, 2025 at 11:21 AM Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > On Mon, Jan 27, 2025 at 6:09 AM Will Deacon <will@kernel.org> wrote:
> > >
> > > On Fri, Jan 17, 2025 at 03:41:36PM +0000, Will Deacon wrote:
> > > > On Wed, Jan 15, 2025 at 05:00:11PM +0100, Peter Zijlstra wrote:
> > > > > On Wed, Jan 15, 2025 at 12:13:34PM +0100, Peter Zijlstra wrote:
> > > > >
> > > > > > Notably, it means refcount_t is entirely unsuitable for anything
> > > > > > SLAB_TYPESAFE_BY_RCU, since they all will need secondary validation
> > > > > > conditions after the refcount succeeds.
> > > > > >
> > > > > > And this is probably fine, but let me ponder this all a little more.
> > > > >
> > > > > Even though SLAB_TYPESAFE_BY_RCU is relatively rare, I think we'd better
> > > > > fix this, these things are hard enough as they are.
> > > > >
> > > > > Will, others, wdyt?
> > > >
> > > > We should also update the Documentation (atomic_t.txt and
> > > > refcount-vs-atomic.rst) if we strengthen this.
> > > >
> > > > > ---
> > > > > Subject: refcount: Strengthen inc_not_zero()
> > > > >
> > > > > For speculative lookups where a successful inc_not_zero() pins the
> > > > > object, but where we still need to double check if the object acquired
> > > > > is indeed the one we set out to aquire, needs this validation to happen
> > > > > *after* the increment.
> > > > >
> > > > > Notably SLAB_TYPESAFE_BY_RCU is one such an example.
> > > > >
> > > > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > > > > ---
> > > > > include/linux/refcount.h | 15 ++++++++-------
> > > > > 1 file changed, 8 insertions(+), 7 deletions(-)
> > > > >
> > > > > diff --git a/include/linux/refcount.h b/include/linux/refcount.h
> > > > > index 35f039ecb272..340e7ffa445e 100644
> > > > > --- a/include/linux/refcount.h
> > > > > +++ b/include/linux/refcount.h
> > > > > @@ -69,9 +69,10 @@
> > > > > * its the lock acquire, for RCU/lockless data structures its the dependent
> > > > > * load.
> > > > > *
> > > > > - * Do note that inc_not_zero() provides a control dependency which will order
> > > > > - * future stores against the inc, this ensures we'll never modify the object
> > > > > - * if we did not in fact acquire a reference.
> > > > > + * Do note that inc_not_zero() does provide acquire order, which will order
> > > > > + * future load and stores against the inc, this ensures all subsequent accesses
> > > > > + * are from this object and not anything previously occupying this memory --
> > > > > + * consider SLAB_TYPESAFE_BY_RCU.
> > > > > *
> > > > > * The decrements will provide release order, such that all the prior loads and
> > > > > * stores will be issued before, it also provides a control dependency, which
> > > > > @@ -144,7 +145,7 @@ bool __refcount_add_not_zero(int i, refcount_t *r, int *oldp)
> > > > > do {
> > > > > if (!old)
> > > > > break;
> > > > > - } while (!atomic_try_cmpxchg_relaxed(&r->refs, &old, old + i));
> > > > > + } while (!atomic_try_cmpxchg_acquire(&r->refs, &old, old + i));
> > > >
> > > > Hmm, do the later memory accesses need to be ordered against the store
> > > > part of the increment or just the read? If it's the former, then I don't
> > > > think that _acquire is sufficient -- accesses can still get in-between
> > > > the read and write parts of the RmW.
> > >
> > > I dug some more into this at the end of last week. For the
> > > SLAB_TYPESAFE_BY_RCU where we're racing inc_not_zero() with
> > > dec_and_test(), then I think using _acquire() above is correct as the
> > > later references can only move up into the critical section in the case
> > > that we successfully obtained a reference.
> > >
> > > However, if we're going to make the barriers implicit in the refcount
> > > operations here then I think we also need to do something on the producer
> > > side for when the object is re-initialised after being destroyed and
> > > allocated again. I think that would necessitate release ordering for
> > > refcount_set() so that whatever allows the consumer to validate the
> > > object (e.g. sequence number) is published *before* the refcount.
> >
> > Thanks Will!
> > I would like to expand on your answer to provide an example of the
> > race that would happen without release ordering in the producer. To
> > save reader's time I provide a simplified flow and reasoning first.
> > More detailed code of what I'm considering a typical
> > SLAB_TYPESAFE_BY_RCU refcounting example is added at the end of my
> > reply (Addendum).
> > Simplified flow looks like this:
> >
> > consumer:
> > obj = lookup(collection, key);
> > if (!refcount_inc_not_zero(&obj->ref))
> > return;
> > smp_rmb(); /* Peter's new acquire fence */
> > if (READ_ONCE(obj->key) != key) {
> > put_ref(obj);
> > return;
> > }
> > use(obj->value);
> >
> > producer:
> > old_key = obj->key;
> > remove(collection, old_key);
> > if (!refcount_dec_and_test(&obj->ref))
> > return;
> > obj->key = KEY_INVALID;
> > free(objj);
> > ...
> > obj = malloc(); /* obj is reused */
> > obj->key = new_key;
> > obj->value = new_value;
> > smp_wmb(); /* Will's proposed release fence */
> > refcount_set(obj->ref, 1);
> > insert(collection, key, obj);
> >
> > Let's consider a case when new_key == old_key. Will call both of them
> > "key". Without WIll's proposed fence the following reordering is
> > possible:
> >
> > consumer:
> > obj = lookup(collection, key);
> >
> > producer:
> > key = obj->key
> > remove(collection, key);
> > if (!refcount_dec_and_test(&obj->ref))
> > return;
> > obj->key = KEY_INVALID;
> > free(objj);
> > obj = malloc(); /* obj is reused */
> > refcount_set(obj->ref, 1);
> > obj->key = key; /* same key */
> >
> > if (!refcount_inc_not_zero(&obj->ref))
> > return;
> > smp_rmb();
> > if (READ_ONCE(obj->key) != key) {
> > put_ref(obj);
> > return;
> > }
> > use(obj->value);
> >
> > obj->value = new_value; /* reordered store */
> > add(collection, key, obj);
> >
> > So, the consumer finds the old object, successfully takes a refcount
> > and validates the key. It succeeds because the object is allocated and
> > has the same key, which is fine. However it proceeds to use stale
> > obj->value. Will's proposed release ordering would prevent that.
> >
> > The example in https://elixir.bootlin.com/linux/v6.12.6/source/include/linux/slab.h#L102
> > omits all these ordering issues for SLAB_TYPESAFE_BY_RCU.
> > I think it would be better to introduce two new functions:
> > refcount_add_not_zero_acquire() and refcount_set_release(), clearly
> > document that they should be used when a freed object can be recycled
> > and reused, like in SLAB_TYPESAFE_BY_RCU case. refcount_set_release()
> > should also clarify that once it's called, the object can be accessed
> > by consumers even if it was not added yet into the collection used for
> > object lookup (like in the example above). SLAB_TYPESAFE_BY_RCU
> > comment at https://elixir.bootlin.com/linux/v6.12.6/source/include/linux/slab.h#L102
> > then can explicitly use these new functions in the example code,
> > further clarifying their purpose and proper use.
> > WDYT?
>
> Hi Peter,
> Should I take a stab at preparing a patch to add the two new
> refcounting functions suggested above with updates to the
> documentation and comments?
> If you disagree with that or need more time to decide then I'll wait.
> Please let me know.
Not sure if "--in-reply-to" worked but I just posted a patch adding
new refcounting APIs for SLAB_TYPESAFE_BY_RCU here:
https://lore.kernel.org/all/20250206025201.979573-1-surenb@google.com/
Since Peter seems to be busy I discussed these ordering requirements
for SLAB_TYPESAFE_BY_RCU with Paul McKenney and he was leaning towards
having separate functions with the additional fences for this case.
That's what I provided in my patch.
Another possible option is to add acquire ordering in the
__refcount_add_not_zero() as Peter suggested and add
refcount_set_release() function.
Thanks,
Suren.
> Thanks,
> Suren.
>
>
> >
> > ADDENDUM.
> > Detailed code for typical use of refcounting with SLAB_TYPESAFE_BY_RCU:
> >
> > struct object {
> > refcount_t ref;
> > u64 key;
> > u64 value;
> > };
> >
> > void init(struct object *obj, u64 key, u64 value)
> > {
> > obj->key = key;
> > obj->value = value;
> > smp_wmb(); /* Will's proposed release fence */
> > refcount_set(obj->ref, 1);
> > }
> >
> > bool get_ref(struct object *obj, u64 key)
> > {
> > if (!refcount_inc_not_zero(&obj->ref))
> > return false;
> > smp_rmb(); /* Peter's new acquire fence */
> > if (READ_ONCE(obj->key) != key) {
> > put_ref(obj);
> > return false;
> > }
> > return true;
> > }
> >
> > void put_ref(struct object *obj)
> > {
> > if (!refcount_dec_and_test(&obj->ref))
> > return;
> > obj->key = KEY_INVALID;
> > free(obj);
> > }
> >
> > consumer:
> > obj = lookup(collection, key);
> > if (!get_ref(obj, key)
> > return;
> > use(obj->value);
> >
> > producer:
> > remove(collection, old_obj->key);
> > put_ref(old_obj);
> > new_obj = malloc();
> > init(new_obj, new_key, new_value);
> > insert(collection, new_key, new_obj);
> >
> > With SLAB_TYPESAFE_BY_RCU old_obj in the producer can be reused and be
> > equal to new_obj.
> >
> >
> > >
> > > Will
On Wed, Feb 5, 2025 at 7:03 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Tue, Jan 28, 2025 at 3:51 PM Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > On Mon, Jan 27, 2025 at 11:21 AM Suren Baghdasaryan <surenb@google.com> wrote:
> > >
> > > On Mon, Jan 27, 2025 at 6:09 AM Will Deacon <will@kernel.org> wrote:
> > > >
> > > > On Fri, Jan 17, 2025 at 03:41:36PM +0000, Will Deacon wrote:
> > > > > On Wed, Jan 15, 2025 at 05:00:11PM +0100, Peter Zijlstra wrote:
> > > > > > On Wed, Jan 15, 2025 at 12:13:34PM +0100, Peter Zijlstra wrote:
> > > > > >
> > > > > > > Notably, it means refcount_t is entirely unsuitable for anything
> > > > > > > SLAB_TYPESAFE_BY_RCU, since they all will need secondary validation
> > > > > > > conditions after the refcount succeeds.
> > > > > > >
> > > > > > > And this is probably fine, but let me ponder this all a little more.
> > > > > >
> > > > > > Even though SLAB_TYPESAFE_BY_RCU is relatively rare, I think we'd better
> > > > > > fix this, these things are hard enough as they are.
> > > > > >
> > > > > > Will, others, wdyt?
> > > > >
> > > > > We should also update the Documentation (atomic_t.txt and
> > > > > refcount-vs-atomic.rst) if we strengthen this.
> > > > >
> > > > > > ---
> > > > > > Subject: refcount: Strengthen inc_not_zero()
> > > > > >
> > > > > > For speculative lookups where a successful inc_not_zero() pins the
> > > > > > object, but where we still need to double check if the object acquired
> > > > > > is indeed the one we set out to aquire, needs this validation to happen
> > > > > > *after* the increment.
> > > > > >
> > > > > > Notably SLAB_TYPESAFE_BY_RCU is one such an example.
> > > > > >
> > > > > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > > > > > ---
> > > > > > include/linux/refcount.h | 15 ++++++++-------
> > > > > > 1 file changed, 8 insertions(+), 7 deletions(-)
> > > > > >
> > > > > > diff --git a/include/linux/refcount.h b/include/linux/refcount.h
> > > > > > index 35f039ecb272..340e7ffa445e 100644
> > > > > > --- a/include/linux/refcount.h
> > > > > > +++ b/include/linux/refcount.h
> > > > > > @@ -69,9 +69,10 @@
> > > > > > * its the lock acquire, for RCU/lockless data structures its the dependent
> > > > > > * load.
> > > > > > *
> > > > > > - * Do note that inc_not_zero() provides a control dependency which will order
> > > > > > - * future stores against the inc, this ensures we'll never modify the object
> > > > > > - * if we did not in fact acquire a reference.
> > > > > > + * Do note that inc_not_zero() does provide acquire order, which will order
> > > > > > + * future load and stores against the inc, this ensures all subsequent accesses
> > > > > > + * are from this object and not anything previously occupying this memory --
> > > > > > + * consider SLAB_TYPESAFE_BY_RCU.
> > > > > > *
> > > > > > * The decrements will provide release order, such that all the prior loads and
> > > > > > * stores will be issued before, it also provides a control dependency, which
> > > > > > @@ -144,7 +145,7 @@ bool __refcount_add_not_zero(int i, refcount_t *r, int *oldp)
> > > > > > do {
> > > > > > if (!old)
> > > > > > break;
> > > > > > - } while (!atomic_try_cmpxchg_relaxed(&r->refs, &old, old + i));
> > > > > > + } while (!atomic_try_cmpxchg_acquire(&r->refs, &old, old + i));
> > > > >
> > > > > Hmm, do the later memory accesses need to be ordered against the store
> > > > > part of the increment or just the read? If it's the former, then I don't
> > > > > think that _acquire is sufficient -- accesses can still get in-between
> > > > > the read and write parts of the RmW.
> > > >
> > > > I dug some more into this at the end of last week. For the
> > > > SLAB_TYPESAFE_BY_RCU where we're racing inc_not_zero() with
> > > > dec_and_test(), then I think using _acquire() above is correct as the
> > > > later references can only move up into the critical section in the case
> > > > that we successfully obtained a reference.
> > > >
> > > > However, if we're going to make the barriers implicit in the refcount
> > > > operations here then I think we also need to do something on the producer
> > > > side for when the object is re-initialised after being destroyed and
> > > > allocated again. I think that would necessitate release ordering for
> > > > refcount_set() so that whatever allows the consumer to validate the
> > > > object (e.g. sequence number) is published *before* the refcount.
> > >
> > > Thanks Will!
> > > I would like to expand on your answer to provide an example of the
> > > race that would happen without release ordering in the producer. To
> > > save reader's time I provide a simplified flow and reasoning first.
> > > More detailed code of what I'm considering a typical
> > > SLAB_TYPESAFE_BY_RCU refcounting example is added at the end of my
> > > reply (Addendum).
> > > Simplified flow looks like this:
> > >
> > > consumer:
> > > obj = lookup(collection, key);
> > > if (!refcount_inc_not_zero(&obj->ref))
> > > return;
> > > smp_rmb(); /* Peter's new acquire fence */
> > > if (READ_ONCE(obj->key) != key) {
> > > put_ref(obj);
> > > return;
> > > }
> > > use(obj->value);
> > >
> > > producer:
> > > old_key = obj->key;
> > > remove(collection, old_key);
> > > if (!refcount_dec_and_test(&obj->ref))
> > > return;
> > > obj->key = KEY_INVALID;
> > > free(objj);
> > > ...
> > > obj = malloc(); /* obj is reused */
> > > obj->key = new_key;
> > > obj->value = new_value;
> > > smp_wmb(); /* Will's proposed release fence */
> > > refcount_set(obj->ref, 1);
> > > insert(collection, key, obj);
> > >
> > > Let's consider a case when new_key == old_key. Will call both of them
> > > "key". Without WIll's proposed fence the following reordering is
> > > possible:
> > >
> > > consumer:
> > > obj = lookup(collection, key);
> > >
> > > producer:
> > > key = obj->key
> > > remove(collection, key);
> > > if (!refcount_dec_and_test(&obj->ref))
> > > return;
> > > obj->key = KEY_INVALID;
> > > free(objj);
> > > obj = malloc(); /* obj is reused */
> > > refcount_set(obj->ref, 1);
> > > obj->key = key; /* same key */
> > >
> > > if (!refcount_inc_not_zero(&obj->ref))
> > > return;
> > > smp_rmb();
> > > if (READ_ONCE(obj->key) != key) {
> > > put_ref(obj);
> > > return;
> > > }
> > > use(obj->value);
> > >
> > > obj->value = new_value; /* reordered store */
> > > add(collection, key, obj);
> > >
> > > So, the consumer finds the old object, successfully takes a refcount
> > > and validates the key. It succeeds because the object is allocated and
> > > has the same key, which is fine. However it proceeds to use stale
> > > obj->value. Will's proposed release ordering would prevent that.
> > >
> > > The example in https://elixir.bootlin.com/linux/v6.12.6/source/include/linux/slab.h#L102
> > > omits all these ordering issues for SLAB_TYPESAFE_BY_RCU.
> > > I think it would be better to introduce two new functions:
> > > refcount_add_not_zero_acquire() and refcount_set_release(), clearly
> > > document that they should be used when a freed object can be recycled
> > > and reused, like in SLAB_TYPESAFE_BY_RCU case. refcount_set_release()
> > > should also clarify that once it's called, the object can be accessed
> > > by consumers even if it was not added yet into the collection used for
> > > object lookup (like in the example above). SLAB_TYPESAFE_BY_RCU
> > > comment at https://elixir.bootlin.com/linux/v6.12.6/source/include/linux/slab.h#L102
> > > then can explicitly use these new functions in the example code,
> > > further clarifying their purpose and proper use.
> > > WDYT?
> >
> > Hi Peter,
> > Should I take a stab at preparing a patch to add the two new
> > refcounting functions suggested above with updates to the
> > documentation and comments?
> > If you disagree with that or need more time to decide then I'll wait.
> > Please let me know.
>
> Not sure if "--in-reply-to" worked but I just posted a patch adding
> new refcounting APIs for SLAB_TYPESAFE_BY_RCU here:
> https://lore.kernel.org/all/20250206025201.979573-1-surenb@google.com/
Since I did not get any replies other than Vlastimil's Ack on the
above patch, I went ahead and posted v10 of my patchset [1] and
included the patch above in it [2]. Feedback is highly appreciated!
[1] https://lore.kernel.org/all/20250213224655.1680278-1-surenb@google.com/
[2] https://lore.kernel.org/all/20250213224655.1680278-11-surenb@google.com/
> Since Peter seems to be busy I discussed these ordering requirements
> for SLAB_TYPESAFE_BY_RCU with Paul McKenney and he was leaning towards
> having separate functions with the additional fences for this case.
> That's what I provided in my patch.
> Another possible option is to add acquire ordering in the
> __refcount_add_not_zero() as Peter suggested and add
> refcount_set_release() function.
> Thanks,
> Suren.
>
>
> > Thanks,
> > Suren.
> >
> >
> > >
> > > ADDENDUM.
> > > Detailed code for typical use of refcounting with SLAB_TYPESAFE_BY_RCU:
> > >
> > > struct object {
> > > refcount_t ref;
> > > u64 key;
> > > u64 value;
> > > };
> > >
> > > void init(struct object *obj, u64 key, u64 value)
> > > {
> > > obj->key = key;
> > > obj->value = value;
> > > smp_wmb(); /* Will's proposed release fence */
> > > refcount_set(obj->ref, 1);
> > > }
> > >
> > > bool get_ref(struct object *obj, u64 key)
> > > {
> > > if (!refcount_inc_not_zero(&obj->ref))
> > > return false;
> > > smp_rmb(); /* Peter's new acquire fence */
> > > if (READ_ONCE(obj->key) != key) {
> > > put_ref(obj);
> > > return false;
> > > }
> > > return true;
> > > }
> > >
> > > void put_ref(struct object *obj)
> > > {
> > > if (!refcount_dec_and_test(&obj->ref))
> > > return;
> > > obj->key = KEY_INVALID;
> > > free(obj);
> > > }
> > >
> > > consumer:
> > > obj = lookup(collection, key);
> > > if (!get_ref(obj, key)
> > > return;
> > > use(obj->value);
> > >
> > > producer:
> > > remove(collection, old_obj->key);
> > > put_ref(old_obj);
> > > new_obj = malloc();
> > > init(new_obj, new_key, new_value);
> > > insert(collection, new_key, new_obj);
> > >
> > > With SLAB_TYPESAFE_BY_RCU old_obj in the producer can be reused and be
> > > equal to new_obj.
> > >
> > >
> > > >
> > > > Will
On Wed, Feb 5, 2025 at 7:03 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Tue, Jan 28, 2025 at 3:51 PM Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > On Mon, Jan 27, 2025 at 11:21 AM Suren Baghdasaryan <surenb@google.com> wrote:
> > >
> > > On Mon, Jan 27, 2025 at 6:09 AM Will Deacon <will@kernel.org> wrote:
> > > >
> > > > On Fri, Jan 17, 2025 at 03:41:36PM +0000, Will Deacon wrote:
> > > > > On Wed, Jan 15, 2025 at 05:00:11PM +0100, Peter Zijlstra wrote:
> > > > > > On Wed, Jan 15, 2025 at 12:13:34PM +0100, Peter Zijlstra wrote:
> > > > > >
> > > > > > > Notably, it means refcount_t is entirely unsuitable for anything
> > > > > > > SLAB_TYPESAFE_BY_RCU, since they all will need secondary validation
> > > > > > > conditions after the refcount succeeds.
> > > > > > >
> > > > > > > And this is probably fine, but let me ponder this all a little more.
> > > > > >
> > > > > > Even though SLAB_TYPESAFE_BY_RCU is relatively rare, I think we'd better
> > > > > > fix this, these things are hard enough as they are.
> > > > > >
> > > > > > Will, others, wdyt?
> > > > >
> > > > > We should also update the Documentation (atomic_t.txt and
> > > > > refcount-vs-atomic.rst) if we strengthen this.
> > > > >
> > > > > > ---
> > > > > > Subject: refcount: Strengthen inc_not_zero()
> > > > > >
> > > > > > For speculative lookups where a successful inc_not_zero() pins the
> > > > > > object, but where we still need to double check if the object acquired
> > > > > > is indeed the one we set out to aquire, needs this validation to happen
> > > > > > *after* the increment.
> > > > > >
> > > > > > Notably SLAB_TYPESAFE_BY_RCU is one such an example.
> > > > > >
> > > > > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > > > > > ---
> > > > > > include/linux/refcount.h | 15 ++++++++-------
> > > > > > 1 file changed, 8 insertions(+), 7 deletions(-)
> > > > > >
> > > > > > diff --git a/include/linux/refcount.h b/include/linux/refcount.h
> > > > > > index 35f039ecb272..340e7ffa445e 100644
> > > > > > --- a/include/linux/refcount.h
> > > > > > +++ b/include/linux/refcount.h
> > > > > > @@ -69,9 +69,10 @@
> > > > > > * its the lock acquire, for RCU/lockless data structures its the dependent
> > > > > > * load.
> > > > > > *
> > > > > > - * Do note that inc_not_zero() provides a control dependency which will order
> > > > > > - * future stores against the inc, this ensures we'll never modify the object
> > > > > > - * if we did not in fact acquire a reference.
> > > > > > + * Do note that inc_not_zero() does provide acquire order, which will order
> > > > > > + * future load and stores against the inc, this ensures all subsequent accesses
> > > > > > + * are from this object and not anything previously occupying this memory --
> > > > > > + * consider SLAB_TYPESAFE_BY_RCU.
> > > > > > *
> > > > > > * The decrements will provide release order, such that all the prior loads and
> > > > > > * stores will be issued before, it also provides a control dependency, which
> > > > > > @@ -144,7 +145,7 @@ bool __refcount_add_not_zero(int i, refcount_t *r, int *oldp)
> > > > > > do {
> > > > > > if (!old)
> > > > > > break;
> > > > > > - } while (!atomic_try_cmpxchg_relaxed(&r->refs, &old, old + i));
> > > > > > + } while (!atomic_try_cmpxchg_acquire(&r->refs, &old, old + i));
> > > > >
> > > > > Hmm, do the later memory accesses need to be ordered against the store
> > > > > part of the increment or just the read? If it's the former, then I don't
> > > > > think that _acquire is sufficient -- accesses can still get in-between
> > > > > the read and write parts of the RmW.
> > > >
> > > > I dug some more into this at the end of last week. For the
> > > > SLAB_TYPESAFE_BY_RCU where we're racing inc_not_zero() with
> > > > dec_and_test(), then I think using _acquire() above is correct as the
> > > > later references can only move up into the critical section in the case
> > > > that we successfully obtained a reference.
> > > >
> > > > However, if we're going to make the barriers implicit in the refcount
> > > > operations here then I think we also need to do something on the producer
> > > > side for when the object is re-initialised after being destroyed and
> > > > allocated again. I think that would necessitate release ordering for
> > > > refcount_set() so that whatever allows the consumer to validate the
> > > > object (e.g. sequence number) is published *before* the refcount.
> > >
> > > Thanks Will!
> > > I would like to expand on your answer to provide an example of the
> > > race that would happen without release ordering in the producer. To
> > > save reader's time I provide a simplified flow and reasoning first.
> > > More detailed code of what I'm considering a typical
> > > SLAB_TYPESAFE_BY_RCU refcounting example is added at the end of my
> > > reply (Addendum).
> > > Simplified flow looks like this:
> > >
> > > consumer:
> > > obj = lookup(collection, key);
> > > if (!refcount_inc_not_zero(&obj->ref))
> > > return;
> > > smp_rmb(); /* Peter's new acquire fence */
> > > if (READ_ONCE(obj->key) != key) {
> > > put_ref(obj);
> > > return;
> > > }
> > > use(obj->value);
> > >
> > > producer:
> > > old_key = obj->key;
> > > remove(collection, old_key);
> > > if (!refcount_dec_and_test(&obj->ref))
> > > return;
> > > obj->key = KEY_INVALID;
> > > free(objj);
> > > ...
> > > obj = malloc(); /* obj is reused */
> > > obj->key = new_key;
> > > obj->value = new_value;
> > > smp_wmb(); /* Will's proposed release fence */
> > > refcount_set(obj->ref, 1);
> > > insert(collection, key, obj);
> > >
> > > Let's consider a case when new_key == old_key. Will call both of them
> > > "key". Without WIll's proposed fence the following reordering is
> > > possible:
> > >
> > > consumer:
> > > obj = lookup(collection, key);
> > >
> > > producer:
> > > key = obj->key
> > > remove(collection, key);
> > > if (!refcount_dec_and_test(&obj->ref))
> > > return;
> > > obj->key = KEY_INVALID;
> > > free(objj);
> > > obj = malloc(); /* obj is reused */
> > > refcount_set(obj->ref, 1);
> > > obj->key = key; /* same key */
> > >
> > > if (!refcount_inc_not_zero(&obj->ref))
> > > return;
> > > smp_rmb();
> > > if (READ_ONCE(obj->key) != key) {
> > > put_ref(obj);
> > > return;
> > > }
> > > use(obj->value);
> > >
> > > obj->value = new_value; /* reordered store */
> > > add(collection, key, obj);
> > >
> > > So, the consumer finds the old object, successfully takes a refcount
> > > and validates the key. It succeeds because the object is allocated and
> > > has the same key, which is fine. However it proceeds to use stale
> > > obj->value. Will's proposed release ordering would prevent that.
> > >
> > > The example in https://elixir.bootlin.com/linux/v6.12.6/source/include/linux/slab.h#L102
> > > omits all these ordering issues for SLAB_TYPESAFE_BY_RCU.
> > > I think it would be better to introduce two new functions:
> > > refcount_add_not_zero_acquire() and refcount_set_release(), clearly
> > > document that they should be used when a freed object can be recycled
> > > and reused, like in SLAB_TYPESAFE_BY_RCU case. refcount_set_release()
> > > should also clarify that once it's called, the object can be accessed
> > > by consumers even if it was not added yet into the collection used for
> > > object lookup (like in the example above). SLAB_TYPESAFE_BY_RCU
> > > comment at https://elixir.bootlin.com/linux/v6.12.6/source/include/linux/slab.h#L102
> > > then can explicitly use these new functions in the example code,
> > > further clarifying their purpose and proper use.
> > > WDYT?
> >
> > Hi Peter,
> > Should I take a stab at preparing a patch to add the two new
> > refcounting functions suggested above with updates to the
> > documentation and comments?
> > If you disagree with that or need more time to decide then I'll wait.
> > Please let me know.
>
> Not sure if "--in-reply-to" worked but I just posted a patch adding
> new refcounting APIs for SLAB_TYPESAFE_BY_RCU here:
> https://lore.kernel.org/all/20250206025201.979573-1-surenb@google.com/
> Since Peter seems to be busy I discussed these ordering requirements
> for SLAB_TYPESAFE_BY_RCU with Paul McKenney and he was leaning towards
> having separate functions with the additional fences for this case.
> That's what I provided in my patch.
> Another possible option is to add acquire ordering in the
> __refcount_add_not_zero() as Peter suggested and add
> refcount_set_release() function.
> Thanks,
> Suren.
>
>
> > Thanks,
> > Suren.
> >
> >
> > >
> > > ADDENDUM.
> > > Detailed code for typical use of refcounting with SLAB_TYPESAFE_BY_RCU:
> > >
> > > struct object {
> > > refcount_t ref;
> > > u64 key;
> > > u64 value;
> > > };
> > >
> > > void init(struct object *obj, u64 key, u64 value)
> > > {
> > > obj->key = key;
> > > obj->value = value;
> > > smp_wmb(); /* Will's proposed release fence */
> > > refcount_set(obj->ref, 1);
> > > }
> > >
> > > bool get_ref(struct object *obj, u64 key)
> > > {
> > > if (!refcount_inc_not_zero(&obj->ref))
> > > return false;
> > > smp_rmb(); /* Peter's new acquire fence */
> > > if (READ_ONCE(obj->key) != key) {
> > > put_ref(obj);
> > > return false;
> > > }
> > > return true;
> > > }
> > >
> > > void put_ref(struct object *obj)
> > > {
> > > if (!refcount_dec_and_test(&obj->ref))
> > > return;
> > > obj->key = KEY_INVALID;
> > > free(obj);
> > > }
> > >
> > > consumer:
> > > obj = lookup(collection, key);
> > > if (!get_ref(obj, key)
> > > return;
> > > use(obj->value);
> > >
> > > producer:
> > > remove(collection, old_obj->key);
> > > put_ref(old_obj);
> > > new_obj = malloc();
> > > init(new_obj, new_key, new_value);
> > > insert(collection, new_key, new_obj);
> > >
> > > With SLAB_TYPESAFE_BY_RCU old_obj in the producer can be reused and be
> > > equal to new_obj.
> > >
> > >
> > > >
> > > > Will
For speculative lookups where a successful inc_not_zero() pins the
object, but where we still need to double check if the object acquired
is indeed the one we set out to acquire (identity check), needs this
validation to happen *after* the increment.
Similarly, when a new object is initialized and its memory might have
been previously occupied by another object, all stores to initialize the
object should happen *before* refcount initialization.
Notably SLAB_TYPESAFE_BY_RCU is one such an example when this ordering
is required for reference counting.
Add refcount_{add|inc}_not_zero_acquire() to guarantee the proper ordering
between acquiring a reference count on an object and performing the
identity check for that object.
Add refcount_set_release() to guarantee proper ordering between stores
initializing object attributes and the store initializing the refcount.
refcount_set_release() should be done after all other object attributes
are initialized. Once refcount_set_release() is called, the object should
be considered visible to other tasks even if it was not yet added into an
object collection normally used to discover it. This is because other
tasks might have discovered the object previously occupying the same
memory and after memory reuse they can succeed in taking refcount for the
new object and start using it.
Object reuse example to consider:
consumer:
obj = lookup(collection, key);
if (!refcount_inc_not_zero_acquire(&obj->ref))
return;
if (READ_ONCE(obj->key) != key) { /* identity check */
put_ref(obj);
return;
}
use(obj->value);
producer:
remove(collection, obj->key);
if (!refcount_dec_and_test(&obj->ref))
return;
obj->key = KEY_INVALID;
free(obj);
obj = malloc(); /* obj is reused */
obj->key = new_key;
obj->value = new_value;
refcount_set_release(obj->ref, 1);
add(collection, new_key, obj);
refcount_{add|inc}_not_zero_acquire() is required to prevent the following
reordering when refcount_inc_not_zero() is used instead:
consumer:
obj = lookup(collection, key);
if (READ_ONCE(obj->key) != key) { /* reordered identity check */
put_ref(obj);
return;
}
producer:
remove(collection, obj->key);
if (!refcount_dec_and_test(&obj->ref))
return;
obj->key = KEY_INVALID;
free(obj);
obj = malloc(); /* obj is reused */
obj->key = new_key;
obj->value = new_value;
refcount_set_release(obj->ref, 1);
add(collection, new_key, obj);
if (!refcount_inc_not_zero(&obj->ref))
return;
use(obj->value); /* USING WRONG OBJECT */
refcount_set_release() is required to prevent the following reordering
when refcount_set() is used instead:
consumer:
obj = lookup(collection, key);
producer:
remove(collection, obj->key);
if (!refcount_dec_and_test(&obj->ref))
return;
obj->key = KEY_INVALID;
free(obj);
obj = malloc(); /* obj is reused */
obj->key = new_key; /* new_key == old_key */
refcount_set(obj->ref, 1);
if (!refcount_inc_not_zero_acquire(&obj->ref))
return;
if (READ_ONCE(obj->key) != key) { /* pass since new_key == old_key */
put_ref(obj);
return;
}
use(obj->value); /* USING STALE obj->value */
obj->value = new_value; /* reordered store */
add(collection, key, obj);
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
---
Documentation/RCU/whatisRCU.rst | 10 ++
Documentation/core-api/refcount-vs-atomic.rst | 37 +++++-
include/linux/refcount.h | 106 ++++++++++++++++++
include/linux/slab.h | 9 ++
4 files changed, 156 insertions(+), 6 deletions(-)
diff --git a/Documentation/RCU/whatisRCU.rst b/Documentation/RCU/whatisRCU.rst
index 1ef5784c1b84..53faeed7c190 100644
--- a/Documentation/RCU/whatisRCU.rst
+++ b/Documentation/RCU/whatisRCU.rst
@@ -971,6 +971,16 @@ unfortunately any spinlock in a ``SLAB_TYPESAFE_BY_RCU`` object must be
initialized after each and every call to kmem_cache_alloc(), which renders
reference-free spinlock acquisition completely unsafe. Therefore, when
using ``SLAB_TYPESAFE_BY_RCU``, make proper use of a reference counter.
+If using refcount_t, the specialized refcount_{add|inc}_not_zero_acquire()
+and refcount_set_release() APIs should be used to ensure correct operation
+ordering when verifying object identity and when initializing newly
+allocated objects. Acquire fence in refcount_{add|inc}_not_zero_acquire()
+ensures that identity checks happen *after* reference count is taken.
+refcount_set_release() should be called after a newly allocated object is
+fully initialized and release fence ensures that new values are visible
+*before* refcount can be successfully taken by other users. Once
+refcount_set_release() is called, the object should be considered visible
+by other tasks.
(Those willing to initialize their locks in a kmem_cache constructor
may also use locking, including cache-friendly sequence locking.)
diff --git a/Documentation/core-api/refcount-vs-atomic.rst b/Documentation/core-api/refcount-vs-atomic.rst
index 79a009ce11df..9551a7bbfd38 100644
--- a/Documentation/core-api/refcount-vs-atomic.rst
+++ b/Documentation/core-api/refcount-vs-atomic.rst
@@ -86,7 +86,19 @@ Memory ordering guarantee changes:
* none (both fully unordered)
-case 2) - increment-based ops that return no value
+case 2) - non-"Read/Modify/Write" (RMW) ops with release ordering
+-------------------------------------------
+
+Function changes:
+
+ * atomic_set_release() --> refcount_set_release()
+
+Memory ordering guarantee changes:
+
+ * none (both provide RELEASE ordering)
+
+
+case 3) - increment-based ops that return no value
--------------------------------------------------
Function changes:
@@ -98,7 +110,7 @@ Memory ordering guarantee changes:
* none (both fully unordered)
-case 3) - decrement-based RMW ops that return no value
+case 4) - decrement-based RMW ops that return no value
------------------------------------------------------
Function changes:
@@ -110,7 +122,7 @@ Memory ordering guarantee changes:
* fully unordered --> RELEASE ordering
-case 4) - increment-based RMW ops that return a value
+case 5) - increment-based RMW ops that return a value
-----------------------------------------------------
Function changes:
@@ -126,7 +138,20 @@ Memory ordering guarantees changes:
result of obtaining pointer to the object!
-case 5) - generic dec/sub decrement-based RMW ops that return a value
+case 6) - increment-based RMW ops with acquire ordering that return a value
+-----------------------------------------------------
+
+Function changes:
+
+ * atomic_inc_not_zero() --> refcount_inc_not_zero_acquire()
+ * no atomic counterpart --> refcount_add_not_zero_acquire()
+
+Memory ordering guarantees changes:
+
+ * fully ordered --> ACQUIRE ordering on success
+
+
+case 7) - generic dec/sub decrement-based RMW ops that return a value
---------------------------------------------------------------------
Function changes:
@@ -139,7 +164,7 @@ Memory ordering guarantees changes:
* fully ordered --> RELEASE ordering + ACQUIRE ordering on success
-case 6) other decrement-based RMW ops that return a value
+case 8) other decrement-based RMW ops that return a value
---------------------------------------------------------
Function changes:
@@ -154,7 +179,7 @@ Memory ordering guarantees changes:
.. note:: atomic_add_unless() only provides full order on success.
-case 7) - lock-based RMW
+case 9) - lock-based RMW
------------------------
Function changes:
diff --git a/include/linux/refcount.h b/include/linux/refcount.h
index 35f039ecb272..4589d2e7bfea 100644
--- a/include/linux/refcount.h
+++ b/include/linux/refcount.h
@@ -87,6 +87,15 @@
* The decrements dec_and_test() and sub_and_test() also provide acquire
* ordering on success.
*
+ * refcount_{add|inc}_not_zero_acquire() and refcount_set_release() provide
+ * acquire and release ordering for cases when the memory occupied by the
+ * object might be reused to store another object. This is important for the
+ * cases where secondary validation is required to detect such reuse, e.g.
+ * SLAB_TYPESAFE_BY_RCU. The secondary validation checks have to happen after
+ * the refcount is taken, hence acquire order is necessary. Similarly, when the
+ * object is initialized, all stores to its attributes should be visible before
+ * the refcount is set, otherwise a stale attribute value might be used by
+ * another task which succeeds in taking a refcount to the new object.
*/
#ifndef _LINUX_REFCOUNT_H
@@ -125,6 +134,31 @@ static inline void refcount_set(refcount_t *r, int n)
atomic_set(&r->refs, n);
}
+/**
+ * refcount_set_release - set a refcount's value with release ordering
+ * @r: the refcount
+ * @n: value to which the refcount will be set
+ *
+ * This function should be used when memory occupied by the object might be
+ * reused to store another object -- consider SLAB_TYPESAFE_BY_RCU.
+ *
+ * Provides release memory ordering which will order previous memory operations
+ * against this store. This ensures all updates to this object are visible
+ * once the refcount is set and stale values from the object previously
+ * occupying this memory are overwritten with new ones.
+ *
+ * This function should be called only after new object is fully initialized.
+ * After this call the object should be considered visible to other tasks even
+ * if it was not yet added into an object collection normally used to discover
+ * it. This is because other tasks might have discovered the object previously
+ * occupying the same memory and after memory reuse they can succeed in taking
+ * refcount to the new object and start using it.
+ */
+static inline void refcount_set_release(refcount_t *r, int n)
+{
+ atomic_set_release(&r->refs, n);
+}
+
/**
* refcount_read - get a refcount's value
* @r: the refcount
@@ -178,6 +212,52 @@ static inline __must_check bool refcount_add_not_zero(int i, refcount_t *r)
return __refcount_add_not_zero(i, r, NULL);
}
+static inline __must_check __signed_wrap
+bool __refcount_add_not_zero_acquire(int i, refcount_t *r, int *oldp)
+{
+ int old = refcount_read(r);
+
+ do {
+ if (!old)
+ break;
+ } while (!atomic_try_cmpxchg_acquire(&r->refs, &old, old + i));
+
+ if (oldp)
+ *oldp = old;
+
+ if (unlikely(old < 0 || old + i < 0))
+ refcount_warn_saturate(r, REFCOUNT_ADD_NOT_ZERO_OVF);
+
+ return old;
+}
+
+/**
+ * refcount_add_not_zero_acquire - add a value to a refcount with acquire ordering unless it is 0
+ *
+ * @i: the value to add to the refcount
+ * @r: the refcount
+ *
+ * Will saturate at REFCOUNT_SATURATED and WARN.
+ *
+ * This function should be used when memory occupied by the object might be
+ * reused to store another object -- consider SLAB_TYPESAFE_BY_RCU.
+ *
+ * Provides acquire memory ordering on success, it is assumed the caller has
+ * guaranteed the object memory to be stable (RCU, etc.). It does provide a
+ * control dependency and thereby orders future stores. See the comment on top.
+ *
+ * Use of this function is not recommended for the normal reference counting
+ * use case in which references are taken and released one at a time. In these
+ * cases, refcount_inc_not_zero_acquire() should instead be used to increment a
+ * reference count.
+ *
+ * Return: false if the passed refcount is 0, true otherwise
+ */
+static inline __must_check bool refcount_add_not_zero_acquire(int i, refcount_t *r)
+{
+ return __refcount_add_not_zero_acquire(i, r, NULL);
+}
+
static inline __signed_wrap
void __refcount_add(int i, refcount_t *r, int *oldp)
{
@@ -236,6 +316,32 @@ static inline __must_check bool refcount_inc_not_zero(refcount_t *r)
return __refcount_inc_not_zero(r, NULL);
}
+static inline __must_check bool __refcount_inc_not_zero_acquire(refcount_t *r, int *oldp)
+{
+ return __refcount_add_not_zero_acquire(1, r, oldp);
+}
+
+/**
+ * refcount_inc_not_zero_acquire - increment a refcount with acquire ordering unless it is 0
+ * @r: the refcount to increment
+ *
+ * Similar to refcount_inc_not_zero(), but provides acquire memory ordering on
+ * success.
+ *
+ * This function should be used when memory occupied by the object might be
+ * reused to store another object -- consider SLAB_TYPESAFE_BY_RCU.
+ *
+ * Provides acquire memory ordering on success, it is assumed the caller has
+ * guaranteed the object memory to be stable (RCU, etc.). It does provide a
+ * control dependency and thereby orders future stores. See the comment on top.
+ *
+ * Return: true if the increment was successful, false otherwise
+ */
+static inline __must_check bool refcount_inc_not_zero_acquire(refcount_t *r)
+{
+ return __refcount_inc_not_zero_acquire(r, NULL);
+}
+
static inline void __refcount_inc(refcount_t *r, int *oldp)
{
__refcount_add(1, r, oldp);
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 09eedaecf120..ad902a2d692b 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -136,6 +136,15 @@ enum _slab_flag_bits {
* rcu_read_lock before reading the address, then rcu_read_unlock after
* taking the spinlock within the structure expected at that address.
*
+ * Note that object identity check has to be done *after* acquiring a
+ * reference, therefore user has to ensure proper ordering for loads.
+ * Similarly, when initializing objects allocated with SLAB_TYPESAFE_BY_RCU,
+ * the newly allocated object has to be fully initialized *before* its
+ * refcount gets initialized and proper ordering for stores is required.
+ * refcount_{add|inc}_not_zero_acquire() and refcount_set_release() are
+ * designed with the proper fences required for reference counting objects
+ * allocated with SLAB_TYPESAFE_BY_RCU.
+ *
* Note that it is not possible to acquire a lock within a structure
* allocated with SLAB_TYPESAFE_BY_RCU without first acquiring a reference
* as described above. The reason is that SLAB_TYPESAFE_BY_RCU pages
base-commit: 92514ef226f511f2ca1fb1b8752966097518edc0
--
2.48.1.362.g079036d154-goog
On 2/6/25 03:52, Suren Baghdasaryan wrote:
> For speculative lookups where a successful inc_not_zero() pins the
> object, but where we still need to double check if the object acquired
> is indeed the one we set out to acquire (identity check), needs this
> validation to happen *after* the increment.
> Similarly, when a new object is initialized and its memory might have
> been previously occupied by another object, all stores to initialize the
> object should happen *before* refcount initialization.
>
> Notably SLAB_TYPESAFE_BY_RCU is one such an example when this ordering
> is required for reference counting.
>
> Add refcount_{add|inc}_not_zero_acquire() to guarantee the proper ordering
> between acquiring a reference count on an object and performing the
> identity check for that object.
> Add refcount_set_release() to guarantee proper ordering between stores
> initializing object attributes and the store initializing the refcount.
> refcount_set_release() should be done after all other object attributes
> are initialized. Once refcount_set_release() is called, the object should
> be considered visible to other tasks even if it was not yet added into an
> object collection normally used to discover it. This is because other
> tasks might have discovered the object previously occupying the same
> memory and after memory reuse they can succeed in taking refcount for the
> new object and start using it.
>
> Object reuse example to consider:
>
> consumer:
> obj = lookup(collection, key);
> if (!refcount_inc_not_zero_acquire(&obj->ref))
> return;
> if (READ_ONCE(obj->key) != key) { /* identity check */
> put_ref(obj);
> return;
> }
> use(obj->value);
>
> producer:
> remove(collection, obj->key);
> if (!refcount_dec_and_test(&obj->ref))
> return;
> obj->key = KEY_INVALID;
> free(obj);
> obj = malloc(); /* obj is reused */
> obj->key = new_key;
> obj->value = new_value;
> refcount_set_release(obj->ref, 1);
> add(collection, new_key, obj);
>
> refcount_{add|inc}_not_zero_acquire() is required to prevent the following
> reordering when refcount_inc_not_zero() is used instead:
>
> consumer:
> obj = lookup(collection, key);
> if (READ_ONCE(obj->key) != key) { /* reordered identity check */
> put_ref(obj);
> return;
> }
> producer:
> remove(collection, obj->key);
> if (!refcount_dec_and_test(&obj->ref))
> return;
> obj->key = KEY_INVALID;
> free(obj);
> obj = malloc(); /* obj is reused */
> obj->key = new_key;
> obj->value = new_value;
> refcount_set_release(obj->ref, 1);
> add(collection, new_key, obj);
>
> if (!refcount_inc_not_zero(&obj->ref))
> return;
> use(obj->value); /* USING WRONG OBJECT */
>
> refcount_set_release() is required to prevent the following reordering
> when refcount_set() is used instead:
>
> consumer:
> obj = lookup(collection, key);
>
> producer:
> remove(collection, obj->key);
> if (!refcount_dec_and_test(&obj->ref))
> return;
> obj->key = KEY_INVALID;
> free(obj);
> obj = malloc(); /* obj is reused */
> obj->key = new_key; /* new_key == old_key */
> refcount_set(obj->ref, 1);
>
> if (!refcount_inc_not_zero_acquire(&obj->ref))
> return;
> if (READ_ONCE(obj->key) != key) { /* pass since new_key == old_key */
> put_ref(obj);
> return;
> }
> use(obj->value); /* USING STALE obj->value */
>
> obj->value = new_value; /* reordered store */
> add(collection, key, obj);
>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Will Deacon <will@kernel.org>
> Cc: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz> #slab
On Wed, Jan 15, 2025 at 8:00 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Wed, Jan 15, 2025 at 12:13:34PM +0100, Peter Zijlstra wrote:
>
> > Notably, it means refcount_t is entirely unsuitable for anything
> > SLAB_TYPESAFE_BY_RCU, since they all will need secondary validation
> > conditions after the refcount succeeds.
> >
> > And this is probably fine, but let me ponder this all a little more.
>
> Even though SLAB_TYPESAFE_BY_RCU is relatively rare, I think we'd better
> fix this, these things are hard enough as they are.
>
> Will, others, wdyt?
I'll wait for the verdict on this patch before proceeding with my
series. It obviously simplifies my job. Thanks Peter!
>
> ---
> Subject: refcount: Strengthen inc_not_zero()
>
> For speculative lookups where a successful inc_not_zero() pins the
> object, but where we still need to double check if the object acquired
> is indeed the one we set out to aquire, needs this validation to happen
> *after* the increment.
>
> Notably SLAB_TYPESAFE_BY_RCU is one such an example.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
> include/linux/refcount.h | 15 ++++++++-------
> 1 file changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/include/linux/refcount.h b/include/linux/refcount.h
> index 35f039ecb272..340e7ffa445e 100644
> --- a/include/linux/refcount.h
> +++ b/include/linux/refcount.h
> @@ -69,9 +69,10 @@
> * its the lock acquire, for RCU/lockless data structures its the dependent
> * load.
> *
> - * Do note that inc_not_zero() provides a control dependency which will order
> - * future stores against the inc, this ensures we'll never modify the object
> - * if we did not in fact acquire a reference.
> + * Do note that inc_not_zero() does provide acquire order, which will order
> + * future load and stores against the inc, this ensures all subsequent accesses
> + * are from this object and not anything previously occupying this memory --
> + * consider SLAB_TYPESAFE_BY_RCU.
> *
> * The decrements will provide release order, such that all the prior loads and
> * stores will be issued before, it also provides a control dependency, which
> @@ -144,7 +145,7 @@ bool __refcount_add_not_zero(int i, refcount_t *r, int *oldp)
> do {
> if (!old)
> break;
> - } while (!atomic_try_cmpxchg_relaxed(&r->refs, &old, old + i));
> + } while (!atomic_try_cmpxchg_acquire(&r->refs, &old, old + i));
>
> if (oldp)
> *oldp = old;
> @@ -225,9 +226,9 @@ static inline __must_check bool __refcount_inc_not_zero(refcount_t *r, int *oldp
> * Similar to atomic_inc_not_zero(), but will saturate at REFCOUNT_SATURATED
> * and WARN.
> *
> - * Provides no memory ordering, it is assumed the caller has guaranteed the
> - * object memory to be stable (RCU, etc.). It does provide a control dependency
> - * and thereby orders future stores. See the comment on top.
> + * Provides acquire ordering, such that subsequent accesses are after the
> + * increment. This is important for the cases where secondary validation is
> + * required, eg. SLAB_TYPESAFE_BY_RCU.
> *
> * Return: true if the increment was successful, false otherwise
> */
© 2016 - 2026 Red Hat, Inc.