[PATCH v5 1/5] asm-generic: barrier: Add smp_cond_load_relaxed_timeout()

Posted by Ankur Arora 3 weeks ago
Add smp_cond_load_relaxed_timeout(), which extends
smp_cond_load_relaxed() to allow waiting for a duration.

The additional parameter allows for the timeout check.

The waiting is done via the usual cpu_relax() spin-wait around the
condition variable with periodic evaluation of the time-check.

The number of times we spin is defined by SMP_TIMEOUT_SPIN_COUNT
(chosen to be 200 by default) which, assuming each cpu_relax()
iteration takes around 20-30 cycles (measured on a variety of x86
platforms), amounts to around 4000-6000 cycles.
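
For illustration, a caller would look something like the sketch below
(wait_for_flag() is a made-up example, not part of this series, and the
local_clock()-based deadline is just one way to write the time check):

	/* Wait up to @timeout_ns for *flagp to become non-zero. */
	static bool wait_for_flag(u8 *flagp, u64 timeout_ns)
	{
		u64 deadline = local_clock() + timeout_ns;

		/* VAL names the most recently loaded *flagp inside the macro. */
		return smp_cond_load_relaxed_timeout(flagp, VAL != 0,
						     local_clock() >= deadline);
	}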

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arch@vger.kernel.org
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Haris Okanovic <harisokn@amazon.com>
Tested-by: Haris Okanovic <harisokn@amazon.com>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
---
 include/asm-generic/barrier.h | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index d4f581c1e21d..8483e139954f 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -273,6 +273,41 @@ do {									\
 })
 #endif
 
+#ifndef SMP_TIMEOUT_SPIN_COUNT
+#define SMP_TIMEOUT_SPIN_COUNT		200
+#endif
+
+/**
+ * smp_cond_load_relaxed_timeout() - (Spin) wait for cond with no ordering
+ * guarantees until a timeout expires.
+ * @ptr: pointer to the variable to wait on
+ * @cond_expr: boolean expression to wait for
+ * @time_check_expr: expression to decide when to bail out
+ *
+ * Equivalent to using READ_ONCE() on the condition variable.
+ */
+#ifndef smp_cond_load_relaxed_timeout
+#define smp_cond_load_relaxed_timeout(ptr, cond_expr, time_check_expr)	\
+({									\
+	typeof(ptr) __PTR = (ptr);					\
+	__unqual_scalar_typeof(*ptr) VAL;				\
+	u32 __n = 0, __spin = SMP_TIMEOUT_SPIN_COUNT;			\
+									\
+	for (;;) {							\
+		VAL = READ_ONCE(*__PTR);				\
+		if (cond_expr)						\
+			break;						\
+		cpu_relax();						\
+		if (++__n < __spin)					\
+			continue;					\
+		if (time_check_expr)					\
+			break;						\
+		__n = 0;						\
+	}								\
+	(typeof(*ptr))VAL;						\
+})
+#endif
+
 /*
  * pmem_wmb() ensures that all stores for which the modification
  * are written to persistent storage by preceding instructions have
-- 
2.43.5
Re: [PATCH v5 1/5] asm-generic: barrier: Add smp_cond_load_relaxed_timeout()
Posted by Will Deacon 2 weeks ago
On Wed, Sep 10, 2025 at 08:46:51PM -0700, Ankur Arora wrote:
> Add smp_cond_load_relaxed_timeout(), which extends
> smp_cond_load_relaxed() to allow waiting for a duration.
> 
> The additional parameter allows for the timeout check.
> 
> The waiting is done via the usual cpu_relax() spin-wait around the
> condition variable with periodic evaluation of the time-check.
> 
> The number of times we spin is defined by SMP_TIMEOUT_SPIN_COUNT
> (chosen to be 200 by default) which, assuming each cpu_relax()
> iteration takes around 20-30 cycles (measured on a variety of x86
> platforms), amounts to around 4000-6000 cycles.
> 
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Will Deacon <will@kernel.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: linux-arch@vger.kernel.org
> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> Reviewed-by: Haris Okanovic <harisokn@amazon.com>
> Tested-by: Haris Okanovic <harisokn@amazon.com>
> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
> ---
>  include/asm-generic/barrier.h | 35 +++++++++++++++++++++++++++++++++++
>  1 file changed, 35 insertions(+)
> 
> diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
> index d4f581c1e21d..8483e139954f 100644
> --- a/include/asm-generic/barrier.h
> +++ b/include/asm-generic/barrier.h
> @@ -273,6 +273,41 @@ do {									\
>  })
>  #endif
>  
> +#ifndef SMP_TIMEOUT_SPIN_COUNT
> +#define SMP_TIMEOUT_SPIN_COUNT		200
> +#endif
> +
> +/**
> + * smp_cond_load_relaxed_timeout() - (Spin) wait for cond with no ordering
> + * guarantees until a timeout expires.
> + * @ptr: pointer to the variable to wait on
> + * @cond_expr: boolean expression to wait for
> + * @time_check_expr: expression to decide when to bail out
> + *
> + * Equivalent to using READ_ONCE() on the condition variable.
> + */
> +#ifndef smp_cond_load_relaxed_timeout
> +#define smp_cond_load_relaxed_timeout(ptr, cond_expr, time_check_expr)	\
> +({									\
> +	typeof(ptr) __PTR = (ptr);					\
> +	__unqual_scalar_typeof(*ptr) VAL;				\
> +	u32 __n = 0, __spin = SMP_TIMEOUT_SPIN_COUNT;			\
> +									\
> +	for (;;) {							\
> +		VAL = READ_ONCE(*__PTR);				\
> +		if (cond_expr)						\
> +			break;						\
> +		cpu_relax();						\
> +		if (++__n < __spin)					\
> +			continue;					\
> +		if (time_check_expr)					\
> +			break;						\

There's a funny discrepancy here when compared to the arm64 version in
the next patch. Here, if we time out, then the value returned is
potentially quite stale because it was read before the last cpu_relax().
In the arm64 patch, the timeout check is before the cmpwait/cpu_relax(),
which I think is better.

Regardless, I think having the same behaviour for the two implementations
would be a good idea.
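
A rough sketch of that ordering, applied to the generic loop above
(illustrative only, not the actual arm64 code from the next patch):

	for (;;) {
		VAL = READ_ONCE(*__PTR);
		if (cond_expr)
			break;
		if (++__n >= __spin) {
			if (time_check_expr)
				break;
			__n = 0;
		}
		cpu_relax();
	}

That way, on timeout, VAL was loaded in the same iteration as the
time check rather than before the last cpu_relax().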

Will
Re: [PATCH v5 1/5] asm-generic: barrier: Add smp_cond_load_relaxed_timeout()
Posted by Ankur Arora 1 week, 5 days ago
Will Deacon <will@kernel.org> writes:

> On Wed, Sep 10, 2025 at 08:46:51PM -0700, Ankur Arora wrote:
>> Add smp_cond_load_relaxed_timeout(), which extends
>> smp_cond_load_relaxed() to allow waiting for a duration.
>>
>> The additional parameter allows for the timeout check.
>>
>> The waiting is done via the usual cpu_relax() spin-wait around the
>> condition variable with periodic evaluation of the time-check.
>>
>> The number of times we spin is defined by SMP_TIMEOUT_SPIN_COUNT
>> (chosen to be 200 by default) which, assuming each cpu_relax()
>> iteration takes around 20-30 cycles (measured on a variety of x86
>> platforms), amounts to around 4000-6000 cycles.
>>
>> Cc: Arnd Bergmann <arnd@arndb.de>
>> Cc: Will Deacon <will@kernel.org>
>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> Cc: linux-arch@vger.kernel.org
>> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
>> Reviewed-by: Haris Okanovic <harisokn@amazon.com>
>> Tested-by: Haris Okanovic <harisokn@amazon.com>
>> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
>> ---
>>  include/asm-generic/barrier.h | 35 +++++++++++++++++++++++++++++++++++
>>  1 file changed, 35 insertions(+)
>>
>> diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
>> index d4f581c1e21d..8483e139954f 100644
>> --- a/include/asm-generic/barrier.h
>> +++ b/include/asm-generic/barrier.h
>> @@ -273,6 +273,41 @@ do {									\
>>  })
>>  #endif
>>
>> +#ifndef SMP_TIMEOUT_SPIN_COUNT
>> +#define SMP_TIMEOUT_SPIN_COUNT		200
>> +#endif
>> +
>> +/**
>> + * smp_cond_load_relaxed_timeout() - (Spin) wait for cond with no ordering
>> + * guarantees until a timeout expires.
>> + * @ptr: pointer to the variable to wait on
>> + * @cond_expr: boolean expression to wait for
>> + * @time_check_expr: expression to decide when to bail out
>> + *
>> + * Equivalent to using READ_ONCE() on the condition variable.
>> + */
>> +#ifndef smp_cond_load_relaxed_timeout
>> +#define smp_cond_load_relaxed_timeout(ptr, cond_expr, time_check_expr)	\
>> +({									\
>> +	typeof(ptr) __PTR = (ptr);					\
>> +	__unqual_scalar_typeof(*ptr) VAL;				\
>> +	u32 __n = 0, __spin = SMP_TIMEOUT_SPIN_COUNT;			\
>> +									\
>> +	for (;;) {							\
>> +		VAL = READ_ONCE(*__PTR);				\
>> +		if (cond_expr)						\
>> +			break;						\
>> +		cpu_relax();						\
>> +		if (++__n < __spin)					\
>> +			continue;					\
>> +		if (time_check_expr)					\
>> +			break;						\
>
> There's a funny discrepancy here when compared to the arm64 version in
> the next patch. Here, if we time out, then the value returned is
> potentially quite stale because it was read before the last cpu_relax().
> In the arm64 patch, the timeout check is before the cmpwait/cpu_relax(),
> which I think is better.

So, that's a good point. But, the return value being stale also seems to
be incorrect.

> Regardless, I think having the same behaviour for the two implementations
> would be a good idea.

Yeah agreed.

As you outlined in the other mail, how about something like this:

#ifndef smp_cond_load_relaxed_timeout
#define smp_cond_load_relaxed_timeout(ptr, cond_expr, time_check_expr)	\
({									\
	typeof(ptr) __PTR = (ptr);					\
	__unqual_scalar_typeof(*ptr) VAL;				\
	u32 __n = 0, __poll = SMP_TIMEOUT_POLL_COUNT;			\
									\
	for (;;) {							\
		VAL = READ_ONCE(*__PTR);				\
		if (cond_expr)						\
			break;						\
		cpu_poll_relax();					\
		if (++__n < __poll)					\
			continue;					\
		if (time_check_expr) {					\
			VAL = READ_ONCE(*__PTR);			\
			break;						\
		}							\
		__n = 0;						\
	}								\
	(typeof(*ptr))VAL;						\
})
#endif

A bit uglier, but if the cpu_poll_relax() was a successful WFE then the
value might otherwise be ~100us out of date.

Another option might be to just set some state in the time check and
bail out via an "if (cond_expr || __timed_out)", but I don't want
to add more instructions in the spin path.
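
Roughly, and only as a sketch (with __timed_out as a made-up flag),
that alternative would be:

	bool __timed_out = false;

	for (;;) {
		VAL = READ_ONCE(*__PTR);
		if (cond_expr || __timed_out)	/* extra test on every spin */
			break;
		cpu_poll_relax();
		if (++__n < __poll)
			continue;
		__timed_out = !!(time_check_expr);
		__n = 0;
	}

The final reload comes for free at the top of the loop, but every
iteration pays for the flag check.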

--
ankur
Re: [PATCH v5 1/5] asm-generic: barrier: Add smp_cond_load_relaxed_timeout()
Posted by Will Deacon 1 week, 3 days ago
On Fri, Sep 19, 2025 at 04:41:56PM -0700, Ankur Arora wrote:
> Will Deacon <will@kernel.org> writes:
> > On Wed, Sep 10, 2025 at 08:46:51PM -0700, Ankur Arora wrote:
> >> +	for (;;) {							\
> >> +		VAL = READ_ONCE(*__PTR);				\
> >> +		if (cond_expr)						\
> >> +			break;						\
> >> +		cpu_relax();						\
> >> +		if (++__n < __spin)					\
> >> +			continue;					\
> >> +		if (time_check_expr)					\
> >> +			break;						\
> >
> > There's a funny discrepancy here when compared to the arm64 version in
> > the next patch. Here, if we time out, then the value returned is
> > potentially quite stale because it was read before the last cpu_relax().
> > In the arm64 patch, the timeout check is before the cmpwait/cpu_relax(),
> > which I think is better.
> 
> So, that's a good point. But, the return value being stale also seems to
> be incorrect.
> 
> > Regardless, I think having the same behaviour for the two implementations
> > would be a good idea.
> 
> Yeah agreed.
> 
> As you outlined in the other mail, how about something like this:
> 
> #ifndef smp_cond_load_relaxed_timeout
> #define smp_cond_load_relaxed_timeout(ptr, cond_expr, time_check_expr)	\
> ({									\
> 	typeof(ptr) __PTR = (ptr);					\
> 	__unqual_scalar_typeof(*ptr) VAL;				\
> 	u32 __n = 0, __poll = SMP_TIMEOUT_POLL_COUNT;			\
> 									\
> 	for (;;) {							\
> 		VAL = READ_ONCE(*__PTR);				\
> 		if (cond_expr)						\
> 			break;						\
> 		cpu_poll_relax();					\
> 		if (++__n < __poll)					\
> 			continue;					\
> 		if (time_check_expr) {					\
> 			VAL = READ_ONCE(*__PTR);			\
> 			break;						\
> 		}							\
> 		__n = 0;						\
> 	}								\
> 	(typeof(*ptr))VAL;						\
> })
> #endif

That looks better to me, thanks.

Will