[PATCH v3 3/5] asm-generic: barrier: Add smp_cond_load_acquire_timewait()

Ankur Arora posted 5 patches 3 months, 1 week ago
There is a newer version of this series
[PATCH v3 3/5] asm-generic: barrier: Add smp_cond_load_acquire_timewait()
Posted by Ankur Arora 3 months, 1 week ago
Add the acquire variant of smp_cond_load_relaxed_timewait(). This
reuses the relaxed variant, with an additional LOAD->LOAD
ordering via smp_acquire__after_ctrl_dep().

Also, the rqspinlock implementation has a locally cached copy of
this interface. Fixup the parameters used there.

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arch@vger.kernel.org
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
---
 arch/arm64/include/asm/rqspinlock.h |  2 +-
 include/asm-generic/barrier.h       | 22 ++++++++++++++++++++++
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/rqspinlock.h b/arch/arm64/include/asm/rqspinlock.h
index 9ea0a74e5892..f1b6a428013e 100644
--- a/arch/arm64/include/asm/rqspinlock.h
+++ b/arch/arm64/include/asm/rqspinlock.h
@@ -86,7 +86,7 @@
 
 #endif
 
-#define res_smp_cond_load_acquire(v, c) smp_cond_load_acquire_timewait(v, c, 0, 1)
+#define res_smp_cond_load_acquire(v, c) smp_cond_load_acquire_timewait(v, c, 0ULL, 1ULL, 0)
 
 #include <asm-generic/rqspinlock.h>
 
diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index 8299c57d1110..dd7c9ca2dff3 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -388,6 +388,28 @@ static inline u64 ___smp_cond_spinwait(u64 now, u64 prev, u64 end,
 	(typeof(*ptr))_val;						\
 })
 
+/**
+ * smp_cond_load_acquire_timewait() - (Spin) wait for cond with ACQUIRE ordering
+ * until a timeout expires.
+ *
+ * Arguments: same as smp_cond_load_relaxed_timeout().
+ *
+ * Equivalent to using smp_cond_load_acquire() on the condition variable with
+ * a timeout.
+ */
+#ifndef smp_cond_load_acquire_timewait
+#define smp_cond_load_acquire_timewait(ptr, cond_expr,			\
+				       time_expr, time_end,		\
+				       slack) ({			\
+	__unqual_scalar_typeof(*ptr) _val;				\
+	_val = smp_cond_load_relaxed_timewait(ptr, cond_expr,		\
+					      time_expr, time_end,	\
+					      slack);			\
+	/* Depends on the control dependency of the wait above. */	\
+	smp_acquire__after_ctrl_dep();					\
+	(typeof(*ptr))_val;						\
+})
+#endif
 /*
  * pmem_wmb() ensures that all stores for which the modification
  * are written to persistent storage by preceding instructions have
-- 
2.43.5
Re: [PATCH v3 3/5] asm-generic: barrier: Add smp_cond_load_acquire_timewait()
Posted by Catalin Marinas 2 months ago
On Thu, Jun 26, 2025 at 09:48:03PM -0700, Ankur Arora wrote:
> diff --git a/arch/arm64/include/asm/rqspinlock.h b/arch/arm64/include/asm/rqspinlock.h
> index 9ea0a74e5892..f1b6a428013e 100644
> --- a/arch/arm64/include/asm/rqspinlock.h
> +++ b/arch/arm64/include/asm/rqspinlock.h
> @@ -86,7 +86,7 @@
>  
>  #endif
>  
> -#define res_smp_cond_load_acquire(v, c) smp_cond_load_acquire_timewait(v, c, 0, 1)
> +#define res_smp_cond_load_acquire(v, c) smp_cond_load_acquire_timewait(v, c, 0ULL, 1ULL, 0)
>  
>  #include <asm-generic/rqspinlock.h>
>  
> diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
> index 8299c57d1110..dd7c9ca2dff3 100644
> --- a/include/asm-generic/barrier.h
> +++ b/include/asm-generic/barrier.h
> @@ -388,6 +388,28 @@ static inline u64 ___smp_cond_spinwait(u64 now, u64 prev, u64 end,
>  	(typeof(*ptr))_val;						\
>  })
>  
> +/**
> + * smp_cond_load_acquire_timewait() - (Spin) wait for cond with ACQUIRE ordering
> + * until a timeout expires.
> + *
> + * Arguments: same as smp_cond_load_relaxed_timeout().
> + *
> + * Equivalent to using smp_cond_load_acquire() on the condition variable with
> + * a timeout.
> + */
> +#ifndef smp_cond_load_acquire_timewait
> +#define smp_cond_load_acquire_timewait(ptr, cond_expr,			\
> +				       time_expr, time_end,		\
> +				       slack) ({			\
> +	__unqual_scalar_typeof(*ptr) _val;				\
> +	_val = smp_cond_load_relaxed_timewait(ptr, cond_expr,		\
> +					      time_expr, time_end,	\
> +					      slack);			\
> +	/* Depends on the control dependency of the wait above. */	\
> +	smp_acquire__after_ctrl_dep();					\
> +	(typeof(*ptr))_val;						\
> +})
> +#endif

Using #ifndef in the generic file is the correct thing to do, it allows
architectures to redefine it. Why we have a similar #ifndef in the arm64
rqspinlock.h, no idea, none of the arm64 maintainers acked that patch
(shouldn't have gone in really, we were still discussing the
implementation at the time; I also think it's slightly wrong).

Your change above to rqspinlock.h makes this even more confusing when
you look at the overall result with all the patches applied. We end up
with the same macro in asm/rqspinlock.h but with different number of
arguments.

I'd start with ripping out the current arm64 implementation, add a
generic implementation to barrier.h and then override it in the arch
code.

-- 
Catalin
Re: [PATCH v3 3/5] asm-generic: barrier: Add smp_cond_load_acquire_timewait()
Posted by Ankur Arora 1 month, 3 weeks ago
Catalin Marinas <catalin.marinas@arm.com> writes:

> On Thu, Jun 26, 2025 at 09:48:03PM -0700, Ankur Arora wrote:
>> diff --git a/arch/arm64/include/asm/rqspinlock.h b/arch/arm64/include/asm/rqspinlock.h
>> index 9ea0a74e5892..f1b6a428013e 100644
>> --- a/arch/arm64/include/asm/rqspinlock.h
>> +++ b/arch/arm64/include/asm/rqspinlock.h
>> @@ -86,7 +86,7 @@
>>
>>  #endif
>>
>> -#define res_smp_cond_load_acquire(v, c) smp_cond_load_acquire_timewait(v, c, 0, 1)
>> +#define res_smp_cond_load_acquire(v, c) smp_cond_load_acquire_timewait(v, c, 0ULL, 1ULL, 0)
>>
>>  #include <asm-generic/rqspinlock.h>
>>
>> diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
>> index 8299c57d1110..dd7c9ca2dff3 100644
>> --- a/include/asm-generic/barrier.h
>> +++ b/include/asm-generic/barrier.h
>> @@ -388,6 +388,28 @@ static inline u64 ___smp_cond_spinwait(u64 now, u64 prev, u64 end,
>>  	(typeof(*ptr))_val;						\
>>  })
>>
>> +/**
>> + * smp_cond_load_acquire_timewait() - (Spin) wait for cond with ACQUIRE ordering
>> + * until a timeout expires.
>> + *
>> + * Arguments: same as smp_cond_load_relaxed_timeout().
>> + *
>> + * Equivalent to using smp_cond_load_acquire() on the condition variable with
>> + * a timeout.
>> + */
>> +#ifndef smp_cond_load_acquire_timewait
>> +#define smp_cond_load_acquire_timewait(ptr, cond_expr,			\
>> +				       time_expr, time_end,		\
>> +				       slack) ({			\
>> +	__unqual_scalar_typeof(*ptr) _val;				\
>> +	_val = smp_cond_load_relaxed_timewait(ptr, cond_expr,		\
>> +					      time_expr, time_end,	\
>> +					      slack);			\
>> +	/* Depends on the control dependency of the wait above. */	\
>> +	smp_acquire__after_ctrl_dep();					\
>> +	(typeof(*ptr))_val;						\
>> +})
>> +#endif
>
> Using #ifndef in the generic file is the correct thing to do, it allows
> architectures to redefine it. Why we have a similar #ifndef in the arm64
> rqspinlock.h, no idea, none of the arm64 maintainers acked that patch
> (shouldn't have gone in really, we were still discussing the
> implementation at the time; I also think it's slightly wrong).
>
> Your change above to rqspinlock.h makes this even more confusing when
> you look at the overall result with all the patches applied. We end up
> with the same macro in asm/rqspinlock.h but with different number of
> arguments.

I agree that my change doesn't improve on matters at all.

Just to lay out the problem, rqspinlock defines this in the common code:

  #ifndef res_smp_cond_load_acquire
  #define res_smp_cond_load_acquire(v, c) smp_cond_load_acquire(v, c)
  #endif

And, the definition of res_smp_cond_load_acquire() (only on arm64)
essentially uses smp_cond_load_acquire_timewait() such that it will
be equivalent to smp_cond_load_acquire() but one that's guaranteed
to terminate.

> I'd start with ripping out the current arm64 implementation, add a
> generic implementation to barrier.h and then override it in the arch
> code.

The problem is that rqspinlock code is mostly written as if it is
working with smp_cond_load_acquire().

Fixing it needs some amount of refactoring. I had preliminary patches
patches to do that, but my preference was to send those out after
the barrier changes.

If you feel that is best done as part of this series, I can add those
patches to v4.


Thanks for the comments!

--
ankur