[v3] arm64: Fixes for __READ_ONCE() with CONFIG_LTO=y

[PATCH v3 3/3] arm64, compiler-context-analysis: Permit alias analysis through __READ_ONCE() with CONFIG_LTO=y

Posted by Marco Elver 1 week, 2 days ago

When enabling Clang's Context Analysis (aka. Thread Safety Analysis) on
kernel/futex/core.o (see Peter's changes at [1]), in arm64 LTO builds we
could see:

| kernel/futex/core.c:982:1: warning: spinlock 'atomic ? __u.__val : q->lock_ptr' is still held at the end of function [-Wthread-safety-analysis]
|      982 | }
|          | ^
|    kernel/futex/core.c:976:2: note: spinlock acquired here
|      976 |         spin_lock(lock_ptr);
|          |         ^
| kernel/futex/core.c:982:1: warning: expecting spinlock 'q->lock_ptr' to be held at the end of function [-Wthread-safety-analysis]
|      982 | }
|          | ^
|    kernel/futex/core.c:966:6: note: spinlock acquired here
|      966 | void futex_q_lockptr_lock(struct futex_q *q)
|          |      ^
|    2 warnings generated.

Where we have:

	extern void futex_q_lockptr_lock(struct futex_q *q) __acquires(q->lock_ptr);
	..
	void futex_q_lockptr_lock(struct futex_q *q)
	{
		spinlock_t *lock_ptr;

		/*
		 * See futex_unqueue() why lock_ptr can change.
		 */
		guard(rcu)();
	retry:
>>		lock_ptr = READ_ONCE(q->lock_ptr);
		spin_lock(lock_ptr);
	...
	}

At the time of the above report (prior to removal of the 'atomic' flag),
Clang Thread Safety Analysis's alias analysis resolved 'lock_ptr' to
'atomic ?  __u.__val : q->lock_ptr' (now just '__u.__val'), and used
this as the identity of the context lock given it cannot "see through"
the inline assembly; however, we want 'q->lock_ptr' as the canonical
context lock.

While for code generation the compiler simplified to '__u.__val' for
pointers (8 byte case -> 'atomic' was set), TSA's analysis (a) happens
much earlier on the AST, and (b) would be the wrong deduction.

Now that we've gotten rid of the 'atomic' ternary comparison, we can
return '__u.__val' through a pointer that we initialize with '&x', but
then update via a pointer-to-pointer. When READ_ONCE()'ing a context
lock pointer, TSA's alias analysis does not invalidate the initial alias
when updated through the pointer-to-pointer, and we make it effectively
"see through" the __READ_ONCE().

Code generation is unchanged.

Link: https://lkml.kernel.org/r/20260121110704.221498346@infradead.org [1]
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202601221040.TeM0ihff-lkp@intel.com/
Cc: Peter Zijlstra <peterz@infradead.org>
Tested-by: Boqun Feng <boqun@kernel.org>
Signed-off-by: Marco Elver <elver@google.com>
---
v3:
* Use 'typeof(*__ret)'.
* Commit message.

v2:
* Rebase.
---
 arch/arm64/include/asm/rwonce.h | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/rwonce.h b/arch/arm64/include/asm/rwonce.h
index 42c9e8429274..b7de74d4bf07 100644
--- a/arch/arm64/include/asm/rwonce.h
+++ b/arch/arm64/include/asm/rwonce.h
@@ -45,8 +45,12 @@
  */
 #define __READ_ONCE(x)							\
 ({									\
-	typeof(&(x)) __x = &(x);					\
-	union { __rwonce_typeof_unqual(*__x) __val; char __c[1]; } __u;	\
+	auto __x = &(x);						\
+	auto __ret = (__rwonce_typeof_unqual(*__x) *)__x;		\
+	/* Hides alias reassignment from Clang's -Wthread-safety. */	\
+	auto __retp = &__ret;						\
+	union { typeof(*__ret) __val; char __c[1]; } __u;		\
+	*__retp = &__u.__val;						\
 	switch (sizeof(x)) {						\
 	case 1:								\
 		asm volatile(__LOAD_RCPC(b, %w0, %1)			\
@@ -71,7 +75,7 @@
 	default:							\
 		__u.__val = *(volatile typeof(*__x) *)__x;		\
 	}								\
-	__u.__val;							\
+	*__ret;								\
 })
 
 #endif	/* !BUILD_VDSO */
-- 
2.53.0.rc1.225.gd81095ad13-goog
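
A minimal user-space sketch of the pointer-to-pointer trick described in the commit message above, assuming GNU C (statement expressions, __auto_type). A plain volatile load stands in for the arm64 inline asm, qualifier stripping is omitted, and READ_ONCE_SKETCH, struct demo_q and demo_read_lock_ptr are hypothetical names, not kernel API:

```c
/*
 * Sketch only: a plain volatile load replaces the arm64 asm, and no
 * qualifiers are stripped, so x must not be const-qualified here.
 */
#define READ_ONCE_SKETCH(x) \
({ \
	__auto_type x_ = &(x); \
	/* Initial alias: ret_ points at the original object. */ \
	__typeof__(*x_) *ret_ = x_; \
	/* The later reassignment goes through a pointer-to-pointer. */ \
	__auto_type retp_ = &ret_; \
	union { __typeof__(*ret_) val; char c[1]; } u_; \
	*retp_ = &u_.val; \
	u_.val = *(volatile __typeof__(*x_) *)x_; \
	*ret_; \
})

struct demo_q {
	int *lock_ptr;
};

/* Returns the pointer currently stored in q->lock_ptr. */
static int *demo_read_lock_ptr(struct demo_q *q)
{
	return READ_ONCE_SKETCH(q->lock_ptr);
}
```

The result pointer starts out aliasing the original object and is only redirected to the temporary through retp_. Per the commit message, that indirection is what lets Clang's analysis keep 'q->lock_ptr' as the lock identity; whether a given Clang version treats this simplified macro the same way has not been checked here.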

Re: [PATCH v3 3/3] arm64, compiler-context-analysis: Permit alias analysis through __READ_ONCE() with CONFIG_LTO=y

Posted by Will Deacon 6 days, 3 hours ago

On Fri, Jan 30, 2026 at 02:28:26PM +0100, Marco Elver wrote:
> [...]
> diff --git a/arch/arm64/include/asm/rwonce.h b/arch/arm64/include/asm/rwonce.h
> index 42c9e8429274..b7de74d4bf07 100644
> --- a/arch/arm64/include/asm/rwonce.h
> +++ b/arch/arm64/include/asm/rwonce.h
> @@ -45,8 +45,12 @@
>   */
>  #define __READ_ONCE(x)							\
>  ({									\
> -	typeof(&(x)) __x = &(x);					\
> -	union { __rwonce_typeof_unqual(*__x) __val; char __c[1]; } __u;	\
> +	auto __x = &(x);						\
> +	auto __ret = (__rwonce_typeof_unqual(*__x) *)__x;		\
> +	/* Hides alias reassignment from Clang's -Wthread-safety. */	\
> +	auto __retp = &__ret;						\
> +	union { typeof(*__ret) __val; char __c[1]; } __u;		\
> +	*__retp = &__u.__val;						\
>  	switch (sizeof(x)) {						\
>  	case 1:								\
>  		asm volatile(__LOAD_RCPC(b, %w0, %1)			\
> @@ -71,7 +75,7 @@
>  	default:							\
>  		__u.__val = *(volatile typeof(*__x) *)__x;		\
>  	}								\
> -	__u.__val;							\
> +	*__ret;								\
>  })

What does GCC do with this? :/

Will

Re: [PATCH v3 3/3] arm64, compiler-context-analysis: Permit alias analysis through __READ_ONCE() with CONFIG_LTO=y

Posted by David Laight 5 days, 23 hours ago

On Mon, 2 Feb 2026 15:39:36 +0000
Will Deacon <will@kernel.org> wrote:

> On Fri, Jan 30, 2026 at 02:28:26PM +0100, Marco Elver wrote:
> > [...]
> 
> What does GCC do with this? :/

GCC currently doesn't see it, LTO is clang only.

	David

> 
> Will

Re: [PATCH v3 3/3] arm64, compiler-context-analysis: Permit alias analysis through __READ_ONCE() with CONFIG_LTO=y

Posted by Will Deacon 5 days, 7 hours ago

On Mon, Feb 02, 2026 at 07:29:23PM +0000, David Laight wrote:
> On Mon, 2 Feb 2026 15:39:36 +0000
> Will Deacon <will@kernel.org> wrote:
> 
> > On Fri, Jan 30, 2026 at 02:28:26PM +0100, Marco Elver wrote:
> > > [...]
> > 
> > What does GCC do with this? :/
> 
> GCC currently doesn't see it, LTO is clang only.

LTO is just one way that a compiler could end up breaking dependency
chains, so I really want to maintain the option to enable this path for
GCC in case we run into problems caused by other optimisations in future.

Will
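
For context, a rough sketch of the kind of dependency chain at stake, assuming the GCC/Clang __atomic builtins as a portable stand-in for the RCpc load (__LOAD_RCPC) used by the arm64 macro. The names struct node, published, publish and consume are hypothetical:

```c
struct node {
	int val;
};

/* Hypothetical shared pointer, written by a publisher. */
static struct node *published;

static void publish(struct node *n)
{
	__atomic_store_n(&published, n, __ATOMIC_RELEASE);
}

/*
 * The read of n->val depends on the loaded pointer. With an acquire
 * load, the ordering no longer relies on that address dependency
 * surviving compiler optimisation.
 */
static int consume(void)
{
	struct node *n = __atomic_load_n(&published, __ATOMIC_ACQUIRE);

	return n ? n->val : -1;
}
```

This is only an illustration of why the stronger load matters regardless of which optimisation breaks the chain; it is not the kernel's implementation.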

Re: [PATCH v3 3/3] arm64, compiler-context-analysis: Permit alias analysis through __READ_ONCE() with CONFIG_LTO=y

Posted by Marco Elver 4 days, 8 hours ago

On Tue, 3 Feb 2026 at 12:47, Will Deacon <will@kernel.org> wrote:
[...]
> > > What does GCC do with this? :/
> >
> > GCC currently doesn't see it, LTO is clang only.
>
> LTO is just one way that a compiler could end up breaking dependency
> chains, so I really want to maintain the option to enable this path for
> GCC in case we run into problems caused by other optimisations in future.

It will work for GCC, but only from GCC 11. Before that __auto_type
does not drop qualifiers:
https://godbolt.org/z/sc5bcnzKd (switch to GCC 11 to see it compile)

So to summarize, all supported Clang versions deal with __auto_type
correctly for the fallback; GCC from version 11 does (kernel currently
supports GCC 8 and above). From GCC 14 and Clang 19 we have
__typeof_unqual__.

I really don't see another way forward; there's no other good way to
solve this issue. I would advise against pessimizing new compilers and
features because maybe one day we might still want to enable this
version of READ_ONCE() for GCC 8-10.

Should we one day choose to enable this READ_ONCE() version for GCC,
we will either (a) have bumped the minimum GCC version to 11+, or (b)
be able to do so only from GCC 11. Bear in mind that GCC 11 was
released 5 years ago!
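
A small probe for the __auto_type behaviour described above, assuming a C11 compiler with _Generic. The function name auto_type_keeps_const is hypothetical; the result depends on the compiler, so no fixed output is claimed:

```c
/*
 * Returns 1 if __auto_type kept the 'const' of its initializer
 * (deduced 'const int'), 0 if it deduced plain 'int'.
 */
static int auto_type_keeps_const(void)
{
	const int src = 5;
	__auto_type copy = src;

	(void)copy;
	return _Generic(&copy, const int *: 1, int *: 0);
}
```

Going by the message above, this should return 0 on Clang and on GCC 11 and later, and 1 on GCC 8 to 10.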

Re: [PATCH v3 3/3] arm64, compiler-context-analysis: Permit alias analysis through __READ_ONCE() with CONFIG_LTO=y

Posted by Peter Zijlstra 4 days, 6 hours ago

On Wed, Feb 04, 2026 at 11:46:02AM +0100, Marco Elver wrote:
> On Tue, 3 Feb 2026 at 12:47, Will Deacon <will@kernel.org> wrote:
> [...]
> > > > What does GCC do with this? :/
> > >
> > > GCC currently doesn't see it, LTO is clang only.
> >
> > LTO is just one way that a compiler could end up breaking dependency
> > chains, so I really want to maintain the option to enable this path for
> > GCC in case we run into problems caused by other optimisations in future.
> 
> It will work for GCC, but only from GCC 11. Before that __auto_type
> does not drop qualifiers:
> https://godbolt.org/z/sc5bcnzKd (switch to GCC 11 to see it compile)
> 
> So to summarize, all supported Clang versions deal with __auto_type
> correctly for the fallback; GCC from version 11 does (kernel currently
> supports GCC 8 and above). From GCC 14 and Clang 19 we have
> __typeof_unqual__.
> 
> I really don't see another way forward; there's no other good way to
> solve this issue. I would advise against pessimizing new compilers and
> features because maybe one day we might still want to enable this
> version of READ_ONCE() for GCC 8-10.
> 
> Should we one day choose to enable this READ_ONCE() version for GCC,
> we will (a) either have bumped the minimum GCC version to 11+, or (b)
> we can only do so from GCC 11. At this point GCC 11 was released 5
> years ago!

There is, from this thread:

  https://lkml.kernel.org/r/20260111182010.GH3634291@ZenIV

another trick to strip qualifiers:

  #define unqual_non_array(T) __typeof__(((T(*)(void))0)())

which will work from GCC-8.4 onwards. Arguably, it should be possible to
raise the minimum from 8 to 8.4 (IMO).

But yes; in general I think it is fine to have 'old' compilers generate
suboptimal code.
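
A short sketch of how the macro above could be used, assuming a compiler where the trick works (GCC 8.4 or later per this message, or Clang). The function demo_strip_const is a hypothetical name:

```c
/* Macro quoted from the message above. */
#define unqual_non_array(T) __typeof__(((T(*)(void))0)())

static int demo_strip_const(void)
{
	const int src = 1;
	/* tmp is plain 'int' where the trick works, so it is assignable. */
	unqual_non_array(__typeof__(src)) tmp = src;

	tmp = 2;
	return tmp;
}
```

The cast builds a pointer to a function returning T, and the type of the (unevaluated) call is that return type without qualifiers. Functions cannot return arrays, hence the "non_array" in the name.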

Re: [PATCH v3 3/3] arm64, compiler-context-analysis: Permit alias analysis through __READ_ONCE() with CONFIG_LTO=y

Posted by Will Deacon 4 days, 5 hours ago

On Wed, Feb 04, 2026 at 02:14:00PM +0100, Peter Zijlstra wrote:
> On Wed, Feb 04, 2026 at 11:46:02AM +0100, Marco Elver wrote:
> > On Tue, 3 Feb 2026 at 12:47, Will Deacon <will@kernel.org> wrote:
> > [...]
> > > > > What does GCC do with this? :/
> > > >
> > > > GCC currently doesn't see it, LTO is clang only.
> > >
> > > LTO is just one way that a compiler could end up breaking dependency
> > > chains, so I really want to maintain the option to enable this path for
> > > GCC in case we run into problems caused by other optimisations in future.
> > 
> > It will work for GCC, but only from GCC 11. Before that __auto_type
> > does not drop qualifiers:
> > https://godbolt.org/z/sc5bcnzKd (switch to GCC 11 to see it compile)
> > 
> > So to summarize, all supported Clang versions deal with __auto_type
> > correctly for the fallback; GCC from version 11 does (kernel currently
> > supports GCC 8 and above). From GCC 14 and Clang 19 we have
> > __typeof_unqual__.
> > 
> > I really don't see another way forward; there's no other good way to
> > solve this issue. I would advise against pessimizing new compilers and
> > features because maybe one day we might still want to enable this
> > version of READ_ONCE() for GCC 8-10.
> > 
> > Should we one day choose to enable this READ_ONCE() version for GCC,
> > we will (a) either have bumped the minimum GCC version to 11+, or (b)
> > we can only do so from GCC 11. At this point GCC 11 was released 5
> > years ago!
> 
> There is, from this thread:
> 
>   https://lkml.kernel.org/r/20260111182010.GH3634291@ZenIV
> 
> another trick to strip qualifiers:
> 
>   #define unqual_non_array(T) __typeof__(((T(*)(void))0)())
> 
> which will work from GCC-8.4 onwards. Arguably, it should be possible to
> raise the minimum from 8 to 8.4 (IMO).

That sounds reasonable to me but I'm not usually the one to push back
on raising the minimum compiler version!

> But yes; in general I think it is fine to have 'old' compilers generate
> suboptimal code.

I'm absolutely fine with the codegen being terrible for ancient
toolchains as long as it's correct.

Will

Re: [PATCH v3 3/3] arm64, compiler-context-analysis: Permit alias analysis through __READ_ONCE() with CONFIG_LTO=y

Posted by Marco Elver 2 days, 4 hours ago

On Wed, 4 Feb 2026 at 15:15, Will Deacon <will@kernel.org> wrote:
>
> On Wed, Feb 04, 2026 at 02:14:00PM +0100, Peter Zijlstra wrote:
> > On Wed, Feb 04, 2026 at 11:46:02AM +0100, Marco Elver wrote:
> > > On Tue, 3 Feb 2026 at 12:47, Will Deacon <will@kernel.org> wrote:
> > > [...]
> > > > > > What does GCC do with this? :/
> > > > >
> > > > > GCC currently doesn't see it, LTO is clang only.
> > > >
> > > > LTO is just one way that a compiler could end up breaking dependency
> > > > chains, so I really want to maintain the option to enable this path for
> > > > GCC in case we run into problems caused by other optimisations in future.
> > >
> > > It will work for GCC, but only from GCC 11. Before that __auto_type
> > > does not drop qualifiers:
> > > https://godbolt.org/z/sc5bcnzKd (switch to GCC 11 to see it compile)
> > >
> > > So to summarize, all supported Clang versions deal with __auto_type
> > > correctly for the fallback; GCC from version 11 does (kernel currently
> > > supports GCC 8 and above). From GCC 14 and Clang 19 we have
> > > __typeof_unqual__.
> > >
> > > I really don't see another way forward; there's no other good way to
> > > solve this issue. I would advise against pessimizing new compilers and
> > > features because maybe one day we might still want to enable this
> > > version of READ_ONCE() for GCC 8-10.
> > >
> > > Should we one day choose to enable this READ_ONCE() version for GCC,
> > > we will (a) either have bumped the minimum GCC version to 11+, or (b)
> > > we can only do so from GCC 11. At this point GCC 11 was released 5
> > > years ago!
> >
> > There is, from this thread:
> >
> >   https://lkml.kernel.org/r/20260111182010.GH3634291@ZenIV
> >
> > another trick to strip qualifiers:
> >
> >   #define unqual_non_array(T) __typeof__(((T(*)(void))0)())
> >
> > which will work from GCC-8.4 onwards. Arguably, it should be possible to
> > raise the minimum from 8 to 8.4 (IMO).

That looks like an interesting option.

> That sounds reasonable to me but I'm not usually the one to push back
> on raising the minimum compiler version!
>
> > But yes; in general I think it is fine to have 'old' compilers generate
> > suboptimal code.
>
> I'm absolutely fine with the codegen being terrible for ancient
> toolchains as long as it's correct.

From that discussion a month ago and this one, it seems we need
something to fix __unqual_scalar_typeof().

What's the way forward?

1. Bump minimum GCC version to 8.4. Replace __unqual_scalar_typeof()
for old compilers with the better unqual_non_array hack?

2. Leave __unqual_scalar_typeof() as-is. The patch "compiler: Use
__typeof_unqual__() for __unqual_scalar_typeof()" will fix the codegen
issues for new compilers. It does not make old compilers drop 'const'
for non-scalar types, so those still need localized workarounds
(like this patch here).

Either way we need a fix for this arm64 LTO version to fix the
context-analysis "see through" the inline asm (how this patch series
started).

Option #1 needs a lot more due-diligence and testing that it all works
for all compilers and configs (opening Pandora's Box :-)). For option
#2 we just need these patches here to at least fix the acute issue
with this arm64 LTO version.

Re: [PATCH v3 3/3] arm64, compiler-context-analysis: Permit alias analysis through __READ_ONCE() with CONFIG_LTO=y

Posted by David Laight 2 days ago

On Fri, 6 Feb 2026 16:09:35 +0100
Marco Elver <elver@google.com> wrote:

>  On Wed, 4 Feb 2026 at 15:15, Will Deacon <will@kernel.org> wrote:
> >
> > On Wed, Feb 04, 2026 at 02:14:00PM +0100, Peter Zijlstra wrote:  
> > > On Wed, Feb 04, 2026 at 11:46:02AM +0100, Marco Elver wrote:  
> > > > On Tue, 3 Feb 2026 at 12:47, Will Deacon <will@kernel.org> wrote:
> > > > [...]  
> > > > > > > What does GCC do with this? :/  
> > > > > >
> > > > > > GCC currently doesn't see it, LTO is clang only.  
> > > > >
> > > > > LTO is just one way that a compiler could end up breaking dependency
> > > > > chains, so I really want to maintain the option to enable this path for
> > > > > GCC in case we run into problems caused by other optimisations in future.  
> > > >
> > > > It will work for GCC, but only from GCC 11. Before that __auto_type
> > > > does not drop qualifiers:
> > > > https://godbolt.org/z/sc5bcnzKd (switch to GCC 11 to see it compile)
> > > >
> > > > So to summarize, all supported Clang versions deal with __auto_type
> > > > correctly for the fallback; GCC from version 11 does (kernel currently
> > > > supports GCC 8 and above). From GCC 14 and Clang 19 we have
> > > > __typeof_unqual__.
> > > >
> > > > I really don't see another way forward; there's no other good way to
> > > > solve this issue. I would advise against pessimizing new compilers and
> > > > features because maybe one day we might still want to enable this
> > > > version of READ_ONCE() for GCC 8-10.
> > > >
> > > > Should we one day choose to enable this READ_ONCE() version for GCC,
> > > > we will (a) either have bumped the minimum GCC version to 11+, or (b)
> > > > we can only do so from GCC 11. At this point GCC 11 was released 5
> > > > years ago!  
> > >
> > > There is, from this thread:
> > >
> > >   https://lkml.kernel.org/r/20260111182010.GH3634291@ZenIV
> > >
> > > another trick to strip qualifiers:
> > >
> > >   #define unqual_non_array(T) __typeof__(((T(*)(void))0)())
> > >
> > > which will work from GCC-8.4 onwards. Arguably, it should be possible to
> > > raise the minimum from 8 to 8.4 (IMO).  
> 
> That looks like an interesting option.
> 
> > That sounds reasonable to me but I'm not usually the one to push back
> > on raising the minimum compiler version!
> >  
> > > But yes; in general I think it is fine to have 'old' compilers generate
> > > suboptimal code.  
> >
> > I'm absolutely fine with the codegen being terrible for ancient
> > toolchains as long as it's correct.  
> 
> From that discussion a month ago and this one, it seems we need
> something to fix __unqual_scalar_typeof().
> 
> What's the way forward?
> 
> 1. Bump minimum GCC version to 8.4. Replace __unqual_scalar_typeof()
> for old compilers with the better unqual_non_array hack?
> 
> 2. Leave __unqual_scalar_typeof() as-is. The patch "compiler: Use
> __typeof_unqual__() for __unqual_scalar_typeof()" will fix the codegen
> issues for new compilers. Doesn't fix not dropping 'const' for old
> compilers for non-scalar types, and requires localized workarounds
> (like this patch here).
> 
> Either way we need a fix for this arm64 LTO version to fix the
> context-analysis "see through" the inline asm (how this patch series
> started).
> 
> Option #1 needs a lot more due-diligence and testing that it all works
> for all compilers and configs (opening Pandora's Box :-)). For option
> #2 we just need these patches here to at least fix the acute issue
> with this arm64 LTO version.

Option 3.

Look at where/why they are used and change the code to do it differently.
Don't forget the similar __unsigned_scalar_typeof() in bitfield.h.
(I posted a patch that nuked that one not long ago - used sizeof instead.)

The one in minmax_array (in minmax.h) is particularly pointless.
The value 'suffers' integer promotion as soon as it is used, nothing
wrong with 'auto _x = x + 0' there.
That will work elsewhere.

	David
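
A sketch of the 'x + 0' idea, assuming a C11 compiler and using __auto_type directly (the 'auto' spelling above is assumed to map to it). The function name demo_promoted_copy is hypothetical:

```c
/*
 * Returns 1 if 'x + 0' on a const unsigned char gives a plain,
 * assignable int after integer promotion.
 */
static int demo_promoted_copy(void)
{
	const unsigned char x = 200;
	__auto_type _x = x + 0;	/* rvalue of type int */

	_x += 100;		/* assignable: no 'const' carried over */
	return _Generic(_x, int: 1, default: 0) && _x == 300;
}
```

Because 'x + 0' is an rvalue, no qualifier is left to carry over. The cost is that types narrower than int become int, which is the promotion that would happen on first use anyway.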

Re: [PATCH v3 3/3] arm64, compiler-context-analysis: Permit alias analysis through __READ_ONCE() with CONFIG_LTO=y

Posted by David Laight 1 week, 2 days ago

On Fri, 30 Jan 2026 14:28:26 +0100
Marco Elver <elver@google.com> wrote:

> When enabling Clang's Context Analysis (aka. Thread Safety Analysis) on
> kernel/futex/core.o (see Peter's changes at [1]), in arm64 LTO builds we
> could see:
> 
> | kernel/futex/core.c:982:1: warning: spinlock 'atomic ? __u.__val : q->lock_ptr' is still held at the end of function [-Wthread-safety-analysis]
> |      982 | }
> |          | ^
> |    kernel/futex/core.c:976:2: note: spinlock acquired here
> |      976 |         spin_lock(lock_ptr);
> |          |         ^
> | kernel/futex/core.c:982:1: warning: expecting spinlock 'q->lock_ptr' to be held at the end of function [-Wthread-safety-analysis]
> |      982 | }
> |          | ^
> |    kernel/futex/core.c:966:6: note: spinlock acquired here
> |      966 | void futex_q_lockptr_lock(struct futex_q *q)
> |          |      ^
> |    2 warnings generated.
> 
> Where we have:
> 
> 	extern void futex_q_lockptr_lock(struct futex_q *q) __acquires(q->lock_ptr);
> 	..
> 	void futex_q_lockptr_lock(struct futex_q *q)
> 	{
> 		spinlock_t *lock_ptr;
> 
> 		/*
> 		 * See futex_unqueue() why lock_ptr can change.
> 		 */
> 		guard(rcu)();
> 	retry:
> >>		lock_ptr = READ_ONCE(q->lock_ptr);  
> 		spin_lock(lock_ptr);
> 	...
> 	}
> 
> At the time of the above report (prior to removal of the 'atomic' flag),
> Clang Thread Safety Analysis's alias analysis resolved 'lock_ptr' to
> 'atomic ?  __u.__val : q->lock_ptr' (now just '__u.__val'), and used
> this as the identity of the context lock given it cannot "see through"
> the inline assembly; however, we want 'q->lock_ptr' as the canonical
> context lock.
> 
> While for code generation the compiler simplified to '__u.__val' for
> pointers (8 byte case -> 'atomic' was set), TSA's analysis (a) happens
> much earlier on the AST, and (b) would be the wrong deduction.
> 
> Now that we've gotten rid of the 'atomic' ternary comparison, we can
> return '__u.__val' through a pointer that we initialize with '&x', but
> then update via a pointer-to-pointer. When READ_ONCE()'ing a context
> lock pointer, TSA's alias analysis does not invalidate the initial alias
> when updated through the pointer-to-pointer, and we make it effectively
> "see through" the __READ_ONCE().
> 
> Code generation is unchanged.
> 
> Link: https://lkml.kernel.org/r/20260121110704.221498346@infradead.org [1]
> Reported-by: kernel test robot <lkp@intel.com>
> Closes: https://lore.kernel.org/oe-kbuild-all/202601221040.TeM0ihff-lkp@intel.com/
> Cc: Peter Zijlstra <peterz@infradead.org>
> Tested-by: Boqun Feng <boqun@kernel.org>
> Signed-off-by: Marco Elver <elver@google.com>

LGTM (for an obscure definition of G).

Reviewed-by: David Laight <david.laight.linux@gmail.com>

> [...]