Rework arm64 LTO __READ_ONCE() to improve code generation as follows:
1. Replace the _Generic-based __unqual_scalar_typeof() with the builtin
typeof_unqual(). This strips qualifiers from all types, not just
integer types, which is required to be able to assign (must be
non-const) to __u.__val in the non-atomic case (required for #2).
One subtle point here is that non-integer types of __val could be const
or volatile within the union with the old __unqual_scalar_typeof(), if
the passed variable is const or volatile. This would then result in a
forced load from the stack if __u.__val is volatile; in the case of
const, it does look odd if the underlying storage changes, but the
compiler is told said member is "const" -- it smells like UB.
2. Eliminate the atomic flag and ternary conditional expression. Move
the fallback volatile load into the default case of the switch,
ensuring __u is unconditionally initialized across all paths.
The statement expression now unconditionally returns __u.__val.
This refactoring appears to help the compiler improve (or fix) code
generation.
With a defconfig + LTO + debug options builds, we observe different
codegen for the following functions:
btrfs_reclaim_sweep (708 -> 1032 bytes)
btrfs_sinfo_bg_reclaim_threshold_store (200 -> 204 bytes)
check_mem_access (3652 -> 3692 bytes) [inlined bpf_map_is_rdonly]
console_flush_all (1268 -> 1264 bytes)
console_lock_spinning_disable_and_check (180 -> 176 bytes)
igb_add_filter (640 -> 636 bytes)
igb_config_tx_modes (2404 -> 2400 bytes)
kvm_vcpu_on_spin (480 -> 476 bytes)
map_freeze (376 -> 380 bytes)
netlink_bind (1664 -> 1656 bytes)
nmi_cpu_backtrace (404 -> 400 bytes)
set_rps_cpu (516 -> 520 bytes)
swap_cluster_readahead (944 -> 932 bytes)
tcp_accecn_third_ack (328 -> 336 bytes)
tcp_create_openreq_child (1764 -> 1772 bytes)
tcp_data_queue (5784 -> 5892 bytes)
tcp_ecn_rcv_synack (620 -> 628 bytes)
xen_manage_runstate_time (944 -> 896 bytes)
xen_steal_clock (340 -> 296 bytes)
Increase of some functions are due to more aggressive inlining due to
better codegen (in this build, e.g. bpf_map_is_rdonly is no longer
present due to being inlined completely).
Signed-off-by: Marco Elver <elver@google.com>
---
arch/arm64/include/asm/rwonce.h | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/include/asm/rwonce.h b/arch/arm64/include/asm/rwonce.h
index fc0fb42b0b64..9963948f4b44 100644
--- a/arch/arm64/include/asm/rwonce.h
+++ b/arch/arm64/include/asm/rwonce.h
@@ -32,8 +32,7 @@
#define __READ_ONCE(x) \
({ \
typeof(&(x)) __x = &(x); \
- int atomic = 1; \
- union { __unqual_scalar_typeof(*__x) __val; char __c[1]; } __u; \
+ union { TYPEOF_UNQUAL(*__x) __val; char __c[1]; } __u; \
switch (sizeof(x)) { \
case 1: \
asm volatile(__LOAD_RCPC(b, %w0, %1) \
@@ -56,9 +55,9 @@
: "Q" (*__x) : "memory"); \
break; \
default: \
- atomic = 0; \
+ __u.__val = *(volatile typeof(*__x) *)__x; \
} \
- atomic ? (typeof(*__x))__u.__val : (*(volatile typeof(*__x) *)__x);\
+ __u.__val; \
})
#endif /* !BUILD_VDSO */
--
2.52.0.457.g6b5491de43-goog
On Mon, 26 Jan 2026 01:25:11 +0100
Marco Elver <elver@google.com> wrote:
> Rework arm64 LTO __READ_ONCE() to improve code generation as follows:
>
> 1. Replace the _Generic-based __unqual_scalar_typeof() with the builtin
> typeof_unqual(). This strips qualifiers from all types, not just
> integer types, which is required to be able to assign (must be
> non-const) to __u.__val in the non-atomic case (required for #2).
>
> One subtle point here is that non-integer types of __val could be const
> or volatile within the union with the old __unqual_scalar_typeof(), if
> the passed variable is const or volatile. This would then result in a
> forced load from the stack if __u.__val is volatile; in the case of
> const, it does look odd if the underlying storage changes, but the
> compiler is told said member is "const" -- it smells like UB.
>
> 2. Eliminate the atomic flag and ternary conditional expression. Move
> the fallback volatile load into the default case of the switch,
> ensuring __u is unconditionally initialized across all paths.
> The statement expression now unconditionally returns __u.__val.
Does it even need to be a union?
I think (eg):
TYPEOF_UNQUAL(*__x) __val; \
...
: "=r" (*(__u32 *)&__val) \
will have the same effect (might need an __force for sparse).
Also is the 'default' branch even needed?
READ_ONCE() rejects sizes other than 1, 2, 4 and 8.
A quick search only found one oversize read - for 'struct vcpu_runstate_info'
in arch/x86/kvm/xen.c
Requiring that code use a different define might make sense.
I also did some x86-64 build timings with compiletime_assert_rwonce_type()
commented out.
Expanding and compiling that check seems to add just over 1% to the
build time.
So anything to shrink that define is likely to be noticeable.
David
>
...
> Signed-off-by: Marco Elver <elver@google.com>
> ---
> arch/arm64/include/asm/rwonce.h | 7 +++----
> 1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/include/asm/rwonce.h b/arch/arm64/include/asm/rwonce.h
> index fc0fb42b0b64..9963948f4b44 100644
> --- a/arch/arm64/include/asm/rwonce.h
> +++ b/arch/arm64/include/asm/rwonce.h
> @@ -32,8 +32,7 @@
> #define __READ_ONCE(x) \
> ({ \
> typeof(&(x)) __x = &(x); \
> - int atomic = 1; \
> - union { __unqual_scalar_typeof(*__x) __val; char __c[1]; } __u; \
> + union { TYPEOF_UNQUAL(*__x) __val; char __c[1]; } __u; \
> switch (sizeof(x)) { \
> case 1: \
> asm volatile(__LOAD_RCPC(b, %w0, %1) \
> @@ -56,9 +55,9 @@
> : "Q" (*__x) : "memory"); \
> break; \
> default: \
> - atomic = 0; \
> + __u.__val = *(volatile typeof(*__x) *)__x; \
> } \
> - atomic ? (typeof(*__x))__u.__val : (*(volatile typeof(*__x) *)__x);\
> + __u.__val; \
> })
>
> #endif /* !BUILD_VDSO */
On Mon, 26 Jan 2026 at 12:16, David Laight <david.laight.linux@gmail.com> wrote: > > On Mon, 26 Jan 2026 01:25:11 +0100 > Marco Elver <elver@google.com> wrote: > > > Rework arm64 LTO __READ_ONCE() to improve code generation as follows: > > > > 1. Replace the _Generic-based __unqual_scalar_typeof() with the builtin > > typeof_unqual(). This strips qualifiers from all types, not just > > integer types, which is required to be able to assign (must be > > non-const) to __u.__val in the non-atomic case (required for #2). > > > > One subtle point here is that non-integer types of __val could be const > > or volatile within the union with the old __unqual_scalar_typeof(), if > > the passed variable is const or volatile. This would then result in a > > forced load from the stack if __u.__val is volatile; in the case of > > const, it does look odd if the underlying storage changes, but the > > compiler is told said member is "const" -- it smells like UB. > > > > 2. Eliminate the atomic flag and ternary conditional expression. Move > > the fallback volatile load into the default case of the switch, > > ensuring __u is unconditionally initialized across all paths. > > The statement expression now unconditionally returns __u.__val. > > Does it even need to be a union? > I think (eg): > TYPEOF_UNQUAL(*__x) __val; \ > ... > : "=r" (*(__u32 *)&__val) \ > will have the same effect (might need an __force for sparse). Unsure, but we might be treading on UB even with -fno-strict-aliasing given all the inline asm around here. > Also is the 'default' branch even needed? > READ_ONCE() rejects sizes other than 1, 2, 4 and 8. > A quick search only found one oversize read - for 'struct vcpu_runstate_info' > in arch/x86/kvm/xen.c > Requiring that code use a different define might make sense. > > I also did some x86-64 build timings with compiletime_assert_rwonce_type() > commented out. > Expanding and compiling that check seems to add just over 1% to the > build time. > So anything to shrink that define is likely to be noticeable. The compiletime_assert_rwonce_type() is for the benefit of the asm-generic variant, which is implemented like the 'default' case here by default. This here is only the arm64 override of all that with LTO.
On Tue, 27 Jan 2026 00:15:38 +0100 Marco Elver <elver@google.com> wrote: > On Mon, 26 Jan 2026 at 12:16, David Laight <david.laight.linux@gmail.com> wrote: > > > > On Mon, 26 Jan 2026 01:25:11 +0100 > > Marco Elver <elver@google.com> wrote: ... > > Also is the 'default' branch even needed? > > READ_ONCE() rejects sizes other than 1, 2, 4 and 8. > > A quick search only found one oversize read - for 'struct vcpu_runstate_info' > > in arch/x86/kvm/xen.c > > Requiring that code use a different define might make sense. > > > > I also did some x86-64 build timings with compiletime_assert_rwonce_type() > > commented out. > > Expanding and compiling that check seems to add just over 1% to the > > build time. > > So anything to shrink that define is likely to be noticeable. > > The compiletime_assert_rwonce_type() is for the benefit of the > asm-generic variant, which is implemented like the 'default' case here > by default. True, but compiletime_assert_rwonce_type() stops you getting to the 'default' case in this version - unless code uses __READ_ONCE() and I couldn't find any examples. So the 'default' could probably just be a BUILD_BUG_ON(). Although part of the 'bloat' from compiletime_assert_rwonce_type() is the repeated definition of the error function. A few places define an 'error function' rather than using compiletime_assert(), perhaps there should be: #define compiletime_error_fn(name, msg) \ __noreturn extern void name __compiletime_error(msg) for consistency. If compiletime_assert_rwonce_type() used a named function this code could just call the same one. > This here is only the arm64 override of all that with LTO. I keep missing that conditional (and that it means clang) ... It also explains why only a few builds get a signedness error from min() because the 'atomic ? x : y' causes integer promotion to happen. David
On Mon, Jan 26, 2026, at 01:25, Marco Elver wrote:
> diff --git a/arch/arm64/include/asm/rwonce.h b/arch/arm64/include/asm/rwonce.h
> index fc0fb42b0b64..9963948f4b44 100644
> --- a/arch/arm64/include/asm/rwonce.h
> +++ b/arch/arm64/include/asm/rwonce.h
> @@ -32,8 +32,7 @@
> #define __READ_ONCE(x) \
> ({ \
> typeof(&(x)) __x = &(x); \
> - int atomic = 1; \
> - union { __unqual_scalar_typeof(*__x) __val; char __c[1]; } __u; \
> + union { TYPEOF_UNQUAL(*__x) __val; char __c[1]; } __u; \
> switch (sizeof(x)) { \
> case 1: \
> asm volatile(__LOAD_RCPC(b, %w0, %1) \
How does this work with CC_HAS_TYPEOF_UNQUAL=false?
As far as I can tell, TYPEOF_UNQUAL() falls back to __typeof__
on gcc-13, clang-18 and earlier, and not strip out qualifiers.
With fd69b2f7d5f4 ("compiler: Use __typeof_unqual__() for
__unqual_scalar_typeof()"), I would expect __unqual_scalar_typeof()
to do the right thing already.
Arnd
On Mon, Jan 26, 2026 at 08:56AM +0100, Arnd Bergmann wrote:
> On Mon, Jan 26, 2026, at 01:25, Marco Elver wrote:
> > diff --git a/arch/arm64/include/asm/rwonce.h b/arch/arm64/include/asm/rwonce.h
> > index fc0fb42b0b64..9963948f4b44 100644
> > --- a/arch/arm64/include/asm/rwonce.h
> > +++ b/arch/arm64/include/asm/rwonce.h
> > @@ -32,8 +32,7 @@
> > #define __READ_ONCE(x) \
> > ({ \
> > typeof(&(x)) __x = &(x); \
> > - int atomic = 1; \
> > - union { __unqual_scalar_typeof(*__x) __val; char __c[1]; } __u; \
> > + union { TYPEOF_UNQUAL(*__x) __val; char __c[1]; } __u; \
> > switch (sizeof(x)) { \
> > case 1: \
> > asm volatile(__LOAD_RCPC(b, %w0, %1) \
>
> How does this work with CC_HAS_TYPEOF_UNQUAL=false?
>
> As far as I can tell, TYPEOF_UNQUAL() falls back to __typeof__
> on gcc-13, clang-18 and earlier, and not strip out qualifiers.
I think we only need to worry about Clang for LTO builds. But yeah, our
minimum supported Clang is 15, so between 15-18 it'd be broken.
> With fd69b2f7d5f4 ("compiler: Use __typeof_unqual__() for
> __unqual_scalar_typeof()"), I would expect __unqual_scalar_typeof()
> to do the right thing already.
It'd still be broken for Clang 15-18, so it won't help much. We need
this to work for more than "scalar", so even though it'll work for Clang
19+ given the redefinition to __typeof_unqual__, we should deprecate the
_Generic-based __unqual_scalar_typeof() sooner than later.
I was able to make this work for older compilers:
diff --git a/arch/arm64/include/asm/rwonce.h b/arch/arm64/include/asm/rwonce.h
index 85b1dd7b0274..d6c808cc01be 100644
--- a/arch/arm64/include/asm/rwonce.h
+++ b/arch/arm64/include/asm/rwonce.h
@@ -19,6 +19,18 @@
"ldapr" #sfx "\t" #regs, \
ARM64_HAS_LDAPR)
+#ifdef USE_TYPEOF_UNQUAL
+#define __read_once_typeof(x) TYPEOF_UNQUAL(x)
+#else
+/*
+ * Fallback for older compilers to infer an unqualified type, using the fact
+ * that __auto_type is supposed to drop qualifiers. Unlike typeof_unqual(), the
+ * type must be complete (defines an unevaluated local variable). This must
+ * already be guaranteed because sizeof(x) is used in the __READ_ONCE macro.
+ */
+#define __read_once_typeof(x) typeof(({ __auto_type ____t = (x); ____t; }))
+#endif
+
/*
* When building with LTO, there is an increased risk of the compiler
* converting an address dependency headed by a READ_ONCE() invocation
@@ -32,8 +44,8 @@
#define __READ_ONCE(x) \
({ \
auto __x = &(x); \
- auto __ret = (TYPEOF_UNQUAL(*__x) *)__x, *__retp = &__ret; \
- union { TYPEOF_UNQUAL(*__x) __val; char __c[1]; } __u; \
+ auto __ret = (__read_once_typeof(*__x) *)__x, *__retp = &__ret; \
+ union { __read_once_typeof(*__x) __val; char __c[1]; } __u; \
*__retp = &__u.__val; \
switch (sizeof(x)) { \
case 1: \
Thoughts?
On Mon, 26 Jan 2026 20:54:24 +0100
Marco Elver <elver@google.com> wrote:
> On Mon, Jan 26, 2026 at 08:56AM +0100, Arnd Bergmann wrote:
> > On Mon, Jan 26, 2026, at 01:25, Marco Elver wrote:
> > > diff --git a/arch/arm64/include/asm/rwonce.h b/arch/arm64/include/asm/rwonce.h
> > > index fc0fb42b0b64..9963948f4b44 100644
> > > --- a/arch/arm64/include/asm/rwonce.h
> > > +++ b/arch/arm64/include/asm/rwonce.h
> > > @@ -32,8 +32,7 @@
> > > #define __READ_ONCE(x) \
> > > ({ \
> > > typeof(&(x)) __x = &(x); \
> > > - int atomic = 1; \
> > > - union { __unqual_scalar_typeof(*__x) __val; char __c[1]; } __u; \
> > > + union { TYPEOF_UNQUAL(*__x) __val; char __c[1]; } __u; \
> > > switch (sizeof(x)) { \
> > > case 1: \
> > > asm volatile(__LOAD_RCPC(b, %w0, %1) \
> >
> > How does this work with CC_HAS_TYPEOF_UNQUAL=false?
> >
> > As far as I can tell, TYPEOF_UNQUAL() falls back to __typeof__
> > on gcc-13, clang-18 and earlier, and not strip out qualifiers.
>
> I think we only need to worry about Clang for LTO builds. But yeah, our
> minimum supported Clang is 15, so between 15-18 it'd be broken.
>
> > With fd69b2f7d5f4 ("compiler: Use __typeof_unqual__() for
> > __unqual_scalar_typeof()"), I would expect __unqual_scalar_typeof()
> > to do the right thing already.
>
> It'd still be broken for Clang 15-18, so it won't help much. We need
> this to work for more than "scalar", so even though it'll work for Clang
> 19+ given the redefinition to __typeof_unqual__, we should deprecate the
> _Generic-based __unqual_scalar_typeof() sooner than later.
>
> I was able to make this work for older compilers:
>
> diff --git a/arch/arm64/include/asm/rwonce.h b/arch/arm64/include/asm/rwonce.h
> index 85b1dd7b0274..d6c808cc01be 100644
> --- a/arch/arm64/include/asm/rwonce.h
> +++ b/arch/arm64/include/asm/rwonce.h
> @@ -19,6 +19,18 @@
> "ldapr" #sfx "\t" #regs, \
> ARM64_HAS_LDAPR)
>
> +#ifdef USE_TYPEOF_UNQUAL
> +#define __read_once_typeof(x) TYPEOF_UNQUAL(x)
> +#else
> +/*
> + * Fallback for older compilers to infer an unqualified type, using the fact
> + * that __auto_type is supposed to drop qualifiers. Unlike typeof_unqual(), the
> + * type must be complete (defines an unevaluated local variable). This must
> + * already be guaranteed because sizeof(x) is used in the __READ_ONCE macro.
> + */
> +#define __read_once_typeof(x) typeof(({ __auto_type ____t = (x); ____t; }))
Did you try old versions of gcc?
gcc before 11.0 don't drop qualifiers.
Even const int y=0; __auto_type x = +y; generates a 'const' x (same for - and ~).
You need to do '0+y' to lose the const.
None of them help here because you don't want the integer promotion.
Can you use:
union { u8 u8v, u16, u16v, u32 u32v, u64 u64v } u;
Then get the asm to write to the correct sized u.unn, then finish with:
*(typeof(*x))&u;
or does that spill to stack for 'ptr to volatile'?
It doesn't solve the problem for other sized items, but I'm not sure
how many there really are.
If a handful changing them to READ_BIG_STRUCT_ONCE() shouldn't be a
real problem.
David
> +#endif
> +
> /*
> * When building with LTO, there is an increased risk of the compiler
> * converting an address dependency headed by a READ_ONCE() invocation
> @@ -32,8 +44,8 @@
> #define __READ_ONCE(x) \
> ({ \
> auto __x = &(x); \
> - auto __ret = (TYPEOF_UNQUAL(*__x) *)__x, *__retp = &__ret; \
> - union { TYPEOF_UNQUAL(*__x) __val; char __c[1]; } __u; \
> + auto __ret = (__read_once_typeof(*__x) *)__x, *__retp = &__ret; \
> + union { __read_once_typeof(*__x) __val; char __c[1]; } __u; \
> *__retp = &__u.__val; \
> switch (sizeof(x)) { \
> case 1: \
>
>
> Thoughts?
>
On Mon, Jan 26, 2026, at 20:54, Marco Elver wrote:
> On Mon, Jan 26, 2026 at 08:56AM +0100, Arnd Bergmann wrote:
>> On Mon, Jan 26, 2026, at 01:25, Marco Elver wrote:
>>
>> How does this work with CC_HAS_TYPEOF_UNQUAL=false?
>>
>> As far as I can tell, TYPEOF_UNQUAL() falls back to __typeof__
>> on gcc-13, clang-18 and earlier, and not strip out qualifiers.
>
> I think we only need to worry about Clang for LTO builds. But yeah, our
> minimum supported Clang is 15, so between 15-18 it'd be broken.
Right, I missed the #ifdef CONFIG_LTO check, so indeed gcc is
fine here.
>> With fd69b2f7d5f4 ("compiler: Use __typeof_unqual__() for
>> __unqual_scalar_typeof()"), I would expect __unqual_scalar_typeof()
>> to do the right thing already.
>
> It'd still be broken for Clang 15-18, so it won't help much. We need
> this to work for more than "scalar", so even though it'll work for Clang
> 19+ given the redefinition to __typeof_unqual__, we should deprecate the
> _Generic-based __unqual_scalar_typeof() sooner than later.
>
> I was able to make this work for older compilers:
>
...
> #define __READ_ONCE(x) \
> ({ \
> auto __x = &(x); \
> - auto __ret = (TYPEOF_UNQUAL(*__x) *)__x, *__retp = &__ret; \
> - union { TYPEOF_UNQUAL(*__x) __val; char __c[1]; } __u; \
> + auto __ret = (__read_once_typeof(*__x) *)__x, *__retp = &__ret; \
> + union { __read_once_typeof(*__x) __val; char __c[1]; } __u; \
> *__retp = &__u.__val; \
>
> Thoughts?
Looks better than __unqual_scalar_typeof() to me. Would it make
sense to do the same __read_once_typeof() in the asm-generic
version of __READ_ONCE()? I don't remember if we discussed it
in the thread leading up to dee081bf8f82 ("READ_ONCE: Drop
pointer qualifiers when reading from scalar types").
We probably didn't have __auto_type back then.
Arnd
On Mon, 26 Jan 2026 at 23:24, Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Mon, Jan 26, 2026, at 20:54, Marco Elver wrote:
> > On Mon, Jan 26, 2026 at 08:56AM +0100, Arnd Bergmann wrote:
> >> On Mon, Jan 26, 2026, at 01:25, Marco Elver wrote:
> >>
> >> How does this work with CC_HAS_TYPEOF_UNQUAL=false?
> >>
> >> As far as I can tell, TYPEOF_UNQUAL() falls back to __typeof__
> >> on gcc-13, clang-18 and earlier, and not strip out qualifiers.
> >
> > I think we only need to worry about Clang for LTO builds. But yeah, our
> > minimum supported Clang is 15, so between 15-18 it'd be broken.
>
> Right, I missed the #ifdef CONFIG_LTO check, so indeed gcc is
> fine here.
>
> >> With fd69b2f7d5f4 ("compiler: Use __typeof_unqual__() for
> >> __unqual_scalar_typeof()"), I would expect __unqual_scalar_typeof()
> >> to do the right thing already.
> >
> > It'd still be broken for Clang 15-18, so it won't help much. We need
> > this to work for more than "scalar", so even though it'll work for Clang
> > 19+ given the redefinition to __typeof_unqual__, we should deprecate the
> > _Generic-based __unqual_scalar_typeof() sooner than later.
> >
> > I was able to make this work for older compilers:
> >
> ...
> > #define __READ_ONCE(x) \
> > ({ \
> > auto __x = &(x); \
> > - auto __ret = (TYPEOF_UNQUAL(*__x) *)__x, *__retp = &__ret; \
> > - union { TYPEOF_UNQUAL(*__x) __val; char __c[1]; } __u; \
> > + auto __ret = (__read_once_typeof(*__x) *)__x, *__retp = &__ret; \
> > + union { __read_once_typeof(*__x) __val; char __c[1]; } __u; \
> > *__retp = &__u.__val; \
>
> >
> > Thoughts?
>
> Looks better than __unqual_scalar_typeof() to me. Would it make
> sense to do the same __read_once_typeof() in the asm-generic
> version of __READ_ONCE()? I don't remember if we discussed it
> in the thread leading up to dee081bf8f82 ("READ_ONCE: Drop
> pointer qualifiers when reading from scalar types").
> We probably didn't have __auto_type back then.
I don't see the point for the asm-generic __READ_ONCE(): it's not
wrong to cast to 'const volatile volatile T*' nor 'const volatile
const T*' etc., which is dereferenced directly and not stored in any
temporary variable when used in READ_ONCE(). I actually don't know why
__unqual_scalar_typeof() is used in the asm-generic __READ_ONCE(),
which just adds 'const volatile' right back.
And __READ_ONCE_SCALAR() appears to be gone, where the
qualifier-stripping resulted in better code-gen. Then there's alpha,
but I think nobody cares sufficiently about it.
Did I miss some variant?
For this arm64 LTO variant we do need the qualifier stripping because
of the temporary __val.
And I'm hoping that once we update minimum compiler versions, we can
delete all these typeof-hacks and just use typeof_unqual().
On Tue, 27 Jan 2026 13:01:22 +0100
Marco Elver <elver@google.com> wrote:
> On Mon, 26 Jan 2026 at 23:24, Arnd Bergmann <arnd@arndb.de> wrote:
> >
> > On Mon, Jan 26, 2026, at 20:54, Marco Elver wrote:
> > > On Mon, Jan 26, 2026 at 08:56AM +0100, Arnd Bergmann wrote:
> > >> On Mon, Jan 26, 2026, at 01:25, Marco Elver wrote:
> > >>
> > >> How does this work with CC_HAS_TYPEOF_UNQUAL=false?
> > >>
> > >> As far as I can tell, TYPEOF_UNQUAL() falls back to __typeof__
> > >> on gcc-13, clang-18 and earlier, and not strip out qualifiers.
> > >
> > > I think we only need to worry about Clang for LTO builds. But yeah, our
> > > minimum supported Clang is 15, so between 15-18 it'd be broken.
> >
> > Right, I missed the #ifdef CONFIG_LTO check, so indeed gcc is
> > fine here.
> >
> > >> With fd69b2f7d5f4 ("compiler: Use __typeof_unqual__() for
> > >> __unqual_scalar_typeof()"), I would expect __unqual_scalar_typeof()
> > >> to do the right thing already.
> > >
> > > It'd still be broken for Clang 15-18, so it won't help much. We need
> > > this to work for more than "scalar", so even though it'll work for Clang
> > > 19+ given the redefinition to __typeof_unqual__, we should deprecate the
> > > _Generic-based __unqual_scalar_typeof() sooner than later.
> > >
> > > I was able to make this work for older compilers:
> > >
> > ...
> > > #define __READ_ONCE(x) \
> > > ({ \
> > > auto __x = &(x); \
> > > - auto __ret = (TYPEOF_UNQUAL(*__x) *)__x, *__retp = &__ret; \
> > > - union { TYPEOF_UNQUAL(*__x) __val; char __c[1]; } __u; \
> > > + auto __ret = (__read_once_typeof(*__x) *)__x, *__retp = &__ret; \
> > > + union { __read_once_typeof(*__x) __val; char __c[1]; } __u; \
> > > *__retp = &__u.__val; \
> >
> > >
> > > Thoughts?
> >
> > Looks better than __unqual_scalar_typeof() to me. Would it make
> > sense to do the same __read_once_typeof() in the asm-generic
> > version of __READ_ONCE()? I don't remember if we discussed it
> > in the thread leading up to dee081bf8f82 ("READ_ONCE: Drop
> > pointer qualifiers when reading from scalar types").
> > We probably didn't have __auto_type back then.
>
> I don't see the point for the asm-generic __READ_ONCE(): it's not
> wrong to cast to 'const volatile volatile T*' nor 'const volatile
> const T*' etc., which is dereferenced directly and not stored in any
> temporary variable when used in READ_ONCE(). I actually don't know why
> __unqual_scalar_typeof() is used in the asm-generic __READ_ONCE(),
> which just adds 'const volatile' right back.
That looks historic, once upon a time the code was:
typeof(x) __x = __READ_ONCE(x);
smp_read_barrier_depends();
__x;
but const needed stripping and someone decided to cast the result back
to the original type.
Of course that is pointless since the qualifiers aren't relevant on an
'rvalue' so are then discarded.
Then the barrier and temporary variable got removed.
(The barrier was only needed for alpha - which has its own __READ_ONCE()).
In the arm LTO version the ?: will also remove the qualifiers.
>
> And __READ_ONCE_SCALAR() appears to be gone, where the
> qualifier-stripping resulted in better code-gen.
The qualifier stripping was needed to read a 'const' variable.
Looks to me like the 'better code gen' happened earlier.
The 'earlier' version took the address of the on-stack volatile
variable.
So you may not need to remove the const/volatile qualifiers at all.
David
On Tue, 27 Jan 2026 at 15:30, David Laight <david.laight.linux@gmail.com> wrote:
>
> On Tue, 27 Jan 2026 13:01:22 +0100
> Marco Elver <elver@google.com> wrote:
>
> > On Mon, 26 Jan 2026 at 23:24, Arnd Bergmann <arnd@arndb.de> wrote:
> > >
> > > On Mon, Jan 26, 2026, at 20:54, Marco Elver wrote:
> > > > On Mon, Jan 26, 2026 at 08:56AM +0100, Arnd Bergmann wrote:
> > > >> On Mon, Jan 26, 2026, at 01:25, Marco Elver wrote:
> > > >>
> > > >> How does this work with CC_HAS_TYPEOF_UNQUAL=false?
> > > >>
> > > >> As far as I can tell, TYPEOF_UNQUAL() falls back to __typeof__
> > > >> on gcc-13, clang-18 and earlier, and not strip out qualifiers.
> > > >
> > > > I think we only need to worry about Clang for LTO builds. But yeah, our
> > > > minimum supported Clang is 15, so between 15-18 it'd be broken.
> > >
> > > Right, I missed the #ifdef CONFIG_LTO check, so indeed gcc is
> > > fine here.
> > >
> > > >> With fd69b2f7d5f4 ("compiler: Use __typeof_unqual__() for
> > > >> __unqual_scalar_typeof()"), I would expect __unqual_scalar_typeof()
> > > >> to do the right thing already.
> > > >
> > > > It'd still be broken for Clang 15-18, so it won't help much. We need
> > > > this to work for more than "scalar", so even though it'll work for Clang
> > > > 19+ given the redefinition to __typeof_unqual__, we should deprecate the
> > > > _Generic-based __unqual_scalar_typeof() sooner than later.
> > > >
> > > > I was able to make this work for older compilers:
> > > >
> > > ...
> > > > #define __READ_ONCE(x) \
> > > > ({ \
> > > > auto __x = &(x); \
> > > > - auto __ret = (TYPEOF_UNQUAL(*__x) *)__x, *__retp = &__ret; \
> > > > - union { TYPEOF_UNQUAL(*__x) __val; char __c[1]; } __u; \
> > > > + auto __ret = (__read_once_typeof(*__x) *)__x, *__retp = &__ret; \
> > > > + union { __read_once_typeof(*__x) __val; char __c[1]; } __u; \
> > > > *__retp = &__u.__val; \
> > >
> > > >
> > > > Thoughts?
> > >
> > > Looks better than __unqual_scalar_typeof() to me. Would it make
> > > sense to do the same __read_once_typeof() in the asm-generic
> > > version of __READ_ONCE()? I don't remember if we discussed it
> > > in the thread leading up to dee081bf8f82 ("READ_ONCE: Drop
> > > pointer qualifiers when reading from scalar types").
> > > We probably didn't have __auto_type back then.
> >
> > I don't see the point for the asm-generic __READ_ONCE(): it's not
> > wrong to cast to 'const volatile volatile T*' nor 'const volatile
> > const T*' etc., which is dereferenced directly and not stored in any
> > temporary variable when used in READ_ONCE(). I actually don't know why
> > __unqual_scalar_typeof() is used in the asm-generic __READ_ONCE(),
> > which just adds 'const volatile' right back.
>
> That looks historic, once upon a time the code was:
> typeof(x) __x = __READ_ONCE(x);
> smp_read_barrier_depends();
> __x;
> but const needed stripping and someone decided to cast the result back
> to the original type.
I think the bigger issue was 'volatile': "Passing a volatile-qualified
pointer to READ_ONCE() is an absolute
trainwreck for code generation [...]" per dee081bf8f82.
While the above with the temporary is gone for the asm-generic
version, the arm64 LTO still has the temporary and we need some
qualifier-stripping helper.
And a similar pattern also still exists for the asm-generic variant of
__smp_load_acquire() and friends.
> Of course that is pointless since the qualifiers aren't relevant on an
> 'rvalue' so are then discarded.
> Then the barrier and temporary variable got removed.
> (The barrier was only needed for alpha - which has its own __READ_ONCE()).
>
> In the arm LTO version the ?: will also remove the qualifiers.
>
> > And __READ_ONCE_SCALAR() appears to be gone, where the
> > qualifier-stripping resulted in better code-gen.
>
> The qualifier stripping was needed to read a 'const' variable.
> Looks to me like the 'better code gen' happened earlier.
> The 'earlier' version took the address of the on-stack volatile
> variable.
>
> So you may not need to remove the const/volatile qualifiers at all.
Yeah, I think so too (for the asm-generic variant that is). But I
won't mix up the arm64 fixes with possible asm-generic ones to avoid
this blowing up too much. I don't see a dependency on the asm-generic
cleanup, so we can clean it up separately; but I think it's purely
cosmetic unlike this arm64 LTO version.
On Tue, 27 Jan 2026 16:04:45 +0100
Marco Elver <elver@google.com> wrote:
...
> I think the bigger issue was 'volatile': "Passing a volatile-qualified
> pointer to READ_ONCE() is an absolute
> trainwreck for code generation [...]" per dee081bf8f82.
>
> While the above with the temporary is gone for the asm-generic
> version, the arm64 LTO still has the temporary and we need some
> qualifier-stripping helper.
I've done some real experiments, see https://www.godbolt.org/z/n9f91dGe3
If you don't strip the volatile you get a 'trainwreck'.
Visible on godbolt for the short sub-structure function f_v_s().
Fixable using __unqual_typeof__().
But the code for anything over 8 bytes is also a trainwreck.
eg: for an int[4] in a sub-structure:
return __READ_ONCE(s->s_i).i[2];
f_s_i:
sub sp, sp, #16
ldr x8, [x0, #56]
str x8, [sp, #8]
ldr w8, [x0, #64]
str w8, [sp, #4]
ldr w9, [x0, #68]
mov w0, w8
str w9, [sp], #16
ret
(The same code for a char[4] is fine...)
For integral types you can get a variable of the unqualified
type with:
auto v = (typeof(x))0;
but that doesn't help here because the code needs to read short structures.
For long structures member by member access will give better code.
I'm not sure they should be allowed.
(and there may not be any.)
David
© 2016 - 2026 Red Hat, Inc.