[PATCH] x86: Implement _THIS_IP_ using inline asm for 32-bit
Posted by Marco Elver 3 days, 21 hours ago
Both GCC [1] and Clang [2] consider the generic version of _THIS_IP_ to
be broken:

        #define _THIS_IP_  ({ __label__ __here; __here: (unsigned long)&&__here; })

In particular, the address of a label is only expected to be used with a
computed goto.

While the generic version more or less works today, it is known to be
brittle and may break with current and future optimizations. For
example, Clang -O2 always returns 1 when this function is inlined:

        static inline unsigned long get_ip(void)
        { return ({ __label__ __here; __here: (unsigned long)&&__here; }); }

Like the 64-bit version (which was missing 'volatile'), fix it by
overriding _THIS_IP_ in <asm/linkage.h> using an inline asm version for
32-bit x86.
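
For experimenting outside the kernel, the two inline asm variants can be
sketched as a standalone userspace translation unit. This is only an
illustration under stated assumptions: THIS_IP(), get_ip() as written here
and two_sites_differ() are hypothetical names, the last branch simply
reuses the generic label version, and none of this is part of the patch.

```c
/*
 * Userspace sketch only; THIS_IP() and two_sites_differ() are hypothetical
 * names, not kernel API. __asm__ __volatile__ is used instead of the
 * kernel's "asm volatile" so this also builds in strict -std=c99 mode.
 */
#if defined(__x86_64__)
/* 64-bit: RIP-relative LEA loads the address of the following instruction. */
#define THIS_IP() ({ unsigned long __ip; \
	__asm__ __volatile__("lea 0(%%rip), %0" : "=r" (__ip)); __ip; })
#elif defined(__i386__)
/* 32-bit: CALL pushes the address of local label 1, POP reads it back. */
#define THIS_IP() ({ unsigned long __ip; \
	__asm__ __volatile__("call 1f\n1: pop %0" : "=r" (__ip)); __ip; })
#else
/* Assumption: elsewhere, fall back to the generic label-address version. */
#define THIS_IP() ({ __label__ __here; __here: (unsigned long)&&__here; })
#endif

/* Same shape as the get_ip() example above, but built on the asm version. */
static inline unsigned long get_ip(void)
{
	return THIS_IP();
}

/*
 * Each expansion is its own volatile asm statement, so the two values
 * below come from two different instructions. Returns 1 if they differ
 * and neither collapsed to a small constant such as 1.
 */
int two_sites_differ(void)
{
	unsigned long a = THIS_IP();
	unsigned long b = THIS_IP();

	return a > 1 && b > 1 && a != b;
}
```

Without 'volatile', a compiler is allowed to treat two such asm statements
with identical text and no inputs as interchangeable, which is why the
sketch keeps the qualifier on both variants.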

Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120071 [1]
Link: https://github.com/llvm/llvm-project/issues/138272 [2]
Signed-off-by: Marco Elver <elver@google.com>
---
 arch/x86/include/asm/linkage.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/linkage.h b/arch/x86/include/asm/linkage.h
index a7294656ad90..bce3c6f4b94f 100644
--- a/arch/x86/include/asm/linkage.h
+++ b/arch/x86/include/asm/linkage.h
@@ -13,11 +13,12 @@
  * The generic version tends to create spurious ENDBR instructions under
  * certain conditions.
  */
-#define _THIS_IP_ ({ unsigned long __here; asm ("lea 0(%%rip), %0" : "=r" (__here)); __here; })
+#define _THIS_IP_ ({ unsigned long __here; asm volatile("lea 0(%%rip), %0" : "=r" (__here)); __here; })
 #endif
 
 #ifdef CONFIG_X86_32
 #define asmlinkage CPP_ASMLINKAGE __attribute__((regparm(0)))
+#define _THIS_IP_ ({ unsigned long __ip; asm volatile("call 1f\n1: pop %0" : "=r" (__ip)); __ip; })
 #endif /* CONFIG_X86_32 */
 
 #define __ALIGN		.balign CONFIG_FUNCTION_ALIGNMENT, 0x90;
-- 
2.54.0.746.g67dd491aae-goog
Re: [PATCH] x86: Implement _THIS_IP_ using inline asm for 32-bit
Posted by Peter Zijlstra 3 days, 14 hours ago
On Thu, May 21, 2026 at 02:00:09AM +0200, Marco Elver wrote:
> Both GCC [1] and Clang [2] consider the generic version of _THIS_IP_ to
> be broken:
> 
>         #define _THIS_IP_  ({ __label__ __here; __here: (unsigned long)&&__here; })
> 
> In particular, the address of a label is only expected to be used with a
> computed goto.
> 
> While the generic version more or less works today, it is known to be
> brittle and may break with current and future optimizations. For
> example, Clang -O2 always returns 1 when this function is inlined:
> 
>         static inline unsigned long get_ip(void)
>         { return ({ __label__ __here; __here: (unsigned long)&&__here; }); }
> 

Oh gawd :/

>  #ifdef CONFIG_X86_32
>  #define asmlinkage CPP_ASMLINKAGE __attribute__((regparm(0)))
> +#define _THIS_IP_ ({ unsigned long __ip; asm volatile("call 1f\n1: pop %0" : "=r" (__ip)); __ip; })

This will mess up the RSB and cause bad performance ripple effects for a
bit each use. Now, I don't think anybody still cares about performance
on 32bit (I certainly don't), so perhaps this is fine. But urgh.
Re: [PATCH] x86: Implement _THIS_IP_ using inline asm for 32-bit
Posted by H. Peter Anvin 3 days, 11 hours ago
On May 21, 2026 12:08:01 AM PDT, Peter Zijlstra <peterz@infradead.org> wrote:
>On Thu, May 21, 2026 at 02:00:09AM +0200, Marco Elver wrote:
>>  #ifdef CONFIG_X86_32
>>  #define asmlinkage CPP_ASMLINKAGE __attribute__((regparm(0)))
>> +#define _THIS_IP_ ({ unsigned long __ip; asm volatile("call 1f\n1: pop %0" : "=r" (__ip)); __ip; })
>
>This will mess up the RSB and cause bad performance ripple effects for a
>bit each use. Now, I don't think anybody still cares about performance
>on 32bit (I certainly don't), so perhaps this is fine. But urgh.

Most microarchitectures do *not* have a problem with call/pop, as they
know that call with a zero offset is not going to return. The main
exception was the Pentium 4.
Re: [PATCH] x86: Implement _THIS_IP_ using inline asm for 32-bit
Posted by Peter Zijlstra 3 days, 11 hours ago
On Thu, May 21, 2026 at 02:55:22AM -0700, H. Peter Anvin wrote:
> On May 21, 2026 12:08:01 AM PDT, Peter Zijlstra <peterz@infradead.org> wrote:
> >On Thu, May 21, 2026 at 02:00:09AM +0200, Marco Elver wrote:
> >>  #ifdef CONFIG_X86_32
> >>  #define asmlinkage CPP_ASMLINKAGE __attribute__((regparm(0)))
> >> +#define _THIS_IP_ ({ unsigned long __ip; asm volatile("call 1f\n1: pop %0" : "=r" (__ip)); __ip; })
> >
> >This will mess up the RSB and cause bad performance ripple effects for a
> >bit each use. Now, I don't think anybody still cares about performance
> >on 32bit (I certainly don't), so perhaps this is fine. But urgh.
> 
> Most microarchitectures do *not* have a problem with call/pop, as they
> know that call with a zero offset is not going to return. The main
> exception was the Pentium 4.

Oh, that's good to know. Still the "1: mov $1b, %reg" thing is shorter,
and generates the exact same code the compilers used to (and GCC still
does). Isn't that a better option?
Re: [PATCH] x86: Implement _THIS_IP_ using inline asm for 32-bit
Posted by Marco Elver 3 days, 9 hours ago
On Thu, 21 May 2026 at 12:20, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, May 21, 2026 at 02:55:22AM -0700, H. Peter Anvin wrote:
> > On May 21, 2026 12:08:01 AM PDT, Peter Zijlstra <peterz@infradead.org> wrote:
> > >On Thu, May 21, 2026 at 02:00:09AM +0200, Marco Elver wrote:
> > >>  #ifdef CONFIG_X86_32
> > >>  #define asmlinkage CPP_ASMLINKAGE __attribute__((regparm(0)))
> > >> +#define _THIS_IP_ ({ unsigned long __ip; asm volatile("call 1f\n1: pop %0" : "=r" (__ip)); __ip; })
> > >
> > >This will mess up the RSB and cause bad performance ripple effects for a
> > >bit each use. Now, I don't think anybody still cares about performance
> > >on 32bit (I certainly don't), so perhaps this is fine. But urgh.
> >
> > Most microarchitectures do *not* have a problem with call/pop, as they
> > know that call with a zero offset is not going to return. The main
> > exception was the Pentium 4.
>
> Oh, that's good to know. Still the "1: mov $1b, %reg" thing is shorter,
> and generates the exact same code the compilers used to (and GCC still
> does). Isn't that a better option?

It should work - just means it's going to emit relocations. If most
microarchitectures do in fact recognize the PIC variant and optimize
it, it might be better to avoid the relocations as it'd produce more
compact kernel images.

Also, while most kernel code doesn't need to be PIC (it's -fno-PIE),
there are a few special bits that are PIC (arch/x86/boot/startup ?),
so if you want this to be generic you need 2 versions guarded by
`#ifdef __PIC__`.
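
A minimal sketch of what such a two-version split could look like, written
as a userspace translation unit rather than kernel code. The names
THIS_IP(), where_am_i() and ip_is_inside_function() are made up for
illustration, and the x86-64 and fallback branches exist only so the
sketch builds on other targets:

```c
/* Hypothetical sketch of a __PIC__-guarded split; not from the patch. */
#if defined(__i386__) && defined(__PIC__)
/* PIC: no absolute address can be embedded, so use CALL+POP. */
#define THIS_IP() ({ unsigned long __ip; \
	__asm__ __volatile__("call 1f\n1: pop %0" : "=r" (__ip)); __ip; })
#elif defined(__i386__)
/* Non-PIC: load the absolute address of local label 1 (needs a relocation). */
#define THIS_IP() ({ unsigned long __ip; \
	__asm__ __volatile__("1: mov $1b, %0" : "=r" (__ip)); __ip; })
#elif defined(__x86_64__)
/* 64-bit: RIP-relative LEA is position independent already. */
#define THIS_IP() ({ unsigned long __ip; \
	__asm__ __volatile__("lea 0(%%rip), %0" : "=r" (__ip)); __ip; })
#else
/* Assumption: other targets fall back to the generic label version. */
#define THIS_IP() ({ __label__ __here; __here: (unsigned long)&&__here; })
#endif

/* Kept out of line so the returned address lies inside this function. */
static __attribute__((noinline)) unsigned long where_am_i(void)
{
	return THIS_IP();
}

/* Returns 1 if the reported IP falls shortly after the function's entry. */
int ip_is_inside_function(void)
{
	unsigned long start = (unsigned long)&where_am_i;
	unsigned long ip = where_am_i();

	return ip >= start && ip - start < 256;
}
```

The 256-byte bound is an arbitrary sanity limit for this tiny function,
not something the thread specifies.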
Re: [PATCH] x86: Implement _THIS_IP_ using inline asm for 32-bit
Posted by David Laight 3 days, 11 hours ago
On Thu, 21 May 2026 09:08:01 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Thu, May 21, 2026 at 02:00:09AM +0200, Marco Elver wrote:
..
> >  #ifdef CONFIG_X86_32
> >  #define asmlinkage CPP_ASMLINKAGE __attribute__((regparm(0)))
> > +#define _THIS_IP_ ({ unsigned long __ip; asm volatile("call 1f\n1: pop %0" : "=r" (__ip)); __ip; })  
> 
> This will mess up the RSB and cause bad performance ripple effects for a
> bit each use. Now, I don't think anybody still cares about performance
> on 32bit (I certainly don't), so perhaps this is fine. But urgh.

Nope, the CPU understands that code sequence and doesn't mess up the RSB.
It might even get decoded to a single uop.

Basically it is present at the start of pretty much every PIC function.

-- David
Re: [PATCH] x86: Implement _THIS_IP_ using inline asm for 32-bit
Posted by Marco Elver 3 days, 13 hours ago
On Thu, 21 May 2026 at 09:09, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, May 21, 2026 at 02:00:09AM +0200, Marco Elver wrote:
> >  #ifdef CONFIG_X86_32
> >  #define asmlinkage CPP_ASMLINKAGE __attribute__((regparm(0)))
> > +#define _THIS_IP_ ({ unsigned long __ip; asm volatile("call 1f\n1: pop %0" : "=r" (__ip)); __ip; })
>
> This will mess up the RSB and cause bad performance ripple effects for a
> bit each use. Now, I don't think anybody still cares about performance
> on 32bit (I certainly don't), so perhaps this is fine. But urgh.

Yeah - up to you. GCC appears to do the right thing still even for
32-bit: https://godbolt.org/z/3PWPK8E4f

Here's "Clang returns 1": https://godbolt.org/z/KjMvEWeM5
Re: [PATCH] x86: Implement _THIS_IP_ using inline asm for 32-bit
Posted by Peter Zijlstra 3 days, 12 hours ago
On Thu, May 21, 2026 at 10:34:38AM +0200, Marco Elver wrote:
> On Thu, 21 May 2026 at 09:09, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Thu, May 21, 2026 at 02:00:09AM +0200, Marco Elver wrote:
> > >  #ifdef CONFIG_X86_32
> > >  #define asmlinkage CPP_ASMLINKAGE __attribute__((regparm(0)))
> > > +#define _THIS_IP_ ({ unsigned long __ip; asm volatile("call 1f\n1: pop %0" : "=r" (__ip)); __ip; })
> >
> > This will mess up the RSB and cause bad performance ripple effects for a
> > bit each use. Now, I don't think anybody still cares about performance
> > on 32bit (I certainly don't), so perhaps this is fine. But urgh.
> 
> Yeah - up to you. GCC appears to do the right thing still even for
> 32-bit: https://godbolt.org/z/3PWPK8E4f
> 
> Here's "Clang returns 1": https://godbolt.org/z/KjMvEWeM5

asm volatile ("1: mov $1b, %0" : "=r" (__here))

seems to work?

And then there is this form:

asm volatile ("1: lea 1b, %0" : "=r" (__here))

which should also work but be one byte longer. Both are shorter than
CALL+POP and neither messes up the RSB. Am I missing something?