[patch 00/11] rseq: Optimize exit to user space

Thomas Gleixner posted 11 patches 1 month, 3 weeks ago
There is a newer version of this series
drivers/hv/mshv_common.c         |    2
include/linux/entry-common.h     |   12 +--
include/linux/irq-entry-common.h |   31 ++++++--
include/linux/resume_user_mode.h |   40 +++++++---
include/linux/rseq.h             |  115 +++++++++----------------------
include/linux/sched.h            |   10 +-
include/uapi/linux/rseq.h        |   21 +----
io_uring/io_uring.h              |    2
kernel/entry/common.c            |    9 +-
kernel/entry/kvm.c               |    2
kernel/rseq.c                    |  142 +++++++++++++++++++++++++--------------
kernel/sched/core.c              |    8 +-
kernel/sched/membarrier.c        |    8 +-
13 files changed, 213 insertions(+), 189 deletions(-)
[patch 00/11] rseq: Optimize exit to user space
Posted by Thomas Gleixner 1 month, 3 weeks ago
With the more wide spread usage of rseq in glibc, rseq is not longer a
niche use case for special applications.

While working on a sane implementation of a rseq based time slice extension
mechanism, I noticed several shortcomings of the current rseq code:

  1) task::rseq_event_mask is a pointless bitfield despite the fact that
     the ABI flags it was meant to support have been deprecated and
     functionally disabled three years ago.

  2) task::rseq_event_mask is accumulating bits unless there is a critical
     section discovered in the user space rseq memory. This results in
     pointless invocations of the rseq user space exit handler even if
     there had nothing changed. As a matter of correctness these bits have
     to be clear when exiting to user space and therefore pristine when
     coming back into the kernel. Aside of correctness, this also avoids
     pointless evaluation of the user space memory, which is a performance
     benefit.

  3) The evaluation of critical sections does not differentiate between
     syscall and interrupt/exception exits. The current implementation
     silently fixes up critical sections which invoked a syscall unless
     CONFIG_DEBUG_RSEQ is enabled.

     That's just wrong. If user space does that on a production kernel it
     can keep the pieces. The kernel is not there to proliferate mindless
     user space programming and letting everyone pay the performance
     penalty.

This series addresses these issues and on top converts parts of the user
space access over to the new masked access model, which lowers the overhead
of Spectre-V1 mitigations significantly on architectures which support it
(x86 as of today). This is especially noticable in the access to the
rseq_cs field in struct rseq, which is the first quick check to figure out
whether a critical section is installed or not.

It survives the kernels rseq selftests, but I did not any performance tests
vs. rseq because I have no idea how to use the gazillion of undocumented
command line parameters of the benchmark. I leave that to people who are so
familiar with them, that they assume everyone else is too :)

The performance gain on regular workloads is clearly measurable and the
consistent event flag state allows now to build the time slice extension
mechanism on top. The first POC I implemented:

   https://lore.kernel.org/lkml/87o6smb3a0.ffs@tglx/

suffered badly from the stale eventmask bits and the cleaned up version
brought a whopping 25+% performance gain.

The series depends on the seperately posted rseq bugfix:

   https://lore.kernel.org/lkml/87o6sj6z95.ffs@tglx/

and the uaccess generic helper series:

   https://lore.kernel.org/lkml/20250813150610.521355442@linutronix.de/

and a related futex fix in

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git locking/urgent

For your conveniance it is also available as a conglomorate from git:

    git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git rseq/perf

Thanks,

	tglx
---
 drivers/hv/mshv_common.c         |    2 
 include/linux/entry-common.h     |   12 +--
 include/linux/irq-entry-common.h |   31 ++++++--
 include/linux/resume_user_mode.h |   40 +++++++---
 include/linux/rseq.h             |  115 +++++++++----------------------
 include/linux/sched.h            |   10 +-
 include/uapi/linux/rseq.h        |   21 +----
 io_uring/io_uring.h              |    2 
 kernel/entry/common.c            |    9 +-
 kernel/entry/kvm.c               |    2 
 kernel/rseq.c                    |  142 +++++++++++++++++++++++++--------------
 kernel/sched/core.c              |    8 +-
 kernel/sched/membarrier.c        |    8 +-
 13 files changed, 213 insertions(+), 189 deletions(-)
Re: [patch 00/11] rseq: Optimize exit to user space
Posted by Mathieu Desnoyers 1 month, 2 weeks ago
On 2025-08-13 12:29, Thomas Gleixner wrote:
> With the more wide spread usage of rseq in glibc, rseq is not longer a
> niche use case for special applications.
> 
> While working on a sane implementation of a rseq based time slice extension
> mechanism, I noticed several shortcomings of the current rseq code:
> 
>    1) task::rseq_event_mask is a pointless bitfield despite the fact that
>       the ABI flags it was meant to support have been deprecated and
>       functionally disabled three years ago.

Correct. We could replace this mask by a single flag, set when the
events happen, test-and-cleared on return to userspace. This would
lessen the synchronization requirements because bitops are not needed
anymore.

> 
>    2) task::rseq_event_mask is accumulating bits unless there is a critical
>       section discovered in the user space rseq memory. This results in
>       pointless invocations of the rseq user space exit handler even if
>       there had nothing changed. As a matter of correctness these bits have
>       to be clear when exiting to user space and therefore pristine when
>       coming back into the kernel. Aside of correctness, this also avoids
>       pointless evaluation of the user space memory, which is a performance
>       benefit.

Ouch. Yes. The intent here is indeed to clear those bits in all cases,
whether there is an rseq critical section ongoing or not. Good catch!

> 
>    3) The evaluation of critical sections does not differentiate between
>       syscall and interrupt/exception exits. The current implementation
>       silently fixes up critical sections which invoked a syscall unless
>       CONFIG_DEBUG_RSEQ is enabled.
> 
>       That's just wrong. If user space does that on a production kernel it
>       can keep the pieces. The kernel is not there to proliferate mindless
>       user space programming and letting everyone pay the performance
>       penalty.

Agreed. There is no need to fixup the rseq critical section, however we
still need to update other rseq fields (e.g. current CPU number) when
returning from a system call.

> 
> This series addresses these issues and on top converts parts of the user
> space access over to the new masked access model, which lowers the overhead
> of Spectre-V1 mitigations significantly on architectures which support it
> (x86 as of today). This is especially noticable in the access to the
> rseq_cs field in struct rseq, which is the first quick check to figure out
> whether a critical section is installed or not.

Nice!

I'll try to have a closer look at the series soon.

Thanks,

Mathieu

> 
> It survives the kernels rseq selftests, but I did not any performance tests
> vs. rseq because I have no idea how to use the gazillion of undocumented
> command line parameters of the benchmark. I leave that to people who are so
> familiar with them, that they assume everyone else is too :)
> 
> The performance gain on regular workloads is clearly measurable and the
> consistent event flag state allows now to build the time slice extension
> mechanism on top. The first POC I implemented:
> 
>     https://lore.kernel.org/lkml/87o6smb3a0.ffs@tglx/
> 
> suffered badly from the stale eventmask bits and the cleaned up version
> brought a whopping 25+% performance gain.
> 
> The series depends on the seperately posted rseq bugfix:
> 
>     https://lore.kernel.org/lkml/87o6sj6z95.ffs@tglx/
> 
> and the uaccess generic helper series:
> 
>     https://lore.kernel.org/lkml/20250813150610.521355442@linutronix.de/
> 
> and a related futex fix in
> 
>     git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git locking/urgent
> 
> For your conveniance it is also available as a conglomorate from git:
> 
>      git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git rseq/perf
> 
> Thanks,
> 
> 	tglx
> ---
>   drivers/hv/mshv_common.c         |    2
>   include/linux/entry-common.h     |   12 +--
>   include/linux/irq-entry-common.h |   31 ++++++--
>   include/linux/resume_user_mode.h |   40 +++++++---
>   include/linux/rseq.h             |  115 +++++++++----------------------
>   include/linux/sched.h            |   10 +-
>   include/uapi/linux/rseq.h        |   21 +----
>   io_uring/io_uring.h              |    2
>   kernel/entry/common.c            |    9 +-
>   kernel/entry/kvm.c               |    2
>   kernel/rseq.c                    |  142 +++++++++++++++++++++++++--------------
>   kernel/sched/core.c              |    8 +-
>   kernel/sched/membarrier.c        |    8 +-
>   13 files changed, 213 insertions(+), 189 deletions(-)
> 


-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
Re: [patch 00/11] rseq: Optimize exit to user space
Posted by Jens Axboe 1 month, 3 weeks ago
On 8/13/25 10:29 AM, Thomas Gleixner wrote:
> With the more wide spread usage of rseq in glibc, rseq is not longer a
> niche use case for special applications.
> 
> While working on a sane implementation of a rseq based time slice extension
> mechanism, I noticed several shortcomings of the current rseq code:
> 
>   1) task::rseq_event_mask is a pointless bitfield despite the fact that
>      the ABI flags it was meant to support have been deprecated and
>      functionally disabled three years ago.
> 
>   2) task::rseq_event_mask is accumulating bits unless there is a critical
>      section discovered in the user space rseq memory. This results in
>      pointless invocations of the rseq user space exit handler even if
>      there had nothing changed. As a matter of correctness these bits have
>      to be clear when exiting to user space and therefore pristine when
>      coming back into the kernel. Aside of correctness, this also avoids
>      pointless evaluation of the user space memory, which is a performance
>      benefit.
> 
>   3) The evaluation of critical sections does not differentiate between
>      syscall and interrupt/exception exits. The current implementation
>      silently fixes up critical sections which invoked a syscall unless
>      CONFIG_DEBUG_RSEQ is enabled.
> 
>      That's just wrong. If user space does that on a production kernel it
>      can keep the pieces. The kernel is not there to proliferate mindless
>      user space programming and letting everyone pay the performance
>      penalty.
> 
> This series addresses these issues and on top converts parts of the user
> space access over to the new masked access model, which lowers the overhead
> of Spectre-V1 mitigations significantly on architectures which support it
> (x86 as of today). This is especially noticable in the access to the
> rseq_cs field in struct rseq, which is the first quick check to figure out
> whether a critical section is installed or not.
> 
> It survives the kernels rseq selftests, but I did not any performance tests
> vs. rseq because I have no idea how to use the gazillion of undocumented
> command line parameters of the benchmark. I leave that to people who are so
> familiar with them, that they assume everyone else is too :)
> 
> The performance gain on regular workloads is clearly measurable and the
> consistent event flag state allows now to build the time slice extension
> mechanism on top. The first POC I implemented:
> 
>    https://lore.kernel.org/lkml/87o6smb3a0.ffs@tglx/
> 
> suffered badly from the stale eventmask bits and the cleaned up version
> brought a whopping 25+% performance gain.

Thanks for doing this work, it's been on my list to take a look at rseq
as it's quite the pig currently and enabled by default (with what I
assume is from a newer libc).

-- 
Jens Axboe
Re: [patch 00/11] rseq: Optimize exit to user space
Posted by Thomas Gleixner 1 month, 3 weeks ago
On Wed, Aug 13 2025 at 11:45, Jens Axboe wrote:
> On 8/13/25 10:29 AM, Thomas Gleixner wrote:
>> 
>> suffered badly from the stale eventmask bits and the cleaned up version
>> brought a whopping 25+% performance gain.
>
> Thanks for doing this work, it's been on my list to take a look at rseq
> as it's quite the pig currently and enabled by default (with what I
> assume is from a newer libc).

Yes, recent glibc uses it.

Could you give it a test ride to see whether this makes a difference in
your environment?

Thanks

        tglx
Re: [patch 00/11] rseq: Optimize exit to user space
Posted by Jens Axboe 1 month, 3 weeks ago
On 8/13/25 3:32 PM, Thomas Gleixner wrote:
> On Wed, Aug 13 2025 at 11:45, Jens Axboe wrote:
>> On 8/13/25 10:29 AM, Thomas Gleixner wrote:
>>>
>>> suffered badly from the stale eventmask bits and the cleaned up version
>>> brought a whopping 25+% performance gain.
>>
>> Thanks for doing this work, it's been on my list to take a look at rseq
>> as it's quite the pig currently and enabled by default (with what I
>> assume is from a newer libc).
> 
> Yes, recent glibc uses it.
> 
> Could you give it a test ride to see whether this makes a difference in
> your environment?

Yep, I'll give a spin. Won't be before Monday as I'm out of town thu/fri
this week, just as a heads-up.

If I recall correct, I'd see rseq at 2-3% of overall CPU time last I
tested. I'll try and do some pre/post numbers for some of the stuff I
usually run.

-- 
Jens Axboe
Re: [patch 00/11] rseq: Optimize exit to user space
Posted by Thomas Gleixner 1 month, 3 weeks ago
On Wed, Aug 13 2025 at 15:36, Jens Axboe wrote:
> On 8/13/25 3:32 PM, Thomas Gleixner wrote:
>> Could you give it a test ride to see whether this makes a difference in
>> your environment?
>
> Yep, I'll give a spin.

Appreciated.

> If I recall correct, I'd see rseq at 2-3% of overall CPU time last I

I'm not surprised.

> tested. I'll try and do some pre/post numbers for some of the stuff I
> usually run.

Cool. Thanks

      tglx
Re: [patch 00/11] rseq: Optimize exit to user space
Posted by Thomas Gleixner 1 month, 2 weeks ago
On Thu, Aug 14 2025 at 00:08, Thomas Gleixner wrote:
> On Wed, Aug 13 2025 at 15:36, Jens Axboe wrote:
>> On 8/13/25 3:32 PM, Thomas Gleixner wrote:
>>> Could you give it a test ride to see whether this makes a difference in
>>> your environment?
>>
>> Yep, I'll give a spin.
>
> Appreciated.

Please do not use the git branch I had in the cover letter. I did some
more analysis of this and it's even worse than I thought. Use

  git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git rseq/wip

instead.

I've rewritten the whole pile by now and made it a real fast path
without the TIF_NOTIFY horror show, unless the fast path, which runs
_after_ the TIF work loop faults. So far that happens once for each
fork() as that has to fault in the copy of the user space rseq region.

There are lightweight per CPU stats for the various events, which can be
accessed via

      /sys/kernel/debug/rseq/stat

which exposes a racy sum of them. Here is the output after a reboot and
a full kernel recompile

exit:           85703905    // Total invocations
signal:            34635    // Invocations from signal delivery
slowp:               134    // Slow path via TIF_NOTIFY_RESUME
ids:               70052    // Updates of CPU and MM CID
cs:                    0    // Critical section analysis
clear:                 0    // Clearing of critical section
fixup:                 0    // Fixup of critical section (abort)

Before the rewrite this took more than a million of ID updates and
critical section evaluations even when completely pointless.

So on any syscall or interrupt heavy workload this should be clearly
visible as a difference in the profile.

I'm still not happy about the exit to user fast path decision as it's
two conditionals instead of one, but all attempts to do that lightweight
somewhere else turned out to make stuff worse as I just burdened other
fast path operations, i.e. the scheduler with pointless conditionals.

I'll think about that more, but nevertheless this is way better than the
current horror show.

I also have no real good plan yet how to gradually convert this over,
but I'm way too tired to think about that now.

It survives the self test suite after I wasted a day to figure out why
the selftests reliably segfault on a machine which has debian trixie
installed. The fix is in the branch.

Michael, can you please run your librseq tests against that too? They
have the same segfault problem as the kernel and they lack a run script,
so I couldn't be bothered to test against them. See commit 2bff3a0e5998
in that branch. I'll send out a patch with a proper change log later.

Thanks,

        tglx
Re: [patch 00/11] rseq: Optimize exit to user space
Posted by Mathieu Desnoyers 1 month, 2 weeks ago
On 2025-08-17 17:23, Thomas Gleixner wrote:
> On Thu, Aug 14 2025 at 00:08, Thomas Gleixner wrote:
>> On Wed, Aug 13 2025 at 15:36, Jens Axboe wrote:
>>> On 8/13/25 3:32 PM, Thomas Gleixner wrote:
>>>> Could you give it a test ride to see whether this makes a difference in
>>>> your environment?
>>>
>>> Yep, I'll give a spin.
>>
>> Appreciated.
> 
> Please do not use the git branch I had in the cover letter. I did some
> more analysis of this and it's even worse than I thought. Use
> 
>    git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git rseq/wip
> 
> instead.
> 
> I've rewritten the whole pile by now and made it a real fast path
> without the TIF_NOTIFY horror show, unless the fast path, which runs
> _after_ the TIF work loop faults. So far that happens once for each
> fork() as that has to fault in the copy of the user space rseq region.
> 

OK. I'll stop reviewing this version of the series and wait for an
updated version.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
Re: [patch 00/11] rseq: Optimize exit to user space
Posted by Michael Jeanson 1 month, 2 weeks ago
On 2025-08-17 17:23, Thomas Gleixner wrote:
> Michael, can you please run your librseq tests against that too? They
> have the same segfault problem as the kernel and they lack a run script,
> so I couldn't be bothered to test against them. See commit 2bff3a0e5998
> in that branch. I'll send out a patch with a proper change log later.

I ran the librseq test suite on the new branch on a Debian Trixie amd64 
system and it succeeds, here are the rseq stats before and after.

Before:

exit:             746809
signal:                3
slowp:                99
ids:                1053
cs:                    0
clear:                 0
fixup:                 0

After:

exit:          229294046
signal:               11
slowp:              4570
ids:              615950
cs:              2493682
clear:            194637
fixup:           2299044


And I also ran the same test suite in a 32bit chroot on the same system 
which also succeeds with the following rseq stats.

Before:

exit:             717945
signal:                1
slowp:               102
ids:                1039
cs:                    0
clear:                 0
fixup:                 0

After:

exit:          201051038
signal:                9
slowp:              4909
ids:              793551
cs:                28887
clear:             12871
fixup:             16016


If you want to run the librseq tests on your system, just do the regular 
autotools dance and then run 'make check'.

Regards,

Michael
Re: [patch 00/11] rseq: Optimize exit to user space
Posted by Thomas Gleixner 1 month, 2 weeks ago
On Mon, Aug 18 2025 at 13:38, Michael Jeanson wrote:

> On 2025-08-17 17:23, Thomas Gleixner wrote:
>> Michael, can you please run your librseq tests against that too? They
>> have the same segfault problem as the kernel and they lack a run script,
>> so I couldn't be bothered to test against them. See commit 2bff3a0e5998
>> in that branch. I'll send out a patch with a proper change log later.
>
> I ran the librseq test suite on the new branch on a Debian Trixie amd64 
> system and it succeeds, here are the rseq stats before and after.

Thanks!

> Before:
>
> exit:             746809
> signal:                3
> slowp:                99
> ids:                1053
> cs:                    0
> clear:                 0
> fixup:                 0
>
> After:
>
> exit:          229294046
> signal:               11
> slowp:              4570
> ids:              615950
> cs:              2493682
> clear:            194637
> fixup:           2299044

That looks about right. Can you reset the branch to

     commit 85b61b265635 ("rseq: Expose stats")

which is just adding primitive stats on top of the current mainline
code, and provide numbers for that too?

That gives you 'notify: , cpuid:, fixup:' numbers, which are not 1:1
mappable to the final ones, but that should give some interesting
insight.

> If you want to run the librseq tests on your system, just do the regular 
> autotools dance and then run 'make check'.

Might be useful to put such instructions into README.md, no?

Thanks,

        tglx
Re: [patch 00/11] rseq: Optimize exit to user space
Posted by Michael Jeanson 1 month, 2 weeks ago
On 2025-08-18 16:21, Thomas Gleixner wrote:
> That looks about right. Can you reset the branch to
> 
>       commit 85b61b265635 ("rseq: Expose stats")
> 
> which is just adding primitive stats on top of the current mainline
> code, and provide numbers for that too?
> 
> That gives you 'notify: , cpuid:, fixup:' numbers, which are not 1:1
> mappable to the final ones, but that should give some interesting
> insight.

For amd64 kernel and userspace.

Before:

notify:             12467
fixup:              12467
cpuid:              12467


After:

notify:         123669528
fixup:          123669528
cpuid:          123669528


For amd64 kernel, i386 userspace.

Before:

notify:             12857
fixup:              12857
cpuid:              12857


After:

notify:         120621210
fixup:          120621210
cpuid:          120621210


> Might be useful to put such instructions into README.md, no?

Will do.

Regards,

Michael
Re: [patch 00/11] rseq: Optimize exit to user space
Posted by Thomas Gleixner 1 month, 2 weeks ago
On Mon, Aug 18 2025 at 17:29, Michael Jeanson wrote:

> On 2025-08-18 16:21, Thomas Gleixner wrote:
>> That looks about right. Can you reset the branch to
>> 
>>       commit 85b61b265635 ("rseq: Expose stats")
>> 
>> which is just adding primitive stats on top of the current mainline
>> code, and provide numbers for that too?
>> 
>> That gives you 'notify: , cpuid:, fixup:' numbers, which are not 1:1
>> mappable to the final ones, but that should give some interesting
>> insight.
>
> For amd64 kernel and userspace.
>
> Before:
>
> notify:             12467
> fixup:              12467
> cpuid:              12467
>
>
> After:
>
> notify:         123669528
> fixup:          123669528
> cpuid:          123669528

That's insane compared to this:

> exit:          229294046
> signal:               11
> slowp:              4570
> ids:              615950
> cs:              2493682
> clear:            194637
> fixup:           2299044

You can assume that the number of exits (to user) is roughly the same,
i.e. ~23M, so 12M (> 50%) take the TIF_NOTIFY dance on the way out and
most of them for no good reason.

While with the rework only 4.5K go into the NOTIFY slow path and 2.5M
(10 %) do the critical section evaluation and 600k (~ 3%) update CPU/MM
CID.

No suprise that Jens is seeing this in his profiles...

Thanks,

        tglx
BUG: rseq selftests and librseq vs. glibc fail
Posted by Thomas Gleixner 1 month, 2 weeks ago
On Sun, Aug 17 2025 at 23:23, Thomas Gleixner wrote:
> It survives the self test suite after I wasted a day to figure out why
> the selftests reliably segfault on a machine which has debian trixie
> installed. The fix is in the branch.

That's glibc 2.41 FWIW. glibc 2.36 from Debian 12 does not have this
problem.

The fix unfortunately only works with a dynamically linked libc,
statically linked libc fails. The fix is basically a revert of

   3bcbc20942db ("selftests/rseq: Play nice with binaries statically linked
                  against glibc 2.35+")

which introduced these weak libc symbols to make static libc linking work.

I have no idea why this creates havoc, but in GDB I saw that libc
manages to overwrite the TLS of the pthread at some place, but I gave up
decoding it further. If no pthread is created it just works. Removing
this weak muck makes it work too.

It's trivial to reproduce. All it needs is to have in the source:

__weak ptrdiff_t __rseq_offset;

w/o even being referenced and creating a pthread. Reproducer below.

TBH, this interface is a horrible hack. libc should expose a proper
function for querying whether it has registered rseq and return the
parameters in a struct. But well...

Build reproducer with:

# gcc -O2 -Wall -o t test.c
# ./t
Segmentation fault

# gcc -O2 -Wall -o t test.c -static
# ./t
#

Removing the weak __rseq_offset makes the dynamic case work too.

Yours sufficiently grumpy

      tglx

----
#define _GNU_SOURCE
#include <pthread.h>
#include <stddef.h>

#define __weak  __attribute__((__weak__))
__weak ptrdiff_t __rseq_offset;

static void *foo(void *arg)
{
	return NULL;
}

int main(int argc, char **argv)
{
	pthread_t t;

	pthread_create(&t, NULL, foo, NULL);
	pthread_join(t, NULL);
	return 0;
}
Re: BUG: rseq selftests and librseq vs. glibc fail
Posted by Florian Weimer 1 month, 2 weeks ago
* Thomas Gleixner:

> On Sun, Aug 17 2025 at 23:23, Thomas Gleixner wrote:
>> It survives the self test suite after I wasted a day to figure out why
>> the selftests reliably segfault on a machine which has debian trixie
>> installed. The fix is in the branch.
>
> That's glibc 2.41 FWIW. glibc 2.36 from Debian 12 does not have this
> problem.
>
> The fix unfortunately only works with a dynamically linked libc,
> statically linked libc fails. The fix is basically a revert of
>
>    3bcbc20942db ("selftests/rseq: Play nice with binaries statically linked
>                   against glibc 2.35+")
>
> which introduced these weak libc symbols to make static libc linking work.
>
> I have no idea why this creates havoc, but in GDB I saw that libc
> manages to overwrite the TLS of the pthread at some place, but I gave up
> decoding it further. If no pthread is created it just works. Removing
> this weak muck makes it work too.
>
> It's trivial to reproduce. All it needs is to have in the source:
>
> __weak ptrdiff_t __rseq_offset;
>
> w/o even being referenced and creating a pthread. Reproducer below.

Well, that's sort of expected.  You can't define glibc symbols that are
not intended for interposition and expect things to work.  It's kind of
like writing:

int _rtld_global;

That's going to fail rather spectaculary, too.  We make an exception for
symbols that are not reserved (you can build in ISO C mode and define
open, close, etc., at least as long as you link to glibc only).  But
__rseq_offset is a reserved name, so that is not applicable here.

The real change here is GCC changing from -fcommon (which made a lot of
these things work in the past) to -fno-common.

Thanks,
Florian
Re: BUG: rseq selftests and librseq vs. glibc fail
Posted by Thomas Gleixner 1 month, 2 weeks ago
On Mon, Aug 18 2025 at 16:15, Florian Weimer wrote:
> * Thomas Gleixner:
>> It's trivial to reproduce. All it needs is to have in the source:
>>
>> __weak ptrdiff_t __rseq_offset;
>>
>> w/o even being referenced and creating a pthread. Reproducer below.
>
> Well, that's sort of expected.  You can't define glibc symbols that are
> not intended for interposition and expect things to work.  It's kind of
> like writing:
>
> int _rtld_global;
>
> That's going to fail rather spectaculary, too.  We make an exception for
> symbols that are not reserved (you can build in ISO C mode and define
> open, close, etc., at least as long as you link to glibc only).  But
> __rseq_offset is a reserved name, so that is not applicable here.
>
> The real change here is GCC changing from -fcommon (which made a lot of
> these things work in the past) to -fno-common.

Thanks for the explanation!

So the only way to make this actually work is to revert that commit and
the folks who want to link that statically need to come up with:

#ifdef _BUILD_STATICALLY
extern ....

#else
        ptr = dlsym(...);
#endif

or something daft like that. A proper function interface would avoid all
that nonsense, but we can't have nice things or can we?

Thanks,

        tglx
Re: BUG: rseq selftests and librseq vs. glibc fail
Posted by Prakash Sangappa 1 month ago

> On Aug 18, 2025, at 10:13 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> On Mon, Aug 18 2025 at 16:15, Florian Weimer wrote:
>> * Thomas Gleixner:
>>> It's trivial to reproduce. All it needs is to have in the source:
>>> 
>>> __weak ptrdiff_t __rseq_offset;
>>> 
>>> w/o even being referenced and creating a pthread. Reproducer below.
>> 
>> Well, that's sort of expected.  You can't define glibc symbols that are
>> not intended for interposition and expect things to work.  It's kind of
>> like writing:
>> 
>> int _rtld_global;
>> 
>> That's going to fail rather spectaculary, too.  We make an exception for
>> symbols that are not reserved (you can build in ISO C mode and define
>> open, close, etc., at least as long as you link to glibc only).  But
>> __rseq_offset is a reserved name, so that is not applicable here.
>> 
>> The real change here is GCC changing from -fcommon (which made a lot of
>> these things work in the past) to -fno-common.
> 
> Thanks for the explanation!
> 
> So the only way to make this actually work is to revert that commit and
> the folks who want to link that statically need to come up with:
> 
> #ifdef _BUILD_STATICALLY
> extern ....
> 
> #else
>        ptr = dlsym(...);
> #endif
> 
> or something daft like that. A proper function interface would avoid all
> that nonsense, but we can't have nice things or can we?


Could the rseq(2) syscall itself return the already registered rseq structure address?
Perhaps a new flag argument to the rseq(2) syscall to query the registered rseq address
or return the address of the already registered rseq structure when it fails to register a new one.

Application can call it when the call to register a rseq structure fails.

-Prakash

> 
> Thanks,
> 
>        tglx
> 

Re: BUG: rseq selftests and librseq vs. glibc fail
Posted by Mathieu Desnoyers 1 month ago
On 2025-08-29 14:44, Prakash Sangappa wrote:
> 
> 
>> On Aug 18, 2025, at 10:13 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>>
>> On Mon, Aug 18 2025 at 16:15, Florian Weimer wrote:
>>> * Thomas Gleixner:
>>>> It's trivial to reproduce. All it needs is to have in the source:
>>>>
>>>> __weak ptrdiff_t __rseq_offset;
>>>>
>>>> w/o even being referenced and creating a pthread. Reproducer below.
>>>
>>> Well, that's sort of expected.  You can't define glibc symbols that are
>>> not intended for interposition and expect things to work.  It's kind of
>>> like writing:
>>>
>>> int _rtld_global;
>>>
>>> That's going to fail rather spectaculary, too.  We make an exception for
>>> symbols that are not reserved (you can build in ISO C mode and define
>>> open, close, etc., at least as long as you link to glibc only).  But
>>> __rseq_offset is a reserved name, so that is not applicable here.
>>>
>>> The real change here is GCC changing from -fcommon (which made a lot of
>>> these things work in the past) to -fno-common.
>>
>> Thanks for the explanation!
>>
>> So the only way to make this actually work is to revert that commit and
>> the folks who want to link that statically need to come up with:
>>
>> #ifdef _BUILD_STATICALLY
>> extern ....
>>
>> #else
>>         ptr = dlsym(...);
>> #endif
>>
>> or something daft like that. A proper function interface would avoid all
>> that nonsense, but we can't have nice things or can we?
> 
> 
> Could the rseq(2) syscall itself return the already registered rseq structure address?
> Perhaps a new flag argument to the rseq(2) syscall to query the registered rseq address
> or return the address of the already registered rseq structure when it fails to register a new one.
> 
> Application can call it when the call to register a rseq structure fails.

There is a ptrace(2) PTRACE_GET_RSEQ_CONFIGURATION to achieve
something similar. I don't know if a dependency on ptrace would
be acceptable for that use-case though.

Thanks,

Mathieu



> 
> -Prakash
> 
>>
>> Thanks,
>>
>>         tglx
>>
> 


-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
Re: BUG: rseq selftests and librseq vs. glibc fail
Posted by Prakash Sangappa 1 month ago

> On Aug 29, 2025, at 11:50 AM, Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> 
> On 2025-08-29 14:44, Prakash Sangappa wrote:
>>> On Aug 18, 2025, at 10:13 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>>> 
>>> On Mon, Aug 18 2025 at 16:15, Florian Weimer wrote:
>>>> * Thomas Gleixner:
>>>>> It's trivial to reproduce. All it needs is to have in the source:
>>>>> 
>>>>> __weak ptrdiff_t __rseq_offset;
>>>>> 
>>>>> w/o even being referenced and creating a pthread. Reproducer below.
>>>> 
>>>> Well, that's sort of expected.  You can't define glibc symbols that are
>>>> not intended for interposition and expect things to work.  It's kind of
>>>> like writing:
>>>> 
>>>> int _rtld_global;
>>>> 
>>>> That's going to fail rather spectaculary, too.  We make an exception for
>>>> symbols that are not reserved (you can build in ISO C mode and define
>>>> open, close, etc., at least as long as you link to glibc only).  But
>>>> __rseq_offset is a reserved name, so that is not applicable here.
>>>> 
>>>> The real change here is GCC changing from -fcommon (which made a lot of
>>>> these things work in the past) to -fno-common.
>>> 
>>> Thanks for the explanation!
>>> 
>>> So the only way to make this actually work is to revert that commit and
>>> the folks who want to link that statically need to come up with:
>>> 
>>> #ifdef _BUILD_STATICALLY
>>> extern ....
>>> 
>>> #else
>>>        ptr = dlsym(...);
>>> #endif
>>> 
>>> or something daft like that. A proper function interface would avoid all
>>> that nonsense, but we can't have nice things or can we?
>> Could the rseq(2) syscall itself return the already registered rseq structure address?
>> Perhaps a new flag argument to the rseq(2) syscall to query the registered rseq address
>> or return the address of the already registered rseq structure when it fails to register a new one.
>> Application can call it when the call to register a rseq structure fails.
> 
> There is a ptrace(2) PTRACE_GET_RSEQ_CONFIGURATION to achieve
> something similar. I don't know if a dependency on ptrace would
> be acceptable for that use-case though.

Can a thread call ptrace(PTRACE_GET_RSEQ_CONFIGURATION,..) on itself? 

May be something similar can be added to rseq(2) .

Thanks,
-Prakash. 

> 
> Thanks,
> 
> Mathieu
> 
> 
> 
>> -Prakash
>>> 
>>> Thanks,
>>> 
>>>        tglx
>>> 
> 
> 
> -- 
> Mathieu Desnoyers
> EfficiOS Inc.
> https://www.efficios.com

Re: BUG: rseq selftests and librseq vs. glibc fail
Posted by Florian Weimer 1 month, 2 weeks ago
* Thomas Gleixner:

> On Mon, Aug 18 2025 at 16:15, Florian Weimer wrote:
>> * Thomas Gleixner:
>>> It's trivial to reproduce. All it needs is to have in the source:
>>>
>>> __weak ptrdiff_t __rseq_offset;
>>>
>>> w/o even being referenced and creating a pthread. Reproducer below.
>>
>> Well, that's sort of expected.  You can't define glibc symbols that are
>> not intended for interposition and expect things to work.  It's kind of
>> like writing:
>>
>> int _rtld_global;
>>
>> That's going to fail rather spectaculary, too.  We make an exception for
>> symbols that are not reserved (you can build in ISO C mode and define
>> open, close, etc., at least as long as you link to glibc only).  But
>> __rseq_offset is a reserved name, so that is not applicable here.
>>
>> The real change here is GCC changing from -fcommon (which made a lot of
>> these things work in the past) to -fno-common.
>
> Thanks for the explanation!
>
> So the only way to make this actually work is to revert that commit and
> the folks who want to link that statically need to come up with:
>
> #ifdef _BUILD_STATICALLY
> extern ....
>
> #else
>         ptr = dlsym(...);
> #endif
>
> or something daft like that. A proper function interface would avoid all
> that nonsense, but we can't have nice things or can we?

I don't understand why a function would be different.  Well, a function
*declaration* would be implicitly extern, in a way a variable
declaration is not (without -fcommon).  Maybe it's just about the
missing extern keyword?

You could add the extern keyword and check &__rseq_offset for NULL if
you want to probe for the availability of the signal?  Or use:

#if __has_include(<sys/rseq.h>)
#include <sys/rseq.h>
/* Code that depends on glibc's rseq support goes here. */
#endif

Thanks,
Florian
Re: BUG: rseq selftests and librseq vs. glibc fail
Posted by Sean Christopherson 1 month, 2 weeks ago
On Mon, Aug 18, 2025, Florian Weimer wrote:
> * Thomas Gleixner:
> 
> > On Mon, Aug 18 2025 at 16:15, Florian Weimer wrote:
> >> * Thomas Gleixner:
> >>> It's trivial to reproduce. All it needs is to have in the source:
> >>>
> >>> __weak ptrdiff_t __rseq_offset;
> >>>
> >>> w/o even being referenced and creating a pthread. Reproducer below.
> >>
> >> Well, that's sort of expected.  You can't define glibc symbols that are
> >> not intended for interposition and expect things to work.  It's kind of
> >> like writing:
> >>
> >> int _rtld_global;
> >>
> >> That's going to fail rather spectaculary, too.  We make an exception for
> >> symbols that are not reserved (you can build in ISO C mode and define
> >> open, close, etc., at least as long as you link to glibc only).  But
> >> __rseq_offset is a reserved name, so that is not applicable here.
> >>
> >> The real change here is GCC changing from -fcommon (which made a lot of
> >> these things work in the past) to -fno-common.
> >
> > Thanks for the explanation!
> >
> > So the only way to make this actually work is to revert that commit and
> > the folks who want to link that statically need to come up with:
> >
> > #ifdef _BUILD_STATICALLY
> > extern ....
> >
> > #else
> >         ptr = dlsym(...);
> > #endif
> >
> > or something daft like that. A proper function interface would avoid all
> > that nonsense, but we can't have nice things or can we?
> 
> I don't understand why a function would be different.  Well, a function
> *declaration* would be implicitly extern, in a way a variable
> declaration is not (without -fcommon).  Maybe it's just about the
> missing extern keyword?
> 
> You could add the extern keyword and check &__rseq_offset for NULL if
> you want to probe for the availability of the signal?

That will fail to link if the glibc version doesn't support rseq in any capacity,
which is why I added the __weak crud.

>Or use:
> 
> #if __has_include(<sys/rseq.h>)
> #include <sys/rseq.h>
> /* Code that depends on glibc's rseq support goes here. */

FWIW, the code in question doesn't depend on rseq per se, rather the problem is
that attempting to register a restartable sequence fails if glibc has already
"claimed" rseq.

What about something horrific like this?  Or if __has_include(<sys/rseq.h>) is
preferrable to checking the glibc version, go with that.  The idea with checking
LIBC_RSEQ_STATIC_LINK is to give folks a way to force static linking if their
libc registers rseq, but doesn't satisfy the glibc version checks for whatever
reason.

diff --git a/tools/testing/selftests/rseq/rseq.c b/tools/testing/selftests/rseq/rseq.c
index 663a9cef1952..1a88352fcff3 100644
--- a/tools/testing/selftests/rseq/rseq.c
+++ b/tools/testing/selftests/rseq/rseq.c
@@ -36,17 +36,18 @@
 #include "../kselftest.h"
 #include "rseq.h"
 
-/*
- * Define weak versions to play nice with binaries that are statically linked
- * against a libc that doesn't support registering its own rseq.
- */
-__weak ptrdiff_t __rseq_offset;
-__weak unsigned int __rseq_size;
-__weak unsigned int __rseq_flags;
+#if defined(LIBC_RSEQ_STATIC_LINK) || __GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 35)
+extern ptrdiff_t __rseq_offset;
+extern unsigned int __rseq_size;
+extern unsigned int __rseq_flags;
+#define GLIBC_RSEQ_PTR(x) &x
+#else
+#define GLIBC_RSEQ_PTR(x) NULL
+#endif
 
-static const ptrdiff_t *libc_rseq_offset_p = &__rseq_offset;
-static const unsigned int *libc_rseq_size_p = &__rseq_size;
-static const unsigned int *libc_rseq_flags_p = &__rseq_flags;
+static const ptrdiff_t *libc_rseq_offset_p = GLIBC_RSEQ_PTR(__rseq_offset);
+static const unsigned int *libc_rseq_size_p = GLIBC_RSEQ_PTR(__rseq_size);
+static const unsigned int *libc_rseq_flags_p = GLIBC_RSEQ_PTR(__rseq_flags);
 
 /* Offset from the thread pointer to the rseq area. */
 ptrdiff_t rseq_offset;
@@ -209,7 +210,7 @@ void rseq_init(void)
         * libc not having registered a restartable sequence.  Try to find the
         * symbols if that's the case.
         */
-       if (!*libc_rseq_size_p) {
+       if (!libc_rseq_size_p || !*libc_rseq_size_p) {
                libc_rseq_offset_p = dlsym(RTLD_NEXT, "__rseq_offset");
                libc_rseq_size_p = dlsym(RTLD_NEXT, "__rseq_size");
                libc_rseq_flags_p = dlsym(RTLD_NEXT, "__rseq_flags");
Re: BUG: rseq selftests and librseq vs. glibc fail
Posted by Florian Weimer 1 month, 2 weeks ago
* Sean Christopherson:

> On Mon, Aug 18, 2025, Florian Weimer wrote:
>> You could add the extern keyword and check &__rseq_offset for NULL if
>> you want to probe for the availability of the signal?
>
> That will fail to link if the glibc version doesn't support rseq in
> any capacity, which is why I added the __weak crud.

You need both (extern and weak) to get a weak symbol reference instead
of a weak symbol definition.  You still need to check &__rseq_offset, of
course.

>>Or use:
>> 
>> #if __has_include(<sys/rseq.h>)
>> #include <sys/rseq.h>
>> /* Code that depends on glibc's rseq support goes here. */
>
> FWIW, the code in question doesn't depend on rseq per se, rather the problem is
> that attempting to register a restartable sequence fails if glibc has already
> "claimed" rseq.

You can set GLIBC_TUNABLES=glibc.pthread.rseq=0 as an environment
variable to prevent glibc from registering it.  For a test that's
probably okay?  It won't help with other libcs that might use rseq
eventually.

> What about something horrific like this?  Or if __has_include(<sys/rseq.h>) is
> preferrable to checking the glibc version, go with that.  The idea with checking
> LIBC_RSEQ_STATIC_LINK is to give folks a way to force static linking if their
> libc registers rseq, but doesn't satisfy the glibc version checks for whatever
> reason.
>
> diff --git a/tools/testing/selftests/rseq/rseq.c b/tools/testing/selftests/rseq/rseq.c
> index 663a9cef1952..1a88352fcff3 100644
> --- a/tools/testing/selftests/rseq/rseq.c
> +++ b/tools/testing/selftests/rseq/rseq.c
> @@ -36,17 +36,18 @@
>  #include "../kselftest.h"
>  #include "rseq.h"
>  
> -/*
> - * Define weak versions to play nice with binaries that are statically linked
> - * against a libc that doesn't support registering its own rseq.
> - */
> -__weak ptrdiff_t __rseq_offset;
> -__weak unsigned int __rseq_size;
> -__weak unsigned int __rseq_flags;
> +#if defined(LIBC_RSEQ_STATIC_LINK) || __GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 35)
> +extern ptrdiff_t __rseq_offset;
> +extern unsigned int __rseq_size;
> +extern unsigned int __rseq_flags;
> +#define GLIBC_RSEQ_PTR(x) &x
> +#else
> +#define GLIBC_RSEQ_PTR(x) NULL
> +#endif

That doesn't work for a couple of distributions nowadays that are
nominally based on glibc 2.34.  The AArch64 performance improvement for
sched_getcpu was just too sweet to give up, so these distributions have
__rseq_offset@@GLIBC_2.35, too.  We added these symbols specifically so
that applications that need to interact with rseq can do so safely, even
though glibc claimed it for the sched_getcpu acceleration.

Thanks,
Florian
Re: BUG: rseq selftests and librseq vs. glibc fail
Posted by Sean Christopherson 1 month, 2 weeks ago
On Mon, Aug 18, 2025, Florian Weimer wrote:
> * Sean Christopherson:
> 
> > On Mon, Aug 18, 2025, Florian Weimer wrote:
> >> You could add the extern keyword and check &__rseq_offset for NULL if
> >> you want to probe for the availability of the signal?
> >
> > That will fail to link if the glibc version doesn't support rseq in
> > any capacity, which is why I added the __weak crud.
> 
> You need both (extern and weak) to get a weak symbol reference instead
> of a weak symbol definition.  You still need to check &__rseq_offset, of
> course.

Ooh, you're saying add "extern" to the existing __weak symbol, not replace it.
Huh, TIL weak symbol references are a thing.

This works with static and dynamic linking, with and without an rseq-aware glibc.

Thomas, does this fix the problem you were seeing?

diff --git a/tools/testing/selftests/rseq/rseq.c b/tools/testing/selftests/rseq/rseq.c
index 663a9cef1952..d17ded120d48 100644
--- a/tools/testing/selftests/rseq/rseq.c
+++ b/tools/testing/selftests/rseq/rseq.c
@@ -40,9 +40,9 @@
  * Define weak versions to play nice with binaries that are statically linked
  * against a libc that doesn't support registering its own rseq.
  */
-__weak ptrdiff_t __rseq_offset;
-__weak unsigned int __rseq_size;
-__weak unsigned int __rseq_flags;
+extern __weak ptrdiff_t __rseq_offset;
+extern __weak unsigned int __rseq_size;
+extern __weak unsigned int __rseq_flags;
 
 static const ptrdiff_t *libc_rseq_offset_p = &__rseq_offset;
 static const unsigned int *libc_rseq_size_p = &__rseq_size;
@@ -209,7 +209,7 @@ void rseq_init(void)
         * libc not having registered a restartable sequence.  Try to find the
         * symbols if that's the case.
         */
-       if (!*libc_rseq_size_p) {
+       if (!libc_rseq_offset_p || !*libc_rseq_size_p) {
                libc_rseq_offset_p = dlsym(RTLD_NEXT, "__rseq_offset");
                libc_rseq_size_p = dlsym(RTLD_NEXT, "__rseq_size");
                libc_rseq_flags_p = dlsym(RTLD_NEXT, "__rseq_flags");

> >>Or use:
> >> 
> >> #if __has_include(<sys/rseq.h>)
> >> #include <sys/rseq.h>
> >> /* Code that depends on glibc's rseq support goes here. */
> >
> > FWIW, the code in question doesn't depend on rseq per se, rather the problem is
> > that attempting to register a restartable sequence fails if glibc has already
> > "claimed" rseq.
> 
> You can set GLIBC_TUNABLES=glibc.pthread.rseq=0 as an environment
> variable to prevent glibc from registering it.  For a test that's
> probably okay?  It won't help with other libcs that might use rseq
> eventually.

I vaguely recall exploring that option when trying to get static linking working.
I think I shied away from it because I wanted to have something that Just Worked
for all users, and didn't like the idea of hardcoding that into the KVM selftests
makefile.
Re: BUG: rseq selftests and librseq vs. glibc fail
Posted by Thomas Gleixner 1 month, 2 weeks ago
On Mon, Aug 18 2025 at 13:27, Sean Christopherson wrote:
> On Mon, Aug 18, 2025, Florian Weimer wrote:
>> You need both (extern and weak) to get a weak symbol reference instead
>> of a weak symbol definition.  You still need to check &__rseq_offset, of
>> course.
>
> Ooh, you're saying add "extern" to the existing __weak symbol, not replace it.
> Huh, TIL weak symbol references are a thing.
>
> This works with static and dynamic linking, with and without an rseq-aware glibc.
>
> Thomas, does this fix the problem you were seeing?
>
> diff --git a/tools/testing/selftests/rseq/rseq.c b/tools/testing/selftests/rseq/rseq.c
> index 663a9cef1952..d17ded120d48 100644
> --- a/tools/testing/selftests/rseq/rseq.c
> +++ b/tools/testing/selftests/rseq/rseq.c
> @@ -40,9 +40,9 @@
>   * Define weak versions to play nice with binaries that are statically linked
>   * against a libc that doesn't support registering its own rseq.
>   */
> -__weak ptrdiff_t __rseq_offset;
> -__weak unsigned int __rseq_size;
> -__weak unsigned int __rseq_flags;
> +extern __weak ptrdiff_t __rseq_offset;
> +extern __weak unsigned int __rseq_size;
> +extern __weak unsigned int __rseq_flags;
>  
>  static const ptrdiff_t *libc_rseq_offset_p = &__rseq_offset;
>  static const unsigned int *libc_rseq_size_p = &__rseq_size;
> @@ -209,7 +209,7 @@ void rseq_init(void)
>          * libc not having registered a restartable sequence.  Try to find the
>          * symbols if that's the case.
>          */
> -       if (!*libc_rseq_size_p) {
> +       if (!libc_rseq_offset_p || !*libc_rseq_size_p) {

If I make that:

+       if (!*libc_rseq_offset_p || !*libc_rseq_size_p) {

then it makes sense and actually works. The pointer can hardly be NULL,
even when statically linked, no?

Thanks,

        tglx
Re: BUG: rseq selftests and librseq vs. glibc fail
Posted by Sean Christopherson 1 month, 2 weeks ago
On Tue, Aug 19, 2025, Thomas Gleixner wrote:
> On Mon, Aug 18 2025 at 13:27, Sean Christopherson wrote:
> > On Mon, Aug 18, 2025, Florian Weimer wrote:
> >> You need both (extern and weak) to get a weak symbol reference instead
> >> of a weak symbol definition.  You still need to check &__rseq_offset, of
> >> course.
> >
> > Ooh, you're saying add "extern" to the existing __weak symbol, not replace it.
> > Huh, TIL weak symbol references are a thing.
> >
> > This works with static and dynamic linking, with and without an rseq-aware glibc.
> >
> > Thomas, does this fix the problem you were seeing?
> >
> > diff --git a/tools/testing/selftests/rseq/rseq.c b/tools/testing/selftests/rseq/rseq.c
> > index 663a9cef1952..d17ded120d48 100644
> > --- a/tools/testing/selftests/rseq/rseq.c
> > +++ b/tools/testing/selftests/rseq/rseq.c
> > @@ -40,9 +40,9 @@
> >   * Define weak versions to play nice with binaries that are statically linked
> >   * against a libc that doesn't support registering its own rseq.
> >   */
> > -__weak ptrdiff_t __rseq_offset;
> > -__weak unsigned int __rseq_size;
> > -__weak unsigned int __rseq_flags;
> > +extern __weak ptrdiff_t __rseq_offset;
> > +extern __weak unsigned int __rseq_size;
> > +extern __weak unsigned int __rseq_flags;
> >  
> >  static const ptrdiff_t *libc_rseq_offset_p = &__rseq_offset;
> >  static const unsigned int *libc_rseq_size_p = &__rseq_size;
> > @@ -209,7 +209,7 @@ void rseq_init(void)
> >          * libc not having registered a restartable sequence.  Try to find the
> >          * symbols if that's the case.
> >          */
> > -       if (!*libc_rseq_size_p) {
> > +       if (!libc_rseq_offset_p || !*libc_rseq_size_p) {

Doh, I meant to check libc_rseq_size_p for NULL, i.e.

	if (!libc_rseq_size_p || !*libc_rseq_size_p) {

> 
> If I make that:
> 
> +       if (!*libc_rseq_offset_p || !*libc_rseq_size_p) {
> 
> then it makes sense and actually works. The pointer can hardly be NULL,
> even when statically linked, no?

IIUC, it is indeed the pointers that are set to NULL/0, because for unresolved
symbols, the symbol itself, not its value, is set to '0'.  Which makes sense,
because if there is no symbol, then it can't have a value.

I.e. the address of the symbol is '0', and its value is undefined.

E.g. statically linking this against glibc without rseq support:

diff --git a/tools/testing/selftests/rseq/rseq.c b/tools/testing/selftests/rseq/rseq.c
index 663a9cef1952..959bdcb32e96 100644
--- a/tools/testing/selftests/rseq/rseq.c
+++ b/tools/testing/selftests/rseq/rseq.c
@@ -40,9 +40,9 @@
  * Define weak versions to play nice with binaries that are statically linked
  * against a libc that doesn't support registering its own rseq.
  */
-__weak ptrdiff_t __rseq_offset;
-__weak unsigned int __rseq_size;
-__weak unsigned int __rseq_flags;
+extern __weak ptrdiff_t __rseq_offset;
+extern __weak unsigned int __rseq_size;
+extern __weak unsigned int __rseq_flags;
 
 static const ptrdiff_t *libc_rseq_offset_p = &__rseq_offset;
 static const unsigned int *libc_rseq_size_p = &__rseq_size;
@@ -209,7 +209,12 @@ void rseq_init(void)
         * libc not having registered a restartable sequence.  Try to find the
         * symbols if that's the case.
         */
-       if (!*libc_rseq_size_p) {
+       printf("libc_rseq_offset_p = %lx (%lx), libc_rseq_size_p = %lx (%lx)\n",
+              (unsigned long)libc_rseq_offset_p, (unsigned long)libc_rseq_size_p,
+              (unsigned long)&__rseq_offset, (unsigned long)&__rseq_size);
+       printf("__rseq_size = %u\n", __rseq_size);
+
+       if (!libc_rseq_size_p || !*libc_rseq_size_p) {
                libc_rseq_offset_p = dlsym(RTLD_NEXT, "__rseq_offset");
                libc_rseq_size_p = dlsym(RTLD_NEXT, "__rseq_size");
                libc_rseq_flags_p = dlsym(RTLD_NEXT, "__rseq_flags");

Generates this output:

  $ ./rseq_test 
  libc_rseq_offset_p = 0 (0), libc_rseq_size_p = 0 (0)
  Segmentation fault

Because trying to dereference __rseq_size hits NULL/0.
Re: BUG: rseq selftests and librseq vs. glibc fail
Posted by Florian Weimer 1 month, 2 weeks ago
* Sean Christopherson:

>> If I make that:
>> 
>> +       if (!*libc_rseq_offset_p || !*libc_rseq_size_p) {
>> 
>> then it makes sense and actually works. The pointer can hardly be NULL,
>> even when statically linked, no?
>
> IIUC, it is indeed the pointers that are set to NULL/0, because for unresolved
> symbols, the symbol itself, not its value, is set to '0'.  Which makes sense,
> because if there is no symbol, then it can't have a value.

Right, that's how weak symbol references work.

Thanks,
Florian