The existing 'sysret_rip' selftest asserts that 'regs->r11 ==
regs->flags'. This check relies on the behavior of the SYSCALL
instruction on legacy x86_64, which saves 'RFLAGS' into 'R11'.
However, on systems with FRED (Flexible Return and Event Delivery)
enabled, SYSCALL saves all state onto the stack instead of into
registers. Consequently, 'R11' retains its userspace value, causing
the assertion to fail.
Fix this by detecting whether FRED is enabled and skipping the register
assertion in that case. The detection checks whether the RPL bits of
the GS selector survive an exception raised with INT3: IDT (via IRET)
clears the RPL bits of NULL selectors, while FRED (via ERETU)
preserves them.
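The probe hinges on the x86 segment-selector layout: bits 1:0 hold the RPL, bit 2 is the table indicator (0 = GDT), and bits 15:3 are the descriptor index, so the values 0 through 3 are all NULL selectors. A minimal userspace model of the two return paths, assuming exactly the IRET and ERETU behavior stated above (the helper names are hypothetical and not part of helpers.h):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Selector layout: bits 15..3 = index, bit 2 = TI (0 = GDT), bits 1..0 = RPL. */
static unsigned int sel_rpl(uint16_t sel)
{
	return sel & 0x3;
}

/* A NULL selector has index 0 and TI = 0; its RPL bits are ignored. */
static bool sel_is_null(uint16_t sel)
{
	return (sel & 0xfffc) == 0;
}

/*
 * Hypothetical model of the GS value userspace sees after returning
 * from an exception, per the behavior described in the commit message:
 * IRET drops the RPL bits of a NULL selector, ERETU keeps them.
 */
static uint16_t gs_after_exception(uint16_t gs, bool fred)
{
	if (!fred && sel_is_null(gs))
		return 0;
	return gs;
}

/* The probe's decision rule: RPL 3 survived, so FRED is assumed active. */
static bool probe_says_fred(uint16_t gs_seen)
{
	return gs_seen == 3;
}
```

In this model gs_after_exception(3, false) is 0 and gs_after_exception(3, true) is 3, which is the difference is_fred_enabled() looks for. A non-NULL selector such as 0x2b (index 5, RPL 3) passes through unchanged on both paths, so the probe needs a NULL selector with a non-zero RPL, hence the value 3.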
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Yi Lai <yi1.lai@intel.com>
---
v3:
- Move is_fred_enabled() to helpers.h for other x86 selftests to use.
- Rename empty_handler to fred_handler to avoid symbol conflicts.
v2:
- Replaced CPUID check with a runtime probe using INT3 and GS RPL
preservation to robustly detect active FRED usage (Suggested by
Andrew Cooper).
tools/testing/selftests/x86/helpers.h | 34 ++++++++++++++++++++++++
tools/testing/selftests/x86/sysret_rip.c | 12 ++++++---
2 files changed, 43 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/x86/helpers.h b/tools/testing/selftests/x86/helpers.h
index 4c747a1278d9..4d09ed97aaac 100644
--- a/tools/testing/selftests/x86/helpers.h
+++ b/tools/testing/selftests/x86/helpers.h
@@ -4,6 +4,7 @@
#include <signal.h>
#include <string.h>
+#include <stdbool.h>
#include <asm/processor-flags.h>
@@ -50,4 +51,37 @@ static inline void clearhandler(int sig)
 		ksft_exit_fail_msg("sigaction failed");
 }
 
+static inline void fred_handler(int sig, siginfo_t *info, void *ctx_void)
+{
+}
+
+static inline bool is_fred_enabled(void)
+{
+	unsigned short gs_val;
+
+	sethandler(SIGTRAP, fred_handler, 0);
+
+	/*
+	 * Distinguish IDT and FRED mode by loading GS with a non-zero RPL and
+	 * triggering an exception:
+	 * IDT (IRET) clears RPL bits of NULL selectors.
+	 * FRED (ERETU) preserves them.
+	 *
+	 * If GS is loaded with 3 (Index=0, RPL=3), trigger an exception:
+	 * IDT should restore GS as 0.
+	 * FRED should preserve GS as 3.
+	 */
+	asm volatile (
+		"mov %[rpl3], %%gs\n\t"
+		"int3\n\t"
+		"mov %%gs, %[res]"
+		: [res] "=r" (gs_val)
+		: [rpl3] "r" (3)
+	);
+
+	clearhandler(SIGTRAP);
+
+	return gs_val == 3;
+}
+
 #endif /* __SELFTESTS_X86_HELPERS_H */
diff --git a/tools/testing/selftests/x86/sysret_rip.c b/tools/testing/selftests/x86/sysret_rip.c
index 2e423a335e1c..30b195266779 100644
--- a/tools/testing/selftests/x86/sysret_rip.c
+++ b/tools/testing/selftests/x86/sysret_rip.c
@@ -64,9 +64,15 @@ static void sigusr1(int sig, siginfo_t *info, void *ctx_void)
 	ctx->uc_mcontext.gregs[REG_RIP] = rip;
 	ctx->uc_mcontext.gregs[REG_RCX] = rip;
 
-	/* R11 and EFLAGS should already match. */
-	assert(ctx->uc_mcontext.gregs[REG_EFL] ==
-	       ctx->uc_mcontext.gregs[REG_R11]);
+	/*
+	 * SYSCALL works differently on FRED: it does not save RIP and RFLAGS
+	 * to RCX and R11.
+	 */
+	if (!is_fred_enabled()) {
+		/* R11 and EFLAGS should already match. */
+		assert(ctx->uc_mcontext.gregs[REG_EFL] ==
+		       ctx->uc_mcontext.gregs[REG_R11]);
+	}
 
 	sethandler(SIGSEGV, sigsegv_for_sigreturn_test, SA_RESETHAND);
 }
--
2.43.0
On Thu, Mar 26, 2026, at 2:44 AM, Yi Lai wrote:
> Fix this by detecting if FRED is enabled and skipping the register
> assertion in that case. The detection is done by checking if the RPL
> bits of the GS selector are preserved after a hardware exception.
> IDT (via IRET) clears the RPL bits of NULL selectors, while FRED (via
> ERETU) preserves them.

I don't really like this. I think we have two credible choices:

1. Define the Linux ABI to be that, on FRED systems, SYSCALL preserves
R11 and RCX on entry and exit. And update the test to actually test
this.

2. Define the Linux ABI to be what it has been for quite a few years:
SYSCALL entry copies RFLAGS to R11 and RIP to RCX and SYSCALL exit
preserves all registers.

I'm in favor of #2. People love making new programming languages and
runtimes and inline asm and, these days, vibe coded crap. And it's
*easier* to emit a SYSCALL and forget to tell the compiler / code
generator that RCX and R11 are clobbered than it is to remember that
they're clobbered. And it's easy to test on FRED (well, not really,
but it hopefully will be some day) and it's easy to publish one's
code, and then everyone is a bit screwed when the resulting program
crashes sometimes on non-FRED systems. And it will be miserable to
debug.

(It's *really* *really* easy to screw this up in a way that sort of
works even on non-FRED: RCX and R11 are usually clobbered across
function calls, so one can get into a situation in which one's
generated code usually doesn't require that SYSCALL preserve one of
these registers until an inlining decision changes or some code gets
reordered, and then it will start failing. And making the failure
depend on hardware details is just nasty.)

So I think we should add the ~2 lines of code to fix the SYSCALL entry
on FRED to match non-FRED.

--Andy
On Thu, Mar 26, 2026 at 03:06:05PM -0700, Andy Lutomirski wrote:
> So I think we should add the ~2 lines of code to fix the SYSCALL entry
> on FRED to match non-FRED.

Yes; I'm afraid I have to concur. Preserving the clobber on entry for
FRED systems is by far the safest choice.

Aside from this selftest, fancy debuggers and anything that can transfer
userspace state between machines might be 'surprised'.
On Fri, Mar 27, 2026 at 01:33:15PM +0100, Peter Zijlstra wrote:
> Yes; I'm afraid I have to concur. Preserving the clobber on entry for
> FRED systems is by far the safest choice.
>
> Aside from this selftest, fancy debuggers and anything that can transfer
> userspace state between machines might be 'surprised'.

Thanks Andy and Peter.

Indeed, making the selftest branch on FRED vs. non-FRED behavior
is not a good practice. The selftest should validate ABI consistency.

I agree with Andy's option #2, so this should be fixed in the FRED
syscall entry implementation.

Li Xin, does this direction look right to you? I can assist with
validation and keep the selftest aligned with the agreed ABI.

Regards,
Yi Lai
> I agree with Andy's option #2, so this should be fixed in the FRED
> syscall entry implementation.
>
> Li Xin, does this direction look right to you? I can assist with
> validation and keep the selftest aligned with the agreed ABI.

Yes, consistency should take precedence over hardware-specific variations.

I would like to hear from Andrew Cooper and hpa before we do it.
> On Mar 30, 2026, at 11:03 PM, Xin Li <xin@zytor.com> wrote:
> Yes, consistency should take precedence over hardware-specific variations.
>
> I would like to hear from Andrew Cooper and hpa before we do it.
Per Andy’s suggestion, the change would be:
diff --git a/arch/x86/entry/entry_fred.c b/arch/x86/entry/entry_fred.c
index 88c757ac8ccd..a19898747a2c 100644
--- a/arch/x86/entry/entry_fred.c
+++ b/arch/x86/entry/entry_fred.c
@@ -79,6 +79,9 @@ static __always_inline void fred_other(struct pt_regs *regs)
 {
 	/* The compiler can fold these conditions into a single test */
 	if (likely(regs->fred_ss.vector == FRED_SYSCALL && regs->fred_ss.l)) {
+		regs->cx = regs->ip;
+		regs->r11 = regs->flags;
+
 		regs->orig_ax = regs->ax;
 		regs->ax = -ENOSYS;
 		do_syscall_64(regs, regs->orig_ax);
It adds four extra MOVs on this hot path, but I don’t think that is a problem here.
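The proposed hunk makes the saved register image look as if legacy SYSCALL had run: RCX receives the return RIP and R11 receives RFLAGS before the syscall is dispatched. Since x86 has no memory-to-memory MOV, each of the two pt_regs copies is presumably a load plus a store, which would account for the four extra MOVs. A userspace sketch of that bookkeeping, using a hypothetical cut-down struct in place of the kernel's struct pt_regs (field names follow the hunk; everything else, including omitting do_syscall_64(), is an assumption):

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the few pt_regs fields the hunk touches. */
struct fake_regs {
	uint64_t ip;      /* user return RIP saved by FRED event delivery */
	uint64_t flags;   /* user RFLAGS */
	uint64_t cx;
	uint64_t r11;
	uint64_t ax;
	uint64_t orig_ax;
};

/* Mirrors the FRED_SYSCALL branch of fred_other() after the proposed change. */
static void fred_syscall_entry(struct fake_regs *regs)
{
	/* New: emulate legacy SYSCALL, which copies RIP to RCX and RFLAGS to R11. */
	regs->cx = regs->ip;
	regs->r11 = regs->flags;

	/* Existing: remember the syscall number and preset the return value. */
	regs->orig_ax = regs->ax;
	regs->ax = (uint64_t)-ENOSYS;
}
```

With the image in this shape, the original sysret_rip assertion that R11 equals EFLAGS should hold on FRED as well, without an is_fred_enabled() probe.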
On March 31, 2026 6:59:06 PM PDT, Xin Li <xin@zytor.com> wrote:
>It adds four extra MOVs on this hot path, but I don’t think that is a problem here.
We discussed this over a year ago, and at that point agreed that preserving the registers was the desired behavior. Why has this changed now?
Thanks!
Xin
> On Mar 31, 2026, at 8:15 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> We discussed this over a year ago, and at that point agreed that preserving the registers was the desired behavior. Why has this changed now?
Yes, that is technically cleaner.
The question is, is the RCX/R11 clobbering behavior an established architectural contract, or is it an implementation detail that software ignores?
I think Andy and Peter want to be on the safer side, which kind of assumes that this is established.
On April 1, 2026 7:36:48 AM PDT, Xin Li <xin@zytor.com> wrote:
>
>Thanks!
>Xin
>
>> On Mar 31, 2026, at 8:15 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>>
>> On March 31, 2026 6:59:06 PM PDT, Xin Li <xin@zytor.com> wrote:
>>>
>>>
>>>>> On Mar 30, 2026, at 11:03 PM, Xin Li <xin@zytor.com> wrote:
>>>>
>>>>
>>>>>>>> The existing 'sysret_rip' selftest asserts that 'regs->r11 ==
>>>>>>>> regs->flags'. This check relies on the behavior of the SYSCALL
>>>>>>>> instruction on legacy x86_64, which saves 'RFLAGS' into 'R11'.
>>>>>>>>
>>>>>>>> However, on systems with FRED (Flexible Return and Event Delivery)
>>>>>>>> enabled, instead of using registers, all state is saved onto the stack.
>>>>>>>> Consequently, 'R11' retains its userspace value, causing the assertion
>>>>>>>> to fail.
>>>>>>>>
>>>>>>>> Fix this by detecting if FRED is enabled and skipping the register
>>>>>>>> assertion in that case. The detection is done by checking if the RPL
>>>>>>>> bits of the GS selector are preserved after a hardware exception.
>>>>>>>> IDT (via IRET) clears the RPL bits of NULL selectors, while FRED (via
>>>>>>>> ERETU) preserves them.
>>>>>>>>
>>>>>>>
>>>>>>> I don't really like this. I think we have two credible choices:
>>>>>>>
>>>>>>> 1. Define the Linux ABI to be that, on FRED systems, SYSCALL preserves
>>>>>>> R11 and RCX on entry and exit. And update the test to actually test
>>>>>>> this.
>>>>>>>
>>>>>>> 2. Define the Linux ABI to be what it has been for quite a few years:
>>>>>>> SYSCALL entry copies RFLAGS to R11 and RIP to RCX and SYSCALL exit
>>>>>>> preserves all registers.
>>>>>>>
>>>>>>> I'm in favor of #2. People love making new programming languages and
>>>>>>> runtimes and inline asm and, these days, vibe coded crap. And it's
>>>>>>> *easier* to emit a SYSCALL and forget to tell the compiler / code
>>>>>>> generator that RCX and R11 are clobbered than it is to remember that
>>>>>>> they're clobbered. And it's easy to test on FRED (well, not really,
>>>>>>> but it hopefully will be some day) and it's easy to publish one's
>>>>>>> code, and then everyone is a bit screwed when the resulting program
>>>>>>> crashes sometimes on non-FRED systems. And it will be miserable to
>>>>>>> debug.
>>>>>>>
>>>>>>> (It's *really* *really* easy to screw this up in a way that sort of
>>>>>>> works even on non-FRED: RCX and R11 are usually clobbered across
>>>>>>> function calls, so one can get into a situation in which one's
>>>>>>> generated code usually doesn't require that SYSCALL preserve one of
>>>>>>> these registers until an inlining decision changes or some code gets
>>>>>>> reordered, and then it will start failing. And making the failure
>>>>>>> depend on hardware details is just nasty.
>>>>>>>
>>>>>>> So I think we should add the ~2 lines of code to fix the SYSCALL entry
>>>>>>> on FRED to match non-FRED.
>>>>>>
>>>>>> Yes; I'm afraid I have to concur. Preserving the clobber on entry for
>>>>>> FRED systems is by far the safest choice.
>>>>>>
>>>>>> Aside from this selftest, fancy debuggers and anything that can transfer
>>>>>> userspace state between machines might be 'surprised'.
>>>>>
>>>>> Thanks Andy and Peter.
>>>>>
>>>>> Indeed, making the selftest branch on FRED vs. non-FRED behavior
>>>>> is not a good practice. The selftest should validate ABI consistency.
>>>>>
>>>>> I agree with Andy's option #2, so this should be fixed in the FRED
>>>>> syscall entry implementation.
>>>>>
>>>>> Li Xin, does this direction look right to you? I can assit with
>>>>> validation and keep the selftest aligned with the agreed ABI.
>>>>>
>>>>
>>>> Yes, consistency should take precedence over hardware-specific variations.
>>>>
>>>> I would like to hear from Andrew Cooper and hpa before we do it.
>>>
>>> Per Andy’s suggestion, the change would be:
>>>
>>> diff --git a/arch/x86/entry/entry_fred.c b/arch/x86/entry/entry_fred.c
>>> index 88c757ac8ccd..a19898747a2c 100644
>>> --- a/arch/x86/entry/entry_fred.c
>>> +++ b/arch/x86/entry/entry_fred.c
>>> @@ -79,6 +79,9 @@ static __always_inline void fred_other(struct pt_regs *regs)
>>> {
>>> /* The compiler can fold these conditions into a single test */
>>> if (likely(regs->fred_ss.vector == FRED_SYSCALL && regs->fred_ss.l)) {
>>> + regs->cx = regs->ip;
>>> + regs->r11 = regs->flags;
>>> +
>>> regs->orig_ax = regs->ax;
>>> regs->ax = -ENOSYS;
>>> do_syscall_64(regs, regs->orig_ax);
>>>
>>> It adds 4 extra MOVs on this hot path, but I don’t see it as a problem here.
>>
>> We discussed this over a year ago, and at that point agreed that preserving the registers was the desired behavior. Why has this changed now?
>
>Yes, that is technically cleaner.
>
>The question is, is the RCX/R11 clobbering behavior an established architectural contract, or is it an implementation detail that software ignores?
>
>I think Andy and Peter want to be on the safer side, which kind of assumes that this is established.
>
Clobbering is never an architectural contract; clobbering is always an option. However, I understand the concern that a developer may write software on a FRED system that then breaks on a legacy system.
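To make that concern concrete, here is a minimal sketch of a raw SYSCALL wrapper, assuming x86-64 Linux and GCC/Clang extended inline asm; raw_getpid() is a hypothetical helper, not an existing API. The "rcx" and "r11" clobbers are what keep it correct on legacy SYSCALL entry. Code that omits them may appear to work on a FRED machine whose kernel preserves both registers and then fail on older hardware.

```c
#include <assert.h>
#include <unistd.h>
#ifdef __linux__
#include <sys/syscall.h>
#endif

/*
 * Hypothetical helper: issue getpid() with a bare SYSCALL instruction.
 * Legacy SYSCALL overwrites RCX (return RIP) and R11 (saved RFLAGS), so
 * both must be declared clobbered whether or not the running kernel
 * happens to preserve them.
 */
static long raw_getpid(void)
{
#if defined(__x86_64__) && defined(__linux__)
	long ret;

	__asm__ volatile ("syscall"
			  : "=a" (ret)
			  : "a" ((long)SYS_getpid)
			  : "rcx", "r11", "memory");
	return ret;
#else
	/* Not x86-64 Linux: fall back to the libc wrapper. */
	return (long)getpid();
#endif
}
```

Usage: raw_getpid() should return the same value as getpid(); the clobber list only affects what the compiler may keep live across the asm statement.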
Last time this came up, the policy we decided on was that a system that clobbers must do so in all cases (in order to not leak internal kernel state) but a system that can preserve (FRED or IDT-without-SYSCALL) may always do so.
I would prefer if we could defer this policy reversal for a bit. Since there is production hardware out now, I have been working on actually tuning the FRED code paths, and because the Linux kernel is so efficient, details matter in surprising ways.
I *particularly* dislike clobbering registers on the way *into* the kernel, though. That needlessly makes them unavailable to a debugger, and one of the benefits of FRED is improving debug visibility in some specific cases.
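Whether a given kernel clobbers or preserves R11 across SYSCALL is observable from userspace. Below is a minimal probe sketch, assuming x86-64 Linux and GCC/Clang (it binds a local register variable to R11); probe_r11_clobbered() and the sentinel value are hypothetical. With legacy SYSCALL entry, R11 should come back holding the saved RFLAGS image. On a FRED kernel without the entry fixup discussed in this thread, the sentinel should survive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#ifdef __linux__
#include <sys/syscall.h>
#endif

/*
 * Hypothetical probe: load a sentinel into R11, run getpid() via SYSCALL,
 * and report whether R11 changed. Returns false without probing on
 * anything other than x86-64 Linux.
 */
static bool probe_r11_clobbered(uint64_t *r11_after)
{
#if defined(__x86_64__) && defined(__linux__)
	const uint64_t sentinel = 0x5a5a5a5a5a5a5a5aULL;
	register uint64_t r11 __asm__("r11") = sentinel;
	long nr = SYS_getpid;

	/* R11 is an in/out operand here, so only RCX is listed as clobbered. */
	__asm__ volatile ("syscall"
			  : "+a" (nr), "+r" (r11)
			  :
			  : "rcx", "memory");
	*r11_after = r11;
	return r11 != sentinel;
#else
	*r11_after = 0;
	return false;
#endif
}
```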
On Wed, Apr 1, 2026, at 10:54 AM, H. Peter Anvin wrote:
> On April 1, 2026 7:36:48 AM PDT, Xin Li <xin@zytor.com> wrote:
>>
>>Thanks!
>>Xin
>>
>>> On Mar 31, 2026, at 8:15 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>>>
>>> On March 31, 2026 6:59:06 PM PDT, Xin Li <xin@zytor.com> wrote:
>>>>
>>>>
>>>>>> On Mar 30, 2026, at 11:03 PM, Xin Li <xin@zytor.com> wrote:
>>>>>
>>>>>
>>>>>>>>> The existing 'sysret_rip' selftest asserts that 'regs->r11 ==
>>>>>>>>> regs->flags'. This check relies on the behavior of the SYSCALL
>>>>>>>>> instruction on legacy x86_64, which saves 'RFLAGS' into 'R11'.
>>>>>>>>>
>>>>>>>>> However, on systems with FRED (Flexible Return and Event Delivery)
>>>>>>>>> enabled, instead of using registers, all state is saved onto the stack.
>>>>>>>>> Consequently, 'R11' retains its userspace value, causing the assertion
>>>>>>>>> to fail.
>>>>>>>>>
>>>>>>>>> Fix this by detecting if FRED is enabled and skipping the register
>>>>>>>>> assertion in that case. The detection is done by checking if the RPL
>>>>>>>>> bits of the GS selector are preserved after a hardware exception.
>>>>>>>>> IDT (via IRET) clears the RPL bits of NULL selectors, while FRED (via
>>>>>>>>> ERETU) preserves them.
>>>>>>>>>
>>>>>>>>
>>>>>>>> I don't really like this. I think we have two credible choices:
>>>>>>>>
>>>>>>>> 1. Define the Linux ABI to be that, on FRED systems, SYSCALL preserves
>>>>>>>> R11 and RCX on entry and exit. And update the test to actually test
>>>>>>>> this.
>>>>>>>>
>>>>>>>> 2. Define the Linux ABI to be what it has been for quite a few years:
>>>>>>>> SYSCALL entry copies RFLAGS to R11 and RIP to RCX and SYSCALL exit
>>>>>>>> preserves all registers.
>>>>>>>>
>>>>>>>> I'm in favor of #2. People love making new programming languages and
>>>>>>>> runtimes and inline asm and, these days, vibe coded crap. And it's
>>>>>>>> *easier* to emit a SYSCALL and forget to tell the compiler / code
>>>>>>>> generator that RCX and R11 are clobbered than it is to remember that
>>>>>>>> they're clobbered. And it's easy to test on FRED (well, not really,
>>>>>>>> but it hopefully will be some day) and it's easy to publish one's
>>>>>>>> code, and then everyone is a bit screwed when the resulting program
>>>>>>>> crashes sometimes on non-FRED systems. And it will be miserable to
>>>>>>>> debug.
>>>>>>>>
>>>>>>>> (It's *really* *really* easy to screw this up in a way that sort of
>>>>>>>> works even on non-FRED: RCX and R11 are usually clobbered across
>>>>>>>> function calls, so one can get into a situation in which one's
>>>>>>>> generated code usually doesn't require that SYSCALL preserve one of
>>>>>>>> these registers until an inlining decision changes or some code gets
>>>>>>>> reordered, and then it will start failing. And making the failure
>>>>>>>> depend on hardware details is just nasty.)
>>>>>>>>
>>>>>>>> So I think we should add the ~2 lines of code to fix the SYSCALL entry
>>>>>>>> on FRED to match non-FRED.
>>>>>>>
>>>>>>> Yes; I'm afraid I have to concur. Preserving the clobber on entry for
>>>>>>> FRED systems is by far the safest choice.
>>>>>>>
>>>>>>> Aside from this selftest, fancy debuggers and anything that can transfer
>>>>>>> userspace state between machines might be 'surprised'.
>>>>>>
>>>>>> Thanks Andy and Peter.
>>>>>>
>>>>>> Indeed, making the selftest branch on FRED vs. non-FRED behavior
>>>>>> is not a good practice. The selftest should validate ABI consistency.
>>>>>>
>>>>>> I agree with Andy's option #2, so this should be fixed in the FRED
>>>>>> syscall entry implementation.
>>>>>>
>>>>>> Li Xin, does this direction look right to you? I can assist with
>>>>>> validation and keep the selftest aligned with the agreed ABI.
>>>>>>
>>>>>
>>>>> Yes, consistency should take precedence over hardware-specific variations.
>>>>>
>>>>> I would like to hear from Andrew Cooper and hpa before we do it.
>>>>
>>>> Per Andy’s suggestion, the change would be:
>>>>
>>>> diff --git a/arch/x86/entry/entry_fred.c b/arch/x86/entry/entry_fred.c
>>>> index 88c757ac8ccd..a19898747a2c 100644
>>>> --- a/arch/x86/entry/entry_fred.c
>>>> +++ b/arch/x86/entry/entry_fred.c
>>>> @@ -79,6 +79,9 @@ static __always_inline void fred_other(struct pt_regs *regs)
>>>> {
>>>> /* The compiler can fold these conditions into a single test */
>>>> if (likely(regs->fred_ss.vector == FRED_SYSCALL && regs->fred_ss.l)) {
>>>> + regs->cx = regs->ip;
>>>> + regs->r11 = regs->flags;
>>>> +
>>>> regs->orig_ax = regs->ax;
>>>> regs->ax = -ENOSYS;
>>>> do_syscall_64(regs, regs->orig_ax);
>>>>
>>>> It adds 4 extra MOVs on this hot path, but I don’t see it as a problem here.
>>>
>>> We discussed this over a year ago, and at that point agreed that preserving the registers was the desired behavior. Why has this changed now?
>>
>>Yes, that is technically cleaner.
>>
>>The question is, is the RCX/R11 clobbering behavior an established architectural contract, or is it an implementation detail that software ignores?
>>
>>I think Andy and Peter want to be on the safer side, which kind of assumes that this is established.
>>
>
> Clobbering is never an architectural contract; clobbering is always an
> option. However, I understand the concern that a developer may write
> software on a FRED system that then breaks on a legacy system.
>
> Last time this came up, the policy we decided on was that a system that
> clobbers must do so in all cases (in order to not leak internal kernel
> state) but a system that can preserve (FRED or IDT-without-SYSCALL) may
> always do so.
>
> I would prefer if we could defer this policy reversal for a bit. Since
> there is production hardware out now, I have been working on actually
> tuning the FRED code paths, and because the Linux kernel is so
> efficient, details matter in surprising ways.
>
> I *particularly* dislike clobbering registers on the way *into* the
> kernel, though. That needlessly makes them unavailable to a debugger,
> and one of the benefits of FRED is improving debug visibility in some
> specific cases.
I don't really agree. For quite a few years now, we've tried to make the exit path uniform, and we have this logic in syscall_64:
	/* SYSRET requires RCX == RIP and R11 == EFLAGS */
	if (unlikely(regs->cx != regs->ip || regs->r11 != regs->flags))
		return false;	<-- fall back to IRET
and this is not just an aesthetic thing -- it allows us to deliver signals and implement things like sigreturn without needing to track extra flag bits that mean "well, actually, we're in the syscall *code* but we're not returning from a syscall any more". We had that a long time ago, and it was extremely difficult to understand and maintain.
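A simplified userspace model of the exit-path rule quoted above, assuming only the four relevant pt_regs fields matter. struct model_regs and model_can_sysret() are hypothetical names, not kernel code. The real kernel check also tests further conditions (such as whether RIP is a valid user address) before choosing SYSRET.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the pt_regs fields involved. */
struct model_regs {
	uint64_t ip, cx, flags, r11;
};

/*
 * SYSRET reloads RIP from RCX and RFLAGS from R11, so it can reproduce
 * pt_regs exactly only when those pairs already match. Any other state
 * (e.g. after sigreturn or ptrace edits) takes the IRET path instead.
 */
static bool model_can_sysret(const struct model_regs *regs)
{
	return regs->cx == regs->ip && regs->r11 == regs->flags;
}
```

The design point is that nothing needs to remember how the kernel was entered: if something rewrote RCX or R11 in pt_regs, the invariant simply fails and the slower but exact IRET path restores the modified state.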
So, on current kernels and kernels going back, I dunno, 10 years (I didn't try to dig out the git history, but I did write much of this code...), the semantics have been that we return to usermode in a state that matches pt_regs as precisely as we can arrange. For the one case where we have a very longstanding divergence between entry and exit regs, we have orig_ax.
So it would be at least a fairly large maintainability regression to make the non-FRED SYSCALL behavior modify rcx and/or r11 on exit.
Now we have FRED. Sure, it would be nice to remember the entry RCX and R11, but if we want to avoid the footgun where the effect of SYSCALL is different on FRED and non-FRED hardware, then we need the context after entry completes to have regs->cx == regs->ip and regs->r11 == regs->flags (or perhaps RCX and R11 differently poisoned, but that seems a bit silly).
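The consistency argument can be sketched as a small model of what userspace observes after one SYSCALL, assuming the exit path restores pt_regs exactly and ignoring RFLAGS masking and the return value in RAX. All names here (struct user_state, model_syscall(), the ENTRY_* values) are hypothetical. ENTRY_FRED_FIXUP stands for the two assignments proposed for fred_other() earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-visible state; ip is the address after SYSCALL. */
struct user_state {
	uint64_t ip, flags, rcx, r11;
};

enum entry_kind {
	ENTRY_LEGACY,		/* IDT system: the SYSCALL instruction writes RCX/R11 */
	ENTRY_FRED,		/* FRED today: RIP/RFLAGS go to the stack, RCX/R11 kept */
	ENTRY_FRED_FIXUP,	/* FRED plus the proposed regs->cx/regs->r11 copies */
};

/* User state seen after return, given how the kernel was entered. */
static struct user_state model_syscall(struct user_state u, enum entry_kind kind)
{
	switch (kind) {
	case ENTRY_LEGACY:
	case ENTRY_FRED_FIXUP:
		/* Same effect: RCX <- RIP, R11 <- RFLAGS. */
		u.rcx = u.ip;
		u.r11 = u.flags;
		break;
	case ENTRY_FRED:
		break;
	}
	return u;
}

static bool same_state(struct user_state a, struct user_state b)
{
	return a.ip == b.ip && a.flags == b.flags &&
	       a.rcx == b.rcx && a.r11 == b.r11;
}
```

In this model only the fixup variant makes FRED indistinguishable from legacy hardware, which is the property option #2 asks for.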
If we really want to have the option to fish the original rcx and r11 out from somewhere or perhaps to have extra-bonus-efficient many-parameter syscalls (I'm not sure why), then we could add orig_rcx and orig_r11. Or we could invent a time machine and fix SYSCALL when it first came out.
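A sketch of the orig_rcx/orig_r11 idea, under stated assumptions: pt_regs has orig_ax but no such fields today, so orig_rcx and orig_r11 below are hypothetical, as are struct model_fred_regs and model_fred_entry_keep_orig(). This is a userspace model of a FRED SYSCALL entry, not kernel code. It keeps the user's RCX/R11 for debuggers while still presenting legacy-compatible values to the rest of the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical pt_regs-like model with two extra "orig" slots. */
struct model_fred_regs {
	uint64_t ip, flags, cx, r11, ax, orig_ax;
	uint64_t orig_rcx, orig_r11;	/* hypothetical fields */
};

static void model_fred_entry_keep_orig(struct model_fred_regs *regs)
{
	/* Stash the user's values before imitating legacy SYSCALL. */
	regs->orig_rcx = regs->cx;
	regs->orig_r11 = regs->r11;
	regs->cx = regs->ip;
	regs->r11 = regs->flags;
	/* Same orig_ax/-ENOSYS setup as the existing fred_other() path. */
	regs->orig_ax = regs->ax;
	regs->ax = (uint64_t)-ENOSYS;
}
```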
--Andy