[PATCH 3/5] x86/vsyscall: Add vsyscall emulation for #GP

Sohil Mehta posted 5 patches 1 month, 1 week ago
Posted by Sohil Mehta 1 month, 1 week ago
The legacy vsyscall page is mapped at a fixed address in the kernel
address range 0xffffffffff600000-0xffffffffff601000. Prior to LASS, a
vsyscall page access from userspace would always generate a #PF. The
kernel emulates the execute (XONLY) accesses in the #PF handler and
returns the appropriate values to userspace.

With LASS, these accesses are intercepted before the paging structures
are traversed triggering a #GP instead of a #PF. The #GP doesn't provide
much information in terms of the error code.

However, as clarified in the SDM, the LASS violation only triggers after
an instruction fetch happens from the vsyscall address. So, the faulting
RIP, which is preserved in the user registers, can be used to determine
if the #GP was triggered due to a vsyscall access in XONLY mode.

Reuse the common emulation code during a #GP and emulate the vsyscall
access in XONLY mode without going through complex instruction decoding.
Note, this doesn't work for EMULATE mode which maps the vsyscall page as
readable.

Add an extra check in the common emulation code to ensure that the fault
really happened in 64-bit user mode. This is primarily a sanity check
with the #GP handler reusing the emulation code.

Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
---
 arch/x86/entry/vsyscall/vsyscall_64.c | 22 +++++++++++++++++-----
 arch/x86/include/asm/vsyscall.h       |  6 ++++++
 arch/x86/kernel/traps.c               |  4 ++++
 3 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c
index 5c6559c37c5b..b34c8763d5e9 100644
--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -23,7 +23,7 @@
  * soon be no new userspace code that will ever use a vsyscall.
  *
  * The code in this file emulates vsyscalls when notified of a page
- * fault to a vsyscall address.
+ * fault or a general protection fault to a vsyscall address.
  */
 
 #include <linux/kernel.h>
@@ -118,10 +118,9 @@ static bool __emulate_vsyscall(struct pt_regs *regs, unsigned long address)
 	long ret;
 	unsigned long orig_dx;
 
-	/*
-	 * No point in checking CS -- the only way to get here is a user mode
-	 * trap to a high address, which means that we're in 64-bit user code.
-	 */
+	/* Confirm that the fault happened in 64-bit user mode */
+	if (!user_64bit_mode(regs))
+		return false;
 
 	if (vsyscall_mode == NONE) {
 		warn_bad_vsyscall(KERN_INFO, regs,
@@ -282,6 +281,19 @@ bool emulate_vsyscall_pf(unsigned long error_code, struct pt_regs *regs,
 	return __emulate_vsyscall(regs, address);
 }
 
+bool emulate_vsyscall_gp(struct pt_regs *regs)
+{
+	/* Without LASS, vsyscall accesses are expected to generate a #PF */
+	if (!cpu_feature_enabled(X86_FEATURE_LASS))
+		return false;
+
+	/* Emulate only if the RIP points to the vsyscall address */
+	if (!is_vsyscall_vaddr(regs->ip))
+		return false;
+
+	return __emulate_vsyscall(regs, regs->ip);
+}
+
 /*
  * A pseudo VMA to allow ptrace access for the vsyscall page.  This only
  * covers the 64bit vsyscall page now. 32bit has a real VMA now and does
diff --git a/arch/x86/include/asm/vsyscall.h b/arch/x86/include/asm/vsyscall.h
index f34902364972..538053b1656a 100644
--- a/arch/x86/include/asm/vsyscall.h
+++ b/arch/x86/include/asm/vsyscall.h
@@ -15,6 +15,7 @@ extern void set_vsyscall_pgtable_user_bits(pgd_t *root);
  * Returns true if handled.
  */
 bool emulate_vsyscall_pf(unsigned long error_code, struct pt_regs *regs, unsigned long address);
+bool emulate_vsyscall_gp(struct pt_regs *regs);
 #else
 static inline void map_vsyscall(void) {}
 static inline bool emulate_vsyscall_pf(unsigned long error_code,
@@ -22,6 +23,11 @@ static inline bool emulate_vsyscall_pf(unsigned long error_code,
 {
 	return false;
 }
+
+static inline bool emulate_vsyscall_gp(struct pt_regs *regs)
+{
+	return false;
+}
 #endif
 
 /*
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index e21f8ad2f9d7..a896f9225434 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -70,6 +70,7 @@
 #include <asm/tdx.h>
 #include <asm/cfi.h>
 #include <asm/msr.h>
+#include <asm/vsyscall.h>
 
 #ifdef CONFIG_X86_64
 #include <asm/x86_init.h>
@@ -938,6 +939,9 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
 		if (fixup_umip_exception(regs))
 			goto exit;
 
+		if (emulate_vsyscall_gp(regs))
+			goto exit;
+
 		gp_user_force_sig_segv(regs, X86_TRAP_GP, error_code, desc);
 		goto exit;
 	}
-- 
2.43.0
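
For readers following the control flow, the checks that emulate_vsyscall_gp()
and __emulate_vsyscall() perform in the diff above can be modeled in plain
userspace C. This is only a sketch under stated assumptions: struct fake_regs,
the lass_enabled flag, and would_emulate_gp() are hypothetical stand-ins for
struct pt_regs, cpu_feature_enabled(X86_FEATURE_LASS), and the real handler.
The page-mask test is an assumption about what is_vsyscall_vaddr() does, and
none of this is kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fixed address of the legacy vsyscall page, per the changelog. */
#define VSYSCALL_ADDR 0xffffffffff600000ULL
#define PAGE_MASK_4K  (~0xfffULL)

/* Hypothetical stand-in for the parts of struct pt_regs used here. */
struct fake_regs {
	uint64_t ip;       /* faulting RIP saved at exception entry */
	bool user_64bit;   /* models the result of user_64bit_mode(regs) */
};

/* True if addr lies within the 4 KiB vsyscall page. */
static bool in_vsyscall_page(uint64_t addr)
{
	return (addr & PAGE_MASK_4K) == VSYSCALL_ADDR;
}

/*
 * Models the order of checks in the patch: LASS must be enabled, the RIP
 * must point into the vsyscall page, and the fault must come from 64-bit
 * user mode. Returns true if emulation would be attempted at all; the
 * later vsyscall_mode checks are not modeled.
 */
static bool would_emulate_gp(const struct fake_regs *regs, bool lass_enabled)
{
	if (!lass_enabled)
		return false;   /* without LASS the access raises a #PF */
	if (!in_vsyscall_page(regs->ip))
		return false;   /* unrelated #GP: normal SIGSEGV path */
	return regs->user_64bit;
}

/* Example: a RIP inside the page is emulated only when LASS is enabled. */
void example_gp_checks(void)
{
	struct fake_regs r = { .ip = VSYSCALL_ADDR + 0x10, .user_64bit = true };

	assert(would_emulate_gp(&r, true));
	assert(!would_emulate_gp(&r, false));
}
```

Note that a data read of the vsyscall page never reaches the emulation path in
this model, because the saved RIP then points at ordinary user code.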
Re: [PATCH 3/5] x86/vsyscall: Add vsyscall emulation for #GP
Posted by Dave Hansen 1 month ago
On 2/19/26 15:35, Sohil Mehta wrote:
> With LASS, these accesses are intercepted before the paging structures
> are traversed triggering a #GP instead of a #PF. The #GP doesn't provide
> much information in terms of the error code.

It's the error code and the address from CR2, right?

> However, as clarified in the SDM, the LASS violation only triggers after
> an instruction fetch happens from the vsyscall address. So, the faulting
> RIP, which is preserved in the user registers, can be used to determine
> if the #GP was triggered due to a vsyscall access in XONLY mode.
> 
> Reuse the common emulation code during a #GP and emulate the vsyscall
> access in XONLY mode without going through complex instruction decoding.
> Note, this doesn't work for EMULATE mode which maps the vsyscall page as
> readable.
> 
> Add an extra check in the common emulation code to ensure that the fault
> really happened in 64-bit user mode. This is primarily a sanity check
> with the #GP handler reusing the emulation code.

This part of the changelog loses me.

I _think_ this is trying to make the point that "emulate" mode is hard
with LASS. It's hard because it needs to be able to tell the difference
between a read of the vsyscall page and an instruction fetch from the
vsyscall page.

But, the "xonly" mode is far easier because reads are simply disallowed.
Any time userspace has an RIP pointing to the vsyscall page (with LASS
enabled), it's assumed to be a vsyscall. Any normal memory reads of the
vsyscall page get normal #GP handling.

Is that right?

BTW, reading it back now, I think the subject is really unfortunate. It
would be quite easy to read it and infer that this "adds
vsyscall=emulate for #GP".

It should probably be:

	x86/vsyscall: Restore vsyscall=xonly mode under LASS

Maybe this structure would help, based around explaining the three
vsyscall= modes:

	The vsyscall page is in the high/kernel part of the address
	space. LASS prevents access to this page from userspace. The
	kernel currently forces vsyscall=none mode with LASS.

	Support vsyscall=xonly mode with LASS. <include more content
	here>

	Keep vsyscall=emulate mutually exclusive with LASS. It is hard
	to support because the #GP handler (unlike #PF) doesn't have
	PFEC or CR2 to give information about the fault.
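
The asymmetry described above can be sketched in userspace C. The PFEC bit
positions are the architectural ones; everything else (enum access_kind,
classify_pf(), classify_gp(), and the page-mask helper) is a hypothetical model
for illustration, not the kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VSYSCALL_ADDR 0xffffffffff600000ULL
#define PAGE_MASK_4K  (~0xfffULL)

/* x86 page-fault error code (PFEC) bits used below. */
#define PF_WRITE (1u << 1)   /* access was a write */
#define PF_USER  (1u << 2)   /* access came from user mode */
#define PF_INSTR (1u << 4)   /* access was an instruction fetch */

enum access_kind { ACCESS_FETCH, ACCESS_READ, ACCESS_OTHER };

static bool in_vsyscall_page(uint64_t addr)
{
	return (addr & PAGE_MASK_4K) == VSYSCALL_ADDR;
}

/* #PF: CR2 names the address and PFEC says what kind of access it was. */
static enum access_kind classify_pf(uint32_t pfec, uint64_t cr2)
{
	if (!in_vsyscall_page(cr2) || !(pfec & PF_USER) || (pfec & PF_WRITE))
		return ACCESS_OTHER;
	return (pfec & PF_INSTR) ? ACCESS_FETCH : ACCESS_READ;
}

/*
 * #GP under LASS: only the saved RIP is available. A RIP inside the
 * vsyscall page implies an instruction fetch. A data read of the page
 * arrives with an ordinary RIP and cannot be told apart from any other
 * #GP without decoding the faulting instruction.
 */
static enum access_kind classify_gp(uint64_t rip)
{
	return in_vsyscall_page(rip) ? ACCESS_FETCH : ACCESS_OTHER;
}

/* Example: the same read is visible to #PF but opaque to #GP. */
void example_fault_checks(void)
{
	assert(classify_pf(PF_USER, VSYSCALL_ADDR) == ACCESS_READ);
	assert(classify_gp(0x401000ULL) == ACCESS_OTHER);
}
```

In this model, XONLY mode only ever needs the ACCESS_FETCH answer, which both
handlers can produce. EMULATE mode would also need ACCESS_READ, which
classify_gp() has no way to return.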
Re: [PATCH 3/5] x86/vsyscall: Add vsyscall emulation for #GP
Posted by Sohil Mehta 1 month ago
On 3/3/2026 7:51 AM, Dave Hansen wrote:
> On 2/19/26 15:35, Sohil Mehta wrote:
>> With LASS, these accesses are intercepted before the paging structures
>> are traversed triggering a #GP instead of a #PF. The #GP doesn't provide
>> much information in terms of the error code.
> 
> It's the error code and the address from CR2, right?
> 

Yes.

> I _think_ this is trying to make the point that "emulate" mode is hard
> with LASS. It's hard because it needs to be able to tell the difference
> between a read of the vsyscall page and an instruction fetch from the
> vsyscall page.
> 
> But, the "xonly" mode is far easier because reads are simply disallowed.
> Any time userspace has an RIP pointing to the vsyscall page (with LASS
> enabled), it's assumed to be a vsyscall. Any normal memory reads of the
> vsyscall page get normal #GP handling.
> 
> Is that right?

That is correct.

> 
> BTW, reading it back now, I think the subject is really unfortunate. It
> would be quite easy to read it and infer that this "adds
> vsyscall=emulate for #GP".
> 
> It should probably be:
> 
> 	x86/vsyscall: Restore vsyscall=xonly mode under LASS
> 

Yeah, I realize now that the general "vsyscall emulation" support and the
EMULATE mode of that support are easily confused. Will use your proposed
title.

> Maybe this structure would help, based around explaining the three
> vsyscall= modes:
> 

Sure, combining things from the cover letter and what you suggested
here. How about?


"The vsyscall page is located in the high/kernel part of the address
space. LASS prevents access to this page from userspace. The current
kernel only enables LASS when all vsyscall modes are disabled.

Now add support for LASS when vsyscall=xonly (default) is configured.
With LASS, vsyscall page accesses trigger a #GP instead of a #PF. In
XONLY (execute-only) mode, directly reading the vsyscall page is
disallowed. So, the faulting RIP can be easily used to determine if the
#GP was triggered due to a vsyscall access.

Reuse the #PF emulation code during a #GP and emulate the vsyscall
access in XONLY mode. As multiple fault handlers are now using the
emulation code, add a sanity check to ensure that the fault truly
happened in 64-bit user mode.

In contrast, when vsyscall=emulate (deprecated) is configured, it maps
the vsyscall page as readable. Supporting EMULATE mode with LASS is much
harder because the #GP doesn't provide enough error information (such as
PFEC and CR2 in case of #PF). So, complex instruction decoding would be
required in the #GP handler which isn't worth the effort. LASS and
vsyscall=emulate will be kept mutually exclusive for simplicity."
Re: [PATCH 3/5] x86/vsyscall: Add vsyscall emulation for #GP
Posted by H. Peter Anvin 1 month ago
On 2026-03-03 13:20, Sohil Mehta wrote:
> 
> Sure, combining things from the cover letter and what you suggested
> here. How about?
>

[...]

> 
> "The vsyscall page is located in the high/kernel part of the address
> space. LASS prevents access to this page from userspace. The current
> kernel only enables LASS when all vsyscall modes are disabled.

Suggest making an introductory paragraph here with the background information,
instead of mixing it into the rest of the text in a somewhat incoherent manner:

"vsyscall emulation can be execute-only (XONLY) or read-execute (EMULATE),
specified by the vsyscall= kernel command line option. XONLY mode is the
default. The EMULATE mode has been deprecated since 2022 and is considered
insecure.

This patch adds support for LASS with XONLY vsyscall emulation."

> With LASS, vsyscall page accesses trigger a #GP instead of a #PF. In
> XONLY (execute-only) mode, directly reading the vsyscall page is
> disallowed. So, the faulting RIP can be easily used to determine if the
> #GP was triggered due to a vsyscall access.

How about:

"With LASS, vsyscall page accesses trigger a #GP instead of a #PF. For XONLY
mode, all that is needed is the faulting RIP, which is trivially available
regardless of the type of fault."

> Reuse the #PF emulation code during a #GP and emulate the vsyscall
> access in XONLY mode. As multiple fault handlers are now using the
> emulation code, add a sanity check to ensure that the fault truly
> happened in 64-bit user mode.
> 
> In contrast, when vsyscall=emulate (deprecated) is configured, it maps
> the vsyscall page as readable. Supporting EMULATE mode with LASS is much
> harder because the #GP doesn't provide enough error information (such as
> PFEC and CR2 in case of #PF). So, complex instruction decoding would be
> required in the #GP handler which isn't worth the effort.

"... as remaining users of EMULATE mode can be reasonably assumed to be niche
users, who are already trading off security for compatibility."

Use "EMULATE mode" consistently here. Capitalizing it makes it clear that it
is a term and not just a prose word.

> LASS and
> vsyscall=emulate will be kept mutually exclusive for simplicity."

	-hpa
Re: [PATCH 3/5] x86/vsyscall: Add vsyscall emulation for #GP
Posted by Sohil Mehta 4 weeks, 1 day ago
On 3/3/2026 2:35 PM, H. Peter Anvin wrote:

> Suggest making an introductory paragraph here with the background information,
> instead of mixing it into the rest of the text in a somewhat incoherent manner:
> 

I rewrote the whole thing based on your and Dave's input. I added
sections because it was getting a bit wordy.

x86/vsyscall: Restore vsyscall=xonly mode under LASS

Background
----------
The vsyscall page is located in the high/kernel part of the address
space. Prior to LASS, a vsyscall page access from userspace would always
generate a #PF. The kernel emulates the accesses in the #PF handler and
returns the appropriate values to userspace.

Vsyscall emulation has two modes of operation, specified by the
vsyscall={xonly, emulate} kernel command line option. The vsyscall page
is marked as execute-only in XONLY mode or read-execute in EMULATE mode.
XONLY mode is the default and the only one expected to be commonly used.
The EMULATE mode has been deprecated since 2022 and is considered
insecure.

With LASS, a vsyscall page access triggers a #GP instead of a #PF.
Currently, LASS is only enabled when all vsyscall modes are disabled.

LASS with XONLY mode
--------------------
Now add support for LASS specifically with XONLY vsyscall emulation. For
XONLY mode, all that is needed is the faulting RIP, which is trivially
available regardless of the type of fault. Reuse the #PF emulation code
during the #GP when the fault address points to the vsyscall page.

As multiple fault handlers will now be using the emulation code, add a
sanity check to ensure that the fault truly happened in 64-bit user
mode.

LASS with EMULATE mode
----------------------
Supporting vsyscall=emulate with LASS is much harder because the #GP
doesn't provide enough error information (such as PFEC and CR2 as in
case of a #PF). So, complex instruction decoding would be required to
emulate this mode in the #GP handler.

This isn't worth the effort as remaining users of EMULATE mode can be
reasonably assumed to be niche users, who are already trading off
security for compatibility. LASS and vsyscall=emulate will be kept
mutually exclusive for simplicity.
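
One reason the faulting RIP is sufficient in XONLY mode is that it both
identifies the vsyscall page and selects which legacy call to emulate. The
sketch below is a userspace model patterned after what the kernel's
addr_to_vsyscall_nr() helper is understood to do. The function name
vsyscall_nr_from_rip() is hypothetical, and the layout of three entry points
spaced 1 KiB apart is stated here as an assumption rather than taken from this
patch.

```c
#include <assert.h>
#include <stdint.h>

#define VSYSCALL_ADDR 0xffffffffff600000ULL

/*
 * Map a faulting RIP to a vsyscall slot number, or -1 if the RIP is not
 * exactly one of the assumed entry points (page base + 0, 0x400, 0x800).
 */
static int vsyscall_nr_from_rip(uint64_t rip)
{
	int nr;

	/* Clearing the slot-select bits must leave exactly the page base. */
	if ((rip & ~0xC00ULL) != VSYSCALL_ADDR)
		return -1;

	nr = (int)((rip & 0xC00ULL) >> 10);
	return nr < 3 ? nr : -1;   /* the fourth 1 KiB slot is unused */
}

/* Example: the second entry point maps to slot 1. */
void example_nr_checks(void)
{
	assert(vsyscall_nr_from_rip(VSYSCALL_ADDR + 0x400) == 1);
	assert(vsyscall_nr_from_rip(VSYSCALL_ADDR + 0x401) == -1);
}
```

Because the slot is derived from RIP alone, the same lookup works whether the
fault was delivered as a #PF or as a #GP.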
Re: [PATCH 3/5] x86/vsyscall: Add vsyscall emulation for #GP
Posted by H. Peter Anvin 4 weeks, 1 day ago
On March 4, 2026 4:10:22 PM PST, Sohil Mehta <sohil.mehta@intel.com> wrote:
>On 3/3/2026 2:35 PM, H. Peter Anvin wrote:
>
>> Suggest making an introductory paragraph here with the background information,
>> instead of mixing it into the rest of the text in a somewhat incoherent manner:
>> 
>
>I rewrote the whole thing based on your and Dave's input. I added
>sections because it was getting a bit wordy.
>
>x86/vsyscall: Restore vsyscall=xonly mode under LASS
>
>Background
>----------
>The vsyscall page is located in the high/kernel part of the address
>space. Prior to LASS, a vsyscall page access from userspace would always
>generate a #PF. The kernel emulates the accesses in the #PF handler and
>returns the appropriate values to userspace.
>
>Vsyscall emulation has two modes of operation, specified by the
>vsyscall={xonly, emulate} kernel command line option. The vsyscall page
>is marked as execute-only in XONLY mode or read-execute in EMULATE mode.
>XONLY mode is the default and the only one expected to be commonly used.
>The EMULATE mode has been deprecated since 2022 and is considered
>insecure.
>
>With LASS, a vsyscall page access triggers a #GP instead of a #PF.
>Currently, LASS is only enabled when all vsyscall modes are disabled.
>
>LASS with XONLY mode
>--------------------
>Now add support for LASS specifically with XONLY vsyscall emulation. For
>XONLY mode, all that is needed is the faulting RIP, which is trivially
>available regardless of the type of fault. Reuse the #PF emulation code
>during the #GP when the fault address points to the vsyscall page.
>
>As multiple fault handlers will now be using the emulation code, add a
>sanity check to ensure that the fault truly happened in 64-bit user
>mode.
>
>LASS with EMULATE mode
>----------------------
>Supporting vsyscall=emulate with LASS is much harder because the #GP
>doesn't provide enough error information (such as PFEC and CR2 as in
>case of a #PF). So, complex instruction decoding would be required to
>emulate this mode in the #GP handler.
>
>This isn't worth the effort as remaining users of EMULATE mode can be
>reasonably assumed to be niche users, who are already trading off
>security for compatibility. LASS and vsyscall=emulate will be kept
>mutually exclusive for simplicity.

Other than David's comment this looks great to me too.
Re: [PATCH 3/5] x86/vsyscall: Add vsyscall emulation for #GP
Posted by Dave Hansen 4 weeks, 1 day ago
On 3/4/26 16:10, Sohil Mehta wrote:
> Vsyscall emulation has two modes of operation, specified by the
> vsyscall={xonly, emulate} kernel command line option. The vsyscall page
> is marked as execute-only in XONLY mode or read-execute in EMULATE mode.

Is it really "marked as execute only"? We don't have a real execute-only
paging permission on x86, the closest we've got is memory marked with a
pkey that's got the AccessDisable bit set.

I think it's _called_ execute-only because the kernel makes it behave
like execute-only memory when it's handling the page fault. But I don't
think it is super accurate to say it is "marked" as execute-only.

The rest of it looks great to me, though.
Re: [PATCH 3/5] x86/vsyscall: Add vsyscall emulation for #GP
Posted by H. Peter Anvin 4 weeks, 1 day ago
On March 4, 2026 5:45:48 PM PST, Dave Hansen <dave.hansen@intel.com> wrote:
>On 3/4/26 16:10, Sohil Mehta wrote:
>> Vsyscall emulation has two modes of operation, specified by the
>> vsyscall={xonly, emulate} kernel command line option. The vsyscall page
>> is marked as execute-only in XONLY mode or read-execute in EMULATE mode.
>
>Is it really "marked as execute only"? We don't have a real execute-only
>paging permission on x86, the closest we've got is memory marked with a
>pkey that's got the AccessDisable bit set.
>
>I think it's _called_ execute-only because the kernel makes it behave
>like execute-only memory when it's handling the page fault. But I don't
>think it is super accurate to say it is "marked" as execute-only.
>
>The rest of it looks great to me, though.

It's not marked as anything; it is in fact not present.
Re: [PATCH 3/5] x86/vsyscall: Add vsyscall emulation for #GP
Posted by Sohil Mehta 4 weeks, 1 day ago
On 3/4/2026 10:31 PM, H. Peter Anvin wrote:
> On March 4, 2026 5:45:48 PM PST, Dave Hansen <dave.hansen@intel.com> wrote:
>> On 3/4/26 16:10, Sohil Mehta wrote:
>>> Vsyscall emulation has two modes of operation, specified by the
>>> vsyscall={xonly, emulate} kernel command line option. The vsyscall page
>>> is marked as execute-only in XONLY mode or read-execute in EMULATE mode.
>>
>> Is it really "marked as execute only"? We don't have a real execute-only
>> paging permission on x86, the closest we've got is memory marked with a
>> pkey that's got the AccessDisable bit set.
>>
>> I think it's _called_ execute-only because the kernel makes it behave
>> like execute-only memory when it's handling the page fault. But I don't
>> think it is super accurate to say it is "marked" as execute-only.
>>

Sorry about the wording. I should have looked at map_vsyscall() more
carefully. As Peter said, the page is not even present in XONLY mode, and
in EMULATE mode the PTE is marked as _USR but __NX.

How about using "behaves" instead of marked? Or I could use "emulated"
if you prefer that.

So the above paragraph would be:

Vsyscall emulation has two modes of operation, specified by the
vsyscall={xonly, emulate} kernel command line option. The vsyscall page
behaves as execute-only in XONLY mode and as read-execute in EMULATE mode.