For patching, the kernel initializes a temporary mm area in the lower
half of the address range. LASS blocks these accesses because its
enforcement relies on bit 63 of the virtual address as opposed to SMAP
which depends on the _PAGE_BIT_USER bit in the page table. Disable LASS
enforcement by toggling the RFLAGS.AC bit during patching to avoid
triggering a #GP fault.
Introduce LASS-specific STAC/CLAC helpers to set the AC bit only on
platforms that need it. Clarify the usage of the new helpers versus the
existing stac()/clac() helpers for SMAP.
The text poking functions use standard memcpy()/memset() while patching
kernel code. However, objtool complains about calling such dynamic
functions within an AC=1 region. See warning #9, regarding function
calls with UACCESS enabled, in tools/objtool/Documentation/objtool.txt.
To pacify objtool, one option is to add memcpy() and memset() to the
list of allowed-functions. However, that would provide a blanket
exemption for all usages of memcpy() and memset(). Instead, replace the
standard calls in the text poking functions with their unoptimized,
always-inlined versions. Considering that patching is usually small,
there is no performance impact expected.
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
---
v11:
- Use lass_enable()/lass_disable() naming.
- Improve commit log and code comments.
v10:
- Revert to the inline functions instead of open-coding in assembly.
- Simplify code comments.
---
arch/x86/include/asm/smap.h | 41 +++++++++++++++++++++++++++++++++--
arch/x86/kernel/alternative.c | 18 +++++++++++++--
2 files changed, 55 insertions(+), 4 deletions(-)
diff --git a/arch/x86/include/asm/smap.h b/arch/x86/include/asm/smap.h
index 4f84d421d1cf..90f178c78f9c 100644
--- a/arch/x86/include/asm/smap.h
+++ b/arch/x86/include/asm/smap.h
@@ -23,18 +23,55 @@
#else /* __ASSEMBLER__ */
+/*
+ * The CLAC/STAC instructions toggle the enforcement of
+ * X86_FEATURE_SMAP along with X86_FEATURE_LASS.
+ *
+ * SMAP enforcement is based on the _PAGE_BIT_USER bit in the page
+ * tables. The kernel is not allowed to touch pages with that bit set
+ * unless the AC bit is set.
+ *
+ * Use stac()/clac() when accessing userspace (_PAGE_USER) mappings,
+ * regardless of location.
+ *
+ * Note: a barrier is implicit in alternative().
+ */
+
static __always_inline void clac(void)
{
- /* Note: a barrier is implicit in alternative() */
alternative("", "clac", X86_FEATURE_SMAP);
}
static __always_inline void stac(void)
{
- /* Note: a barrier is implicit in alternative() */
alternative("", "stac", X86_FEATURE_SMAP);
}
+/*
+ * LASS enforcement is based on bit 63 of the virtual address. The
+ * kernel is not allowed to touch memory in the lower half of the
+ * virtual address space.
+ *
+ * Use lass_disable()/lass_enable() to toggle the AC bit for kernel data
+ * accesses (!_PAGE_USER) that are blocked by LASS, but not by SMAP.
+ *
+ * Even with the AC bit set, LASS will continue to block instruction
+ * fetches from the user half of the address space. To allow those,
+ * clear CR4.LASS to disable the LASS mechanism entirely.
+ *
+ * Note: a barrier is implicit in alternative().
+ */
+
+static __always_inline void lass_enable(void)
+{
+ alternative("", "clac", X86_FEATURE_LASS);
+}
+
+static __always_inline void lass_disable(void)
+{
+ alternative("", "stac", X86_FEATURE_LASS);
+}
+
static __always_inline unsigned long smap_save(void)
{
unsigned long flags;
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 8ee5ff547357..b38dbf08d5cd 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -2469,16 +2469,30 @@ void __init_or_module text_poke_early(void *addr, const void *opcode,
__ro_after_init struct mm_struct *text_poke_mm;
__ro_after_init unsigned long text_poke_mm_addr;
+/*
+ * Text poking creates and uses a mapping in the lower half of the
+ * address space. Relax LASS enforcement when accessing the poking
+ * address.
+ *
+ * objtool enforces a strict policy of "no function calls within AC=1
+ * regions". Adhere to the policy by using inline versions of
+ * memcpy()/memset() that will never result in a function call.
+ */
+
static void text_poke_memcpy(void *dst, const void *src, size_t len)
{
- memcpy(dst, src, len);
+ lass_disable();
+ __inline_memcpy(dst, src, len);
+ lass_enable();
}
static void text_poke_memset(void *dst, const void *src, size_t len)
{
int c = *(const int *)src;
- memset(dst, c, len);
+ lass_disable();
+ __inline_memset(dst, c, len);
+ lass_enable();
}
typedef void text_poke_f(void *dst, const void *src, size_t len);
--
2.43.0
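The distinction the commit message draws between SMAP and LASS can be sketched as a small userspace model. This is only an illustration under stated assumptions: struct kernel_access, smap_blocks(), lass_blocks() and text_poke_model_check() are hypothetical names that do not exist in the kernel, and the model covers only explicit supervisor-mode accesses as described in the patch comments above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one supervisor-mode access; not a kernel type. */
struct kernel_access {
	uint64_t vaddr;   /* virtual address being accessed */
	bool page_user;   /* _PAGE_USER is set in the page table entry */
	bool ac;          /* RFLAGS.AC at the time of the access */
	bool ifetch;      /* instruction fetch rather than data access */
};

/* SMAP keys off the page table: user pages are blocked unless AC=1. */
static bool smap_blocks(const struct kernel_access *a)
{
	return !a->ifetch && a->page_user && !a->ac;
}

/*
 * LASS keys off bit 63 of the address: the lower half is blocked.
 * AC=1 exempts data accesses only, never instruction fetches.
 */
static bool lass_blocks(const struct kernel_access *a)
{
	bool lower_half = !(a->vaddr >> 63);

	if (!lower_half)
		return false;
	return a->ifetch || !a->ac;
}

/* Text poking: a kernel (!_PAGE_USER) page mapped in the lower half. */
static void text_poke_model_check(void)
{
	/* The address is an arbitrary lower-half value for illustration. */
	struct kernel_access poke = { .vaddr = 0x00007f0000001000ULL };

	assert(!smap_blocks(&poke));  /* SMAP does not care: not a user page */
	assert(lass_blocks(&poke));   /* LASS does: bit 63 is clear */

	poke.ac = true;               /* what lass_disable() achieves */
	assert(!lass_blocks(&poke));

	poke.ifetch = true;           /* AC does not help instruction fetches */
	assert(lass_blocks(&poke));
}
```

In this model the poking address is rejected by LASS but not by SMAP, which is why the patch keys the new helpers on X86_FEATURE_LASS rather than reusing stac()/clac().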
Hi Boris,
On 10/29/2025 2:03 PM, Sohil Mehta wrote:
> +/*
> + * LASS enforcement is based on bit 63 of the virtual address. The
> + * kernel is not allowed to touch memory in the lower half of the
> + * virtual address space.
> + *
> + * Use lass_disable()/lass_enable() to toggle the AC bit for kernel data
> + * accesses (!_PAGE_USER) that are blocked by LASS, but not by SMAP.
> + *
> + * Even with the AC bit set, LASS will continue to block instruction
> + * fetches from the user half of the address space. To allow those,
> + * clear CR4.LASS to disable the LASS mechanism entirely.
> + *
Based on the EFI discussion, it looks like we would now need to toggle
CR4.LASS every time we switch to efi_mm. The lass_enable()/_disable()
naming would be more suitable for those wrappers.
I am thinking of reverting this back to lass_clac()/lass_stac().
lass_clac()/_stac():
Disable enforcement for kernel data accesses similar to SMAP.
lass_enable()/_disable():
Disable the entire LASS mechanism (Data and instruction fetch)
by toggling CR4.LASS
Would that work? Any other suggestions?
> +
> +static __always_inline void lass_enable(void)
> +{
> + alternative("", "clac", X86_FEATURE_LASS);
> +}
> +
> +static __always_inline void lass_disable(void)
> +{
> + alternative("", "stac", X86_FEATURE_LASS);
> +}
> +
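The naming split proposed in this reply could look roughly like the following userspace model. It is a sketch, not the kernel implementation: struct cpu_model and the two *_allowed() predicates are hypothetical, the helpers take a model pointer that the real helpers would not, and both predicates describe accesses to the lower half of the address space only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant CPU state; not a kernel type. */
struct cpu_model {
	bool ac;        /* models RFLAGS.AC */
	bool cr4_lass;  /* models CR4.LASS */
};

/* AC-based pair: relaxes enforcement for kernel data accesses only. */
static inline void lass_stac(struct cpu_model *cpu) { cpu->ac = true; }
static inline void lass_clac(struct cpu_model *cpu) { cpu->ac = false; }

/* CR4-based pair: switches the whole mechanism off and back on. */
static inline void lass_disable(struct cpu_model *cpu) { cpu->cr4_lass = false; }
static inline void lass_enable(struct cpu_model *cpu) { cpu->cr4_lass = true; }

/* Lower-half data access: allowed if LASS is off or AC is set. */
static bool data_access_allowed(const struct cpu_model *cpu)
{
	return !cpu->cr4_lass || cpu->ac;
}

/* Lower-half instruction fetch: allowed only if LASS is off. */
static bool ifetch_allowed(const struct cpu_model *cpu)
{
	return !cpu->cr4_lass;
}

static void naming_split_check(void)
{
	struct cpu_model cpu = { .ac = false, .cr4_lass = true };

	lass_stac(&cpu);
	assert(data_access_allowed(&cpu));
	assert(!ifetch_allowed(&cpu));  /* AC alone is not enough */
	lass_clac(&cpu);

	lass_disable(&cpu);
	assert(ifetch_allowed(&cpu));   /* needs the CR4 toggle */
	lass_enable(&cpu);
}
```

The point of the split is visible in naming_split_check(): the stac/clac pair never makes a lower-half instruction fetch legal, so only the CR4-based pair deserves the enable/disable names.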
On Mon, 10 Nov 2025 at 19:15, Sohil Mehta <sohil.mehta@intel.com> wrote:
>
> Hi Boris,
>
> On 10/29/2025 2:03 PM, Sohil Mehta wrote:
> > +/*
> > + * LASS enforcement is based on bit 63 of the virtual address. The
> > + * kernel is not allowed to touch memory in the lower half of the
> > + * virtual address space.
> > + *
> > + * Use lass_disable()/lass_enable() to toggle the AC bit for kernel data
> > + * accesses (!_PAGE_USER) that are blocked by LASS, but not by SMAP.
> > + *
> > + * Even with the AC bit set, LASS will continue to block instruction
> > + * fetches from the user half of the address space. To allow those,
> > + * clear CR4.LASS to disable the LASS mechanism entirely.
> > + *
>
> Based on the EFI discussion,
Which discussion is that?
> it looks like we would now need to toggle
> CR4.LASS every time we switch to efi_mm. The lass_enable()/_disable()
> naming would be more suitable for those wrappers.
>
Note that Linux/x86 uses SetVirtualAddressMap() to remap all EFI
runtime regions into the upper [kernel] half of the address space.
SetVirtualAddressMap() itself is a terrible idea, but given that we
are already stuck with it, we should be able to rely on ordinary EFI
runtime calls to only execute from the upper address range. The only
exception is the call to SetVirtualAddressMap() itself, which occurs
only once during early boot.
On 11/12/25 05:56, Ard Biesheuvel wrote:
...
>> it looks like we would now need to toggle
>> CR4.LASS every time we switch to efi_mm. The lass_enable()/_disable()
>> naming would be more suitable for those wrappers.
>>
> Note that Linux/x86 uses SetVirtualAddressMap() to remap all EFI
> runtime regions into the upper [kernel] half of the address space.
>
> SetVirtualAddressMap() itself is a terrible idea, but given that we
> are already stuck with it, we should be able to rely on ordinary EFI
> runtime calls to only execute from the upper address range. The only
> exception is the call to SetVirtualAddressMap() itself, which occurs
> only once during early boot.
Gah, I had it in my head that we needed to use the lower mapping at
runtime. The efi_mm gets used for that SetVirtualAddressMap() and the
efi_mm continues to get used at runtime. So I think I just assumed that
the lower mappings needed to get used too.
Thanks for the education!
Let's say we simply delayed CR4.LASS=1 until later in boot. Could we
completely ignore LASS during EFI calls, since the calls only use the
upper address range?
Also, in practice, are there buggy EFI implementations that use the
lower address range even though they're not supposed to? *If* we just
keep LASS on for these calls is there a chance it will cause a
regression in some buggy EFI implementations?
On November 12, 2025 6:51:45 AM PST, Dave Hansen <dave.hansen@intel.com> wrote:
>On 11/12/25 05:56, Ard Biesheuvel wrote:
>...
>>> it looks like we would now need to toggle
>>> CR4.LASS every time we switch to efi_mm. The lass_enable()/_disable()
>>> naming would be more suitable for those wrappers.
>>>
>> Note that Linux/x86 uses SetVirtualAddressMap() to remap all EFI
>> runtime regions into the upper [kernel] half of the address space.
>>
>> SetVirtualAddressMap() itself is a terrible idea, but given that we
>> are already stuck with it, we should be able to rely on ordinary EFI
>> runtime calls to only execute from the upper address range. The only
>> exception is the call to SetVirtualAddressMap() itself, which occurs
>> only once during early boot.
>
>Gah, I had it in my head that we needed to use the lower mapping at
>runtime. The efi_mm gets used for that SetVirtualAddressMap() and the
>efi_mm continues to get used at runtime. So I think I just assumed that
>the lower mappings needed to get used too.
>
>Thanks for the education!
>
>Let's say we simply delayed CR4.LASS=1 until later in boot. Could we
>completely ignore LASS during EFI calls, since the calls only use the
>upper address range?
>
>Also, in practice, are there buggy EFI implementations that use the
>lower address range even though they're not supposed to? *If* we just
>keep LASS on for these calls is there a chance it will cause a
>regression in some buggy EFI implementations?
Yes, they are. And there are buggy ones which die if set up with
virtual addresses in the low half.
On Wed, 12 Nov 2025 at 15:58, H. Peter Anvin <hpa@zytor.com> wrote:
>
> On November 12, 2025 6:51:45 AM PST, Dave Hansen <dave.hansen@intel.com> wrote:
> >Also, in practice, are there buggy EFI implementations that use the
> >lower address range even though they're not supposed to? *If* we just
> >keep LASS on for these calls is there a chance it will cause a
> >regression in some buggy EFI implementations?
>
> Yes, they are. And there are buggy ones which die if set up with virtual addresses in the low half.
To elaborate on that, there are systems where
a) not calling SetVirtualAddressMap() crashes the firmware, because,
in spite of being clearly documented as optional, not calling it
results in some event hook not being called, causing the firmware to
misbehave
b) calling SetVirtualAddressMap() with an 1:1 mapping crashes the
firmware (and so this is not a possible workaround for a))
c) calling SetVirtualAddressMap() crashes the firmware when not both
the old 1:1 and the new kernel mapping are already live (which
violates the UEFI spec)
d) calling SetVirtualAddressMap() does not result in all 1:1
references being converted to the new mapping.
To address d), the x86_64 implementation of efi_map_region() indeed
maps an 1:1 alias of each remapped runtime regions, so that stray
accesses don't fault. But the code addresses are all remapped, and so
the firmware routines are always invoked via their remapped aliases in
the kernel VA space. Not calling SetVirtualAddressMap() at all, or
calling it with a 1:1 mapping is not feasible, essentially because
Windows doesn't do that, and that is the only thing that is tested on
all x86 PCs by the respective OEMs.
Given that remapping the code is dealt with by the firmware's PE/COFF
loader, whereas remapping [dynamically allocated] data requires effort
on the part of the programmer, I'd hazard a guess that 99.9% of those
bugs do not involve attempts to execute via the lower mapping, but
stray references to data objects that were not remapped properly.
So we might consider
a) remapping those 1:1 aliases NX, so we don't have those patches of
RWX memory around
b) keeping LASS enabled during ordinary EFI runtime calls, as you suggest.
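Case d) and the 1:1 alias can be pictured with a small userspace sketch. Everything here is hypothetical and for illustration only: struct rt_region, in_low_alias(), convert_pointer() and is_lower_half() are made-up names, the addresses are arbitrary, and the code does not reflect how efi_map_region() or the firmware is actually implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one EFI runtime region mapped twice. */
struct rt_region {
	uint64_t phys;  /* base of the 1:1 alias (lower half) */
	uint64_t virt;  /* base of the remapped alias in kernel VA space */
	uint64_t size;  /* length of the region in bytes */
};

/* True if p still points into the 1:1 alias of the region. */
static bool in_low_alias(const struct rt_region *r, uint64_t p)
{
	return p >= r->phys && p - r->phys < r->size;
}

/* What a correct firmware fixup would do with each stored pointer. */
static uint64_t convert_pointer(const struct rt_region *r, uint64_t p)
{
	return in_low_alias(r, p) ? r->virt + (p - r->phys) : p;
}

/* LASS looks only at bit 63 of the address. */
static bool is_lower_half(uint64_t p)
{
	return !(p >> 63);
}

static void stray_reference_check(void)
{
	/* Arbitrary example values, not real EFI mapping addresses. */
	struct rt_region r = {
		.phys = 0x000000007f000000ULL,
		.virt = 0xffffffef00000000ULL,
		.size = 0x10000,
	};
	uint64_t stray = r.phys + 0x40;  /* pointer the firmware forgot */

	assert(is_lower_half(stray));    /* a data access here trips LASS */
	assert(!is_lower_half(convert_pointer(&r, stray)));
}
```

A pointer that was never converted keeps resolving through the 1:1 alias, so it stays in the lower half, which is the stray data access case that LASS data enforcement would catch during a runtime call.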
On November 12, 2025 7:18:33 AM PST, Ard Biesheuvel <ardb@kernel.org> wrote:
>Given that remapping the code is dealt with by the firmware's PE/COFF
>loader, whereas remapping [dynamically allocated] data requires effort
>on the part of the programmer, I'd hazard a guess that 99.9% of those
>bugs do not involve attempts to execute via the lower mapping, but
>stray references to data objects that were not remapped properly.
>
>So we might consider
>a) remapping those 1:1 aliases NX, so we don't have those patches of
>RWX memory around
>b) keeping LASS enabled during ordinary EFI runtime calls, as you suggest.
Unless someone has a code pointer in their code.
On Wed, 12 Nov 2025 at 16:23, H. Peter Anvin <hpa@zytor.com> wrote:
>
> On November 12, 2025 7:18:33 AM PST, Ard Biesheuvel <ardb@kernel.org> wrote:
> >So we might consider
> >a) remapping those 1:1 aliases NX, so we don't have those patches of
> >RWX memory around
> >b) keeping LASS enabled during ordinary EFI runtime calls, as you suggest.
>
> Unless someone has a code pointer in their code.
That is a good point, especially because the EFI universe is
constructed out of GUIDs and so-called protocols, which are just
structs with function pointers.
However, EFI protocols are only supported at boot time, and the
runtime execution context is much more restricted. So I'd still expect
the code pointer case to be much less likely.
On 11/12/2025 7:28 AM, Ard Biesheuvel wrote:
>>> d) calling SetVirtualAddressMap() does not result in all 1:1
>>> references being converted to the new mapping.
>>>
>>>
>>> To address d), the x86_64 implementation of efi_map_region() indeed
>>> maps an 1:1 alias of each remapped runtime regions, so that stray
>>> accesses don't fault. But the code addresses are all remapped, and so
>>> the firmware routines are always invoked via their remapped aliases in
>>> the kernel VA space. Not calling SetVirtualAddressMap() at all, or
>>> calling it with a 1:1 mapping is not feasible, essentially because
>>> Windows doesn't do that, and that is the only thing that is tested on
>>> all x86 PCs by the respective OEMs.
>>>
>>> Given that remapping the code is dealt with by the firmware's PE/COFF
>>> loader, whereas remapping [dynamically allocated] data requires effort
>>> on the part of the programmer, I'd hazard a guess that 99.9% of those
>>> bugs do not involve attempts to execute via the lower mapping, but
>>> stray references to data objects that were not remapped properly.
>>>
>>> So we might consider
>>> a) remapping those 1:1 aliases NX, so we don't have those patches of
>>> RWX memory around
>>> b) keeping LASS enabled during ordinary EFI runtime calls, as you suggest.
>>
>> Unless someone has a code pointer in their code.
>
> That is a good point, especially because the EFI universe is
> constructed out of GUIDs and so-called protocols, which are just
> structs with function pointers.
>
> However, EFI protocols are only supported at boot time, and the
> runtime execution context is much more restricted. So I'd still expect
> the code pointer case to be much less likely.
But, that still leaves the stray data accesses. We would still need to
disable the LASS data access enforcement by toggling RFLAGS.AC during
the runtime calls.
Can we rely on EFI to not mess up RFLAGS and keep the AC bit intact?
On November 12, 2025 8:18:20 AM PST, Sohil Mehta <sohil.mehta@intel.com> wrote:
>But, that still leaves the stray data accesses. We would still need to
>disable the LASS data access enforcement by toggling RFLAGS.AC during
>the runtime calls.
>
>Can we rely on EFI to not mess up RFLAGS and keep the AC bit intact?
Let's not muck with this now; it is largely pointless and as you can
see it's a rathole.
On November 12, 2025 8:18:20 AM PST, Sohil Mehta <sohil.mehta@intel.com> wrote:
>Can we rely on EFI to not mess up RFLAGS and keep the AC bit intact?
No.
On November 12, 2025 7:28:12 AM PST, Ard Biesheuvel <ardb@kernel.org> wrote:
>On Wed, 12 Nov 2025 at 16:23, H. Peter Anvin <hpa@zytor.com> wrote:
>> Unless someone has a code pointer in their code.
>
>That is a good point, especially because the EFI universe is
>constructed out of GUIDs and so-called protocols, which are just
>structs with function pointers.
>
>However, EFI protocols are only supported at boot time, and the
>runtime execution context is much more restricted. So I'd still expect
>the code pointer case to be much less likely.
Yes, but it only takes one.
The main thing, though, is that this is being bikeshedded for no good
reason: there isn't much to be had from trying to narrow down from what
we have now, other than restricting the *upper* mapping further. And
this has nothing to do with LASS.
On Mon, Nov 10, 2025 at 10:15:23AM -0800, Sohil Mehta wrote:
> lass_clac()/_stac():
> Disable enforcement for kernel data accesses similar to SMAP.
>
> lass_enable()/_disable():
> Disable the entire LASS mechanism (Data and instruction fetch)
> by toggling CR4.LASS
>
> Would that work? Any other suggestions?
Sure, as long as they're documented. And if we decide to change them later for
whatever reason, we can. More than enough bikeshedding we did here.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On November 10, 2025 10:15:23 AM PST, Sohil Mehta <sohil.mehta@intel.com> wrote:
>Hi Boris,
>
>On 10/29/2025 2:03 PM, Sohil Mehta wrote:
>> +/*
>> + * LASS enforcement is based on bit 63 of the virtual address. The
>> + * kernel is not allowed to touch memory in the lower half of the
>> + * virtual address space.
>> + *
>> + * Use lass_disable()/lass_enable() to toggle the AC bit for kernel data
>> + * accesses (!_PAGE_USER) that are blocked by LASS, but not by SMAP.
>> + *
>> + * Even with the AC bit set, LASS will continue to block instruction
>> + * fetches from the user half of the address space. To allow those,
>> + * clear CR4.LASS to disable the LASS mechanism entirely.
>> + *
>
>Based on the EFI discussion, it looks like we would now need to toggle
>CR4.LASS every time we switch to efi_mm. The lass_enable()/_disable()
>naming would be more suitable for those wrappers.
>
>I am thinking of reverting this back to lass_clac()/lass_stac().
>
>lass_clac()/_stac():
> Disable enforcement for kernel data accesses similar to SMAP.
>
>lass_enable()/_disable():
> Disable the entire LASS mechanism (Data and instruction fetch)
> by toggling CR4.LASS
>
>Would that work? Any other suggestions?
>
>
>> +
>> +static __always_inline void lass_enable(void)
>> +{
>> + alternative("", "clac", X86_FEATURE_LASS);
>> +}
>> +
>> +static __always_inline void lass_disable(void)
>> +{
>> + alternative("", "stac", X86_FEATURE_LASS);
>> +}
>> +
That would be my suggestion for naming, too.
On 10/29/25 14:03, Sohil Mehta wrote:
> Introduce LASS-specific STAC/CLAC helpers to set the AC bit only on
> platforms that need it. Clarify the usage of the new helpers versus the
> existing stac()/clac() helpers for SMAP.
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
One review nit: The
- /* Note: a barrier is implicit in alternative() */
looks a bit funky in the diffstat. It took me a minute to realize that
you'd moved it.
I _probably_ would have specifically called out that you *added*
comments for stac()/clac() and moved an existing duplicated comment
there. Adding a whole new comment block deserves calling out explicitly.
It is far beyond the "clarify" that's in the changelog.
But it's just a nit in the end.