[v11] x86: Enable Linear Address Space Separation support

[PATCH v11 5/9] x86/efi: Disable LASS while mapping the EFI runtime services

Posted by Sohil Mehta 3 months, 1 week ago

From: Alexander Shishkin <alexander.shishkin@linux.intel.com>

While mapping EFI runtime services, set_virtual_address_map() is called
at its lower mapping, which LASS prohibits. Wrapping the EFI call with
lass_disable()/_enable() is not enough, because the AC flag only
controls data accesses, and not instruction fetches.

Use the big hammer and toggle the CR4.LASS bit to make this work.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
---
v11:
 - No change.

v10:
 - Reword code comments
---
 arch/x86/platform/efi/efi.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 463b784499a8..ad9f76f90581 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -786,8 +786,8 @@ static void __init __efi_enter_virtual_mode(void)
 {
 	int count = 0, pg_shift = 0;
 	void *new_memmap = NULL;
+	unsigned long pa, lass;
 	efi_status_t status;
-	unsigned long pa;
 
 	if (efi_alloc_page_tables()) {
 		pr_err("Failed to allocate EFI page tables\n");
@@ -825,11 +825,23 @@ static void __init __efi_enter_virtual_mode(void)
 
 	efi_sync_low_kernel_mappings();
 
+	/*
+	 * LASS complains because set_virtual_address_map() is located
+	 * at a lower address. To pause enforcement, flipping RFLAGS.AC
+	 * is not sufficient, as it only permits data access and not
+	 * instruction fetch. Disable the entire LASS mechanism.
+	 */
+	lass = cr4_read_shadow() & X86_CR4_LASS;
+	cr4_clear_bits(lass);
+
 	status = efi_set_virtual_address_map(efi.memmap.desc_size * count,
 					     efi.memmap.desc_size,
 					     efi.memmap.desc_version,
 					     (efi_memory_desc_t *)pa,
 					     efi_systab_phys);
+
+	cr4_set_bits(lass);
+
 	if (status != EFI_SUCCESS) {
 		pr_err("Unable to switch EFI into virtual mode (status=%lx)!\n",
 		       status);
-- 
2.43.0
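
[Editor's sketch] The commit message's point that the AC flag is not enough can be modelled in a few lines of userspace C. This is only a simplified model, not kernel code: struct sim_cpu, sim_lass_permits() and the other sim_* names are hypothetical, and the model ignores details such as the dependency on CR4.SMAP. It assumes a "lower" address is one with bit 63 clear.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified CPU state: only the two bits this discussion cares about. */
struct sim_cpu {
	bool cr4_lass;  /* models CR4.LASS */
	bool rflags_ac; /* models RFLAGS.AC (set by STAC, cleared by CLAC) */
};

enum sim_access { SIM_DATA, SIM_FETCH };

/* In this model a "lower half" address is one with bit 63 clear. */
bool sim_is_low_address(uint64_t addr)
{
	return (addr >> 63) == 0;
}

/* Would a supervisor-mode access be allowed under the modelled rules? */
bool sim_lass_permits(const struct sim_cpu *cpu, uint64_t addr,
		      enum sim_access kind)
{
	if (!cpu->cr4_lass)
		return true;	/* LASS is off entirely */
	if (!sim_is_low_address(addr))
		return true;	/* kernel-half addresses are not restricted */
	if (kind == SIM_DATA && cpu->rflags_ac)
		return true;	/* AC only relaxes data accesses */
	return false;		/* fetches from low addresses stay blocked */
}
```

With cr4_lass set, setting rflags_ac makes sim_lass_permits() accept a SIM_DATA access to a low address but not a SIM_FETCH, which is why the patch clears CR4.LASS rather than relying on lass_disable()/_enable().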

Re: [PATCH v11 5/9] x86/efi: Disable LASS while mapping the EFI runtime services

Posted by Dave Hansen 3 months, 1 week ago

On 10/29/25 14:03, Sohil Mehta wrote:
> From: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> 
> While mapping EFI runtime services, set_virtual_address_map() is called
> at its lower mapping, which LASS prohibits. Wrapping the EFI call with
> lass_disable()/_enable() is not enough, because the AC flag only
> controls data accesses, and not instruction fetches.
> 
> Use the big hammer and toggle the CR4.LASS bit to make this work.

One thing that's actually missing here is an explanation on how it's OK
to munge CR bits here. Why are preemption and interrupts not a problem?

A reviewer would have to go off and figure this out on their own.

Re: [PATCH v11 5/9] x86/efi: Disable LASS while mapping the EFI runtime services

Posted by Sohil Mehta 3 months, 1 week ago

On 10/31/2025 10:11 AM, Dave Hansen wrote:
> One thing that's actually missing here is an explanation on how it's OK
> to munge CR bits here. Why are preemption and interrupts not a problem?
> 

This is called pretty early on from the BSP init flow.

start_kernel()
  arch_cpu_finalize_init()
    efi_enter_virtual_mode()
      __efi_enter_virtual_mode()

I had assumed we run with interrupts disabled. But, that's not true.
Interrupts are enabled midway during start_kernel(). So,
arch_cpu_finalize_init() is called with interrupts enabled.

We write to CR bits during FPU init which happens right before EFI
enters virtual mode. So I am probably missing something obvious that
makes it okay.

Re: [PATCH v11 5/9] x86/efi: Disable LASS while mapping the EFI runtime services

Posted by Andy Lutomirski 3 months, 1 week ago

On Fri, Oct 31, 2025, at 10:11 AM, Dave Hansen wrote:
> On 10/29/25 14:03, Sohil Mehta wrote:
>> From: Alexander Shishkin <alexander.shishkin@linux.intel.com>
>> 
>> While mapping EFI runtime services, set_virtual_address_map() is called
>> at its lower mapping, which LASS prohibits. Wrapping the EFI call with
>> lass_disable()/_enable() is not enough, because the AC flag only
>> controls data accesses, and not instruction fetches.
>> 
>> Use the big hammer and toggle the CR4.LASS bit to make this work.
>
> One thing that's actually missing here is an explanation on how it's OK
> to munge CR bits here. Why are preemption and interrupts not a problem?
>
> A reviewer would have to go off and figure this out on their own.

I have another question: why is this one specific call a problem as opposed to something more general?  Wouldn’t any EFI call that touches the low EFI mapping be a problem?  Are there any odd code paths that touch low mapped EFI *data* that would fault?

Am I imagining an issue that doesn’t exist?  Is there some way to be reasonably convinced that you haven’t missed another EFI code path?  Would it be ridiculous to defer enabling LASS until we’re almost ready to run user code?

Re: [PATCH v11 5/9] x86/efi: Disable LASS while mapping the EFI runtime services

Posted by Sohil Mehta 3 months, 1 week ago

On 10/31/2025 10:38 AM, Andy Lutomirski wrote:

> I have another question: why is this one specific call a problem as opposed to something more general?  Wouldn’t any EFI call that touches the low EFI mapping be a problem?  Are there any odd code paths that touch low mapped EFI *data* that would fault?
> 

I assumed EFI is running in physical mode before this.
efi_sync_low_kernel_mappings() is called right before calling
set_virtual_address_map(). So, this is the only call that happens at the
low mapping while switching to virtual mode.

But, my EFI knowledge is fairly limited. I am realizing that there are
some assumptions built into this patch that I may not be aware of.

> Is there some way to be reasonably convinced that you haven’t missed another EFI code path?

We have been running the patches on internal test platforms for a couple
of years. But, that would only cover the common paths. I'll dig deeper
to get you a convincing answer.

Re: [PATCH v11 5/9] x86/efi: Disable LASS while mapping the EFI runtime services

Posted by Sohil Mehta 3 months ago

On 10/31/2025 12:04 PM, Sohil Mehta wrote:
>> Is there some way to be reasonably convinced that you haven’t missed another EFI code path?
>
> We have been running the patches on internal test platforms for a couple
> of years. But, that would only cover the common paths. I'll dig deeper
> to get you a convincing answer.

In summary, the current approach could work for BIOSes that behave well.
But, the kernel makes lots of exceptions for broken firmware and odd
implementations. We would need extra guardrails and changes to support
those, or mark them unsupported. Please see my analysis below.

For now, I am wondering if we should avoid enabling LASS whenever EFI
support is built in (similar to vsyscall).

if (IS_ENABLED(CONFIG_EFI))
	// Do not enable LASS

I think the rest of the patches are ready. I can post a new revision
with the above change to collect additional reviews/acks. Even though
this would significantly restrict usage, it would make it easier to
review EFI support (as well as vsyscall support) in its own independent,
focused series.

My analysis
-----------
After a 1-week crash course in EFI (mainly reading lkml archives), my
understanding is below. Thanks Rick and Peter Anvin for the pointers and
insights. I would highly appreciate it if folks can validate assumptions
and help with some open questions.

1) Does LASS affect EFI BootTimeServices?

Contrary to my assumption, EFI_BOOT_SERVICES_CODE/_DATA could be
accessed even after ExitBootServices() has been called. For example,
early ACPI code in efi_bgrt_init() accesses it.

efi_check_for_embedded_firmwares() accesses this memory even after
SetVirtualAddressMap() has been called right before
efi_free_boot_services().

At a minimum, we need to disable LASS around these special cases or
enable LASS only after EFI has completely finished entering virtual mode
(including freeing boot services).

Ideally, we would enable LASS much later, right before enabling userspace.

2) How does SetVirtualAddressMap() impact LASS?

SetVirtualAddressMap() is the first and only runtime service call that
is made in EFI physical mode (at the lower mapping). After the call,
firmware is expected to switch all its pointers to the high virtual
address provided by the kernel.

If LASS is enabled, it needs to be temporarily turned off during
SetVirtualAddressMap() as done in this patch. Though, the resolution in
#1 would likely make this patch moot.

3) Would LASS interfere with other runtime services?

Unfortunately, some firmware tends to cling to the old physical
addresses even after SetVirtualAddressMap() and doesn't completely
switch over to using the new virtual addresses. To work around this, the
kernel dual maps all the memory marked as EFI_RUNTIME under a separate
efi_mm: first with a 1:1 map and second with the high virtual address.
See efi_map_region().

Also, some runtime services expect to access the first 4 KB of physical
memory, which is also mapped 1:1 to avoid failures.

To avoid any of these corner cases, LASS must be toggled every time we
make a runtime EFI call. Because efi_mm doesn't have real user mappings,
disabling LASS after efi_enter_mm() should be fine.

I am unsure whether the accesses are only data accesses, or whether
instruction fetches could happen as well. Based on that, we would need
either a STAC/CLAC pair or a CR4.LASS toggle to disable LASS.

Writing to CR4 might be the safest option, because performance is not a
concern here, right?
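
[Editor's sketch] The "toggle LASS around every runtime call" idea above could look roughly like the following userspace model. It is only a sketch under stated assumptions: sim_cr4_lass stands in for the real CR4 bit, and sim_efi_fn, sim_efi_call_lass_off() and sim_fw_record_lass() are hypothetical names, not kernel or EFI APIs. As in the patch, the bit is restored only if it was set beforehand.

```c
#include <stdbool.h>

/* Stand-in for the real CR4.LASS state; NOT the kernel's CR4 shadow. */
bool sim_cr4_lass = true;

/* Hypothetical firmware entry point type used only by this sketch. */
typedef int (*sim_efi_fn)(void *arg);

/* Run one firmware call with the modelled LASS bit cleared. */
int sim_efi_call_lass_off(sim_efi_fn fn, void *arg)
{
	bool was_enabled = sim_cr4_lass;	/* remember the prior state */
	int status;

	sim_cr4_lass = false;			/* pause enforcement */
	status = fn(arg);			/* the runtime service call */
	if (was_enabled)
		sim_cr4_lass = true;		/* restore only if it was on */

	return status;
}

/* Example "firmware" routine: records whether LASS was on while it ran. */
int sim_fw_record_lass(void *arg)
{
	*(bool *)arg = sim_cr4_lass;
	return 0;
}
```

A caller would pass the firmware routine and its argument, e.g. sim_efi_call_lass_off(sim_fw_record_lass, &seen). Whether the real kernel needs the CR4 write or can get away with STAC/CLAC depends on the open question above about instruction fetches.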

4) What happens if an EFI runtime call trips LASS?

If a LASS violation happens with EFI, the system would trigger a #GP and
hang. For page faults, we seem to have introduced
efi_crash_gracefully_on_page_fault() to attribute the fault to EFI. Do
we require something similar for #GP?

My inclination is to add the helpful prints after we run into an issue.

5) Is there any other aspect of EFI that should be considered?

Please let me know if I have missed something.

Another approach could be to support only limited (well behaving)
firmware implementations with LASS. But, I am not sure how practical
that would be given all the quirks we have in place.

Thanks,
Sohil

Re: [PATCH v11 5/9] x86/efi: Disable LASS while mapping the EFI runtime services

Posted by Dave Hansen 3 months, 1 week ago

On 10/31/25 10:38, Andy Lutomirski wrote:
> Am I imagining an issue that doesn’t exist?  Is there some way to be
> reasonably convinced that you haven’t missed another EFI code path?
> Would it be ridiculous to defer enabling LASS until we’re almost
> ready to run user code?

Deferring is a good idea. I was just asking for the same thing for the
CR pinning enforcement. The earlier we try to do these things, the more
we just trip over ourselves.

Re: [PATCH v11 5/9] x86/efi: Disable LASS while mapping the EFI runtime services

Posted by Sohil Mehta 3 months, 1 week ago

On 10/31/2025 10:41 AM, Dave Hansen wrote:
> On 10/31/25 10:38, Andy Lutomirski wrote:
>> Am I imagining an issue that doesn’t exist?  Is there some way to be
>> reasonably convinced that you haven’t missed another EFI code path?
>> Would it be ridiculous to defer enabling LASS until we’re almost
>> ready to run user code?

> Deferring is a good idea. I was just asking for the same thing for the
> CR pinning enforcement. The earlier we try to do these things, the more
> we just trip over ourselves.

I had suggested deferring as well to Kirill when I was reviewing the
series. He preferred to enable LASS with other similar features such as
SMAP, SMEP.

One other thing to consider:

Doing it in identify_cpu() makes it easy for all the APs to program
their CR4.LASS bit. If we were to defer it, we would need some
additional work to set up all the APs.

Do we already do this for something else? That would make it easier to
tag along.

Re: [PATCH v11 5/9] x86/efi: Disable LASS while mapping the EFI runtime services

Posted by Dave Hansen 3 months, 1 week ago

On 10/31/25 11:03, Sohil Mehta wrote:
>> Deferring is a good idea. I was just asking for the same thing for the
>> CR pinning enforcement. The earlier we try to do these things, the more
>> we just trip over ourselves.
> I had suggested deferring as well to Kirill when I was reviewing the
> series. He preferred to enable LASS with other similar features such as
> SMAP, SMEP.
> 
> One other thing to consider:
> 
> Doing it in identify_cpu() makes it easy for all the APs to program
> their CR4.LASS bit. If we were to defer it, we would need some
> additional work to set up all the APs.

That's true. We'd need an smp_call_function() of some kind. *But*, once
that is in place, it's hopefully just a matter of moving that one line
of code per feature from identify_cpu() over to the new function.
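
[Editor's sketch] The deferred-enable idea can be pictured with a small userspace model: one late callback lists the features, and a cross-CPU helper runs it everywhere. All names here (sim_cpu_lass, sim_enable_late_features(), sim_on_each_cpu(), SIM_NR_CPUS) are hypothetical; the loop merely stands in for an smp_call_function()-style mechanism and says nothing about how the kernel would really do it.

```c
#include <stdbool.h>
#include <stddef.h>

#define SIM_NR_CPUS 4	/* arbitrary CPU count for the model */

/* Modelled per-CPU CR4.LASS state; all CPUs start with it clear. */
bool sim_cpu_lass[SIM_NR_CPUS];

/* The single place where late-enabled features would be listed. */
void sim_enable_late_features(size_t cpu)
{
	sim_cpu_lass[cpu] = true;	/* one line per deferred feature */
}

/* Stand-in for a cross-CPU call: run fn once for every CPU. */
void sim_on_each_cpu(void (*fn)(size_t cpu))
{
	for (size_t cpu = 0; cpu < SIM_NR_CPUS; cpu++)
		fn(cpu);
}

/* Helper for checking the model: how many CPUs have LASS set? */
size_t sim_count_lass_enabled(void)
{
	size_t n = 0;

	for (size_t cpu = 0; cpu < SIM_NR_CPUS; cpu++)
		if (sim_cpu_lass[cpu])
			n++;
	return n;
}
```

Calling sim_on_each_cpu(sim_enable_late_features) just before userspace starts would flip the bit on every modelled CPU in one step.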

> Do we already do this for something else? That would make it easier to
> tag along.

We don't do it for anything else that I can think of.

But there's a pretty broad set of things that are for "security" that
aren't necessary while you're just running trusted ring0 code:

 * SMAP/SMEP
 * CR pinning itself
 * MSR_IA32_SPEC_CTRL
 * MSR_IA32_TSX_CTRL

They just haven't mattered until now because they don't have any
practical effect until you actually have code running on _PAGE_USER
mappings trying to attack the kernel.

Re: [PATCH v11 5/9] x86/efi: Disable LASS while mapping the EFI runtime services

Posted by Peter Zijlstra 3 months ago

On Fri, Oct 31, 2025 at 11:12:53AM -0700, Dave Hansen wrote:

> But there's a pretty broad set of things that are for "security" that
> aren't necessary while you're just running trusted ring0 code:
> 
>  * SMAP/SMEP
>  * CR pinning itself
>  * MSR_IA32_SPEC_CTRL
>  * MSR_IA32_TSX_CTRL
> 
> They just haven't mattered until now because they don't have any
> practical effect until you actually have code running on _PAGE_USER
> mappings trying to attack the kernel.

But that's just the thing EFI is *NOT* trusted! We're basically
disabling all security features (not listed above are CET and CFI) to
run this random garbage we have no control over.

How about we just flat out refuse EFI runtime services? What are they
actually needed for? Why are we bending over backwards and subverting
our security for this stuff?

Re: [PATCH v11 5/9] x86/efi: Disable LASS while mapping the EFI runtime services

Posted by Ard Biesheuvel 3 months ago

On Fri, 7 Nov 2025 at 10:04, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Fri, Oct 31, 2025 at 11:12:53AM -0700, Dave Hansen wrote:
>
> > But there's a pretty broad set of things that are for "security" that
> > aren't necessary while you're just running trusted ring0 code:
> >
> >  * SMAP/SMEP
> >  * CR pinning itself
> >  * MSR_IA32_SPEC_CTRL
> >  * MSR_IA32_TSX_CTRL
> >
> > They just haven't mattered until now because they don't have any
> > practical effect until you actually have code running on _PAGE_USER
> > mappings trying to attack the kernel.
>
> But that's just the thing EFI is *NOT* trusted! We're basically
> disabling all security features (not listed above are CET and CFI) to
> run this random garbage we have no control over.
>
> How about we just flat out refuse EFI runtime services? What are they
> actually needed for? Why are we bending over backwards and subverting
> our security for this stuff?

On x86, it is mostly the EFI variable services that user space has
come to rely on, not only for setting the boot path (which typically
happens only once at installation time, when the path to GRUB is set
as the first boot option). Unfortunately, the systemd folks have taken
a liking to this feature too, and have started storing things in
there.

There is also PRM, which is much worse, as it permits devices in the
ACPI namespace to call firmware routines that are mapped privileged in
the OS address space in the same way. I objected to this at the time,
and asked for a facility where we could at least mark such code as
unprivileged (and run it as such) but this was ignored, as Intel and
MS had already sealed the deal and put this into production. This is
much worse than typical EFI routines, as the PRM code is ODM/OEM code
rather than something that comes from the upstream EFI implementation.
It is basically a dumping ground for code that used to run in SMM
because it was too ugly to run anywhere else. </rant>

It would be nice if we could

a) Get rid of SetVirtualAddressMap(), which is another insane hack
that should never have been supported on 64-bit systems. On arm64, we
no longer call it unless there is a specific need for it (some Ampere
Altra systems with buggy firmware will crash otherwise). On x86,
though, it might be tricky because there is so much buggy firmware.
Perhaps we should phase it out by checking for the UEFI version, so
that future systems will avoid it. This would mean, however, that EFI
code remains in the low user address space, which may not be what you
want (unless we do c) perhaps?)

b) Run EFI runtime calls in a sandbox VM - there was a PoC implemented
for arm64 a couple of years ago, but it was very intrusive and the ARM
intern in question went on to do more satisfying work.

c) Unmap the kernel KPTI style while the runtime calls are in
progress? This should be rather straight-forward, although it might
not help a lot as the code in question still runs privileged.
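
[Editor's sketch] The "phase it out by checking for the UEFI version" idea in a) might be expressed as below. The revision encoding (major in the upper 16 bits, minor in the lower 16) follows the UEFI system table header, but the cutoff value and every sim_* name are hypothetical and chosen only for illustration; no such cutoff has been agreed in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* UEFI system table revision: major in bits 31:16, minor in bits 15:0. */
#define SIM_EFI_REVISION(major, minor) \
	(((uint32_t)(major) << 16) | (uint32_t)(minor))

/* Hypothetical cutoff: firmware at or above this skips the call. */
#define SIM_SVAM_CUTOFF SIM_EFI_REVISION(2, 110)

/*
 * Decide whether to call SetVirtualAddressMap().
 * firmware_quirk models a per-platform override for buggy firmware.
 */
bool sim_should_call_svam(uint32_t systab_revision, bool firmware_quirk)
{
	if (firmware_quirk)
		return true;	/* known-bad firmware still needs the call */

	return systab_revision < SIM_SVAM_CUTOFF;
}
```

Older firmware keeps today's behaviour, newer firmware stays on the 1:1 mapping, and a quirk flag preserves an escape hatch for systems that crash without the call.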

Re: [PATCH v11 5/9] x86/efi: Disable LASS while mapping the EFI runtime services

Posted by Peter Zijlstra 3 months ago

On Fri, Nov 07, 2025 at 10:22:30AM +0100, Ard Biesheuvel wrote:

> There is also PRM, which is much worse, as it permits devices in the
> ACPI namespace to call firmware routines that are mapped privileged in
> the OS address space in the same way. I objected to this at the time,
> and asked for a facility where we could at least mark such code as
> unprivileged (and run it as such) but this was ignored, as Intel and
> MS had already sealed the deal and put this into production. This is
> much worse than typical EFI routines, as the PRM code is ODM/OEM code
> rather than something that comes from the upstream EFI implementation.
> It is basically a dumping ground for code that used to run in SMM
> because it was too ugly to run anywhere else. </rant>

'https://uefi.org/sites/default/files/resources/Platform Runtime Mechanism - with legal notice.pdf'

Has on page 16, section 3.1:

  8. PRM handlers must not contain any privileged instructions.

So we should be able to actually run this crap in ring3, right?

Re: [PATCH v11 5/9] x86/efi: Disable LASS while mapping the EFI runtime services

Posted by Ard Biesheuvel 3 months ago

On Fri, 7 Nov 2025 at 11:10, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Fri, Nov 07, 2025 at 10:22:30AM +0100, Ard Biesheuvel wrote:
>
> > There is also PRM, which is much worse, as it permits devices in the
> > ACPI namespace to call firmware routines that are mapped privileged in
> > the OS address space in the same way. I objected to this at the time,
> > and asked for a facility where we could at least mark such code as
> > unprivileged (and run it as such) but this was ignored, as Intel and
> > MS had already sealed the deal and put this into production. This is
> > much worse than typical EFI routines, as the PRM code is ODM/OEM code
> > rather than something that comes from the upstream EFI implementation.
> > It is basically a dumping ground for code that used to run in SMM
> > because it was too ugly to run anywhere else. </rant>
>
> 'https://uefi.org/sites/default/files/resources/Platform Runtime Mechanism - with legal notice.pdf'
>
> Has on page 16, section 3.1:
>
>   8. PRM handlers must not contain any privileged instructions.
>
> So we should be able to actually run this crap in ring3, right?

How interesting! This wasn't in the draft that I reviewed at the time,
so someone did listen.

So it does seem feasible to drop privileges and reacquire them in
principle, as long as we ensure that all the memory touched by the PRM
services (stack, code, data, MMIO regions) is mapped appropriately in
the EFI memory map.

Re: [PATCH v11 5/9] x86/efi: Disable LASS while mapping the EFI runtime services

Posted by Peter Zijlstra 3 months ago

On Fri, Nov 07, 2025 at 10:22:30AM +0100, Ard Biesheuvel wrote:

> > But that's just the thing EFI is *NOT* trusted! We're basically
> > disabling all security features (not listed above are CET and CFI) to
> > run this random garbage we have no control over.
> >
> > How about we just flat out refuse EFI runtime services? What are they
> > actually needed for? Why are we bending over backwards and subverting
> > our security for this stuff?
> 
> On x86, it is mostly the EFI variable services that user space has
> come to rely on, not only for setting the boot path (which typically
> happens only once at installation time, when the path to GRUB is set
> as the first boot option). Unfortunately, the systemd folks have taken
> a liking to this feature too, and have started storing things in
> there.

*groan*, so booting with noefi (I just went and found that option) will
cause a modern Linux system to fail booting?

> There is also PRM, which is much worse, as it permits devices in the
> ACPI namespace to call firmware routines that are mapped privileged in
> the OS address space in the same way. I objected to this at the time,
> and asked for a facility where we could at least mark such code as
> unprivileged (and run it as such) but this was ignored, as Intel and
> MS had already sealed the deal and put this into production. This is
> much worse than typical EFI routines, as the PRM code is ODM/OEM code
> rather than something that comes from the upstream EFI implementation.
> It is basically a dumping ground for code that used to run in SMM
> because it was too ugly to run anywhere else. </rant>

What the actual fuck!! And we support this garbage? Without
pr_err(FW_BUG ) notification?

How can one find such devices? I need to check my machine.

> It would be nice if we could
> 
> a) Get rid of SetVirtualAddressMap(), which is another insane hack
> that should never have been supported on 64-bit systems. On arm64, we
> no longer call it unless there is a specific need for it (some Ampere
> Altra systems with buggy firmware will crash otherwise). On x86,
> though, it might be tricky because there is so much buggy firmware.
> Perhaps we should phase it out by checking for the UEFI version, so
> that future systems will avoid it. This would mean, however, that EFI
> code remains in the low user address space, which may not be what you
> want (unless we do c) perhaps?)
> 
> b) Run EFI runtime calls in a sandbox VM - there was a PoC implemented
> for arm64 a couple of years ago, but it was very intrusive and the ARM
> intern in question went on to do more satisfying work.
> 
> c) Unmap the kernel KPTI style while the runtime calls are in
> progress? This should be rather straight-forward, although it might
> not help a lot as the code in question still runs privileged.

At the very least I think we should start printing scary messages about
disabling security to run untrusted code. This is all quite insane :/

Re: [PATCH v11 5/9] x86/efi: Disable LASS while mapping the EFI runtime services

Posted by Ard Biesheuvel 3 months ago

On Fri, 7 Nov 2025 at 10:40, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Fri, Nov 07, 2025 at 10:22:30AM +0100, Ard Biesheuvel wrote:
>
> > > But that's just the thing EFI is *NOT* trusted! We're basically
> > > disabling all security features (not listed above are CET and CFI) to
> > > run this random garbage we have no control over.
> > >
> > > How about we just flat out refuse EFI runtime services? What are they
> > > actually needed for? Why are we bending over backwards and subverting
> > > our security for this stuff?
> >
> > On x86, it is mostly the EFI variable services that user space has
> > come to rely on, not only for setting the boot path (which typically
> > happens only once at installation time, when the path to GRUB is set
> > as the first boot option). Unfortunately, the systemd folks have taken
> > a liking to this feature too, and have started storing things in
> > there.
>
> *groan*, so booting with noefi (I just went and found that option) will
> cause a modern Linux system to fail booting?
>

As long as you install with EFI enabled, the impact of efi=noruntime
should be limited, given that x86 does not rely on EFI runtime
services for the RTC or for reboot/poweroff. But you will lose access
to the EFI variable store. (Not sure what 'noefi' does in comparison,
but keeping EFI enabled at boot time for things like secure/measured
boot and storage encryption will probably result in a net positive
impact on security/hardening as long as you avoid calling into the
firmware after boot)

> > There is also PRM, which is much worse, as it permits devices in the
> > ACPI namespace to call firmware routines that are mapped privileged in
> > the OS address space in the same way. I objected to this at the time,
> > and asked for a facility where we could at least mark such code as
> > unprivileged (and run it as such) but this was ignored, as Intel and
> > MS had already sealed the deal and put this into production. This is
> > much worse than typical EFI routines, as the PRM code is ODM/OEM code
> > rather than something that comes from the upstream EFI implementation.
> > It is basically a dumping ground for code that used to run in SMM
> > because it was too ugly to run anywhere else. </rant>
>
> What the actual fuck!! And we support this garbage? Without
> pr_err(FW_BUG ) notification?
>
> How can one find such devices? I need to check my machine.
>

Unless you have a PRMT table in the list of ACPI tables, your system
shouldn't be affected by this.
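
[Editor's sketch] One way to check for a PRMT table from userspace is to look for a file of that name where Linux exposes the raw ACPI tables. The sketch below assumes that directory is /sys/firmware/acpi/tables on the system in question; sim_acpi_table_present() is a hypothetical helper, and the check only tests for existence, not contents.

```c
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Return true if a file named after the table signature exists in dir. */
bool sim_acpi_table_present(const char *tables_dir, const char *signature)
{
	char path[256];
	int len = snprintf(path, sizeof(path), "%s/%s", tables_dir, signature);

	/* Treat formatting errors and truncated paths as "not present". */
	if (len < 0 || (size_t)len >= sizeof(path))
		return false;

	return access(path, F_OK) == 0;
}
```

For example, sim_acpi_table_present("/sys/firmware/acpi/tables", "PRMT") would report whether the firmware published a PRMT table, assuming sysfs is mounted in the usual place.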

> > It would be nice if we could
> >
> > a) Get rid of SetVirtualAddressMap(), which is another insane hack
> > that should never have been supported on 64-bit systems. On arm64, we
> > no longer call it unless there is a specific need for it (some Ampere
> > Altra systems with buggy firmware will crash otherwise). On x86,
> > though, it might be tricky because there is so much buggy firmware.
> > Perhaps we should phase it out by checking for the UEFI version, so
> > that future systems will avoid it. This would mean, however, that EFI
> > code remains in the low user address space, which may not be what you
> > want (unless we do c) perhaps?)
> >
> > b) Run EFI runtime calls in a sandbox VM - there was a PoC implemented
> > for arm64 a couple of years ago, but it was very intrusive and the ARM
> > intern in question went on to do more satisfying work.
> >
> > c) Unmap the kernel KPTI style while the runtime calls are in
> > progress? This should be rather straight-forward, although it might
> > not help a lot as the code in question still runs privileged.
>
> At the very least I think we should start printing scary messages about
> disabling security to run untrusted code. This is all quite insane :/

I agree in principle. However, calling it 'untrusted' is a bit
misleading here, given that you already rely on the same body of code
to boot your computer to begin with. I.e., if you suspect that the
code in question is conspiring against you, not calling it at runtime
to manipulate EFI variables is not going to help with that.

But from a robustness point of view, I agree - running vendor code at
the OS's privilege level at runtime that was only tested with Windows
is not great for stability, and it would be nice if we could leverage
the principle of least privilege and only permit it to access the
things that it actually needs to perform the task that we've asked it
to. This is why I asked for the ability to mark PRM services as
unprivileged, given that they typically only run some code and perhaps
poke some memory (either RAM or MMIO registers) that the OS never
accesses directly.

Question is though whether on x86, sandboxing is feasible: can VMs
call into SMM? Because that is where 95% of the EFI variable services
logic resides - the code running directly under the OS does very
little other than marshalling the arguments and passing them on.

Re: [PATCH v11 5/9] x86/efi: Disable LASS while mapping the EFI runtime services

Posted by Peter Zijlstra 3 months ago

On Fri, Nov 07, 2025 at 11:09:44AM +0100, Ard Biesheuvel wrote:

> As long as you install with EFI enabled, the impact of efi=noruntime
> should be limited, given that x86 does not rely on EFI runtime
> services for the RTC or for reboot/poweroff. But you will lose access
> to the EFI variable store. (Not sure what 'noefi' does in comparison,
> but keeping EFI enabled at boot time for things like secure/measured
> boot and storage encryption will probably result in a net positive
> impact on security/hardening as long as you avoid calling into the
> firmware after boot)

I would say it should all stay before we start userspace, because that's
where our trust boundary is. We definitely do not trust userspace.

Also, if they all think this is 'important' why not provide native
drivers for this service?

> > At the very least I think we should start printing scary messages about
> > disabling security to run untrusted code. This is all quite insane :/
> 
> I agree in principle. However, calling it 'untrusted' is a bit
> misleading here, given that you already rely on the same body of code
> to boot your computer to begin with. 

That PRM stuff really doesn't sound like it's needed to boot. And it
sounds like it really should be part of the normal Linux driver, but
isn't for $corp reasons or something.

> I.e., if you suspect that the
> code in question is conspiring against you, not calling it at runtime
> to manipulate EFI variables is not going to help with that.

Well, the problem is the disabling of all the hardware and software
security measures to run this crap. This makes it a prime target to take
over stuff. Also, while EFI code might be good enough to boot the
machine, using it at runtime is a whole different league of security.

What if they have a 'bug' in the variable name parser and a variable
named "NSAWantsAccess" gets you a buffer overflow and random code
execution?

Trusting it to boot the machine and trusting it to be safe for general
runtime are two very different things.

> Question is though whether on x86, sandboxing is feasible: can VMs
> call into SMM? Because that is where 95% of the EFI variable services
> logic resides - the code running directly under the OS does very
> little other than marshalling the arguments and passing them on.

I just read in that PRM document that they *REALLY* want to get away
from SMM because it freezes all CPUs in the system for the duration of
the SMI. So this variable crud being in SMM would be inconsistent.

Anyway, I'm all for very aggressive runtime warnings and pushing vendors
that object to provide native drivers. I don't believe there is any real
technical reason for any of this.

Re: [PATCH v11 5/9] x86/efi: Disable LASS while mapping the EFI runtime services

Posted by H. Peter Anvin 3 months ago

On November 7, 2025 1:22:30 AM PST, Ard Biesheuvel <ardb@kernel.org> wrote:
>On Fri, 7 Nov 2025 at 10:04, Peter Zijlstra <peterz@infradead.org> wrote:
>>
>> On Fri, Oct 31, 2025 at 11:12:53AM -0700, Dave Hansen wrote:
>>
>> > But there's a pretty broad set of things that are for "security" that
>> > aren't necessary while you're just running trusted ring0 code:
>> >
>> >  * SMAP/SMEP
>> >  * CR pinning itself
>> >  * MSR_IA32_SPEC_CTRL
>> >  * MSR_IA32_TSX_CTRL
>> >
>> > They just haven't mattered until now because they don't have any
>> > practical effect until you actually have code running on _PAGE_USER
>> > mappings trying to attack the kernel.
>>
>> But that's just the thing EFI is *NOT* trusted! We're basically
>> disabling all security features (not listed above are CET and CFI) to
>> run this random garbage we have no control over.
>>
>> How about we just flat out refuse EFI runtime services? What are they
>> actually needed for? Why are we bending over backwards and subverting
>> our security for this stuff?
>
>On x86, it is mostly the EFI variable services that user space has
>come to rely on, not only for setting the boot path (which typically
>happens only once at installation time, when the path to GRUB is set
>as the first boot option). Unfortunately, the systemd folks have taken
>a liking to this feature too, and have started storing things in
>there.
>
>There is also PRM, which is much worse, as it permits devices in the
>ACPI namespace to call firmware routines that are mapped privileged in
>the OS address space in the same way. I objected to this at the time,
>and asked for a facility where we could at least mark such code as
>unprivileged (and run it as such) but this was ignored, as Intel and
>MS had already sealed the deal and put this into production. This is
>much worse than typical EFI routines, as the PRM code is ODM/OEM code
>rather than something that comes from the upstream EFI implementation.
>It is basically a dumping ground for code that used to run in SMM
>because it was too ugly to run anywhere else. </rant>
>
>It would be nice if we could
>
>a) Get rid of SetVirtualAddressMap(), which is another insane hack
>that should never have been supported on 64-bit systems. On arm64, we
>no longer call it unless there is a specific need for it (some Ampere
>Altra systems with buggy firmware will crash otherwise). On x86,
>though, it might be tricky because there is so much buggy firmware.
>Perhaps we should phase it out by checking for the UEFI version, so
>that future systems will avoid it. This would mean, however, that EFI
>code remains in the low user address space, which may not be what you
>want (unless we do c) perhaps?)
>
>b) Run EFI runtime calls in a sandbox VM - there was a PoC implemented
>for arm64 a couple of years ago, but it was very intrusive and the ARM
>intern in question went on to do more satisfying work.
>
>c) Unmap the kernel KPTI style while the runtime calls are in
>progress? This should be rather straight-forward, although it might
>not help a lot as the code in question still runs privileged.

Firmware update is a big one.

Re: [PATCH v11 5/9] x86/efi: Disable LASS while mapping the EFI runtime services

Posted by Ard Biesheuvel 3 months ago

On Fri, 7 Nov 2025 at 10:27, H. Peter Anvin <hpa@zytor.com> wrote:
>
> On November 7, 2025 1:22:30 AM PST, Ard Biesheuvel <ardb@kernel.org> wrote:
> >On Fri, 7 Nov 2025 at 10:04, Peter Zijlstra <peterz@infradead.org> wrote:
> >>
> >> On Fri, Oct 31, 2025 at 11:12:53AM -0700, Dave Hansen wrote:
> >>
> >> > But there's a pretty broad set of things that are for "security" that
> >> > aren't necessary while you're just running trusted ring0 code:
> >> >
> >> >  * SMAP/SMEP
> >> >  * CR pinning itself
> >> >  * MSR_IA32_SPEC_CTRL
> >> >  * MSR_IA32_TSX_CTRL
> >> >
> >> > They just haven't mattered until now because they don't have any
> >> > practical effect until you actually have code running on _PAGE_USER
> >> > mappings trying to attack the kernel.
> >>
> >> But that's just the thing: EFI is *NOT* trusted! We're basically
> >> disabling all security features (not listed above are CET and CFI) to
> >> run this random garbage we have no control over.
> >>
> >> How about we just flat out refuse EFI runtime services? What are they
> >> actually needed for? Why are we bending over backwards and subverting
> >> our security for this stuff?
> >
> >On x86, it is mostly the EFI variable services that user space has
> >come to rely on, not only for setting the boot path (which typically
> >happens only once at installation time, when the path to GRUB is set
> >as the first boot option). Unfortunately, the systemd folks have taken
> >a liking to this feature too, and have started storing things in
> >there.
> >
> >There is also PRM, which is much worse, as it permits devices in the
> >ACPI namespace to call firmware routines that are mapped privileged in
> >the OS address space in the same way. I objected to this at the time,
> >and asked for a facility where we could at least mark such code as
> >unprivileged (and run it as such) but this was ignored, as Intel and
> >MS had already sealed the deal and put this into production. This is
> >much worse than typical EFI routines, as the PRM code is ODM/OEM code
> >rather than something that comes from the upstream EFI implementation.
> >It is basically a dumping ground for code that used to run in SMM
> >because it was too ugly to run anywhere else. </rant>
> >
> >It would be nice if we could
> >
> >a) Get rid of SetVirtualAddressMap(), which is another insane hack
> >that should never have been supported on 64-bit systems. On arm64, we
> >no longer call it unless there is a specific need for it (some Ampere
> >Altra systems with buggy firmware will crash otherwise). On x86,
> >though, it might be tricky because there is so much buggy firmware.
> >Perhaps we should phase it out by checking for the UEFI version, so
> >that future systems will avoid it. This would mean, however, that EFI
> >code remains in the low user address space, which may not be what you
> >want (unless we do c) perhaps?)
> >
> >b) Run EFI runtime calls in a sandbox VM - there was a PoC implemented
> >for arm64 a couple of years ago, but it was very intrusive and the ARM
> >intern in question went on to do more satisfying work.
> >
> >c) Unmap the kernel KPTI style while the runtime calls are in
> >progress? This should be rather straight-forward, although it might
> >not help a lot as the code in question still runs privileged.
>
> Firmware update is a big one.

Firmware update does not run under the OS.
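
For illustration only, the version-gated phase-out floated in point a) of
the quoted message could look roughly like the sketch below. It is not
kernel code. The cutoff revision, the quirk flag, and the function name
are all hypothetical placeholders; the thread does not say which UEFI
revision would be a suitable cutoff. The only detail taken from the UEFI
specification is the revision encoding, (major << 16) | minor.

```c
#include <stdbool.h>
#include <stdint.h>

/* UEFI encodes the spec revision as (major << 16) | minor, e.g. 2.70 -> 0x00020046. */
#define EFI_REVISION(major, minor) \
	(((uint32_t)(major) << 16) | (uint32_t)(minor))

/*
 * Hypothetical cutoff, chosen only so the example has a concrete value.
 * Firmware at or above this revision would no longer get the call.
 */
#define SVAM_CUTOFF_REVISION EFI_REVISION(2, 100)

/*
 * Decide whether SetVirtualAddressMap() should still be called.
 *
 * fw_revision:      revision reported by the firmware's system table.
 * needs_svam_quirk: hypothetical per-platform flag for firmware that is
 *                   known to crash unless the call is made.
 */
static bool want_set_virtual_address_map(uint32_t fw_revision,
					 bool needs_svam_quirk)
{
	/* Known-buggy firmware keeps the old behaviour regardless of revision. */
	if (needs_svam_quirk)
		return true;

	/* Older firmware keeps the call; newer firmware skips it. */
	return fw_revision < SVAM_CUTOFF_REVISION;
}
```

With these placeholder values, firmware reporting revision 2.70 would
still get the call, while firmware reporting the cutoff revision or later
would skip it unless the quirk flag is set. As the quoted message notes,
skipping the call leaves the EFI runtime code at its low addresses, so
this alone does not resolve the LASS conflict.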