From: Alexander Shishkin <alexander.shishkin@linux.intel.com>

While mapping EFI runtime services, set_virtual_address_map() is called
at its lower mapping, which LASS prohibits. Wrapping the EFI call with
lass_disable()/_enable() is not enough, because the AC flag only
controls data accesses, and not instruction fetches.

Use the big hammer and toggle the CR4.LASS bit to make this work.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
---
v11:
- No change.
v10:
- Reword code comments
---
arch/x86/platform/efi/efi.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 463b784499a8..ad9f76f90581 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -786,8 +786,8 @@ static void __init __efi_enter_virtual_mode(void)
 {
 	int count = 0, pg_shift = 0;
 	void *new_memmap = NULL;
+	unsigned long pa, lass;
 	efi_status_t status;
-	unsigned long pa;

 	if (efi_alloc_page_tables()) {
 		pr_err("Failed to allocate EFI page tables\n");
@@ -825,11 +825,23 @@ static void __init __efi_enter_virtual_mode(void)

 	efi_sync_low_kernel_mappings();

+	/*
+	 * LASS complains because set_virtual_address_map() is located
+	 * at a lower address. To pause enforcement, flipping RFLAGS.AC
+	 * is not sufficient, as it only permits data access and not
+	 * instruction fetch. Disable the entire LASS mechanism.
+	 */
+	lass = cr4_read_shadow() & X86_CR4_LASS;
+	cr4_clear_bits(lass);
+
 	status = efi_set_virtual_address_map(efi.memmap.desc_size * count,
 					     efi.memmap.desc_size,
 					     efi.memmap.desc_version,
 					     (efi_memory_desc_t *)pa,
 					     efi_systab_phys);
+
+	cr4_set_bits(lass);
+
 	if (status != EFI_SUCCESS) {
 		pr_err("Unable to switch EFI into virtual mode (status=%lx)!\n",
 		       status);
--
2.43.0
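
The save/clear/restore idiom in the hunk above can be tried out in user
space. The following is only a minimal sketch, not kernel code: the
model_* helpers and the model_cr4 variable are hypothetical stand-ins for
cr4_read_shadow(), cr4_clear_bits() and cr4_set_bits(), and the bit
position (27) is assumed to match the kernel's X86_CR4_LASS definition.
It shows why the patch masks the shadow value first: if LASS was never
enabled, both the clear and the set become no-ops.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: CR4.LASS is bit 27, mirroring the kernel's X86_CR4_LASS. */
#define MODEL_CR4_LASS (UINT64_C(1) << 27)

/* Hypothetical stand-in for the per-CPU CR4 shadow value. */
static uint64_t model_cr4;

static uint64_t model_cr4_read(void) { return model_cr4; }
static void model_cr4_clear_bits(uint64_t mask) { model_cr4 &= ~mask; }
static void model_cr4_set_bits(uint64_t mask) { model_cr4 |= mask; }

/*
 * Run fn() with the LASS bit cleared, then restore the previous state.
 * Returns the modelled CR4 value that was in effect while fn() ran.
 */
static uint64_t model_call_without_lass(void (*fn)(void))
{
	/* lass is either MODEL_CR4_LASS or 0, depending on the prior state. */
	uint64_t lass = model_cr4_read() & MODEL_CR4_LASS;
	uint64_t during;

	model_cr4_clear_bits(lass);
	during = model_cr4_read();
	if (fn)
		fn();
	model_cr4_set_bits(lass);	/* no-op if LASS was not enabled */

	/* The LASS bit is back to whatever it was on entry. */
	assert((model_cr4_read() & MODEL_CR4_LASS) == lass);
	return during;
}
```

The model says nothing about interrupts or preemption during the window
in which the bit is clear.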
On 10/29/25 14:03, Sohil Mehta wrote:
> From: Alexander Shishkin <alexander.shishkin@linux.intel.com>
>
> While mapping EFI runtime services, set_virtual_address_map() is called
> at its lower mapping, which LASS prohibits. Wrapping the EFI call with
> lass_disable()/_enable() is not enough, because the AC flag only
> controls data accesses, and not instruction fetches.
>
> Use the big hammer and toggle the CR4.LASS bit to make this work.

One thing that's actually missing here is an explanation on how it's OK
to munge CR bits here. Why are preemption and interrupts not a problem?

A reviewer would have to go off and figure this out on their own.

On 10/31/2025 10:11 AM, Dave Hansen wrote:
> One thing that's actually missing here is an explanation on how it's OK
> to munge CR bits here. Why are preemption and interrupts not a problem?
>

This is called pretty early on from the BSP init flow.

start_kernel()
  arch_cpu_finalize_init()
    efi_enter_virtual_mode()
      __efi_enter_virtual_mode()

I had assumed we run with interrupts disabled. But, that's not true.
Interrupts are enabled midway during start_kernel(). So,
arch_cpu_finalize_init() is called with interrupts enabled.

We write to CR bits during FPU init which happens right before EFI
enters virtual mode. So I am probably missing something obvious that
makes it okay.

On Fri, Oct 31, 2025, at 10:11 AM, Dave Hansen wrote:
> On 10/29/25 14:03, Sohil Mehta wrote:
>> From: Alexander Shishkin <alexander.shishkin@linux.intel.com>
>>
>> While mapping EFI runtime services, set_virtual_address_map() is called
>> at its lower mapping, which LASS prohibits. Wrapping the EFI call with
>> lass_disable()/_enable() is not enough, because the AC flag only
>> controls data accesses, and not instruction fetches.
>>
>> Use the big hammer and toggle the CR4.LASS bit to make this work.
>
> One thing that's actually missing here is an explanation on how it's OK
> to munge CR bits here. Why are preemption and interrupts not a problem?
>
> A reviewer would have to go off and figure this out on their own.

I have another question: why is this one specific call a problem as
opposed to something more general? Wouldn’t any EFI call that touches
the low EFI mapping be a problem? Are there any odd code paths that
touch low mapped EFI *data* that would fault?

Am I imagining an issue that doesn’t exist? Is there some way to be
reasonably convinced that you haven’t missed another EFI code path?
Would it be ridiculous to defer enabling LASS until we’re almost ready
to run user code?

On 10/31/2025 10:38 AM, Andy Lutomirski wrote:
> I have another question: why is this one specific call a problem as
> opposed to something more general? Wouldn’t any EFI call that touches
> the low EFI mapping be a problem? Are there any odd code paths that
> touch low mapped EFI *data* that would fault?
>

I assumed EFI is running in physical mode before this.
efi_sync_low_kernel_mappings() is called right before calling
set_virtual_address_map(). So, this is the only call that happens at the
low mapping while switching to virtual mode.

But, my EFI knowledge is fairly limited. I am realizing that there are
some assumptions built into this patch that I may not be aware of.

> Is there some way to be reasonably convinced that you haven’t missed
> another EFI code path?

We have been running the patches on internal test platforms for a couple
of years. But, that would only cover the common paths. I'll dig deeper
to get you a convincing answer.

On 10/31/2025 12:04 PM, Sohil Mehta wrote:
>> Is there some way to be reasonably convinced that you haven’t missed
>> another EFI code path?
>
> We have been running the patches on internal test platforms for a couple
> of years. But, that would only cover the common paths. I'll dig deeper
> to get you a convincing answer.

In summary, the current approach could work for BIOSes that behave well.
But, the kernel makes lots of exceptions for broken firmware and odd
implementations. We would need extra guardrails and changes to support
those, or mark them unsupported. Please see my analysis below.

For now, I am wondering if we should disable the EFI support as well
(similar to vsyscall).

	if (IS_ENABLED(CONFIG_EFI))
		// Do not enable LASS

I think the rest of the patches are ready. I can post a new revision
with the above change to collect additional reviews/acks. Even though
this would significantly restrict usage, it would make it easier to
review EFI support (as well as vsyscall support) in its own independent,
focussed series.

My analysis
-----------
After a 1-week crash course in EFI (mainly reading lkml archives), below
is my understanding. Thanks Rick and Peter Anvin for the pointers and
insights. I would highly appreciate it if folks can validate assumptions
and help with some open questions.

1) Does LASS affect EFI BootTimeServices?

Contrary to my assumption, EFI_BOOT_SERVICES_CODE/_DATA could be
accessed even after ExitBootServices() has been called. For example,
early ACPI code in efi_bgrt_init() accesses it.
efi_check_for_embedded_firmwares() accesses this memory even after
SetVirtualAddressMap() has been called, right before
efi_free_boot_services().

At a minimum, we need to disable LASS around these special cases or
enable LASS only after EFI has completely finished entering virtual mode
(including freeing boot services). Ideally, we would enable LASS much
later, right before enabling userspace.

2) How does SetVirtualAddressMap() impact LASS?

SetVirtualAddressMap() is the first and only runtime service call that
is made in EFI physical mode (at the lower mapping). After the call,
firmware is expected to switch all its pointers to the high virtual
address provided by the kernel.

If LASS is enabled, it needs to be temporarily turned off during
SetVirtualAddressMap() as done in this patch. Though, the resolution in
#1 would likely make this patch moot.

3) Would LASS interfere with other runtime services?

Unfortunately, some firmware tends to cling to the old physical
addresses even after SetVirtualAddressMap() and doesn't completely
switch over to using the new virtual addresses.

To work around this, the kernel dual maps all the memory marked as
EFI_RUNTIME under a separate efi_mm: first with a 1:1 map and second
with the high virtual address. See efi_map_region(). Also, some runtime
services expect to access the first 4 KB of physical memory, which is
also mapped 1:1 to avoid failures.

To avoid any of these corner cases, LASS must be toggled every time we
make a runtime EFI call. Because efi_mm doesn't have real user mappings,
disabling LASS after efi_enter_mm() should be fine.

I am unsure whether the accesses are only data accesses, or whether
instruction fetches could happen as well. Based on that, we would need
either a STAC/CLAC pair or a CR4.LASS toggle to disable LASS. Writing to
CR4 might be the safest option, because performance is not a concern
here, right?

4) What happens if an EFI runtime call trips LASS?

If a LASS violation happens with EFI, the system would trigger a #GP and
hang. For page faults, we seem to have introduced
efi_crash_gracefully_on_page_fault() to attribute the fault to EFI. Do
we require something similar for #GP? My inclination is to add the
helpful prints after we run into an issue.

5) Is there any other aspect of EFI that should be considered?

Please let me know if I have missed something.

Another approach could be to support only limited (well behaving)
firmware implementations with LASS. But, I am not sure how practical
that would be given all the quirks we have in place.

Thanks,
Sohil

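
The STAC/CLAC versus CR4.LASS question in point 3 can be made concrete
with a deliberately simplified model of the supervisor-mode check. This
is a sketch under stated assumptions, not a statement of the
architecture: it assumes SMAP is enabled, ignores implicit supervisor
accesses and user-mode behaviour, and treats "low" as "bit 63 clear".
All names (model_lass_blocks() and friends) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum model_access { MODEL_ACCESS_DATA, MODEL_ACCESS_FETCH };

/* Assumption: a "low" (user-half) address is one with bit 63 clear. */
static bool model_is_low_address(uint64_t addr)
{
	return (addr >> 63) == 0;
}

/*
 * Simplified supervisor-mode LASS check (assumes SMAP is on):
 *  - CR4.LASS clear: nothing is blocked.
 *  - instruction fetch from a low address: blocked regardless of AC.
 *  - data access to a low address: blocked unless RFLAGS.AC is set.
 */
static bool model_lass_blocks(bool cr4_lass, bool rflags_ac,
			      uint64_t addr, enum model_access kind)
{
	if (!cr4_lass || !model_is_low_address(addr))
		return false;
	if (kind == MODEL_ACCESS_FETCH)
		return true;
	return !rflags_ac;
}

/* Usage: STAC (AC=1) rescues data accesses only; a fetch needs LASS off. */
static void model_lass_selfcheck(void)
{
	const uint64_t low = UINT64_C(0x1000);

	assert(model_lass_blocks(true, false, low, MODEL_ACCESS_DATA));
	assert(!model_lass_blocks(true, true, low, MODEL_ACCESS_DATA));
	assert(model_lass_blocks(true, true, low, MODEL_ACCESS_FETCH));
	assert(!model_lass_blocks(false, false, low, MODEL_ACCESS_FETCH));
}
```

Under this model, a firmware routine that executes from its 1:1 mapping
is only unblocked by clearing CR4.LASS, which matches the reasoning in
the patch description.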
On 10/31/25 10:38, Andy Lutomirski wrote:
> Am I imagining an issue that doesn’t exist? Is there some way to be
> reasonably convinced that you haven’t missed another EFI code path?
> Would it be ridiculous to defer enabling LASS until we’re almost
> ready to run user code?

Deferring is a good idea. I was just asking for the same thing for the
CR pinning enforcement. The earlier we try to do these things, the more
we just trip over ourselves.

On 10/31/2025 10:41 AM, Dave Hansen wrote:
> On 10/31/25 10:38, Andy Lutomirski wrote:
>> Am I imagining an issue that doesn’t exist? Is there some way to be
>> reasonably convinced that you haven’t missed another EFI code path?
>> Would it be ridiculous to defer enabling LASS until we’re almost
>> ready to run user code?
> Deferring is a good idea. I was just asking for the same thing for the
> CR pinning enforcement. The earlier we try to do these things, the more
> we just trip over ourselves.

I had suggested deferring as well to Kirill when I was reviewing the
series. He preferred to enable LASS with other similar features such as
SMAP, SMEP.

One other thing to consider:

Doing it in identify_cpu() makes it easy for all the APs to program
their CR4.LASS bit. If we were to defer it, we would need some
additional work to setup all the APs.

Do we already do this for something else? That would make it easier to
tag along.

On 10/31/25 11:03, Sohil Mehta wrote:
>> Deferring is a good idea. I was just asking for the same thing for the
>> CR pinning enforcement. The earlier we try to do these things, the more
>> we just trip over ourselves.
> I had suggested deferring as well to Kirill when I was reviewing the
> series. He preferred to enable LASS with other similar features such as
> SMAP, SMEP.
>
> One other thing to consider:
>
> Doing it in identify_cpu() makes it easy for all the APs to program
> their CR4.LASS bit. If we were to defer it, we would need some
> additional work to setup all the APs.

That's true. We'd need an smp_call_function() of some kind. *But*, once
that is in place, it's hopefully just a matter of moving that one line
of code per feature from identify_cpu() over to the new function.

> Do we already do this for something else? That would make it easier to
> tag along.

We don't do it for anything else that I can think of.

But there's a pretty broad set of things that are for "security" that
aren't necessary while you're just running trusted ring0 code:

 * SMAP/SMEP
 * CR pinning itself
 * MSR_IA32_SPEC_CTRL
 * MSR_IA32_TSX_CTRL

They just haven't mattered until now because they don't have any
practical effect until you actually have code running on _PAGE_USER
mappings trying to attack the kernel.

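
The "one line of code per feature" idea can be pictured with a small
user-space model. This is only a sketch: the per-CPU array, the CPU
count and every model_* name are hypothetical, the plain loop merely
stands in for whatever smp_call_function()-style broadcast would be
used, and listing SMEP/SMAP next to LASS is an illustration rather than
a proposal. The bit positions (20, 21, 27) are assumed to match the
kernel's X86_CR4_* definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_NR_CPUS 4	/* hypothetical CPU count for the model */

/* Assumed CR4 bit positions, mirroring the kernel's X86_CR4_* values. */
#define MODEL_CR4_SMEP (UINT64_C(1) << 20)
#define MODEL_CR4_SMAP (UINT64_C(1) << 21)
#define MODEL_CR4_LASS (UINT64_C(1) << 27)

/* Hypothetical per-CPU CR4 values, all starting with the bits clear. */
static uint64_t model_cpu_cr4[MODEL_NR_CPUS];

/* One entry per deferred feature: the "one line per feature" table. */
static const uint64_t model_deferred_cr4_bits[] = {
	MODEL_CR4_SMEP,
	MODEL_CR4_SMAP,
	MODEL_CR4_LASS,
};

/* What each CPU would run when the deferred enable is broadcast. */
static void model_enable_deferred_on_cpu(int cpu)
{
	size_t n = sizeof(model_deferred_cr4_bits) /
		   sizeof(model_deferred_cr4_bits[0]);
	size_t i;

	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	for (i = 0; i < n; i++)
		model_cpu_cr4[cpu] |= model_deferred_cr4_bits[i];
}

/* Stand-in for the broadcast: visit the boot CPU and every AP. */
static void model_enable_deferred_everywhere(void)
{
	int cpu;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		model_enable_deferred_on_cpu(cpu);
}
```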
On Fri, Oct 31, 2025 at 11:12:53AM -0700, Dave Hansen wrote:

> But there's a pretty broad set of things that are for "security" that
> aren't necessary while you're just running trusted ring0 code:
>
> * SMAP/SMEP
> * CR pinning itself
> * MSR_IA32_SPEC_CTRL
> * MSR_IA32_TSX_CTRL
>
> They just haven't mattered until now because they don't have any
> practical effect until you actually have code running on _PAGE_USER
> mappings trying to attack the kernel.

But that's just the thing EFI is *NOT* trusted! We're basically
disabling all security features (not listed above are CET and CFI) to
run this random garbage we have no control over.

How about we just flat out refuse EFI runtime services? What are they
actually needed for? Why are we bending over backwards and subverting
our security for this stuff?

On Fri, 7 Nov 2025 at 10:04, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Fri, Oct 31, 2025 at 11:12:53AM -0700, Dave Hansen wrote:
>
> > But there's a pretty broad set of things that are for "security" that
> > aren't necessary while you're just running trusted ring0 code:
> >
> > * SMAP/SMEP
> > * CR pinning itself
> > * MSR_IA32_SPEC_CTRL
> > * MSR_IA32_TSX_CTRL
> >
> > They just haven't mattered until now because they don't have any
> > practical effect until you actually have code running on _PAGE_USER
> > mappings trying to attack the kernel.
>
> But that's just the thing EFI is *NOT* trusted! We're basically
> disabling all security features (not listed above are CET and CFI) to
> run this random garbage we have no control over.
>
> How about we just flat out refuse EFI runtime services? What are they
> actually needed for? Why are we bending over backwards and subverting
> our security for this stuff?

On x86, it is mostly the EFI variable services that user space has
come to rely on, not only for setting the boot path (which typically
happens only once at installation time, when the path to GRUB is set
as the first boot option). Unfortunately, the systemd folks have taken
a liking to this feature too, and have started storing things in
there.

There is also PRM, which is much worse, as it permits devices in the
ACPI namespace to call firmware routines that are mapped privileged in
the OS address space in the same way. I objected to this at the time,
and asked for a facility where we could at least mark such code as
unprivileged (and run it as such) but this was ignored, as Intel and
MS had already sealed the deal and put this into production. This is
much worse than typical EFI routines, as the PRM code is ODM/OEM code
rather than something that comes from the upstream EFI implementation.
It is basically a dumping ground for code that used to run in SMM
because it was too ugly to run anywhere else. </rant>

It would be nice if we could

a) Get rid of SetVirtualAddressMap(), which is another insane hack
that should never have been supported on 64-bit systems. On arm64, we
no longer call it unless there is a specific need for it (some Ampere
Altra systems with buggy firmware will crash otherwise). On x86,
though, it might be tricky because there is so much buggy firmware.
Perhaps we should phase it out by checking for the UEFI version, so
that future systems will avoid it. This would mean, however, that EFI
code remains in the low user address space, which may not be what you
want (unless we do c) perhaps?)

b) Run EFI runtime calls in a sandbox VM - there was a PoC implemented
for arm64 a couple of years ago, but it was very intrusive and the ARM
intern in question went on to do more satisfying work.

c) Unmap the kernel KPTI style while the runtime calls are in
progress? This should be rather straight-forward, although it might
not help a lot as the code in question still runs privileged.

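
Option a), phasing out SetVirtualAddressMap() by UEFI version, could be
expressed roughly as below. This is a hedged sketch only: the function
name, the quirk flag and the idea of passing the cutoff as a parameter
are all hypothetical, and no particular cutoff revision is implied by
the thread. The only fact relied upon is the usual encoding of the
system table revision as (major << 16) | minor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* UEFI system table revision encoding: (major << 16) | minor. */
#define MODEL_EFI_REVISION(major, minor) \
	(((uint32_t)(major) << 16) | (uint32_t)(minor))

/*
 * Decide whether to call SetVirtualAddressMap().
 *  fw_revision: revision reported by the firmware's system table
 *  cutoff:      hypothetical first revision for which the call is skipped
 *  fw_quirk:    hypothetical flag for firmware known to crash without it
 */
static bool model_should_call_svam(uint32_t fw_revision, uint32_t cutoff,
				   bool fw_quirk)
{
	if (fw_quirk)
		return true;
	return fw_revision < cutoff;
}

/* Usage with an arbitrary, purely illustrative cutoff. */
static void model_svam_selfcheck(void)
{
	const uint32_t cutoff = MODEL_EFI_REVISION(2, 110);

	/* Older firmware keeps the existing behaviour. */
	assert(model_should_call_svam(MODEL_EFI_REVISION(2, 70), cutoff, false));
	/* Firmware at or beyond the cutoff skips the call... */
	assert(!model_should_call_svam(cutoff, cutoff, false));
	/* ...unless a quirk says it is still needed. */
	assert(model_should_call_svam(cutoff, cutoff, true));
}
```

As noted in the message above, skipping the call would leave EFI code in
the low address range, so this check alone would not remove the need to
deal with LASS around runtime calls.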
On Fri, Nov 07, 2025 at 10:22:30AM +0100, Ard Biesheuvel wrote:

> There is also PRM, which is much worse, as it permits devices in the
> ACPI namespace to call firmware routines that are mapped privileged in
> the OS address space in the same way. I objected to this at the time,
> and asked for a facility where we could at least mark such code as
> unprivileged (and run it as such) but this was ignored, as Intel and
> MS had already sealed the deal and put this into production. This is
> much worse than typical EFI routines, as the PRM code is ODM/OEM code
> rather than something that comes from the upstream EFI implementation.
> It is basically a dumping ground for code that used to run in SMM
> because it was too ugly to run anywhere else. </rant>

'https://uefi.org/sites/default/files/resources/Platform Runtime Mechanism - with legal notice.pdf'

Has on page 16, section 3.1:

  8. PRM handlers must not contain any privileged instructions.

So we should be able to actually run this crap in ring3, right?

On Fri, 7 Nov 2025 at 11:10, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Fri, Nov 07, 2025 at 10:22:30AM +0100, Ard Biesheuvel wrote:
>
> > There is also PRM, which is much worse, as it permits devices in the
> > ACPI namespace to call firmware routines that are mapped privileged in
> > the OS address space in the same way. I objected to this at the time,
> > and asked for a facility where we could at least mark such code as
> > unprivileged (and run it as such) but this was ignored, as Intel and
> > MS had already sealed the deal and put this into production. This is
> > much worse than typical EFI routines, as the PRM code is ODM/OEM code
> > rather than something that comes from the upstream EFI implementation.
> > It is basically a dumping ground for code that used to run in SMM
> > because it was too ugly to run anywhere else. </rant>
>
> 'https://uefi.org/sites/default/files/resources/Platform Runtime Mechanism - with legal notice.pdf'
>
> Has on page 16, section 3.1:
>
> 8. PRM handlers must not contain any privileged instructions.
>
> So we should be able to actually run this crap in ring3, right?

How interesting! This wasn't in the draft that I reviewed at the time,
so someone did listen.

So it does seem feasible to drop privileges and reacquire them in
principle, as long as we ensure that all the memory touched by the PRM
services (stack, code, data, MMIO regions) is mapped appropriately in
the EFI memory map.

On Fri, Nov 07, 2025 at 10:22:30AM +0100, Ard Biesheuvel wrote:

> > But that's just the thing EFI is *NOT* trusted! We're basically
> > disabling all security features (not listed above are CET and CFI) to
> > run this random garbage we have no control over.
> >
> > How about we just flat out refuse EFI runtime services? What are they
> > actually needed for? Why are we bending over backwards and subverting
> > our security for this stuff?
>
> On x86, it is mostly the EFI variable services that user space has
> come to rely on, not only for setting the boot path (which typically
> happens only once at installation time, when the path to GRUB is set
> as the first boot option). Unfortunately, the systemd folks have taken
> a liking to this feature too, and have started storing things in
> there.

*groan*, so booting with noefi (I just went and found that option) will
cause a modern Linux system to fail booting?

> There is also PRM, which is much worse, as it permits devices in the
> ACPI namespace to call firmware routines that are mapped privileged in
> the OS address space in the same way. I objected to this at the time,
> and asked for a facility where we could at least mark such code as
> unprivileged (and run it as such) but this was ignored, as Intel and
> MS had already sealed the deal and put this into production. This is
> much worse than typical EFI routines, as the PRM code is ODM/OEM code
> rather than something that comes from the upstream EFI implementation.
> It is basically a dumping ground for code that used to run in SMM
> because it was too ugly to run anywhere else. </rant>

What the actual fuck!! And we support this garbage? Without
pr_err(FW_BUG ) notification?

How can one find such devices? I need to check my machine.

> It would be nice if we could
>
> a) Get rid of SetVirtualAddressMap(), which is another insane hack
> that should never have been supported on 64-bit systems. On arm64, we
> no longer call it unless there is a specific need for it (some Ampere
> Altra systems with buggy firmware will crash otherwise). On x86,
> though, it might be tricky because there is so much buggy firmware.
> Perhaps we should phase it out by checking for the UEFI version, so
> that future systems will avoid it. This would mean, however, that EFI
> code remains in the low user address space, which may not be what you
> want (unless we do c) perhaps?)
>
> b) Run EFI runtime calls in a sandbox VM - there was a PoC implemented
> for arm64 a couple of years ago, but it was very intrusive and the ARM
> intern in question went on to do more satisfying work.
>
> c) Unmap the kernel KPTI style while the runtime calls are in
> progress? This should be rather straight-forward, although it might
> not help a lot as the code in question still runs privileged.

At the very least I think we should start printing scary messages about
disabling security to run untrusted code. This is all quite insane :/

On Fri, 7 Nov 2025 at 10:40, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Fri, Nov 07, 2025 at 10:22:30AM +0100, Ard Biesheuvel wrote:
>
> > > But that's just the thing EFI is *NOT* trusted! We're basically
> > > disabling all security features (not listed above are CET and CFI) to
> > > run this random garbage we have no control over.
> > >
> > > How about we just flat out refuse EFI runtime services? What are they
> > > actually needed for? Why are we bending over backwards and subverting
> > > our security for this stuff?
> >
> > On x86, it is mostly the EFI variable services that user space has
> > come to rely on, not only for setting the boot path (which typically
> > happens only once at installation time, when the path to GRUB is set
> > as the first boot option). Unfortunately, the systemd folks have taken
> > a liking to this feature too, and have started storing things in
> > there.
>
> *groan*, so booting with noefi (I just went and found that option) will
> cause a modern Linux system to fail booting?
>

As long as you install with EFI enabled, the impact of efi=noruntime
should be limited, given that x86 does not rely on EFI runtime
services for the RTC or for reboot/poweroff. But you will lose access
to the EFI variable store. (Not sure what 'noefi' does in comparison,
but keeping EFI enabled at boot time for things like secure/measured
boot and storage encryption will probably result in a net positive
impact on security/hardening as long as you avoid calling into the
firmware after boot)

> > There is also PRM, which is much worse, as it permits devices in the
> > ACPI namespace to call firmware routines that are mapped privileged in
> > the OS address space in the same way. I objected to this at the time,
> > and asked for a facility where we could at least mark such code as
> > unprivileged (and run it as such) but this was ignored, as Intel and
> > MS had already sealed the deal and put this into production. This is
> > much worse than typical EFI routines, as the PRM code is ODM/OEM code
> > rather than something that comes from the upstream EFI implementation.
> > It is basically a dumping ground for code that used to run in SMM
> > because it was too ugly to run anywhere else. </rant>
>
> What the actual fuck!! And we support this garbage? Without
> pr_err(FW_BUG ) notification?
>
> How can one find such devices? I need to check my machine.
>

Unless you have a PRMT table in the list of ACPI tables, your system
shouldn't be affected by this.

> > It would be nice if we could
> >
> > a) Get rid of SetVirtualAddressMap(), which is another insane hack
> > that should never have been supported on 64-bit systems. On arm64, we
> > no longer call it unless there is a specific need for it (some Ampere
> > Altra systems with buggy firmware will crash otherwise). On x86,
> > though, it might be tricky because there is so much buggy firmware.
> > Perhaps we should phase it out by checking for the UEFI version, so
> > that future systems will avoid it. This would mean, however, that EFI
> > code remains in the low user address space, which may not be what you
> > want (unless we do c) perhaps?)
> >
> > b) Run EFI runtime calls in a sandbox VM - there was a PoC implemented
> > for arm64 a couple of years ago, but it was very intrusive and the ARM
> > intern in question went on to do more satisfying work.
> >
> > c) Unmap the kernel KPTI style while the runtime calls are in
> > progress? This should be rather straight-forward, although it might
> > not help a lot as the code in question still runs privileged.
>
> At the very least I think we should start printing scary messages about
> disabling security to run untrusted code. This is all quite insane :/

I agree in principle. However, calling it 'untrusted' is a bit
misleading here, given that you already rely on the same body of code
to boot your computer to begin with. I.e., if you suspect that the
code in question is conspiring against you, not calling it at runtime
to manipulate EFI variables is not going to help with that.

But from a robustness point of view, I agree - running vendor code at
the OS's privilege level at runtime that was only tested with Windows
is not great for stability, and it would be nice if we could leverage
the principle of least privilege and only permit it to access the
things that it actually needs to perform the task that we've asked it
to. This is why I asked for the ability to mark PRM services as
unprivileged, given that they typically only run some code and perhaps
poke some memory (either RAM or MMIO registers) that the OS never
accesses directly.

Question is though whether on x86, sandboxing is feasible: can VMs
call into SMM? Because that is where 95% of the EFI variable services
logic resides - the code running directly under the OS does very
little other than marshalling the arguments and passing them on.

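
The PRMT check mentioned above can be done from user space by looking
for the table among the ACPI tables that Linux exposes through sysfs. A
minimal sketch, assuming the tables appear as files named after their
signature under /sys/firmware/acpi/tables; the helper name
acpi_table_present() is hypothetical, and an absent file only means
that this interface does not show the table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Return true if a file named after the ACPI signature exists in
 * tables_dir. Only existence is tested, so no table contents are read.
 */
static bool acpi_table_present(const char *tables_dir, const char *signature)
{
	char path[512];
	int n;

	assert(tables_dir != NULL && signature != NULL);

	n = snprintf(path, sizeof(path), "%s/%s", tables_dir, signature);
	if (n < 0 || (size_t)n >= sizeof(path))
		return false;	/* path did not fit; treat as not found */

	return access(path, F_OK) == 0;
}
```

A caller would pass the assumed sysfs directory, for example
acpi_table_present("/sys/firmware/acpi/tables", "PRMT"), and treat a
true result as a hint that PRM handlers may be in use on that machine.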
On Fri, Nov 07, 2025 at 11:09:44AM +0100, Ard Biesheuvel wrote:

> As long as you install with EFI enabled, the impact of efi=noruntime
> should be limited, given that x86 does not rely on EFI runtime
> services for the RTC or for reboot/poweroff. But you will lose access
> to the EFI variable store. (Not sure what 'noefi' does in comparison,
> but keeping EFI enabled at boot time for things like secure/measured
> boot and storage encryption will probably result in a net positive
> impact on security/hardening as long as you avoid calling into the
> firmware after boot)

I would say it should all stay before we start userspace, because that's
where our trust boundary is. We definitely do not trust userspace.

Also, if they all think this is 'important' why not provide native
drivers for this service?

> > At the very least I think we should start printing scary messages about
> > disabling security to run untrusted code. This is all quite insane :/
>
> I agree in principle. However, calling it 'untrusted' is a bit
> misleading here, given that you already rely on the same body of code
> to boot your computer to begin with.

That PRM stuff really doesn't sound like it's needed to boot. And it
sounds like it really should be part of the normal Linux driver, but
isn't for $corp reasons or something.

> I.e., if you suspect that the
> code in question is conspiring against you, not calling it at runtime
> to manipulate EFI variables is not going to help with that.

Well, the problem is the disabling of all the hardware and software
security measures to run this crap. This makes it a prime target to take
over stuff.

Also, while EFI code might be good enough to boot the machine, using it
at runtime is a whole different league of security. What if they have a
'bug' in the variable name parser and a variable named "NSAWantsAccess"
gets you a buffer overflow and random code execution.

Trusting it to boot the machine and trusting it to be safe for general
runtime are two very different things.

> Question is though whether on x86, sandboxing is feasible: can VMs
> call into SMM? Because that is where 95% of the EFI variable services
> logic resides - the code running directly under the OS does very
> little other than marshalling the arguments and passing them on.

I just read in that PRM document that they *REALLY* want to get away
from SMM because it freezes all CPUs in the system for the duration of
the SMI. So this variable crud being in SMM would be inconsistent.

Anyway, I'm all for very aggressive runtime warnings and pushing vendors
that object to provide native drivers. I don't believe there is any real
technical reason for any of this.

On November 7, 2025 1:22:30 AM PST, Ard Biesheuvel <ardb@kernel.org> wrote:
>On Fri, 7 Nov 2025 at 10:04, Peter Zijlstra <peterz@infradead.org> wrote:
>>
>> On Fri, Oct 31, 2025 at 11:12:53AM -0700, Dave Hansen wrote:
>>
>> > But there's a pretty broad set of things that are for "security" that
>> > aren't necessary while you're just running trusted ring0 code:
>> >
>> > * SMAP/SMEP
>> > * CR pinning itself
>> > * MSR_IA32_SPEC_CTRL
>> > * MSR_IA32_TSX_CTRL
>> >
>> > They just haven't mattered until now because they don't have any
>> > practical effect until you actually have code running on _PAGE_USER
>> > mappings trying to attack the kernel.
>>
>> But that's just the thing EFI is *NOT* trusted! We're basically
>> disabling all security features (not listed above are CET and CFI) to
>> run this random garbage we have no control over.
>>
>> How about we just flat out refuse EFI runtime services? What are they
>> actually needed for? Why are we bending over backwards and subverting
>> our security for this stuff?
>
>On x86, it is mostly the EFI variable services that user space has
>come to rely on, not only for setting the boot path (which typically
>happens only once at installation time, when the path to GRUB is set
>as the first boot option). Unfortunately, the systemd folks have taken
>a liking to this feature too, and have started storing things in
>there.
>
>There is also PRM, which is much worse, as it permits devices in the
>ACPI namespace to call firmware routines that are mapped privileged in
>the OS address space in the same way. I objected to this at the time,
>and asked for a facility where we could at least mark such code as
>unprivileged (and run it as such) but this was ignored, as Intel and
>MS had already sealed the deal and put this into production. This is
>much worse than typical EFI routines, as the PRM code is ODM/OEM code
>rather than something that comes from the upstream EFI implementation.
>It is basically a dumping ground for code that used to run in SMM
>because it was too ugly to run anywhere else. </rant>
>
>It would be nice if we could
>
>a) Get rid of SetVirtualAddressMap(), which is another insane hack
>that should never have been supported on 64-bit systems. On arm64, we
>no longer call it unless there is a specific need for it (some Ampere
>Altra systems with buggy firmware will crash otherwise). On x86,
>though, it might be tricky because there is so much buggy firmware.
>Perhaps we should phase it out by checking for the UEFI version, so
>that future systems will avoid it. This would mean, however, that EFI
>code remains in the low user address space, which may not be what you
>want (unless we do c) perhaps?)
>
>b) Run EFI runtime calls in a sandbox VM - there was a PoC implemented
>for arm64 a couple of years ago, but it was very intrusive and the ARM
>intern in question went on to do more satisfying work.
>
>c) Unmap the kernel KPTI style while the runtime calls are in
>progress? This should be rather straight-forward, although it might
>not help a lot as the code in question still runs privileged.

Firmware update is a big one.

On Fri, 7 Nov 2025 at 10:27, H. Peter Anvin <hpa@zytor.com> wrote:
>
> On November 7, 2025 1:22:30 AM PST, Ard Biesheuvel <ardb@kernel.org> wrote:
> >On Fri, 7 Nov 2025 at 10:04, Peter Zijlstra <peterz@infradead.org> wrote:
> >>
> >> On Fri, Oct 31, 2025 at 11:12:53AM -0700, Dave Hansen wrote:
> >>
> >> > But there's a pretty broad set of things that are for "security" that
> >> > aren't necessary while you're just running trusted ring0 code:
> >> >
> >> > * SMAP/SMEP
> >> > * CR pinning itself
> >> > * MSR_IA32_SPEC_CTRL
> >> > * MSR_IA32_TSX_CTRL
> >> >
> >> > They just haven't mattered until now because they don't have any
> >> > practical effect until you actually have code running on _PAGE_USER
> >> > mappings trying to attack the kernel.
> >>
> >> But that's just the thing EFI is *NOT* trusted! We're basically
> >> disabling all security features (not listed above are CET and CFI) to
> >> run this random garbage we have no control over.
> >>
> >> How about we just flat out refuse EFI runtime services? What are they
> >> actually needed for? Why are we bending over backwards and subverting
> >> our security for this stuff?
> >
> >On x86, it is mostly the EFI variable services that user space has
> >come to rely on, not only for setting the boot path (which typically
> >happens only once at installation time, when the path to GRUB is set
> >as the first boot option). Unfortunately, the systemd folks have taken
> >a liking to this feature too, and have started storing things in
> >there.
> >
> >There is also PRM, which is much worse, as it permits devices in the
> >ACPI namespace to call firmware routines that are mapped privileged in
> >the OS address space in the same way. I objected to this at the time,
> >and asked for a facility where we could at least mark such code as
> >unprivileged (and run it as such) but this was ignored, as Intel and
> >MS had already sealed the deal and put this into production. This is
> >much worse than typical EFI routines, as the PRM code is ODM/OEM code
> >rather than something that comes from the upstream EFI implementation.
> >It is basically a dumping ground for code that used to run in SMM
> >because it was too ugly to run anywhere else. </rant>
> >
> >It would be nice if we could
> >
> >a) Get rid of SetVirtualAddressMap(), which is another insane hack
> >that should never have been supported on 64-bit systems. On arm64, we
> >no longer call it unless there is a specific need for it (some Ampere
> >Altra systems with buggy firmware will crash otherwise). On x86,
> >though, it might be tricky because there is so much buggy firmware.
> >Perhaps we should phase it out by checking for the UEFI version, so
> >that future systems will avoid it. This would mean, however, that EFI
> >code remains in the low user address space, which may not be what you
> >want (unless we do c) perhaps?)
> >
> >b) Run EFI runtime calls in a sandbox VM - there was a PoC implemented
> >for arm64 a couple of years ago, but it was very intrusive and the ARM
> >intern in question went on to do more satisfying work.
> >
> >c) Unmap the kernel KPTI style while the runtime calls are in
> >progress? This should be rather straight-forward, although it might
> >not help a lot as the code in question still runs privileged.
>
> Firmware update is a big one.

Firmware update does not run under the OS.