Currently memremap(MEMREMAP_WB) produces decrypted/shared mapping:
memremap(MEMREMAP_WB)
arch_memremap_wb()
ioremap_cache()
__ioremap_caller(.encrytped = false)
It is a bad default. On TDX guests, access via shared mapping can be
destructive[1].
Kernel already provides a way to request decrypted mapping explicitly
via MEMREMAP_DEC flag.
Make memremap(MEMREMAP_WB) produce encrypted/private mapping by default
unless MEMREMAP_DEC is specified.
It fixes crash on kexec in TDX guests if CONFIG_EISA is enabled.
[1] https://lore.kernel.org/all/20240822095122.736522-1-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Ashish Kalra <ashish.kalra@amd.com>
Cc: "Maciej W. Rozycki" <macro@orcam.me.uk>
---
arch/x86/include/asm/io.h | 3 +++
arch/x86/mm/ioremap.c | 8 ++++++++
2 files changed, 11 insertions(+)
diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index 1d60427379c9..1a3a34b40598 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -180,6 +180,9 @@ extern void __iomem *ioremap_prot(resource_size_t offset, unsigned long size, un
extern void __iomem *ioremap_encrypted(resource_size_t phys_addr, unsigned long size);
#define ioremap_encrypted ioremap_encrypted
+void *arch_memremap_wb(phys_addr_t phys_addr, size_t size, unsigned long flags);
+#define arch_memremap_wb arch_memremap_wb
+
/**
* ioremap - map bus memory into CPU space
* @offset: bus address of the memory
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 70b02fc61d93..f04df30f0a27 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -503,6 +503,14 @@ void iounmap(volatile void __iomem *addr)
}
EXPORT_SYMBOL(iounmap);
+void *arch_memremap_wb(phys_addr_t phys_addr, size_t size, unsigned long flags)
+{
+ if (flags & MEMREMAP_DEC)
+ return (void __force *)ioremap_cache(phys_addr, size);
+
+ return (void __force *)ioremap_encrypted(phys_addr, size);
+}
+
/*
* Convert a physical pointer to a virtual kernel pointer for /dev/mem
* access
--
2.45.2
On Mon, Oct 21, 2024 at 01:57:23PM +0300, Kirill A. Shutemov wrote:
> It fixes crash on kexec in TDX guests if CONFIG_EISA is enabled.
Do TDX guests even need EISA?
"The EISA bus saw limited use between 1988 and 1995 when it was made obsolete
by the PCI bus."
Sounds like no to me.
We usually simply stop coco guests from touching such legacy ranges:
f30470c190c2 ("x86/boot: Skip video memory access in the decompressor for SEV-ES/SNP")
so can you do that too?
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Mon, Nov 18, 2024 at 05:46:16PM +0100, Borislav Petkov wrote:
> On Mon, Oct 21, 2024 at 01:57:23PM +0300, Kirill A. Shutemov wrote:
> > It fixes crash on kexec in TDX guests if CONFIG_EISA is enabled.
>
> Do TDX guests even need EISA?
>
> "The EISA bus saw limited use between 1988 and 1995 when it was made obsolete
> by the PCI bus."
>
> Sounds like no to me.
>
> We usually simply stop coco guests from touching such legacy ranges:
>
> f30470c190c2 ("x86/boot: Skip video memory access in the decompressor for SEV-ES/SNP")
>
> so can you do that too?
Sure, we can workaround every place that touches such ranges. Or we can
address problem at the root and make creating decrypted/shared mappings
explicit.
Such mappings have both functional (as we see here) and security
implications (VMM can manipulate the guest memory range). We should not
create decrypted mappings by default on legacy interfaces.
--
Kiryl Shutsemau / Kirill A. Shutemov
On Tue, Nov 19, 2024 at 10:21:05AM +0200, Kirill A. Shutemov wrote:
> Sure, we can workaround every place that touches such ranges.
Every place? Which every place? I thought this is only an EISA issue?
Then clearly your changelogs need to expand considerably more what we're
*really* addressing here.
> Or we can address problem at the root and make creating decrypted/shared
> mappings explicit.
What is the problem? That KVM implicitly converts memory to shared? Why does
KVM do that an can it be fixed not to?
Doesn't sound like the guest's problem.
Or maybe this needs a lot more explanation what we're fixing here.
> Such mappings have both functional (as we see here) and security
> implications (VMM can manipulate the guest memory range). We should not
> create decrypted mappings by default on legacy interfaces.
So we're getting closer.
The changes themselves are fine but your text is missing a lot about what
we're fixing here. When I asked, I barely scratched the surface. So can we
elaborate here pls?
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Thu, Nov 21, 2024 at 12:49:52PM +0100, Borislav Petkov wrote:
> On Tue, Nov 19, 2024 at 10:21:05AM +0200, Kirill A. Shutemov wrote:
> > Sure, we can workaround every place that touches such ranges.
>
> Every place? Which every place? I thought this is only an EISA issue?
I looked at other places where we call memremap(MEMREMAP_WB) such as
acpi_wakeup_cpu(). We actually get encrypted/private mapping for this
callsite despite __ioremap_caller() being called encrypted == false.
This happens because of IORES_MAP_ENCRYPTED check in __ioremap_caller().
So we depend on the BIOS here. The EISA problem happens because the
target memory is in !IORES_MAP_ENCRYPTED memory.
It's hard to say if any other memremap(MEMREMAP_WB) would trigger the
issue. And what will happen after next BIOS update.
> Then clearly your changelogs need to expand considerably more what we're
> *really* addressing here.
>
> > Or we can address problem at the root and make creating decrypted/shared
> > mappings explicit.
>
> What is the problem? That KVM implicitly converts memory to shared? Why does
> KVM do that an can it be fixed not to?
>
> Doesn't sound like the guest's problem.
Well, the problem is on the both sides.
VMM behaviour on such accesses is not specified in any spec. AFAIK all
current VMM implementations do this implicit conversion.
I think it has to be fixed. VMMs (not only KVM) should not silently
convert memory to shared. But VMMs cannot make memory access to go away.
The only option they have is to inject #VE instead indicating bogus
access. At this point it becomes a guest problem.
It will get fixed in VMMs naturally when TDX Connect gets enabled.
With a secure device assigned to a TD, VMM would loose the ability to
convert memory on its own. The guest would have to unlock the memory
first. This will make implicit conversion impossible.
But it also means guest should never initiate shared access without
explicit conversion. Otherwise #VE will crash it.
> Or maybe this needs a lot more explanation what we're fixing here.
>
> > Such mappings have both functional (as we see here) and security
> > implications (VMM can manipulate the guest memory range). We should not
> > create decrypted mappings by default on legacy interfaces.
>
> So we're getting closer.
>
> The changes themselves are fine but your text is missing a lot about what
> we're fixing here. When I asked, I barely scratched the surface. So can we
> elaborate here pls?
What about this:
x86/mm: Make memremap(MEMREMAP_WB) map memory as encrypted by default
Currently memremap(MEMREMAP_WB) can produce decrypted/shared mapping:
memremap(MEMREMAP_WB)
arch_memremap_wb()
ioremap_cache()
__ioremap_caller(.encrytped = false)
In such cases, the IORES_MAP_ENCRYPTED flag on the memory will determine
if the resulting mapping is encrypted or decrypted.
Creating a decrypted mapping without explicit request from the caller is
risky:
- It can inadvertently expose the guest's data and compromise the
guest.
- Accessing private memory via shared/decrypted mapping on TDX will
either trigger implicit conversion to shared or #VE (depending on
VMM implementation).
Implicit conversion is destructive: subsequent access to the same
memory via private mapping will trigger a hard-to-debug #VE crash.
The kernel already provides a way to request decrypted mapping
explicitly via the MEMREMAP_DEC flag.
Modify memremap(MEMREMAP_WB) to produce encrypted/private mapping by
default unless MEMREMAP_DEC is specified.
This change fixes the crash on kexec in TDX guests if CONFIG_EISA is
enabled.
--
Kiryl Shutsemau / Kirill A. Shutemov
Borislav, Is there anything I need to do to get this patchset merged? -- Kiryl Shutsemau / Kirill A. Shutemov
On Mon, Jan 13, 2025 at 12:03:22PM +0200, Kirill A. Shutemov wrote:
> Borislav,
>
> Is there anything I need to do to get this patchset merged?
Your updated commit message here:
https://lore.kernel.org/r/bz3vlh3lri7yfckdkddopwhsgvkmizhw5q6ecomgeba7q22ufp@ntu2kugiho4o
Reads better, yes.
> This change fixes the crash on kexec in TDX guests if CONFIG_EISA is
> enabled.
s/This change fixes/Fix/
In imperative tone pls.
And that sentence begs the question: does this need to go to stable?
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Mon, Jan 13, 2025 at 11:32:38AM +0100, Borislav Petkov wrote: > On Mon, Jan 13, 2025 at 12:03:22PM +0200, Kirill A. Shutemov wrote: > > Borislav, > > > > Is there anything I need to do to get this patchset merged? > > Your updated commit message here: > > https://lore.kernel.org/r/bz3vlh3lri7yfckdkddopwhsgvkmizhw5q6ecomgeba7q22ufp@ntu2kugiho4o > > Reads better, yes. > > > This change fixes the crash on kexec in TDX guests if CONFIG_EISA is > > enabled. > > s/This change fixes/Fix/ > > In imperative tone pls. Okay. > And that sentence begs the question: does this need to go to stable? Right. It is hard to pin down Fixes commit as the bug is inherited from initial encrypted/decrypted handling implementation, but it is not useful to backport it all the way. I'll add Cc: stable@vger.kernel.org # 6.11+ 6.11 is where kexec handling for TDX was introduced. -- Kiryl Shutsemau / Kirill A. Shutemov
On 12/10/24 07:26, Kirill A. Shutemov wrote: > On Thu, Nov 21, 2024 at 12:49:52PM +0100, Borislav Petkov wrote: >> On Tue, Nov 19, 2024 at 10:21:05AM +0200, Kirill A. Shutemov wrote: >>> Sure, we can workaround every place that touches such ranges. >> >> Every place? Which every place? I thought this is only an EISA issue? > > I looked at other places where we call memremap(MEMREMAP_WB) such as > acpi_wakeup_cpu(). We actually get encrypted/private mapping for this > callsite despite __ioremap_caller() being called encrypted == false. > This happens because of IORES_MAP_ENCRYPTED check in __ioremap_caller(). > > So we depend on the BIOS here. The EISA problem happens because the > target memory is in !IORES_MAP_ENCRYPTED memory. > > It's hard to say if any other memremap(MEMREMAP_WB) would trigger the > issue. And what will happen after next BIOS update. > >> Then clearly your changelogs need to expand considerably more what we're >> *really* addressing here. >> >>> Or we can address problem at the root and make creating decrypted/shared >>> mappings explicit. >> >> What is the problem? That KVM implicitly converts memory to shared? Why does >> KVM do that an can it be fixed not to? >> >> Doesn't sound like the guest's problem. > > Well, the problem is on the both sides. > > VMM behaviour on such accesses is not specified in any spec. AFAIK all > current VMM implementations do this implicit conversion. > > I think it has to be fixed. VMMs (not only KVM) should not silently > convert memory to shared. But VMMs cannot make memory access to go away. > The only option they have is to inject #VE instead indicating bogus > access. At this point it becomes a guest problem. > > It will get fixed in VMMs naturally when TDX Connect gets enabled. > With a secure device assigned to a TD, VMM would loose the ability to > convert memory on its own. The guest would have to unlock the memory > first. This will make implicit conversion impossible. > > But it also means guest should never initiate shared access without > explicit conversion. Otherwise #VE will crash it. > >> Or maybe this needs a lot more explanation what we're fixing here. >> >>> Such mappings have both functional (as we see here) and security >>> implications (VMM can manipulate the guest memory range). We should not >>> create decrypted mappings by default on legacy interfaces. >> >> So we're getting closer. >> >> The changes themselves are fine but your text is missing a lot about what >> we're fixing here. When I asked, I barely scratched the surface. So can we >> elaborate here pls? > > What about this: > > x86/mm: Make memremap(MEMREMAP_WB) map memory as encrypted by default > > Currently memremap(MEMREMAP_WB) can produce decrypted/shared mapping: > > memremap(MEMREMAP_WB) > arch_memremap_wb() > ioremap_cache() > __ioremap_caller(.encrytped = false) That's because try_ram_remap() invokes arch_memremap_can_ram_remap() which is returning false (for some reason). When arch_memremap_can_ram_remap() returns false, ioremap_cache() is invoked. ioremap() should provide shared mappings unless specifically requested to provide an encrypted mapping (via encrypted parameter) or if __ioremap_check_mem() determines that an encrypted mapping is needed. Can logic be added to arch_memremap_can_ram_remap() to return true for the cases that TDX is having issues with? Thanks, Tom > > In such cases, the IORES_MAP_ENCRYPTED flag on the memory will determine > if the resulting mapping is encrypted or decrypted. > > Creating a decrypted mapping without explicit request from the caller is > risky: > > - It can inadvertently expose the guest's data and compromise the > guest. > > - Accessing private memory via shared/decrypted mapping on TDX will > either trigger implicit conversion to shared or #VE (depending on > VMM implementation). > > Implicit conversion is destructive: subsequent access to the same > memory via private mapping will trigger a hard-to-debug #VE crash. > > The kernel already provides a way to request decrypted mapping > explicitly via the MEMREMAP_DEC flag. > > Modify memremap(MEMREMAP_WB) to produce encrypted/private mapping by > default unless MEMREMAP_DEC is specified. > > This change fixes the crash on kexec in TDX guests if CONFIG_EISA is > enabled. >
On Thu, Dec 12, 2024 at 08:23:07AM -0600, Tom Lendacky wrote: > On 12/10/24 07:26, Kirill A. Shutemov wrote: > > On Thu, Nov 21, 2024 at 12:49:52PM +0100, Borislav Petkov wrote: > >> On Tue, Nov 19, 2024 at 10:21:05AM +0200, Kirill A. Shutemov wrote: > >>> Sure, we can workaround every place that touches such ranges. > >> > >> Every place? Which every place? I thought this is only an EISA issue? > > > > I looked at other places where we call memremap(MEMREMAP_WB) such as > > acpi_wakeup_cpu(). We actually get encrypted/private mapping for this > > callsite despite __ioremap_caller() being called encrypted == false. > > This happens because of IORES_MAP_ENCRYPTED check in __ioremap_caller(). > > > > So we depend on the BIOS here. The EISA problem happens because the > > target memory is in !IORES_MAP_ENCRYPTED memory. > > > > It's hard to say if any other memremap(MEMREMAP_WB) would trigger the > > issue. And what will happen after next BIOS update. > > > >> Then clearly your changelogs need to expand considerably more what we're > >> *really* addressing here. > >> > >>> Or we can address problem at the root and make creating decrypted/shared > >>> mappings explicit. > >> > >> What is the problem? That KVM implicitly converts memory to shared? Why does > >> KVM do that an can it be fixed not to? > >> > >> Doesn't sound like the guest's problem. > > > > Well, the problem is on the both sides. > > > > VMM behaviour on such accesses is not specified in any spec. AFAIK all > > current VMM implementations do this implicit conversion. > > > > I think it has to be fixed. VMMs (not only KVM) should not silently > > convert memory to shared. But VMMs cannot make memory access to go away. > > The only option they have is to inject #VE instead indicating bogus > > access. At this point it becomes a guest problem. > > > > It will get fixed in VMMs naturally when TDX Connect gets enabled. > > With a secure device assigned to a TD, VMM would loose the ability to > > convert memory on its own. The guest would have to unlock the memory > > first. This will make implicit conversion impossible. > > > > But it also means guest should never initiate shared access without > > explicit conversion. Otherwise #VE will crash it. > > > >> Or maybe this needs a lot more explanation what we're fixing here. > >> > >>> Such mappings have both functional (as we see here) and security > >>> implications (VMM can manipulate the guest memory range). We should not > >>> create decrypted mappings by default on legacy interfaces. > >> > >> So we're getting closer. > >> > >> The changes themselves are fine but your text is missing a lot about what > >> we're fixing here. When I asked, I barely scratched the surface. So can we > >> elaborate here pls? > > > > What about this: > > > > x86/mm: Make memremap(MEMREMAP_WB) map memory as encrypted by default > > > > Currently memremap(MEMREMAP_WB) can produce decrypted/shared mapping: > > > > memremap(MEMREMAP_WB) > > arch_memremap_wb() > > ioremap_cache() > > __ioremap_caller(.encrytped = false) > > That's because try_ram_remap() invokes arch_memremap_can_ram_remap() > which is returning false (for some reason). In EISA case try_ram_remap() is not called. is_ram is REGION_DISJOINT > When arch_memremap_can_ram_remap() returns false, ioremap_cache() is > invoked. ioremap() should provide shared mappings unless specifically > requested to provide an encrypted mapping (via encrypted parameter) or > if __ioremap_check_mem() determines that an encrypted mapping is needed. I don't propose changing ioremap() behaviour. It's about memremap(). > Can logic be added to arch_memremap_can_ram_remap() to return true for > the cases that TDX is having issues with? It will not help EISA case, because we don't get this path. -- Kiryl Shutsemau / Kirill A. Shutemov
© 2016 - 2026 Red Hat, Inc.