[v4] x86/snp: Add kexec support

[PATCH v4 0/4] x86/snp: Add kexec support

Posted by Ashish Kalra 1 year, 10 months ago

From: Ashish Kalra <ashish.kalra@amd.com>

The patchset adds bits and pieces to get kexec (and crashkernel) working on
SNP guests.

v4:
- Rebased to current tip/master.
- Reviewed-bys from Sathya.
- Remove snp_kexec_unprep_rom_memory(), which is no longer needed because
  the SEV-SNP code no longer validates the ROM range in probe_roms().
- Fix kernel test robot build error/warnings.

v3:
- Rebased.
- Moved the "Keep page tables that map E820_TYPE_ACPI" patch to Kirill's
  TDX guest kexec patch series.
- Check the md attribute instead of efi_setup to detect whether we are
  running under a kexec kernel.
- Added a new sev_es_enabled() function.
- Skip video memory access in the decompressor on SEV-ES/SNP systems to
  prevent guest termination, as the boot stage2 #VC handler does not
  handle MMIO.

v2:
- Address zeroing of unaccepted memory table mappings at all page table
  levels by adding phys_pte_init(), phys_pud_init() and phys_p4d_init().
- Include the "skip efi_arch_mem_reserve() in case of kexec" patch as part
  of this patch set.
- Rename last_address_shd_kexec to the more appropriate
  kexec_last_address_to_make_private.
- Remove code duplicated with TDX and use the common interfaces defined
  for SNP and TDX for kexec/kdump.
- Remove the set_pte_enc() dependency on pg_level_to_pfn() and make the
  function simpler.
- Rename unshare_pte() to make_pte_private().
- Clarify the comment explaining the use of
  kexec_last_address_to_make_private.
- General cleanup.

Ashish Kalra (4):
  efi/x86: skip efi_arch_mem_reserve() in case of kexec.
  x86/sev: add sev_es_enabled() function.
  x86/boot/compressed: Skip Video Memory access in Decompressor for
    SEV-ES/SNP.
  x86/snp: Convert shared memory back to private on kexec

 arch/x86/boot/compressed/misc.c |   6 +-
 arch/x86/boot/compressed/misc.h |   1 +
 arch/x86/boot/compressed/sev.c  |   5 +
 arch/x86/boot/compressed/sev.h  |   2 +
 arch/x86/include/asm/sev.h      |   4 +
 arch/x86/kernel/sev.c           | 161 ++++++++++++++++++++++++++++++++
 arch/x86/mm/mem_encrypt_amd.c   |   3 +
 arch/x86/platform/efi/quirks.c  |  20 +++-
 8 files changed, 198 insertions(+), 4 deletions(-)

-- 
2.34.1

Re: [PATCH v4 0/4] x86/snp: Add kexec support

Posted by Alexander Graf 1 year, 9 months ago

Hey Ashish,

On 09.04.24 22:42, Ashish Kalra wrote:
> From: Ashish Kalra <ashish.kalra@amd.com>
>
> The patchset adds bits and pieces to get kexec (and crashkernel) work on
> SNP guest.

With this patch set (and similar for the TDX one), you enable the 
typical kdump case, which is great!

However, if a user is running with direct kernel boot - which is very
typical in SEV-SNP setups, especially for Kata Containers and similar -
the initial launch measurement is a natural indicator of the target 
environment. Kexec basically allows them to completely bypass that: You 
would be able to run a completely different environment than the one you 
measure through the launch digest. I'm not sure it's a good idea to even 
allow that by default in CoCo environments - at least not if the kernel 
is locked down.

Do you have any plans to build a CoCo native kexec where you allow a VM 
to create a new VM context with a guest provided seed? The new context 
could rerun all of the attestation and so enable users to generate a new 
launch digest. If you then atomically swap into the new context, it 
would in turn enable them to natively "kexec" into a completely new VM 
context including measurements.

I understand that an SVSM + TPM implementation may help to some extent 
here by integrating with IMA and adding the new kernel into the IMA log. 
But that quickly becomes very convoluted (hence difficult to assess 
correctness for) and the same measurement question arises just one level 
up then: How do you update your SVSM while maintaining a full 
measurement and trust chain?

Thanks,

Alex

Re: [PATCH v4 0/4] x86/snp: Add kexec support

Posted by Kalra, Ashish 1 year, 9 months ago

Hello Alexander,

On 5/2/2024 7:01 AM, Alexander Graf wrote:
> Hey Ashish,
>
> On 09.04.24 22:42, Ashish Kalra wrote:
>> From: Ashish Kalra <ashish.kalra@amd.com>
>>
>> The patchset adds bits and pieces to get kexec (and crashkernel) work on
>> SNP guest.
>
>
> With this patch set (and similar for the TDX one), you enable the 
> typical kdump case, which is great!
>
> However, if a user is running with direct kernel boot - which is very 
> typical in SEV-SNP setup, especially for Kata Containers and similar - 
> the initial launch measurement is a natural indicator of the target 
> environment. Kexec basically allows them to completely bypass that: 
> You would be able to run a completely different environment than the 
> one you measure through the launch digest. I'm not sure it's a good 
> idea to even allow that by default in CoCo environments - at least not 
> if the kernel is locked down.
>
I thought that kexec is disabled if the kernel is in locked-down mode.

Or is it that the kexec_load syscall is disabled in locked-down mode while
the kexec_file_load syscall is still supported?

> Do you have any plans to build a CoCo native kexec where you allow a 
> VM to create a new VM context with a guest provided seed? The new 
> context could rerun all of the attestation and so enable users to 
> generate a new launch digest. If you then atomically swap into the new 
> context, it would in turn enable them to natively "kexec" into a 
> completely new VM context including measurements.

No, currently I don't think there are any such plans.

Thanks, Ashish

>
> I understand that an SVSM + TPM implementation may help to some extent 
> here by integrating with IMA and adding the new kernel into the IMA 
> log. But that quickly becomes very convoluted (hence difficult to 
> assess correctness for) and the same measurement question arises just 
> one level up then: How do you update your SVSM while maintaining a 
> full measurement and trust chain?
>
>
> Thanks,
>
> Alex
>

Re: [PATCH v4 0/4] x86/snp: Add kexec support

Posted by Vitaly Kuznetsov 1 year, 9 months ago

Alexander Graf <graf@amazon.com> writes:

> Hey Ashish,
>
> On 09.04.24 22:42, Ashish Kalra wrote:
>> From: Ashish Kalra <ashish.kalra@amd.com>
>>
>> The patchset adds bits and pieces to get kexec (and crashkernel) work on
>> SNP guest.
>
>
> With this patch set (and similar for the TDX one), you enable the 
> typical kdump case, which is great!
>
> However, if a user is running with direct kernel boot - which is very 
> typical in SEV-SNP setup, especially for Kata Containers and similar - 
> the initial launch measurement is a natural indicator of the target 
> environment. Kexec basically allows them to completely bypass that: You 
> would be able to run a completely different environment than the one you 
> measure through the launch digest. I'm not sure it's a good idea to even 
> allow that by default in CoCo environments - at least not if the kernel 
> is locked down.

Isn't it the same when we just allow loading kernel modules? I'm sure
you can also achieve a 'completely different environment' with that :-)
With SecureBoot / lockdown we normally require modules to pass a signature
check; I guess we can employ the same mechanism for kexec, i.e. in
lockdown, we require a signature check on the kexec-ed kernel. It may
also make sense to check the initramfs (with direct kernel boot it's also
part of the launch measurements, right?), and there's UKI for that already.

Personally, I believe that if we simply forbid kexec for CoCo in
lockdown mode, the feature will become mostly useless in 'full stack'
production environments (which boot through firmware).

-- 
Vitaly

Re: [PATCH v4 0/4] x86/snp: Add kexec support

Posted by Alexander Graf 1 year, 9 months ago

On 02.05.24 14:18, Vitaly Kuznetsov wrote:
> Alexander Graf <graf@amazon.com> writes:
>
>> Hey Ashish,
>>
>> On 09.04.24 22:42, Ashish Kalra wrote:
>>> From: Ashish Kalra <ashish.kalra@amd.com>
>>>
>>> The patchset adds bits and pieces to get kexec (and crashkernel) work on
>>> SNP guest.
>>
>> With this patch set (and similar for the TDX one), you enable the
>> typical kdump case, which is great!
>>
>> However, if a user is running with direct kernel boot - which is very
>> typical in SEV-SNP setup, especially for Kata Containers and similar -
>> the initial launch measurement is a natural indicator of the target
>> environment. Kexec basically allows them to completely bypass that: You
>> would be able to run a completely different environment than the one you
>> measure through the launch digest. I'm not sure it's a good idea to even
>> allow that by default in CoCo environments - at least not if the kernel
>> is locked down.
> Isn't it the same when we just allow loading kernel modules? I'm sure
> you can also achieve a 'completely different environment' with that :-)
> With SecureBoot / lockdown we normally require modules to pass signature
> check, I guess we can employ the same mechanism for kexec. I.e. in
> lockdown, we require signature check on the kexec-ed kernel. Also, it
> may make sense to check initramfs too (with direct kernel boot it's also
> part of launch measurements, right?) and there's UKI for that already).

Correct. With IMA, you even do exactly that: Enforce a signature check 
of the next binary with kexec.

The problem is that you typically want to update the system because 
something is broken; most likely your original environment had a 
security issue somewhere. From a pure SEV-SNP attestation point of view,
you cannot distinguish between the patched and unpatched environment:
Both look the same.

So while kexec isn't the problem, it's the fact that you can't tell 
anyone that you're now running a fixed version of the code :).

> Personally, I believe that if we simply forbid kexec for CoCo in
> lockdown mode, the feature will become mostly useless in 'full stack'
> (which boot through firmware) production envrironments.

I'm happy for CoCo to stay smoke and mirrors :). But I believe that if 
you want to genuinely draw a trust chain back to an AMD/Intel 
certificate, we need to come up with a good way of making updates work 
with a working trust chain so that whoever checks whether you're running 
sanctioned code is able to validate the claim.

Alex

Re: [PATCH v4 0/4] x86/snp: Add kexec support

Posted by Vitaly Kuznetsov 1 year, 9 months ago

Alexander Graf <graf@amazon.com> writes:

> Correct. With IMA, you even do exactly that: Enforce a signature check 
> of the next binary with kexec.
>
> The problem is that you typically want to update the system because 
> something is broken; most likely your original environment had a 
> security issue somewhere. From a pure SEV-SNP attestation point of view, 
> you can not distinguish between the patched and unpatched environment: 
> Both look the same.
>
> So while kexec isn't the problem, it's the fact that you can't tell 
> anyone that you're now running a fixed version of the code :).

...

>
> I'm happy for CoCo to stay smoke and mirrors :). 

"Only a Sith deals in absolutes" :-)

> But I believe that if 
> you want to genuinely draw a trust chain back to an AMD/Intel 
> certificate, we need to come up with a good way of making updates work 
> with a working trust chain so that whoever checks whether you're running 
> sanctioned code is able to validate the claim.

Launch measurements are what they are, they describe the state of your
guest before it started booting. There are multiple mechanisms in Linux
which change CPL0 code already: self-modifying code like static keys,
loadable modules, runtime patching, kexec,... In case some specific
deployment requires stronger guarantees we can probably introduce
something like 'full lockdown' mode (as a compile time option, I guess)
which would disable all of the aforementioned mechanisms. It would still
not be hard proof that the running code matches the launch measurements
(vulnerabilities and bugs may still exist), but it could be an
improvement.

Basically, what I wanted to argue is that kexec does not need to be
treated 'specially' for CVMs if we keep all other ways to modify kernel
code. Making these methods 'attestable' is currently a challenge indeed.

-- 
Vitaly

[PATCH v5 0/3] x86/snp: Add kexec support

Posted by Ashish Kalra 1 year, 9 months ago

From: Ashish Kalra <ashish.kalra@amd.com>

The patchset adds bits and pieces to get kexec (and crashkernel) working on
SNP guests.

The series is based on and tested against Kirill Shutemov's tree:
  https://github.com/intel/tdx.git guest-kexec

----

v5:
- Removed the sev_es_enabled() function; sev_status is now checked
  directly to detect an SEV-ES/SEV-SNP guest.
- Used the --base option when generating the patches to specify Kirill's
  TDX guest kexec patches as prerequisites, fixing kernel test robot
  build errors.

v4:
- Rebased to current tip/master.
- Reviewed-bys from Sathya.
- Remove snp_kexec_unprep_rom_memory(), which is no longer needed because
  the SEV-SNP code no longer validates the ROM range in probe_roms().
- Fix kernel test robot build error/warnings.

v3:
- Rebased.
- Moved the "Keep page tables that map E820_TYPE_ACPI" patch to Kirill's
  TDX guest kexec patch series.
- Check the md attribute instead of efi_setup to detect whether we are
  running under a kexec kernel.
- Added a new sev_es_enabled() function.
- Skip video memory access in the decompressor on SEV-ES/SNP systems to
  prevent guest termination, as the boot stage2 #VC handler does not
  handle MMIO.

v2:
- Address zeroing of unaccepted memory table mappings at all page table
  levels by adding phys_pte_init(), phys_pud_init() and phys_p4d_init().
- Include the "skip efi_arch_mem_reserve() in case of kexec" patch as part
  of this patch set.
- Rename last_address_shd_kexec to the more appropriate
  kexec_last_address_to_make_private.
- Remove code duplicated with TDX and use the common interfaces defined
  for SNP and TDX for kexec/kdump.
- Remove the set_pte_enc() dependency on pg_level_to_pfn() and make the
  function simpler.
- Rename unshare_pte() to make_pte_private().
- Clarify the comment explaining the use of
  kexec_last_address_to_make_private.
- General cleanup.


Ashish Kalra (3):
  efi/x86: skip efi_arch_mem_reserve() in case of kexec.
  x86/boot/compressed: Skip Video Memory access in Decompressor for
    SEV-ES/SNP.
  x86/snp: Convert shared memory back to private on kexec

 arch/x86/boot/compressed/misc.c |   6 +-
 arch/x86/include/asm/sev.h      |   4 +
 arch/x86/kernel/sev.c           | 161 ++++++++++++++++++++++++++++++++
 arch/x86/mm/mem_encrypt_amd.c   |   3 +
 arch/x86/platform/efi/quirks.c  |  20 +++-
 5 files changed, 190 insertions(+), 4 deletions(-)


base-commit: a18b42d8997abfd77aa1637c0de6850b0c30b1fe
prerequisite-patch-id: bd8e77f0f12223d21cb2f35b77bfcbdd9ad80b0f
prerequisite-patch-id: bfe2fa046349978ac1825275eb205acecfbc22f3
prerequisite-patch-id: 5e60d292457c7cd98fd3e45c23127e9463b56a69
prerequisite-patch-id: 1f97d0a2edb7509dd58276f628d1a4bda62c154c
prerequisite-patch-id: 8db559385c44e8b6670d74196e8d83d2dfad2f40
prerequisite-patch-id: cbdfea1e50ecb3b4cee3a25a27df4d35bd95d532
prerequisite-patch-id: 1cea0996e0dc3bb9f0059c927c405ca31003791e
prerequisite-patch-id: 469a0a3c78b0eca82527cd85e2205fb8fb89d645
prerequisite-patch-id: 2974ef211db5253d9782018e352d2a6ff0b0ef54
prerequisite-patch-id: 2cfffd80947941892421dae99b7fa0f9f9715884
prerequisite-patch-id: 466c2cb9f0a107bbd1dbd8526f4eff2bdb55f1ce
prerequisite-patch-id: d4966ae63e86d24b0bf578da4dae871cd9002b12
prerequisite-patch-id: fccde6f1fa385b5af0195f81fcb95acd71822428
prerequisite-patch-id: 16048ee15e392b0b9217b8923939b0059311abd2
prerequisite-patch-id: 5c9ae9aa294f72f63ae2c3551507dfbd92525803
prerequisite-patch-id: 758bdb686290c018cbd5b7d005354019f9d15248
prerequisite-patch-id: 4125b799fc9577b1a46427e45618fa0174f7a4b3
prerequisite-patch-id: 60760e0c98ab7ccd2ca22ae3e9f20ff5a94c6e91
-- 
2.34.1

[PATCH v5 1/3] efi/x86: skip efi_arch_mem_reserve() in case of kexec.

Posted by Ashish Kalra 1 year, 9 months ago

From: Ashish Kalra <ashish.kalra@amd.com>

In the kexec case, the kernel must use and stick to the EFI memmap passed
from the first kernel via boot-params/setup data, so skip
efi_arch_mem_reserve() during kexec.

Additionally, SNP guest kexec testing revealed that the EFI memmap is
corrupted during chained kexec: kexec_enter_virtual_mode() during late
init remaps the efi_memmap physical pages allocated in
efi_arch_mem_reserve() via memblock, which then causes random EFI memmap
corruption once memblock is freed/torn down.

Suggested-by: Dave Young <dyoung@redhat.com>
[Dave Young: checking the md attribute instead of checking the efi_setup]
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 arch/x86/platform/efi/quirks.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index f0cc00032751..982f5e50a4b3 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -258,12 +258,28 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
 	int num_entries;
 	void *new;
 
-	if (efi_mem_desc_lookup(addr, &md) ||
-	    md.type != EFI_BOOT_SERVICES_DATA) {
+	/*
+	 * For kexec use case, we need to use the EFI memmap passed from the first
+	 * kernel via setup data, so we need to skip this.
+	 * Additionally, kexec_enter_virtual_mode() during late init will remap
+	 * the efi_memmap physical pages allocated here via memblock and then
+	 * subsequently cause random EFI memmap corruption once memblock is freed.
+	 */
+
+	if (efi_mem_desc_lookup(addr, &md)) {
 		pr_err("Failed to lookup EFI memory descriptor for %pa\n", &addr);
 		return;
 	}
 
+	if (md.type != EFI_BOOT_SERVICES_DATA) {
+		pr_err("Skip reserving non EFI Boot Service Data memory for %pa\n", &addr);
+		return;
+	}
+
+	/* Kexec copied the efi memmap from the first kernel, thus skip the case */
+	if (md.attribute & EFI_MEMORY_RUNTIME)
+		return;
+
 	if (addr + size > md.phys_addr + (md.num_pages << EFI_PAGE_SHIFT)) {
 		pr_err("Region spans EFI memory descriptors, %pa\n", &addr);
 		return;
-- 
2.34.1