This is a demonstration of fast migration for encrypted virtual machines using a Migration Handler that lives in OVMF. This demo uses AMD SEV, but the ideas may generalize to other confidential computing platforms. With AMD SEV, guest memory is encrypted and the hypervisor cannot access or move it. This makes migration tricky. In this demo, we show how the HV can ask a Migration Handler (MH) in the firmware for an encrypted page. The MH encrypts the page with a transport key prior to releasing it to the HV. The target machine also runs an MH that decrypts the page once it is passed in by the target HV. These patches are not ready for production, but they are a full end-to-end solution that facilitates a fast live migration between two SEV VMs.

Corresponding patches for QEMU have been posted by my colleague Dov Murik on qemu-devel. Our approach needs little kernel support, requiring only one hypercall that the guest can use to mark a page as encrypted or shared. This series includes updated patches from Ashish Kalra and Brijesh Singh that allow OVMF to use this hypercall.

The MH runs continuously in the guest, waiting for communication from the HV. The HV starts an additional vCPU for the MH but does not expose it to the guest OS via ACPI. We use the MpService to start the MH. The MpService is only available at boot time, and processes that are started by it are usually cleaned up on ExitBootServices. Since we need the MH to run continuously, we had to make some modifications. Ideally a feature could be added to the MpService to allow for the starting of long-running processes. Besides migration, this could support other background processes that need to operate within the encryption boundary. For now, we have included a handful of patches that modify the MpService to allow the MH to keep running after ExitBootServices. These are temporary.

Ashish Kalra (2):
  OvmfPkg/PlatformPei: Mark SEC GHCB page in the page encrpytion bitmap.
  OvmfPkg/PlatformDxe: Add support for SEV live migration.

Brijesh Singh (1):
  OvmfPkg/BaseMemEncryptLib: Support to issue unencrypted hypercall

Dov Murik (1):
  OvmfPkg/AmdSev: Build page table for migration handler

Tobin Feldman-Fitzthum (10):
  OvmfPkg/AmdSev: Base for Confidential Migration Handler
  OvmfPkg/PlatfomPei: Set Confidential Migration PCD
  OvmfPkg/AmdSev: Setup Migration Handler Mailbox
  OvmfPkg/AmdSev: MH support for mailbox protocol
  UefiCpuPkg/MpInitLib: temp removal of MpLib cleanup
  UefiCpuPkg/MpInitLib: Allocate MP buffer as runtime memory
  UefiCpuPkg/CpuExceptionHandlerLib: Exception handling as runtime memory
  OvmfPkg/AmdSev: Don't overwrite mailbox or pagetables
  OvmfPkg/AmdSev: Don't overwrite MH stack
  OvmfPkg/AmdSev: MH page encryption POC

 OvmfPkg/OvmfPkg.dec                           |  11 +
 OvmfPkg/AmdSev/AmdSevX64.dsc                  |   2 +
 OvmfPkg/AmdSev/AmdSevX64.fdf                  |  13 +-
 .../ConfidentialMigrationDxe.inf              |  45 +++
 .../ConfidentialMigrationPei.inf              |  35 ++
 .../DxeMemEncryptSevLib.inf                   |   1 +
 .../PeiMemEncryptSevLib.inf                   |   1 +
 OvmfPkg/PlatformDxe/Platform.inf              |   2 +
 OvmfPkg/PlatformPei/PlatformPei.inf           |   2 +
 UefiCpuPkg/Library/MpInitLib/DxeMpInitLib.inf |   2 +
 UefiCpuPkg/Library/MpInitLib/PeiMpInitLib.inf |   2 +
 OvmfPkg/AmdSev/ConfidentialMigration/MpLib.h  | 235 +++++++++++++
 .../ConfidentialMigration/VirtualMemory.h     | 177 ++++++++++
 OvmfPkg/Include/Guid/MemEncryptLib.h          |  16 +
 OvmfPkg/PlatformDxe/PlatformConfig.h          |   5 +
 .../ConfidentialMigrationDxe.c                | 325 ++++++++++++++++++
 .../ConfidentialMigrationPei.c                |  25 ++
 .../X64/PeiDxeVirtualMemory.c                 |  18 +
 OvmfPkg/PlatformDxe/AmdSev.c                  |  99 ++++++
 OvmfPkg/PlatformDxe/Platform.c                |   6 +
 OvmfPkg/PlatformPei/AmdSev.c                  |  10 +
 OvmfPkg/PlatformPei/Platform.c                |  10 +
 .../CpuExceptionHandlerLib/DxeException.c     |   8 +-
 UefiCpuPkg/Library/MpInitLib/DxeMpLib.c       |  21 +-
 UefiCpuPkg/Library/MpInitLib/MpLib.c          |   7 +-
 25 files changed, 1061 insertions(+), 17 deletions(-)
 create mode 100644 OvmfPkg/AmdSev/ConfidentialMigration/ConfidentialMigrationDxe.inf
 create mode 100644 OvmfPkg/AmdSev/ConfidentialMigration/ConfidentialMigrationPei.inf
 create mode 100644 OvmfPkg/AmdSev/ConfidentialMigration/MpLib.h
 create mode 100644 OvmfPkg/AmdSev/ConfidentialMigration/VirtualMemory.h
 create mode 100644 OvmfPkg/Include/Guid/MemEncryptLib.h
 create mode 100644 OvmfPkg/AmdSev/ConfidentialMigration/ConfidentialMigrationDxe.c
 create mode 100644 OvmfPkg/AmdSev/ConfidentialMigration/ConfidentialMigrationPei.c
 create mode 100644 OvmfPkg/PlatformDxe/AmdSev.c
--
2.20.1

-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#72353): https://edk2.groups.io/g/devel/message/72353
Mute This Topic: https://groups.io/mt/81036365/1787277
Group Owner: devel+owner@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [importer@patchew.org]
-=-=-=-=-=-=-=-=-=-=-=-
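The cover letter's "one hypercall that the guest can use to mark a page as encrypted or shared" boils down to per-page state tracking that the HV consults during migration. The sketch below is a hedged, user-space model of that bookkeeping only: the names SetPageEncState and IsPageEncrypted and the bitmap layout are hypothetical, and a real SEV guest would convey these transitions through a VMMCALL rather than update a local bitmap.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy model: one bit per 4 KiB guest frame.
 * 1 = encrypted (private to the guest), 0 = shared with the HV. */
#define GUEST_PAGES 1024
uint8_t EncBitmap[GUEST_PAGES / 8];

void InitAllEncrypted(void)
{
  /* SEV guests start with all memory encrypted. */
  memset(EncBitmap, 0xFF, sizeof EncBitmap);
}

/* Model of the guest->HV notification: mark NumPages starting at Gpa. */
void SetPageEncState(uint64_t Gpa, uint64_t NumPages, bool Encrypted)
{
  uint64_t Frame = Gpa >> 12;   /* 4 KiB pages */

  for (uint64_t i = 0; i < NumPages; i++, Frame++) {
    if (Encrypted) {
      EncBitmap[Frame / 8] |= (uint8_t)(1u << (Frame % 8));
    } else {
      EncBitmap[Frame / 8] &= (uint8_t)~(1u << (Frame % 8));
    }
  }
}

/* What HV-side migration code would ask before deciding whether a page
 * must go through the Migration Handler (encrypted) or can be copied
 * directly (shared). */
bool IsPageEncrypted(uint64_t Gpa)
{
  uint64_t Frame = Gpa >> 12;

  return (EncBitmap[Frame / 8] >> (Frame % 8)) & 1u;
}
```

On the source side, pages flagged encrypted would be routed through the MH for transport-key encryption, while shared pages (the GHCB, for example) can be copied as-is.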
Hi Tobin,

On 03/02/21 21:48, Tobin Feldman-Fitzthum wrote:
> This is a demonstration of fast migration for encrypted virtual machines
> using a Migration Handler that lives in OVMF. [...]
> [rest of cover letter snipped]
I plan to do a lightweight review for this series. (My understanding is that it's an RFC and not actually being proposed for merging.)

Regarding the MH's availability at runtime -- does that necessarily require the isolation of an AP? Because in the current approach, allowing the MP Services to survive into OS runtime (in some form or another) seems critical, and I don't think it's going to fly.

I agree that the UefiCpuPkg patches have been well separated from the rest of the series, but I'm somewhat doubtful the "firmware-initiated background process" idea will be accepted. Have you investigated exposing a new "runtime service" (a function pointer) via the UEFI Configuration table, and calling that (perhaps periodically?) from the guest kernel? It would be a form of polling I guess. Or maybe, poll the mailbox directly in the kernel, and call the new firmware runtime service when there's an actual command to process.

(You do spell out "little kernel support", and I'm not sure if that's a technical benefit, or a political / community benefit.)

I'm quite uncomfortable with an attempt to hide a CPU from the OS via ACPI. The OS has other ways to learn (for example, a boot loader could use the MP services itself, stash the information, and hand it to the OS kernel -- this would minimally allow for detecting an inconsistency in the OS). What about "all-but-self" IPIs too -- the kernel might think all the processors it's poking like that were under its control.

Also, as far as I can tell from patch #7, the AP seems to be busy-looping (with a CpuPause() added in), for the entire lifetime of the OS. Do I understand right? If so -- is it a temporary trait as well?

Sorry if my questions are "premature", in the sense that I could get my own answers as well if I actually read the patches in detail -- however, I wouldn't like to do that at once, because then I'll be distracted by many style issues and other "trivial" stuff.
Examples for the latter:

- patch #1 calls SetMemoryEncDecHypercall3(), but there is no such
  function in edk2, so minimally it's a patch ordering bug in the series,

- in patch #1, there's minimally one whitespace error (no whitespace
  right after "EFI_SIZE_TO_PAGES"),

- in patch #1, the alphabetical ordering in the [LibraryClasses] section,
  and in the matching #include directives, gets broken,

- I'd prefer if the "SevLiveMigrationEnabled" UEFI variable were set in
  ConfidentialMigrationDxe, rather than PlatformDxe (patch #3), or at
  least another AMD SEV related DXE driver (OvmfPkg/AmdSevDxe etc.),

- any particular reason for making the UEFI variable non-volatile? I
  don't think it should survive any particular boot of the guest,

- why do we need a variable in the first place?

etc etc

Thanks!
Laszlo

> [quoted patch list and diffstat snipped]

-=-=-=-=-=-=-=-=-=-=-=-
View/Reply Online (#72393): https://edk2.groups.io/g/devel/message/72393
-=-=-=-=-=-=-=-=-=-=-=-
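Laszlo's alternative (publish a function pointer through the UEFI Configuration Table and let the kernel poll it) can be modeled in a few lines. Everything below is hypothetical scaffolding: the GUID value, the MH_PROCESS_COMMAND type, and the flat table are user-space stand-ins for gBS->InstallConfigurationTable() and the EFI System Table's ConfigurationTable array, not the real edk2 types.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Minimal stand-in for EFI_GUID (16 bytes, no padding). */
typedef struct {
  uint32_t Data1;
  uint16_t Data2, Data3;
  uint8_t  Data4[8];
} GUID_MODEL;

/* Hypothetical vendor GUID for the migration-handler service. */
const GUID_MODEL MhServiceGuid =
  { 0x1d2e3f40, 0xaaaa, 0xbbbb, { 0, 1, 2, 3, 4, 5, 6, 7 } };

/* The published "runtime service": nonzero means it handled a command. */
typedef int (*MH_PROCESS_COMMAND)(void);

typedef struct {
  GUID_MODEL VendorGuid;
  void       *VendorTable;
} CONFIG_ENTRY;

CONFIG_ENTRY ConfigTable[8];
size_t       ConfigEntries;

/* Model of gBS->InstallConfigurationTable(): firmware side. */
void InstallTable(const GUID_MODEL *Guid, void *Table)
{
  ConfigTable[ConfigEntries].VendorGuid  = *Guid;
  ConfigTable[ConfigEntries].VendorTable = Table;
  ConfigEntries++;
}

/* Model of the kernel scanning the configuration table by GUID. */
void *FindTable(const GUID_MODEL *Guid)
{
  for (size_t i = 0; i < ConfigEntries; i++) {
    if (memcmp(&ConfigTable[i].VendorGuid, Guid, sizeof *Guid) == 0) {
      return ConfigTable[i].VendorTable;
    }
  }
  return NULL;
}

/* Stub MH entry point; the real one would check the mailbox. */
int MhProcessCommand(void)
{
  return 0;   /* 0: no migration command pending */
}
```

In this scheme the firmware would install the pointer before ExitBootServices, and the guest kernel would locate it once and invoke it periodically, or only when the mailbox indicates a pending command.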
> Hi Tobin,
>
> On 03/02/21 21:48, Tobin Feldman-Fitzthum wrote:
>> [cover letter snipped]
>
> I plan to do a lightweight review for this series. (My understanding is
> that it's an RFC and not actually being proposed for merging.)
>
> Regarding the MH's availability at runtime -- does that necessarily
> require the isolation of an AP? Because in the current approach,
> allowing the MP Services to survive into OS runtime (in some form or
> another) seems critical, and I don't think it's going to fly.
>
> I agree that the UefiCpuPkg patches have been well separated from the
> rest of the series, but I'm somewhat doubtful the "firmware-initiated
> background process" idea will be accepted. Have you investigated
> exposing a new "runtime service" (a function pointer) via the UEFI
> Configuration table, and calling that (perhaps periodically?) from the
> guest kernel? It would be a form of polling I guess. Or maybe, poll the
> mailbox directly in the kernel, and call the new firmware runtime
> service when there's an actual command to process.

Continuous runtime availability for the MH is almost certainly the most controversial part of this proposal, which is why I put it in the cover letter and why it's good to discuss.

> (You do spell out "little kernel support", and I'm not sure if that's a
> technical benefit, or a political / community benefit.)

As you allude to, minimal kernel support is really one of the main things that shapes our approach. This is partly a political and practical benefit, but there are also technical benefits. Having the MH in firmware likely leads to higher availability. It can be accessed when the OS is unreachable, perhaps during boot or when the OS is hung. There are also potential portability advantages, although we do currently require support for one hypercall. The cost of implementing this hypercall is low. Generally speaking, our task is to find a home for functionality that was traditionally provided by the hypervisor, that needs to be inside the trust domain, but that isn't really part of a guest.
A meta-goal of this project is to figure out the best way to do this.

> I'm quite uncomfortable with an attempt to hide a CPU from the OS via
> ACPI. The OS has other ways to learn (for example, a boot loader could
> use the MP services itself, stash the information, and hand it to the OS
> kernel -- this would minimally allow for detecting an inconsistency in
> the OS). What about "all-but-self" IPIs too -- the kernel might think
> all the processors it's poking like that were under its control.

This might be the second most controversial piece. Here's a question: if we could successfully hide the MH vCPU from the OS, would it still make you uncomfortable? In other words, is the worry that there might be some inconsistency, or more generally that there is something hidden from the OS? One thing to think about is that the guest owner should generally be aware that there is a migration handler running. The way I see it, a guest owner of an SEV VM would need to opt in to migration and should then expect that there is an MH running even if they aren't able to see it. Of course we need to be certain that the MH isn't going to break the OS.

> Also, as far as I can tell from patch #7, the AP seems to be
> busy-looping (with a CpuPause() added in), for the entire lifetime of
> the OS. Do I understand right? If so -- is it a temporary trait as well?

In our approach the MH continuously checks for commands from the hypervisor. There are potentially ways to optimize this, such as having the hypervisor de-schedule the MH vCPU while not migrating. You could potentially shut down the MH on the target after receiving the MH_RESET command (when the migration finishes), but what if you want to migrate that VM somewhere else?
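The busy-loop under discussion condenses to a small state machine. The sketch below is a single-threaded model only: MH_RESET follows the command name mentioned above, but MH_FETCH_PAGE and the rest of the encoding are invented for illustration, and the shared mailbox page is modeled as a plain array of values rather than memory written concurrently by the HV.

```c
#include <stddef.h>

/* Hypothetical command encoding; the real mailbox layout lives in
 * ConfidentialMigrationDxe.c and is not reproduced here. */
enum MhCommand { MH_IDLE = 0, MH_FETCH_PAGE = 1, MH_RESET = 2 };

/* Model of the MH loop. In firmware this runs forever on the hidden AP,
 * with CpuPause() in the idle arm; here the HV is a command sequence
 * that the loop consumes. Returns the number of pages served. */
int RunMigrationHandler(const enum MhCommand *Mailbox, size_t Slots)
{
  int PagesServed = 0;

  for (size_t i = 0; i < Slots; i++) {
    switch (Mailbox[i]) {
    case MH_FETCH_PAGE:
      /* Encrypt the requested page with the transport key and
       * release it to the HV (elided in this model). */
      PagesServed++;
      break;
    case MH_RESET:
      /* Migration finished. A real MH would keep polling in case the
       * VM is migrated again, which is exactly the point above. */
      return PagesServed;
    case MH_IDLE:
    default:
      /* Nothing pending: CpuPause() and poll again. */
      break;
    }
  }
  return PagesServed;
}
```

The model makes the trade-off visible: returning on MH_RESET ends the loop cheaply, but only an MH that keeps polling can serve a second migration.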
> Sorry if my questions are "premature", in the sense that I could get my
> own answers as well if I actually read the patches in detail -- however,
> I wouldn't like to do that at once, because then I'll be distracted by
> many style issues and other "trivial" stuff. Examples for the latter:

Not premature at all. I think you hit the nail on the head with everything you raised.

-Tobin

> [quoted style comments, patch list, and diffstat snipped]

-=-=-=-=-=-=-=-=-=-=-=-
View/Reply Online (#72399): https://edk2.groups.io/g/devel/message/72399
-=-=-=-=-=-=-=-=-=-=-=-
On 03/03/21 19:25, Tobin Feldman-Fitzthum wrote:
>> Laszlo wrote:
>> I'm quite uncomfortable with an attempt to hide a CPU from the OS via
>> ACPI. The OS has other ways to learn (for example, a boot loader could
>> use the MP services itself, stash the information, and hand it to the OS
>> kernel -- this would minimally allow for detecting an inconsistency in
>> the OS). What about "all-but-self" IPIs too -- the kernel might think
>> all the processors it's poking like that were under its control.
>
> This might be the second most controversial piece. Here's a question: if
> we could successfully hide the MH vCPU from the OS, would it still make
> you uncomfortable? In other words, is the worry that there might be some
> inconsistency or more generally that there is something hidden from the
> OS?

(1) My personal concern is the consistency aspect. In *some* parts of the firmware, we'd rely on the hidden CPU to behave as a "logical execution unit" (because we want it to run the MH), but in other parts of the firmware, we'd expect it to be hidden. (Consider what EFI_MP_SERVICES_PROTOCOL.StartupAllAPs() should do while the MH is running!) And then the CPU should be hidden from the OS completely, even if the OS doesn't rely on ACPI, but massages LAPIC stuff that is architecturally specified.

In other words, we'd have to treat this processor as a "service processor", outside of the "normal" (?) processor domain -- basically what the PSP is right now. I don't have the slightest idea how physical firmware deals with service processors in general. I'm really scared of the many possible corner cases (CPU hot(un)plug, NUMA proximity, ...)

(2) I expect kernel developers to have concerns about a firmware-level "background job" at OS runtime. SMM does something similar (periodic or otherwise hardware-initiated async SMIs etc.), and kernel developers already dislike those (latency spikes, messing with hardware state...).
> One thing to think about is that the guest owner should generally be
> aware that there is a migration handler running. The way I see it, a
> guest owner of an SEV VM would need to opt in to migration and should
> then expect that there is an MH running even if they aren't able to see
> it. Of course we need to be certain that the MH isn't going to break the
> OS.

I didn't think of the guest owner, but the developers that work on (possibly unrelated parts of) the guest kernel.

>> Also, as far as I can tell from patch #7, the AP seems to be
>> busy-looping (with a CpuPause() added in), for the entire lifetime of
>> the OS. Do I understand right? If so -- is it a temporary trait as well?
>
> In our approach the MH continuously checks for commands from the
> hypervisor. There are potentially ways to optimize this, such as having
> the hypervisor de-schedule the MH vCPU while not migrating. You could
> potentially shut down the MH on the target after receiving the MH_RESET
> command (when the migration finishes), but what if you want to migrate
> that VM somewhere else?

I have no idea. In the current world, de-scheduling a particular VCPU for extended periods of time is a bad idea (stolen time goes up, ticks get lost, ...). So I guess this would depend on how well you could "hide" the service processor from the guest kernel.

I'd really like it if we could rely on an established "service processor" methodology in the guest. Physical platform vendors have used service processors for ages, the firmwares on those platforms (on the main boards) do manage the service processors, and the service processors are hidden from the OS too (beyond specified access methods, if any). My understanding (or assumption) is that such a service processor is primarily a separate entity (you cannot talk to them "unintentionally", for example with an All-But-Self IPI), and that it's reachable only with specific access methods.
I think the AMD PSP itself might follow this approach (AIUI it's an aarch64 CPU on an otherwise Intel/AMD arch platform). I'd like us to benefit from a crystallized "service processor" abstraction, if possible.

I apologize that I'm this vague -- I've never seen such firmware code that deals with a service processor, I just assume it exists.

Thanks
Laszlo

>> [quoted discussion, style comments, patch list, and diffstat snipped]

-=-=-=-=-=-=-=-=-=-=-=-
View/Reply Online (#72436): https://edk2.groups.io/g/devel/message/72436
-=-=-=-=-=-=-=-=-=-=-=-
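For reference on the ACPI mechanics both sides of this exchange circle around: hiding a processor via ACPI means clearing the flags in its MADT Local APIC entry, because OSes skip entries whose Enabled and Online Capable bits are both zero. As Laszlo notes, this does nothing about the LAPIC remaining architecturally discoverable. A small model follows; the struct mirrors the ACPI MADT type-0 entry layout, while the counting helper is purely illustrative.

```c
#include <stdint.h>

/* ACPI MADT "Processor Local APIC" structure (type 0), 8 bytes. */
typedef struct {
  uint8_t  Type;              /* 0 = Processor Local APIC */
  uint8_t  Length;            /* 8 */
  uint8_t  AcpiProcessorUid;
  uint8_t  ApicId;
  uint32_t Flags;             /* bit 0: Enabled, bit 1: Online Capable */
} MADT_LOCAL_APIC;

#define LAPIC_ENABLED        0x1u
#define LAPIC_ONLINE_CAPABLE 0x2u

/* Count the CPUs an OS will enumerate from the MADT: entries that are
 * neither enabled nor online-capable are ignored, which is how the
 * series' hidden MH vCPU would be kept out of the OS's view. */
unsigned CountOsVisibleCpus(const MADT_LOCAL_APIC *Entries, unsigned Count)
{
  unsigned Visible = 0;

  for (unsigned i = 0; i < Count; i++) {
    if (Entries[i].Flags & (LAPIC_ENABLED | LAPIC_ONLINE_CAPABLE)) {
      Visible++;
    }
  }
  return Visible;
}
```

This also shows why the approach is fragile: the flags only gate MADT-driven enumeration, not broadcast IPIs or direct LAPIC probing, which is the inconsistency being objected to.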
On Wed, Mar 03, 2021 at 01:25:40PM -0500, Tobin Feldman-Fitzthum wrote:

> > Hi Tobin,
> >
> > On 03/02/21 21:48, Tobin Feldman-Fitzthum wrote:
> > > [...]
> >
> > I plan to do a lightweight review for this series. (My understanding is that it's an RFC and not actually being proposed for merging.)
> >
> > Regarding the MH's availability at runtime -- does that necessarily require the isolation of an AP? Because in the current approach, allowing the MP Services to survive into OS runtime (in some form or another) seems critical, and I don't think it's going to fly.
> >
> > I agree that the UefiCpuPkg patches have been well separated from the rest of the series, but I'm somewhat doubtful the "firmware-initiated background process" idea will be accepted. Have you investigated exposing a new "runtime service" (a function pointer) via the UEFI Configuration table, and calling that (perhaps periodically?) from the guest kernel? It would be a form of polling I guess. Or maybe, poll the mailbox directly in the kernel, and call the new firmware runtime service when there's an actual command to process.
>
> Continuous runtime availability for the MH is almost certainly the most controversial part of this proposal, which is why I put it in the cover letter and why it's good to discuss.
>
> > (You do spell out "little kernel support", and I'm not sure if that's a technical benefit, or a political / community benefit.)
>
> As you allude to, minimal kernel support is really one of the main things that shapes our approach. This is partly a political and practical benefit, but there are also technical benefits. Having the MH in firmware likely leads to higher availability. It can be accessed when the OS is unreachable, perhaps during boot or when the OS is hung. There are also potential portability advantages, although we do currently require support for one hypercall. The cost of implementing this hypercall is low.
>
> Generally speaking, our task is to find a home for functionality that was traditionally provided by the hypervisor, but that needs to be inside the trust domain, and that isn't really part of a guest. A meta-goal of this project is to figure out the best way to do this.
>
> > I'm quite uncomfortable with an attempt to hide a CPU from the OS via ACPI. The OS has other ways to learn (for example, a boot loader could use the MP services itself, stash the information, and hand it to the OS kernel -- this would minimally allow for detecting an inconsistency in the OS). What about "all-but-self" IPIs too -- the kernel might think all the processors it's poking like that were under its control.
>
> This might be the second most controversial piece. Here's a question: if we could successfully hide the MH vCPU from the OS, would it still make you uncomfortable? In other words, is the worry that there might be some inconsistency, or more generally that there is something hidden from the OS? One thing to think about is that the guest owner should generally be aware that there is a migration handler running. The way I see it, a guest owner of an SEV VM would need to opt in to migration and should then expect that there is an MH running even if they aren't able to see it. Of course we need to be certain that the MH isn't going to break the OS.
>
> > Also, as far as I can tell from patch #7, the AP seems to be busy-looping (with a CpuPause() added in), for the entire lifetime of the OS. Do I understand right? If so -- is it a temporary trait as well?
>
> In our approach the MH continuously checks for commands from the hypervisor. There are potentially ways to optimize this, such as having the hypervisor de-schedule the MH vCPU while not migrating. You could potentially shut down the MH on the target after receiving the MH_RESET command (when the migration finishes), but what if you want to migrate that VM somewhere else?

I think another approach can be considered here: why not implement the MH vCPU(s) as hot-plugged vCPU(s)? Basically, hot-plug a new vCPU when migration is started and hot-unplug the vCPU when migration is completed; then we won't need a vCPU running (and potentially consuming cycles) forever, busy-looping with CpuPause().

Thanks,
Ashish
On Fri, Mar 05, 2021 at 10:44:23AM +0000, Ashish Kalra wrote:
> [...]
> I think another approach can be considered here: why not implement the MH vCPU(s) as hot-plugged vCPU(s)? Basically, hot-plug a new vCPU when migration is started and hot-unplug the vCPU when migration is completed; then we won't need a vCPU running (and potentially consuming cycles) forever, busy-looping with CpuPause().

After internal discussions, we realized that this approach will not work, as vCPU hotplug will not work for SEV-ES and SEV-SNP: the VMSA has to be encrypted as part of the LAUNCH command, so we can't create/add a new vCPU after LAUNCH has completed.

Thanks,
Ashish
> On Fri, Mar 05, 2021 at 10:44:23AM +0000, Ashish Kalra wrote:
>> [...]
>> I think another approach can be considered here: why not implement the MH vCPU(s) as hot-plugged vCPU(s) [...]
>
> After internal discussions, we realized that this approach will not work, as vCPU hotplug will not work for SEV-ES and SEV-SNP: the VMSA has to be encrypted as part of the LAUNCH command, so we can't create/add a new vCPU after LAUNCH has completed.
>
> Thanks,
> Ashish

Hm yeah, we talked about hotplug a bit. It was never clear how it would square with OVMF.

-Tobin
Hi Tobin,

as mentioned in the reply to the QEMU patches posted by Tobin, I think the firmware helper approach is very good, but there are some disadvantages in the idea of auxiliary vCPUs. These are especially true in the VMM, where it's much nicer to have a separate VM that goes through a specialized run loop; however, even at the firmware level there are some complications (as you pointed out) in letting MpService workers run after ExitBootServices.

My idea would be that the firmware would start the VM as usual using the same launch data; then, the firmware would detect it was running as a migration helper VM during the SEC or PEI phases (for example via the GHCB or some other unencrypted communication area), and divert execution to the migration helper instead of proceeding to the next boot phase. This would be somewhat similar in spirit to how edk2 performs S3 resume, if my memory serves correctly.

What do you think?

Thanks,

Paolo
On 03/04/21 10:21, Paolo Bonzini wrote:
> [...]
> My idea would be that the firmware would start the VM as usual using the same launch data; then, the firmware would detect it was running as a migration helper VM during the SEC or PEI phases (for example via the GHCB or some other unencrypted communication area), and divert execution to the migration helper instead of proceeding to the next boot phase. This would be somewhat similar in spirit to how edk2 performs S3 resume, if my memory serves correctly.

Very cool. You'd basically warm-reboot the virtual machine into a new boot mode (cf. BOOT_WITH_FULL_CONFIGURATION vs. BOOT_ON_S3_RESUME in OvmfPkg/PlatformPei).

To me that's much more attractive than a "background job".

The S3 parallel is great. What I'm missing is:

- Is it possible to warm-reboot an SEV VM? (I vaguely recall that it's not possible for SEV-ES at least.) Because, that's how we'd transfer control to the early parts of the firmware again, IIUC your idea, while preserving the memory contents.

- Who would initiate this process? S3 suspend is guest-initiated. (Not that we couldn't use the guest agent, if needed.)

(In case the idea is really about a separate VM, and not about rebooting the already running VM, then I don't understand -- how would a separate VM access the guest RAM that needs to be migrated?)

NB in the X64 PEI phase of OVMF, only the first 4GB of RAM is mapped, so the migration handler would have to build its own page table under this approach too.

Thanks!
Laszlo
On 03/04/21 21:45, Laszlo Ersek wrote:
> [...]
> (In case the idea is really about a separate VM, and not about rebooting the already running VM, then I don't understand -- how would a separate VM access the guest RAM that needs to be migrated?)

Sorry -- I've just caught up with the QEMU thread. Your message there:

https://lists.gnu.org/archive/html/qemu-devel/2021-03/msg01220.html

says:

  Patches were posted recently to the KVM mailing list to create
  secondary VMs sharing the encryption context (ASID) with a primary VM

I did think of VMs sharing memory, but the goal of SEV seemed to be to prevent exactly that, so I didn't think that was possible. I stand corrected, and yes, this way I understand -- and welcome -- a completely separate VM snooping the migration subject VM's memory.

My question would be then whether the migration helper VM would run on its own memory, and just read out the other VM's memory -- or whether the MH VM would run somewhere inside the original VM's memory (which sounds a lot riskier). But your message explains that too:

  The main advantage would be that the migration VM would not have to
  share the address space with the primary VM

This sounds ideal; it should allow for a completely independent firmware platform -- we wouldn't even have to call it "OVMF", and it might not even have to contain the DXE Core and later-phase components. (Of course if it's more convenient to keep the stuff in OVMF, that works too.)

(For some unsolicited personal information, now I feel less bad about this idea never occurring to me -- I never knew about the KVM patch set that would enable encryption context sharing. (TBH I thought that was prevented, by design, in the SEV hardware...))

A workflow request to Tobin and Dov -- when posting closely interfacing QEMU and edk2 series, it's best to cross-post both series to both lists, and to CC everybody on everything. Feel free to use subject prefixes like [qemu PATCH] and [edk2 PATCH] for clarity. It's been difficult for me to follow both discussions (it doesn't help that I've been CC'd on neither).

Thanks!
Laszlo
On 04/03/21 21:45, Laszlo Ersek wrote: > On 03/04/21 10:21, Paolo Bonzini wrote: >> Hi Tobin, >> >> as mentioned in the reply to the QEMU patches posted by Tobin, I >> think the firmware helper approach is very good, but there are some >> disadvantages in the idea of auxiliary vCPUs. These are especially >> true in the VMM, where it's much nicer to have a separate VM that >> goes through a specialized run loop; however, even in the firmware >> level there are some complications (as you pointed out) in letting >> MpService workers run after ExitBootServices. >> >> My idea would be that the firmware would start the VM as usual using >> the same launch data; then, the firmware would detect it was running >> as a migration helper VM during the SEC or PEI phases (for example >> via the GHCB or some other unencrypted communication area), and >> divert execution to the migration helper instead of proceeding to the >> next boot phase. This would be somewhat similar in spirit to how edk2 >> performs S3 resume, if my memory serves correctly. > > Very cool. You'd basically warm-reboot the virtual machine into a new > boot mode (cf. BOOT_WITH_FULL_CONFIGURATION vs. BOOT_ON_S3_RESUME in > OvmfPkg/PlatformPei). > > To me that's much more attractive than a "background job". > > The S3 parallel is great. What I'm missing is: > > - Is it possible to warm-reboot an SEV VM? (I vaguely recall that it's > not possible for SEV-ES at least.) Because, that's how we'd transfer > control to the early parts of the firmware again, IIUC your idea, while > preserving the memory contents. It's not exactly a warm reboot. It's two VMs booted at the same time, with exactly the same contents as far as encrypted RAM goes, but different unencrypted RAM. The difference makes one VM boot regularly and the other end up in the migration helper. The migration helper can be entirely contained in PEI, or it can even be its own OS, stored as a flat binary in the firmware. Whatever is easier. 
The divergence would happen much earlier than S3 though. It would have to happen before the APs are brought up, for example, and essentially before the first fw_cfg access if (as is likely) the migration helper VM does not have fw_cfg at all. That's why I brought up the possibility of diverging as soon as SEC. > - Who would initiate this process? S3 suspend is guest-initiated. (Not > that we couldn't use the guest agent, if needed.) > > (In case the idea is really about a separate VM, and not about rebooting > the already running VM, then I don't understand -- how would a separate > VM access the guest RAM that needs to be migrated?) Answering the other message: > (For some unsolicited personal information, now I feel less bad about > this idea never occurring to me -- I never knew about the KVM patch set > that would enable encryption context sharing. (TBH I thought that was > prevented, by design, in the SEV hardware...)) As far as the SEV hardware is concerned, a "VM" is defined by the ASID. The VM would be separate at the KVM level, but it would share the ASID (and thus the guest RAM) with the primary VM. So as far as the SEV hardware and the processor are concerned, the separate VM would be just one more VMCB that runs with that ASID. Only KVM knows that they are backed by different file descriptors etc. In fact, another advantage is that it would be much easier to scale the migration helper to multiple vCPUs. This is probably also a case for diverging much earlier than PEI, because a multi-processor migration helper running in PEI or DXE would require ACPI tables and a lot of infrastructure that is probably undesirable. Paolo -=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. 
View/Reply Online (#72486): https://edk2.groups.io/g/devel/message/72486
Mute This Topic: https://groups.io/mt/81036365/1787277
Group Owner: devel+owner@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [importer@patchew.org]
-=-=-=-=-=-=-=-=-=-=-=-
Hi Tobin,

Thanks for your patch. You may know that Intel is working on TDX for
the same live migration feature.

Please give me some time (about 1 work week) to digest and evaluate the
patch and impact. Then I will provide feedback.

Thank you
Yao Jiewen

> -----Original Message-----
> From: devel@edk2.groups.io <devel@edk2.groups.io> On Behalf Of Tobin
> Feldman-Fitzthum
> Sent: Wednesday, March 3, 2021 4:48 AM
> To: devel@edk2.groups.io
> Cc: Dov Murik <dovmurik@linux.vnet.ibm.com>; Tobin Feldman-Fitzthum
> <tobin@ibm.com>; Tobin Feldman-Fitzthum <tobin@linux.ibm.com>; James
> Bottomley <jejb@linux.ibm.com>; Hubertus Franke <frankeh@us.ibm.com>;
> Brijesh Singh <brijesh.singh@amd.com>; Ashish Kalra
> <ashish.kalra@amd.com>; Jon Grimm <jon.grimm@amd.com>; Tom Lendacky
> <thomas.lendacky@amd.com>
> Subject: [edk2-devel] [RFC PATCH 00/14] Firmware Support for Fast Live
> Migration for AMD SEV
>
> This is a demonstration of fast migration for encrypted virtual
> machines using a Migration Handler that lives in OVMF. This demo uses
> AMD SEV, but the ideas may generalize to other confidential computing
> platforms. With AMD SEV, guest memory is encrypted and the hypervisor
> cannot access or move it. This makes migration tricky. In this demo,
> we show how the HV can ask a Migration Handler (MH) in the firmware
> for an encrypted page. The MH encrypts the page with a transport key
> prior to releasing it to the HV. The target machine also runs an MH
> that decrypts the page once it is passed in by the target HV. These
> patches are not ready for production, but they are a full end-to-end
> solution that facilitates a fast live migration between two SEV VMs.
>
> Corresponding patches for QEMU have been posted by my colleague Dov
> Murik on qemu-devel. Our approach needs little kernel support,
> requiring only one hypercall that the guest can use to mark a page as
> encrypted or shared.
> This series includes updated patches from Ashish Kalra and Brijesh
> Singh that allow OVMF to use this hypercall.
>
> The MH runs continuously in the guest, waiting for communication from
> the HV. The HV starts an additional vCPU for the MH but does not
> expose it to the guest OS via ACPI. We use the MpService to start the
> MH. The MpService is only available at runtime, and processes that
> are started by it are usually cleaned up on ExitBootServices. Since
> we need the MH to run continuously, we had to make some
> modifications. Ideally a feature could be added to the MpService to
> allow for the starting of long-running processes. Besides migration,
> this could support other background processes that need to operate
> within the encryption boundary. For now, we have included a handful
> of patches that modify the MpService to allow the MH to keep running
> after ExitBootServices. These are temporary.
>
> Ashish Kalra (2):
>   OvmfPkg/PlatformPei: Mark SEC GHCB page in the page encrpytion bitmap.
>   OvmfPkg/PlatformDxe: Add support for SEV live migration.
>
> Brijesh Singh (1):
>   OvmfPkg/BaseMemEncryptLib: Support to issue unencrypted hypercall
>
> Dov Murik (1):
>   OvmfPkg/AmdSev: Build page table for migration handler
>
> Tobin Feldman-Fitzthum (10):
>   OvmfPkg/AmdSev: Base for Confidential Migration Handler
>   OvmfPkg/PlatfomPei: Set Confidential Migration PCD
>   OvmfPkg/AmdSev: Setup Migration Handler Mailbox
>   OvmfPkg/AmdSev: MH support for mailbox protocol
>   UefiCpuPkg/MpInitLib: temp removal of MpLib cleanup
>   UefiCpuPkg/MpInitLib: Allocate MP buffer as runtime memory
>   UefiCpuPkg/CpuExceptionHandlerLib: Exception handling as runtime memory
>   OvmfPkg/AmdSev: Don't overwrite mailbox or pagetables
>   OvmfPkg/AmdSev: Don't overwrite MH stack
>   OvmfPkg/AmdSev: MH page encryption POC
>
>  OvmfPkg/OvmfPkg.dec                           |  11 +
>  OvmfPkg/AmdSev/AmdSevX64.dsc                  |   2 +
>  OvmfPkg/AmdSev/AmdSevX64.fdf                  |  13 +-
>  .../ConfidentialMigrationDxe.inf              |  45 +++
>  .../ConfidentialMigrationPei.inf              |  35 ++
>  .../DxeMemEncryptSevLib.inf                   |   1 +
>  .../PeiMemEncryptSevLib.inf                   |   1 +
>  OvmfPkg/PlatformDxe/Platform.inf              |   2 +
>  OvmfPkg/PlatformPei/PlatformPei.inf           |   2 +
>  UefiCpuPkg/Library/MpInitLib/DxeMpInitLib.inf |   2 +
>  UefiCpuPkg/Library/MpInitLib/PeiMpInitLib.inf |   2 +
>  OvmfPkg/AmdSev/ConfidentialMigration/MpLib.h  | 235 +++++++++++++
>  .../ConfidentialMigration/VirtualMemory.h     | 177 ++++++++++
>  OvmfPkg/Include/Guid/MemEncryptLib.h          |  16 +
>  OvmfPkg/PlatformDxe/PlatformConfig.h          |   5 +
>  .../ConfidentialMigrationDxe.c                | 325 ++++++++++++++++++
>  .../ConfidentialMigrationPei.c                |  25 ++
>  .../X64/PeiDxeVirtualMemory.c                 |  18 +
>  OvmfPkg/PlatformDxe/AmdSev.c                  |  99 ++++++
>  OvmfPkg/PlatformDxe/Platform.c                |   6 +
>  OvmfPkg/PlatformPei/AmdSev.c                  |  10 +
>  OvmfPkg/PlatformPei/Platform.c                |  10 +
>  .../CpuExceptionHandlerLib/DxeException.c     |   8 +-
>  UefiCpuPkg/Library/MpInitLib/DxeMpLib.c       |  21 +-
>  UefiCpuPkg/Library/MpInitLib/MpLib.c          |   7 +-
>  25 files changed, 1061 insertions(+), 17 deletions(-)
>  create mode 100644 OvmfPkg/AmdSev/ConfidentialMigration/ConfidentialMigrationDxe.inf
>  create mode 100644 OvmfPkg/AmdSev/ConfidentialMigration/ConfidentialMigrationPei.inf
>  create mode 100644 OvmfPkg/AmdSev/ConfidentialMigration/MpLib.h
>  create mode 100644 OvmfPkg/AmdSev/ConfidentialMigration/VirtualMemory.h
>  create mode 100644 OvmfPkg/Include/Guid/MemEncryptLib.h
>  create mode 100644 OvmfPkg/AmdSev/ConfidentialMigration/ConfidentialMigrationDxe.c
>  create mode 100644 OvmfPkg/AmdSev/ConfidentialMigration/ConfidentialMigrationPei.c
>  create mode 100644 OvmfPkg/PlatformDxe/AmdSev.c
>
> --
> 2.20.1

-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#72413): https://edk2.groups.io/g/devel/message/72413
Mute This Topic: https://groups.io/mt/81036365/1787277
Group Owner: devel+owner@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [importer@patchew.org]
-=-=-=-=-=-=-=-=-=-=-=-