MdePkg/Include/Library/BaseLib.h | 37 +++ MdePkg/Include/Register/Amd/Fam17Msr.h | 31 ++- MdePkg/Include/Register/Amd/Ghcb.h | 39 ++- MdePkg/Library/BaseLib/BaseLib.inf | 1 + MdePkg/Library/BaseLib/X64/Pvalidate.nasm | 43 +++ OvmfPkg/Include/Library/GhcbRegisterLib.h | 27 ++ OvmfPkg/Include/Library/MemEncryptSevLib.h | 30 +++ .../DxeMemEncryptSevLib.inf | 7 + .../DxeMemEncryptSevLibInternal.c | 27 ++ .../Ia32/SnpPageStateChange.c | 17 ++ .../PeiMemEncryptSevLib.inf | 9 + .../PeiMemEncryptSevLibInternal.c | 47 ++++ .../SecMemEncryptSevLib.inf | 4 + .../SecMemEncryptSevLibInternal.c | 39 +++ .../BaseMemEncryptSevLib/SnpPageStateChange.h | 37 +++ .../X64/PeiDxeSnpSetPageState.c | 63 +++++ .../X64/PeiDxeVirtualMemory.c | 151 ++++++++++- .../X64/PeiSnpSystemRamValidate.c | 129 +++++++++ .../X64/SecSnpSystemRamValidate.c | 23 ++ .../X64/SnpPageStateChangeInternal.c | 254 ++++++++++++++++++ .../X64/SnpPageStateTrack.c | 119 ++++++++ .../X64/SnpPageStateTrack.h | 36 +++ .../X64/SnpSetPageState.h | 27 ++ .../BaseMemEncryptSevLib/X64/VirtualMemory.h | 19 ++ .../Library/GhcbRegisterLib/GhcbRegisterLib.c | 97 +++++++ .../GhcbRegisterLib/GhcbRegisterLib.inf | 33 +++ OvmfPkg/Library/VmgExitLib/SecVmgExitLib.inf | 4 + OvmfPkg/Library/VmgExitLib/VmgExitLib.inf | 7 + OvmfPkg/Library/VmgExitLib/VmgExitVcHandler.c | 45 ++++ OvmfPkg/OvmfPkg.dec | 12 + OvmfPkg/OvmfPkgX64.dsc | 1 + OvmfPkg/OvmfPkgX64.fdf | 33 ++- OvmfPkg/PlatformPei/AmdSev.c | 52 ++++ OvmfPkg/PlatformPei/PlatformPei.inf | 2 + OvmfPkg/ResetVector/Ia16/ResetVectorVtf0.asm | 24 ++ OvmfPkg/ResetVector/Ia32/PageTables64.asm | 106 ++++++++ OvmfPkg/ResetVector/ResetVector.inf | 5 + OvmfPkg/ResetVector/ResetVector.nasmb | 4 + OvmfPkg/Sec/SecMain.c | 102 +++++++ OvmfPkg/Sec/SecMain.inf | 2 + UefiCpuPkg/Library/MpInitLib/DxeMpInitLib.inf | 1 + UefiCpuPkg/Library/MpInitLib/MpEqu.inc | 1 + UefiCpuPkg/Library/MpInitLib/MpLib.c | 2 + UefiCpuPkg/Library/MpInitLib/MpLib.h | 2 + UefiCpuPkg/Library/MpInitLib/PeiMpInitLib.inf | 1 + UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm | 51 ++++ UefiCpuPkg/UefiCpuPkg.dec | 6 + 47 files changed, 1790 insertions(+), 19 deletions(-) create mode 100644 MdePkg/Library/BaseLib/X64/Pvalidate.nasm create mode 100644 OvmfPkg/Include/Library/GhcbRegisterLib.h create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/Ia32/SnpPageStateChange.c create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/SnpPageStateChange.h create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/PeiDxeSnpSetPageState.c create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/PeiSnpSystemRamValidate.c create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/SecSnpSystemRamValidate.c create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/SnpPageStateChangeInternal.c create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/SnpPageStateTrack.c create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/SnpPageStateTrack.h create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/SnpSetPageState.h create mode 100644 OvmfPkg/Library/GhcbRegisterLib/GhcbRegisterLib.c create mode 100644 OvmfPkg/Library/GhcbRegisterLib/GhcbRegisterLib.inf
BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=3275 SEV-SNP builds upon existing SEV and SEV-ES functionality while adding new hardware-based memory protections. SEV-SNP adds strong memory integrity protection to help prevent malicious hypervisor-based attacks like data replay, memory re-mapping and more in order to create an isolated memory encryption environment. This series provides the basic building blocks to support booting the SEV-SNP VMs, it does not cover all the security enhancement introduced by the SEV-SNP such as interrupt protection. Many of the integrity guarantees of SEV-SNP are enforced through a new structure called the Reverse Map Table (RMP). Adding a new page to SEV-SNP VM requires a 2-step process. First, the hypervisor assigns a page to the guest using the new RMPUPDATE instruction. This transitions the page to guest-invalid. Second, the guest validates the page using the new PVALIDATE instruction. The SEV-SNP VMs can use the new "Page State Change Request NAE" defined in the GHCB specification to ask hypervisor to add or remove page from the RMP table. Each page assigned to the SEV-SNP VM can either be validated or unvalidated, as indicated by the Validated flag in the page's RMP entry. There are two approaches that can be taken for the page validation: Pre-validation and Lazy Validation. Under pre-validation, the pages are validated prior to first use. And under lazy validation, pages are validated when first accessed. An access to a unvalidated page results in a #VC exception, at which time the exception handler may validate the page. Lazy validation requires careful tracking of the validated pages to avoid validating the same GPA more than once. The recently introduced "Unaccepted" memory type can be used to communicate the unvalidated memory ranges to the Guest OS. At this time we only support the pre-validation. OVMF detects all the available system RAM in the PEI phase. When SEV-SNP is enabled, the memory is validated before it is made available to the EDK2 core. This series does not implements the following SEV-SNP features yet: * CPUID filtering * AP bring up using the new SEV-SNP NAE * Lazy validation * Interrupt security The series is based on commit: e542e05d4f UefiCpuPkg/SmmCpuFeaturesLib: Abstract PcdCpuMaxLogicalProcessorNumber Additional resources --------------------- SEV-SNP whitepaper https://www.amd.com/system/files/TechDocs/SEV-SNP-strengthening-vm-isolation-with-integrity-protection-and-more.pdf APM 2: https://www.amd.com/system/files/TechDocs/24593.pdf (section 15.36) The complete source is available at https://github.com/AMDESE/ovmf/tree/sev-snp-rfc-1 GHCB spec v2: The draft specification is posted on AMD-SEV-SNP mailing list: https://lists.suse.com/mailman/private/amd-sev-snp/ Copy of the spec is also available at https://github.com/AMDESE/AMDSEV/blob/sev-snp-devel/docs/56421-Guest_Hypervisor_Communication_Block_Standardization.pdf GHCB spec v1: SEV-SNP firmware specification: https://developer.amd.com/sev/ Cc: James Bottomley <jejb@linux.ibm.com> Cc: Min Xu <min.m.xu@intel.com> Cc: Jiewen Yao <jiewen.yao@intel.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Jordan Justen <jordan.l.justen@intel.com> Cc: Ard Biesheuvel <ardb+tianocore@kernel.org> Cc: Laszlo Ersek <lersek@redhat.com> Brijesh Singh (19): OvmfPkg: Reserve the Secrets and Cpuid page for the SEV-SNP guest OvmfPkg: validate the data pages used in the SEC phase MdePkg: Expand the SEV MSR to include the SNP definition OvmfPkg/MemEncryptSevLib: add MemEncryptSevSnpEnabled() MdePkg: Define the GHCB GPA structure UefiCpuPkg/MpLib: add support to register GHCB GPA when SEV-SNP is enabled OvmfPkg: Add a library to support registering GHCB GPA OvmfPkg: register GHCB gpa for the SEV-SNP guest MdePkg: Add AsmPvalidate() support OvmfPkg: Define the Page State Change VMGEXIT structures OvmfPkg/ResetVector: Invalidate the GHCB page OvmfPkg/MemEncryptSevLib: Add support to validate system RAM OvmfPkg/SecMain: Validate the data/code pages used for the PEI phase OvmfPkg/MemEncryptSevLib: Add support to validate RAM in PEI phase OvmfPkg/PlatformPei: Validate the system RAM when SNP is active OvmfPkg/MemEncryptSevLib: Add support to validate > 4GB memory in PEI phase OvmfPkg/VmgExitLib: Allow PMBASE register access in Dxe phase OvmfPkg/MemEncryptSevLib: Validate the memory during set or clear enc attribute OvmfPkg/MemEncryptSevLib: Skip page state change for non RAM region MdePkg/Include/Library/BaseLib.h | 37 +++ MdePkg/Include/Register/Amd/Fam17Msr.h | 31 ++- MdePkg/Include/Register/Amd/Ghcb.h | 39 ++- MdePkg/Library/BaseLib/BaseLib.inf | 1 + MdePkg/Library/BaseLib/X64/Pvalidate.nasm | 43 +++ OvmfPkg/Include/Library/GhcbRegisterLib.h | 27 ++ OvmfPkg/Include/Library/MemEncryptSevLib.h | 30 +++ .../DxeMemEncryptSevLib.inf | 7 + .../DxeMemEncryptSevLibInternal.c | 27 ++ .../Ia32/SnpPageStateChange.c | 17 ++ .../PeiMemEncryptSevLib.inf | 9 + .../PeiMemEncryptSevLibInternal.c | 47 ++++ .../SecMemEncryptSevLib.inf | 4 + .../SecMemEncryptSevLibInternal.c | 39 +++ .../BaseMemEncryptSevLib/SnpPageStateChange.h | 37 +++ .../X64/PeiDxeSnpSetPageState.c | 63 +++++ .../X64/PeiDxeVirtualMemory.c | 151 ++++++++++- .../X64/PeiSnpSystemRamValidate.c | 129 +++++++++ .../X64/SecSnpSystemRamValidate.c | 23 ++ .../X64/SnpPageStateChangeInternal.c | 254 ++++++++++++++++++ .../X64/SnpPageStateTrack.c | 119 ++++++++ .../X64/SnpPageStateTrack.h | 36 +++ .../X64/SnpSetPageState.h | 27 ++ .../BaseMemEncryptSevLib/X64/VirtualMemory.h | 19 ++ .../Library/GhcbRegisterLib/GhcbRegisterLib.c | 97 +++++++ .../GhcbRegisterLib/GhcbRegisterLib.inf | 33 +++ OvmfPkg/Library/VmgExitLib/SecVmgExitLib.inf | 4 + OvmfPkg/Library/VmgExitLib/VmgExitLib.inf | 7 + OvmfPkg/Library/VmgExitLib/VmgExitVcHandler.c | 45 ++++ OvmfPkg/OvmfPkg.dec | 12 + OvmfPkg/OvmfPkgX64.dsc | 1 + OvmfPkg/OvmfPkgX64.fdf | 33 ++- OvmfPkg/PlatformPei/AmdSev.c | 52 ++++ OvmfPkg/PlatformPei/PlatformPei.inf | 2 + OvmfPkg/ResetVector/Ia16/ResetVectorVtf0.asm | 24 ++ OvmfPkg/ResetVector/Ia32/PageTables64.asm | 106 ++++++++ OvmfPkg/ResetVector/ResetVector.inf | 5 + OvmfPkg/ResetVector/ResetVector.nasmb | 4 + OvmfPkg/Sec/SecMain.c | 102 +++++++ OvmfPkg/Sec/SecMain.inf | 2 + UefiCpuPkg/Library/MpInitLib/DxeMpInitLib.inf | 1 + UefiCpuPkg/Library/MpInitLib/MpEqu.inc | 1 + UefiCpuPkg/Library/MpInitLib/MpLib.c | 2 + UefiCpuPkg/Library/MpInitLib/MpLib.h | 2 + UefiCpuPkg/Library/MpInitLib/PeiMpInitLib.inf | 1 + UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm | 51 ++++ UefiCpuPkg/UefiCpuPkg.dec | 6 + 47 files changed, 1790 insertions(+), 19 deletions(-) create mode 100644 MdePkg/Library/BaseLib/X64/Pvalidate.nasm create mode 100644 OvmfPkg/Include/Library/GhcbRegisterLib.h create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/Ia32/SnpPageStateChange.c create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/SnpPageStateChange.h create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/PeiDxeSnpSetPageState.c create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/PeiSnpSystemRamValidate.c create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/SecSnpSystemRamValidate.c create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/SnpPageStateChangeInternal.c create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/SnpPageStateTrack.c create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/SnpPageStateTrack.h create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/SnpSetPageState.h create mode 100644 OvmfPkg/Library/GhcbRegisterLib/GhcbRegisterLib.c create mode 100644 OvmfPkg/Library/GhcbRegisterLib/GhcbRegisterLib.inf -- 2.17.1 -=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#73216): https://edk2.groups.io/g/devel/message/73216 Mute This Topic: https://groups.io/mt/81584576/1787277 Group Owner: devel+owner@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/unsub [importer@patchew.org] -=-=-=-=-=-=-=-=-=-=-=-
On 03/24/21 16:31, Brijesh Singh wrote: > BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=3275 > > SEV-SNP builds upon existing SEV and SEV-ES functionality while adding > new hardware-based memory protections. SEV-SNP adds strong memory integrity > protection to help prevent malicious hypervisor-based attacks like data > replay, memory re-mapping and more in order to create an isolated memory > encryption environment. > > This series provides the basic building blocks to support booting the SEV-SNP > VMs, it does not cover all the security enhancement introduced by the SEV-SNP > such as interrupt protection. Thanks, Brijesh. I'm adding the series to my review queue. Due to some PTO coming up, I'll probably start reviewing this work only in April. Other reviewers, please feel free to have at it. Cheers Laszlo > > Many of the integrity guarantees of SEV-SNP are enforced through a new > structure called the Reverse Map Table (RMP). Adding a new page to SEV-SNP > VM requires a 2-step process. First, the hypervisor assigns a page to the > guest using the new RMPUPDATE instruction. This transitions the page to > guest-invalid. Second, the guest validates the page using the new PVALIDATE > instruction. The SEV-SNP VMs can use the new "Page State Change Request NAE" > defined in the GHCB specification to ask hypervisor to add or remove page > from the RMP table. > > Each page assigned to the SEV-SNP VM can either be validated or unvalidated, > as indicated by the Validated flag in the page's RMP entry. There are two > approaches that can be taken for the page validation: Pre-validation and > Lazy Validation. > > Under pre-validation, the pages are validated prior to first use. And under > lazy validation, pages are validated when first accessed. An access to a > unvalidated page results in a #VC exception, at which time the exception > handler may validate the page. Lazy validation requires careful tracking of > the validated pages to avoid validating the same GPA more than once. The > recently introduced "Unaccepted" memory type can be used to communicate the > unvalidated memory ranges to the Guest OS. > > At this time we only support the pre-validation. OVMF detects all the available > system RAM in the PEI phase. When SEV-SNP is enabled, the memory is validated > before it is made available to the EDK2 core. > > This series does not implements the following SEV-SNP features yet: > > * CPUID filtering > * AP bring up using the new SEV-SNP NAE > * Lazy validation > * Interrupt security > > The series is based on commit: > e542e05d4f UefiCpuPkg/SmmCpuFeaturesLib: Abstract PcdCpuMaxLogicalProcessorNumber > > Additional resources > --------------------- > SEV-SNP whitepaper > https://www.amd.com/system/files/TechDocs/SEV-SNP-strengthening-vm-isolation-with-integrity-protection-and-more.pdf > > APM 2: https://www.amd.com/system/files/TechDocs/24593.pdf (section 15.36) > > The complete source is available at > https://github.com/AMDESE/ovmf/tree/sev-snp-rfc-1 > > GHCB spec v2: > The draft specification is posted on AMD-SEV-SNP mailing list: > https://lists.suse.com/mailman/private/amd-sev-snp/ > > Copy of the spec is also available at > https://github.com/AMDESE/AMDSEV/blob/sev-snp-devel/docs/56421-Guest_Hypervisor_Communication_Block_Standardization.pdf > > GHCB spec v1: > SEV-SNP firmware specification: > https://developer.amd.com/sev/ > > Cc: James Bottomley <jejb@linux.ibm.com> > Cc: Min Xu <min.m.xu@intel.com> > Cc: Jiewen Yao <jiewen.yao@intel.com> > Cc: Tom Lendacky <thomas.lendacky@amd.com> > Cc: Jordan Justen <jordan.l.justen@intel.com> > Cc: Ard Biesheuvel <ardb+tianocore@kernel.org> > Cc: Laszlo Ersek <lersek@redhat.com> > > Brijesh Singh (19): > OvmfPkg: Reserve the Secrets and Cpuid page for the SEV-SNP guest > OvmfPkg: validate the data pages used in the SEC phase > MdePkg: Expand the SEV MSR to include the SNP definition > OvmfPkg/MemEncryptSevLib: add MemEncryptSevSnpEnabled() > MdePkg: Define the GHCB GPA structure > UefiCpuPkg/MpLib: add support to register GHCB GPA when SEV-SNP is > enabled > OvmfPkg: Add a library to support registering GHCB GPA > OvmfPkg: register GHCB gpa for the SEV-SNP guest > MdePkg: Add AsmPvalidate() support > OvmfPkg: Define the Page State Change VMGEXIT structures > OvmfPkg/ResetVector: Invalidate the GHCB page > OvmfPkg/MemEncryptSevLib: Add support to validate system RAM > OvmfPkg/SecMain: Validate the data/code pages used for the PEI phase > OvmfPkg/MemEncryptSevLib: Add support to validate RAM in PEI phase > OvmfPkg/PlatformPei: Validate the system RAM when SNP is active > OvmfPkg/MemEncryptSevLib: Add support to validate > 4GB memory in PEI > phase > OvmfPkg/VmgExitLib: Allow PMBASE register access in Dxe phase > OvmfPkg/MemEncryptSevLib: Validate the memory during set or clear enc > attribute > OvmfPkg/MemEncryptSevLib: Skip page state change for non RAM region > > MdePkg/Include/Library/BaseLib.h | 37 +++ > MdePkg/Include/Register/Amd/Fam17Msr.h | 31 ++- > MdePkg/Include/Register/Amd/Ghcb.h | 39 ++- > MdePkg/Library/BaseLib/BaseLib.inf | 1 + > MdePkg/Library/BaseLib/X64/Pvalidate.nasm | 43 +++ > OvmfPkg/Include/Library/GhcbRegisterLib.h | 27 ++ > OvmfPkg/Include/Library/MemEncryptSevLib.h | 30 +++ > .../DxeMemEncryptSevLib.inf | 7 + > .../DxeMemEncryptSevLibInternal.c | 27 ++ > .../Ia32/SnpPageStateChange.c | 17 ++ > .../PeiMemEncryptSevLib.inf | 9 + > .../PeiMemEncryptSevLibInternal.c | 47 ++++ > .../SecMemEncryptSevLib.inf | 4 + > .../SecMemEncryptSevLibInternal.c | 39 +++ > .../BaseMemEncryptSevLib/SnpPageStateChange.h | 37 +++ > .../X64/PeiDxeSnpSetPageState.c | 63 +++++ > .../X64/PeiDxeVirtualMemory.c | 151 ++++++++++- > .../X64/PeiSnpSystemRamValidate.c | 129 +++++++++ > .../X64/SecSnpSystemRamValidate.c | 23 ++ > .../X64/SnpPageStateChangeInternal.c | 254 ++++++++++++++++++ > .../X64/SnpPageStateTrack.c | 119 ++++++++ > .../X64/SnpPageStateTrack.h | 36 +++ > .../X64/SnpSetPageState.h | 27 ++ > .../BaseMemEncryptSevLib/X64/VirtualMemory.h | 19 ++ > .../Library/GhcbRegisterLib/GhcbRegisterLib.c | 97 +++++++ > .../GhcbRegisterLib/GhcbRegisterLib.inf | 33 +++ > OvmfPkg/Library/VmgExitLib/SecVmgExitLib.inf | 4 + > OvmfPkg/Library/VmgExitLib/VmgExitLib.inf | 7 + > OvmfPkg/Library/VmgExitLib/VmgExitVcHandler.c | 45 ++++ > OvmfPkg/OvmfPkg.dec | 12 + > OvmfPkg/OvmfPkgX64.dsc | 1 + > OvmfPkg/OvmfPkgX64.fdf | 33 ++- > OvmfPkg/PlatformPei/AmdSev.c | 52 ++++ > OvmfPkg/PlatformPei/PlatformPei.inf | 2 + > OvmfPkg/ResetVector/Ia16/ResetVectorVtf0.asm | 24 ++ > OvmfPkg/ResetVector/Ia32/PageTables64.asm | 106 ++++++++ > OvmfPkg/ResetVector/ResetVector.inf | 5 + > OvmfPkg/ResetVector/ResetVector.nasmb | 4 + > OvmfPkg/Sec/SecMain.c | 102 +++++++ > OvmfPkg/Sec/SecMain.inf | 2 + > UefiCpuPkg/Library/MpInitLib/DxeMpInitLib.inf | 1 + > UefiCpuPkg/Library/MpInitLib/MpEqu.inc | 1 + > UefiCpuPkg/Library/MpInitLib/MpLib.c | 2 + > UefiCpuPkg/Library/MpInitLib/MpLib.h | 2 + > UefiCpuPkg/Library/MpInitLib/PeiMpInitLib.inf | 1 + > UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm | 51 ++++ > UefiCpuPkg/UefiCpuPkg.dec | 6 + > 47 files changed, 1790 insertions(+), 19 deletions(-) > create mode 100644 MdePkg/Library/BaseLib/X64/Pvalidate.nasm > create mode 100644 OvmfPkg/Include/Library/GhcbRegisterLib.h > create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/Ia32/SnpPageStateChange.c > create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/SnpPageStateChange.h > create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/PeiDxeSnpSetPageState.c > create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/PeiSnpSystemRamValidate.c > create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/SecSnpSystemRamValidate.c > create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/SnpPageStateChangeInternal.c > create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/SnpPageStateTrack.c > create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/SnpPageStateTrack.h > create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/SnpSetPageState.h > create mode 100644 OvmfPkg/Library/GhcbRegisterLib/GhcbRegisterLib.c > create mode 100644 OvmfPkg/Library/GhcbRegisterLib/GhcbRegisterLib.inf > -=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#73236): https://edk2.groups.io/g/devel/message/73236 Mute This Topic: https://groups.io/mt/81584576/1787277 Group Owner: devel+owner@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/unsub [importer@patchew.org] -=-=-=-=-=-=-=-=-=-=-=-
Hi Brijesh, On 03/24/21 16:31, Brijesh Singh wrote: > BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=3275 > > SEV-SNP builds upon existing SEV and SEV-ES functionality while adding > new hardware-based memory protections. SEV-SNP adds strong memory integrity > protection to help prevent malicious hypervisor-based attacks like data > replay, memory re-mapping and more in order to create an isolated memory > encryption environment. > > This series provides the basic building blocks to support booting the SEV-SNP > VMs, it does not cover all the security enhancement introduced by the SEV-SNP > such as interrupt protection. > > Many of the integrity guarantees of SEV-SNP are enforced through a new > structure called the Reverse Map Table (RMP). Adding a new page to SEV-SNP > VM requires a 2-step process. First, the hypervisor assigns a page to the > guest using the new RMPUPDATE instruction. This transitions the page to > guest-invalid. Second, the guest validates the page using the new PVALIDATE > instruction. The SEV-SNP VMs can use the new "Page State Change Request NAE" > defined in the GHCB specification to ask hypervisor to add or remove page > from the RMP table. > > Each page assigned to the SEV-SNP VM can either be validated or unvalidated, > as indicated by the Validated flag in the page's RMP entry. There are two > approaches that can be taken for the page validation: Pre-validation and > Lazy Validation. > > Under pre-validation, the pages are validated prior to first use. And under > lazy validation, pages are validated when first accessed. An access to a > unvalidated page results in a #VC exception, at which time the exception > handler may validate the page. Lazy validation requires careful tracking of > the validated pages to avoid validating the same GPA more than once. The > recently introduced "Unaccepted" memory type can be used to communicate the > unvalidated memory ranges to the Guest OS. > > At this time we only support the pre-validation. OVMF detects all the available > system RAM in the PEI phase. When SEV-SNP is enabled, the memory is validated > before it is made available to the EDK2 core. Can you describe this in a bit more detail, before I look at the individual patches? Specifically, what existing logic in the PEI phase was taken, and extended, and how? If there is a particular patch whose commit message is closely related to my question, can you point it out? Patch#15 perhaps? (Doesn't seem like a big patch; for some reason I'd expect something more complex, but perhaps that's only because it builds upon the many earlier patches.) Thanks, Laszlo > > This series does not implements the following SEV-SNP features yet: > > * CPUID filtering > * AP bring up using the new SEV-SNP NAE > * Lazy validation > * Interrupt security > > The series is based on commit: > e542e05d4f UefiCpuPkg/SmmCpuFeaturesLib: Abstract PcdCpuMaxLogicalProcessorNumber > > Additional resources > --------------------- > SEV-SNP whitepaper > https://www.amd.com/system/files/TechDocs/SEV-SNP-strengthening-vm-isolation-with-integrity-protection-and-more.pdf > > APM 2: https://www.amd.com/system/files/TechDocs/24593.pdf (section 15.36) > > The complete source is available at > https://github.com/AMDESE/ovmf/tree/sev-snp-rfc-1 > > GHCB spec v2: > The draft specification is posted on AMD-SEV-SNP mailing list: > https://lists.suse.com/mailman/private/amd-sev-snp/ > > Copy of the spec is also available at > https://github.com/AMDESE/AMDSEV/blob/sev-snp-devel/docs/56421-Guest_Hypervisor_Communication_Block_Standardization.pdf > > GHCB spec v1: > SEV-SNP firmware specification: > https://developer.amd.com/sev/ > > Cc: James Bottomley <jejb@linux.ibm.com> > Cc: Min Xu <min.m.xu@intel.com> > Cc: Jiewen Yao <jiewen.yao@intel.com> > Cc: Tom Lendacky <thomas.lendacky@amd.com> > Cc: Jordan Justen <jordan.l.justen@intel.com> > Cc: Ard Biesheuvel <ardb+tianocore@kernel.org> > Cc: Laszlo Ersek <lersek@redhat.com> > > Brijesh Singh (19): > OvmfPkg: Reserve the Secrets and Cpuid page for the SEV-SNP guest > OvmfPkg: validate the data pages used in the SEC phase > MdePkg: Expand the SEV MSR to include the SNP definition > OvmfPkg/MemEncryptSevLib: add MemEncryptSevSnpEnabled() > MdePkg: Define the GHCB GPA structure > UefiCpuPkg/MpLib: add support to register GHCB GPA when SEV-SNP is > enabled > OvmfPkg: Add a library to support registering GHCB GPA > OvmfPkg: register GHCB gpa for the SEV-SNP guest > MdePkg: Add AsmPvalidate() support > OvmfPkg: Define the Page State Change VMGEXIT structures > OvmfPkg/ResetVector: Invalidate the GHCB page > OvmfPkg/MemEncryptSevLib: Add support to validate system RAM > OvmfPkg/SecMain: Validate the data/code pages used for the PEI phase > OvmfPkg/MemEncryptSevLib: Add support to validate RAM in PEI phase > OvmfPkg/PlatformPei: Validate the system RAM when SNP is active > OvmfPkg/MemEncryptSevLib: Add support to validate > 4GB memory in PEI > phase > OvmfPkg/VmgExitLib: Allow PMBASE register access in Dxe phase > OvmfPkg/MemEncryptSevLib: Validate the memory during set or clear enc > attribute > OvmfPkg/MemEncryptSevLib: Skip page state change for non RAM region > > MdePkg/Include/Library/BaseLib.h | 37 +++ > MdePkg/Include/Register/Amd/Fam17Msr.h | 31 ++- > MdePkg/Include/Register/Amd/Ghcb.h | 39 ++- > MdePkg/Library/BaseLib/BaseLib.inf | 1 + > MdePkg/Library/BaseLib/X64/Pvalidate.nasm | 43 +++ > OvmfPkg/Include/Library/GhcbRegisterLib.h | 27 ++ > OvmfPkg/Include/Library/MemEncryptSevLib.h | 30 +++ > .../DxeMemEncryptSevLib.inf | 7 + > .../DxeMemEncryptSevLibInternal.c | 27 ++ > .../Ia32/SnpPageStateChange.c | 17 ++ > .../PeiMemEncryptSevLib.inf | 9 + > .../PeiMemEncryptSevLibInternal.c | 47 ++++ > .../SecMemEncryptSevLib.inf | 4 + > .../SecMemEncryptSevLibInternal.c | 39 +++ > .../BaseMemEncryptSevLib/SnpPageStateChange.h | 37 +++ > .../X64/PeiDxeSnpSetPageState.c | 63 +++++ > .../X64/PeiDxeVirtualMemory.c | 151 ++++++++++- > .../X64/PeiSnpSystemRamValidate.c | 129 +++++++++ > .../X64/SecSnpSystemRamValidate.c | 23 ++ > .../X64/SnpPageStateChangeInternal.c | 254 ++++++++++++++++++ > .../X64/SnpPageStateTrack.c | 119 ++++++++ > .../X64/SnpPageStateTrack.h | 36 +++ > .../X64/SnpSetPageState.h | 27 ++ > .../BaseMemEncryptSevLib/X64/VirtualMemory.h | 19 ++ > .../Library/GhcbRegisterLib/GhcbRegisterLib.c | 97 +++++++ > .../GhcbRegisterLib/GhcbRegisterLib.inf | 33 +++ > OvmfPkg/Library/VmgExitLib/SecVmgExitLib.inf | 4 + > OvmfPkg/Library/VmgExitLib/VmgExitLib.inf | 7 + > OvmfPkg/Library/VmgExitLib/VmgExitVcHandler.c | 45 ++++ > OvmfPkg/OvmfPkg.dec | 12 + > OvmfPkg/OvmfPkgX64.dsc | 1 + > OvmfPkg/OvmfPkgX64.fdf | 33 ++- > OvmfPkg/PlatformPei/AmdSev.c | 52 ++++ > OvmfPkg/PlatformPei/PlatformPei.inf | 2 + > OvmfPkg/ResetVector/Ia16/ResetVectorVtf0.asm | 24 ++ > OvmfPkg/ResetVector/Ia32/PageTables64.asm | 106 ++++++++ > OvmfPkg/ResetVector/ResetVector.inf | 5 + > OvmfPkg/ResetVector/ResetVector.nasmb | 4 + > OvmfPkg/Sec/SecMain.c | 102 +++++++ > OvmfPkg/Sec/SecMain.inf | 2 + > UefiCpuPkg/Library/MpInitLib/DxeMpInitLib.inf | 1 + > UefiCpuPkg/Library/MpInitLib/MpEqu.inc | 1 + > UefiCpuPkg/Library/MpInitLib/MpLib.c | 2 + > UefiCpuPkg/Library/MpInitLib/MpLib.h | 2 + > UefiCpuPkg/Library/MpInitLib/PeiMpInitLib.inf | 1 + > UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm | 51 ++++ > UefiCpuPkg/UefiCpuPkg.dec | 6 + > 47 files changed, 1790 insertions(+), 19 deletions(-) > create mode 100644 MdePkg/Library/BaseLib/X64/Pvalidate.nasm > create mode 100644 OvmfPkg/Include/Library/GhcbRegisterLib.h > create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/Ia32/SnpPageStateChange.c > create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/SnpPageStateChange.h > create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/PeiDxeSnpSetPageState.c > create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/PeiSnpSystemRamValidate.c > create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/SecSnpSystemRamValidate.c > create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/SnpPageStateChangeInternal.c > create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/SnpPageStateTrack.c > create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/SnpPageStateTrack.h > create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/SnpSetPageState.h > create mode 100644 OvmfPkg/Library/GhcbRegisterLib/GhcbRegisterLib.c > create mode 100644 OvmfPkg/Library/GhcbRegisterLib/GhcbRegisterLib.inf > -=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#73839): https://edk2.groups.io/g/devel/message/73839 Mute This Topic: https://groups.io/mt/81584576/1787277 Group Owner: devel+owner@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/unsub [importer@patchew.org] -=-=-=-=-=-=-=-=-=-=-=-
Hi Laszlo, On 4/8/21 4:58 AM, Laszlo Ersek wrote: > Hi Brijesh, > > On 03/24/21 16:31, Brijesh Singh wrote: >> BZ: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.tianocore.org%2Fshow_bug.cgi%3Fid%3D3275&data=04%7C01%7Cbrijesh.singh%40amd.com%7Cb8ee3abf81aa4e5b1e6008d8fa74dbbc%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637534728122143364%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=G2AQ%2FCks3%2BbczHJXMwqlqpWpoBJmb0pmxb1VNLw6t%2BA%3D&reserved=0 >> >> SEV-SNP builds upon existing SEV and SEV-ES functionality while adding >> new hardware-based memory protections. SEV-SNP adds strong memory integrity >> protection to help prevent malicious hypervisor-based attacks like data >> replay, memory re-mapping and more in order to create an isolated memory >> encryption environment. >> >> This series provides the basic building blocks to support booting the SEV-SNP >> VMs, it does not cover all the security enhancement introduced by the SEV-SNP >> such as interrupt protection. >> >> Many of the integrity guarantees of SEV-SNP are enforced through a new >> structure called the Reverse Map Table (RMP). Adding a new page to SEV-SNP >> VM requires a 2-step process. First, the hypervisor assigns a page to the >> guest using the new RMPUPDATE instruction. This transitions the page to >> guest-invalid. Second, the guest validates the page using the new PVALIDATE >> instruction. The SEV-SNP VMs can use the new "Page State Change Request NAE" >> defined in the GHCB specification to ask hypervisor to add or remove page >> from the RMP table. >> >> Each page assigned to the SEV-SNP VM can either be validated or unvalidated, >> as indicated by the Validated flag in the page's RMP entry. There are two >> approaches that can be taken for the page validation: Pre-validation and >> Lazy Validation. >> >> Under pre-validation, the pages are validated prior to first use. And under >> lazy validation, pages are validated when first accessed. An access to a >> unvalidated page results in a #VC exception, at which time the exception >> handler may validate the page. Lazy validation requires careful tracking of >> the validated pages to avoid validating the same GPA more than once. The >> recently introduced "Unaccepted" memory type can be used to communicate the >> unvalidated memory ranges to the Guest OS. >> >> At this time we only support the pre-validation. OVMF detects all the available >> system RAM in the PEI phase. When SEV-SNP is enabled, the memory is validated >> before it is made available to the EDK2 core. > Can you describe this in a bit more detail, before I look at the > individual patches? Specifically, what existing logic in the PEI phase > was taken, and extended, and how? One of the key requirement is that the guest private pages much be validated before the access. If guest tries to access the pages before the validation then it will result in #VC (page-not-validated) exception. To avoid the #VC, we propose the validating the memory before the access. We will incrementally add the support to lazy validate (i.e validate on access). Let me try explaining a bit, the page validation process consist of two steps: 1. Add the pages in the RMP table -- must be done by the hypervisor using the RMPUPDATE instruction. The guest can use VMGEXIT NAEs to ask hypervisor to add or remove pages from the RMP table. 2. Guest issue the PVALIDATE instruction -- this sets the validate bit in the RMP table. Similar to SEV, the OVMF_CODE.fd is encrypted through the SNP firmware before the launch. The SNP firmware also validates the memory page after encrypting. This allows us to boot the initial entry code without guest going through the validation process. The OVMF reset vector uses few data pages (e.g page table, early Sec stack). Access to these data pages will result in #VC. There are two approaches we can take to validate these data pages: 1. Ask SNP firmware to pre-validate it -- SNP firmware provides an special command that can be used to pre-validate the pages without affecting the measurement. 2. Enhance the reset vector code to validate the pages. For now I choose #1. The pre-validation performed by the SNP firmware is sufficient to boot through the SEC phase. The SEC phase later decompress the Fv to a new memory location. Now we need the OVMF to take over the validation procedure. The series extends the MemEncryptSevLib to add a new helper MemEncryptSevSnpValidateRam(). The helper is used to validate the system RAM. See patch #12. SEC phase calls the MemEncryptSevSnpValidateRam() to validate the output buffer used for the decompression. This was sufficient to boot into the PEI phase, see patch #13. The PEI detects all the available system RAM. After the memory detection is completed the PlatformPei calls the AmdSevSnpInitialize(). The initialization routine iterate through the HOB and calls the MemEncryptSevSnpValidateRam() to validate all the system RAM. Is it possible the more system ram can be detected after the PlatformPei is completed ? One of the important thing is we should *never* validate the pages twice. The MemEncryptSevSnpValidateRam() uses a interval search tree to keep the record of what has been validated. Before validating the range, it lookup in its tree and if it finds that range is already validated then do nothing. If it detects an overlap then it will validate only non overlapping regions -- see patch #14. The patch #18 extend the MemEncrypt{Set,Clear}PageEncMask() to call the SNP page state change during the C-bit toggle. Please let me know if you have any questions. We can hash out the design here before you taking a closure look at the code. > > If there is a particular patch whose commit message is closely related > to my question, can you point it out? Patch#15 perhaps? (Doesn't seem > like a big patch; for some reason I'd expect something more complex, but > perhaps that's only because it builds upon the many earlier patches.) > > Thanks, > Laszlo > >> This series does not implements the following SEV-SNP features yet: >> >> * CPUID filtering >> * AP bring up using the new SEV-SNP NAE >> * Lazy validation >> * Interrupt security >> >> The series is based on commit: >> e542e05d4f UefiCpuPkg/SmmCpuFeaturesLib: Abstract PcdCpuMaxLogicalProcessorNumber >> >> Additional resources >> --------------------- >> SEV-SNP whitepaper >> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.amd.com%2Fsystem%2Ffiles%2FTechDocs%2FSEV-SNP-strengthening-vm-isolation-with-integrity-protection-and-more.pdf&data=04%7C01%7Cbrijesh.singh%40amd.com%7Cb8ee3abf81aa4e5b1e6008d8fa74dbbc%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637534728122143364%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=uUqlVHhWQ6geGDaNHwxGMpoSpIamB%2F1vHH69h%2FEGUro%3D&reserved=0 >> >> APM 2: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.amd.com%2Fsystem%2Ffiles%2FTechDocs%2F24593.pdf&data=04%7C01%7Cbrijesh.singh%40amd.com%7Cb8ee3abf81aa4e5b1e6008d8fa74dbbc%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637534728122143364%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=R2kU42jvCDZat8kGZ5gDDz2nFXIawHfXdRW1aovhNK8%3D&reserved=0 (section 15.36) >> >> The complete source is available at >> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FAMDESE%2Fovmf%2Ftree%2Fsev-snp-rfc-1&data=04%7C01%7Cbrijesh.singh%40amd.com%7Cb8ee3abf81aa4e5b1e6008d8fa74dbbc%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637534728122143364%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=yreFg2hr%2F82WYEjxqCmb7pXUdtrRCJRYPrPHgfrWjM8%3D&reserved=0 >> >> GHCB spec v2: >> The draft specification is posted on AMD-SEV-SNP mailing list: >> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.suse.com%2Fmailman%2Fprivate%2Famd-sev-snp%2F&data=04%7C01%7Cbrijesh.singh%40amd.com%7Cb8ee3abf81aa4e5b1e6008d8fa74dbbc%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637534728122143364%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=PFV2mA7T%2Fbl2zP5j52kNdT%2FavDMRLWDEDqz6JGusEFg%3D&reserved=0 >> >> Copy of the spec is also available at >> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FAMDESE%2FAMDSEV%2Fblob%2Fsev-snp-devel%2Fdocs%2F56421-Guest_Hypervisor_Communication_Block_Standardization.pdf&data=04%7C01%7Cbrijesh.singh%40amd.com%7Cb8ee3abf81aa4e5b1e6008d8fa74dbbc%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637534728122143364%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=U3oKRe5m0NxI0xqv1gBoyh%2BEEX1LVeCWR42rvPh6XZ8%3D&reserved=0 >> >> GHCB spec v1: >> SEV-SNP firmware specification: >> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdeveloper.amd.com%2Fsev%2F&data=04%7C01%7Cbrijesh.singh%40amd.com%7Cb8ee3abf81aa4e5b1e6008d8fa74dbbc%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637534728122143364%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=G%2BttkORsTJchJ1Fy1iNlD%2B%2BqiQZuwI8md5vhJjEb%2Fn4%3D&reserved=0 >> >> Cc: James Bottomley <jejb@linux.ibm.com> >> Cc: Min Xu <min.m.xu@intel.com> >> Cc: Jiewen Yao <jiewen.yao@intel.com> >> Cc: Tom Lendacky <thomas.lendacky@amd.com> >> Cc: Jordan Justen <jordan.l.justen@intel.com> >> Cc: Ard Biesheuvel <ardb+tianocore@kernel.org> >> Cc: Laszlo Ersek <lersek@redhat.com> >> >> Brijesh Singh (19): >> OvmfPkg: Reserve the Secrets and Cpuid page for the SEV-SNP guest >> OvmfPkg: validate the data pages used in the SEC phase >> MdePkg: Expand the SEV MSR to include the SNP definition >> OvmfPkg/MemEncryptSevLib: add MemEncryptSevSnpEnabled() >> MdePkg: Define the GHCB GPA structure >> UefiCpuPkg/MpLib: add support to register GHCB GPA when SEV-SNP is >> enabled >> OvmfPkg: Add a library to support registering GHCB GPA >> OvmfPkg: register GHCB gpa for the SEV-SNP guest >> MdePkg: Add AsmPvalidate() support >> OvmfPkg: Define the Page State Change VMGEXIT structures >> OvmfPkg/ResetVector: Invalidate the GHCB page >> OvmfPkg/MemEncryptSevLib: Add support to validate system RAM >> OvmfPkg/SecMain: Validate the data/code pages used for the PEI phase >> OvmfPkg/MemEncryptSevLib: Add support to validate RAM in PEI phase >> OvmfPkg/PlatformPei: Validate the system RAM when SNP is active >> OvmfPkg/MemEncryptSevLib: Add support to validate > 4GB memory in PEI >> phase >> OvmfPkg/VmgExitLib: Allow PMBASE register access in Dxe phase >> OvmfPkg/MemEncryptSevLib: Validate the memory during set or clear enc >> attribute >> OvmfPkg/MemEncryptSevLib: Skip page state change for non RAM region >> >> MdePkg/Include/Library/BaseLib.h | 37 +++ >> MdePkg/Include/Register/Amd/Fam17Msr.h | 31 ++- >> MdePkg/Include/Register/Amd/Ghcb.h | 39 ++- >> MdePkg/Library/BaseLib/BaseLib.inf | 1 + >> MdePkg/Library/BaseLib/X64/Pvalidate.nasm | 43 +++ >> OvmfPkg/Include/Library/GhcbRegisterLib.h | 27 ++ >> OvmfPkg/Include/Library/MemEncryptSevLib.h | 30 +++ >> .../DxeMemEncryptSevLib.inf | 7 + >> .../DxeMemEncryptSevLibInternal.c | 27 ++ >> .../Ia32/SnpPageStateChange.c | 17 ++ >> .../PeiMemEncryptSevLib.inf | 9 + >> .../PeiMemEncryptSevLibInternal.c | 47 ++++ >> .../SecMemEncryptSevLib.inf | 4 + >> .../SecMemEncryptSevLibInternal.c | 39 +++ >> .../BaseMemEncryptSevLib/SnpPageStateChange.h | 37 +++ >> .../X64/PeiDxeSnpSetPageState.c | 63 +++++ >> .../X64/PeiDxeVirtualMemory.c | 151 ++++++++++- >> .../X64/PeiSnpSystemRamValidate.c | 129 +++++++++ >> .../X64/SecSnpSystemRamValidate.c | 23 ++ >> .../X64/SnpPageStateChangeInternal.c | 254 ++++++++++++++++++ >> .../X64/SnpPageStateTrack.c | 119 ++++++++ >> .../X64/SnpPageStateTrack.h | 36 +++ >> .../X64/SnpSetPageState.h | 27 ++ >> .../BaseMemEncryptSevLib/X64/VirtualMemory.h | 19 ++ >> .../Library/GhcbRegisterLib/GhcbRegisterLib.c | 97 +++++++ >> .../GhcbRegisterLib/GhcbRegisterLib.inf | 33 +++ >> OvmfPkg/Library/VmgExitLib/SecVmgExitLib.inf | 4 + >> OvmfPkg/Library/VmgExitLib/VmgExitLib.inf | 7 + >> OvmfPkg/Library/VmgExitLib/VmgExitVcHandler.c | 45 ++++ >> OvmfPkg/OvmfPkg.dec | 12 + >> OvmfPkg/OvmfPkgX64.dsc | 1 + >> OvmfPkg/OvmfPkgX64.fdf | 33 ++- >> OvmfPkg/PlatformPei/AmdSev.c | 52 ++++ >> OvmfPkg/PlatformPei/PlatformPei.inf | 2 + >> OvmfPkg/ResetVector/Ia16/ResetVectorVtf0.asm | 24 ++ >> OvmfPkg/ResetVector/Ia32/PageTables64.asm | 106 ++++++++ >> OvmfPkg/ResetVector/ResetVector.inf | 5 + >> OvmfPkg/ResetVector/ResetVector.nasmb | 4 + >> OvmfPkg/Sec/SecMain.c | 102 +++++++ >> OvmfPkg/Sec/SecMain.inf | 2 + >> UefiCpuPkg/Library/MpInitLib/DxeMpInitLib.inf | 1 + >> UefiCpuPkg/Library/MpInitLib/MpEqu.inc | 1 + >> UefiCpuPkg/Library/MpInitLib/MpLib.c | 2 + >> UefiCpuPkg/Library/MpInitLib/MpLib.h | 2 + >> UefiCpuPkg/Library/MpInitLib/PeiMpInitLib.inf | 1 + >> UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm | 51 ++++ >> UefiCpuPkg/UefiCpuPkg.dec | 6 + >> 47 files changed, 1790 insertions(+), 19 deletions(-) >> create mode 100644 MdePkg/Library/BaseLib/X64/Pvalidate.nasm >> create mode 100644 OvmfPkg/Include/Library/GhcbRegisterLib.h >> create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/Ia32/SnpPageStateChange.c >> create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/SnpPageStateChange.h >> create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/PeiDxeSnpSetPageState.c >> create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/PeiSnpSystemRamValidate.c >> create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/SecSnpSystemRamValidate.c >> create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/SnpPageStateChangeInternal.c >> create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/SnpPageStateTrack.c >> create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/SnpPageStateTrack.h >> create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/SnpSetPageState.h >> create mode 100644 OvmfPkg/Library/GhcbRegisterLib/GhcbRegisterLib.c >> create mode 100644 OvmfPkg/Library/GhcbRegisterLib/GhcbRegisterLib.inf >> -=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#73846): https://edk2.groups.io/g/devel/message/73846 Mute This Topic: https://groups.io/mt/81584576/1787277 Group Owner: devel+owner@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/unsub [importer@patchew.org] -=-=-=-=-=-=-=-=-=-=-=-
On 04/08/21 13:59, Brijesh Singh wrote: > On 4/8/21 4:58 AM, Laszlo Ersek wrote: >> On 03/24/21 16:31, Brijesh Singh wrote: >>> At this time we only support the pre-validation. OVMF detects all the available >>> system RAM in the PEI phase. When SEV-SNP is enabled, the memory is validated >>> before it is made available to the EDK2 core. >> Can you describe this in a bit more detail, before I look at the >> individual patches? Specifically, what existing logic in the PEI phase >> was taken, and extended, and how? > > One of the key requirement is that the guest private pages much be > validated before the access. If guest tries to access the pages before > the validation then it will result in #VC (page-not-validated) > exception. To avoid the #VC, we propose the validating the memory before > the access. We will incrementally add the support to lazy validate (i.e > validate on access). What is the primary threat (in simple terms please) that validation is supposed to prevent? And, against that particular threat, does pre-validation offer some protection, or will that only come with lazy validation? > > Let me try explaining a bit, the page validation process consist of two > steps: > > 1. Add the pages in the RMP table -- must be done by the hypervisor > using the RMPUPDATE instruction. The guest can use VMGEXIT NAEs to ask > hypervisor to add or remove pages from the RMP table. > > 2. Guest issue the PVALIDATE instruction -- this sets the validate bit > in the RMP table. > > Similar to SEV, the OVMF_CODE.fd is encrypted through the SNP firmware > before the launch. The SNP firmware also validates the memory page after > encrypting. This allows us to boot the initial entry code without guest > going through the validation process. > > The OVMF reset vector uses few data pages (e.g page table, early Sec > stack). Access to these data pages will result in #VC. There are two > approaches we can take to validate these data pages: > > 1. Ask SNP firmware to pre-validate it -- SNP firmware provides an > special command that can be used to pre-validate the pages without > affecting the measurement. This means the two pflash chips, right? > > 2. Enhance the reset vector code to validate the pages. > > For now I choose #1. > > The pre-validation performed by the SNP firmware is sufficient to boot > through the SEC phase. The SEC phase later decompress the Fv to a new > memory location. Now we need the OVMF to take over the validation > procedure. The series extends the MemEncryptSevLib to add a new helper > MemEncryptSevSnpValidateRam(). The helper is used to validate the system > RAM. See patch #12. SEC phase calls the MemEncryptSevSnpValidateRam() to > validate the output buffer used for the decompression. This was > sufficient to boot into the PEI phase, see patch #13. Two questions here: - Is ACPI S3 in scope for now? - We need more areas made accessible in SEC than just the decompression buffer; for example the temporary SEC/PEI heap and stack, and (IIRC) various special SEV-ES areas laid out via MEMFD. Where are all of those handled / enumerated? > The PEI detects > all the available system RAM. After the memory detection is completed > the PlatformPei calls the AmdSevSnpInitialize(). The initialization > routine iterate through the HOB and calls the > MemEncryptSevSnpValidateRam() to validate all the system RAM. Is it > possible the more system ram can be detected after the PlatformPei is > completed ? That would cause problems even without SEV-SNP (i.e., with plain SEV), so I'm not worried about it. > > One of the important thing is we should *never* validate the pages > twice. What are the symptoms / consequences of: - the guest accessing an unvalidated page (I understand it causes a #VC, but what is the direct result of that, when this series is applied?), - doubly-validating a page? The first question is relevant because we should crash as soon as we touch a page we forgot to validate (we shouldn't let any corruption or similar spread out until we finally realize there's a problem). The second question is relevant for security I guess. What attacks become possible, and/or what symptoms are produced, if we doubly-validate a page? Furthermore, IIRC, we have separate #VC handlers for the different firmware phases; do they behave consistently / identicall when a #VC(page-not-validated) occurs, when this patch set is applied? My first question is basically asking whether we can *exclusively* rely on #VC(page-not-validated) faults to tell us that we missed validating a particular page. If we can do that, then the job is a bit easier, because from the GPA, we can more or less also derive *when and where* we should pre-validate the page (at least until validation is done completely lazily). > The MemEncryptSevSnpValidateRam() uses a interval search tree to > keep the record of what has been validated. Before validating the range, > it lookup in its tree and if it finds that range is already validated > then do nothing. If it detects an overlap then it will validate only non > overlapping regions -- see patch #14. What data structure is used for implementing the interval tree? I'm not necessarily looking for a data structure with "nice" asymptotic time complexity. With pre-validation especially, I think simplicity (ease of review) is more important for the data structure than performance. If it's not an actual "tree", we shouldn't call it a "tree". (An "interval tree" is usually an extension of a Red-Black Tree, and that's not the simplest data structure in existence; although edk2 does offer an rbtree library.) Furthermore, what you describe above is called idempotency. No matter how many times we attempt to validate a range, it may (or may not even) cause an actual change in the first action only. Is this property (=idempotency) an inherent requirement of the technology, or is it a simplification of the implementation? Put differently: if you called CpuDeadLoop() in the validation function any time an overlapping validation request were received, would that hugely complicate the call sites? I'm kind of "obsessing" about idempotency because you say we must *never* doubly-validate a page, so the *difference* between: - explicitly crashing on such an attempt, - and silently ignoring such an attempt, may be meaningful. It's kind of the difference between "oops this is a call site *bug*, but we patched it up", vs. "this is expected of call sites, we should just handle it internally". > The patch #18 extend the MemEncrypt{Set,Clear}PageEncMask() to call the > SNP page state change during the C-bit toggle. - When exactly do we invalidate? - Does the time and place of invalidation depend on whether we perform pre-validation, or lazy validation? - Is page invalidation idempotent too? - What is the consequence (security impact) if we forget invalidation? - There are four page states I can imagine: - encrypted for host access, valid for guest access - encrypted for host access, invalid for guest access - decrypted for host access, valid for guest access - decrypted for host access, invalid for guest access Does a state exist, from these four, that's never used? (Potentially caught by the hardware?) Do the patches highlight / explain the validity transitions, in comments? My understanding is that the C-bit toggle and the guest access valid/invalid toggle are separate actions, so "middle" states do exist, but perhaps only temporarily. I'm curious how it works, for example, with variousvirtio transfers (bus master read, bus master write, bus master common buffer). In the IoMmuDxe driver minimally, we access memory both through PTEs with the C-bit set and through PTEs with the C-bit clear, meaning that "encrypted, valid", and "decrypted, valid" are both required states. But that seems to conflict with the notion that "C-bit toggle" be directly coupled with a "validity toggle". Put differently, I'm happy that modifying IoMmuDxe appears unnecessary, but then that tells me I'm missing something about the state transitions between the above *four* states. > > Please let me know if you have any questions. We can hash out the design > here before you taking a closure look at the code. Sorry that I've been (and am being) slow to start reviewing this series. Thanks, Laszlo -=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#73901): https://edk2.groups.io/g/devel/message/73901 Mute This Topic: https://groups.io/mt/81584576/1787277 Group Owner: devel+owner@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/unsub [importer@patchew.org] -=-=-=-=-=-=-=-=-=-=-=-
On 4/9/21 7:24 AM, Laszlo Ersek wrote: > On 04/08/21 13:59, Brijesh Singh wrote: >> On 4/8/21 4:58 AM, Laszlo Ersek wrote: >>> On 03/24/21 16:31, Brijesh Singh wrote: >>>> At this time we only support the pre-validation. OVMF detects all the available >>>> system RAM in the PEI phase. When SEV-SNP is enabled, the memory is validated >>>> before it is made available to the EDK2 core. >>> Can you describe this in a bit more detail, before I look at the >>> individual patches? Specifically, what existing logic in the PEI phase >>> was taken, and extended, and how? >> One of the key requirement is that the guest private pages much be >> validated before the access. If guest tries to access the pages before >> the validation then it will result in #VC (page-not-validated) >> exception. To avoid the #VC, we propose the validating the memory before >> the access. We will incrementally add the support to lazy validate (i.e >> validate on access). > What is the primary threat (in simple terms please) that validation is > supposed to prevent? To protect against the memory re-mapping attack the guest pages must be validated. The idea is that every guest page can map only to a single physical memory page at one time. > And, against that particular threat, does pre-validation offer some > protection, or will that only come with lazy validation? For the hardware it does not matter how the memory was validated -- lazy vs prevalidate. Both approaches use the PVALIDATE instruction to validate the page. In the case of pre-validation, the memory is validated before the access. Whereas in the lazy validation, the access will cause a fault and fault handler should validate the page and retry the access. Its similar to a page fault handler, OS can populate the page table or back the pages on demand. The only downside of pre-validation is, we will take a hit on the boot time. The GHCB spec provides method by which we can batch multiple requests at once to minimize the context switches. > >> Let me try explaining a bit, the page validation process consist of two >> steps: >> >> 1. Add the pages in the RMP table -- must be done by the hypervisor >> using the RMPUPDATE instruction. The guest can use VMGEXIT NAEs to ask >> hypervisor to add or remove pages from the RMP table. >> >> 2. Guest issue the PVALIDATE instruction -- this sets the validate bit >> in the RMP table. >> >> Similar to SEV, the OVMF_CODE.fd is encrypted through the SNP firmware >> before the launch. The SNP firmware also validates the memory page after >> encrypting. This allows us to boot the initial entry code without guest >> going through the validation process. >> >> The OVMF reset vector uses few data pages (e.g page table, early Sec >> stack). Access to these data pages will result in #VC. There are two >> approaches we can take to validate these data pages: >> >> 1. Ask SNP firmware to pre-validate it -- SNP firmware provides an >> special command that can be used to pre-validate the pages without >> affecting the measurement. > This means the two pflash chips, right? This does not need to be two pflash chips. The SNP firmware command does not know anything about the ROM or pflash chip. The command accepts the system physical address that need to be validated by the firmware. In patch #2, OVMF provides a range of data pages that need to be validated by the SNP firmware before booting the guest. > >> 2. Enhance the reset vector code to validate the pages. >> >> For now I choose #1. >> >> The pre-validation performed by the SNP firmware is sufficient to boot >> through the SEC phase. The SEC phase later decompress the Fv to a new >> memory location. Now we need the OVMF to take over the validation >> procedure. The series extends the MemEncryptSevLib to add a new helper >> MemEncryptSevSnpValidateRam(). The helper is used to validate the system >> RAM. See patch #12. SEC phase calls the MemEncryptSevSnpValidateRam() to >> validate the output buffer used for the decompression. This was >> sufficient to boot into the PEI phase, see patch #13. > Two questions here: > > - Is ACPI S3 in scope for now? Its not in the scope yet. I have not looked at it. > > - We need more areas made accessible in SEC than just the decompression > buffer; for example the temporary SEC/PEI heap and stack, and (IIRC) > various special SEV-ES areas laid out via MEMFD. Where are all of those > handled / enumerated? Sorry, I simplified my response by saying just decompression. You are right that its more than the decompression buffer. In my current patch, the SEC phase validates all the memory up to PcdOvmfDecompressionScratchEnd (which includes more than just output buffer). See patch #13. > >> The PEI detects >> all the available system RAM. After the memory detection is completed >> the PlatformPei calls the AmdSevSnpInitialize(). The initialization >> routine iterate through the HOB and calls the >> MemEncryptSevSnpValidateRam() to validate all the system RAM. Is it >> possible the more system ram can be detected after the PlatformPei is >> completed ? > That would cause problems even without SEV-SNP (i.e., with plain SEV), > so I'm not worried about it. > >> One of the important thing is we should *never* validate the pages >> twice. > What are the symptoms / consequences of: > > - the guest accessing an unvalidated page (I understand it causes a #VC, > but what is the direct result of that, when this series is applied?), After the series it applied, an access to an unvalidated page will result in #VC. The series does not extend the VC handler to handle the page-not-validated error code. The current VC handler [InternalVmgExitHandleVc()] will inject a #GP to terminate the guest if #VC is raised for unvalidated page. > > - doubly-validating a page? > > The first question is relevant because we should crash as soon as we > touch a page we forgot to validate (we shouldn't let any corruption or > similar spread out until we finally realize there's a problem). > > The second question is relevant for security I guess. What attacks > become possible, and/or what symptoms are produced, if we > doubly-validate a page? Assuming that guest validates its memory, this guarantees the injective mapping between GPAs and SPAs. As I noted that the page validation process consist of two steps #1 RMPUPDATE and 2#PVALIDATE. The RMPUPDATE is a hypervisor instruction and PVALIDATE is a guest instruction. A malicious hypervisor can remap gpa to different physical address in RMP table using the RMPUPDATE just before the second PVALIDATE instruction is executed. The guest need to ensure that it does not leave the security vulnerability that can be exploited by a malicious hypervisor. The SNP white paper recommends that a guest OS should keep track of what is has validate so that it can avoid getting into these situation. > > Furthermore, IIRC, we have separate #VC handlers for the different > firmware phases; do they behave consistently / identicall when a > #VC(page-not-validated) occurs, when this patch set is applied? > > My first question is basically asking whether we can *exclusively* rely > on #VC(page-not-validated) faults to tell us that we missed validating a > particular page. If we can do that, then the job is a bit easier, > because from the GPA, we can more or less also derive *when and where* > we should pre-validate the page (at least until validation is done > completely lazily). Yes, we should be able to rely #VC to tell us that we missed validating the page. In my this series so far I have not seen a #VC due to page-not-validated because we have carefully validated the pages before they are accessed. I am thinking that #VC(page-not-validated) should be treated as an error in the first phase. Incrementally we will add the lazy validation, in which the #VC(page-not-validated) will provide a hint as to what has not been validated. > >> The MemEncryptSevSnpValidateRam() uses a interval search tree to >> keep the record of what has been validated. Before validating the range, >> it lookup in its tree and if it finds that range is already validated >> then do nothing. If it detects an overlap then it will validate only non >> overlapping regions -- see patch #14. > What data structure is used for implementing the interval tree? > > I'm not necessarily looking for a data structure with "nice" asymptotic > time complexity. With pre-validation especially, I think simplicity > (ease of review) is more important for the data structure than > performance. If it's not an actual "tree", we shouldn't call it a > "tree". (An "interval tree" is usually an extension of a Red-Black Tree, > and that's not the simplest data structure in existence; although edk2 > does offer an rbtree library.) I am using binary search tree. Currently, we don't have many nodes to track. Most of the guest memory is private and are validated before we exit from the PEI phase. > Furthermore, what you describe above is called idempotency. No matter > how many times we attempt to validate a range, it may (or may not even) > cause an actual change in the first action only. Is this property > (=idempotency) an inherent requirement of the technology, or is it a > simplification of the implementation? Put differently: if you called > CpuDeadLoop() in the validation function any time an overlapping > validation request were received, would that hugely complicate the call > sites? From the technology point of view you can validate the same page again and again. Hardware does not force any restriction. To protect against time-based remap attack a guest also need to ensure that it does not blindly validates the pages otherwise guest is taking risk. Avoiding a double validation is strong recommendation. In current implementation, the validation burden is not put on the caller. Caller calls standard APIs to toggle the encryption attribute. Under the hood, if we see that page is already validated then no need to issue the PVALIDATE. If the page is getting changed from C=1 -> 0 then unvalidate it. > I'm kind of "obsessing" about idempotency because you say we must > *never* doubly-validate a page, so the *difference* between: > - explicitly crashing on such an attempt, > - and silently ignoring such an attempt, > may be meaningful. I said we should never doubly validate to avoid creating a vulnerability that can later be exploited by the malicious hypervisor. Currently we lookup address in our tree to ensure that we don't do a double validation. The PVALIDATE instruction also can be used to determine if we are attempting to validate an already validated page. In current implementation after the PVALIDATE completion, I check the rflags.CF to ensure that its not a double validation condition. If its double validation then abort the guest. Its bug in the guest. > It's kind of the difference between "oops this is a call site *bug*, but > we patched it up", vs. "this is expected of call sites, we should just > handle it internally". Our attempt should be to patch to avoid the double validation. I would still check the status of PVALIDATE, if instruction detects it as a double validation then we have a bug in our tracking and should abort the guest. > >> The patch #18 extend the MemEncrypt{Set,Clear}PageEncMask() to call the >> SNP page state change during the C-bit toggle. > - When exactly do we invalidate? Basically before the page is transition from C=1 -> 0 in the page table. > > - Does the time and place of invalidation depend on whether we perform > pre-validation, or lazy validation? It does not matter. Both type of validation update the Validated bit in the RMP table. The important thing is that invalidation can happen only for a already validated private page. If the page is shared then invalidation will result in #GP. > > - Is page invalidation idempotent too? We should treat this as idempotent too. As I said, tacking the page state is strongly recommended for the security reason. The SNP white paper has a section dedicated for the validation and may able to clarify things which is not covered in my response. > > - What is the consequence (security impact) if we forget invalidation? I can't think of a security implication of it but if we forget to invalidate then it can lead issues in the tracking data structure when it comes to validation next time. e.g a page was mapped as C=1 and we changed it to C=0 without invalidating it. Now if the same page is being changed from C=0 -> 1 then tracking data structure may think that page is already validated and will skip the validation process and thus access will result in #VC (page-not-validated). > > - There are four page states I can imagine: > - encrypted for host access, valid for guest access > - encrypted for host access, invalid for guest access > - decrypted for host access, valid for guest access > - decrypted for host access, invalid for guest access > > Does a state exist, from these four, that's never used? (Potentially > caught by the hardware?) I would not mix the page state with the host access. Basically there is two page state exist - Guest-Valid - Guest-Invalid By default all the guest memory is in Guest-Invalid state on boot. The PVALIDATE instruction is used to transition from Valid->Invalid. Guest can use the PVALIDATE to change the state from Valid to Invalid. A guest private page (i.e page mapped C=1) must be in Guest Valid state. If hardware see that a private access page is not in the Guest-Valid state then it will trigger #VC. > > Do the patches highlight / explain the validity transitions, in > comments? My understanding is that the C-bit toggle and the guest access > valid/invalid toggle are separate actions, so "middle" states do exist, > but perhaps only temporarily. I have tried to document the steps in the patch description and I can add more comments as required. When SEV-SNP is active then we need to invalidate the page before toggling the C-bit from C=1 -> 0 and similarly after the page is toggled from C=0 -> 1 we must validate it. > I'm curious how it works, for example, with variousvirtio transfers (bus > master read, bus master write, bus master common buffer). In the > IoMmuDxe driver minimally, we access memory both through PTEs with the > C-bit set and through PTEs with the C-bit clear, meaning that > "encrypted, valid", and "decrypted, valid" are both required states. But > that seems to conflict with the notion that "C-bit toggle" be directly > coupled with a "validity toggle". That should not be an issue. Each page have entry in the RMP table. So, you can have one page as guest invalid and another page as a guest valid. > Put differently, I'm happy that modifying IoMmuDxe appears unnecessary, > but then that tells me I'm missing something about the state transitions > between the above *four* states. Only thing which I would propose changing in IoMmuDxe driver is to use a pre-allocated buffer and avoid toggling the C-bit on every read/write. In past the C-bit toggle was all contained inside the guest. But when SNP is enabled, each C-bit toggle causes two additional steps #1 RMPUPDATE and #2 PVALIDATE. The RMPUPDATE is requested using the VMEXIT so it will cause a VM context switch. I would prefer to avoid it. My current patch set does not have this optimization yet. > >> Please let me know if you have any questions. We can hash out the design >> here before you taking a closure look at the code. > Sorry that I've been (and am being) slow to start reviewing this series. No worries, take your time. Thank you for all your feedback. -Brijesh > > Thanks, > Laszlo > -=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#73918): https://edk2.groups.io/g/devel/message/73918 Mute This Topic: https://groups.io/mt/81584576/1787277 Group Owner: devel+owner@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/unsub [importer@patchew.org] -=-=-=-=-=-=-=-=-=-=-=-
On 04/10/21 00:43, Brijesh Singh wrote: > > On 4/9/21 7:24 AM, Laszlo Ersek wrote: >> On 04/08/21 13:59, Brijesh Singh wrote: >>> On 4/8/21 4:58 AM, Laszlo Ersek wrote: >>>> On 03/24/21 16:31, Brijesh Singh wrote: >>>>> At this time we only support the pre-validation. OVMF detects all >>>>> the available system RAM in the PEI phase. When SEV-SNP is >>>>> enabled, the memory is validated before it is made available to >>>>> the EDK2 core. >>>> Can you describe this in a bit more detail, before I look at the >>>> individual patches? Specifically, what existing logic in the PEI >>>> phase was taken, and extended, and how? >>> One of the key requirement is that the guest private pages much be >>> validated before the access. If guest tries to access the pages >>> before the validation then it will result in #VC >>> (page-not-validated) exception. To avoid the #VC, we propose the >>> validating the memory before the access. We will incrementally add >>> the support to lazy validate (i.e validate on access). >> What is the primary threat (in simple terms please) that validation >> is supposed to prevent? > > > To protect against the memory re-mapping attack the guest pages must > be validated. The idea is that every guest page can map only to a > single physical memory page at one time. I don't understand. How can a given guest page frame map to multiple host page frames? Do you mean that the situation to prevent is when multiple guest page frames (GPAs) map to the same host page frame, as set up by the hypervisor? >> And, against that particular threat, does pre-validation offer some >> protection, or will that only come with lazy validation? > > For the hardware it does not matter how the memory was validated -- > lazy vs prevalidate. Both approaches use the PVALIDATE instruction to > validate the page. > > In the case of pre-validation, the memory is validated before the > access. Whereas in the lazy validation, the access will cause a fault > and fault handler should validate the page and retry the access. Its > similar to a page fault handler, OS can populate the page table or > back the pages on demand. > > The only downside of pre-validation is, we will take a hit on the boot > time. The GHCB spec provides method by which we can batch multiple > requests at once to minimize the context switches. If pre-validation is simpler, and the only drawback is a one-time hit during boot, then wouldn't it be better to stick with pre-validation forever? I expect that would be a lot simpler for (a) the #VC handlers, (b) the tracking of the "already validated" information. >>> Similar to SEV, the OVMF_CODE.fd is encrypted through the SNP >>> firmware before the launch. The SNP firmware also validates the >>> memory page after encrypting. This allows us to boot the initial >>> entry code without guest going through the validation process. >>> >>> The OVMF reset vector uses few data pages (e.g page table, early Sec >>> stack). Access to these data pages will result in #VC. There are two >>> approaches we can take to validate these data pages: >>> >>> 1. Ask SNP firmware to pre-validate it -- SNP firmware provides an >>> special command that can be used to pre-validate the pages without >>> affecting the measurement. >> This means the two pflash chips, right? > > > This does not need to be two pflash chips. The SNP firmware command > does not know anything about the ROM or pflash chip. The command > accepts the system physical address that need to be validated by the > firmware. In patch #2, OVMF provides a range of data pages that need > to be validated by the SNP firmware before booting the guest. I guess my question related to the guest code that executes from pflash, and/or accesses data from pflash, before the guest itself could ask for any validation. Such as, the reset vector (which runs from pflash). Then, some non-volatile UEFI variable data that resides in the other pflash chip, and is accessed (read-only) during the PEI phase (the memory type information variable namely). I understood your comments as QEMU pre-validating those areas before guest launch. Is that correct? Anyway, I guess at this point I should just start reviewing the series. Some more comments below. >> - We need more areas made accessible in SEC than just the >> decompression buffer; for example the temporary SEC/PEI heap and >> stack, and (IIRC) various special SEV-ES areas laid out via MEMFD. >> Where are all of those handled / enumerated? > > > Sorry, I simplified my response by saying just decompression. You are > right that its more than the decompression buffer. In my current > patch, the SEC phase validates all the memory up to > PcdOvmfDecompressionScratchEnd (which includes more than just output > buffer). See patch #13. That sounds good (will check the patch out of course). >>> One of the important thing is we should *never* validate the pages >>> twice. >> What are the symptoms / consequences of: >> >> - the guest accessing an unvalidated page (I understand it causes a >> #VC, but what is the direct result of that, when this series is >> applied?), > > > After the series it applied, an access to an unvalidated page will > result in #VC. The series does not extend the VC handler to handle the > page-not-validated error code. The current VC handler > [InternalVmgExitHandleVc()] will inject a #GP to terminate the guest > if #VC is raised for unvalidated page. OK. >> - doubly-validating a page? >> >> The first question is relevant because we should crash as soon as we >> touch a page we forgot to validate (we shouldn't let any corruption >> or similar spread out until we finally realize there's a problem). >> >> The second question is relevant for security I guess. What attacks >> become possible, and/or what symptoms are produced, if we >> doubly-validate a page? > > Assuming that guest validates its memory, this guarantees the > injective mapping between GPAs and SPAs. > > As I noted that the page validation process consist of two steps #1 > RMPUPDATE and 2#PVALIDATE. The RMPUPDATE is a hypervisor instruction > and PVALIDATE is a guest instruction. A malicious hypervisor can remap > gpa to different physical address in RMP table using the RMPUPDATE > just before the second PVALIDATE instruction is executed. The guest > need to ensure that it does not leave the security vulnerability that > can be exploited by a malicious hypervisor. The SNP white paper > recommends that a guest OS should keep track of what is has validate > so that it can avoid getting into these situation. Hm, OK. I think this answers my top-most question. >> Furthermore, IIRC, we have separate #VC handlers for the different >> firmware phases; do they behave consistently / identicall when a >> #VC(page-not-validated) occurs, when this patch set is applied? >> >> My first question is basically asking whether we can *exclusively* >> rely on #VC(page-not-validated) faults to tell us that we missed >> validating a particular page. If we can do that, then the job is a >> bit easier, because from the GPA, we can more or less also derive >> *when and where* we should pre-validate the page (at least until >> validation is done completely lazily). > > Yes, we should be able to rely #VC to tell us that we missed > validating the page. In my this series so far I have not seen a #VC > due to page-not-validated because we have carefully validated the > pages before they are accessed. I am thinking that > #VC(page-not-validated) should be treated as an error in the first > phase. Incrementally we will add the lazy validation, in which the > #VC(page-not-validated) will provide a hint as to what has not been > validated. I'm OK with the current handling of #VC(page-not-validated) -- i.e., fatal error. I'm not yet convinced that lazy validation will be useful at all, but that could be just my lack of understanding. >>> The MemEncryptSevSnpValidateRam() uses a interval search tree to >>> keep the record of what has been validated. Before validating the >>> range, it lookup in its tree and if it finds that range is already >>> validated then do nothing. If it detects an overlap then it will >>> validate only non overlapping regions -- see patch #14. hat data >>> structure is used for implementing the interval tree? >> >> I'm not necessarily looking for a data structure with "nice" >> asymptotic time complexity. With pre-validation especially, I think >> simplicity (ease of review) is more important for the data structure >> than performance. If it's not an actual "tree", we shouldn't call it >> a "tree". (An "interval tree" is usually an extension of a Red-Black >> Tree, and that's not the simplest data structure in existence; >> although edk2 does offer an rbtree library.) > > > I am using binary search tree. Currently, we don't have many nodes to > track. Most of the guest memory is private and are validated before we > exit from the PEI phase. My preliminary comments about MemEncryptSevSnpValidateSystemRam(): (1) The function prototype (patch#12) promises that the function can never fail. In patch#14 however, I see some AddRangeToIntervalTree() calls whose return values are not checked. I think error checking should be strict, and if an error is detected (such as memory allocation failure), a plain ASSERT (FALSE) is not sufficient. Both ASSERT() and CpuDeadLoop() should be used explicitly, for the sake of RELEASE builds. And the function prototype (the leading comment) should highlight that the function either succeeds, or doesn't return. (2) I don't like the recursive nature of the functions. That's always convenient first, and almost always a mess to get out of later. However, this is not a request to complicate the tree's implementation, because... (3) ... you mention we don't have many nodes to track. Is a tree structure, with dynamically allocated nodes, justified at all? (4) Regarding repeating dynamic allocation -- that's never a great thing to do in the SEC / PEI phases. Can we use a fixed size, once-allocated, or even static, array somewhere? In particular, patch#14 uses AllocatePool() for SNP_VALIDATED_RANGE, in the PEI phase, that results in EFI_HOB_TYPE_MEMORY_POOL HOBs; see: AllocatePool() [MdePkg/Library/PeiMemoryAllocationLib/MemoryAllocationLib.c] PeiServicesAllocatePool() [MdePkg/Library/PeiServicesLib/PeiServicesLib.c] PeiAllocatePool() [MdeModulePkg/Core/Pei/Memory/MemoryServices.c] Now, dependent on where you call this AllocatePool() from, it could occur still on the temporary SEC/PEI RAM -- i.e., before permanent PEI RAM is installed. That means that such HOBs would be subject to the HOB list migration, after permanent PEI RAM is installed. The PI spec writes about EFI_HOB_MEMORY_POOL: "The HOB consumer phase should be able to ignore these HOBs". The PI spec writes about AllocatePool(): "The early allocations from temporary memory will be migrated to permanent memory when permanent main memory is installed; this migration shall occur when the HOB list is migrated to permanent memory". So, whenever we use AllocatePool() in the PEI phase, we need an "address stability proof", namely that the AllocatePool() call, and any later access to the allocated area (the HOB), are never separated by the installation of "gEfiPeiMemoryDiscoveredPpiGuid" in the PPI database, by the PEI Core. If we can't guarantee that, then an access later on could occur to the original allocation address, which has by then been invalidated by the temporary RAM migration (= the allocation HOB has been moved). AllocatePages() is "stable" however, in this sense; no proof like the one described above is necessary. (5) The "mRootNode" variable is defined in the file "OvmfPkg/Library/BaseMemEncryptSevLib/X64/PeiSnpSystemRamValidate.c" (patch #14). That means that every single module (i.e., PEIM) which consumes this library instance will own a separate "mRootNode" variable, with a validation-tracking tree that's correspondingly private to that PEIM. But the validation state is global to the entire firmware. I think we should somehow clarify, or even enforce, that the tracking data structure is firmware-global. Perhaps this can be solved by (a) moving the ownership of the data structure to a particular firmware module, with sole responsibility for pre-validation, and (b) updating the validation APIs to take this data structure explicitly. Alternatively, if multiple modules are expected to inter-operate on the tracker data structure (by which I mean inter-operation *in sequence*, not concurrently in a multiprocessing sense), then we might need a dynamic PCD for inter-driver understanding. (NB: both points (4) and (5) seem to conflict with *existent* code in BaseMemEncryptSevLib. However: - concerning (4), the current allocations are all performed with AllocateAlignedPages(), which are unaffected by temporary-to-permanent RAM migration, - and concerning (5), namely the existent "mPageTablePool" global variable, is a *known bug* -- see commit b721aa749b86a and TianoCore#847. This issue (namely, multiple modules messing with the page tables independently) remains unsolved, and I'd like to avoid introducing *more* paging-related allocations that are module-owned, and not firmware-level.) All in all, here's what I'd prefer (and please correct me if it's unrealistic): - make the SNP validation functions take a simple, *unresizeable* tracker structure for input/output, - allocate that data structure with a single AllocatePages() call in the proper PEIM, - if the SNP validation functions run out of space in the structure, hang hard, - avoid recursion, and use a simple linear search through the array, - evaluate whether graceful handling of overlaps (= range splitting, merging) is actually a requirement. See more on this below. (6) Can you please confirm that the order in which MemEncryptSevSnpValidateSystemRam()'s declaration, call sites, and implementations are introduced, is sensible? I can't tell immediately, but I'd like to be sure that the tree at least builds at every stage (no linkage errors at any stage). >> Furthermore, what you describe above is called idempotency. No matter >> how many times we attempt to validate a range, it may (or may not >> even) cause an actual change in the first action only. Is this >> property (=idempotency) an inherent requirement of the technology, or >> is it a simplification of the implementation? Put differently: if you >> called CpuDeadLoop() in the validation function any time an >> overlapping validation request were received, would that hugely >> complicate the call sites? > > From the technology point of view you can validate the same page again > and again. Hardware does not force any restriction. To protect against > time-based remap attack a guest also need to ensure that it does not > blindly validates the pages otherwise guest is taking risk. Avoiding a > double validation is strong recommendation. In current implementation, > the validation burden is not put on the caller. Caller calls standard > APIs to toggle the encryption attribute. Under the hood, if we see > that page is already validated then no need to issue the PVALIDATE. If > the page is getting changed from C=1 -> 0 then unvalidate it. There remains a huge gap in my understanding here. (The rest of my earlier questions did concern the relationship between the C bit and validation, but I might as well comment right at this spot.) "Invalidation in connection to C-bit toggling" means that there's going to be significant "validation churn" in the DXE phase. For example when fw_cfg is used (which relies on the edk2 IOMMU protocol), or when virtio transfers are performed (which also relies on the edk2 IOMMU protocol, although less directly -- through PciIo, namely). That, however, brings up several problems for me: - Is the tracker data structure scalable? I don't see any balancing in it. (And I wouldn't want a new, complex data structure for it anyway.) - In fact, I don't see any code related to invalidation -- more precisely, I don't see where and how the C-bit changes consult the tracker structure *at all*. - Given issue (5), I don't see how the tracker structure would be inherited from the PEI phase to the DXE phase. In particular, in patch#18, the flipping of the C-bit results in calls to SetPageStateInternal(). But SetPageStateInternal() does not go *through* the tracker structure. Even in patch#14, SevSnpValidateSystemRam() calls SetPageStateInternal() *in addition to* AddRangeToIntervalTree(). In other words, I don't see where validity is tracked in the DXE phase, in response to the C-bit toggling. >> I'm kind of "obsessing" about idempotency because you say we must >> *never* doubly-validate a page, so the *difference* between: >> - explicitly crashing on such an attempt, >> - and silently ignoring such an attempt, >> may be meaningful. > > > I said we should never doubly validate to avoid creating a > vulnerability that can later be exploited by the malicious hypervisor. > Currently we lookup address in our tree to ensure that we don't do a > double validation. The PVALIDATE instruction also can be used to > determine if we are attempting to validate an already validated page. > In current implementation after the PVALIDATE completion, I check the > rflags.CF to ensure that its not a double validation condition. If its > double validation then abort the guest. Its bug in the guest. > >> It's kind of the difference between "oops this is a call site *bug*, >> but we patched it up", vs. "this is expected of call sites, we should >> just handle it internally". > > Our attempt should be to patch to avoid the double validation. I would > still check the status of PVALIDATE, if instruction detects it as a > double validation then we have a bug in our tracking and should abort > the guest. Let me rephrase. My question concerns the *abstraction level* at which we should catch and handle double-validation attempts. If we silently split and merge ranges in the tracker structure, then the guest (as a whole) will not double-validate, which is good. However, double-validation attempts will potentially still *exist*, in higher-level modules in the firmware. My question is whether it is *right* to paper over such overlaps, coming from the high-level code, in the tracker structure. I suspect that it's not right. In other words, the spot where we should abort, due to "double-validation detected", is not when the PVALIDATE instruction reports a problem in the tracker structure. Instead, the spot where we should abort is when we detect an overlapping request *before* issuing PVALIDATE. I'd like the tracker to be as simple and dumb as possible, and I'd like the actual sites that request validation to blow up immediately, if they issue an overlapping request. Is my expectation wrong? >>> The patch #18 extend the MemEncrypt{Set,Clear}PageEncMask() to call >>> the SNP page state change during the C-bit toggle. >> - When exactly do we invalidate? > > Basically before the page is transition from C=1 -> 0 in the page > table. > >> >> - Does the time and place of invalidation depend on whether we >> perform pre-validation, or lazy validation? > It does not matter. Both type of validation update the Validated bit > in the RMP table. The important thing is that invalidation can happen > only for a already validated private page. If the page is shared then > invalidation will result in #GP. >> >> - Is page invalidation idempotent too? > We should treat this as idempotent too. As I said, tacking the page > state is strongly recommended for the security reason. The SNP white > paper has a section dedicated for the validation and may able to > clarify things which is not covered in my response. >> >> - What is the consequence (security impact) if we forget >> invalidation? > I can't think of a security implication of it but if we forget to > invalidate then it can lead issues in the tracking data structure when > it comes to validation next time. e.g a page was mapped as C=1 and we > changed it to C=0 without invalidating it. Now if the same page is > being changed from C=0 -> 1 then tracking data structure may think > that page is already validated and will skip the validation process > and thus access will result in #VC (page-not-validated). As I said above, I don't see where the validation tracking and the C-bit flipping "meet". Especially across the PEI and DXE phases. Basically I'm missing the core concept for maintaining a firmware-global structure, for tracking the validation status of GPA intervals. Up to and *excluding* the DXE phase, the validation tracking should be trivial, naive, and unforgiving. This is because, in the PEI phase, validation is one-shot. I would expect the tracker structure to live in the PlatformPei module. Starting with the DXE phase, where we need (in)validation to closely accompany C-bit toggling, we need a different, performant data structure for tracking validation status. This data structure should be primed from the status last set up in the PEI phase. The tracking should still be unforgiving. I believe *this* tracker structure may have to live in the IOMMU driver. BaseMemEncryptSevLib may offer helper functions, but even if it does, it should never attempt to *own* tracking information, via an internal global variable. Furthermore, those helper functions that only make sense for a particular firmware phase, should have such implementations that hang hard, in library instances that match the *other* firmware phases. For example, SevSnpValidateSystemRam() should have a DXE-specific implementation that hangs hard. It's entirely possible that my points above are complete garbage. I don't have a grasp on the core concept, for the firmware-wide tracking of the validation status. ... Ugh, wait. I've just noticed something. In patch#19, "SEC_SEV_ES_WORK_AREA.SnpSystemRamValidatedRootAddress" is introduced, which seems (?) to be implementing the kind of global (inter-driver, inter-phase) tracking that I've been missing. However, even if my guess (?) is correct about that, this idea definitely doesn't belong in the last patch, presented as a "bugfix". Even if it were a bugfix, it should be incorporated into one of the earlier patches (so that we preferably not introduce the bug in the first place). However, my argument is that it's not a bugfix -- to me that seems to be the core concept. That's what the whole series needs to be built upon. (In all of my thinking, before this particular paragraph, I *never* looked at patch#19. There was no reason to.) >> - There are four page states I can imagine: >> - encrypted for host access, valid for guest access >> - encrypted for host access, invalid for guest access >> - decrypted for host access, valid for guest access >> - decrypted for host access, invalid for guest access >> >> Does a state exist, from these four, that's never used? (Potentially >> caught by the hardware?) > > I would not mix the page state with the host access. Basically there > is two page state exist > > - Guest-Valid > > - Guest-Invalid > > By default all the guest memory is in Guest-Invalid state on boot. The > PVALIDATE instruction is used to transition from Valid->Invalid. Do you mean "Invalid->Valid" transition? > Guest can use the PVALIDATE to change the state from Valid to Invalid. > > A guest private page (i.e page mapped C=1) must be in Guest Valid > state. If hardware see that a private access page is not in the > Guest-Valid state then it will trigger #VC. > > >> >> Do the patches highlight / explain the validity transitions, in >> comments? My understanding is that the C-bit toggle and the guest >> access valid/invalid toggle are separate actions, so "middle" states >> do exist, but perhaps only temporarily. > > I have tried to document the steps in the patch description and I can > add more comments as required. When SEV-SNP is active then we need to > invalidate the page before toggling the C-bit from C=1 -> 0 and > similarly after the page is toggled from C=0 -> 1 we must validate it. > > >> I'm curious how it works, for example, with variousvirtio transfers >> (bus master read, bus master write, bus master common buffer). In the >> IoMmuDxe driver minimally, we access memory both through PTEs with >> the C-bit set and through PTEs with the C-bit clear, meaning that >> "encrypted, valid", and "decrypted, valid" are both required states. >> But that seems to conflict with the notion that "C-bit toggle" be >> directly coupled with a "validity toggle". > > That should not be an issue. Each page have entry in the RMP table. > So, you can have one page as guest invalid and another page as a guest > valid. > >> Put differently, I'm happy that modifying IoMmuDxe appears >> unnecessary, but then that tells me I'm missing something about the >> state transitions between the above *four* states. > > > Only thing which I would propose changing in IoMmuDxe driver is to use > a pre-allocated buffer and avoid toggling the C-bit on every > read/write. In past the C-bit toggle was all contained inside the > guest. But when SNP is enabled, each C-bit toggle causes two > additional steps #1 RMPUPDATE and #2 PVALIDATE. The RMPUPDATE is > requested using the VMEXIT so it will cause a VM context switch. I > would prefer to avoid it. > > My current patch set does not have this optimization yet. Please let's stick with the absolute minimum in this series, yes. I realize my email is complete chaos. That's because my understanding of this work is complete chaos. Can we approach this feature as follows: - Make "validation tracking" the core tenet (central principle) of this feature. - Design the validation *patterns* that are required in each firmware phase. - Identify the *sole* owner module, for validation tracking, in each firmware phase. If we can't readily do this, we might have to introduce new drivers (PEIMs, DXE drivers?) and new interfaces (PPIs, protocols?) - Design the hand-off between the owner modules (on firmware phase boundaries). - Add only helper functions to BaseMemEncryptSevLib, without any ownership of tracking info. The helpers should be as restricted as possible. I'd like to avoid "smearing" page validation responsibility over a bunch of modules. (See again: TianoCore#847.) I'd like responsibilities strictly centralized. If you feel that my points make no sense, I can start going over the individual patches. However, even that way, I'd have found patch#19 only in the end, and that doesn't seem right. And I wouldn't like to go "numb" on the actual code, being distracted by style issues and so on, before we have some agreement on the core approach. Another tip: if you can add content to MdePkg that is governed by AMD specifications, such as patches #3, #5, #9, I would recommend splitting that off to a separate (pre-requisite) series. That content seems unaffected by whatever we do in OvmfPkg, and it would help decrease the size of the OvmfPkg series. It's not like we're going to abandon the SEV-SNP feature, so I don't think there's a risk in merging the MdePkg series first, while the OvmfPkg series is not ready. The MdePkg stuff will not remain unused indefinitely. Thanks, Laszlo -=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#73966): https://edk2.groups.io/g/devel/message/73966 Mute This Topic: https://groups.io/mt/81584576/1787277 Group Owner: devel+owner@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/unsub [importer@patchew.org] -=-=-=-=-=-=-=-=-=-=-=-
On 4/12/21 11:23 AM, Laszlo Ersek wrote: > On 04/10/21 00:43, Brijesh Singh wrote: >> On 4/9/21 7:24 AM, Laszlo Ersek wrote: >>> On 04/08/21 13:59, Brijesh Singh wrote: >>>> On 4/8/21 4:58 AM, Laszlo Ersek wrote: >>>>> On 03/24/21 16:31, Brijesh Singh wrote: >>>>>> At this time we only support the pre-validation. OVMF detects all >>>>>> the available system RAM in the PEI phase. When SEV-SNP is >>>>>> enabled, the memory is validated before it is made available to >>>>>> the EDK2 core. >>>>> Can you describe this in a bit more detail, before I look at the >>>>> individual patches? Specifically, what existing logic in the PEI >>>>> phase was taken, and extended, and how? >>>> One of the key requirement is that the guest private pages much be >>>> validated before the access. If guest tries to access the pages >>>> before the validation then it will result in #VC >>>> (page-not-validated) exception. To avoid the #VC, we propose the >>>> validating the memory before the access. We will incrementally add >>>> the support to lazy validate (i.e validate on access). >>> What is the primary threat (in simple terms please) that validation >>> is supposed to prevent? >> >> To protect against the memory re-mapping attack the guest pages must >> be validated. The idea is that every guest page can map only to a >> single physical memory page at one time. > I don't understand. How can a given guest page frame map to multiple > host page frames? > > Do you mean that the situation to prevent is when multiple guest page > frames (GPAs) map to the same host page frame, as set up by the > hypervisor? > A malicious hypervisor may attempt to remap validated gpa to a different host frame. The PVALIDATE is designed to protect against those type of attacks. See https://static.sched.com/hosted_files/lsseu2019/65/SEV-SNP%20Slides%20Nov%201%202019.pdf (slide 13). You can also find more information about it in the SEV-SNP whitepaper. >>> And, against that particular threat, does pre-validation offer some >>> protection, or will that only come with lazy validation? >> For the hardware it does not matter how the memory was validated -- >> lazy vs prevalidate. Both approaches use the PVALIDATE instruction to >> validate the page. >> >> In the case of pre-validation, the memory is validated before the >> access. Whereas in the lazy validation, the access will cause a fault >> and fault handler should validate the page and retry the access. Its >> similar to a page fault handler, OS can populate the page table or >> back the pages on demand. >> >> The only downside of pre-validation is, we will take a hit on the boot >> time. The GHCB spec provides method by which we can batch multiple >> requests at once to minimize the context switches. > If pre-validation is simpler, and the only drawback is a one-time hit > during boot, then wouldn't it be better to stick with pre-validation > forever? I expect that would be a lot simpler for (a) the #VC handlers, > (b) the tracking of the "already validated" information. This could be an issue for the larger VMs. The boot delay will vary based on the VM size. In addition to the boot delay, the PVALIDATE instruction requires that there is a valid entry for the page in the nested page table. If the entry does not exist in the NPT then HV will get #NPF and will fill the NPT with the backing host page. By using the lazy-validation we can push allocation of the backing pages to only when required and thus release the memory pressure on the VMM. > > >>>> Similar to SEV, the OVMF_CODE.fd is encrypted through the SNP >>>> firmware before the launch. The SNP firmware also validates the >>>> memory page after encrypting. This allows us to boot the initial >>>> entry code without guest going through the validation process. >>>> >>>> The OVMF reset vector uses few data pages (e.g page table, early Sec >>>> stack). Access to these data pages will result in #VC. There are two >>>> approaches we can take to validate these data pages: >>>> >>>> 1. Ask SNP firmware to pre-validate it -- SNP firmware provides an >>>> special command that can be used to pre-validate the pages without >>>> affecting the measurement. >>> This means the two pflash chips, right? >> >> This does not need to be two pflash chips. The SNP firmware command >> does not know anything about the ROM or pflash chip. The command >> accepts the system physical address that need to be validated by the >> firmware. In patch #2, OVMF provides a range of data pages that need >> to be validated by the SNP firmware before booting the guest. > I guess my question related to the guest code that executes from pflash, > and/or accesses data from pflash, before the guest itself could ask for > any validation. Such as, the reset vector (which runs from pflash). > Then, some non-volatile UEFI variable data that resides in the other > pflash chip, and is accessed (read-only) during the PEI phase (the > memory type information variable namely). I understood your comments as > QEMU pre-validating those areas before guest launch. Is that correct? Yes that is correct. All the OVMF pflash0 memory will be pre-validated before the guest is launched. IIRC, the variables which resides on the other pflash chip are accessed unencrypted in the guest. Only the encrypted memory access (aka C=1) need to be validated. The host MMIO regions does not need to be validated. > > Anyway, I guess at this point I should just start reviewing the series. > > Some more comments below. > > >>> - We need more areas made accessible in SEC than just the >>> decompression buffer; for example the temporary SEC/PEI heap and >>> stack, and (IIRC) various special SEV-ES areas laid out via MEMFD. >>> Where are all of those handled / enumerated? >> >> Sorry, I simplified my response by saying just decompression. You are >> right that its more than the decompression buffer. In my current >> patch, the SEC phase validates all the memory up to >> PcdOvmfDecompressionScratchEnd (which includes more than just output >> buffer). See patch #13. > That sounds good (will check the patch out of course). > > >>>> One of the important thing is we should *never* validate the pages >>>> twice. >>> What are the symptoms / consequences of: >>> >>> - the guest accessing an unvalidated page (I understand it causes a >>> #VC, but what is the direct result of that, when this series is >>> applied?), >> >> After the series it applied, an access to an unvalidated page will >> result in #VC. The series does not extend the VC handler to handle the >> page-not-validated error code. The current VC handler >> [InternalVmgExitHandleVc()] will inject a #GP to terminate the guest >> if #VC is raised for unvalidated page. > OK. > > >>> - doubly-validating a page? >>> >>> The first question is relevant because we should crash as soon as we >>> touch a page we forgot to validate (we shouldn't let any corruption >>> or similar spread out until we finally realize there's a problem). >>> >>> The second question is relevant for security I guess. What attacks >>> become possible, and/or what symptoms are produced, if we >>> doubly-validate a page? >> Assuming that guest validates its memory, this guarantees the >> injective mapping between GPAs and SPAs. >> >> As I noted that the page validation process consist of two steps #1 >> RMPUPDATE and 2#PVALIDATE. The RMPUPDATE is a hypervisor instruction >> and PVALIDATE is a guest instruction. A malicious hypervisor can remap >> gpa to different physical address in RMP table using the RMPUPDATE >> just before the second PVALIDATE instruction is executed. The guest >> need to ensure that it does not leave the security vulnerability that >> can be exploited by a malicious hypervisor. The SNP white paper >> recommends that a guest OS should keep track of what is has validate >> so that it can avoid getting into these situation. > Hm, OK. I think this answers my top-most question. > > >>> Furthermore, IIRC, we have separate #VC handlers for the different >>> firmware phases; do they behave consistently / identicall when a >>> #VC(page-not-validated) occurs, when this patch set is applied? >>> >>> My first question is basically asking whether we can *exclusively* >>> rely on #VC(page-not-validated) faults to tell us that we missed >>> validating a particular page. If we can do that, then the job is a >>> bit easier, because from the GPA, we can more or less also derive >>> *when and where* we should pre-validate the page (at least until >>> validation is done completely lazily). >> Yes, we should be able to rely #VC to tell us that we missed >> validating the page. In my this series so far I have not seen a #VC >> due to page-not-validated because we have carefully validated the >> pages before they are accessed. I am thinking that >> #VC(page-not-validated) should be treated as an error in the first >> phase. Incrementally we will add the lazy validation, in which the >> #VC(page-not-validated) will provide a hint as to what has not been >> validated. > I'm OK with the current handling of #VC(page-not-validated) -- i.e., > fatal error. > > I'm not yet convinced that lazy validation will be useful at all, but > that could be just my lack of understanding. See if my above comment makes sense on why we may need the lazy validation. IMO, we should not be going with the lazy validation in OVMF. To minimize the boot delay and avoid the complexity of the code in the best case OVMF validated lower 4GB memory and mark rest of the memory as "unaccepted" in the UEFI map. David from google has started a thread about lazy validation in kernel https://www.spinics.net/lists/linux-mm/msg243954.html. We would like to find a method which works for both Intel TDX and AMD SEV. We will build these support incrementally both in the kernel and OVMF. > > >>>> The MemEncryptSevSnpValidateRam() uses a interval search tree to >>>> keep the record of what has been validated. Before validating the >>>> range, it lookup in its tree and if it finds that range is already >>>> validated then do nothing. If it detects an overlap then it will >>>> validate only non overlapping regions -- see patch #14. hat data >>>> structure is used for implementing the interval tree? >>> I'm not necessarily looking for a data structure with "nice" >>> asymptotic time complexity. With pre-validation especially, I think >>> simplicity (ease of review) is more important for the data structure >>> than performance. If it's not an actual "tree", we shouldn't call it >>> a "tree". (An "interval tree" is usually an extension of a Red-Black >>> Tree, and that's not the simplest data structure in existence; >>> although edk2 does offer an rbtree library.) >> >> I am using binary search tree. Currently, we don't have many nodes to >> track. Most of the guest memory is private and are validated before we >> exit from the PEI phase. > My preliminary comments about MemEncryptSevSnpValidateSystemRam(): > > (1) The function prototype (patch#12) promises that the function can > never fail. In patch#14 however, I see some AddRangeToIntervalTree() > calls whose return values are not checked. I think error checking should > be strict, and if an error is detected (such as memory allocation > failure), a plain ASSERT (FALSE) is not sufficient. Both ASSERT() and > CpuDeadLoop() should be used explicitly, for the sake of RELEASE builds. > And the function prototype (the leading comment) should highlight that > the function either succeeds, or doesn't return. > > > (2) I don't like the recursive nature of the functions. That's always > convenient first, and almost always a mess to get out of later. However, > this is not a request to complicate the tree's implementation, > because... > > > (3) ... you mention we don't have many nodes to track. Is a tree > structure, with dynamically allocated nodes, justified at all? > > > (4) Regarding repeating dynamic allocation -- that's never a great thing > to do in the SEC / PEI phases. Can we use a fixed size, once-allocated, > or even static, array somewhere? > > In particular, patch#14 uses AllocatePool() for SNP_VALIDATED_RANGE, in > the PEI phase, that results in EFI_HOB_TYPE_MEMORY_POOL HOBs; see: > > AllocatePool() [MdePkg/Library/PeiMemoryAllocationLib/MemoryAllocationLib.c] > PeiServicesAllocatePool() [MdePkg/Library/PeiServicesLib/PeiServicesLib.c] > PeiAllocatePool() [MdeModulePkg/Core/Pei/Memory/MemoryServices.c] > > Now, dependent on where you call this AllocatePool() from, it could > occur still on the temporary SEC/PEI RAM -- i.e., before permanent PEI > RAM is installed. That means that such HOBs would be subject to the HOB > list migration, after permanent PEI RAM is installed. The PI spec writes > about EFI_HOB_MEMORY_POOL: "The HOB consumer phase should be able to > ignore these HOBs". The PI spec writes about AllocatePool(): "The early > allocations from temporary memory will be migrated to permanent memory > when permanent main memory is installed; this migration shall occur when > the HOB list is migrated to permanent memory". > > So, whenever we use AllocatePool() in the PEI phase, we need an "address > stability proof", namely that the AllocatePool() call, and any later > access to the allocated area (the HOB), are never separated by the > installation of "gEfiPeiMemoryDiscoveredPpiGuid" in the PPI database, by > the PEI Core. If we can't guarantee that, then an access later on could > occur to the original allocation address, which has by then been > invalidated by the temporary RAM migration (= the allocation HOB has > been moved). > > AllocatePages() is "stable" however, in this sense; no proof like the > one described above is necessary. > > > (5) The "mRootNode" variable is defined in the file > "OvmfPkg/Library/BaseMemEncryptSevLib/X64/PeiSnpSystemRamValidate.c" > (patch #14). That means that every single module (i.e., PEIM) which > consumes this library instance will own a separate "mRootNode" variable, > with a validation-tracking tree that's correspondingly private to that > PEIM. > > But the validation state is global to the entire firmware. I think we > should somehow clarify, or even enforce, that the tracking data > structure is firmware-global. > > Perhaps this can be solved by (a) moving the ownership of the data > structure to a particular firmware module, with sole responsibility for > pre-validation, and (b) updating the validation APIs to take this data > structure explicitly. > > Alternatively, if multiple modules are expected to inter-operate on the > tracker data structure (by which I mean inter-operation *in sequence*, > not concurrently in a multiprocessing sense), then we might need a > dynamic PCD for inter-driver understanding. > > > (NB: both points (4) and (5) seem to conflict with *existent* code in > BaseMemEncryptSevLib. However: > > - concerning (4), the current allocations are all performed with > AllocateAlignedPages(), which are unaffected by temporary-to-permanent > RAM migration, > > - and concerning (5), namely the existent "mPageTablePool" global > variable, is a *known bug* -- see commit b721aa749b86a and > TianoCore#847. This issue (namely, multiple modules messing with the > page tables independently) remains unsolved, and I'd like to avoid > introducing *more* paging-related allocations that are module-owned, > and not firmware-level.) > > > All in all, here's what I'd prefer (and please correct me if it's > unrealistic): > > - make the SNP validation functions take a simple, *unresizeable* > tracker structure for input/output, > > - allocate that data structure with a single AllocatePages() call in the > proper PEIM, > > - if the SNP validation functions run out of space in the structure, > hang hard, > > - avoid recursion, and use a simple linear search through the array, > > - evaluate whether graceful handling of overlaps (= range splitting, > merging) is actually a requirement. See more on this below. Yes this will work fine. The primary reason for me adding the interval tree was to detect the overlap condition before we finalize the pre-validation in PEI. The validation flow works like this: - Qemu validated few pages - SEC validates few more page - When we detect the memory in the PEI we need to exclude the pages validated by the Qemu and SEC. The page ranges validated using the Qemu and SEC are accessible through a fixed PCDs. I can do simple overlap check with those fixed PCDs and skip the pre-validated ranges. I don't have any strong reason to use the interval tree to detect the overlap condition. There is no call to toggle the C-bit before we finalize the validation. IMO, there is no need for tracking the page state after we finish the pre-validation. After the pre-validtion is completed, the caller will use either clear the C-bit or set the C-bit. If it attempts to set the C-bit without clearing it first then its a bug -- which will be caught by the page state change internal function. We can use the output of CF flag from the PVALIDATE to determine where caller is trying to do doubly validation or invalidation and abort it. > > (6) Can you please confirm that the order in which > MemEncryptSevSnpValidateSystemRam()'s declaration, call sites, and > implementations are introduced, is sensible? I can't tell immediately, > but I'd like to be sure that the tree at least builds at every stage (no > linkage errors at any stage). I can certainly say the order in which they are introduced and the call sites are a sensible. I don't expect any linkage failure but since it was an RFC so I didn't go through making sure that it builds in every stage. I wanted to get the overall direction on where community would like to go before making the final changes. You have been asking some very good question and that is certainly helping us to reduce some of the complexity in the patch. > >>> Furthermore, what you describe above is called idempotency. No matter >>> how many times we attempt to validate a range, it may (or may not >>> even) cause an actual change in the first action only. Is this >>> property (=idempotency) an inherent requirement of the technology, or >>> is it a simplification of the implementation? Put differently: if you >>> called CpuDeadLoop() in the validation function any time an >>> overlapping validation request were received, would that hugely >>> complicate the call sites? >> From the technology point of view you can validate the same page again >> and again. Hardware does not force any restriction. To protect against >> time-based remap attack a guest also need to ensure that it does not >> blindly validates the pages otherwise guest is taking risk. Avoiding a >> double validation is strong recommendation. In current implementation, >> the validation burden is not put on the caller. Caller calls standard >> APIs to toggle the encryption attribute. Under the hood, if we see >> that page is already validated then no need to issue the PVALIDATE. If >> the page is getting changed from C=1 -> 0 then unvalidate it. > There remains a huge gap in my understanding here. (The rest of my > earlier questions did concern the relationship between the C bit and > validation, but I might as well comment right at this spot.) > > "Invalidation in connection to C-bit toggling" means that there's going > to be significant "validation churn" in the DXE phase. For example when > fw_cfg is used (which relies on the edk2 IOMMU protocol), or when virtio > transfers are performed (which also relies on the edk2 IOMMU protocol, > although less directly -- through PciIo, namely). > > That, however, brings up several problems for me: > > - Is the tracker data structure scalable? I don't see any balancing in > it. (And I wouldn't want a new, complex data structure for it anyway.) > > - In fact, I don't see any code related to invalidation -- more > precisely, I don't see where and how the C-bit changes consult the > tracker structure *at all*. > > - Given issue (5), I don't see how the tracker structure would be > inherited from the PEI phase to the DXE phase. > > In particular, in patch#18, the flipping of the C-bit results in calls > to SetPageStateInternal(). But SetPageStateInternal() does not go > *through* the tracker structure. Even in patch#14, > SevSnpValidateSystemRam() calls SetPageStateInternal() *in addition to* > AddRangeToIntervalTree(). > > In other words, I don't see where validity is tracked in the DXE phase, > in response to the C-bit toggling. As I highlighted above there is actually no need to track the page state after we successfully complete the pre-validation. All these guest page validation or invalidation are applicable to the system RAM. But AmdSevDxe driver calls to clear the C-bit of the MMIO range. These range are not a system and have never been validated so we invalidating it will cause an issue. Since I had the data structures available in PEI phase for the tracking the page state hence made those available to DXE to verify that we are called to invalidate the SYSTEM_RAM and not MMIO. IMO, we should either extend the MemEncryptSev{Set,Clear}PageEncMask() to pass either a new flag to hint where its system RAM or non-RAM. Thoughts ? > >>> I'm kind of "obsessing" about idempotency because you say we must >>> *never* doubly-validate a page, so the *difference* between: >>> - explicitly crashing on such an attempt, >>> - and silently ignoring such an attempt, >>> may be meaningful. >> >> I said we should never doubly validate to avoid creating a >> vulnerability that can later be exploited by the malicious hypervisor. >> Currently we lookup address in our tree to ensure that we don't do a >> double validation. The PVALIDATE instruction also can be used to >> determine if we are attempting to validate an already validated page. >> In current implementation after the PVALIDATE completion, I check the >> rflags.CF to ensure that its not a double validation condition. If its >> double validation then abort the guest. Its bug in the guest. >> >>> It's kind of the difference between "oops this is a call site *bug*, >>> but we patched it up", vs. "this is expected of call sites, we should >>> just handle it internally". >> Our attempt should be to patch to avoid the double validation. I would >> still check the status of PVALIDATE, if instruction detects it as a >> double validation then we have a bug in our tracking and should abort >> the guest. > Let me rephrase. My question concerns the *abstraction level* at which > we should catch and handle double-validation attempts. > > If we silently split and merge ranges in the tracker structure, then the > guest (as a whole) will not double-validate, which is good. However, > double-validation attempts will potentially still *exist*, in > higher-level modules in the firmware. My question is whether it is > *right* to paper over such overlaps, coming from the high-level code, in > the tracker structure. I suspect that it's not right. Yes we should bail out if such a request comes from the high level module. This is why I was actually not checking the tracking structure when toggling the C-bit. The complete memory much have been validated before we are asked to toggle the C-bit. If a module is performing a doubly validation or invalidation then it should be aborted to avoid any future exploits. > > In other words, the spot where we should abort, due to > "double-validation detected", is not when the PVALIDATE instruction > reports a problem in the tracker structure. Instead, the spot where we > should abort is when we detect an overlapping request *before* issuing > PVALIDATE. I'd like the tracker to be as simple and dumb as possible, > and I'd like the actual sites that request validation to blow up > immediately, if they issue an overlapping request. Is my expectation > wrong? > If we go with the tracking only until we complete the pre-validation then we can use the hardware PVALIDATE help to detect the doubly validation and abort it. There is no need to check the overlap conditions. >>>> The patch #18 extend the MemEncrypt{Set,Clear}PageEncMask() to call >>>> the SNP page state change during the C-bit toggle. >>> - When exactly do we invalidate? >> Basically before the page is transition from C=1 -> 0 in the page >> table. >> >>> - Does the time and place of invalidation depend on whether we >>> perform pre-validation, or lazy validation? >> It does not matter. Both type of validation update the Validated bit >> in the RMP table. The important thing is that invalidation can happen >> only for a already validated private page. If the page is shared then >> invalidation will result in #GP. >>> - Is page invalidation idempotent too? >> We should treat this as idempotent too. As I said, tacking the page >> state is strongly recommended for the security reason. The SNP white >> paper has a section dedicated for the validation and may able to >> clarify things which is not covered in my response. >>> - What is the consequence (security impact) if we forget >>> invalidation? >> I can't think of a security implication of it but if we forget to >> invalidate then it can lead issues in the tracking data structure when >> it comes to validation next time. e.g a page was mapped as C=1 and we >> changed it to C=0 without invalidating it. Now if the same page is >> being changed from C=0 -> 1 then tracking data structure may think >> that page is already validated and will skip the validation process >> and thus access will result in #VC (page-not-validated). > As I said above, I don't see where the validation tracking and the C-bit > flipping "meet". Especially across the PEI and DXE phases. > > Basically I'm missing the core concept for maintaining a firmware-global > structure, for tracking the validation status of GPA intervals. > > Up to and *excluding* the DXE phase, the validation tracking should be > trivial, naive, and unforgiving. This is because, in the PEI phase, > validation is one-shot. I would expect the tracker structure to live in > the PlatformPei module. > > Starting with the DXE phase, where we need (in)validation to closely > accompany C-bit toggling, we need a different, performant data structure > for tracking validation status. This data structure should be primed > from the status last set up in the PEI phase. The tracking should still > be unforgiving. I believe *this* tracker structure may have to live in > the IOMMU driver. > > BaseMemEncryptSevLib may offer helper functions, but even if it does, it > should never attempt to *own* tracking information, via an internal > global variable. Furthermore, those helper functions that only make > sense for a particular firmware phase, should have such implementations > that hang hard, in library instances that match the *other* firmware > phases. For example, SevSnpValidateSystemRam() should have a > DXE-specific implementation that hangs hard. > > It's entirely possible that my points above are complete garbage. I > don't have a grasp on the core concept, for the firmware-wide tracking > of the validation status. > > ... Ugh, wait. I've just noticed something. In patch#19, > "SEC_SEV_ES_WORK_AREA.SnpSystemRamValidatedRootAddress" is introduced, > which seems (?) to be implementing the kind of global (inter-driver, > inter-phase) tracking that I've been missing. However, even if my guess > (?) is correct about that, this idea definitely doesn't belong in the > last patch, presented as a "bugfix". Even if it were a bugfix, it should > be incorporated into one of the earlier patches (so that we preferably > not introduce the bug in the first place). However, my argument is that > it's not a bugfix -- to me that seems to be the core concept. That's > what the whole series needs to be built upon. (In all of my thinking, > before this particular paragraph, I *never* looked at patch#19. There > was no reason to.) It was my bad that I should have highlighted this before and not left it to the last patch. As you can see the only reason pass the data structure from PEI to DXE is to catch those non SYSTEM invalidation request coming from the AmdSev Driver. I am too happy about the solution I have in this RFC. I want to go with option to change the call sites to tell us whether its a system RAM or non existence memory type. > >>> - There are four page states I can imagine: >>> - encrypted for host access, valid for guest access >>> - encrypted for host access, invalid for guest access >>> - decrypted for host access, valid for guest access >>> - decrypted for host access, invalid for guest access >>> >>> Does a state exist, from these four, that's never used? (Potentially >>> caught by the hardware?) >> I would not mix the page state with the host access. Basically there >> is two page state exist >> >> - Guest-Valid >> >> - Guest-Invalid >> >> By default all the guest memory is in Guest-Invalid state on boot. The >> PVALIDATE instruction is used to transition from Valid->Invalid. > Do you mean "Invalid->Valid" transition? Ah yes that is correct. > > >> Guest can use the PVALIDATE to change the state from Valid to Invalid. >> >> A guest private page (i.e page mapped C=1) must be in Guest Valid >> state. If hardware see that a private access page is not in the >> Guest-Valid state then it will trigger #VC. >> >> >>> Do the patches highlight / explain the validity transitions, in >>> comments? My understanding is that the C-bit toggle and the guest >>> access valid/invalid toggle are separate actions, so "middle" states >>> do exist, but perhaps only temporarily. >> I have tried to document the steps in the patch description and I can >> add more comments as required. When SEV-SNP is active then we need to >> invalidate the page before toggling the C-bit from C=1 -> 0 and >> similarly after the page is toggled from C=0 -> 1 we must validate it. >> >> >>> I'm curious how it works, for example, with variousvirtio transfers >>> (bus master read, bus master write, bus master common buffer). In the >>> IoMmuDxe driver minimally, we access memory both through PTEs with >>> the C-bit set and through PTEs with the C-bit clear, meaning that >>> "encrypted, valid", and "decrypted, valid" are both required states. >>> But that seems to conflict with the notion that "C-bit toggle" be >>> directly coupled with a "validity toggle". >> That should not be an issue. Each page have entry in the RMP table. >> So, you can have one page as guest invalid and another page as a guest >> valid. >> >>> Put differently, I'm happy that modifying IoMmuDxe appears >>> unnecessary, but then that tells me I'm missing something about the >>> state transitions between the above *four* states. >> >> Only thing which I would propose changing in IoMmuDxe driver is to use >> a pre-allocated buffer and avoid toggling the C-bit on every >> read/write. In past the C-bit toggle was all contained inside the >> guest. But when SNP is enabled, each C-bit toggle causes two >> additional steps #1 RMPUPDATE and #2 PVALIDATE. The RMPUPDATE is >> requested using the VMEXIT so it will cause a VM context switch. I >> would prefer to avoid it. >> >> My current patch set does not have this optimization yet. > Please let's stick with the absolute minimum in this series, yes. > > I realize my email is complete chaos. That's because my understanding of > this work is complete chaos. > > Can we approach this feature as follows: > > - Make "validation tracking" the core tenet (central principle) of this > feature. > > - Design the validation *patterns* that are required in each firmware > phase. > > - Identify the *sole* owner module, for validation tracking, in each > firmware phase. If we can't readily do this, we might have to > introduce new drivers (PEIMs, DXE drivers?) and new interfaces (PPIs, > protocols?) > > - Design the hand-off between the owner modules (on firmware phase > boundaries). > > - Add only helper functions to BaseMemEncryptSevLib, without any > ownership of tracking info. The helpers should be as restricted as > possible. > > I'd like to avoid "smearing" page validation responsibility over a bunch > of modules. (See again: TianoCore#847.) I'd like responsibilities > strictly centralized. > > If you feel that my points make no sense, I can start going over the > individual patches. However, even that way, I'd have found patch#19 only > in the end, and that doesn't seem right. And I wouldn't like to go > "numb" on the actual code, being distracted by style issues and so on, > before we have some agreement on the core approach. Certainly your points makes sense. Let me know what you think about removing the tracking data structure and using a liner array of pre-valiated range (build with a Fixed PCD) works ? I was hoping you to just glance over it and provide me high level feedback. I will go through styles and comments in great details on non RFC patches. I still have some TODO items (e.g Secrets, CPUID etc) before we take off the RFC tag. Thank you so much for your detail feedback. > > Another tip: if you can add content to MdePkg that is governed by AMD > specifications, such as patches #3, #5, #9, I would recommend splitting > that off to a separate (pre-requisite) series. That content seems > unaffected by whatever we do in OvmfPkg, and it would help decrease the > size of the OvmfPkg series. It's not like we're going to abandon the > SEV-SNP feature, so I don't think there's a risk in merging the MdePkg > series first, while the OvmfPkg series is not ready. The MdePkg stuff > will not remain unused indefinitely. Ah, sure I can do that. -Brijsh -=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#73985): https://edk2.groups.io/g/devel/message/73985 Mute This Topic: https://groups.io/mt/81584576/1787277 Group Owner: devel+owner@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/unsub [importer@patchew.org] -=-=-=-=-=-=-=-=-=-=-=-
On 04/12/21 22:14, Brijesh Singh wrote: > > On 4/12/21 11:23 AM, Laszlo Ersek wrote: >> On 04/10/21 00:43, Brijesh Singh wrote: >>> On 4/9/21 7:24 AM, Laszlo Ersek wrote: >>>> What is the primary threat (in simple terms please) that validation >>>> is supposed to prevent? >>> >>> To protect against the memory re-mapping attack the guest pages must >>> be validated. The idea is that every guest page can map only to a >>> single physical memory page at one time. >> I don't understand. How can a given guest page frame map to multiple >> host page frames? >> >> Do you mean that the situation to prevent is when multiple guest page >> frames (GPAs) map to the same host page frame, as set up by the >> hypervisor? >> > A malicious hypervisor may attempt to remap validated gpa to a > different host frame. The PVALIDATE is designed to protect against > those type of attacks. See > https://static.sched.com/hosted_files/lsseu2019/65/SEV-SNP%20Slides%20Nov%201%202019.pdf > (slide 13). You can also find more information about it in the SEV-SNP > whitepaper. Thanks for the link. Page 7 confirms my understanding: """ Memory Aliasing Example attack: Map two guest pages to same DRAM page """ (On the other hand, slide 12 seems to be buggy, "Page validation guarantees that each GPA only maps to one valid SPA at any given time" -- the GPA:SPA relationship is n:1, so any given GPA could never ever map to multiple SPAs, regardless of SEV-SNP. The multiplicity is possible in the other direction (a single SPA is exposed as multiple GPAs, maybe in a single guest, maybe in multiple guests), and that's what SEV-SNP is supposed to prevent. The GPA->SPA mapping must be injective, but that's not what the words on the slide actually express.) The page remapping slide (13) does not necessarily break injectivity, and even if the hypervisor does that, the same GPA *still* corresponds to a single SPA, at this new ("one") time. It's just that the correspondence is not what the guest has approved. Expressing this correctly would require a temporal logic formula of sorts. Anyway... at least now I (believe I) understand that these are two separate attacks (aliasing vs. remapping). >> If pre-validation is simpler, and the only drawback is a one-time hit >> during boot, then wouldn't it be better to stick with pre-validation >> forever? I expect that would be a lot simpler for (a) the #VC >> handlers, (b) the tracking of the "already validated" information. > > This could be an issue for the larger VMs. The boot delay will vary > based on the VM size. In addition to the boot delay, the PVALIDATE > instruction requires that there is a valid entry for the page in the > nested page table. If the entry does not exist in the NPT then HV will > get #NPF and will fill the NPT with the backing host page. By using > the lazy-validation we can push allocation of the backing pages to > only when required and thus release the memory pressure on the VMM. Ugh... OK, I guess. I'd need to digest this a lot more, for a fully informed agreement, but I'm happy with this answer (and my rudimentary understanding thereof) for now. >> I guess my question related to the guest code that executes from >> pflash, and/or accesses data from pflash, before the guest itself >> could ask for any validation. Such as, the reset vector (which runs >> from pflash). Then, some non-volatile UEFI variable data that resides >> in the other pflash chip, and is accessed (read-only) during the PEI >> phase (the memory type information variable namely). I understood >> your comments as QEMU pre-validating those areas before guest launch. >> Is that correct? > > > Yes that is correct. All the OVMF pflash0 memory will be pre-validated > before the guest is launched. IIRC, the variables which resides on the > other pflash chip are accessed unencrypted in the guest. Only the > encrypted memory access (aka C=1) need to be validated. The host MMIO > regions does not need to be validated. Good point. Thanks for the explanation / reminder! > See if my above comment makes sense on why we may need the lazy > validation. IMO, we should not be going with the lazy validation in > OVMF. To minimize the boot delay and avoid the complexity of the code > in the best case OVMF validated lower 4GB memory and mark rest of the > memory as "unaccepted" in the UEFI map. I'm going to violently support any idea that keeps things simple for OVMF -- is this "lower 4GB" what the current patch set does already? Or is it the proposed next step? > David from google has started a thread about lazy validation in kernel > https://www.spinics.net/lists/linux-mm/msg243954.html. We would like > to find a method which works for both Intel TDX and AMD SEV. We will > build these support incrementally both in the kernel and OVMF. Ah, OK. That explains it. Next step then. Hmm, yes, patch#15 seems to pre-validate everything now. OK. > Yes this will work fine. The primary reason for me adding the interval > tree was to detect the overlap condition before we finalize the > pre-validation in PEI. The validation flow works like this: > > - Qemu validated few pages > > - SEC validates few more page > > - When we detect the memory in the PEI we need to exclude the pages > validated by the Qemu and SEC. > > The page ranges validated using the Qemu and SEC are accessible > through a fixed PCDs. I can do simple overlap check with those fixed > PCDs and skip the pre-validated ranges. I don't have any strong reason > to use the interval tree to detect the overlap condition. There is no > call to toggle the C-bit before we finalize the validation. Thank you. This is a *huge* relief. > IMO, there is no need for tracking the page state after we finish the > pre-validation. After the pre-validtion is completed, the caller will > use either clear the C-bit or set the C-bit. If it attempts to set the > C-bit without clearing it first then its a bug -- which will be caught > by the page state change internal function. We can use the output of > CF flag from the PVALIDATE to determine where caller is trying to do > doubly validation or invalidation and abort it. Sounds good. Even in case the final PVALIDATE action proves incorrect, if we catch that immediately via CF, and refuse to do anything else (i.e. we won't go ahead and use the page(s) in question), things should be OK. >> (6) Can you please confirm that the order in which >> MemEncryptSevSnpValidateSystemRam()'s declaration, call sites, and >> implementations are introduced, is sensible? I can't tell >> immediately, but I'd like to be sure that the tree at least builds at >> every stage (no linkage errors at any stage). > > I can certainly say the order in which they are introduced and the > call sites are a sensible. I don't expect any linkage failure but > since it was an RFC so I didn't go through making sure that it builds > in every stage. I wanted to get the overall direction on where > community would like to go before making the final changes. You have > been asking some very good question and that is certainly helping us > to reduce some of the complexity in the patch. Thanks. I've been very stressed as to whether my questions or "proposals" make any sense. And yes, it's a desirable trait for a patch set to build at every stage (bisectability). > As I highlighted above there is actually no need to track the page > state after we successfully complete the pre-validation. Again, I welcome this very much. > All these guest page > validation or invalidation are applicable to the system RAM. But > AmdSevDxe driver calls to clear the C-bit of the MMIO range. These > range are not a system and have never been validated so we > invalidating it will cause an issue. Understood. > Since I had the data structures available in PEI > phase for the tracking the page state hence made those available to > DXE to verify that we are called to invalidate the SYSTEM_RAM and not > MMIO. IMO, we should either extend the > MemEncryptSev{Set,Clear}PageEncMask() to pass either a new flag to > hint where its system RAM or non-RAM. Thoughts ? I think I'd prefer a new function in the lib class, one that's dedicated to clearing the C bit on MMIO without attempting invalidation. It's a special-case function, and the dedicated name helps with two things: - no need to mess with existing call sites except in AmdSevDxe, - the new function is easier to grep for. BTW, there would be at least one other approach for this, I think -- the PEI instance of the library could consult the HOB list, and the DXE instance of the library could consult the GCD memory space map, to decide whether the page being C-bit-flipped is MMIO or not. But that's a lot of complexity (and likely some performance hit too), when in fact we know at the call sites whether the area being encrypted/decrypted is MMIO or RAM. Note also that (IIUC) in AmdSevDxe, we only *decrypt*, so we only need *one* new function, "MemEncryptSevClearMmioPageEncMask" or similar. BTW, in the "MemEncryptSevLib.h" header, the documentation of the MemEncryptSev{Set,Clear}PageEncMask function declarations should be updated -- the leading comments should explain that, in case SEV-SNP is found active, then page (in)validation will occur too (as appropriate). > Yes we should bail out if such a request comes from the high level > module. This is why I was actually not checking the tracking structure > when toggling the C-bit. The complete memory much have been validated > before we are asked to toggle the C-bit. If a module is performing a > doubly validation or invalidation then it should be aborted to avoid > any future exploits. Great, thank you. > If we go with the tracking only until we complete the pre-validation > then we can use the hardware PVALIDATE help to detect the doubly > validation and abort it. There is no need to check the overlap > conditions. Again, great. In the pre-validation step, where you explicitly carve out the PCD-described ranges that have been handled already by QEMU and earlier phases of the firmware, you can still check the PVALIDATE result (CF), just to be sure. > Certainly your points makes sense. Let me know what you think about > removing the tracking data structure and using a liner array of > pre-valiated range (build with a Fixed PCD) works ? I couldn't have hoped for such a great resolution to this conundrum. Yes, please do that. > I was hoping you to just glance over it and provide me high level > feedback. I will go through styles and comments in great details on > non RFC patches. I still have some TODO items (e.g Secrets, CPUID etc) > before we take off the RFC tag. Thank you so much for your detail > feedback. Well, this is the first email on the SEV-SNP topic where what I feel can be described as "relief" :) Thank you! Laszlo -=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#74041): https://edk2.groups.io/g/devel/message/74041 Mute This Topic: https://groups.io/mt/81584576/1787277 Group Owner: devel+owner@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/unsub [importer@patchew.org] -=-=-=-=-=-=-=-=-=-=-=-
>> Since I had the data structures available in PEI >> phase for the tracking the page state hence made those available to >> DXE to verify that we are called to invalidate the SYSTEM_RAM and not >> MMIO. IMO, we should either extend the >> MemEncryptSev{Set,Clear}PageEncMask() to pass either a new flag to >> hint where its system RAM or non-RAM. Thoughts ? > I think I'd prefer a new function in the lib class, one that's dedicated > to clearing the C bit on MMIO without attempting invalidation. It's a > special-case function, and the dedicated name helps with two things: > > - no need to mess with existing call sites except in AmdSevDxe, > - the new function is easier to grep for. > > BTW, there would be at least one other approach for this, I think -- the > PEI instance of the library could consult the HOB list, and the DXE > instance of the library could consult the GCD memory space map, to > decide whether the page being C-bit-flipped is MMIO or not. But that's a > lot of complexity (and likely some performance hit too), when in fact we > know at the call sites whether the area being encrypted/decrypted is > MMIO or RAM. Sound good to me, will introduce new function for the MMIO in v2. > Note also that (IIUC) in AmdSevDxe, we only *decrypt*, so we only need > *one* new function, "MemEncryptSevClearMmioPageEncMask" or similar. > > BTW, in the "MemEncryptSevLib.h" header, the documentation of the > MemEncryptSev{Set,Clear}PageEncMask function declarations should be > updated -- the leading comments should explain that, in case SEV-SNP is > found active, then page (in)validation will occur too (as appropriate). Noted. -=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#74094): https://edk2.groups.io/g/devel/message/74094 Mute This Topic: https://groups.io/mt/81584576/1787277 Group Owner: devel+owner@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/unsub [importer@patchew.org] -=-=-=-=-=-=-=-=-=-=-=-
© 2016 - 2024 Red Hat, Inc.