arch/x86/boot/compressed/sev.c | 67 +++++--------------- 1 file changed, 15 insertions(+), 52 deletions(-)
From: Ard Biesheuvel <ardb@kernel.org>
Communicating with the hypervisor using the shared GHCB page requires
clearing the C bit in the mapping of that page. When executing in the
context of the EFI boot services, the page tables are owned by the
firmware, and this manipulation is not possible.
So switch to a different API for accepting memory in SEV-SNP guests, one
which is actually supported at the point during boot where the EFI stub
may need to accept memory, but the SEV-SNP init code has not executed
yet.
For simplicity, also switch the memory acceptance carried out by the
decompressor when not booting via EFI - this only involves the
allocation for the decompressed kernel, and is generally only called
after kexec, as normal boot will jump straight into the kernel from the
EFI stub.
Cc: Tom Lendacky <thomas.lendacky@amd.com>,
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>,
Cc: Dionna Amalie Glaze <dionnaglaze@google.com>,
Cc: Kevin Loughlin <kevinloughlin@google.com>
Co-developed-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
Changes since v2 [0]:
- avoid two separate acceptance APIs; instead, use MSR based page-by-page
acceptance for the decompressor as well
[0] https://lore.kernel.org/all/20250404082921.2767593-8-ardb+git@google.com/T/#m38389f607accd0cfa83c41c3bd0d410514b023c6
arch/x86/boot/compressed/sev.c | 67 +++++---------------
1 file changed, 15 insertions(+), 52 deletions(-)
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index bb55934c1cee..89ba168f4f0f 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -164,10 +164,7 @@ bool sev_snp_enabled(void)
static void __page_state_change(unsigned long paddr, enum psc_op op)
{
- u64 val;
-
- if (!sev_snp_enabled())
- return;
+ u64 val, msr;
/*
* If private -> shared then invalidate the page before requesting the
@@ -176,6 +173,9 @@ static void __page_state_change(unsigned long paddr, enum psc_op op)
if (op == SNP_PAGE_STATE_SHARED)
pvalidate_4k_page(paddr, paddr, false);
+ /* Save the current GHCB MSR value */
+ msr = sev_es_rd_ghcb_msr();
+
/* Issue VMGEXIT to change the page state in RMP table. */
sev_es_wr_ghcb_msr(GHCB_MSR_PSC_REQ_GFN(paddr >> PAGE_SHIFT, op));
VMGEXIT();
@@ -185,6 +185,9 @@ static void __page_state_change(unsigned long paddr, enum psc_op op)
if ((GHCB_RESP_CODE(val) != GHCB_MSR_PSC_RESP) || GHCB_MSR_PSC_RESP_VAL(val))
sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);
+ /* Restore the GHCB MSR value */
+ sev_es_wr_ghcb_msr(msr);
+
/*
* Now that page state is changed in the RMP table, validate it so that it is
* consistent with the RMP entry.
@@ -195,11 +198,17 @@ static void __page_state_change(unsigned long paddr, enum psc_op op)
void snp_set_page_private(unsigned long paddr)
{
+ if (!sev_snp_enabled())
+ return;
+
__page_state_change(paddr, SNP_PAGE_STATE_PRIVATE);
}
void snp_set_page_shared(unsigned long paddr)
{
+ if (!sev_snp_enabled())
+ return;
+
__page_state_change(paddr, SNP_PAGE_STATE_SHARED);
}
@@ -223,56 +232,10 @@ static bool early_setup_ghcb(void)
return true;
}
-static phys_addr_t __snp_accept_memory(struct snp_psc_desc *desc,
- phys_addr_t pa, phys_addr_t pa_end)
-{
- struct psc_hdr *hdr;
- struct psc_entry *e;
- unsigned int i;
-
- hdr = &desc->hdr;
- memset(hdr, 0, sizeof(*hdr));
-
- e = desc->entries;
-
- i = 0;
- while (pa < pa_end && i < VMGEXIT_PSC_MAX_ENTRY) {
- hdr->end_entry = i;
-
- e->gfn = pa >> PAGE_SHIFT;
- e->operation = SNP_PAGE_STATE_PRIVATE;
- if (IS_ALIGNED(pa, PMD_SIZE) && (pa_end - pa) >= PMD_SIZE) {
- e->pagesize = RMP_PG_SIZE_2M;
- pa += PMD_SIZE;
- } else {
- e->pagesize = RMP_PG_SIZE_4K;
- pa += PAGE_SIZE;
- }
-
- e++;
- i++;
- }
-
- if (vmgexit_psc(boot_ghcb, desc))
- sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);
-
- pvalidate_pages(desc);
-
- return pa;
-}
-
void snp_accept_memory(phys_addr_t start, phys_addr_t end)
{
- struct snp_psc_desc desc = {};
- unsigned int i;
- phys_addr_t pa;
-
- if (!boot_ghcb && !early_setup_ghcb())
- sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);
-
- pa = start;
- while (pa < end)
- pa = __snp_accept_memory(&desc, pa, end);
+ for (phys_addr_t pa = start; pa < end; pa += PAGE_SIZE)
+ __page_state_change(pa, SNP_PAGE_STATE_PRIVATE);
}
void sev_es_shutdown_ghcb(void)
--
2.49.0.504.g3bcea36a83-goog
On Thu, Apr 10, 2025 at 03:28:51PM +0200, Ard Biesheuvel wrote: > From: Ard Biesheuvel <ardb@kernel.org> > > Communicating with the hypervisor using the shared GHCB page requires > clearing the C bit in the mapping of that page. When executing in the > context of the EFI boot services, the page tables are owned by the > firmware, and this manipulation is not possible. > > So switch to a different API for accepting memory in SEV-SNP guests, one That being the GHCB MSR protocol, it seems. And since Tom co-developed, I guess we wanna do that. But then how much slower do we become? And nothing in here talks about why that GHCB method worked or didn't work before and that it is ok or not ok why we're axing that off. I'm somehow missing that aspect of why that change is warranted... Thx.
On Fri, 11 Apr 2025 at 20:40, Borislav Petkov <bp@alien8.de> wrote: > > On Thu, Apr 10, 2025 at 03:28:51PM +0200, Ard Biesheuvel wrote: > > From: Ard Biesheuvel <ardb@kernel.org> > > > > Communicating with the hypervisor using the shared GHCB page requires > > clearing the C bit in the mapping of that page. When executing in the > > context of the EFI boot services, the page tables are owned by the > > firmware, and this manipulation is not possible. > > > > So switch to a different API for accepting memory in SEV-SNP guests, one > > That being the GHCB MSR protocol, it seems. > Yes. > And since Tom co-developed, I guess we wanna do that. > > But then how much slower do we become? > Non-EFI stub boot will become slower if the memory that is used to decompress the kernel has not been accepted yet. But given how heavily SEV-SNP depends on EFI boot, this typically only happens on kexec, as that is the only boot path that goes through the traditional decompressor. > And nothing in here talks about why that GHCB method worked or didn't > work before and that it is ok or not ok why we're axing that off. > ---%<--- The GHCB shared page method never worked for accepting memory from the EFI stub, but this is rarely needed in practice: when using the higher level page allocation APIs, the firmware will make sure that memory is accepted before it is returned. The only use case for explicit memory acceptance by the EFI stub is when populating the 'unaccepted memory' bitmap, which tracks unaccepted memory at a 2MB granularity, and so chunks of unaccepted memory that are misaligned wrt that are accepted without being allocated or used. ---%<--- > I'm somehow missing that aspect of why that change is warranted... > This never worked correctly for SEV-SNP, we're just lucky the firmware appears to accept memory in 2+ MB batches and so these misaligned chunks are rare in practice. Tom did manage to trigger it IIUC by giving a VM an amount of memory that is not a multiple of 2M.
On 4/11/25 14:00, Ard Biesheuvel wrote: > On Fri, 11 Apr 2025 at 20:40, Borislav Petkov <bp@alien8.de> wrote: >> >> On Thu, Apr 10, 2025 at 03:28:51PM +0200, Ard Biesheuvel wrote: >>> From: Ard Biesheuvel <ardb@kernel.org> >>> >>> Communicating with the hypervisor using the shared GHCB page requires >>> clearing the C bit in the mapping of that page. When executing in the >>> context of the EFI boot services, the page tables are owned by the >>> firmware, and this manipulation is not possible. >>> >>> So switch to a different API for accepting memory in SEV-SNP guests, one >> >> That being the GHCB MSR protocol, it seems. >> > > Yes. > >> And since Tom co-developed, I guess we wanna do that. >> >> But then how much slower do we become? >> > > Non-EFI stub boot will become slower if the memory that is used to > decompress the kernel has not been accepted yet. But given how heavily > SEV-SNP depends on EFI boot, this typically only happens on kexec, as > that is the only boot path that goes through the traditional > decompressor. Some quick testing showed no significant differences in kexec booting and testing shows everything seems to be good. But, in testing with non-2M sized memory (e.g. a guest with 4097M of memory) and without the change to how SNP is detected before sev_enable() is called, we hit the error path in arch_accept_memory() in arch/x86/boot/compressed/mem.c and the boot crashes. Thanks, Tom > >> And nothing in here talks about why that GHCB method worked or didn't >> work before and that it is ok or not ok why we're axing that off. >> > > ---%<--- > The GHCB shared page method never worked for accepting memory from the > EFI stub, but this is rarely needed in practice: when using the higher > level page allocation APIs, the firmware will make sure that memory is > accepted before it is returned. The only use case for explicit memory > acceptance by the EFI stub is when populating the 'unaccepted memory' > bitmap, which tracks unaccepted memory at a 2MB granularity, and so > chunks of unaccepted memory that are misaligned wrt that are accepted > without being allocated or used. > ---%<--- > >> I'm somehow missing that aspect of why that change is warranted... >> > > This never worked correctly for SEV-SNP, we're just lucky the firmware > appears to accept memory in 2+ MB batches and so these misaligned > chunks are rare in practice. Tom did manage to trigger it IIUC by > giving a VM an amount of memory that is not a multiple of 2M.
On Thu, 17 Apr 2025 at 18:08, Tom Lendacky <thomas.lendacky@amd.com> wrote: > > On 4/11/25 14:00, Ard Biesheuvel wrote: > > On Fri, 11 Apr 2025 at 20:40, Borislav Petkov <bp@alien8.de> wrote: > >> > >> On Thu, Apr 10, 2025 at 03:28:51PM +0200, Ard Biesheuvel wrote: > >>> From: Ard Biesheuvel <ardb@kernel.org> > >>> > >>> Communicating with the hypervisor using the shared GHCB page requires > >>> clearing the C bit in the mapping of that page. When executing in the > >>> context of the EFI boot services, the page tables are owned by the > >>> firmware, and this manipulation is not possible. > >>> > >>> So switch to a different API for accepting memory in SEV-SNP guests, one > >> > >> That being the GHCB MSR protocol, it seems. > >> > > > > Yes. > > > >> And since Tom co-developed, I guess we wanna do that. > >> > >> But then how much slower do we become? > >> > > > > Non-EFI stub boot will become slower if the memory that is used to > > decompress the kernel has not been accepted yet. But given how heavily > > SEV-SNP depends on EFI boot, this typically only happens on kexec, as > > that is the only boot path that goes through the traditional > > decompressor. > > Some quick testing showed no significant differences in kexec booting > and testing shows everything seems to be good. > Thanks. > But, in testing with non-2M sized memory (e.g. a guest with 4097M of > memory) and without the change to how SNP is detected before > sev_enable() is called, we hit the error path in arch_accept_memory() in > arch/x86/boot/compressed/mem.c and the boot crashes. > Right. So this is because sev_snp_enabled() is based on sev_status, which has not been set yet at this point, right? And for the record, could you please indicate whether you are ok with the co-developed-by/signed-off-by credits on this patch (and subsequent revisions)?
On 4/17/25 11:14, Ard Biesheuvel wrote: > On Thu, 17 Apr 2025 at 18:08, Tom Lendacky <thomas.lendacky@amd.com> wrote: >> >> On 4/11/25 14:00, Ard Biesheuvel wrote: >>> On Fri, 11 Apr 2025 at 20:40, Borislav Petkov <bp@alien8.de> wrote: >>>> >>>> On Thu, Apr 10, 2025 at 03:28:51PM +0200, Ard Biesheuvel wrote: >>>>> From: Ard Biesheuvel <ardb@kernel.org> >>>>> >>>>> Communicating with the hypervisor using the shared GHCB page requires >>>>> clearing the C bit in the mapping of that page. When executing in the >>>>> context of the EFI boot services, the page tables are owned by the >>>>> firmware, and this manipulation is not possible. >>>>> >>>>> So switch to a different API for accepting memory in SEV-SNP guests, one >>>> >>>> That being the GHCB MSR protocol, it seems. >>>> >>> >>> Yes. >>> >>>> And since Tom co-developed, I guess we wanna do that. >>>> >>>> But then how much slower do we become? >>>> >>> >>> Non-EFI stub boot will become slower if the memory that is used to >>> decompress the kernel has not been accepted yet. But given how heavily >>> SEV-SNP depends on EFI boot, this typically only happens on kexec, as >>> that is the only boot path that goes through the traditional >>> decompressor. >> >> Some quick testing showed no significant differences in kexec booting >> and testing shows everything seems to be good. >> > > Thanks. > >> But, in testing with non-2M sized memory (e.g. a guest with 4097M of >> memory) and without the change to how SNP is detected before >> sev_enable() is called, we hit the error path in arch_accept_memory() in >> arch/x86/boot/compressed/mem.c and the boot crashes. >> > > Right. So this is because sev_snp_enabled() is based on sev_status, > which has not been set yet at this point, right? Correct. > > And for the record, could you please indicate whether you are ok with > the co-developed-by/signed-off-by credits on this patch (and > subsequent revisions)? Yep, I'm fine with that. Thanks, Tom
On Thu, 17 Apr 2025 at 18:21, Tom Lendacky <thomas.lendacky@amd.com> wrote:
>
> On 4/17/25 11:14, Ard Biesheuvel wrote:
> > On Thu, 17 Apr 2025 at 18:08, Tom Lendacky <thomas.lendacky@amd.com> wrote:
> >>
> >> On 4/11/25 14:00, Ard Biesheuvel wrote:
> >>> On Fri, 11 Apr 2025 at 20:40, Borislav Petkov <bp@alien8.de> wrote:
> >>>>
> >>>> On Thu, Apr 10, 2025 at 03:28:51PM +0200, Ard Biesheuvel wrote:
> >>>>> From: Ard Biesheuvel <ardb@kernel.org>
> >>>>>
> >>>>> Communicating with the hypervisor using the shared GHCB page requires
> >>>>> clearing the C bit in the mapping of that page. When executing in the
> >>>>> context of the EFI boot services, the page tables are owned by the
> >>>>> firmware, and this manipulation is not possible.
> >>>>>
> >>>>> So switch to a different API for accepting memory in SEV-SNP guests, one
> >>>>
> >>>> That being the GHCB MSR protocol, it seems.
> >>>>
> >>>
> >>> Yes.
> >>>
> >>>> And since Tom co-developed, I guess we wanna do that.
> >>>>
> >>>> But then how much slower do we become?
> >>>>
> >>>
> >>> Non-EFI stub boot will become slower if the memory that is used to
> >>> decompress the kernel has not been accepted yet. But given how heavily
> >>> SEV-SNP depends on EFI boot, this typically only happens on kexec, as
> >>> that is the only boot path that goes through the traditional
> >>> decompressor.
> >>
> >> Some quick testing showed no significant differences in kexec booting
> >> and testing shows everything seems to be good.
> >>
> >
> > Thanks.
> >
> >> But, in testing with non-2M sized memory (e.g. a guest with 4097M of
> >> memory) and without the change to how SNP is detected before
> >> sev_enable() is called, we hit the error path in arch_accept_memory() in
> >> arch/x86/boot/compressed/mem.c and the boot crashes.
> >>
> >
> > Right. So this is because sev_snp_enabled() is based on sev_status,
> > which has not been set yet at this point, right?
>
> Correct.
>
OK. Would this do the trick? (with asm/sev.h added to the #includes)
--- a/arch/x86/boot/compressed/mem.c
+++ b/arch/x86/boot/compressed/mem.c
@@ -34,11 +34,14 @@ static bool early_is_tdx_guest(void)
void arch_accept_memory(phys_addr_t start, phys_addr_t end)
{
+ static bool sevsnp;
+
/* Platform-specific memory-acceptance call goes here */
if (early_is_tdx_guest()) {
if (!tdx_accept_memory(start, end))
panic("TDX: Failed to accept memory\n");
- } else if (sev_snp_enabled()) {
+ } else if (sevsnp || (sev_get_status() & MSR_AMD64_SEV_SNP_ENABLED)) {
+ sevsnp = true;
snp_accept_memory(start, end);
} else {
error("Cannot accept memory: unknown platform\n");
> >
> > And for the record, could you please indicate whether you are ok with
> > the co-developed-by/signed-off-by credits on this patch (and
> > subsequent revisions)?
>
> Yep, I'm fine with that.
>
Thanks.
On 4/17/25 11:38, Ard Biesheuvel wrote:
> On Thu, 17 Apr 2025 at 18:21, Tom Lendacky <thomas.lendacky@amd.com> wrote:
>>
>> On 4/17/25 11:14, Ard Biesheuvel wrote:
>>> On Thu, 17 Apr 2025 at 18:08, Tom Lendacky <thomas.lendacky@amd.com> wrote:
>>>>
>>>> On 4/11/25 14:00, Ard Biesheuvel wrote:
>>>>> On Fri, 11 Apr 2025 at 20:40, Borislav Petkov <bp@alien8.de> wrote:
>>>>>>
>>>>>> On Thu, Apr 10, 2025 at 03:28:51PM +0200, Ard Biesheuvel wrote:
>>>>>>> From: Ard Biesheuvel <ardb@kernel.org>
>>>>>>>
>>>>>>> Communicating with the hypervisor using the shared GHCB page requires
>>>>>>> clearing the C bit in the mapping of that page. When executing in the
>>>>>>> context of the EFI boot services, the page tables are owned by the
>>>>>>> firmware, and this manipulation is not possible.
>>>>>>>
>>>>>>> So switch to a different API for accepting memory in SEV-SNP guests, one
>>>>>>
>>>>>> That being the GHCB MSR protocol, it seems.
>>>>>>
>>>>>
>>>>> Yes.
>>>>>
>>>>>> And since Tom co-developed, I guess we wanna do that.
>>>>>>
>>>>>> But then how much slower do we become?
>>>>>>
>>>>>
>>>>> Non-EFI stub boot will become slower if the memory that is used to
>>>>> decompress the kernel has not been accepted yet. But given how heavily
>>>>> SEV-SNP depends on EFI boot, this typically only happens on kexec, as
>>>>> that is the only boot path that goes through the traditional
>>>>> decompressor.
>>>>
>>>> Some quick testing showed no significant differences in kexec booting
>>>> and testing shows everything seems to be good.
>>>>
>>>
>>> Thanks.
>>>
>>>> But, in testing with non-2M sized memory (e.g. a guest with 4097M of
>>>> memory) and without the change to how SNP is detected before
>>>> sev_enable() is called, we hit the error path in arch_accept_memory() in
>>>> arch/x86/boot/compressed/mem.c and the boot crashes.
>>>>
>>>
>>> Right. So this is because sev_snp_enabled() is based on sev_status,
>>> which has not been set yet at this point, right?
>>
>> Correct.
>>
>
> OK. Would this do the trick? (with asm/sev.h added to the #includes)
Yes, that works for booting. Let me do some kexec testing and get back
to you. Sorry, that might not be until tomorrow, though.
Thanks,
Tom
>
> --- a/arch/x86/boot/compressed/mem.c
> +++ b/arch/x86/boot/compressed/mem.c
> @@ -34,11 +34,14 @@ static bool early_is_tdx_guest(void)
>
> void arch_accept_memory(phys_addr_t start, phys_addr_t end)
> {
> + static bool sevsnp;
> +
> /* Platform-specific memory-acceptance call goes here */
> if (early_is_tdx_guest()) {
> if (!tdx_accept_memory(start, end))
> panic("TDX: Failed to accept memory\n");
> - } else if (sev_snp_enabled()) {
> + } else if (sevsnp || (sev_get_status() & MSR_AMD64_SEV_SNP_ENABLED)) {
> + sevsnp = true;
> snp_accept_memory(start, end);
> } else {
> error("Cannot accept memory: unknown platform\n");
>
>>>
>>> And for the record, could you please indicate whether you are ok with
>>> the co-developed-by/signed-off-by credits on this patch (and
>>> subsequent revisions)?
>>
>> Yep, I'm fine with that.
>>
>
> Thanks.
On 4/17/25 12:26, Tom Lendacky wrote:
> On 4/17/25 11:38, Ard Biesheuvel wrote:
>> On Thu, 17 Apr 2025 at 18:21, Tom Lendacky <thomas.lendacky@amd.com> wrote:
>>>
>>> On 4/17/25 11:14, Ard Biesheuvel wrote:
>>>> On Thu, 17 Apr 2025 at 18:08, Tom Lendacky <thomas.lendacky@amd.com> wrote:
>>>>>
>>>>> On 4/11/25 14:00, Ard Biesheuvel wrote:
>>>>>> On Fri, 11 Apr 2025 at 20:40, Borislav Petkov <bp@alien8.de> wrote:
>>>>>>>
>>>>>>> On Thu, Apr 10, 2025 at 03:28:51PM +0200, Ard Biesheuvel wrote:
>>>>>>>> From: Ard Biesheuvel <ardb@kernel.org>
>>>>>>>>
>>>>>>>> Communicating with the hypervisor using the shared GHCB page requires
>>>>>>>> clearing the C bit in the mapping of that page. When executing in the
>>>>>>>> context of the EFI boot services, the page tables are owned by the
>>>>>>>> firmware, and this manipulation is not possible.
>>>>>>>>
>>>>>>>> So switch to a different API for accepting memory in SEV-SNP guests, one
>>>>>>>
>>>>>>> That being the GHCB MSR protocol, it seems.
>>>>>>>
>>>>>>
>>>>>> Yes.
>>>>>>
>>>>>>> And since Tom co-developed, I guess we wanna do that.
>>>>>>>
>>>>>>> But then how much slower do we become?
>>>>>>>
>>>>>>
>>>>>> Non-EFI stub boot will become slower if the memory that is used to
>>>>>> decompress the kernel has not been accepted yet. But given how heavily
>>>>>> SEV-SNP depends on EFI boot, this typically only happens on kexec, as
>>>>>> that is the only boot path that goes through the traditional
>>>>>> decompressor.
>>>>>
>>>>> Some quick testing showed no significant differences in kexec booting
>>>>> and testing shows everything seems to be good.
>>>>>
>>>>
>>>> Thanks.
>>>>
>>>>> But, in testing with non-2M sized memory (e.g. a guest with 4097M of
>>>>> memory) and without the change to how SNP is detected before
>>>>> sev_enable() is called, we hit the error path in arch_accept_memory() in
>>>>> arch/x86/boot/compressed/mem.c and the boot crashes.
>>>>>
>>>>
>>>> Right. So this is because sev_snp_enabled() is based on sev_status,
>>>> which has not been set yet at this point, right?
>>>
>>> Correct.
>>>
>>
>> OK. Would this do the trick? (with asm/sev.h added to the #includes)
>
> Yes, that works for booting. Let me do some kexec testing and get back
> to you. Sorry, that might not be until tomorrow, though.
Ok, found some time... looks good with kexec, too.
Thanks,
Tom
>
> Thanks,
> Tom
>
>>
>> --- a/arch/x86/boot/compressed/mem.c
>> +++ b/arch/x86/boot/compressed/mem.c
>> @@ -34,11 +34,14 @@ static bool early_is_tdx_guest(void)
>>
>> void arch_accept_memory(phys_addr_t start, phys_addr_t end)
>> {
>> + static bool sevsnp;
>> +
>> /* Platform-specific memory-acceptance call goes here */
>> if (early_is_tdx_guest()) {
>> if (!tdx_accept_memory(start, end))
>> panic("TDX: Failed to accept memory\n");
>> - } else if (sev_snp_enabled()) {
>> + } else if (sevsnp || (sev_get_status() & MSR_AMD64_SEV_SNP_ENABLED)) {
>> + sevsnp = true;
>> snp_accept_memory(start, end);
>> } else {
>> error("Cannot accept memory: unknown platform\n");
>>
>>>>
>>>> And for the record, could you please indicate whether you are ok with
>>>> the co-developed-by/signed-off-by credits on this patch (and
>>>> subsequent revisions)?
>>>
>>> Yep, I'm fine with that.
>>>
>>
>> Thanks.
On Thu, 17 Apr 2025 at 22:01, Tom Lendacky <thomas.lendacky@amd.com> wrote: > > On 4/17/25 12:26, Tom Lendacky wrote: > > On 4/17/25 11:38, Ard Biesheuvel wrote: > >> On Thu, 17 Apr 2025 at 18:21, Tom Lendacky <thomas.lendacky@amd.com> wrote: > >>> > >>> On 4/17/25 11:14, Ard Biesheuvel wrote: > >>>> On Thu, 17 Apr 2025 at 18:08, Tom Lendacky <thomas.lendacky@amd.com> wrote: > >>>>> > >>>>> On 4/11/25 14:00, Ard Biesheuvel wrote: > >>>>>> On Fri, 11 Apr 2025 at 20:40, Borislav Petkov <bp@alien8.de> wrote: > >>>>>>> > >>>>>>> On Thu, Apr 10, 2025 at 03:28:51PM +0200, Ard Biesheuvel wrote: > >>>>>>>> From: Ard Biesheuvel <ardb@kernel.org> > >>>>>>>> > >>>>>>>> Communicating with the hypervisor using the shared GHCB page requires > >>>>>>>> clearing the C bit in the mapping of that page. When executing in the > >>>>>>>> context of the EFI boot services, the page tables are owned by the > >>>>>>>> firmware, and this manipulation is not possible. > >>>>>>>> > >>>>>>>> So switch to a different API for accepting memory in SEV-SNP guests, one > >>>>>>> > >>>>>>> That being the GHCB MSR protocol, it seems. > >>>>>>> > >>>>>> > >>>>>> Yes. > >>>>>> > >>>>>>> And since Tom co-developed, I guess we wanna do that. > >>>>>>> > >>>>>>> But then how much slower do we become? > >>>>>>> > >>>>>> > >>>>>> Non-EFI stub boot will become slower if the memory that is used to > >>>>>> decompress the kernel has not been accepted yet. But given how heavily > >>>>>> SEV-SNP depends on EFI boot, this typically only happens on kexec, as > >>>>>> that is the only boot path that goes through the traditional > >>>>>> decompressor. > >>>>> > >>>>> Some quick testing showed no significant differences in kexec booting > >>>>> and testing shows everything seems to be good. > >>>>> > >>>> > >>>> Thanks. > >>>> > >>>>> But, in testing with non-2M sized memory (e.g. a guest with 4097M of > >>>>> memory) and without the change to how SNP is detected before > >>>>> sev_enable() is called, we hit the error path in arch_accept_memory() in > >>>>> arch/x86/boot/compressed/mem.c and the boot crashes. > >>>>> > >>>> > >>>> Right. So this is because sev_snp_enabled() is based on sev_status, > >>>> which has not been set yet at this point, right? > >>> > >>> Correct. > >>> > >> > >> OK. Would this do the trick? (with asm/sev.h added to the #includes) > > > > Yes, that works for booting. Let me do some kexec testing and get back > > to you. Sorry, that might not be until tomorrow, though. > > Ok, found some time... looks good with kexec, too. > Thanks!
* Ard Biesheuvel <ardb@kernel.org> wrote: > > >> OK. Would this do the trick? (with asm/sev.h added to the #includes) > > > > > > Yes, that works for booting. Let me do some kexec testing and get back > > > to you. Sorry, that might not be until tomorrow, though. > > > > Ok, found some time... looks good with kexec, too. > > > > Thanks! I've turned this into this tag for the -v4 patch: Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Thanks, Ingo
On Fri, Apr 11, 2025 at 09:00:51PM +0200, Ard Biesheuvel wrote: > ---%<--- > The GHCB shared page method never worked for accepting memory from the > EFI stub, but this is rarely needed in practice: when using the higher > level page allocation APIs, the firmware will make sure that memory is > accepted before it is returned. The only use case for explicit memory > acceptance by the EFI stub is when populating the 'unaccepted memory' > bitmap, which tracks unaccepted memory at a 2MB granularity, and so > chunks of unaccepted memory that are misaligned wrt that are accepted > without being allocated or used. > ---%<--- > > > I'm somehow missing that aspect of why that change is warranted... > > > > This never worked correctly for SEV-SNP, we're just lucky the firmware > appears to accept memory in 2+ MB batches and so these misaligned > chunks are rare in practice. Tom did manage to trigger it IIUC by > giving a VM an amount of memory that is not a multiple of 2M. Can you pls put that lenghty explanation in the commit message? It is useful and we might go back to it in the future. Thx.
The following commit has been merged into the x86/boot branch of tip:
Commit-ID: 5871c9d650bc0ccb16ed8444f16045808ba936bf
Gitweb: https://git.kernel.org/tip/5871c9d650bc0ccb16ed8444f16045808ba936bf
Author: Ard Biesheuvel <ardb@kernel.org>
AuthorDate: Thu, 10 Apr 2025 15:28:51 +02:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Sat, 12 Apr 2025 20:43:41 +02:00
x86/boot/sev: Avoid shared GHCB page for early memory acceptance
Communicating with the hypervisor using the shared GHCB page requires
clearing the C bit in the mapping of that page. When executing in the
context of the EFI boot services, the page tables are owned by the
firmware, and this manipulation is not possible.
So switch to a different API for accepting memory in SEV-SNP guests, one
which is actually supported at the point during boot where the EFI stub
may need to accept memory, but the SEV-SNP init code has not executed
yet.
For simplicity, also switch the memory acceptance carried out by the
decompressor when not booting via EFI - this only involves the
allocation for the decompressed kernel, and is generally only called
after kexec, as normal boot will jump straight into the kernel from the
EFI stub.
Non-EFI stub boot will become slower if the memory that is used to
decompress the kernel has not been accepted yet. But given how heavily
SEV-SNP depends on EFI boot, this typically only happens on kexec, as
that is the only boot path that goes through the traditional
decompressor.
The GHCB shared page method never worked for accepting memory from the
EFI stub, but this is rarely needed in practice: when using the higher
level page allocation APIs, the firmware will make sure that memory is
accepted before it is returned. The only use case for explicit memory
acceptance by the EFI stub is when populating the 'unaccepted memory'
bitmap, which tracks unaccepted memory at a 2MB granularity, and so
chunks of unaccepted memory that are misaligned wrt that are accepted
without being allocated or used.
AFAICS this never worked correctly for SEV-SNP, we're just lucky
the firmware appears to accept memory in 2+ MB batches and so
these misaligned chunks are rare in practice. Tom did manage to
trigger it IIUC by giving a VM an amount of memory that is not a
multiple of 2M.
Co-developed-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Dionna Amalie Glaze <dionnaglaze@google.com>
Cc: Kevin Loughlin <kevinloughlin@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: linux-efi@vger.kernel.org
Link: https://lore.kernel.org/r/20250410132850.3708703-2-ardb+git@google.com
---
arch/x86/boot/compressed/sev.c | 67 +++++++--------------------------
1 file changed, 15 insertions(+), 52 deletions(-)
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 6eadd79..478eca4 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -169,10 +169,7 @@ bool sev_snp_enabled(void)
static void __page_state_change(unsigned long paddr, enum psc_op op)
{
- u64 val;
-
- if (!sev_snp_enabled())
- return;
+ u64 val, msr;
/*
* If private -> shared then invalidate the page before requesting the
@@ -181,6 +178,9 @@ static void __page_state_change(unsigned long paddr, enum psc_op op)
if (op == SNP_PAGE_STATE_SHARED)
pvalidate_4k_page(paddr, paddr, false);
+ /* Save the current GHCB MSR value */
+ msr = sev_es_rd_ghcb_msr();
+
/* Issue VMGEXIT to change the page state in RMP table. */
sev_es_wr_ghcb_msr(GHCB_MSR_PSC_REQ_GFN(paddr >> PAGE_SHIFT, op));
VMGEXIT();
@@ -190,6 +190,9 @@ static void __page_state_change(unsigned long paddr, enum psc_op op)
if ((GHCB_RESP_CODE(val) != GHCB_MSR_PSC_RESP) || GHCB_MSR_PSC_RESP_VAL(val))
sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);
+ /* Restore the GHCB MSR value */
+ sev_es_wr_ghcb_msr(msr);
+
/*
* Now that page state is changed in the RMP table, validate it so that it is
* consistent with the RMP entry.
@@ -200,11 +203,17 @@ static void __page_state_change(unsigned long paddr, enum psc_op op)
void snp_set_page_private(unsigned long paddr)
{
+ if (!sev_snp_enabled())
+ return;
+
__page_state_change(paddr, SNP_PAGE_STATE_PRIVATE);
}
void snp_set_page_shared(unsigned long paddr)
{
+ if (!sev_snp_enabled())
+ return;
+
__page_state_change(paddr, SNP_PAGE_STATE_SHARED);
}
@@ -228,56 +237,10 @@ static bool early_setup_ghcb(void)
return true;
}
-static phys_addr_t __snp_accept_memory(struct snp_psc_desc *desc,
- phys_addr_t pa, phys_addr_t pa_end)
-{
- struct psc_hdr *hdr;
- struct psc_entry *e;
- unsigned int i;
-
- hdr = &desc->hdr;
- memset(hdr, 0, sizeof(*hdr));
-
- e = desc->entries;
-
- i = 0;
- while (pa < pa_end && i < VMGEXIT_PSC_MAX_ENTRY) {
- hdr->end_entry = i;
-
- e->gfn = pa >> PAGE_SHIFT;
- e->operation = SNP_PAGE_STATE_PRIVATE;
- if (IS_ALIGNED(pa, PMD_SIZE) && (pa_end - pa) >= PMD_SIZE) {
- e->pagesize = RMP_PG_SIZE_2M;
- pa += PMD_SIZE;
- } else {
- e->pagesize = RMP_PG_SIZE_4K;
- pa += PAGE_SIZE;
- }
-
- e++;
- i++;
- }
-
- if (vmgexit_psc(boot_ghcb, desc))
- sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);
-
- pvalidate_pages(desc);
-
- return pa;
-}
-
void snp_accept_memory(phys_addr_t start, phys_addr_t end)
{
- struct snp_psc_desc desc = {};
- unsigned int i;
- phys_addr_t pa;
-
- if (!boot_ghcb && !early_setup_ghcb())
- sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);
-
- pa = start;
- while (pa < end)
- pa = __snp_accept_memory(&desc, pa, end);
+ for (phys_addr_t pa = start; pa < end; pa += PAGE_SIZE)
+ __page_state_change(pa, SNP_PAGE_STATE_PRIVATE);
}
void sev_es_shutdown_ghcb(void)
On Sat, 12 Apr 2025 at 22:29, tip-bot2 for Ard Biesheuvel <tip-bot2@linutronix.de> wrote: > > The following commit has been merged into the x86/boot branch of tip: > This may be slightly premature. I took some of Tom's code, hence the co-developed-by, but the should really confirm that what I did is correct before we queue this up.
* Ard Biesheuvel <ardb@kernel.org> wrote: > On Sat, 12 Apr 2025 at 22:29, tip-bot2 for Ard Biesheuvel > <tip-bot2@linutronix.de> wrote: > > > > The following commit has been merged into the x86/boot branch of tip: > > > > This may be slightly premature. I took some of Tom's code, hence the > co-developed-by, but the should really confirm that what I did is > correct before we queue this up. OK, I've zapped it again, especially as the rest of the series wasn't ready either, please include the latest version of this patch as part of the boot/setup/ series, which hard-relies upon it. Thanks, Ingo
On Sat, 12 Apr 2025 at 22:45, Ingo Molnar <mingo@kernel.org> wrote: > > > * Ard Biesheuvel <ardb@kernel.org> wrote: > > > On Sat, 12 Apr 2025 at 22:29, tip-bot2 for Ard Biesheuvel > > <tip-bot2@linutronix.de> wrote: > > > > > > The following commit has been merged into the x86/boot branch of tip: > > > > > > > This may be slightly premature. I took some of Tom's code, hence the > > co-developed-by, but the should really confirm that what I did is > > correct before we queue this up. > > OK, I've zapped it again, especially as the rest of the series wasn't > ready either, please include the latest version of this patch as part > of the boot/setup/ series, which hard-relies upon it. > I have sent out a v4 here [0]. I am not including it in the next rev of the startup/ refactor series, given that this change is a fix that also needs to go to stable. Please apply it as a fix and merge back the branch into tip/x86/boot - I will rebase the startup/ refactor series on top of that. Thanks, [0] https://lore.kernel.org/linux-efi/20250417202120.1002102-2-ardb+git@google.com/T/#u
* Ard Biesheuvel <ardb@kernel.org> wrote: > On Sat, 12 Apr 2025 at 22:45, Ingo Molnar <mingo@kernel.org> wrote: > > > > > > * Ard Biesheuvel <ardb@kernel.org> wrote: > > > > > On Sat, 12 Apr 2025 at 22:29, tip-bot2 for Ard Biesheuvel > > > <tip-bot2@linutronix.de> wrote: > > > > > > > > The following commit has been merged into the x86/boot branch of tip: > > > > > > > > > > This may be slightly premature. I took some of Tom's code, hence the > > > co-developed-by, but the should really confirm that what I did is > > > correct before we queue this up. > > > > OK, I've zapped it again, especially as the rest of the series wasn't > > ready either, please include the latest version of this patch as part > > of the boot/setup/ series, which hard-relies upon it. > > > > I have sent out a v4 here [0]. > > I am not including it in the next rev of the startup/ refactor series, > given that this change is a fix that also needs to go to stable. > Please apply it as a fix and merge back the branch into tip/x86/boot - > I will rebase the startup/ refactor series on top of that. > > Thanks, > > [0] https://lore.kernel.org/linux-efi/20250417202120.1002102-2-ardb+git@google.com/T/#u Noted, thanks for the heads up! Ingo
* Ingo Molnar <mingo@kernel.org> wrote:
>
> * Ard Biesheuvel <ardb@kernel.org> wrote:
>
> > On Sat, 12 Apr 2025 at 22:45, Ingo Molnar <mingo@kernel.org> wrote:
> > >
> > >
> > > * Ard Biesheuvel <ardb@kernel.org> wrote:
> > >
> > > > On Sat, 12 Apr 2025 at 22:29, tip-bot2 for Ard Biesheuvel
> > > > <tip-bot2@linutronix.de> wrote:
> > > > >
> > > > > The following commit has been merged into the x86/boot branch of tip:
> > > > >
> > > >
> > > > This may be slightly premature. I took some of Tom's code, hence the
> > > > co-developed-by, but the should really confirm that what I did is
> > > > correct before we queue this up.
> > >
> > > OK, I've zapped it again, especially as the rest of the series wasn't
> > > ready either, please include the latest version of this patch as part
> > > of the boot/setup/ series, which hard-relies upon it.
> > >
> >
> > I have sent out a v4 here [0].
> >
> > I am not including it in the next rev of the startup/ refactor series,
> > given that this change is a fix that also needs to go to stable.
> > Please apply it as a fix and merge back the branch into tip/x86/boot -
> > I will rebase the startup/ refactor series on top of that.
> >
> > Thanks,
> >
> > [0] https://lore.kernel.org/linux-efi/20250417202120.1002102-2-ardb+git@google.com/T/#u
>
> Noted, thanks for the heads up!
So there's this new merge commit in tip:x86/boot:
54f71aa90c84 Merge branch 'x86/urgent' into x86/boot, to merge dependent commit and fixes
which brings this fix into x86/boot:
a718833cb456 ("x86/boot/sev: Avoid shared GHCB page for early memory acceptance")
thus 54f71aa90c84 should be a proper base the rest of the startup/
series, right?
Thanks,
Ingo
On Fri, 18 Apr 2025 at 10:14, Ingo Molnar <mingo@kernel.org> wrote:
>
>
> * Ingo Molnar <mingo@kernel.org> wrote:
>
...
>
> So there's this new merge commit in tip:x86/boot:
>
> 54f71aa90c84 Merge branch 'x86/urgent' into x86/boot, to merge dependent commit and fixes
>
> which brings this fix into x86/boot:
>
> a718833cb456 ("x86/boot/sev: Avoid shared GHCB page for early memory acceptance")
>
> thus 54f71aa90c84 should be a proper base the rest of the startup/
> series, right?
>
Yes, perfect. Thanks.
On Sat, 12 Apr 2025 at 22:45, Ingo Molnar <mingo@kernel.org> wrote: > > > * Ard Biesheuvel <ardb@kernel.org> wrote: > > > On Sat, 12 Apr 2025 at 22:29, tip-bot2 for Ard Biesheuvel > > <tip-bot2@linutronix.de> wrote: > > > > > > The following commit has been merged into the x86/boot branch of tip: > > > > > > > This may be slightly premature. I took some of Tom's code, hence the > > co-developed-by, but the should really confirm that what I did is > > correct before we queue this up. > > OK, I've zapped it again, especially as the rest of the series wasn't > ready either, please include the latest version of this patch as part > of the boot/setup/ series, which hard-relies upon it. > OK
© 2016 - 2026 Red Hat, Inc.