[v3] x86/boot/sev: Avoid shared GHCB page for early memory acceptance

[PATCH v3] x86/boot/sev: Avoid shared GHCB page for early memory acceptance

Posted by Ard Biesheuvel 10 months ago

From: Ard Biesheuvel <ardb@kernel.org>

Communicating with the hypervisor using the shared GHCB page requires
clearing the C bit in the mapping of that page. When executing in the
context of the EFI boot services, the page tables are owned by the
firmware, and this manipulation is not possible.

So switch to a different API for accepting memory in SEV-SNP guests, one
which is actually supported at the point during boot where the EFI stub
may need to accept memory, but the SEV-SNP init code has not executed
yet.

For simplicity, also switch the memory acceptance carried out by the
decompressor when not booting via EFI - this only involves the
allocation for the decompressed kernel, and is generally only called
after kexec, as normal boot will jump straight into the kernel from the
EFI stub.

Cc: Tom Lendacky <thomas.lendacky@amd.com>,
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>,
Cc: Dionna Amalie Glaze <dionnaglaze@google.com>,
Cc: Kevin Loughlin <kevinloughlin@google.com>
Co-developed-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
Changes since v2 [0]:
- avoid two separate acceptance APIs; instead, use MSR based page-by-page
  acceptance for the decompressor as well

[0] https://lore.kernel.org/all/20250404082921.2767593-8-ardb+git@google.com/T/#m38389f607accd0cfa83c41c3bd0d410514b023c6

 arch/x86/boot/compressed/sev.c | 67 +++++---------------
 1 file changed, 15 insertions(+), 52 deletions(-)

diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index bb55934c1cee..89ba168f4f0f 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -164,10 +164,7 @@ bool sev_snp_enabled(void)
 
 static void __page_state_change(unsigned long paddr, enum psc_op op)
 {
-	u64 val;
-
-	if (!sev_snp_enabled())
-		return;
+	u64 val, msr;
 
 	/*
 	 * If private -> shared then invalidate the page before requesting the
@@ -176,6 +173,9 @@ static void __page_state_change(unsigned long paddr, enum psc_op op)
 	if (op == SNP_PAGE_STATE_SHARED)
 		pvalidate_4k_page(paddr, paddr, false);
 
+	/* Save the current GHCB MSR value */
+	msr = sev_es_rd_ghcb_msr();
+
 	/* Issue VMGEXIT to change the page state in RMP table. */
 	sev_es_wr_ghcb_msr(GHCB_MSR_PSC_REQ_GFN(paddr >> PAGE_SHIFT, op));
 	VMGEXIT();
@@ -185,6 +185,9 @@ static void __page_state_change(unsigned long paddr, enum psc_op op)
 	if ((GHCB_RESP_CODE(val) != GHCB_MSR_PSC_RESP) || GHCB_MSR_PSC_RESP_VAL(val))
 		sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);
 
+	/* Restore the GHCB MSR value */
+	sev_es_wr_ghcb_msr(msr);
+
 	/*
 	 * Now that page state is changed in the RMP table, validate it so that it is
 	 * consistent with the RMP entry.
@@ -195,11 +198,17 @@ static void __page_state_change(unsigned long paddr, enum psc_op op)
 
 void snp_set_page_private(unsigned long paddr)
 {
+	if (!sev_snp_enabled())
+		return;
+
 	__page_state_change(paddr, SNP_PAGE_STATE_PRIVATE);
 }
 
 void snp_set_page_shared(unsigned long paddr)
 {
+	if (!sev_snp_enabled())
+		return;
+
 	__page_state_change(paddr, SNP_PAGE_STATE_SHARED);
 }
 
@@ -223,56 +232,10 @@ static bool early_setup_ghcb(void)
 	return true;
 }
 
-static phys_addr_t __snp_accept_memory(struct snp_psc_desc *desc,
-				       phys_addr_t pa, phys_addr_t pa_end)
-{
-	struct psc_hdr *hdr;
-	struct psc_entry *e;
-	unsigned int i;
-
-	hdr = &desc->hdr;
-	memset(hdr, 0, sizeof(*hdr));
-
-	e = desc->entries;
-
-	i = 0;
-	while (pa < pa_end && i < VMGEXIT_PSC_MAX_ENTRY) {
-		hdr->end_entry = i;
-
-		e->gfn = pa >> PAGE_SHIFT;
-		e->operation = SNP_PAGE_STATE_PRIVATE;
-		if (IS_ALIGNED(pa, PMD_SIZE) && (pa_end - pa) >= PMD_SIZE) {
-			e->pagesize = RMP_PG_SIZE_2M;
-			pa += PMD_SIZE;
-		} else {
-			e->pagesize = RMP_PG_SIZE_4K;
-			pa += PAGE_SIZE;
-		}
-
-		e++;
-		i++;
-	}
-
-	if (vmgexit_psc(boot_ghcb, desc))
-		sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);
-
-	pvalidate_pages(desc);
-
-	return pa;
-}
-
 void snp_accept_memory(phys_addr_t start, phys_addr_t end)
 {
-	struct snp_psc_desc desc = {};
-	unsigned int i;
-	phys_addr_t pa;
-
-	if (!boot_ghcb && !early_setup_ghcb())
-		sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);
-
-	pa = start;
-	while (pa < end)
-		pa = __snp_accept_memory(&desc, pa, end);
+	for (phys_addr_t pa = start; pa < end; pa += PAGE_SIZE)
+		__page_state_change(pa, SNP_PAGE_STATE_PRIVATE);
 }
 
 void sev_es_shutdown_ghcb(void)
-- 
2.49.0.504.g3bcea36a83-goog

Re: [PATCH v3] x86/boot/sev: Avoid shared GHCB page for early memory acceptance

Posted by Borislav Petkov 10 months ago

On Thu, Apr 10, 2025 at 03:28:51PM +0200, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
> 
> Communicating with the hypervisor using the shared GHCB page requires
> clearing the C bit in the mapping of that page. When executing in the
> context of the EFI boot services, the page tables are owned by the
> firmware, and this manipulation is not possible.
> 
> So switch to a different API for accepting memory in SEV-SNP guests, one

That being the GHCB MSR protocol, it seems.

And since Tom co-developed, I guess we wanna do that.

But then how much slower do we become?

And nothing in here talks about why that GHCB method worked or didn't
work before and that it is ok or not ok why we're axing that off.

I'm somehow missing that aspect of why that change is warranted...

Thx.

Re: [PATCH v3] x86/boot/sev: Avoid shared GHCB page for early memory acceptance

Posted by Ard Biesheuvel 10 months ago

On Fri, 11 Apr 2025 at 20:40, Borislav Petkov <bp@alien8.de> wrote:
>
> On Thu, Apr 10, 2025 at 03:28:51PM +0200, Ard Biesheuvel wrote:
> > From: Ard Biesheuvel <ardb@kernel.org>
> >
> > Communicating with the hypervisor using the shared GHCB page requires
> > clearing the C bit in the mapping of that page. When executing in the
> > context of the EFI boot services, the page tables are owned by the
> > firmware, and this manipulation is not possible.
> >
> > So switch to a different API for accepting memory in SEV-SNP guests, one
>
> That being the GHCB MSR protocol, it seems.
>

Yes.

> And since Tom co-developed, I guess we wanna do that.
>
> But then how much slower do we become?
>

Non-EFI stub boot will become slower if the memory that is used to
decompress the kernel has not been accepted yet. But given how heavily
SEV-SNP depends on EFI boot, this typically only happens on kexec, as
that is the only boot path that goes through the traditional
decompressor.

> And nothing in here talks about why that GHCB method worked or didn't
> work before and that it is ok or not ok why we're axing that off.
>

---%<---
The GHCB shared page method never worked for accepting memory from the
EFI stub, but this is rarely needed in practice: when using the higher
level page allocation APIs, the firmware will make sure that memory is
accepted before it is returned. The only use case for explicit memory
acceptance by the EFI stub is when populating the 'unaccepted memory'
bitmap, which tracks unaccepted memory at a 2MB granularity, and so
chunks of unaccepted memory that are misaligned wrt that are accepted
without being allocated or used.
---%<---

> I'm somehow missing that aspect of why that change is warranted...
>

This never worked correctly for SEV-SNP, we're just lucky the firmware
appears to accept memory in 2+ MB batches and so these misaligned
chunks are rare in practice. Tom did manage to trigger it IIUC by
giving a VM an amount of memory that is not a multiple of 2M.

Re: [PATCH v3] x86/boot/sev: Avoid shared GHCB page for early memory acceptance

Posted by Tom Lendacky 9 months, 3 weeks ago

On 4/11/25 14:00, Ard Biesheuvel wrote:
> On Fri, 11 Apr 2025 at 20:40, Borislav Petkov <bp@alien8.de> wrote:
>>
>> On Thu, Apr 10, 2025 at 03:28:51PM +0200, Ard Biesheuvel wrote:
>>> From: Ard Biesheuvel <ardb@kernel.org>
>>>
>>> Communicating with the hypervisor using the shared GHCB page requires
>>> clearing the C bit in the mapping of that page. When executing in the
>>> context of the EFI boot services, the page tables are owned by the
>>> firmware, and this manipulation is not possible.
>>>
>>> So switch to a different API for accepting memory in SEV-SNP guests, one
>>
>> That being the GHCB MSR protocol, it seems.
>>
> 
> Yes.
> 
>> And since Tom co-developed, I guess we wanna do that.
>>
>> But then how much slower do we become?
>>
> 
> Non-EFI stub boot will become slower if the memory that is used to
> decompress the kernel has not been accepted yet. But given how heavily
> SEV-SNP depends on EFI boot, this typically only happens on kexec, as
> that is the only boot path that goes through the traditional
> decompressor.

Some quick testing showed no significant differences in kexec booting
and testing shows everything seems to be good.

But, in testing with non-2M sized memory (e.g. a guest with 4097M of
memory) and without the change to how SNP is detected before
sev_enable() is called, we hit the error path in arch_accept_memory() in
arch/x86/boot/compressed/mem.c and the boot crashes.

Thanks,
Tom

> 
>> And nothing in here talks about why that GHCB method worked or didn't
>> work before and that it is ok or not ok why we're axing that off.
>>
> 
> ---%<---
> The GHCB shared page method never worked for accepting memory from the
> EFI stub, but this is rarely needed in practice: when using the higher
> level page allocation APIs, the firmware will make sure that memory is
> accepted before it is returned. The only use case for explicit memory
> acceptance by the EFI stub is when populating the 'unaccepted memory'
> bitmap, which tracks unaccepted memory at a 2MB granularity, and so
> chunks of unaccepted memory that are misaligned wrt that are accepted
> without being allocated or used.
> ---%<---
> 
>> I'm somehow missing that aspect of why that change is warranted...
>>
> 
> This never worked correctly for SEV-SNP, we're just lucky the firmware
> appears to accept memory in 2+ MB batches and so these misaligned
> chunks are rare in practice. Tom did manage to trigger it IIUC by
> giving a VM an amount of memory that is not a multiple of 2M.

Re: [PATCH v3] x86/boot/sev: Avoid shared GHCB page for early memory acceptance

Posted by Ard Biesheuvel 9 months, 3 weeks ago

On Thu, 17 Apr 2025 at 18:08, Tom Lendacky <thomas.lendacky@amd.com> wrote:
>
> On 4/11/25 14:00, Ard Biesheuvel wrote:
> > On Fri, 11 Apr 2025 at 20:40, Borislav Petkov <bp@alien8.de> wrote:
> >>
> >> On Thu, Apr 10, 2025 at 03:28:51PM +0200, Ard Biesheuvel wrote:
> >>> From: Ard Biesheuvel <ardb@kernel.org>
> >>>
> >>> Communicating with the hypervisor using the shared GHCB page requires
> >>> clearing the C bit in the mapping of that page. When executing in the
> >>> context of the EFI boot services, the page tables are owned by the
> >>> firmware, and this manipulation is not possible.
> >>>
> >>> So switch to a different API for accepting memory in SEV-SNP guests, one
> >>
> >> That being the GHCB MSR protocol, it seems.
> >>
> >
> > Yes.
> >
> >> And since Tom co-developed, I guess we wanna do that.
> >>
> >> But then how much slower do we become?
> >>
> >
> > Non-EFI stub boot will become slower if the memory that is used to
> > decompress the kernel has not been accepted yet. But given how heavily
> > SEV-SNP depends on EFI boot, this typically only happens on kexec, as
> > that is the only boot path that goes through the traditional
> > decompressor.
>
> Some quick testing showed no significant differences in kexec booting
> and testing shows everything seems to be good.
>

Thanks.

> But, in testing with non-2M sized memory (e.g. a guest with 4097M of
> memory) and without the change to how SNP is detected before
> sev_enable() is called, we hit the error path in arch_accept_memory() in
> arch/x86/boot/compressed/mem.c and the boot crashes.
>

Right. So this is because sev_snp_enabled() is based on sev_status,
which has not been set yet at this point, right?

And for the record, could you please indicate whether you are ok with
the co-developed-by/signed-off-by credits on this patch (and
subsequent revisions)?

Re: [PATCH v3] x86/boot/sev: Avoid shared GHCB page for early memory acceptance

Posted by Tom Lendacky 9 months, 3 weeks ago

On 4/17/25 11:14, Ard Biesheuvel wrote:
> On Thu, 17 Apr 2025 at 18:08, Tom Lendacky <thomas.lendacky@amd.com> wrote:
>>
>> On 4/11/25 14:00, Ard Biesheuvel wrote:
>>> On Fri, 11 Apr 2025 at 20:40, Borislav Petkov <bp@alien8.de> wrote:
>>>>
>>>> On Thu, Apr 10, 2025 at 03:28:51PM +0200, Ard Biesheuvel wrote:
>>>>> From: Ard Biesheuvel <ardb@kernel.org>
>>>>>
>>>>> Communicating with the hypervisor using the shared GHCB page requires
>>>>> clearing the C bit in the mapping of that page. When executing in the
>>>>> context of the EFI boot services, the page tables are owned by the
>>>>> firmware, and this manipulation is not possible.
>>>>>
>>>>> So switch to a different API for accepting memory in SEV-SNP guests, one
>>>>
>>>> That being the GHCB MSR protocol, it seems.
>>>>
>>>
>>> Yes.
>>>
>>>> And since Tom co-developed, I guess we wanna do that.
>>>>
>>>> But then how much slower do we become?
>>>>
>>>
>>> Non-EFI stub boot will become slower if the memory that is used to
>>> decompress the kernel has not been accepted yet. But given how heavily
>>> SEV-SNP depends on EFI boot, this typically only happens on kexec, as
>>> that is the only boot path that goes through the traditional
>>> decompressor.
>>
>> Some quick testing showed no significant differences in kexec booting
>> and testing shows everything seems to be good.
>>
> 
> Thanks.
> 
>> But, in testing with non-2M sized memory (e.g. a guest with 4097M of
>> memory) and without the change to how SNP is detected before
>> sev_enable() is called, we hit the error path in arch_accept_memory() in
>> arch/x86/boot/compressed/mem.c and the boot crashes.
>>
> 
> Right. So this is because sev_snp_enabled() is based on sev_status,
> which has not been set yet at this point, right?

Correct.

> 
> And for the record, could you please indicate whether you are ok with
> the co-developed-by/signed-off-by credits on this patch (and
> subsequent revisions)?

Yep, I'm fine with that.

Thanks,
Tom

Re: [PATCH v3] x86/boot/sev: Avoid shared GHCB page for early memory acceptance

Posted by Ard Biesheuvel 9 months, 3 weeks ago

On Thu, 17 Apr 2025 at 18:21, Tom Lendacky <thomas.lendacky@amd.com> wrote:
>
> On 4/17/25 11:14, Ard Biesheuvel wrote:
> > On Thu, 17 Apr 2025 at 18:08, Tom Lendacky <thomas.lendacky@amd.com> wrote:
> >>
> >> On 4/11/25 14:00, Ard Biesheuvel wrote:
> >>> On Fri, 11 Apr 2025 at 20:40, Borislav Petkov <bp@alien8.de> wrote:
> >>>>
> >>>> On Thu, Apr 10, 2025 at 03:28:51PM +0200, Ard Biesheuvel wrote:
> >>>>> From: Ard Biesheuvel <ardb@kernel.org>
> >>>>>
> >>>>> Communicating with the hypervisor using the shared GHCB page requires
> >>>>> clearing the C bit in the mapping of that page. When executing in the
> >>>>> context of the EFI boot services, the page tables are owned by the
> >>>>> firmware, and this manipulation is not possible.
> >>>>>
> >>>>> So switch to a different API for accepting memory in SEV-SNP guests, one
> >>>>
> >>>> That being the GHCB MSR protocol, it seems.
> >>>>
> >>>
> >>> Yes.
> >>>
> >>>> And since Tom co-developed, I guess we wanna do that.
> >>>>
> >>>> But then how much slower do we become?
> >>>>
> >>>
> >>> Non-EFI stub boot will become slower if the memory that is used to
> >>> decompress the kernel has not been accepted yet. But given how heavily
> >>> SEV-SNP depends on EFI boot, this typically only happens on kexec, as
> >>> that is the only boot path that goes through the traditional
> >>> decompressor.
> >>
> >> Some quick testing showed no significant differences in kexec booting
> >> and testing shows everything seems to be good.
> >>
> >
> > Thanks.
> >
> >> But, in testing with non-2M sized memory (e.g. a guest with 4097M of
> >> memory) and without the change to how SNP is detected before
> >> sev_enable() is called, we hit the error path in arch_accept_memory() in
> >> arch/x86/boot/compressed/mem.c and the boot crashes.
> >>
> >
> > Right. So this is because sev_snp_enabled() is based on sev_status,
> > which has not been set yet at this point, right?
>
> Correct.
>

OK. Would this do the trick? (with asm/sev.h added to the #includes)

--- a/arch/x86/boot/compressed/mem.c
+++ b/arch/x86/boot/compressed/mem.c
@@ -34,11 +34,14 @@ static bool early_is_tdx_guest(void)

 void arch_accept_memory(phys_addr_t start, phys_addr_t end)
 {
+       static bool sevsnp;
+
        /* Platform-specific memory-acceptance call goes here */
        if (early_is_tdx_guest()) {
                if (!tdx_accept_memory(start, end))
                        panic("TDX: Failed to accept memory\n");
-       } else if (sev_snp_enabled()) {
+       } else if (sevsnp || (sev_get_status() & MSR_AMD64_SEV_SNP_ENABLED)) {
+               sevsnp = true;
                snp_accept_memory(start, end);
        } else {
                error("Cannot accept memory: unknown platform\n");

> >
> > And for the record, could you please indicate whether you are ok with
> > the co-developed-by/signed-off-by credits on this patch (and
> > subsequent revisions)?
>
> Yep, I'm fine with that.
>

Thanks.

Re: [PATCH v3] x86/boot/sev: Avoid shared GHCB page for early memory acceptance

Posted by Tom Lendacky 9 months, 3 weeks ago

On 4/17/25 11:38, Ard Biesheuvel wrote:
> On Thu, 17 Apr 2025 at 18:21, Tom Lendacky <thomas.lendacky@amd.com> wrote:
>>
>> On 4/17/25 11:14, Ard Biesheuvel wrote:
>>> On Thu, 17 Apr 2025 at 18:08, Tom Lendacky <thomas.lendacky@amd.com> wrote:
>>>>
>>>> On 4/11/25 14:00, Ard Biesheuvel wrote:
>>>>> On Fri, 11 Apr 2025 at 20:40, Borislav Petkov <bp@alien8.de> wrote:
>>>>>>
>>>>>> On Thu, Apr 10, 2025 at 03:28:51PM +0200, Ard Biesheuvel wrote:
>>>>>>> From: Ard Biesheuvel <ardb@kernel.org>
>>>>>>>
>>>>>>> Communicating with the hypervisor using the shared GHCB page requires
>>>>>>> clearing the C bit in the mapping of that page. When executing in the
>>>>>>> context of the EFI boot services, the page tables are owned by the
>>>>>>> firmware, and this manipulation is not possible.
>>>>>>>
>>>>>>> So switch to a different API for accepting memory in SEV-SNP guests, one
>>>>>>
>>>>>> That being the GHCB MSR protocol, it seems.
>>>>>>
>>>>>
>>>>> Yes.
>>>>>
>>>>>> And since Tom co-developed, I guess we wanna do that.
>>>>>>
>>>>>> But then how much slower do we become?
>>>>>>
>>>>>
>>>>> Non-EFI stub boot will become slower if the memory that is used to
>>>>> decompress the kernel has not been accepted yet. But given how heavily
>>>>> SEV-SNP depends on EFI boot, this typically only happens on kexec, as
>>>>> that is the only boot path that goes through the traditional
>>>>> decompressor.
>>>>
>>>> Some quick testing showed no significant differences in kexec booting
>>>> and testing shows everything seems to be good.
>>>>
>>>
>>> Thanks.
>>>
>>>> But, in testing with non-2M sized memory (e.g. a guest with 4097M of
>>>> memory) and without the change to how SNP is detected before
>>>> sev_enable() is called, we hit the error path in arch_accept_memory() in
>>>> arch/x86/boot/compressed/mem.c and the boot crashes.
>>>>
>>>
>>> Right. So this is because sev_snp_enabled() is based on sev_status,
>>> which has not been set yet at this point, right?
>>
>> Correct.
>>
> 
> OK. Would this do the trick? (with asm/sev.h added to the #includes)

Yes, that works for booting. Let me do some kexec testing and get back
to you. Sorry, that might not be until tomorrow, though.

Thanks,
Tom

> 
> --- a/arch/x86/boot/compressed/mem.c
> +++ b/arch/x86/boot/compressed/mem.c
> @@ -34,11 +34,14 @@ static bool early_is_tdx_guest(void)
> 
>  void arch_accept_memory(phys_addr_t start, phys_addr_t end)
>  {
> +       static bool sevsnp;
> +
>         /* Platform-specific memory-acceptance call goes here */
>         if (early_is_tdx_guest()) {
>                 if (!tdx_accept_memory(start, end))
>                         panic("TDX: Failed to accept memory\n");
> -       } else if (sev_snp_enabled()) {
> +       } else if (sevsnp || (sev_get_status() & MSR_AMD64_SEV_SNP_ENABLED)) {
> +               sevsnp = true;
>                 snp_accept_memory(start, end);
>         } else {
>                 error("Cannot accept memory: unknown platform\n");
> 
>>>
>>> And for the record, could you please indicate whether you are ok with
>>> the co-developed-by/signed-off-by credits on this patch (and
>>> subsequent revisions)?
>>
>> Yep, I'm fine with that.
>>
> 
> Thanks.

Re: [PATCH v3] x86/boot/sev: Avoid shared GHCB page for early memory acceptance

Posted by Tom Lendacky 9 months, 3 weeks ago

On 4/17/25 12:26, Tom Lendacky wrote:
> On 4/17/25 11:38, Ard Biesheuvel wrote:
>> On Thu, 17 Apr 2025 at 18:21, Tom Lendacky <thomas.lendacky@amd.com> wrote:
>>>
>>> On 4/17/25 11:14, Ard Biesheuvel wrote:
>>>> On Thu, 17 Apr 2025 at 18:08, Tom Lendacky <thomas.lendacky@amd.com> wrote:
>>>>>
>>>>> On 4/11/25 14:00, Ard Biesheuvel wrote:
>>>>>> On Fri, 11 Apr 2025 at 20:40, Borislav Petkov <bp@alien8.de> wrote:
>>>>>>>
>>>>>>> On Thu, Apr 10, 2025 at 03:28:51PM +0200, Ard Biesheuvel wrote:
>>>>>>>> From: Ard Biesheuvel <ardb@kernel.org>
>>>>>>>>
>>>>>>>> Communicating with the hypervisor using the shared GHCB page requires
>>>>>>>> clearing the C bit in the mapping of that page. When executing in the
>>>>>>>> context of the EFI boot services, the page tables are owned by the
>>>>>>>> firmware, and this manipulation is not possible.
>>>>>>>>
>>>>>>>> So switch to a different API for accepting memory in SEV-SNP guests, one
>>>>>>>
>>>>>>> That being the GHCB MSR protocol, it seems.
>>>>>>>
>>>>>>
>>>>>> Yes.
>>>>>>
>>>>>>> And since Tom co-developed, I guess we wanna do that.
>>>>>>>
>>>>>>> But then how much slower do we become?
>>>>>>>
>>>>>>
>>>>>> Non-EFI stub boot will become slower if the memory that is used to
>>>>>> decompress the kernel has not been accepted yet. But given how heavily
>>>>>> SEV-SNP depends on EFI boot, this typically only happens on kexec, as
>>>>>> that is the only boot path that goes through the traditional
>>>>>> decompressor.
>>>>>
>>>>> Some quick testing showed no significant differences in kexec booting
>>>>> and testing shows everything seems to be good.
>>>>>
>>>>
>>>> Thanks.
>>>>
>>>>> But, in testing with non-2M sized memory (e.g. a guest with 4097M of
>>>>> memory) and without the change to how SNP is detected before
>>>>> sev_enable() is called, we hit the error path in arch_accept_memory() in
>>>>> arch/x86/boot/compressed/mem.c and the boot crashes.
>>>>>
>>>>
>>>> Right. So this is because sev_snp_enabled() is based on sev_status,
>>>> which has not been set yet at this point, right?
>>>
>>> Correct.
>>>
>>
>> OK. Would this do the trick? (with asm/sev.h added to the #includes)
> 
> Yes, that works for booting. Let me do some kexec testing and get back
> to you. Sorry, that might not be until tomorrow, though.

Ok, found some time... looks good with kexec, too.

Thanks,
Tom

> 
> Thanks,
> Tom
> 
>>
>> --- a/arch/x86/boot/compressed/mem.c
>> +++ b/arch/x86/boot/compressed/mem.c
>> @@ -34,11 +34,14 @@ static bool early_is_tdx_guest(void)
>>
>>  void arch_accept_memory(phys_addr_t start, phys_addr_t end)
>>  {
>> +       static bool sevsnp;
>> +
>>         /* Platform-specific memory-acceptance call goes here */
>>         if (early_is_tdx_guest()) {
>>                 if (!tdx_accept_memory(start, end))
>>                         panic("TDX: Failed to accept memory\n");
>> -       } else if (sev_snp_enabled()) {
>> +       } else if (sevsnp || (sev_get_status() & MSR_AMD64_SEV_SNP_ENABLED)) {
>> +               sevsnp = true;
>>                 snp_accept_memory(start, end);
>>         } else {
>>                 error("Cannot accept memory: unknown platform\n");
>>
>>>>
>>>> And for the record, could you please indicate whether you are ok with
>>>> the co-developed-by/signed-off-by credits on this patch (and
>>>> subsequent revisions)?
>>>
>>> Yep, I'm fine with that.
>>>
>>
>> Thanks.

Re: [PATCH v3] x86/boot/sev: Avoid shared GHCB page for early memory acceptance

Posted by Ard Biesheuvel 9 months, 3 weeks ago

On Thu, 17 Apr 2025 at 22:01, Tom Lendacky <thomas.lendacky@amd.com> wrote:
>
> On 4/17/25 12:26, Tom Lendacky wrote:
> > On 4/17/25 11:38, Ard Biesheuvel wrote:
> >> On Thu, 17 Apr 2025 at 18:21, Tom Lendacky <thomas.lendacky@amd.com> wrote:
> >>>
> >>> On 4/17/25 11:14, Ard Biesheuvel wrote:
> >>>> On Thu, 17 Apr 2025 at 18:08, Tom Lendacky <thomas.lendacky@amd.com> wrote:
> >>>>>
> >>>>> On 4/11/25 14:00, Ard Biesheuvel wrote:
> >>>>>> On Fri, 11 Apr 2025 at 20:40, Borislav Petkov <bp@alien8.de> wrote:
> >>>>>>>
> >>>>>>> On Thu, Apr 10, 2025 at 03:28:51PM +0200, Ard Biesheuvel wrote:
> >>>>>>>> From: Ard Biesheuvel <ardb@kernel.org>
> >>>>>>>>
> >>>>>>>> Communicating with the hypervisor using the shared GHCB page requires
> >>>>>>>> clearing the C bit in the mapping of that page. When executing in the
> >>>>>>>> context of the EFI boot services, the page tables are owned by the
> >>>>>>>> firmware, and this manipulation is not possible.
> >>>>>>>>
> >>>>>>>> So switch to a different API for accepting memory in SEV-SNP guests, one
> >>>>>>>
> >>>>>>> That being the GHCB MSR protocol, it seems.
> >>>>>>>
> >>>>>>
> >>>>>> Yes.
> >>>>>>
> >>>>>>> And since Tom co-developed, I guess we wanna do that.
> >>>>>>>
> >>>>>>> But then how much slower do we become?
> >>>>>>>
> >>>>>>
> >>>>>> Non-EFI stub boot will become slower if the memory that is used to
> >>>>>> decompress the kernel has not been accepted yet. But given how heavily
> >>>>>> SEV-SNP depends on EFI boot, this typically only happens on kexec, as
> >>>>>> that is the only boot path that goes through the traditional
> >>>>>> decompressor.
> >>>>>
> >>>>> Some quick testing showed no significant differences in kexec booting
> >>>>> and testing shows everything seems to be good.
> >>>>>
> >>>>
> >>>> Thanks.
> >>>>
> >>>>> But, in testing with non-2M sized memory (e.g. a guest with 4097M of
> >>>>> memory) and without the change to how SNP is detected before
> >>>>> sev_enable() is called, we hit the error path in arch_accept_memory() in
> >>>>> arch/x86/boot/compressed/mem.c and the boot crashes.
> >>>>>
> >>>>
> >>>> Right. So this is because sev_snp_enabled() is based on sev_status,
> >>>> which has not been set yet at this point, right?
> >>>
> >>> Correct.
> >>>
> >>
> >> OK. Would this do the trick? (with asm/sev.h added to the #includes)
> >
> > Yes, that works for booting. Let me do some kexec testing and get back
> > to you. Sorry, that might not be until tomorrow, though.
>
> Ok, found some time... looks good with kexec, too.
>

Thanks!

Re: [PATCH v3] x86/boot/sev: Avoid shared GHCB page for early memory acceptance

Posted by Ingo Molnar 9 months, 3 weeks ago

* Ard Biesheuvel <ardb@kernel.org> wrote:

> > >> OK. Would this do the trick? (with asm/sev.h added to the #includes)
> > >
> > > Yes, that works for booting. Let me do some kexec testing and get back
> > > to you. Sorry, that might not be until tomorrow, though.
> >
> > Ok, found some time... looks good with kexec, too.
> >
> 
> Thanks!

I've turned this into this tag for the -v4 patch:

 Tested-by: Tom Lendacky <thomas.lendacky@amd.com>

Thanks,

	Ingo

Re: [PATCH v3] x86/boot/sev: Avoid shared GHCB page for early memory acceptance

Posted by Borislav Petkov 10 months ago

On Fri, Apr 11, 2025 at 09:00:51PM +0200, Ard Biesheuvel wrote:
> ---%<---
> The GHCB shared page method never worked for accepting memory from the
> EFI stub, but this is rarely needed in practice: when using the higher
> level page allocation APIs, the firmware will make sure that memory is
> accepted before it is returned. The only use case for explicit memory
> acceptance by the EFI stub is when populating the 'unaccepted memory'
> bitmap, which tracks unaccepted memory at a 2MB granularity, and so
> chunks of unaccepted memory that are misaligned wrt that are accepted
> without being allocated or used.
> ---%<---
> 
> > I'm somehow missing that aspect of why that change is warranted...
> >
> 
> This never worked correctly for SEV-SNP, we're just lucky the firmware
> appears to accept memory in 2+ MB batches and so these misaligned
> chunks are rare in practice. Tom did manage to trigger it IIUC by
> giving a VM an amount of memory that is not a multiple of 2M.

Can you pls put that lenghty explanation in the commit message?

It is useful and we might go back to it in the future.

Thx.

[tip: x86/boot] x86/boot/sev: Avoid shared GHCB page for early memory acceptance

Posted by tip-bot2 for Ard Biesheuvel 10 months ago

The following commit has been merged into the x86/boot branch of tip:

Commit-ID:     5871c9d650bc0ccb16ed8444f16045808ba936bf
Gitweb:        https://git.kernel.org/tip/5871c9d650bc0ccb16ed8444f16045808ba936bf
Author:        Ard Biesheuvel <ardb@kernel.org>
AuthorDate:    Thu, 10 Apr 2025 15:28:51 +02:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Sat, 12 Apr 2025 20:43:41 +02:00

x86/boot/sev: Avoid shared GHCB page for early memory acceptance

Communicating with the hypervisor using the shared GHCB page requires
clearing the C bit in the mapping of that page. When executing in the
context of the EFI boot services, the page tables are owned by the
firmware, and this manipulation is not possible.

So switch to a different API for accepting memory in SEV-SNP guests, one
which is actually supported at the point during boot where the EFI stub
may need to accept memory, but the SEV-SNP init code has not executed
yet.

For simplicity, also switch the memory acceptance carried out by the
decompressor when not booting via EFI - this only involves the
allocation for the decompressed kernel, and is generally only called
after kexec, as normal boot will jump straight into the kernel from the
EFI stub.

Non-EFI stub boot will become slower if the memory that is used to
decompress the kernel has not been accepted yet. But given how heavily
SEV-SNP depends on EFI boot, this typically only happens on kexec, as
that is the only boot path that goes through the traditional
decompressor.

The GHCB shared page method never worked for accepting memory from the
EFI stub, but this is rarely needed in practice: when using the higher
level page allocation APIs, the firmware will make sure that memory is
accepted before it is returned. The only use case for explicit memory
acceptance by the EFI stub is when populating the 'unaccepted memory'
bitmap, which tracks unaccepted memory at a 2MB granularity, and so
chunks of unaccepted memory that are misaligned wrt that are accepted
without being allocated or used.

AFAICS this never worked correctly for SEV-SNP, we're just lucky
the firmware appears to accept memory in 2+ MB batches and so
these misaligned chunks are rare in practice. Tom did manage to
trigger it IIUC by giving a VM an amount of memory that is not a
multiple of 2M.

Co-developed-by: Tom Lendacky <thomas.lendacky@amd.com>

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Dionna Amalie Glaze <dionnaglaze@google.com>
Cc: Kevin Loughlin <kevinloughlin@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: linux-efi@vger.kernel.org
Link: https://lore.kernel.org/r/20250410132850.3708703-2-ardb+git@google.com
---
 arch/x86/boot/compressed/sev.c | 67 +++++++--------------------------
 1 file changed, 15 insertions(+), 52 deletions(-)

diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 6eadd79..478eca4 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -169,10 +169,7 @@ bool sev_snp_enabled(void)
 
 static void __page_state_change(unsigned long paddr, enum psc_op op)
 {
-	u64 val;
-
-	if (!sev_snp_enabled())
-		return;
+	u64 val, msr;
 
 	/*
 	 * If private -> shared then invalidate the page before requesting the
@@ -181,6 +178,9 @@ static void __page_state_change(unsigned long paddr, enum psc_op op)
 	if (op == SNP_PAGE_STATE_SHARED)
 		pvalidate_4k_page(paddr, paddr, false);
 
+	/* Save the current GHCB MSR value */
+	msr = sev_es_rd_ghcb_msr();
+
 	/* Issue VMGEXIT to change the page state in RMP table. */
 	sev_es_wr_ghcb_msr(GHCB_MSR_PSC_REQ_GFN(paddr >> PAGE_SHIFT, op));
 	VMGEXIT();
@@ -190,6 +190,9 @@ static void __page_state_change(unsigned long paddr, enum psc_op op)
 	if ((GHCB_RESP_CODE(val) != GHCB_MSR_PSC_RESP) || GHCB_MSR_PSC_RESP_VAL(val))
 		sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);
 
+	/* Restore the GHCB MSR value */
+	sev_es_wr_ghcb_msr(msr);
+
 	/*
 	 * Now that page state is changed in the RMP table, validate it so that it is
 	 * consistent with the RMP entry.
@@ -200,11 +203,17 @@ static void __page_state_change(unsigned long paddr, enum psc_op op)
 
 void snp_set_page_private(unsigned long paddr)
 {
+	if (!sev_snp_enabled())
+		return;
+
 	__page_state_change(paddr, SNP_PAGE_STATE_PRIVATE);
 }
 
 void snp_set_page_shared(unsigned long paddr)
 {
+	if (!sev_snp_enabled())
+		return;
+
 	__page_state_change(paddr, SNP_PAGE_STATE_SHARED);
 }
 
@@ -228,56 +237,10 @@ static bool early_setup_ghcb(void)
 	return true;
 }
 
-static phys_addr_t __snp_accept_memory(struct snp_psc_desc *desc,
-				       phys_addr_t pa, phys_addr_t pa_end)
-{
-	struct psc_hdr *hdr;
-	struct psc_entry *e;
-	unsigned int i;
-
-	hdr = &desc->hdr;
-	memset(hdr, 0, sizeof(*hdr));
-
-	e = desc->entries;
-
-	i = 0;
-	while (pa < pa_end && i < VMGEXIT_PSC_MAX_ENTRY) {
-		hdr->end_entry = i;
-
-		e->gfn = pa >> PAGE_SHIFT;
-		e->operation = SNP_PAGE_STATE_PRIVATE;
-		if (IS_ALIGNED(pa, PMD_SIZE) && (pa_end - pa) >= PMD_SIZE) {
-			e->pagesize = RMP_PG_SIZE_2M;
-			pa += PMD_SIZE;
-		} else {
-			e->pagesize = RMP_PG_SIZE_4K;
-			pa += PAGE_SIZE;
-		}
-
-		e++;
-		i++;
-	}
-
-	if (vmgexit_psc(boot_ghcb, desc))
-		sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);
-
-	pvalidate_pages(desc);
-
-	return pa;
-}
-
 void snp_accept_memory(phys_addr_t start, phys_addr_t end)
 {
-	struct snp_psc_desc desc = {};
-	unsigned int i;
-	phys_addr_t pa;
-
-	if (!boot_ghcb && !early_setup_ghcb())
-		sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);
-
-	pa = start;
-	while (pa < end)
-		pa = __snp_accept_memory(&desc, pa, end);
+	for (phys_addr_t pa = start; pa < end; pa += PAGE_SIZE)
+		__page_state_change(pa, SNP_PAGE_STATE_PRIVATE);
 }
 
 void sev_es_shutdown_ghcb(void)

Re: [tip: x86/boot] x86/boot/sev: Avoid shared GHCB page for early memory acceptance

Posted by Ard Biesheuvel 10 months ago

On Sat, 12 Apr 2025 at 22:29, tip-bot2 for Ard Biesheuvel
<tip-bot2@linutronix.de> wrote:
>
> The following commit has been merged into the x86/boot branch of tip:
>

This may be slightly premature. I took some of Tom's code, hence the
co-developed-by, but the should really confirm that what I did is
correct before we queue this up.

Re: [tip: x86/boot] x86/boot/sev: Avoid shared GHCB page for early memory acceptance

Posted by Ingo Molnar 10 months ago

* Ard Biesheuvel <ardb@kernel.org> wrote:

> On Sat, 12 Apr 2025 at 22:29, tip-bot2 for Ard Biesheuvel
> <tip-bot2@linutronix.de> wrote:
> >
> > The following commit has been merged into the x86/boot branch of tip:
> >
> 
> This may be slightly premature. I took some of Tom's code, hence the
> co-developed-by, but the should really confirm that what I did is
> correct before we queue this up.

OK, I've zapped it again, especially as the rest of the series wasn't 
ready either, please include the latest version of this patch as part 
of the boot/setup/ series, which hard-relies upon it.

Thanks,

	Ingo

Re: [tip: x86/boot] x86/boot/sev: Avoid shared GHCB page for early memory acceptance

Posted by Ard Biesheuvel 9 months, 3 weeks ago

On Sat, 12 Apr 2025 at 22:45, Ingo Molnar <mingo@kernel.org> wrote:
>
>
> * Ard Biesheuvel <ardb@kernel.org> wrote:
>
> > On Sat, 12 Apr 2025 at 22:29, tip-bot2 for Ard Biesheuvel
> > <tip-bot2@linutronix.de> wrote:
> > >
> > > The following commit has been merged into the x86/boot branch of tip:
> > >
> >
> > This may be slightly premature. I took some of Tom's code, hence the
> > co-developed-by, but the should really confirm that what I did is
> > correct before we queue this up.
>
> OK, I've zapped it again, especially as the rest of the series wasn't
> ready either, please include the latest version of this patch as part
> of the boot/setup/ series, which hard-relies upon it.
>

I have sent out a v4 here [0].

I am not including it in the next rev of the startup/ refactor series,
given that this change is a fix that also needs to go to stable.
Please apply it as a fix and merge back the branch into tip/x86/boot -
I will rebase the startup/ refactor series on top of that.

Thanks,

[0] https://lore.kernel.org/linux-efi/20250417202120.1002102-2-ardb+git@google.com/T/#u

Re: [tip: x86/boot] x86/boot/sev: Avoid shared GHCB page for early memory acceptance

Posted by Ingo Molnar 9 months, 3 weeks ago

* Ard Biesheuvel <ardb@kernel.org> wrote:

> On Sat, 12 Apr 2025 at 22:45, Ingo Molnar <mingo@kernel.org> wrote:
> >
> >
> > * Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > > On Sat, 12 Apr 2025 at 22:29, tip-bot2 for Ard Biesheuvel
> > > <tip-bot2@linutronix.de> wrote:
> > > >
> > > > The following commit has been merged into the x86/boot branch of tip:
> > > >
> > >
> > > This may be slightly premature. I took some of Tom's code, hence the
> > > co-developed-by, but the should really confirm that what I did is
> > > correct before we queue this up.
> >
> > OK, I've zapped it again, especially as the rest of the series wasn't
> > ready either, please include the latest version of this patch as part
> > of the boot/setup/ series, which hard-relies upon it.
> >
> 
> I have sent out a v4 here [0].
> 
> I am not including it in the next rev of the startup/ refactor series,
> given that this change is a fix that also needs to go to stable.
> Please apply it as a fix and merge back the branch into tip/x86/boot -
> I will rebase the startup/ refactor series on top of that.
> 
> Thanks,
> 
> [0] https://lore.kernel.org/linux-efi/20250417202120.1002102-2-ardb+git@google.com/T/#u

Noted, thanks for the heads up!

	Ingo

Re: [tip: x86/boot] x86/boot/sev: Avoid shared GHCB page for early memory acceptance

Posted by Ingo Molnar 9 months, 3 weeks ago

* Ingo Molnar <mingo@kernel.org> wrote:

> 
> * Ard Biesheuvel <ardb@kernel.org> wrote:
> 
> > On Sat, 12 Apr 2025 at 22:45, Ingo Molnar <mingo@kernel.org> wrote:
> > >
> > >
> > > * Ard Biesheuvel <ardb@kernel.org> wrote:
> > >
> > > > On Sat, 12 Apr 2025 at 22:29, tip-bot2 for Ard Biesheuvel
> > > > <tip-bot2@linutronix.de> wrote:
> > > > >
> > > > > The following commit has been merged into the x86/boot branch of tip:
> > > > >
> > > >
> > > > This may be slightly premature. I took some of Tom's code, hence the
> > > > co-developed-by, but the should really confirm that what I did is
> > > > correct before we queue this up.
> > >
> > > OK, I've zapped it again, especially as the rest of the series wasn't
> > > ready either, please include the latest version of this patch as part
> > > of the boot/setup/ series, which hard-relies upon it.
> > >
> > 
> > I have sent out a v4 here [0].
> > 
> > I am not including it in the next rev of the startup/ refactor series,
> > given that this change is a fix that also needs to go to stable.
> > Please apply it as a fix and merge back the branch into tip/x86/boot -
> > I will rebase the startup/ refactor series on top of that.
> > 
> > Thanks,
> > 
> > [0] https://lore.kernel.org/linux-efi/20250417202120.1002102-2-ardb+git@google.com/T/#u
> 
> Noted, thanks for the heads up!

So there's this new merge commit in tip:x86/boot:

   54f71aa90c84 Merge branch 'x86/urgent' into x86/boot, to merge dependent commit and fixes

which brings this fix into x86/boot:

   a718833cb456 ("x86/boot/sev: Avoid shared GHCB page for early memory acceptance")

thus 54f71aa90c84 should be a proper base the rest of the startup/ 
series, right?

Thanks,

	Ingo

Re: [tip: x86/boot] x86/boot/sev: Avoid shared GHCB page for early memory acceptance

Posted by Ard Biesheuvel 9 months, 3 weeks ago

On Fri, 18 Apr 2025 at 10:14, Ingo Molnar <mingo@kernel.org> wrote:
>
>
> * Ingo Molnar <mingo@kernel.org> wrote:
>
...
>
> So there's this new merge commit in tip:x86/boot:
>
>    54f71aa90c84 Merge branch 'x86/urgent' into x86/boot, to merge dependent commit and fixes
>
> which brings this fix into x86/boot:
>
>    a718833cb456 ("x86/boot/sev: Avoid shared GHCB page for early memory acceptance")
>
> thus 54f71aa90c84 should be a proper base the rest of the startup/
> series, right?
>

Yes, perfect. Thanks.

Re: [tip: x86/boot] x86/boot/sev: Avoid shared GHCB page for early memory acceptance

Posted by Ard Biesheuvel 10 months ago

On Sat, 12 Apr 2025 at 22:45, Ingo Molnar <mingo@kernel.org> wrote:
>
>
> * Ard Biesheuvel <ardb@kernel.org> wrote:
>
> > On Sat, 12 Apr 2025 at 22:29, tip-bot2 for Ard Biesheuvel
> > <tip-bot2@linutronix.de> wrote:
> > >
> > > The following commit has been merged into the x86/boot branch of tip:
> > >
> >
> > This may be slightly premature. I took some of Tom's code, hence the
> > co-developed-by, but the should really confirm that what I did is
> > correct before we queue this up.
>
> OK, I've zapped it again, especially as the rest of the series wasn't
> ready either, please include the latest version of this patch as part
> of the boot/setup/ series, which hard-relies upon it.
>

OK