[PATCH v3] x86/sev: Do not touch VMSA pages during kdump of SNP guest memory
Posted by Ashish Kalra 9 months, 2 weeks ago
From: Ashish Kalra <ashish.kalra@amd.com>

When kdump is running makedumpfile to generate vmcore and dumping SNP
guest memory it touches the VMSA page of the vCPU executing kdump which
then results in unrecoverable #NPF/RMP faults as the VMSA page is
marked busy/in-use when the vCPU is running and subsequently causes
guest softlockup/hang.

Additionally other APs may be halted in guest mode and their VMSA pages
are marked busy and touching these VMSA pages during guest memory dump
will also cause #NPF.

Issue AP_DESTROY GHCB calls on other APs to ensure they are kicked out
of guest mode and then clear the VMSA bit on their VMSA pages.

If the vCPU running kdump is an AP, mark it's VMSA page as offline to
ensure that makedumpfile excludes that page while dumping guest memory.

Cc: stable@vger.kernel.org
Fixes: 3074152e56c9 ("x86/sev: Convert shared memory back to private on kexec")
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 arch/x86/coco/sev/core.c | 241 +++++++++++++++++++++++++--------------
 1 file changed, 155 insertions(+), 86 deletions(-)

diff --git a/arch/x86/coco/sev/core.c b/arch/x86/coco/sev/core.c
index dcfaa698d6cf..f4eb5b645239 100644
--- a/arch/x86/coco/sev/core.c
+++ b/arch/x86/coco/sev/core.c
@@ -877,6 +877,99 @@ void snp_accept_memory(phys_addr_t start, phys_addr_t end)
 	set_pages_state(vaddr, npages, SNP_PAGE_STATE_PRIVATE);
 }
 
+static int vmgexit_ap_control(u64 event, struct sev_es_save_area *vmsa, u32 apic_id)
+{
+	struct ghcb_state state;
+	unsigned long flags;
+	struct ghcb *ghcb;
+	int ret = 0;
+
+	local_irq_save(flags);
+
+	ghcb = __sev_get_ghcb(&state);
+
+	vc_ghcb_invalidate(ghcb);
+	if (event == SVM_VMGEXIT_AP_CREATE)
+		ghcb_set_rax(ghcb, vmsa->sev_features);
+	ghcb_set_sw_exit_code(ghcb, SVM_VMGEXIT_AP_CREATION);
+	ghcb_set_sw_exit_info_1(ghcb,
+				((u64)apic_id << 32)	|
+				((u64)snp_vmpl << 16)	|
+				event);
+	ghcb_set_sw_exit_info_2(ghcb, __pa(vmsa));
+
+	sev_es_wr_ghcb_msr(__pa(ghcb));
+	VMGEXIT();
+
+	if (!ghcb_sw_exit_info_1_is_valid(ghcb) ||
+	    lower_32_bits(ghcb->save.sw_exit_info_1)) {
+		pr_err("SNP AP %s error\n", (event == SVM_VMGEXIT_AP_CREATE ? "CREATE" : "DESTROY"));
+		ret = -EINVAL;
+	}
+
+	__sev_put_ghcb(&state);
+
+	local_irq_restore(flags);
+
+	return ret;
+}
+
+static int snp_set_vmsa(void *va, void *caa, int apic_id, bool make_vmsa)
+{
+	int ret;
+
+	if (snp_vmpl) {
+		struct svsm_call call = {};
+		unsigned long flags;
+
+		local_irq_save(flags);
+
+		call.caa = this_cpu_read(svsm_caa);
+		call.rcx = __pa(va);
+
+		if (make_vmsa) {
+			/* Protocol 0, Call ID 2 */
+			call.rax = SVSM_CORE_CALL(SVSM_CORE_CREATE_VCPU);
+			call.rdx = __pa(caa);
+			call.r8  = apic_id;
+		} else {
+			/* Protocol 0, Call ID 3 */
+			call.rax = SVSM_CORE_CALL(SVSM_CORE_DELETE_VCPU);
+		}
+
+		ret = svsm_perform_call_protocol(&call);
+
+		local_irq_restore(flags);
+	} else {
+		/*
+		 * If the kernel runs at VMPL0, it can change the VMSA
+		 * bit for a page using the RMPADJUST instruction.
+		 * However, for the instruction to succeed it must
+		 * target the permissions of a lesser privileged (higher
+		 * numbered) VMPL level, so use VMPL1.
+		 */
+		u64 attrs = 1;
+
+		if (make_vmsa)
+			attrs |= RMPADJUST_VMSA_PAGE_BIT;
+
+		ret = rmpadjust((unsigned long)va, RMP_PG_SIZE_4K, attrs);
+	}
+
+	return ret;
+}
+
+static void snp_cleanup_vmsa(struct sev_es_save_area *vmsa, int apic_id)
+{
+	int err;
+
+	err = snp_set_vmsa(vmsa, NULL, apic_id, false);
+	if (err)
+		pr_err("clear VMSA page failed (%u), leaking page\n", err);
+	else
+		free_page((unsigned long)vmsa);
+}
+
 static void set_pte_enc(pte_t *kpte, int level, void *va)
 {
 	struct pte_enc_desc d = {
@@ -973,6 +1066,65 @@ void snp_kexec_begin(void)
 		pr_warn("Failed to stop shared<->private conversions\n");
 }
 
+/*
+ * Shutdown all APs except the one handling kexec/kdump and clearing
+ * the VMSA tag on AP's VMSA pages as they are not being used as
+ * VMSA page anymore.
+ */
+static void shutdown_all_aps(void)
+{
+	struct sev_es_save_area *vmsa;
+	int apic_id, this_cpu, cpu;
+
+	this_cpu = get_cpu();
+
+	/*
+	 * APs are already in HLT loop when enc_kexec_finish() callback
+	 * is invoked.
+	 */
+	for_each_present_cpu(cpu) {
+		vmsa = per_cpu(sev_vmsa, cpu);
+
+		/*
+		 * BSP does not have guest allocated VMSA and there is no need
+		 * to clear the VMSA tag for this page.
+		 */
+		if (!vmsa)
+			continue;
+
+		/*
+		 * Cannot clear the VMSA tag for the currently running vCPU.
+		 */
+		if (this_cpu == cpu) {
+			unsigned long pa;
+			struct page *p;
+
+			pa = __pa(vmsa);
+			/*
+			 * Mark the VMSA page of the running vCPU as offline
+			 * so that is excluded and not touched by makedumpfile
+			 * while generating vmcore during kdump.
+			 */
+			p = pfn_to_online_page(pa >> PAGE_SHIFT);
+			if (p)
+				__SetPageOffline(p);
+			continue;
+		}
+
+		apic_id = cpuid_to_apicid[cpu];
+
+		/*
+		 * Issue AP destroy to ensure AP gets kicked out of guest mode
+		 * to allow using RMPADJUST to remove the VMSA tag on it's
+		 * VMSA page.
+		 */
+		vmgexit_ap_control(SVM_VMGEXIT_AP_DESTROY, vmsa, apic_id);
+		snp_cleanup_vmsa(vmsa, apic_id);
+	}
+
+	put_cpu();
+}
+
 void snp_kexec_finish(void)
 {
 	struct sev_es_runtime_data *data;
@@ -987,6 +1139,8 @@ void snp_kexec_finish(void)
 	if (!IS_ENABLED(CONFIG_KEXEC_CORE))
 		return;
 
+	shutdown_all_aps();
+
 	unshare_all_memory();
 
 	/*
@@ -1008,51 +1162,6 @@ void snp_kexec_finish(void)
 	}
 }
 
-static int snp_set_vmsa(void *va, void *caa, int apic_id, bool make_vmsa)
-{
-	int ret;
-
-	if (snp_vmpl) {
-		struct svsm_call call = {};
-		unsigned long flags;
-
-		local_irq_save(flags);
-
-		call.caa = this_cpu_read(svsm_caa);
-		call.rcx = __pa(va);
-
-		if (make_vmsa) {
-			/* Protocol 0, Call ID 2 */
-			call.rax = SVSM_CORE_CALL(SVSM_CORE_CREATE_VCPU);
-			call.rdx = __pa(caa);
-			call.r8  = apic_id;
-		} else {
-			/* Protocol 0, Call ID 3 */
-			call.rax = SVSM_CORE_CALL(SVSM_CORE_DELETE_VCPU);
-		}
-
-		ret = svsm_perform_call_protocol(&call);
-
-		local_irq_restore(flags);
-	} else {
-		/*
-		 * If the kernel runs at VMPL0, it can change the VMSA
-		 * bit for a page using the RMPADJUST instruction.
-		 * However, for the instruction to succeed it must
-		 * target the permissions of a lesser privileged (higher
-		 * numbered) VMPL level, so use VMPL1.
-		 */
-		u64 attrs = 1;
-
-		if (make_vmsa)
-			attrs |= RMPADJUST_VMSA_PAGE_BIT;
-
-		ret = rmpadjust((unsigned long)va, RMP_PG_SIZE_4K, attrs);
-	}
-
-	return ret;
-}
-
 #define __ATTR_BASE		(SVM_SELECTOR_P_MASK | SVM_SELECTOR_S_MASK)
 #define INIT_CS_ATTRIBS		(__ATTR_BASE | SVM_SELECTOR_READ_MASK | SVM_SELECTOR_CODE_MASK)
 #define INIT_DS_ATTRIBS		(__ATTR_BASE | SVM_SELECTOR_WRITE_MASK)
@@ -1084,24 +1193,10 @@ static void *snp_alloc_vmsa_page(int cpu)
 	return page_address(p + 1);
 }
 
-static void snp_cleanup_vmsa(struct sev_es_save_area *vmsa, int apic_id)
-{
-	int err;
-
-	err = snp_set_vmsa(vmsa, NULL, apic_id, false);
-	if (err)
-		pr_err("clear VMSA page failed (%u), leaking page\n", err);
-	else
-		free_page((unsigned long)vmsa);
-}
-
 static int wakeup_cpu_via_vmgexit(u32 apic_id, unsigned long start_ip)
 {
 	struct sev_es_save_area *cur_vmsa, *vmsa;
-	struct ghcb_state state;
 	struct svsm_ca *caa;
-	unsigned long flags;
-	struct ghcb *ghcb;
 	u8 sipi_vector;
 	int cpu, ret;
 	u64 cr4;
@@ -1215,33 +1310,7 @@ static int wakeup_cpu_via_vmgexit(u32 apic_id, unsigned long start_ip)
 	}
 
 	/* Issue VMGEXIT AP Creation NAE event */
-	local_irq_save(flags);
-
-	ghcb = __sev_get_ghcb(&state);
-
-	vc_ghcb_invalidate(ghcb);
-	ghcb_set_rax(ghcb, vmsa->sev_features);
-	ghcb_set_sw_exit_code(ghcb, SVM_VMGEXIT_AP_CREATION);
-	ghcb_set_sw_exit_info_1(ghcb,
-				((u64)apic_id << 32)	|
-				((u64)snp_vmpl << 16)	|
-				SVM_VMGEXIT_AP_CREATE);
-	ghcb_set_sw_exit_info_2(ghcb, __pa(vmsa));
-
-	sev_es_wr_ghcb_msr(__pa(ghcb));
-	VMGEXIT();
-
-	if (!ghcb_sw_exit_info_1_is_valid(ghcb) ||
-	    lower_32_bits(ghcb->save.sw_exit_info_1)) {
-		pr_err("SNP AP Creation error\n");
-		ret = -EINVAL;
-	}
-
-	__sev_put_ghcb(&state);
-
-	local_irq_restore(flags);
-
-	/* Perform cleanup if there was an error */
+	ret = vmgexit_ap_control(SVM_VMGEXIT_AP_CREATE, vmsa, apic_id);
 	if (ret) {
 		snp_cleanup_vmsa(vmsa, apic_id);
 		vmsa = NULL;
-- 
2.34.1
Re: [PATCH v3] x86/sev: Do not touch VMSA pages during kdump of SNP guest memory
Posted by Borislav Petkov 9 months, 1 week ago
On Mon, Apr 28, 2025 at 09:41:51PM +0000, Ashish Kalra wrote:
> From: Ashish Kalra <ashish.kalra@amd.com>
> 
> When kdump is running makedumpfile to generate vmcore and dumping SNP
> guest memory it touches the VMSA page of the vCPU executing kdump which
> then results in unrecoverable #NPF/RMP faults as the VMSA page is
> marked busy/in-use when the vCPU is running and subsequently causes
> guest softlockup/hang.
> 
> Additionally other APs may be halted in guest mode and their VMSA pages
> are marked busy and touching these VMSA pages during guest memory dump
> will also cause #NPF.
> 
> Issue AP_DESTROY GHCB calls on other APs to ensure they are kicked out
> of guest mode and then clear the VMSA bit on their VMSA pages.
> 
> If the vCPU running kdump is an AP, mark it's VMSA page as offline to
> ensure that makedumpfile excludes that page while dumping guest memory.
> 
> Cc: stable@vger.kernel.org
> Fixes: 3074152e56c9 ("x86/sev: Convert shared memory back to private on kexec")
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> ---
>  arch/x86/coco/sev/core.c | 241 +++++++++++++++++++++++++--------------
>  1 file changed, 155 insertions(+), 86 deletions(-)

Some minor cleanups on top:

diff --git a/arch/x86/coco/sev/core.c b/arch/x86/coco/sev/core.c
index b031cabb2ccf..9ac902d022bf 100644
--- a/arch/x86/coco/sev/core.c
+++ b/arch/x86/coco/sev/core.c
@@ -961,6 +961,7 @@ void snp_accept_memory(phys_addr_t start, phys_addr_t end)
 
 static int vmgexit_ap_control(u64 event, struct sev_es_save_area *vmsa, u32 apic_id)
 {
+	bool create = event == SVM_VMGEXIT_AP_CREATE;
 	struct ghcb_state state;
 	unsigned long flags;
 	struct ghcb *ghcb;
@@ -971,8 +972,10 @@ static int vmgexit_ap_control(u64 event, struct sev_es_save_area *vmsa, u32 apic
 	ghcb = __sev_get_ghcb(&state);
 
 	vc_ghcb_invalidate(ghcb);
-	if (event == SVM_VMGEXIT_AP_CREATE)
+
+	if (create)
 		ghcb_set_rax(ghcb, vmsa->sev_features);
+
 	ghcb_set_sw_exit_code(ghcb, SVM_VMGEXIT_AP_CREATION);
 	ghcb_set_sw_exit_info_1(ghcb,
 				((u64)apic_id << 32)	|
@@ -985,7 +988,7 @@ static int vmgexit_ap_control(u64 event, struct sev_es_save_area *vmsa, u32 apic
 
 	if (!ghcb_sw_exit_info_1_is_valid(ghcb) ||
 	    lower_32_bits(ghcb->save.sw_exit_info_1)) {
-		pr_err("SNP AP %s error\n", (event == SVM_VMGEXIT_AP_CREATE ? "CREATE" : "DESTROY"));
+		pr_err("SNP AP %s error\n", (create ? "CREATE" : "DESTROY"));
 		ret = -EINVAL;
 	}
 
@@ -1168,8 +1171,8 @@ static void shutdown_all_aps(void)
 		vmsa = per_cpu(sev_vmsa, cpu);
 
 		/*
-		 * BSP does not have guest allocated VMSA and there is no need
-		 * to clear the VMSA tag for this page.
+		 * The BSP or offlined APs do not have guest allocated VMSA
+		 * and there is no need  to clear the VMSA tag for this page.
 		 */
 		if (!vmsa)
 			continue;

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH v3] x86/sev: Do not touch VMSA pages during kdump of SNP guest memory
Posted by Gupta, Pankaj 9 months, 1 week ago
> When kdump is running makedumpfile to generate vmcore and dumping SNP
> guest memory it touches the VMSA page of the vCPU executing kdump which
> then results in unrecoverable #NPF/RMP faults as the VMSA page is
> marked busy/in-use when the vCPU is running and subsequently causes
> guest softlockup/hang.
> 
> Additionally other APs may be halted in guest mode and their VMSA pages
> are marked busy and touching these VMSA pages during guest memory dump
> will also cause #NPF.
> 
> Issue AP_DESTROY GHCB calls on other APs to ensure they are kicked out
> of guest mode and then clear the VMSA bit on their VMSA pages.
> 
> If the vCPU running kdump is an AP, mark it's VMSA page as offline to
> ensure that makedumpfile excludes that page while dumping guest memory.
> 
> Cc: stable@vger.kernel.org
> Fixes: 3074152e56c9 ("x86/sev: Convert shared memory back to private on kexec")
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>

Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>

Re: [PATCH v3] x86/sev: Do not touch VMSA pages during kdump of SNP guest memory
Posted by Tom Lendacky 9 months, 1 week ago
On 4/28/25 16:41, Ashish Kalra wrote:
> From: Ashish Kalra <ashish.kalra@amd.com>
> 
> When kdump is running makedumpfile to generate vmcore and dumping SNP
> guest memory it touches the VMSA page of the vCPU executing kdump which
> then results in unrecoverable #NPF/RMP faults as the VMSA page is
> marked busy/in-use when the vCPU is running and subsequently causes
> guest softlockup/hang.
> 
> Additionally other APs may be halted in guest mode and their VMSA pages
> are marked busy and touching these VMSA pages during guest memory dump
> will also cause #NPF.
> 
> Issue AP_DESTROY GHCB calls on other APs to ensure they are kicked out
> of guest mode and then clear the VMSA bit on their VMSA pages.
> 
> If the vCPU running kdump is an AP, mark it's VMSA page as offline to
> ensure that makedumpfile excludes that page while dumping guest memory.
> 
> Cc: stable@vger.kernel.org
> Fixes: 3074152e56c9 ("x86/sev: Convert shared memory back to private on kexec")
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>

Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>

> ---
>  arch/x86/coco/sev/core.c | 241 +++++++++++++++++++++++++--------------
>  1 file changed, 155 insertions(+), 86 deletions(-)
>
[tip: x86/urgent] x86/sev: Do not touch VMSA pages during SNP guest memory kdump
Posted by tip-bot2 for Ashish Kalra 8 months, 4 weeks ago
The following commit has been merged into the x86/urgent branch of tip:

Commit-ID:     d2062cc1b1c367d5d019f595ef860159e1301351
Gitweb:        https://git.kernel.org/tip/d2062cc1b1c367d5d019f595ef860159e1301351
Author:        Ashish Kalra <ashish.kalra@amd.com>
AuthorDate:    Mon, 28 Apr 2025 21:41:51 
Committer:     Borislav Petkov (AMD) <bp@alien8.de>
CommitterDate: Tue, 13 May 2025 19:40:44 +02:00

x86/sev: Do not touch VMSA pages during SNP guest memory kdump

When kdump is running makedumpfile to generate vmcore and dump SNP guest
memory it touches the VMSA page of the vCPU executing kdump.

It then results in unrecoverable #NPF/RMP faults as the VMSA page is
marked busy/in-use when the vCPU is running and subsequently a causes
guest softlockup/hang.

Additionally, other APs may be halted in guest mode and their VMSA pages
are marked busy and touching these VMSA pages during guest memory dump
will also cause #NPF.

Issue AP_DESTROY GHCB calls on other APs to ensure they are kicked out
of guest mode and then clear the VMSA bit on their VMSA pages.

If the vCPU running kdump is an AP, mark it's VMSA page as offline to
ensure that makedumpfile excludes that page while dumping guest memory.

Fixes: 3074152e56c9 ("x86/sev: Convert shared memory back to private on kexec")
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Tested-by: Srikanth Aithal <sraithal@amd.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/20250428214151.155464-1-Ashish.Kalra@amd.com
---
 arch/x86/coco/sev/core.c | 244 ++++++++++++++++++++++++--------------
 1 file changed, 158 insertions(+), 86 deletions(-)

diff --git a/arch/x86/coco/sev/core.c b/arch/x86/coco/sev/core.c
index b0c1a7a..41060ba 100644
--- a/arch/x86/coco/sev/core.c
+++ b/arch/x86/coco/sev/core.c
@@ -959,6 +959,102 @@ void snp_accept_memory(phys_addr_t start, phys_addr_t end)
 	set_pages_state(vaddr, npages, SNP_PAGE_STATE_PRIVATE);
 }
 
+static int vmgexit_ap_control(u64 event, struct sev_es_save_area *vmsa, u32 apic_id)
+{
+	bool create = event != SVM_VMGEXIT_AP_DESTROY;
+	struct ghcb_state state;
+	unsigned long flags;
+	struct ghcb *ghcb;
+	int ret = 0;
+
+	local_irq_save(flags);
+
+	ghcb = __sev_get_ghcb(&state);
+
+	vc_ghcb_invalidate(ghcb);
+
+	if (create)
+		ghcb_set_rax(ghcb, vmsa->sev_features);
+
+	ghcb_set_sw_exit_code(ghcb, SVM_VMGEXIT_AP_CREATION);
+	ghcb_set_sw_exit_info_1(ghcb,
+				((u64)apic_id << 32)	|
+				((u64)snp_vmpl << 16)	|
+				event);
+	ghcb_set_sw_exit_info_2(ghcb, __pa(vmsa));
+
+	sev_es_wr_ghcb_msr(__pa(ghcb));
+	VMGEXIT();
+
+	if (!ghcb_sw_exit_info_1_is_valid(ghcb) ||
+	    lower_32_bits(ghcb->save.sw_exit_info_1)) {
+		pr_err("SNP AP %s error\n", (create ? "CREATE" : "DESTROY"));
+		ret = -EINVAL;
+	}
+
+	__sev_put_ghcb(&state);
+
+	local_irq_restore(flags);
+
+	return ret;
+}
+
+static int snp_set_vmsa(void *va, void *caa, int apic_id, bool make_vmsa)
+{
+	int ret;
+
+	if (snp_vmpl) {
+		struct svsm_call call = {};
+		unsigned long flags;
+
+		local_irq_save(flags);
+
+		call.caa = this_cpu_read(svsm_caa);
+		call.rcx = __pa(va);
+
+		if (make_vmsa) {
+			/* Protocol 0, Call ID 2 */
+			call.rax = SVSM_CORE_CALL(SVSM_CORE_CREATE_VCPU);
+			call.rdx = __pa(caa);
+			call.r8  = apic_id;
+		} else {
+			/* Protocol 0, Call ID 3 */
+			call.rax = SVSM_CORE_CALL(SVSM_CORE_DELETE_VCPU);
+		}
+
+		ret = svsm_perform_call_protocol(&call);
+
+		local_irq_restore(flags);
+	} else {
+		/*
+		 * If the kernel runs at VMPL0, it can change the VMSA
+		 * bit for a page using the RMPADJUST instruction.
+		 * However, for the instruction to succeed it must
+		 * target the permissions of a lesser privileged (higher
+		 * numbered) VMPL level, so use VMPL1.
+		 */
+		u64 attrs = 1;
+
+		if (make_vmsa)
+			attrs |= RMPADJUST_VMSA_PAGE_BIT;
+
+		ret = rmpadjust((unsigned long)va, RMP_PG_SIZE_4K, attrs);
+	}
+
+	return ret;
+}
+
+static void snp_cleanup_vmsa(struct sev_es_save_area *vmsa, int apic_id)
+{
+	int err;
+
+	err = snp_set_vmsa(vmsa, NULL, apic_id, false);
+	if (err)
+		pr_err("clear VMSA page failed (%u), leaking page\n", err);
+	else
+		free_page((unsigned long)vmsa);
+}
+
 static void set_pte_enc(pte_t *kpte, int level, void *va)
 {
 	struct pte_enc_desc d = {
@@ -1055,6 +1151,65 @@ void snp_kexec_begin(void)
 		pr_warn("Failed to stop shared<->private conversions\n");
 }
 
+/*
+ * Shutdown all APs except the one handling kexec/kdump and clearing
+ * the VMSA tag on AP's VMSA pages as they are not being used as
+ * VMSA page anymore.
+ */
+static void shutdown_all_aps(void)
+{
+	struct sev_es_save_area *vmsa;
+	int apic_id, this_cpu, cpu;
+
+	this_cpu = get_cpu();
+
+	/*
+	 * APs are already in HLT loop when enc_kexec_finish() callback
+	 * is invoked.
+	 */
+	for_each_present_cpu(cpu) {
+		vmsa = per_cpu(sev_vmsa, cpu);
+
+		/*
+		 * The BSP or offlined APs do not have guest allocated VMSA
+		 * and there is no need  to clear the VMSA tag for this page.
+		 */
+		if (!vmsa)
+			continue;
+
+		/*
+		 * Cannot clear the VMSA tag for the currently running vCPU.
+		 */
+		if (this_cpu == cpu) {
+			unsigned long pa;
+			struct page *p;
+
+			pa = __pa(vmsa);
+			/*
+			 * Mark the VMSA page of the running vCPU as offline
+			 * so that is excluded and not touched by makedumpfile
+			 * while generating vmcore during kdump.
+			 */
+			p = pfn_to_online_page(pa >> PAGE_SHIFT);
+			if (p)
+				__SetPageOffline(p);
+			continue;
+		}
+
+		apic_id = cpuid_to_apicid[cpu];
+
+		/*
+		 * Issue AP destroy to ensure AP gets kicked out of guest mode
+		 * to allow using RMPADJUST to remove the VMSA tag on it's
+		 * VMSA page.
+		 */
+		vmgexit_ap_control(SVM_VMGEXIT_AP_DESTROY, vmsa, apic_id);
+		snp_cleanup_vmsa(vmsa, apic_id);
+	}
+
+	put_cpu();
+}
+
 void snp_kexec_finish(void)
 {
 	struct sev_es_runtime_data *data;
@@ -1069,6 +1224,8 @@ void snp_kexec_finish(void)
 	if (!IS_ENABLED(CONFIG_KEXEC_CORE))
 		return;
 
+	shutdown_all_aps();
+
 	unshare_all_memory();
 
 	/*
@@ -1090,51 +1247,6 @@ void snp_kexec_finish(void)
 	}
 }
 
-static int snp_set_vmsa(void *va, void *caa, int apic_id, bool make_vmsa)
-{
-	int ret;
-
-	if (snp_vmpl) {
-		struct svsm_call call = {};
-		unsigned long flags;
-
-		local_irq_save(flags);
-
-		call.caa = this_cpu_read(svsm_caa);
-		call.rcx = __pa(va);
-
-		if (make_vmsa) {
-			/* Protocol 0, Call ID 2 */
-			call.rax = SVSM_CORE_CALL(SVSM_CORE_CREATE_VCPU);
-			call.rdx = __pa(caa);
-			call.r8  = apic_id;
-		} else {
-			/* Protocol 0, Call ID 3 */
-			call.rax = SVSM_CORE_CALL(SVSM_CORE_DELETE_VCPU);
-		}
-
-		ret = svsm_perform_call_protocol(&call);
-
-		local_irq_restore(flags);
-	} else {
-		/*
-		 * If the kernel runs at VMPL0, it can change the VMSA
-		 * bit for a page using the RMPADJUST instruction.
-		 * However, for the instruction to succeed it must
-		 * target the permissions of a lesser privileged (higher
-		 * numbered) VMPL level, so use VMPL1.
-		 */
-		u64 attrs = 1;
-
-		if (make_vmsa)
-			attrs |= RMPADJUST_VMSA_PAGE_BIT;
-
-		ret = rmpadjust((unsigned long)va, RMP_PG_SIZE_4K, attrs);
-	}
-
-	return ret;
-}
-
 #define __ATTR_BASE		(SVM_SELECTOR_P_MASK | SVM_SELECTOR_S_MASK)
 #define INIT_CS_ATTRIBS		(__ATTR_BASE | SVM_SELECTOR_READ_MASK | SVM_SELECTOR_CODE_MASK)
 #define INIT_DS_ATTRIBS		(__ATTR_BASE | SVM_SELECTOR_WRITE_MASK)
@@ -1166,24 +1278,10 @@ static void *snp_alloc_vmsa_page(int cpu)
 	return page_address(p + 1);
 }
 
-static void snp_cleanup_vmsa(struct sev_es_save_area *vmsa, int apic_id)
-{
-	int err;
-
-	err = snp_set_vmsa(vmsa, NULL, apic_id, false);
-	if (err)
-		pr_err("clear VMSA page failed (%u), leaking page\n", err);
-	else
-		free_page((unsigned long)vmsa);
-}
-
 static int wakeup_cpu_via_vmgexit(u32 apic_id, unsigned long start_ip)
 {
 	struct sev_es_save_area *cur_vmsa, *vmsa;
-	struct ghcb_state state;
 	struct svsm_ca *caa;
-	unsigned long flags;
-	struct ghcb *ghcb;
 	u8 sipi_vector;
 	int cpu, ret;
 	u64 cr4;
@@ -1297,33 +1395,7 @@ static int wakeup_cpu_via_vmgexit(u32 apic_id, unsigned long start_ip)
 	}
 
 	/* Issue VMGEXIT AP Creation NAE event */
-	local_irq_save(flags);
-
-	ghcb = __sev_get_ghcb(&state);
-
-	vc_ghcb_invalidate(ghcb);
-	ghcb_set_rax(ghcb, vmsa->sev_features);
-	ghcb_set_sw_exit_code(ghcb, SVM_VMGEXIT_AP_CREATION);
-	ghcb_set_sw_exit_info_1(ghcb,
-				((u64)apic_id << 32)	|
-				((u64)snp_vmpl << 16)	|
-				SVM_VMGEXIT_AP_CREATE);
-	ghcb_set_sw_exit_info_2(ghcb, __pa(vmsa));
-
-	sev_es_wr_ghcb_msr(__pa(ghcb));
-	VMGEXIT();
-
-	if (!ghcb_sw_exit_info_1_is_valid(ghcb) ||
-	    lower_32_bits(ghcb->save.sw_exit_info_1)) {
-		pr_err("SNP AP Creation error\n");
-		ret = -EINVAL;
-	}
-
-	__sev_put_ghcb(&state);
-
-	local_irq_restore(flags);
-
-	/* Perform cleanup if there was an error */
+	ret = vmgexit_ap_control(SVM_VMGEXIT_AP_CREATE, vmsa, apic_id);
 	if (ret) {
 		snp_cleanup_vmsa(vmsa, apic_id);
 		vmsa = NULL;
Re: [tip: x86/urgent] x86/sev: Do not touch VMSA pages during SNP guest memory kdump
Posted by Ingo Molnar 8 months, 4 weeks ago
* tip-bot2 for Ashish Kalra <tip-bot2@linutronix.de> wrote:

> The following commit has been merged into the x86/urgent branch of tip:
> 
> Commit-ID:     d2062cc1b1c367d5d019f595ef860159e1301351
> Gitweb:        https://git.kernel.org/tip/d2062cc1b1c367d5d019f595ef860159e1301351
> Author:        Ashish Kalra <ashish.kalra@amd.com>
> AuthorDate:    Mon, 28 Apr 2025 21:41:51 
> Committer:     Borislav Petkov (AMD) <bp@alien8.de>
> CommitterDate: Tue, 13 May 2025 19:40:44 +02:00
> 
> x86/sev: Do not touch VMSA pages during SNP guest memory kdump
> 
> When kdump is running makedumpfile to generate vmcore and dump SNP guest
> memory it touches the VMSA page of the vCPU executing kdump.
> 
> It then results in unrecoverable #NPF/RMP faults as the VMSA page is
> marked busy/in-use when the vCPU is running and subsequently a causes
> guest softlockup/hang.

s/subsequently a causes
 /subsequently causes

> Additionally, other APs may be halted in guest mode and their VMSA pages
> are marked busy and touching these VMSA pages during guest memory dump
> will also cause #NPF.
> 
> Issue AP_DESTROY GHCB calls on other APs to ensure they are kicked out
> of guest mode and then clear the VMSA bit on their VMSA pages.
> 
> If the vCPU running kdump is an AP, mark it's VMSA page as offline to
> ensure that makedumpfile excludes that page while dumping guest memory.

s/mark it's VMSA page
 /mark its VMSA page

> 
> Fixes: 3074152e56c9 ("x86/sev: Convert shared memory back to private on kexec")
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
> Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
> Tested-by: Srikanth Aithal <sraithal@amd.com>
> Cc: stable@vger.kernel.org
> Link: https://lore.kernel.org/20250428214151.155464-1-Ashish.Kalra@amd.com
> ---
>  arch/x86/coco/sev/core.c | 244 ++++++++++++++++++++++++--------------
>  1 file changed, 158 insertions(+), 86 deletions(-)
> 
> diff --git a/arch/x86/coco/sev/core.c b/arch/x86/coco/sev/core.c
> index b0c1a7a..41060ba 100644
> --- a/arch/x86/coco/sev/core.c
> +++ b/arch/x86/coco/sev/core.c
> @@ -959,6 +959,102 @@ void snp_accept_memory(phys_addr_t start, phys_addr_t end)
>  	set_pages_state(vaddr, npages, SNP_PAGE_STATE_PRIVATE);
>  }
>  
> +static int vmgexit_ap_control(u64 event, struct sev_es_save_area *vmsa, u32 apic_id)
> +{
> +	bool create = event != SVM_VMGEXIT_AP_DESTROY;
> +	struct ghcb_state state;
> +	unsigned long flags;
> +	struct ghcb *ghcb;
> +	int ret = 0;
> +
> +	local_irq_save(flags);
> +
> +	ghcb = __sev_get_ghcb(&state);
> +
> +	vc_ghcb_invalidate(ghcb);
> +
> +	if (create)
> +		ghcb_set_rax(ghcb, vmsa->sev_features);
> +
> +	ghcb_set_sw_exit_code(ghcb, SVM_VMGEXIT_AP_CREATION);
> +	ghcb_set_sw_exit_info_1(ghcb,
> +				((u64)apic_id << 32)	|
> +				((u64)snp_vmpl << 16)	|
> +				event);
> +	ghcb_set_sw_exit_info_2(ghcb, __pa(vmsa));
> +
> +	sev_es_wr_ghcb_msr(__pa(ghcb));
> +	VMGEXIT();
> +
> +	if (!ghcb_sw_exit_info_1_is_valid(ghcb) ||
> +	    lower_32_bits(ghcb->save.sw_exit_info_1)) {
> +		pr_err("SNP AP %s error\n", (create ? "CREATE" : "DESTROY"));
> +		ret = -EINVAL;
> +	}
> +
> +	__sev_put_ghcb(&state);
> +
> +	local_irq_restore(flags);
> +
> +	return ret;
> +}
> +
> +static int snp_set_vmsa(void *va, void *caa, int apic_id, bool make_vmsa)
> +{
> +	int ret;
> +
> +	if (snp_vmpl) {
> +		struct svsm_call call = {};
> +		unsigned long flags;
> +
> +		local_irq_save(flags);
> +
> +		call.caa = this_cpu_read(svsm_caa);
> +		call.rcx = __pa(va);
> +
> +		if (make_vmsa) {
> +			/* Protocol 0, Call ID 2 */
> +			call.rax = SVSM_CORE_CALL(SVSM_CORE_CREATE_VCPU);
> +			call.rdx = __pa(caa);

This can probably use svsm_caa_pa instead of __pa(), like 
sev_es_init_vc_handling() does, see below.

> +			call.r8  = apic_id;
> +		} else {
> +			/* Protocol 0, Call ID 3 */
> +			call.rax = SVSM_CORE_CALL(SVSM_CORE_DELETE_VCPU);
> +		}
> +
> +		ret = svsm_perform_call_protocol(&call);
> +
> +		local_irq_restore(flags);
> +	} else {
> +		/*
> +		 * If the kernel runs at VMPL0, it can change the VMSA
> +		 * bit for a page using the RMPADJUST instruction.
> +		 * However, for the instruction to succeed it must
> +		 * target the permissions of a lesser privileged (higher
> +		 * numbered) VMPL level, so use VMPL1.
> +		 */
> +		u64 attrs = 1;
> +
> +		if (make_vmsa)
> +			attrs |= RMPADJUST_VMSA_PAGE_BIT;
> +
> +		ret = rmpadjust((unsigned long)va, RMP_PG_SIZE_4K, attrs);
> +	}
> +
> +	return ret;
> +}
> +
> +static void snp_cleanup_vmsa(struct sev_es_save_area *vmsa, int apic_id)
> +{
> +	int err;
> +
> +	err = snp_set_vmsa(vmsa, NULL, apic_id, false);
> +	if (err)
> +		pr_err("clear VMSA page failed (%u), leaking page\n", err);
> +	else
> +		free_page((unsigned long)vmsa);

So the argument types here are really messy:

 - We pass in a 'struct sev_es_save_area *vmsa' to snp_cleanup_vmsa(), 
   which passes it down to snp_set_vmsa() as a void *, where it's 
   force-type-cast to 'unsigned long' ...

 - While within snp_cleanup_vmsa() we also force-cast it to 'unsigned 
   long' yet again.

It would be much cleaner to do a single, obvious force-cast to a 
virtual address type within snp_cleanup_vmsa():

   unsigned long vmsa_va = (unsigned long)vmsa;

And change snp_set_vmsa()'s parameter to 'unsigned long vmsa_va', to 
get rid of a lot of forced/dangerous type conversions.
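
For illustration only, an untested sketch of that shape, keeping the
existing names and just doing the cast once and changing the parameter
type (not the code that was merged):

   /* Sketch only: snp_set_vmsa() takes a VA, the cast happens once. */
   static int snp_set_vmsa(unsigned long vmsa_va, void *caa, int apic_id,
                           bool make_vmsa);

   static void snp_cleanup_vmsa(struct sev_es_save_area *vmsa, int apic_id)
   {
           unsigned long vmsa_va = (unsigned long)vmsa;
           int err;

           err = snp_set_vmsa(vmsa_va, NULL, apic_id, false);
           if (err)
                   pr_err("clear VMSA page failed (%u), leaking page\n", err);
           else
                   free_page(vmsa_va);
   }

Inside snp_set_vmsa() the RMPADJUST branch would then pass vmsa_va to
rmpadjust() directly and the SVSM branch would use __pa(vmsa_va), with
no further forced casts needed.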

Plus the handling of 'caa' pointers is really messy AFAICS:

 - alloc_runtime_data() calculates svsm_caa_pa physical addresses for 
   each CPU:

                per_cpu(svsm_caa_pa, cpu) = __pa(caa);

   Which is used by sev_es_init_vc_handling():

                call.rcx = this_cpu_read(svsm_caa_pa);

   But snp_set_vmsa() calculates the physical address *again* instead 
   of using svsm_caa_pa:

                call.caa = this_cpu_read(svsm_caa);
                ...
                        call.rdx = __pa(caa);

   Same for snp_set_vmsa():

                call.caa = this_cpu_read(svsm_caa);
                call.rcx = __pa(va);

Why? Either this is something subtle and undocumented, or at minimum 
this unnecessarily complicates the code and creates inconsistent 
patterns of implementing the same functionality.

> +}
> +
>  static void set_pte_enc(pte_t *kpte, int level, void *va)
>  {
>  	struct pte_enc_desc d = {
> @@ -1055,6 +1151,65 @@ void snp_kexec_begin(void)
>  		pr_warn("Failed to stop shared<->private conversions\n");
>  }
>  
> +/*
> + * Shutdown all APs except the one handling kexec/kdump and clearing
> + * the VMSA tag on AP's VMSA pages as they are not being used as
> + * VMSA page anymore.

s/Shutdown
  Shut down

'shutdown' is a noun, the verb is 'to shut down'.

> + */
> +static void shutdown_all_aps(void)
> +{
> +	struct sev_es_save_area *vmsa;
> +	int apic_id, this_cpu, cpu;
> +
> +	this_cpu = get_cpu();
> +
> +	/*
> +	 * APs are already in HLT loop when enc_kexec_finish() callback
> +	 * is invoked.
> +	 */
> +	for_each_present_cpu(cpu) {
> +		vmsa = per_cpu(sev_vmsa, cpu);
> +
> +		/*
> +		 * The BSP or offlined APs do not have guest allocated VMSA
> +		 * and there is no need  to clear the VMSA tag for this page.

Whitespace noise:

   s/  / /

> +		 */
> +		if (!vmsa)
> +			continue;
> +
> +		/*
> +		 * Cannot clear the VMSA tag for the currently running vCPU.
> +		 */
> +		if (this_cpu == cpu) {
> +			unsigned long pa;
> +			struct page *p;
> +
> +			pa = __pa(vmsa);
> +			/*
> +			 * Mark the VMSA page of the running vCPU as offline
> +			 * so that is excluded and not touched by makedumpfile
> +			 * while generating vmcore during kdump.

s/so that is excluded
 /so that it is excluded

> +			 */
> +			p = pfn_to_online_page(pa >> PAGE_SHIFT);
> +			if (p)
> +				__SetPageOffline(p);
> +			continue;
> +		}
> +
> +		apic_id = cpuid_to_apicid[cpu];
> +
> +		/*
> +		 * Issue AP destroy to ensure AP gets kicked out of guest mode
> +		 * to allow using RMPADJUST to remove the VMSA tag on it's
> +		 * VMSA page.

s/on it's VMSA page
 /on its VMSA page

> +		 */
> +		vmgexit_ap_control(SVM_VMGEXIT_AP_DESTROY, vmsa, apic_id);
> +		snp_cleanup_vmsa(vmsa, apic_id);

Boris, please don't rush these SEV patches without proper review first! ;-)

Thanks,

	Ingo
Re: [tip: x86/urgent] x86/sev: Do not touch VMSA pages during SNP guest memory kdump
Posted by Borislav Petkov 8 months, 4 weeks ago
On Wed, May 14, 2025 at 09:20:58AM +0200, Ingo Molnar wrote:
> Boris, please don't rush these SEV patches without proper review first! ;-)

You didn't read the R-by and SOB tags at the beginning?

Feel free to propose fixes, Tom and I will review them and even test them for
you!

But on top of that: these are fixes, and the "issues" you've pointed out are
in existing code which this patch only moves.

I would usually say "Thx" here but not this time.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [tip: x86/urgent] x86/sev: Do not touch VMSA pages during SNP guest memory kdump
Posted by Ingo Molnar 8 months, 4 weeks ago
* Borislav Petkov <bp@alien8.de> wrote:

> On Wed, May 14, 2025 at 09:20:58AM +0200, Ingo Molnar wrote:
> > Boris, please don't rush these SEV patches without proper review first! ;-)
> 
> You didn't read the R-by and SOB tags at the beginning?

Reviewed-by tags and SOB tags don't necessarily imply a proper review, 
as my review feedback here amply demonstrates.

> Feel free to propose fixes, Tom and I will review them and even test 
> them for you!
> 
> But ontop of those: those are fixes and the "issues" you've pointed 
> out are to existing code which this patch only moves.

Firstly, while you may be inclined to dismiss the half dozen typos in 
the changelog and the comments as inconsequential, do your scare-quotes 
around 'issues' imply that you don't accept as valid the other issues 
my review identified, such as the messy type conversions and the 
inconsistent handling of svsm_caa_pa? That would be sad.

Secondly, the fact that half of the patch is moving/refactoring code, 
while the other half is adding new code is no excuse to ignore review 
feedback for the code that gets moved/refactored - reviewers obviously 
need to read and understand the code that gets moved too. This is 
kernel maintenance 101.

And the new functionality introduced obviously expands on the bad 
practices & fragile code I outlined.

A basic requirement when implementing new functionality (and kdump 
never really worked on SEV-SNP, I suppose, at least since August last 
year, so this is basically new functionality) is to have a clean 
codebase to extend, especially when the changes are this large:

   1 file changed, 158 insertions(+), 86 deletions(-)

All these problems accumulate and may result in fragility and bugs.

Third, this patch should have been split into two parts to begin with: 
a first one that refactors the code into vmgexit_ap_control() and moves 
snp_set_vmsa() and snp_cleanup_vmsa(), and a second, smaller and easier 
to review patch that does the real changes. Right now the actual 
changes are hidden within the noise of code movement and refactoring.

> I would usually say "Thx" here but not this time.

Oh wow, you really don't take constructive criticism of patches very 
well. Review feedback isn't a personal attack against you. Please don't 
shoot the messenger.

Thanks,

	Ingo
Re: [tip: x86/urgent] x86/sev: Do not touch VMSA pages during SNP guest memory kdump
Posted by Borislav Petkov 8 months, 4 weeks ago
On Wed, May 14, 2025 at 11:15:41AM +0200, Ingo Molnar wrote:
> imply that you don't accept the other issues my review identified, such as
> the messy type conversions and the inconsistent handling of svsm_caa_pa as
> valid? That would be sad.

Another proof that you're not really reading my emails:

"Feel free to propose fixes, Tom and I will review them and even test them for
you!"

> Secondly, the fact that half of the patch is moving/refactoring code, 
> while the other half is adding new code is no excuse to ignore review 
> feedback for the code that gets moved/refactored - reviewers obviously 
> need to read and understand the code that gets moved too. This is 
> kernel maintenance 101.

See above.

> All these problems accumulate and may result in fragility and bugs.

LOL, this is very ironic coming from you: to talk about problems accumulating
from patches *you* applied without anyone else reviewing. Hilarious.

> Oh wow, you really don't take constructive criticism of patches very 
> well. Review feedback isn't a personal attack against you. Please don't 
> shoot the messenger.

Sorry, the time for constructive criticism with you is long over. You have
proven over and over again that the normal way of working with you just
doesn't fly.

I have told you here why it is ok to do this patch this way. You ignored it.

This patch was tested with everything we've got. No issues.

I suggested you propose changes to that code and we will review and test them.
You ignored that too.

Well, ignoring people goes both ways.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette