[PATCH v3 4/4] iommu/amd: Fix host kdump support for SNP

Ashish Kalra posted 4 patches 2 months, 3 weeks ago
Posted by Ashish Kalra 2 months, 3 weeks ago
From: Ashish Kalra <ashish.kalra@amd.com>

When a crash is triggered the kernel attempts to shut down SEV-SNP
using the SNP_SHUTDOWN_EX command. If active SNP VMs are present,
SNP_SHUTDOWN_EX fails as firmware checks all encryption-capable ASIDs
to ensure none are in use and that a DF_FLUSH is not required. If a
DF_FLUSH is required, the firmware returns DFFLUSH_REQUIRED, causing
SNP_SHUTDOWN_EX to fail.

This causes the kdump kernel to boot with IOMMU SNP enforcement still
enabled and IOMMU completion wait buffers (CWBs), command buffers,
device tables and event buffer registers remain locked and exclusive
to the previous kernel. Attempts to allocate and use new buffers in
the kdump kernel fail, as the hardware ignores writes to the locked
MMIO registers (per AMD IOMMU spec Section 2.12.2.1).

As a result, the kdump kernel cannot initialize the IOMMU or enable IRQ
remapping which is required for proper operation.

This results in repeated "Completion-Wait loop timed out" errors and a
second kernel panic: "Kernel panic - not syncing: timer doesn't work
through Interrupt-remapped IO-APIC"

The following MMIO registers are locked and ignore writes after failed
SNP shutdown:
Device Table Base Address Register
Command Buffer Base Address Register
Event Buffer Base Address Register
Completion Store Base Register/Exclusion Base Register
Completion Store Limit Register/Exclusion Range Limit Register

Instead of allocating new buffers, re-use the previous kernel’s pages
for completion wait buffers, command buffers, event buffers and device
tables and operate with the already enabled SNP configuration and
existing data structures.

This approach is now used for kdump boot regardless of whether SNP is
enabled during kdump.

The fix enables successful crashkernel/kdump operation on SNP hosts
even when SNP_SHUTDOWN_EX fails.

Fixes: c3b86e61b756 ("x86/cpufeatures: Enable/unmask SEV-SNP CPU feature")
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 drivers/iommu/amd/init.c | 28 +++++++++++++++++++---------
 1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 18bd869a82d9..3f24fd775d6e 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -818,11 +818,16 @@ static void iommu_enable_command_buffer(struct amd_iommu *iommu)
 
 	BUG_ON(iommu->cmd_buf == NULL);
 
-	entry = iommu_virt_to_phys(iommu->cmd_buf);
-	entry |= MMIO_CMD_SIZE_512;
-
-	memcpy_toio(iommu->mmio_base + MMIO_CMD_BUF_OFFSET,
-		    &entry, sizeof(entry));
+	if (!is_kdump_kernel()) {
+		/*
+		 * Command buffer is re-used for kdump kernel and setting
+		 * of MMIO register is not required.
+		 */
+		entry = iommu_virt_to_phys(iommu->cmd_buf);
+		entry |= MMIO_CMD_SIZE_512;
+		memcpy_toio(iommu->mmio_base + MMIO_CMD_BUF_OFFSET,
+			    &entry, sizeof(entry));
+	}
 
 	amd_iommu_reset_cmd_buffer(iommu);
 }
@@ -873,10 +878,15 @@ static void iommu_enable_event_buffer(struct amd_iommu *iommu)
 
 	BUG_ON(iommu->evt_buf == NULL);
 
-	entry = iommu_virt_to_phys(iommu->evt_buf) | EVT_LEN_MASK;
-
-	memcpy_toio(iommu->mmio_base + MMIO_EVT_BUF_OFFSET,
-		    &entry, sizeof(entry));
+	if (!is_kdump_kernel()) {
+		/*
+		 * Event buffer is re-used for kdump kernel and setting
+		 * of MMIO register is not required.
+		 */
+		entry = iommu_virt_to_phys(iommu->evt_buf) | EVT_LEN_MASK;
+		memcpy_toio(iommu->mmio_base + MMIO_EVT_BUF_OFFSET,
+			    &entry, sizeof(entry));
+	}
 
 	/* set head and tail to zero manually */
 	writel(0x00, iommu->mmio_base + MMIO_EVT_HEAD_OFFSET);
-- 
2.34.1
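
The effect of the two hunks above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the driver's code: `struct fake_iommu`, the `fake_*` helpers, `FAKE_CMD_SIZE_512` and `FAKE_ADDR_MASK` are hypothetical names and encodings invented for illustration. The sketch also assumes the kdump kernel recovers the previous kernel's buffer address from the locked register; how the series actually obtains it is handled by the earlier patches, which are not shown here. Zeroing the head and tail fields merely stands in for `amd_iommu_reset_cmd_buffer()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encodings for this sketch only (not copied from kernel headers). */
#define FAKE_CMD_SIZE_512  (0x9ULL << 56)
#define FAKE_ADDR_MASK     (((1ULL << 52) - 1) & ~0xFFFULL)

/* Hypothetical stand-in for the IOMMU state; not struct amd_iommu. */
struct fake_iommu {
	uint64_t cmd_buf_base_reg; /* models the Command Buffer Base Address Register */
	bool     regs_locked;      /* true while SNP enforcement keeps it locked */
	uint32_t cmd_head;
	uint32_t cmd_tail;
};

/* Models the hardware behaviour described above: locked registers ignore writes. */
static void fake_mmio_write64(struct fake_iommu *iommu, uint64_t val)
{
	if (!iommu->regs_locked)
		iommu->cmd_buf_base_reg = val;
}

/* Address the hardware is actually using, as seen through the register. */
static uint64_t fake_programmed_cmd_buf(const struct fake_iommu *iommu)
{
	return iommu->cmd_buf_base_reg & FAKE_ADDR_MASK;
}

/* Mirrors the shape of the patched function: skip the register write on kdump. */
static void fake_enable_command_buffer(struct fake_iommu *iommu,
				       uint64_t cmd_buf_phys, bool kdump)
{
	assert(cmd_buf_phys != 0); /* plays the role of BUG_ON(cmd_buf == NULL) */

	if (!kdump)
		fake_mmio_write64(iommu, cmd_buf_phys | FAKE_CMD_SIZE_512);

	/* Head and tail are reset on both paths. */
	iommu->cmd_head = 0;
	iommu->cmd_tail = 0;
}

/* State left behind by the crashed kernel after a failed SNP shutdown. */
static struct fake_iommu fake_after_failed_shutdown(void)
{
	struct fake_iommu iommu = {
		.cmd_buf_base_reg = 0x1000ULL | FAKE_CMD_SIZE_512,
		.regs_locked = true,
		.cmd_head = 5,
		.cmd_tail = 7,
	};
	return iommu;
}

/* Old behaviour: allocate a new buffer and try to program it. */
static bool fake_fresh_buffer_matches_hw(void)
{
	struct fake_iommu iommu = fake_after_failed_shutdown();
	uint64_t new_buf = 0x2000ULL;

	fake_enable_command_buffer(&iommu, new_buf, false);
	return fake_programmed_cmd_buf(&iommu) == new_buf;
}

/* Patched behaviour: keep using the buffer that is already programmed. */
static bool fake_reused_buffer_matches_hw(void)
{
	struct fake_iommu iommu = fake_after_failed_shutdown();
	uint64_t reused = fake_programmed_cmd_buf(&iommu);

	fake_enable_command_buffer(&iommu, reused, true);
	return fake_programmed_cmd_buf(&iommu) == reused &&
	       iommu.cmd_head == 0 && iommu.cmd_tail == 0;
}
```

In this model, `fake_fresh_buffer_matches_hw()` returns false because the write to the locked register is dropped, so the driver and the hardware disagree about where the command buffer lives. `fake_reused_buffer_matches_hw()` returns true because no register write is needed when the existing buffer is kept.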

Re: [PATCH v3 4/4] iommu/amd: Fix host kdump support for SNP
Posted by Vasant Hegde 2 months, 3 weeks ago

On 7/16/2025 12:57 AM, Ashish Kalra wrote:
> From: Ashish Kalra <ashish.kalra@amd.com>
> 
> When a crash is triggered the kernel attempts to shut down SEV-SNP
> using the SNP_SHUTDOWN_EX command. If active SNP VMs are present,
> SNP_SHUTDOWN_EX fails as firmware checks all encryption-capable ASIDs
> to ensure none are in use and that a DF_FLUSH is not required. If a
> DF_FLUSH is required, the firmware returns DFFLUSH_REQUIRED, causing
> SNP_SHUTDOWN_EX to fail.
> 
> This causes the kdump kernel to boot with IOMMU SNP enforcement still
> enabled and IOMMU completion wait buffers (CWBs), command buffers,
> device tables and event buffer registers remain locked and exclusive
> to the previous kernel. Attempts to allocate and use new buffers in
> the kdump kernel fail, as the hardware ignores writes to the locked
> MMIO registers (per AMD IOMMU spec Section 2.12.2.1).
> 
> As a result, the kdump kernel cannot initialize the IOMMU or enable IRQ
> remapping which is required for proper operation.
> 
> This results in repeated "Completion-Wait loop timed out" errors and a
> second kernel panic: "Kernel panic - not syncing: timer doesn't work
> through Interrupt-remapped IO-APIC"
> 
> The following MMIO registers are locked and ignore writes after failed
> SNP shutdown:
> Device Table Base Address Register
> Command Buffer Base Address Register
> Event Buffer Base Address Register
> Completion Store Base Register/Exclusion Base Register
> Completion Store Limit Register/Exclusion Range Limit Register
> 

Maybe you can rephrase the description, as the first patch covered some of
these details.

> Instead of allocating new buffers, re-use the previous kernel’s pages
> for completion wait buffers, command buffers, event buffers and device
> tables and operate with the already enabled SNP configuration and
> existing data structures.
> 
> This approach is now used for kdump boot regardless of whether SNP is
> enabled during kdump.
> 
> The fix enables successful crashkernel/kdump operation on SNP hosts
> even when SNP_SHUTDOWN_EX fails.
> 
> Fixes: c3b86e61b756 ("x86/cpufeatures: Enable/unmask SEV-SNP CPU feature")

I am not sure why you have marked only this patch with Fixes. Also, it won't fix
kdump if someone backports only this patch, right?

-Vasant


Re: [PATCH v3 4/4] iommu/amd: Fix host kdump support for SNP
Posted by Kalra, Ashish 2 months, 3 weeks ago
Hello Vasant,

On 7/16/2025 4:46 AM, Vasant Hegde wrote:
> 
> 
> On 7/16/2025 12:57 AM, Ashish Kalra wrote:
>> From: Ashish Kalra <ashish.kalra@amd.com>
>>
>> When a crash is triggered the kernel attempts to shut down SEV-SNP
>> using the SNP_SHUTDOWN_EX command. If active SNP VMs are present,
>> SNP_SHUTDOWN_EX fails as firmware checks all encryption-capable ASIDs
>> to ensure none are in use and that a DF_FLUSH is not required. If a
>> DF_FLUSH is required, the firmware returns DFFLUSH_REQUIRED, causing
>> SNP_SHUTDOWN_EX to fail.
>>
>> This causes the kdump kernel to boot with IOMMU SNP enforcement still
>> enabled and IOMMU completion wait buffers (CWBs), command buffers,
>> device tables and event buffer registers remain locked and exclusive
>> to the previous kernel. Attempts to allocate and use new buffers in
>> the kdump kernel fail, as the hardware ignores writes to the locked
>> MMIO registers (per AMD IOMMU spec Section 2.12.2.1).
>>
>> As a result, the kdump kernel cannot initialize the IOMMU or enable IRQ
>> remapping which is required for proper operation.
>>
>> This results in repeated "Completion-Wait loop timed out" errors and a
>> second kernel panic: "Kernel panic - not syncing: timer doesn't work
>> through Interrupt-remapped IO-APIC"
>>
>> The following MMIO registers are locked and ignore writes after failed
>> SNP shutdown:
>> Device Table Base Address Register
>> Command Buffer Base Address Register
>> Event Buffer Base Address Register
>> Completion Store Base Register/Exclusion Base Register
>> Completion Store Limit Register/Exclusion Range Limit Register
>>
> 
> Maybe you can rephrase the description, as the first patch covered some of
> these details.

We do need to include the complete description here as this is the final
patch of the series which fixes the kdump boot.

Do note that the description in the first patch only mentions reuse of the
IOMMU buffers (command, CWB and event buffers), whereas this commit log
covers all of the reuse and remapping required (IOMMU buffers, device table,
etc.).
 
>> Instead of allocating new buffers, re-use the previous kernel’s pages
>> for completion wait buffers, command buffers, event buffers and device
>> tables and operate with the already enabled SNP configuration and
>> existing data structures.
>>
>> This approach is now used for kdump boot regardless of whether SNP is
>> enabled during kdump.
>>
>> The fix enables successful crashkernel/kdump operation on SNP hosts
>> even when SNP_SHUTDOWN_EX fails.
>>
>> Fixes: c3b86e61b756 ("x86/cpufeatures: Enable/unmask SEV-SNP CPU feature")
> 
> I am not sure why you have marked only this patch with Fixes. Also, it won't fix
> kdump if someone backports only this patch, right?
> 

As mentioned in the cover letter, this is the final patch of the series which
actually fixes the SNP kdump boot, so I kept the Fixes: tag on this patch.

I am not sure if I can add a Fixes: tag to all four patches in this series?

Thanks,
Ashish


Re: [PATCH v3 4/4] iommu/amd: Fix host kdump support for SNP
Posted by Vasant Hegde 2 months, 3 weeks ago

On 7/17/2025 3:42 AM, Kalra, Ashish wrote:
> Hello Vasant,
> 
> On 7/16/2025 4:46 AM, Vasant Hegde wrote:
>>
>>
>> On 7/16/2025 12:57 AM, Ashish Kalra wrote:
>>> From: Ashish Kalra <ashish.kalra@amd.com>
>>>
>>> When a crash is triggered the kernel attempts to shut down SEV-SNP
>>> using the SNP_SHUTDOWN_EX command. If active SNP VMs are present,
>>> SNP_SHUTDOWN_EX fails as firmware checks all encryption-capable ASIDs
>>> to ensure none are in use and that a DF_FLUSH is not required. If a
>>> DF_FLUSH is required, the firmware returns DFFLUSH_REQUIRED, causing
>>> SNP_SHUTDOWN_EX to fail.
>>>
>>> This causes the kdump kernel to boot with IOMMU SNP enforcement still
>>> enabled and IOMMU completion wait buffers (CWBs), command buffers,
>>> device tables and event buffer registers remain locked and exclusive
>>> to the previous kernel. Attempts to allocate and use new buffers in
>>> the kdump kernel fail, as the hardware ignores writes to the locked
>>> MMIO registers (per AMD IOMMU spec Section 2.12.2.1).
>>>
>>> As a result, the kdump kernel cannot initialize the IOMMU or enable IRQ
>>> remapping which is required for proper operation.
>>>
>>> This results in repeated "Completion-Wait loop timed out" errors and a
>>> second kernel panic: "Kernel panic - not syncing: timer doesn't work
>>> through Interrupt-remapped IO-APIC"
>>>
>>> The following MMIO registers are locked and ignore writes after failed
>>> SNP shutdown:
>>> Device Table Base Address Register
>>> Command Buffer Base Address Register
>>> Event Buffer Base Address Register
>>> Completion Store Base Register/Exclusion Base Register
>>> Completion Store Limit Register/Exclusion Range Limit Register
>>>
>>
>> Maybe you can rephrase the description, as the first patch covered some of
>> these details.
> 
> We do need to include the complete description here as this is the final
> patch of the series which fixes the kdump boot.
> 
> Do note that the description in the first patch only mentions reuse of the
> IOMMU buffers (command, CWB and event buffers), whereas this commit log
> covers all of the reuse and remapping required (IOMMU buffers, device table,
> etc.).
>  
>>> Instead of allocating new buffers, re-use the previous kernel’s pages
>>> for completion wait buffers, command buffers, event buffers and device
>>> tables and operate with the already enabled SNP configuration and
>>> existing data structures.
>>>
>>> This approach is now used for kdump boot regardless of whether SNP is
>>> enabled during kdump.
>>>
>>> The fix enables successful crashkernel/kdump operation on SNP hosts
>>> even when SNP_SHUTDOWN_EX fails.
>>>
>>> Fixes: c3b86e61b756 ("x86/cpufeatures: Enable/unmask SEV-SNP CPU feature")
>>
>> I am not sure why you have marked only this patch with Fixes. Also, it won't fix
>> kdump if someone backports only this patch, right?
>>
> 
> As mentioned in the cover letter, this is the final patch of the series which
> actually fixes the SNP kdump boot, so I kept the Fixes: tag on this patch.
>
> I am not sure if I can add a Fixes: tag to all four patches in this series?

But adding Fixes to just this one patch adds confusion and complicates the
backport process.

Is this really a fix? Did kdump ever work on an SNP-enabled system? If yes, then
add Fixes to all patches. If not, call it an enhancement.


-Vasant


Re: [PATCH v3 4/4] iommu/amd: Fix host kdump support for SNP
Posted by Kalra, Ashish 2 months, 3 weeks ago
Hello Vasant,

On 7/17/2025 1:22 AM, Vasant Hegde wrote:
> 
> 
> On 7/17/2025 3:42 AM, Kalra, Ashish wrote:
>> Hello Vasant,
>>
>> On 7/16/2025 4:46 AM, Vasant Hegde wrote:
>>>
>>>
>>> On 7/16/2025 12:57 AM, Ashish Kalra wrote:
>>>> From: Ashish Kalra <ashish.kalra@amd.com>
>>>>
>>>> When a crash is triggered the kernel attempts to shut down SEV-SNP
>>>> using the SNP_SHUTDOWN_EX command. If active SNP VMs are present,
>>>> SNP_SHUTDOWN_EX fails as firmware checks all encryption-capable ASIDs
>>>> to ensure none are in use and that a DF_FLUSH is not required. If a
>>>> DF_FLUSH is required, the firmware returns DFFLUSH_REQUIRED, causing
>>>> SNP_SHUTDOWN_EX to fail.
>>>>
>>>> This causes the kdump kernel to boot with IOMMU SNP enforcement still
>>>> enabled and IOMMU completion wait buffers (CWBs), command buffers,
>>>> device tables and event buffer registers remain locked and exclusive
>>>> to the previous kernel. Attempts to allocate and use new buffers in
>>>> the kdump kernel fail, as the hardware ignores writes to the locked
>>>> MMIO registers (per AMD IOMMU spec Section 2.12.2.1).
>>>>
>>>> As a result, the kdump kernel cannot initialize the IOMMU or enable IRQ
>>>> remapping which is required for proper operation.
>>>>
>>>> This results in repeated "Completion-Wait loop timed out" errors and a
>>>> second kernel panic: "Kernel panic - not syncing: timer doesn't work
>>>> through Interrupt-remapped IO-APIC"
>>>>
>>>> The following MMIO registers are locked and ignore writes after failed
>>>> SNP shutdown:
>>>> Device Table Base Address Register
>>>> Command Buffer Base Address Register
>>>> Event Buffer Base Address Register
>>>> Completion Store Base Register/Exclusion Base Register
>>>> Completion Store Limit Register/Exclusion Range Limit Register
>>>>
>>>
>>> Maybe you can rephrase the description, as the first patch covered some of
>>> these details.
>>
>> We do need to include the complete description here as this is the final
>> patch of the series which fixes the kdump boot.
>>
>> Do note that the description in the first patch only mentions reuse of the
>> IOMMU buffers (command, CWB and event buffers), whereas this commit log
>> covers all of the reuse and remapping required (IOMMU buffers, device table,
>> etc.).
>>  
>>>> Instead of allocating new buffers, re-use the previous kernel’s pages
>>>> for completion wait buffers, command buffers, event buffers and device
>>>> tables and operate with the already enabled SNP configuration and
>>>> existing data structures.
>>>>
>>>> This approach is now used for kdump boot regardless of whether SNP is
>>>> enabled during kdump.
>>>>
>>>> The fix enables successful crashkernel/kdump operation on SNP hosts
>>>> even when SNP_SHUTDOWN_EX fails.
>>>>
>>>> Fixes: c3b86e61b756 ("x86/cpufeatures: Enable/unmask SEV-SNP CPU feature")
>>>
>>> I am not sure why you have marked only this patch with Fixes. Also, it won't fix
>>> kdump if someone backports only this patch, right?
>>>
>>
>> As mentioned in the cover letter, this is the final patch of the series which
>> actually fixes the SNP kdump boot, so I kept the Fixes: tag on this patch.
>>
>> I am not sure if I can add a Fixes: tag to all four patches in this series?
> 
> But adding Fixes to just this one patch adds confusion and complicates the
> backport process.
>
> Is this really a fix? Did kdump ever work on an SNP-enabled system? If yes, then
> add Fixes to all patches. If not, call it an enhancement.
> 

Well, kdump only worked on SNP-enabled systems if there were no active SNP VMs.

But I think it makes more sense to remove the Fixes: tag from this patch series,
as this SNP kdump support is more or less a feature enhancement for SNP.

Thanks,
Ashish