From: Ashish Kalra <ashish.kalra@amd.com>
When a crash is triggered, the kernel attempts to shut down SEV-SNP
using the SNP_SHUTDOWN_EX command. If active SNP VMs are present,
SNP_SHUTDOWN_EX fails, because the firmware checks all encryption-capable
ASIDs to ensure none are in use and that a DF_FLUSH is not required. If a
DF_FLUSH is required, the firmware returns DFFLUSH_REQUIRED, causing
SNP_SHUTDOWN_EX to fail.
This causes the kdump kernel to boot with IOMMU SNP enforcement still
enabled, and with the IOMMU completion wait buffers (CWBs), command buffers,
device tables and event buffer registers still locked and exclusive
to the previous kernel. Attempts to allocate and use new buffers in
the kdump kernel fail, as the hardware ignores writes to the locked
MMIO registers (per AMD IOMMU spec Section 2.12.2.1).
As a result, the kdump kernel cannot initialize the IOMMU or enable IRQ
remapping, which is required for proper operation.
This results in repeated "Completion-Wait loop timed out" errors and a
second kernel panic: "Kernel panic - not syncing: timer doesn't work
through Interrupt-remapped IO-APIC".
The following MMIO registers are locked and ignore writes after failed
SNP shutdown:
Device Table Base Address Register
Command Buffer Base Address Register
Event Buffer Base Address Register
Completion Store Base Register/Exclusion Base Register
Completion Store Limit Register/Exclusion Range Limit Register
Instead of allocating new buffers, reuse the previous kernel’s pages
for the completion wait buffers, command buffers, event buffers and device
tables, and operate with the already-enabled SNP configuration and
existing data structures.
This approach is now used for kdump boot regardless of whether SNP is
enabled during kdump.
The fix enables successful crashkernel/kdump operation on SNP hosts
even when SNP_SHUTDOWN_EX fails.
Fixes: c3b86e61b756 ("x86/cpufeatures: Enable/unmask SEV-SNP CPU feature")
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
drivers/iommu/amd/init.c | 28 +++++++++++++++++++---------
1 file changed, 19 insertions(+), 9 deletions(-)
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 18bd869a82d9..3f24fd775d6e 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -818,11 +818,16 @@ static void iommu_enable_command_buffer(struct amd_iommu *iommu)
 	BUG_ON(iommu->cmd_buf == NULL);
-	entry = iommu_virt_to_phys(iommu->cmd_buf);
-	entry |= MMIO_CMD_SIZE_512;
-
-	memcpy_toio(iommu->mmio_base + MMIO_CMD_BUF_OFFSET,
-		    &entry, sizeof(entry));
+	if (!is_kdump_kernel()) {
+		/*
+		 * Command buffer is re-used for kdump kernel and setting
+		 * of MMIO register is not required.
+		 */
+		entry = iommu_virt_to_phys(iommu->cmd_buf);
+		entry |= MMIO_CMD_SIZE_512;
+		memcpy_toio(iommu->mmio_base + MMIO_CMD_BUF_OFFSET,
+			    &entry, sizeof(entry));
+	}
 	amd_iommu_reset_cmd_buffer(iommu);
 }
@@ -873,10 +878,15 @@ static void iommu_enable_event_buffer(struct amd_iommu *iommu)
 	BUG_ON(iommu->evt_buf == NULL);
-	entry = iommu_virt_to_phys(iommu->evt_buf) | EVT_LEN_MASK;
-
-	memcpy_toio(iommu->mmio_base + MMIO_EVT_BUF_OFFSET,
-		    &entry, sizeof(entry));
+	if (!is_kdump_kernel()) {
+		/*
+		 * Event buffer is re-used for kdump kernel and setting
+		 * of MMIO register is not required.
+		 */
+		entry = iommu_virt_to_phys(iommu->evt_buf) | EVT_LEN_MASK;
+		memcpy_toio(iommu->mmio_base + MMIO_EVT_BUF_OFFSET,
+			    &entry, sizeof(entry));
+	}
 	/* set head and tail to zero manually */
 	writel(0x00, iommu->mmio_base + MMIO_EVT_HEAD_OFFSET);
--
2.34.1
On 7/16/2025 12:57 AM, Ashish Kalra wrote:
> From: Ashish Kalra <ashish.kalra@amd.com>
>
> When a crash is triggered the kernel attempts to shut down SEV-SNP
> using the SNP_SHUTDOWN_EX command. [...]
>
> The following MMIO registers are locked and ignore writes after failed
> SNP shutdown:
> Device Table Base Address Register
> Command Buffer Base Address Register
> Event Buffer Base Address Register
> Completion Store Base Register/Exclusion Base Register
> Completion Store Limit Register/Exclusion Range Limit Register
>

Maybe you can rephrase the description, as the first patch covered some of
these details.

> Instead of allocating new buffers, re-use the previous kernel’s pages
> [...]
>
> Fixes: c3b86e61b756 ("x86/cpufeatures: Enable/unmask SEV-SNP CPU feature")

I am not sure why you have marked only this patch as Fixes. Also, it won't fix
kdump if someone backports only this patch, right?

-Vasant
Hello Vasant,

On 7/16/2025 4:46 AM, Vasant Hegde wrote:
> On 7/16/2025 12:57 AM, Ashish Kalra wrote:
>> [...]
>> Completion Store Limit Register/Exclusion Range Limit Register
>>
>
> Maybe you can rephrase the description, as the first patch covered some of
> these details.

We do need to include the complete description here, as this is the final
patch of the series, which fixes the kdump boot.

Note that the description in the first patch only mentions reusing the
IOMMU buffers (command, CWB and event buffers), whereas this commit log
covers all the reuse and remapping required: IOMMU buffers, device table,
etc.

>> [...]
>> Fixes: c3b86e61b756 ("x86/cpufeatures: Enable/unmask SEV-SNP CPU feature")
>
> I am not sure why you have marked only this patch as Fixes. Also, it won't fix
> kdump if someone backports only this patch, right?

As mentioned in the cover letter, this is the final patch of the series, which
actually fixes the SNP kdump boot, so I kept the Fixes: tag on this patch.

I am not sure whether I can add a Fixes: tag to all four patches in this series.

Thanks,
Ashish
On 7/17/2025 3:42 AM, Kalra, Ashish wrote:
> Hello Vasant,
>
> On 7/16/2025 4:46 AM, Vasant Hegde wrote:
>> [...]
>> I am not sure why you have marked only this patch as Fixes. Also, it won't fix
>> kdump if someone backports only this patch, right?
>>
>
> As mentioned in the cover letter, this is the final patch of the series, which
> actually fixes the SNP kdump boot, so I kept the Fixes: tag on this patch.
>
> I am not sure whether I can add a Fixes: tag to all four patches in this series.

But adding Fixes to just this one patch adds confusion and complicates the
backport process.

Is this really a fix? Did kdump ever work on SNP-enabled systems? If yes, then
add Fixes to all patches. If not, call it an enhancement.

-Vasant
Hello Vasant,

On 7/17/2025 1:22 AM, Vasant Hegde wrote:
> On 7/17/2025 3:42 AM, Kalra, Ashish wrote:
>> [...]
>> I am not sure whether I can add a Fixes: tag to all four patches in this series.
>
> But adding Fixes to just this one patch adds confusion and complicates the
> backport process.
>
> Is this really a fix? Did kdump ever work on SNP-enabled systems? If yes, then
> add Fixes to all patches. If not, call it an enhancement.

Well, kdump only worked on SNP-enabled systems if there were no active SNP VMs.
But I think it makes more sense to remove the Fixes: tag from this patch series,
as SNP kdump support is more or less a feature enhancement for SNP.

Thanks,
Ashish