When transitioning to a kdump kernel, the primary kernel might have crashed
while endpoint devices were actively bus-mastering DMA. Currently, the SMMU
driver aggressively resets the hardware during probe by clearing CR0_SMMUEN
and setting the Global Bypass Attribute (GBPA) to ABORT.
In a kdump scenario, this aggressive reset is highly destructive:
a) If GBPA is set to ABORT, in-flight DMA will be aborted, generating fatal
PCIe AER or SErrors that may panic the kdump kernel
b) If GBPA is set to BYPASS, in-flight DMA targeting some IOVAs will bypass
the SMMU and corrupt the physical memory at those 1:1 mapped IOVAs.
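The two outcomes in (a) and (b), and the translated case the series aims to preserve, can be illustrated with a toy model. This is only a sketch: `toy_mode`, `toy_result`, and `toy_dma()` are hypothetical names made up for illustration, nothing here exists in the driver, and real hardware behavior is considerably more involved.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of how the SMMU treats one in-flight DMA. */
enum toy_mode {
	TOY_ABORT,	/* SMMUEN=0, GBPA set to ABORT */
	TOY_BYPASS,	/* SMMUEN=0, GBPA set to BYPASS */
	TOY_TRANSLATE,	/* SMMUEN=1, old stream table still in use */
};

struct toy_result {
	bool faulted;	/* the transaction was terminated */
	uint64_t pa;	/* physical address actually accessed */
};

/*
 * mapped_pa stands in for whatever the crashed kernel's page tables
 * would have produced for this IOVA (an assumption of the model).
 */
struct toy_result toy_dma(enum toy_mode mode, uint64_t iova,
			  uint64_t mapped_pa)
{
	struct toy_result r = { .faulted = false, .pa = 0 };

	switch (mode) {
	case TOY_ABORT:
		/* Case (a): the device sees an abort. */
		r.faulted = true;
		break;
	case TOY_BYPASS:
		/* Case (b): the IOVA is used as a physical address. */
		r.pa = iova;
		break;
	case TOY_TRANSLATE:
		/* The DMA still lands where the old kernel mapped it. */
		r.pa = mapped_pa;
		break;
	}
	return r;
}
```

In this model only TOY_TRANSLATE keeps the DMA both unfaulted and pointed at the memory the crashed kernel intended.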
To safely absorb in-flight DMA, the kdump kernel must leave SMMUEN=1 intact
and avoid modifying STRTAB_BASE. This allows HW to continue translating in-
flight DMA using the crashed kernel's page tables until the endpoint device
drivers probe and quiesce their respective hardware.
However, the ARM SMMUv3 architecture specification states that updating the
SMMU_STRTAB_BASE register while SMMUEN == 1 is UNPREDICTABLE or ignored.
This leaves a kdump kernel no choice but to adopt the stream table from the
crashed kernel.
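A minimal sketch of why adoption is the only option, assuming the "ignored" outcome for a write to STRTAB_BASE while SMMUEN == 1. The struct and the `toy_*` helpers are hypothetical stand-ins for illustration, not the real register layout or driver functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the two registers discussed above. */
struct toy_smmu {
	bool smmuen;		/* models CR0.SMMUEN */
	uint64_t strtab_base;	/* models SMMU_STRTAB_BASE */
};

/*
 * Model one outcome the text describes: a write while SMMUEN == 1 is
 * ignored. Returns true only if the write took effect.
 */
bool toy_write_strtab_base(struct toy_smmu *s, uint64_t val)
{
	if (s->smmuen)
		return false;
	s->strtab_base = val;
	return true;
}

/*
 * Returns the stream table base the new kernel ends up using: its own
 * fresh table if the register could be written, otherwise the table
 * left behind by the previous kernel.
 */
uint64_t toy_kdump_pick_strtab(struct toy_smmu *s, uint64_t fresh_base)
{
	if (toy_write_strtab_base(s, fresh_base))
		return fresh_base;
	return s->strtab_base;	/* adopt the crashed kernel's table */
}
```

With `smmuen` left set, `toy_kdump_pick_strtab()` always returns the old base, which mirrors the conclusion above.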
In this series:
- Introduce an ARM_SMMU_OPT_KDUMP
- Skip SMMUEN and STRTAB_BASE resets in arm_smmu_device_reset()
- Map the crashed kernel's stream tables into the kdump kernel [*]
- Defer any default domain attachment to retain STEs until device drivers
explicitly request it.
[*] This is implemented via memremap, which only works on a coherent SMMU.
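The steps listed above can be sketched as a toy control flow. This is not the code in the patches: every `toy_*` and `TOY_*` name is hypothetical, and the detection condition (running as a kdump kernel and finding SMMUEN already set) is an assumption made for illustration, since the cover letter does not spell it out.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_CR0_SMMUEN	(1u << 0)	/* models CR0_SMMUEN */
#define TOY_OPT_KDUMP	(1u << 0)	/* stands in for ARM_SMMU_OPT_KDUMP */

struct toy_dev {
	uint32_t cr0;
	uint64_t strtab_base;
	uint32_t options;
	bool gbpa_abort;	/* GBPA was set to ABORT */
	bool attach_deferred;	/* default domain attach is postponed */
};

/* Assumed detection rule: kdump kernel and SMMU still enabled. */
void toy_hw_probe(struct toy_dev *d, bool is_kdump_kernel)
{
	if (is_kdump_kernel && (d->cr0 & TOY_CR0_SMMUEN))
		d->options |= TOY_OPT_KDUMP;
}

void toy_device_reset(struct toy_dev *d, uint64_t new_strtab_base)
{
	if (d->options & TOY_OPT_KDUMP) {
		/*
		 * Leave SMMUEN and STRTAB_BASE alone so in-flight DMA
		 * keeps translating, and keep existing STEs until a
		 * device driver explicitly asks for an attach.
		 */
		d->attach_deferred = true;
		return;
	}

	/* Normal path: disable, abort by default, install new table. */
	d->cr0 &= ~TOY_CR0_SMMUEN;
	d->gbpa_abort = true;
	d->strtab_base = new_strtab_base;
	d->cr0 |= TOY_CR0_SMMUEN;
}
```

Checking the option at the very start of the reset function matches the v2 changelog item about checking the KDUMP option early in arm_smmu_device_reset().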
Note that the entire series requires Jason's work that was merged in v6.12:
85196f54743d ("iommu/arm-smmu-v3: Reorganize struct arm_smmu_strtab_cfg").
I have a backported version that is verified with a v6.8 kernel. I can send
if we see a strong need after this version is accepted.
This is on Github:
https://github.com/nicolinc/iommufd/commits/smmuv3_kdump-v2
Changelog
v2
* Add warning in non-coherent SMMU cases
* Keep eventq/priq disabled v.s. enabling-and-disabling-later
* Check KDUMP option in the beginning of arm_smmu_device_reset()
* Validate STRTAB format matches HW capability instead of forcing flags
v1:
https://lore.kernel.org/all/cover.1775763475.git.nicolinc@nvidia.com/
Nicolin Chen (5):
iommu/arm-smmu-v3: Add arm_smmu_adopt_strtab() for kdump
iommu/arm-smmu-v3: Implement is_attach_deferred() for kdump
iommu/arm-smmu-v3: Retain CR0_SMMUEN during kdump device reset
iommu/arm-smmu-v3: Skip EVTQ/PRIQ setup in kdump kernel
iommu/arm-smmu-v3: Detect ARM_SMMU_OPT_KDUMP in
arm_smmu_device_hw_probe()
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 1 +
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 225 ++++++++++++++++++--
2 files changed, 207 insertions(+), 19 deletions(-)
--
2.43.0
On 15/04/2026 10:17 pm, Nicolin Chen wrote:
> When transitioning to a kdump kernel, the primary kernel might have crashed
> while endpoint devices were actively bus-mastering DMA. Currently, the SMMU
> driver aggressively resets the hardware during probe by clearing CR0_SMMUEN
> and setting the Global Bypass Attribute (GBPA) to ABORT.
>
> In a kdump scenario, this aggressive reset is highly destructive:
> a) If GBPA is set to ABORT, in-flight DMA will be aborted, generating fatal
> PCIe AER or SErrors that may panic the kdump kernel
> b) If GBPA is set to BYPASS, in-flight DMA targeting some IOVAs will bypass
> the SMMU and corrupt the physical memory at those 1:1 mapped IOVAs.
But wasn't that rather the point? The kdump kernel doesn't know the scope
of how much could have gone wrong (including potentially the SMMU
configuration itself), so it just blocks everything, resets and
reenables the devices it cares about, and ignores whatever else might be
on fire.
If AER can panic a kdump kernel, that seems like a failing of the kdump
kernel itself more than anything else (especially given the likelihood
that additional AER events could follow from whatever initial
crash/failure triggered kdump to begin with). And frankly if some device
getting a translation fault could directly SError the whole system, then
I'd say that system is pretty doomed in general, kdump or not.
Thanks,
Robin.
> To safely absorb in-flight DMA, the kdump kernel must leave SMMUEN=1 intact
> and avoid modifying STRTAB_BASE. This allows HW to continue translating in-
> flight DMA using the crashed kernel's page tables until the endpoint device
> drivers probe and quiesce their respective hardware.
>
> However, the ARM SMMUv3 architecture specification states that updating the
> SMMU_STRTAB_BASE register while SMMUEN == 1 is UNPREDICTABLE or ignored.
>
> This leaves a kdump kernel no choice but to adopt the stream table from the
> crashed kernel.
>
> In this series:
> - Introduce an ARM_SMMU_OPT_KDUMP
> - Skip SMMUEN and STRTAB_BASE resets in arm_smmu_device_reset()
> - Map the crashed kernel's stream tables into the kdump kernel [*]
> - Defer any default domain attachment to retain STEs until device drivers
> explicitly request it.
>
> [*] This is implemented via memremap, which only works on a coherent SMMU.
>
> Note that the entire series requires Jason's work that was merged in v6.12:
> 85196f54743d ("iommu/arm-smmu-v3: Reorganize struct arm_smmu_strtab_cfg").
> I have a backported version that is verified with a v6.8 kernel. I can send
> if we see a strong need after this version is accepted.
>
> This is on Github:
> https://github.com/nicolinc/iommufd/commits/smmuv3_kdump-v2
>
> Changelog
> v2
> * Add warning in non-coherent SMMU cases
> * Keep eventq/priq disabled v.s. enabling-and-disabling-later
> * Check KDUMP option in the beginning of arm_smmu_device_reset()
> * Validate STRTAB format matches HW capability instead of forcing flags
> v1:
> https://lore.kernel.org/all/cover.1775763475.git.nicolinc@nvidia.com/
>
> Nicolin Chen (5):
> iommu/arm-smmu-v3: Add arm_smmu_adopt_strtab() for kdump
> iommu/arm-smmu-v3: Implement is_attach_deferred() for kdump
> iommu/arm-smmu-v3: Retain CR0_SMMUEN during kdump device reset
> iommu/arm-smmu-v3: Skip EVTQ/PRIQ setup in kdump kernel
> iommu/arm-smmu-v3: Detect ARM_SMMU_OPT_KDUMP in
> arm_smmu_device_hw_probe()
>
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 1 +
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 225 ++++++++++++++++++--
> 2 files changed, 207 insertions(+), 19 deletions(-)
>
On Thu, Apr 16, 2026 at 05:49:24PM +0100, Robin Murphy wrote:
> On 15/04/2026 10:17 pm, Nicolin Chen wrote:
> > When transitioning to a kdump kernel, the primary kernel might have crashed
> > while endpoint devices were actively bus-mastering DMA. Currently, the SMMU
> > driver aggressively resets the hardware during probe by clearing CR0_SMMUEN
> > and setting the Global Bypass Attribute (GBPA) to ABORT.
> >
> > In a kdump scenario, this aggressive reset is highly destructive:
> > a) If GBPA is set to ABORT, in-flight DMA will be aborted, generating fatal
> > PCIe AER or SErrors that may panic the kdump kernel
> > b) If GBPA is set to BYPASS, in-flight DMA targeting some IOVAs will bypass
> > the SMMU and corrupt the physical memory at those 1:1 mapped IOVAs.
>
> But wasn't that rather the point? The kdump kernel doesn't know the scope of
> how much could have gone wrong (including potentially the SMMU configuration
> itself), so it just blocks everything, resets and reenables the devices it
> cares about, and ignores whatever else might be on fire.

The purpose of kdump is to have the maximum chance to capture a dump
from the blown up kernel.

Yes, on a perfect platform aborting the entire SMMU should improve the
chance of getting that dump.

But sadly there are so many busted up platforms where if you start
messing with the IOMMU they will explode and blow up the kdump. x86
and "firmware first" error handling systems are particularly notorious
for nasty behavior like this.

Seems like there are now ARM systems too. :(

So, the iommu drivers have been preserving the IOMMU and not
disrupting the DMAs on x86 for a long time. This is established kdump
practice.

> If AER can panic a kdump kernel, that seems like a failing of the kdump
> kernel itself more than anything else (especially given the likelihood that
> additional AER events could follow from whatever initial crash/failure
> triggered kdump to begin with).

Probably the kdump wasn't triggered by AER. You want kdump to not
trigger more RAS events that might blow up the kdump while it is
trying to run. That increases the chance of success.

> And frankly if some device getting a
> translation fault could directly SError the whole system, then I'd say that
> system is pretty doomed in general, kdump or not.

Aborting the SMMU while ATS is enabled also fails all ATS and
translated requests, which is a catastrophic event for a CXL type
device that a correct OS should never trigger. The catastrophic
explosion of the CXL device also unplugs all its RAM from the system
and the kdump kernel just cannot handle the resulting cascade of RAS
failures. Plus you lose all that CXL RAM you may have wanted to dump.

Regardless, the platform has this flaw and to make kdump work it has
to avoid triggering these errors like x86 does.

Jason
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Friday, April 17, 2026 1:20 AM
>
> On Thu, Apr 16, 2026 at 05:49:24PM +0100, Robin Murphy wrote:
> > On 15/04/2026 10:17 pm, Nicolin Chen wrote:
> > > When transitioning to a kdump kernel, the primary kernel might have crashed
> > > while endpoint devices were actively bus-mastering DMA. Currently, the SMMU
> > > driver aggressively resets the hardware during probe by clearing CR0_SMMUEN
> > > and setting the Global Bypass Attribute (GBPA) to ABORT.
> > >
> > > In a kdump scenario, this aggressive reset is highly destructive:
> > > a) If GBPA is set to ABORT, in-flight DMA will be aborted, generating fatal
> > > PCIe AER or SErrors that may panic the kdump kernel
> > > b) If GBPA is set to BYPASS, in-flight DMA targeting some IOVAs will bypass
> > > the SMMU and corrupt the physical memory at those 1:1 mapped IOVAs.
> >
> > But wasn't that rather the point? The kdump kernel doesn't know the scope of
> > how much could have gone wrong (including potentially the SMMU configuration
> > itself), so it just blocks everything, resets and reenables the devices it
> > cares about, and ignores whatever else might be on fire.
>
> The purpose of kdump is to have the maximum chance to capture a dump
> from the blown up kernel.
>
> Yes, on a perfect platform aborting the entire SMMU should improve the
> chance of getting that dump.
>
> But sadly there are so many busted up platforms where if you start
> messing with the IOMMU they will explode and blow up the kdump. x86
> and "firmware first" error handling systems are particularly notorious
> for nasty behavior like this.
>
> Seems like there are now ARM systems too. :(

Is there any report on such systems? It might be informative to include
a link to the report so it's clear that this series fixes real issues rather
than preparing for upcoming systems...

>
> So, the iommu drivers have been preserving the IOMMU and not
> disrupting the DMAs on x86 for a long time. This is established kdump
> practice.
>
> > If AER can panic a kdump kernel, that seems like a failing of the kdump
> > kernel itself more than anything else (especially given the likelihood that
> > additional AER events could follow from whatever initial crash/failure
> > triggered kdump to begin with).
>
> Probably the kdump wasn't triggered by AER. You want kdump to not
> trigger more RAS events that might blow up the kdump while it is
> trying to run. That increases the chance of success.
>

btw the DMA is allowed after the previous kernel is hung until the point
where the smmu driver blocks it. In cases where in-flight DMAs are considered
dangerous to kdump, this series just makes it worse rather than creating
a new issue. While for the majority of other failures not related to DMAs,
unblocking then increases the chance of success...
On Fri, Apr 17, 2026 at 07:48:46AM +0000, Tian, Kevin wrote:
> Is there any report on such systems? It might be informative to include
> a link to the report so it's clear that this series fixes real issues rather
> than preparing for upcoming systems...

Yeah, we have an internal report and this was confirmed to fix it.

> btw the DMA is allowed after the previous kernel is hung until the point
> where the smmu driver blocks it. In cases where in-flight DMAs are considered
> dangerous to kdump, this series just makes it worse rather than creating
> a new issue. While for the majority of other failures not related to DMAs,
> unblocking then increases the chance of success...

Right, exactly. If DMAs are splattering over the kdump carve out
memory it is probably dead no matter what.

Jason
On Wed, Apr 15, 2026 at 02:17:35PM -0700, Nicolin Chen wrote:
> This is on Github:
> https://github.com/nicolinc/iommufd/commits/smmuv3_kdump-v2
>
> Changelog
> v2
> * Add warning in non-coherent SMMU cases
> * Keep eventq/priq disabled v.s. enabling-and-disabling-later
> * Check KDUMP option in the beginning of arm_smmu_device_reset()
> * Validate STRTAB format matches HW capability instead of forcing flags

https://sashiko.dev/#/patchset/cover.1776286352.git.nicolinc%40nvidia.com

Sashiko posted a few comments, mostly valid. I am fixing them with a v3.

Thanks
Nicolin