RE: [PATCH v2 0/7] iommu/arm-smmu-v3: Quarantine device upon ATC invalidation timeout

There is a newer version of this series
RE: [PATCH v2 0/7] iommu/arm-smmu-v3: Quarantine device upon ATC invalidation timeout
Posted by Tian, Kevin 2 weeks, 5 days ago
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Wednesday, March 18, 2026 3:16 AM
> 
> Hi all,
> 
> This series addresses a critical vulnerability and stability issue where an
> unresponsive PCIe device failing to process ATC (Address Translation Cache)
> invalidation requests leads to silent data corruption and continuous SMMU
> CMDQ error spam.
> 

None of the patches in this series contains a Fixes tag and cc stable.
Re: [PATCH v2 0/7] iommu/arm-smmu-v3: Quarantine device upon ATC invalidation timeout
Posted by Nicolin Chen 2 weeks, 4 days ago
On Wed, Mar 18, 2026 at 07:47:18AM +0000, Tian, Kevin wrote:
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > Sent: Wednesday, March 18, 2026 3:16 AM
> > 
> > Hi all,
> > 
> > This series addresses a critical vulnerability and stability issue where an
> > unresponsive PCIe device failing to process ATC (Address Translation Cache)
> > invalidation requests leads to silent data corruption and continuous SMMU
> > CMDQ error spam.
> > 
> 
> None of the patches in this series contains a Fixes tag and cc stable.

Hmm, I guess AI overly polished the cover letter so it sounds too
strong?

This is essentially a vulnerability (potential memory corruption).
And none of these patches actually fixes any regression. The PATCH
7 even requires the arm_smmu_invs series which has not been merged
yet :-/

Nicolin
RE: [PATCH v2 0/7] iommu/arm-smmu-v3: Quarantine device upon ATC invalidation timeout
Posted by Tian, Kevin 2 weeks, 4 days ago
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Thursday, March 19, 2026 4:05 AM
> 
> On Wed, Mar 18, 2026 at 07:47:18AM +0000, Tian, Kevin wrote:
> > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > Sent: Wednesday, March 18, 2026 3:16 AM
> > >
> > > Hi all,
> > >
> > > This series addresses a critical vulnerability and stability issue where an
> > > unresponsive PCIe device failing to process ATC (Address Translation Cache)
> > > invalidation requests leads to silent data corruption and continuous SMMU
> > > CMDQ error spam.
> > >
> >
> > None of the patches in this series contains a Fixes tag and cc stable.
> 
> Hmm, I guess AI overly polished the cover letter so it sounds too
> strong?
> 
> This is essentially a vulnerability (potential memory corruption).
> And none of these patches actually fixes any regression. The PATCH
> 7 even requires the arm_smmu_invs series which has not been merged
> yet :-/
> 

Fixes tag and backporting are not just for regression. People certainly
want to see reported vulnerabilities fixed in stable kernels...
Re: [PATCH v2 0/7] iommu/arm-smmu-v3: Quarantine device upon ATC invalidation timeout
Posted by Nicolin Chen 2 weeks, 4 days ago
On Thu, Mar 19, 2026 at 02:29:38AM +0000, Tian, Kevin wrote:
> > > > This series addresses a critical vulnerability and stability issue where an
> > > > unresponsive PCIe device failing to process ATC (Address Translation Cache)
> > > > invalidation requests leads to silent data corruption and continuous SMMU
> > > > CMDQ error spam.
> > > >
> > >
> > > > None of the patches in this series contains a Fixes tag and cc stable.
> > 
> > Hmm, I guess AI overly polished the cover letter so it sounds too
> > strong?
> > 
> > This is essentially a vulnerability (potential memory corruption).
> > And none of these patches actually fixes any regression. The PATCH
> > 7 even requires the arm_smmu_invs series which has not been merged
> > yet :-/
> > 
> 
> Fixes tag and backporting are not just for regression. People certainly
> want to see reported vulnerabilities fixed in stable kernels...

Well, maybe I'll just leave an additional line telling people that this
can't be a bug "fix" because it's written on another unmerged series?

Nicolin
Re: [PATCH v2 0/7] iommu/arm-smmu-v3: Quarantine device upon ATC invalidation timeout
Posted by Jason Gunthorpe 1 week, 6 days ago
On Wed, Mar 18, 2026 at 08:10:01PM -0700, Nicolin Chen wrote:
> On Thu, Mar 19, 2026 at 02:29:38AM +0000, Tian, Kevin wrote:
> > > > > This series addresses a critical vulnerability and stability issue where an
> > > > > unresponsive PCIe device failing to process ATC (Address Translation Cache)
> > > > > invalidation requests leads to silent data corruption and continuous SMMU
> > > > > CMDQ error spam.
> > > > >
> > > >
> > > > > None of the patches in this series contains a Fixes tag and cc stable.
> > > 
> > > Hmm, I guess AI overly polished the cover letter so it sounds too
> > > strong?
> > > 
> > > This is essentially a vulnerability (potential memory corruption).
> > > And none of these patches actually fixes any regression. The PATCH
> > > 7 even requires the arm_smmu_invs series which has not been merged
> > > yet :-/
> > > 
> > 
> > Fixes tag and backporting are not just for regression. People certainly
> > want to see reported vulnerabilities fixed in stable kernels...
> 
> Well, maybe I'll just leave an additional line telling people that this
> can't be a bug "fix" because it's written on another unmerged series?

I think this is more of a feature (RAS support for SMMUv3) than a
specific fix.

Jason
RE: [PATCH v2 0/7] iommu/arm-smmu-v3: Quarantine device upon ATC invalidation timeout
Posted by Tian, Kevin 1 week, 5 days ago
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, March 24, 2026 8:03 AM
> 
> On Wed, Mar 18, 2026 at 08:10:01PM -0700, Nicolin Chen wrote:
> > On Thu, Mar 19, 2026 at 02:29:38AM +0000, Tian, Kevin wrote:
> > > > > > This series addresses a critical vulnerability and stability issue where an
> > > > > > unresponsive PCIe device failing to process ATC (Address Translation Cache)
> > > > > > invalidation requests leads to silent data corruption and continuous SMMU
> > > > > > CMDQ error spam.
> > > > > >
> > > > >
> > > > > > None of the patches in this series contains a Fixes tag and cc stable.
> > > >
> > > > Hmm, I guess AI overly polished the cover letter so it sounds too
> > > > strong?
> > > >
> > > > This is essentially a vulnerability (potential memory corruption).
> > > > And none of these patches actually fixes any regression. The PATCH
> > > > 7 even requires the arm_smmu_invs series which has not been merged
> > > > yet :-/
> > > >
> > >
> > > Fixes tag and backporting are not just for regression. People certainly
> > > want to see reported vulnerabilities fixed in stable kernels...
> >
> > Well, maybe I'll just leave an additional line telling people that this
> > can't be a bug "fix" because it's written on another unmerged series?
> 
> I think this is more of a feature (RAS support for SMMUv3) than a
> specific fix.
> 

Not a RAS guy, but below is what I got from AI:

"
RAS improvements typically involve better error reporting, graceful
degradation, or improved recovery - but they usually don't involve
scenarios where the system continues operating with compromised
security assumptions."
Re: [PATCH v2 0/7] iommu/arm-smmu-v3: Quarantine device upon ATC invalidation timeout
Posted by Jason Gunthorpe 1 week, 5 days ago
On Wed, Mar 25, 2026 at 06:55:40AM +0000, Tian, Kevin wrote:
> > I think this is more of a feature (RAS support for SMMUv3) than a
> > specific fix.
> > 
> 
> Not a RAS guy, but below is what I got from AI:
> 
> "
> RAS improvements typically involve better error reporting, graceful
> degradation, or improved recovery - but they usually don't involve
> scenarios where the system continues operating with compromised
> security assumptions."

Right, so currently there is no RAS in smmuv3, if it hits this error
it continues with "compromised security assumptions". Adding RAS
support is to avoid this.

Jason
Re: [PATCH v2 0/7] iommu/arm-smmu-v3: Quarantine device upon ATC invalidation timeout
Posted by Nicolin Chen 1 week, 6 days ago
On Mon, Mar 23, 2026 at 09:03:21PM -0300, Jason Gunthorpe wrote:
> On Wed, Mar 18, 2026 at 08:10:01PM -0700, Nicolin Chen wrote:
> > On Thu, Mar 19, 2026 at 02:29:38AM +0000, Tian, Kevin wrote:
> > > > > > This series addresses a critical vulnerability and stability issue where an
> > > > > > unresponsive PCIe device failing to process ATC (Address Translation Cache)
> > > > > > invalidation requests leads to silent data corruption and continuous SMMU
> > > > > > CMDQ error spam.
> > > > > >
> > > > >
> > > > > > None of the patches in this series contains a Fixes tag and cc stable.
> > > > 
> > > > Hmm, I guess AI overly polished the cover letter so it sounds too
> > > > strong?
> > > > 
> > > > This is essentially a vulnerability (potential memory corruption).
> > > > And none of these patches actually fixes any regression. The PATCH
> > > > 7 even requires the arm_smmu_invs series which has not been merged
> > > > yet :-/
> > > > 
> > > 
> > > Fixes tag and backporting are not just for regression. People certainly
> > > want to see reported vulnerabilities fixed in stable kernels...
> > 
> > Well, maybe I'll just leave an additional line telling people that this
> > can't be a bug "fix" because it's written on another unmerged series?
> 
> I think this is more of a feature (RAS support for SMMUv3) than a
> specific fix.

Adding that to the cover-letter. Thanks for the input.

Nicolin