Some platforms (e.g. SC8280XP and X1E) support more than 128 stream
matching groups. This is more than what is defined as maximum by the ARM
SMMU architecture specification. Commit 122611347326 ("iommu/arm-smmu-qcom:
Limit the SMR groups to 128") disabled use of the additional groups because
they don't exhibit the same behavior as the architecture supported ones.

It seems like this is just another quirk of the hypervisor: When running
bare-metal without the hypervisor, the additional groups appear to behave
just like all others. The boot firmware uses some of the additional groups,
so ignoring them in this situation leads to stream match conflicts whenever
we allocate a new SMR group for the same SID.

The workaround exists primarily because the bypass quirk detection fails
when using a S2CR register from the additional matching groups, so let's
perform the test with the last reliable S2CR (127) and then limit the
number of SMR groups only if we detect that we are running below the
hypervisor (because of the bypass quirk).

Fixes: 122611347326 ("iommu/arm-smmu-qcom: Limit the SMR groups to 128")
Signed-off-by: Stephan Gerhold <stephan.gerhold@linaro.org>
---
I modified arm_smmu_find_sme() to prefer allocating from the SMR groups
above 128 (until they are all used). I did not see any issues, so I don't
see any indication that they behave any different from the others.
---
drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 27 +++++++++++++++++----------
1 file changed, 17 insertions(+), 10 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index 57c097e87613084ffdfbe685d4406a236d3b4b74..c939d0856b719cd2a5501c1206c594dfd115b1c5 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -431,17 +431,19 @@ static int qcom_smmu_cfg_probe(struct arm_smmu_device *smmu)

 	/*
 	 * Some platforms support more than the Arm SMMU architected maximum of
-	 * 128 stream matching groups. For unknown reasons, the additional
-	 * groups don't exhibit the same behavior as the architected registers,
-	 * so limit the groups to 128 until the behavior is fixed for the other
-	 * groups.
+	 * 128 stream matching groups. The additional registers appear to have
+	 * the same behavior as the architected registers in the hardware.
+	 * However, on some firmware versions, the hypervisor does not
+	 * correctly trap and emulate accesses to the additional registers,
+	 * resulting in unexpected behavior.
+	 *
+	 * If there are more than 128 groups, use the last reliable group to
+	 * detect if we need to apply the bypass quirk.
 	 */
-	if (smmu->num_mapping_groups > 128) {
-		dev_notice(smmu->dev, "\tLimiting the stream matching groups to 128\n");
-		smmu->num_mapping_groups = 128;
-	}
-
-	last_s2cr = ARM_SMMU_GR0_S2CR(smmu->num_mapping_groups - 1);
+	if (smmu->num_mapping_groups > 128)
+		last_s2cr = ARM_SMMU_GR0_S2CR(127);
+	else
+		last_s2cr = ARM_SMMU_GR0_S2CR(smmu->num_mapping_groups - 1);

 	/*
 	 * With some firmware versions writes to S2CR of type FAULT are
@@ -464,6 +466,11 @@ static int qcom_smmu_cfg_probe(struct arm_smmu_device *smmu)

 		reg = FIELD_PREP(ARM_SMMU_CBAR_TYPE, CBAR_TYPE_S1_TRANS_S2_BYPASS);
 		arm_smmu_gr1_write(smmu, ARM_SMMU_GR1_CBAR(qsmmu->bypass_cbndx), reg);
+
+		if (smmu->num_mapping_groups > 128) {
+			dev_notice(smmu->dev, "\tLimiting the stream matching groups to 128\n");
+			smmu->num_mapping_groups = 128;
+		}
 	}

 	for (i = 0; i < smmu->num_mapping_groups; i++) {
---
base-commit: 8f5ae30d69d7543eee0d70083daf4de8fe15d585
change-id: 20250815-arm-smmu-qcom-all-smr-1fc81c10840f
Best regards,
--
Stephan Gerhold <stephan.gerhold@linaro.org>
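
The control flow of the patch above can be summarised outside the driver. The following is only a minimal sketch, not the driver code: the function names quirk_test_index() and usable_groups() and the constant ARCH_MAX_SMR_GROUPS are made up for illustration, and the bypass-quirk result is passed in as a plain flag instead of being read back from an S2CR register.

```c
#include <assert.h>
#include <stdbool.h>

/* Architected maximum named in the patch description. */
#define ARCH_MAX_SMR_GROUPS 128u

/*
 * Index of the S2CR used for the bypass-quirk test (hypothetical helper).
 * With more than 128 groups, the last reliable register (127) is used
 * instead of the very last one.
 */
unsigned int quirk_test_index(unsigned int num_groups)
{
	assert(num_groups >= 1);	/* assumption: at least one group exists */

	if (num_groups > ARCH_MAX_SMR_GROUPS)
		return ARCH_MAX_SMR_GROUPS - 1;

	return num_groups - 1;
}

/*
 * Number of groups left usable after the test (hypothetical helper).
 * The limit to 128 is only applied when the bypass quirk was detected,
 * which the patch treats as a sign of running below the hypervisor.
 */
unsigned int usable_groups(unsigned int num_groups, bool bypass_quirk)
{
	if (bypass_quirk && num_groups > ARCH_MAX_SMR_GROUPS)
		return ARCH_MAX_SMR_GROUPS;

	return num_groups;
}
```

With a hypothetical SMMU reporting 212 groups, the test would run on S2CR 127, and all 212 groups would stay available unless the quirk is detected.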

On Thu, Aug 21, 2025 at 10:33:53AM +0200, Stephan Gerhold wrote:
> Some platforms (e.g. SC8280XP and X1E) support more than 128 stream
> matching groups. This is more than what is defined as maximum by the ARM
> SMMU architecture specification. Commit 122611347326 ("iommu/arm-smmu-qcom:
> Limit the SMR groups to 128") disabled use of the additional groups because
> they don't exhibit the same behavior as the architecture supported ones.
>
> It seems like this is just another quirk of the hypervisor: When running
> bare-metal without the hypervisor, the additional groups appear to behave
> just like all others. The boot firmware uses some of the additional groups,
> so ignoring them in this situation leads to stream match conflicts whenever
> we allocate a new SMR group for the same SID.
>
> The workaround exists primarily because the bypass quirk detection fails
> when using a S2CR register from the additional matching groups, so let's
> perform the test with the last reliable S2CR (127) and then limit the
> number of SMR groups only if we detect that we are running below the
> hypervisor (because of the bypass quirk).
>
> Fixes: 122611347326 ("iommu/arm-smmu-qcom: Limit the SMR groups to 128")
> Signed-off-by: Stephan Gerhold <stephan.gerhold@linaro.org>
> ---
> I modified arm_smmu_find_sme() to prefer allocating from the SMR groups
> above 128 (until they are all used). I did not see any issues, so I don't
> see any indication that they behave any different from the others.
> ---
>  drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 27 +++++++++++++++++----------
>  1 file changed, 17 insertions(+), 10 deletions(-)

Is the existing workaround causing you problems somehow? Limiting the SMR
groups to what the architecture allows still seems like the best bet to
me unless there's a compelling reason to do something else.

Will

On Tue, Sep 09, 2025 at 01:57:11PM +0100, Will Deacon wrote:
> On Thu, Aug 21, 2025 at 10:33:53AM +0200, Stephan Gerhold wrote:
> > [...]
>
> Is the existing workaround causing you problems somehow? Limiting the SMR
> groups to what the architecture allows still seems like the best bet to
> me unless there's a compelling reason to do something else.
>

Yes, the problem is the following (copied from commit message above):

> The boot firmware uses some of the additional groups, so ignoring them
> in this situation leads to stream match conflicts whenever we allocate
> a new SMR group for the same SID.

This happens e.g. in the following situation on SC8280XP when enabling
video decoding acceleration bare-metal without the hypervisor:

 1. The SMMU is already set up by the boot firmware before Linux is
    started, so some SMRs are already in use during boot. I added some
    code to dump them:

     arm-smmu 15000000.iommu: Found SMR0 <0xe0 0x0>
     ...
     arm-smmu 15000000.iommu: Found SMR8 <0x800 0x0>
     <unused>
     arm-smmu 15000000.iommu: Found SMR170 <0x2a22 0x400>
     arm-smmu 15000000.iommu: Found SMR171 <0x2a02 0x400>
     ...
     arm-smmu 15000000.iommu: Found SMR211 <0x400 0x3>

 2. We limit the SMRs to 128, so all the ones >= 170 just stay as-is.
    Only the ones < 128 are considered when allocating SMRs.

 3. We need to configure the following IOMMU for video acceleration:

     video-firmware {
         iommus = <&apps_smmu 0x2a02 0x400>;
     };

 4. arm-smmu 15000000.iommu: Picked SMR 14 for SID 0x2a02 mask 0x400
    ... but SMR170 already uses that SID+mask!

 5. arm-smmu 15000000.iommu: Unexpected global fault, this could be serious
    arm-smmu 15000000.iommu: GFSR 0x80000004, GFSYNR0 0x0000000c, GFSYNR1 0x00002a02, GFSYNR2 0x00000000

    SMCF, bit[2] is set -> Stream match conflict fault
    caused by SID GFSYNR1 0x00002a02

With my patch this does not happen anymore. As I wrote, so far I have
seen no indication that the extra groups behave any different from the
standard ones defined by the architecture. I don't know why it was done
this way (rather than e.g. implementing the Extended Stream Matching
Extension), but we definitely need to do something with the extra SMRs
to avoid stream match conflicts.

Thanks,
Stephan
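
The conflict in steps 4 and 5 can be checked by hand. The sketch below is an illustration only, with hypothetical helper names (smr_overlaps(), gfsr_has_smcf()); it assumes the usual stream-matching rule that a set mask bit means "ignore this ID bit", so two entries overlap when their IDs agree on every bit that neither mask ignores. The only GFSR bit it decodes is bit[2] (SMCF), as named in the message above.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if some stream ID would match both entries.
 * Assumption: a set bit in an SMR mask means that ID bit is ignored.
 */
bool smr_overlaps(uint16_t id_a, uint16_t mask_a,
		  uint16_t id_b, uint16_t mask_b)
{
	/* Bits that are compared by both entries. */
	uint16_t compared = (uint16_t)~(mask_a | mask_b);

	return ((id_a ^ id_b) & compared) == 0;
}

/* Hypothetical helper: test GFSR bit[2], the SMCF bit quoted above. */
bool gfsr_has_smcf(uint32_t gfsr)
{
	return (gfsr & (UINT32_C(1) << 2)) != 0;
}
```

With the values from the dump, a new entry <0x2a02 0x400> overlaps the boot-time entry <0x2a02 0x400> (listed at SMR171 in the dump) but not <0x2a22 0x400>, and GFSR 0x80000004 has the SMCF bit set. An allocator that only inspects the first 128 entries cannot see the overlapping boot-time entry, which is the situation described in step 2.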

On 2025-09-09 4:35 pm, Stephan Gerhold wrote:
> On Tue, Sep 09, 2025 at 01:57:11PM +0100, Will Deacon wrote:
>> [...]
>
> With my patch this does not happen anymore. As I wrote, so far I have
> seen no indication that the extra groups behave any different from the
> standard ones defined by the architecture. I don't know why it was done
> this way (rather than e.g. implementing the Extended Stream Matching
> Extension), but we definitely need to do something with the extra SMRs
> to avoid stream match conflicts.

I'm also a little wary of exposing more non-architectural stuff to the
main driver - could we not keep the existing logic and simply add an
extra loop at the end here to ensure any "extra" SMRs are disabled?

Thanks,
Robin.

On Wed, Sep 17, 2025 at 07:02:52PM +0100, Robin Murphy wrote:
> On 2025-09-09 4:35 pm, Stephan Gerhold wrote:
> > [...]
>
> I'm also a little wary of exposing more non-architectural stuff to the main
> driver - could we not keep the existing logic and simply add an extra loop
> at the end here to ensure any "extra" SMRs are disabled?
>

It's not that simple at least, because some of these SMRs are used by
co-processors (remoteprocs) that are already active during boot and we
need to keep them in bypass until they are taken over by the drivers in
Linux. Any interruption in between could cause the remoteprocs to crash.

With my changes, the boot SMRs stay active (at the same index), because
there is an existing loop inside qcom_smmu_cfg_probe() that preserves
them as bypass:

	for (i = 0; i < smmu->num_mapping_groups; i++) {
		smr = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_SMR(i));
		if (FIELD_GET(ARM_SMMU_SMR_VALID, smr)) {
			/* Ignore valid bit for SMR mask extraction. */
			smr &= ~ARM_SMMU_SMR_VALID;
			smmu->smrs[i].id = FIELD_GET(ARM_SMMU_SMR_ID, smr);
			smmu->smrs[i].mask = FIELD_GET(ARM_SMMU_SMR_MASK, smr);
			smmu->smrs[i].valid = true;
			smmu->s2crs[i].type = S2CR_TYPE_BYPASS;
			smmu->s2crs[i].privcfg = S2CR_PRIVCFG_DEFAULT;
			smmu->s2crs[i].cbndx = 0xff;
		}
	}

We could "move" the SMRs > 128 to earlier indexes, but this also needs
to be done carefully in order to avoid:

 - Stream match conflicts, if we write the new entry before deleting
   the old one.
 - Unhandled transactions, if we delete the old entry before writing
   the new one.

Currently this can't happen, because we don't move any entries around.
We could do it similar to arm_smmu_rmr_install_bypass_smr() and add:

	/*
	 * Rather than trying to look at existing mappings that
	 * are setup by the firmware and then invalidate the ones
	 * that do no have matching RMR entries, just disable the
	 * SMMU until it gets enabled again in the reset routine.
	 */
	reg = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_sCR0);
	reg |= ARM_SMMU_sCR0_CLIENTPD;
	arm_smmu_gr0_write(smmu, ARM_SMMU_GR0_sCR0, reg);

However, this would need to be done carefully only for the bare-metal
case, since I doubt Qualcomm's hypervisor will allow disabling all
access protections by setting CLIENTPD.

I can try implementing this, but the resulting code will likely be more
complex than this patch.

I realize it is weird to allow non-architectural features like this, but
I haven't found any indication that the additional SMRs work any
different from the standard ones. The SMMU spec seems to reserve space
for up to 256 SMRs in the address space and the register bits, as if it
was intended to be extended like this later. That's also why it works
correctly without any changes in arm-smmu.c: the bit masks used there
already allow up to 256 SMRs.

What do you think?

Thanks,
Stephan
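
The effect of the preservation loop quoted above, and of the number of groups it iterates over, can be tried out on a plain array instead of real registers. This is a sketch under stated assumptions, not the driver code: struct sme_state, enum s2cr_kind and preserve_boot_smrs() are hypothetical names, and the SMR layout (VALID in bit 31, MASK in bits 31:16 with the valid bit cleared before extraction, ID in bits 15:0) is assumed to be the non-extended format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed SMR layout (non-extended stream IDs). */
#define SMR_VALID	(UINT32_C(1) << 31)
#define SMR_MASK_SHIFT	16
#define SMR_ID_BITS	UINT32_C(0xffff)

/* Hypothetical stand-ins for the driver's S2CR types. */
enum s2cr_kind { S2CR_KIND_TRANS, S2CR_KIND_BYPASS, S2CR_KIND_FAULT };

/* Hypothetical software view of one stream mapping entry. */
struct sme_state {
	uint16_t id;
	uint16_t mask;
	bool valid;
	enum s2cr_kind kind;
};

/*
 * Record every valid boot-time SMR among the first num_groups entries
 * as a bypass mapping at the same index. Returns how many were kept.
 */
size_t preserve_boot_smrs(const uint32_t *smr_regs, struct sme_state *sme,
			  size_t num_groups)
{
	size_t kept = 0;

	assert(smr_regs != NULL && sme != NULL);

	for (size_t i = 0; i < num_groups; i++) {
		uint32_t smr = smr_regs[i];

		if (!(smr & SMR_VALID))
			continue;

		/* Ignore the valid bit when extracting the mask. */
		smr &= ~SMR_VALID;
		sme[i].id = (uint16_t)(smr & SMR_ID_BITS);
		sme[i].mask = (uint16_t)(smr >> SMR_MASK_SHIFT);
		sme[i].valid = true;
		sme[i].kind = S2CR_KIND_BYPASS;
		kept++;
	}

	return kept;
}
```

Calling the function with num_groups = 128 leaves an entry at index 171 untracked, while calling it with the full group count records it as bypass at the same index, which mirrors the difference between the existing workaround and the patch.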

On Wed, Sep 17, 2025 at 09:16:46PM +0200, Stephan Gerhold wrote:
> I realize it is weird to allow non-architectural features like this, but
> I haven't found any indication that the additional SMRs work any
> different from the standard ones. The SMMU spec seems to reserve space
> for up to 256 SMRs in the address space and the register bits, as if it
> was intended to be extended like this later. That's also why it works
> correctly without any changes in arm-smmu.c: the bit masks used there
> already allow up to 256 SMRs.
>
> What do you think?

Although it's all pretty ugly, I think we really only have two choices:

  - Teach the core driver code about all this and use an rmr-like scheme
    to leave the upper SMRs in bypass

  - Hack it in the impl code as per your patch

The latter option is probably the most pragmatic (especially considering
the need to handle the virtualised case differently) but I'd like to see
what Robin thinks.

Will