[PATCH] iommu/arm-smmu-v3: Fix L1 stream table index calculation for AmpereOne

Yang Shi posted 1 patch 1 week ago
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
[PATCH] iommu/arm-smmu-v3: Fix L1 stream table index calculation for AmpereOne
Posted by Yang Shi 1 week ago
The commit ce410410f1a7 ("iommu/arm-smmu-v3: Add arm_smmu_strtab_l1/2_idx()")
calculated the last index of L1 stream table by 1 << smmu->sid_bits. 1
is 32 bit value.
However AmpereOne has 32-bit stream id size.  This resulted in
ouf-of-bound shift.  The disassembly of shift is:

    ldr     w2, [x19, 828]  //, smmu_7(D)->sid_bits
    mov     w20, 1
    lsl     w20, w20, w2

According to ARM spec, if the registers are 32 bit, the instruction actually
does:
    dest = src << (shift % 32)

So it actually shifted by zero bit.

This caused v6.12-rc1 failed to boot on AmpereOne and UBSAN also
reported:

UBSAN: shift-out-of-bounds in drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c:3628:29
shift exponent 32 is too large for 32-bit type 'int'
CPU: 70 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.12.0-rc1 #4
Hardware name: ZOLLNER SUNMOONLAKE/SunMoon Lake, BIOS 00.00. 2024-08-28 18:42:45 08/28/2024
Call trace:
 dump_backtrace+0xdc/0x140
 show_stack+0x20/0x40
 dump_stack_lvl+0x60/0x80
 dump_stack+0x18/0x28
 ubsan_epilogue+0x10/0x48
 __ubsan_handle_shift_out_of_bounds+0xd8/0x1a0
 arm_smmu_init_structures+0x374/0x3c8
 arm_smmu_device_probe+0x208/0x600
 platform_probe+0x70/0xe8
 really_probe+0xc8/0x3a0
 __driver_probe_device+0x84/0x160
 driver_probe_device+0x44/0x130
 __driver_attach+0xcc/0x208
 bus_for_each_dev+0x84/0x100
 driver_attach+0x2c/0x40
 bus_add_driver+0x158/0x290
 driver_register+0x70/0x138
 __platform_driver_register+0x2c/0x40
 arm_smmu_driver_init+0x28/0x40
 do_one_initcall+0x60/0x318
 do_initcalls+0x198/0x1e0
 kernel_init_freeable+0x18c/0x1e8
 kernel_init+0x28/0x160
 ret_from_fork+0x10/0x20

Using 64 bit immediate when doing shift can solve the problem.  The
disssembly after the fix looks like:
    ldr     w20, [x19, 828] //, smmu_7(D)->sid_bits
    mov     x0, 1
    lsl     x0, x0, x20

Signed-off-by: Yang Shi <yang@os.amperecomputing.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 737c5b882355..01a2faee04bc 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -3625,7 +3625,7 @@ static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu)
 	u32 l1size;
 	struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
 	unsigned int last_sid_idx =
-		arm_smmu_strtab_l1_idx((1 << smmu->sid_bits) - 1);
+		arm_smmu_strtab_l1_idx((1UL << smmu->sid_bits) - 1);
 
 	/* Calculate the L1 size, capped to the SIDSIZE. */
 	cfg->l2.num_l1_ents = min(last_sid_idx + 1, STRTAB_MAX_L1_ENTRIES);
-- 
2.41.0
Re: [PATCH] iommu/arm-smmu-v3: Fix L1 stream table index calculation for AmpereOne
Posted by James Morse 6 days, 16 hours ago
Hello!

On 01/10/2024 19:03, Yang Shi wrote:
> The commit ce410410f1a7 ("iommu/arm-smmu-v3: Add arm_smmu_strtab_l1/2_idx()")
> calculated the last index of L1 stream table by 1 << smmu->sid_bits. 1
> is 32 bit value.
> However AmpereOne has 32-bit stream id size.  This resulted in
> ouf-of-bound shift.  The disassembly of shift is:
> 
>     ldr     w2, [x19, 828]  //, smmu_7(D)->sid_bits
>     mov     w20, 1
>     lsl     w20, w20, w2
> 
> According to ARM spec, if the registers are 32 bit, the instruction actually
> does:
>     dest = src << (shift % 32)
> 
> So it actually shifted by zero bit.
> 
> This caused v6.12-rc1 failed to boot on AmpereOne

Same here - one of arm's reference designs prints 1 giga-tonne of:
| arm-smmu-v3 arm-smmu-v3.5.auto: event 0x02 received:
| arm-smmu-v3 arm-smmu-v3.5.auto:  0x0000c90000000002
| arm-smmu-v3 arm-smmu-v3.5.auto:  0x0000000000000000
| arm-smmu-v3 arm-smmu-v3.5.auto:  0x0000000000000000
| arm-smmu-v3 arm-smmu-v3.5.auto:  0x0000000000000000

during boot - then fails to find the nvme.
Bisect points to ce410410f1a7, and the below diff fixes it.

> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 737c5b882355..01a2faee04bc 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -3625,7 +3625,7 @@ static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu)
>  	u32 l1size;
>  	struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
>  	unsigned int last_sid_idx =
> -		arm_smmu_strtab_l1_idx((1 << smmu->sid_bits) - 1);
> +		arm_smmu_strtab_l1_idx((1UL << smmu->sid_bits) - 1);
>  
>  	/* Calculate the L1 size, capped to the SIDSIZE. */
>  	cfg->l2.num_l1_ents = min(last_sid_idx + 1, STRTAB_MAX_L1_ENTRIES);


Tested-by: James Morse <james.morse@arm.com>


Thanks,

James
Re: [PATCH] iommu/arm-smmu-v3: Fix L1 stream table index calculation for AmpereOne
Posted by Yang Shi 6 days, 15 hours ago

On 10/2/24 9:58 AM, James Morse wrote:
> Hello!
>
> On 01/10/2024 19:03, Yang Shi wrote:
>> The commit ce410410f1a7 ("iommu/arm-smmu-v3: Add arm_smmu_strtab_l1/2_idx()")
>> calculated the last index of L1 stream table by 1 << smmu->sid_bits. 1
>> is 32 bit value.
>> However AmpereOne has 32-bit stream id size.  This resulted in
>> ouf-of-bound shift.  The disassembly of shift is:
>>
>>      ldr     w2, [x19, 828]  //, smmu_7(D)->sid_bits
>>      mov     w20, 1
>>      lsl     w20, w20, w2
>>
>> According to ARM spec, if the registers are 32 bit, the instruction actually
>> does:
>>      dest = src << (shift % 32)
>>
>> So it actually shifted by zero bit.
>>
>> This caused v6.12-rc1 failed to boot on AmpereOne
> Same here - one of arm's reference designs prints 1 giga-tonne of:
> | arm-smmu-v3 arm-smmu-v3.5.auto: event 0x02 received:
> | arm-smmu-v3 arm-smmu-v3.5.auto:  0x0000c90000000002
> | arm-smmu-v3 arm-smmu-v3.5.auto:  0x0000000000000000
> | arm-smmu-v3 arm-smmu-v3.5.auto:  0x0000000000000000
> | arm-smmu-v3 arm-smmu-v3.5.auto:  0x0000000000000000
>
> during boot - then fails to find the nvme.
> Bisect points to ce410410f1a7, and the below diff fixes it.
>
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> index 737c5b882355..01a2faee04bc 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> @@ -3625,7 +3625,7 @@ static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu)
>>   	u32 l1size;
>>   	struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
>>   	unsigned int last_sid_idx =
>> -		arm_smmu_strtab_l1_idx((1 << smmu->sid_bits) - 1);
>> +		arm_smmu_strtab_l1_idx((1UL << smmu->sid_bits) - 1);
>>   
>>   	/* Calculate the L1 size, capped to the SIDSIZE. */
>>   	cfg->l2.num_l1_ents = min(last_sid_idx + 1, STRTAB_MAX_L1_ENTRIES);
>
> Tested-by: James Morse <james.morse@arm.com>

Thank you for testing. I will change the patch subject a little bit to 
make it more general.

>
>
> Thanks,
>
> James
Re: [PATCH] iommu/arm-smmu-v3: Fix L1 stream table index calculation for AmpereOne
Posted by Nicolin Chen 1 week ago
On Tue, Oct 01, 2024 at 11:03:46AM -0700, Yang Shi wrote:
> Using 64 bit immediate when doing shift can solve the problem.  The
> disssembly after the fix looks like:

[...]

>         unsigned int last_sid_idx =
> -               arm_smmu_strtab_l1_idx((1 << smmu->sid_bits) - 1);
> +               arm_smmu_strtab_l1_idx((1UL << smmu->sid_bits) - 1);

Could a 32-bit build be a corner case where UL is no longer a
"64 bit" stated in the commit message?

Also, there are other places doing "1 << smmu->sid_bits", e.g.
arm_smmu_init_strtab_linear().

Then, can ssid_bits/s1cdmax be a concern similarly?

Thanks
Nicolin
Re: [PATCH] iommu/arm-smmu-v3: Fix L1 stream table index calculation for AmpereOne
Posted by Yang Shi 1 week ago

On 10/1/24 11:27 AM, Nicolin Chen wrote:
> On Tue, Oct 01, 2024 at 11:03:46AM -0700, Yang Shi wrote:
>> Using 64 bit immediate when doing shift can solve the problem.  The
>> disssembly after the fix looks like:
> [...]
>
>>          unsigned int last_sid_idx =
>> -               arm_smmu_strtab_l1_idx((1 << smmu->sid_bits) - 1);
>> +               arm_smmu_strtab_l1_idx((1UL << smmu->sid_bits) - 1);
> Could a 32-bit build be a corner case where UL is no longer a
> "64 bit" stated in the commit message?

It shouldn't. Because smmu v3 depends on ARM64.

config ARM_SMMU_V3
         tristate "ARM Ltd. System MMU Version 3 (SMMUv3) Support"
         depends on ARM64

>
> Also, there are other places doing "1 << smmu->sid_bits", e.g.
> arm_smmu_init_strtab_linear().

The disassembly shows it uses "sbfiz   x21, x20, 6, 32" instead of lsl. 
1UL should be used if we want to take extra caution and don't prefer 
rely on compiler.

>
> Then, can ssid_bits/s1cdmax be a concern similarly?

IIUC, ssid_bits is determined by IDR1_SSIDSIZE. It is GENMASK(10, 6). So 
it shouldn't be 32. IDR1_SIDSIZE is GENMASK(5, 0).

>
> Thanks
> Nicolin

Re: [PATCH] iommu/arm-smmu-v3: Fix L1 stream table index calculation for AmpereOne
Posted by Nicolin Chen 1 week ago
On Tue, Oct 01, 2024 at 12:09:03PM -0700, Yang Shi wrote:
> On 10/1/24 11:27 AM, Nicolin Chen wrote:
> > On Tue, Oct 01, 2024 at 11:03:46AM -0700, Yang Shi wrote:
> > > Using 64 bit immediate when doing shift can solve the problem.  The
> > > disssembly after the fix looks like:
> > [...]
> > 
> > >          unsigned int last_sid_idx =
> > > -               arm_smmu_strtab_l1_idx((1 << smmu->sid_bits) - 1);
> > > +               arm_smmu_strtab_l1_idx((1UL << smmu->sid_bits) - 1);
> > Could a 32-bit build be a corner case where UL is no longer a
> > "64 bit" stated in the commit message?
> 
> It shouldn't. Because smmu v3 depends on ARM64.
> 
> config ARM_SMMU_V3
>         tristate "ARM Ltd. System MMU Version 3 (SMMUv3) Support"
>         depends on ARM64

ARM64 can have aarch32 support. I am not sure if ARM64 running a
32-bit OS can be a case though, (and not confined to AmpereOne).

> > Then, can ssid_bits/s1cdmax be a concern similarly?
> 
> IIUC, ssid_bits is determined by IDR1_SSIDSIZE. It is GENMASK(10, 6). So
> it shouldn't be 32. IDR1_SIDSIZE is GENMASK(5, 0).

Rechecked the RM. Yea, max sid can be 32 but max ssid is 20 at
this moment, so we should be safe.

Thanks
Nicolin
Re: [PATCH] iommu/arm-smmu-v3: Fix L1 stream table index calculation for AmpereOne
Posted by Yang Shi 1 week ago

On 10/1/24 12:29 PM, Nicolin Chen wrote:
> On Tue, Oct 01, 2024 at 12:09:03PM -0700, Yang Shi wrote:
>> On 10/1/24 11:27 AM, Nicolin Chen wrote:
>>> On Tue, Oct 01, 2024 at 11:03:46AM -0700, Yang Shi wrote:
>>>> Using 64 bit immediate when doing shift can solve the problem.  The
>>>> disssembly after the fix looks like:
>>> [...]
>>>
>>>>           unsigned int last_sid_idx =
>>>> -               arm_smmu_strtab_l1_idx((1 << smmu->sid_bits) - 1);
>>>> +               arm_smmu_strtab_l1_idx((1UL << smmu->sid_bits) - 1);
>>> Could a 32-bit build be a corner case where UL is no longer a
>>> "64 bit" stated in the commit message?
>> It shouldn't. Because smmu v3 depends on ARM64.
>>
>> config ARM_SMMU_V3
>>          tristate "ARM Ltd. System MMU Version 3 (SMMUv3) Support"
>>          depends on ARM64
> ARM64 can have aarch32 support. I am not sure if ARM64 running a
> 32-bit OS can be a case though, (and not confined to AmpereOne).

I don't think ARM64 runs 32-bit kernel, at least for newer kernel.

>
>>> Then, can ssid_bits/s1cdmax be a concern similarly?
>> IIUC, ssid_bits is determined by IDR1_SSIDSIZE. It is GENMASK(10, 6). So
>> it shouldn't be 32. IDR1_SIDSIZE is GENMASK(5, 0).
> Rechecked the RM. Yea, max sid can be 32 but max ssid is 20 at
> this moment, so we should be safe.
>
> Thanks
> Nicolin
Re: [PATCH] iommu/arm-smmu-v3: Fix L1 stream table index calculation for AmpereOne
Posted by Robin Murphy 6 days, 23 hours ago
On 2024-10-01 8:48 pm, Yang Shi wrote:
> 
> 
> On 10/1/24 12:29 PM, Nicolin Chen wrote:
>> On Tue, Oct 01, 2024 at 12:09:03PM -0700, Yang Shi wrote:
>>> On 10/1/24 11:27 AM, Nicolin Chen wrote:
>>>> On Tue, Oct 01, 2024 at 11:03:46AM -0700, Yang Shi wrote:
>>>>> Using 64 bit immediate when doing shift can solve the problem.  The
>>>>> disssembly after the fix looks like:
>>>> [...]
>>>>
>>>>>           unsigned int last_sid_idx =
>>>>> -               arm_smmu_strtab_l1_idx((1 << smmu->sid_bits) - 1);
>>>>> +               arm_smmu_strtab_l1_idx((1UL << smmu->sid_bits) - 1);
>>>> Could a 32-bit build be a corner case where UL is no longer a
>>>> "64 bit" stated in the commit message?
>>> It shouldn't. Because smmu v3 depends on ARM64.
>>>
>>> config ARM_SMMU_V3
>>>          tristate "ARM Ltd. System MMU Version 3 (SMMUv3) Support"
>>>          depends on ARM64
>> ARM64 can have aarch32 support. I am not sure if ARM64 running a
>> 32-bit OS can be a case though, (and not confined to AmpereOne).
> 
> I don't think ARM64 runs 32-bit kernel, at least for newer kernel.

Just use ULL - if the point is that it must be a 64-bit shift for 
correctness, then being clear about that intent is far more valuable 
than saving one character of source code.

Thanks,
Robin.

> 
>>
>>>> Then, can ssid_bits/s1cdmax be a concern similarly?
>>> IIUC, ssid_bits is determined by IDR1_SSIDSIZE. It is GENMASK(10, 6). So
>>> it shouldn't be 32. IDR1_SIDSIZE is GENMASK(5, 0).
>> Rechecked the RM. Yea, max sid can be 32 but max ssid is 20 at
>> this moment, so we should be safe.
>>
>> Thanks
>> Nicolin
> 
Re: [PATCH] iommu/arm-smmu-v3: Fix L1 stream table index calculation for AmpereOne
Posted by Yang Shi 6 days, 17 hours ago

On 10/2/24 2:59 AM, Robin Murphy wrote:
> On 2024-10-01 8:48 pm, Yang Shi wrote:
>>
>>
>> On 10/1/24 12:29 PM, Nicolin Chen wrote:
>>> On Tue, Oct 01, 2024 at 12:09:03PM -0700, Yang Shi wrote:
>>>> On 10/1/24 11:27 AM, Nicolin Chen wrote:
>>>>> On Tue, Oct 01, 2024 at 11:03:46AM -0700, Yang Shi wrote:
>>>>>> Using 64 bit immediate when doing shift can solve the problem.  The
>>>>>> disssembly after the fix looks like:
>>>>> [...]
>>>>>
>>>>>>           unsigned int last_sid_idx =
>>>>>> -               arm_smmu_strtab_l1_idx((1 << smmu->sid_bits) - 1);
>>>>>> +               arm_smmu_strtab_l1_idx((1UL << smmu->sid_bits) - 1);
>>>>> Could a 32-bit build be a corner case where UL is no longer a
>>>>> "64 bit" stated in the commit message?
>>>> It shouldn't. Because smmu v3 depends on ARM64.
>>>>
>>>> config ARM_SMMU_V3
>>>>          tristate "ARM Ltd. System MMU Version 3 (SMMUv3) Support"
>>>>          depends on ARM64
>>> ARM64 can have aarch32 support. I am not sure if ARM64 running a
>>> 32-bit OS can be a case though, (and not confined to AmpereOne).
>>
>> I don't think ARM64 runs 32-bit kernel, at least for newer kernel.
>
> Just use ULL - if the point is that it must be a 64-bit shift for 
> correctness, then being clear about that intent is far more valuable 
> than saving one character of source code.

Yeah, it must be 64 bit. Will fix in v2.

>
> Thanks,
> Robin.
>
>>
>>>
>>>>> Then, can ssid_bits/s1cdmax be a concern similarly?
>>>> IIUC, ssid_bits is determined by IDR1_SSIDSIZE. It is GENMASK(10, 
>>>> 6). So
>>>> it shouldn't be 32. IDR1_SIDSIZE is GENMASK(5, 0).
>>> Rechecked the RM. Yea, max sid can be 32 but max ssid is 20 at
>>> this moment, so we should be safe.
>>>
>>> Thanks
>>> Nicolin
>>

Re: [PATCH] iommu/arm-smmu-v3: Fix L1 stream table index calculation for AmpereOne
Posted by Jason Gunthorpe 1 week ago
On Tue, Oct 01, 2024 at 12:09:03PM -0700, Yang Shi wrote:
> > Also, there are other places doing "1 << smmu->sid_bits", e.g.
> > arm_smmu_init_strtab_linear().
> 
> The disassembly shows it uses "sbfiz x21, x20, 6, 32" instead of lsl. 1UL
> should be used if we want to take extra caution and don't prefer rely on
> compiler.

Still, we should be fixing them all if sid_bits == 32, all those
shifts should be throwing a UBSAN error. It would be crazy to have a
32 bit linear table but I guess it is allowed?

Jason
Re: [PATCH] iommu/arm-smmu-v3: Fix L1 stream table index calculation for AmpereOne
Posted by Yang Shi 1 week ago

On 10/1/24 12:18 PM, Jason Gunthorpe wrote:
> On Tue, Oct 01, 2024 at 12:09:03PM -0700, Yang Shi wrote:
>>> Also, there are other places doing "1 << smmu->sid_bits", e.g.
>>> arm_smmu_init_strtab_linear().
>> The disassembly shows it uses "sbfiz x21, x20, 6, 32" instead of lsl. 1UL
>> should be used if we want to take extra caution and don't prefer rely on
>> compiler.
> Still, we should be fixing them all if sid_bits == 32, all those
> shifts should be throwing a UBSAN error. It would be crazy to have a

OK, will cover this is v2.

> 32 bit linear table but I guess it is allowed?

I'm not smmu expert, but if sid_bits is 32, it looks like the linear 
table will consume 256GB IIUC? That is crazy.

>
> Jason
Re: [PATCH] iommu/arm-smmu-v3: Fix L1 stream table index calculation for AmpereOne
Posted by Jason Gunthorpe 1 week ago
On Tue, Oct 01, 2024 at 12:38:56PM -0700, Yang Shi wrote:
> 
> 
> On 10/1/24 12:18 PM, Jason Gunthorpe wrote:
> > On Tue, Oct 01, 2024 at 12:09:03PM -0700, Yang Shi wrote:
> > > > Also, there are other places doing "1 << smmu->sid_bits", e.g.
> > > > arm_smmu_init_strtab_linear().
> > > The disassembly shows it uses "sbfiz x21, x20, 6, 32" instead of lsl. 1UL
> > > should be used if we want to take extra caution and don't prefer rely on
> > > compiler.
> > Still, we should be fixing them all if sid_bits == 32, all those
> > shifts should be throwing a UBSAN error. It would be crazy to have a
> 
> OK, will cover this is v2.

Maybe just make a little inline function to do this math and remove
the repated open coding? Then the types can be right, etc.

Jason
Re: [PATCH] iommu/arm-smmu-v3: Fix L1 stream table index calculation for AmpereOne
Posted by Yang Shi 1 week ago

On 10/1/24 12:46 PM, Jason Gunthorpe wrote:
> On Tue, Oct 01, 2024 at 12:38:56PM -0700, Yang Shi wrote:
>>
>> On 10/1/24 12:18 PM, Jason Gunthorpe wrote:
>>> On Tue, Oct 01, 2024 at 12:09:03PM -0700, Yang Shi wrote:
>>>>> Also, there are other places doing "1 << smmu->sid_bits", e.g.
>>>>> arm_smmu_init_strtab_linear().
>>>> The disassembly shows it uses "sbfiz x21, x20, 6, 32" instead of lsl. 1UL
>>>> should be used if we want to take extra caution and don't prefer rely on
>>>> compiler.
>>> Still, we should be fixing them all if sid_bits == 32, all those
>>> shifts should be throwing a UBSAN error. It would be crazy to have a
>> OK, will cover this is v2.
> Maybe just make a little inline function to do this math and remove
> the repated open coding? Then the types can be right, etc.

Something like this?

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 01a2faee04bc..0f3aa7962117 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -3624,8 +3624,9 @@ static int arm_smmu_init_strtab_2lvl(struct 
arm_smmu_device *smmu)
  {
         u32 l1size;
         struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
+       unsigned int max_sid = arm_smmu_strtab_max_sid(smmu->sid_bits);
         unsigned int last_sid_idx =
-               arm_smmu_strtab_l1_idx((1UL << smmu->sid_bits) - 1);
+               arm_smmu_strtab_l1_idx(max_sid - 1);

         /* Calculate the L1 size, capped to the SIDSIZE. */
         cfg->l2.num_l1_ents = min(last_sid_idx + 1, STRTAB_MAX_L1_ENTRIES);
@@ -3657,8 +3658,9 @@ static int arm_smmu_init_strtab_linear(struct 
arm_smmu_device *smmu)
  {
         u32 size;
         struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
+       unsigned int max_sid = arm_smmu_strtab_max_sid(smmu->sid_bits);

-       size = (1 << smmu->sid_bits) * sizeof(struct arm_smmu_ste);
+       size = max_sid * sizeof(struct arm_smmu_ste);
         cfg->linear.table = dmam_alloc_coherent(smmu->dev, size,
&cfg->linear.ste_dma,
                                                 GFP_KERNEL);
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 1e9952ca989f..de9f14293485 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -235,6 +235,11 @@ static inline u32 arm_smmu_strtab_l2_idx(u32 sid)
         return sid % STRTAB_NUM_L2_STES;
  }

+static inline unsigned int arm_smmu_strtab_max_sid(unsigned int sid_bits)
+{
+       return (1UL << sid_bits);
+}
+

>
> Jason

Re: [PATCH] iommu/arm-smmu-v3: Fix L1 stream table index calculation for AmpereOne
Posted by James Morse 6 days, 16 hours ago
Hello,

On 01/10/2024 21:47, Yang Shi wrote:
> On 10/1/24 12:46 PM, Jason Gunthorpe wrote:
>> On Tue, Oct 01, 2024 at 12:38:56PM -0700, Yang Shi wrote:
>>>
>>> On 10/1/24 12:18 PM, Jason Gunthorpe wrote:
>>>> On Tue, Oct 01, 2024 at 12:09:03PM -0700, Yang Shi wrote:
>>>>>> Also, there are other places doing "1 << smmu->sid_bits", e.g.
>>>>>> arm_smmu_init_strtab_linear().
>>>>> The disassembly shows it uses "sbfiz x21, x20, 6, 32" instead of lsl. 1UL
>>>>> should be used if we want to take extra caution and don't prefer rely on
>>>>> compiler.
>>>> Still, we should be fixing them all if sid_bits == 32, all those
>>>> shifts should be throwing a UBSAN error. It would be crazy to have a
>>> OK, will cover this is v2.
>> Maybe just make a little inline function to do this math and remove
>> the repated open coding? Then the types can be right, etc.
> 
> Something like this?
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 01a2faee04bc..0f3aa7962117 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -3624,8 +3624,9 @@ static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu)
>  {
>         u32 l1size;
>         struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
> +       unsigned int max_sid = arm_smmu_strtab_max_sid(smmu->sid_bits);
>         unsigned int last_sid_idx =
> -               arm_smmu_strtab_l1_idx((1UL << smmu->sid_bits) - 1);
> +               arm_smmu_strtab_l1_idx(max_sid - 1);
> 
>         /* Calculate the L1 size, capped to the SIDSIZE. */
>         cfg->l2.num_l1_ents = min(last_sid_idx + 1, STRTAB_MAX_L1_ENTRIES);
> @@ -3657,8 +3658,9 @@ static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu)
>  {
>         u32 size;
>         struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
> +       unsigned int max_sid = arm_smmu_strtab_max_sid(smmu->sid_bits);
> 
> -       size = (1 << smmu->sid_bits) * sizeof(struct arm_smmu_ste);
> +       size = max_sid * sizeof(struct arm_smmu_ste);
>         cfg->linear.table = dmam_alloc_coherent(smmu->dev, size,
> &cfg->linear.ste_dma,
>                                                 GFP_KERNEL);
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> index 1e9952ca989f..de9f14293485 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> @@ -235,6 +235,11 @@ static inline u32 arm_smmu_strtab_l2_idx(u32 sid)
>         return sid % STRTAB_NUM_L2_STES;
>  }
> 
> +static inline unsigned int arm_smmu_strtab_max_sid(unsigned int sid_bits)
> +{
> +       return (1UL << sid_bits);
> +}
> +

On the same platform - this also fixes the issue:
Tested-by: James Morse <james.morse@arm.com>


Thanks,

James
Re: [PATCH] iommu/arm-smmu-v3: Fix L1 stream table index calculation for AmpereOne
Posted by Jason Gunthorpe 1 week ago
On Tue, Oct 01, 2024 at 01:47:24PM -0700, Yang Shi wrote:
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> index 1e9952ca989f..de9f14293485 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> @@ -235,6 +235,11 @@ static inline u32 arm_smmu_strtab_l2_idx(u32 sid)
>         return sid % STRTAB_NUM_L2_STES;
>  }
> 
> +static inline unsigned int arm_smmu_strtab_max_sid(unsigned int sid_bits)
> +{

Can we take the smmu struct here instead of the int?

But yes, this looks OK

Jason
Re: [PATCH] iommu/arm-smmu-v3: Fix L1 stream table index calculation for AmpereOne
Posted by Yang Shi 1 week ago

On 10/1/24 2:02 PM, Jason Gunthorpe wrote:
> On Tue, Oct 01, 2024 at 01:47:24PM -0700, Yang Shi wrote:
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> index 1e9952ca989f..de9f14293485 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> @@ -235,6 +235,11 @@ static inline u32 arm_smmu_strtab_l2_idx(u32 sid)
>>          return sid % STRTAB_NUM_L2_STES;
>>   }
>>
>> +static inline unsigned int arm_smmu_strtab_max_sid(unsigned int sid_bits)
>> +{
> Can we take the smmu struct here instead of the int?

No problem.

>
> But yes, this looks OK

Thank you.

>
> Jason
Re: [PATCH] iommu/arm-smmu-v3: Fix L1 stream table index calculation for AmpereOne
Posted by Yang Shi 1 week ago

On 10/1/24 12:46 PM, Jason Gunthorpe wrote:
> On Tue, Oct 01, 2024 at 12:38:56PM -0700, Yang Shi wrote:
>>
>> On 10/1/24 12:18 PM, Jason Gunthorpe wrote:
>>> On Tue, Oct 01, 2024 at 12:09:03PM -0700, Yang Shi wrote:
>>>>> Also, there are other places doing "1 << smmu->sid_bits", e.g.
>>>>> arm_smmu_init_strtab_linear().
>>>> The disassembly shows it uses "sbfiz x21, x20, 6, 32" instead of lsl. 1UL
>>>> should be used if we want to take extra caution and don't prefer rely on
>>>> compiler.
>>> Still, we should be fixing them all if sid_bits == 32, all those
>>> shifts should be throwing a UBSAN error. It would be crazy to have a
>> OK, will cover this is v2.
> Maybe just make a little inline function to do this math and remove
> the repated open coding? Then the types can be right, etc.

Fine to me.

>
> Jason