[v2] iommu/arm-smmu-v3: Add support for ECMDQ register mode

[PATCH v2 0/2] iommu/arm-smmu-v3: Add support for ECMDQ register mode

Posted by thunder.leizhen@huaweicloud.com 2 years, 6 months ago

From: Zhen Lei <thunder.leizhen@huawei.com>

v1 --> v2:
1. Drop patch "iommu/arm-smmu-v3: Add arm_smmu_ecmdq_issue_cmdlist() for non-shared ECMDQ" in v1
2. Drop patch "iommu/arm-smmu-v3: Add support for less than one ECMDQ per core" in v1
3. Replace the rwlock with an IPI, so that the write to bit 'ERRACK' during error handling
   and the read of bit 'ERRACK' during command insertion are serialized without a lock.
4. Standardize variable names.
-	struct arm_smmu_ecmdq *__percpu	*ecmdq;
+	struct arm_smmu_ecmdq *__percpu	*ecmdqs;

5. Add member 'iobase' to struct arm_smmu_device to record the starting physical
   address of the SMMU, replacing the translation (vmalloc_to_pfn(smmu->base) << PAGE_SHIFT).
+	phys_addr_t			iobase;
-	smmu_dma_base = (vmalloc_to_pfn(smmu->base) << PAGE_SHIFT);

6. Remove the union below. Whether ECMDQ is enabled is now determined only by 'ecmdq_enabled'.
-	union {
-		u32			nr_ecmdq;
-		u32			ecmdq_enabled;
-	};
+	u32				nr_ecmdq;
+	bool				ecmdq_enabled;

7. Eliminate some sparse check warnings. For example:
-	struct arm_smmu_ecmdq *ecmdq;
+	struct arm_smmu_ecmdq __percpu *ecmdq;
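
To illustrate item 6, here is a minimal standalone sketch (not driver code) of
why the union was awkward: both members share the same storage, so clearing
the "enabled" state also loses the queue count. The struct and function names
below are hypothetical reductions made up for this example, and u32 is
typedef'd locally so the snippet builds in userspace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t u32;	/* local stand-in for the kernel type */

/* Hypothetical reduction of the v1 layout: both names alias the same 4 bytes. */
struct smmu_v1_sketch {
	union {
		u32 nr_ecmdq;
		u32 ecmdq_enabled;
	};
};

/* Hypothetical reduction of the v2 layout: count and state are independent. */
struct smmu_v2_sketch {
	u32  nr_ecmdq;
	bool ecmdq_enabled;
};

/* Disable ECMDQ and return the queue count that remains visible afterwards. */
static inline u32 v1_disable(struct smmu_v1_sketch *s)
{
	assert(s != NULL);
	s->ecmdq_enabled = 0;	/* also overwrites nr_ecmdq: same storage */
	return s->nr_ecmdq;
}

static inline u32 v2_disable(struct smmu_v2_sketch *s)
{
	assert(s != NULL);
	s->ecmdq_enabled = false;	/* nr_ecmdq is left untouched */
	return s->nr_ecmdq;
}
```

With the v1 layout, a count of 4 reads back as 0 after disabling; with the v2
layout the count stays 4 and only the flag changes.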



Zhen Lei (2):
  iommu/arm-smmu-v3: Add support for ECMDQ register mode
  iommu/arm-smmu-v3: Ensure that a set of associated commands are
    inserted in the same ECMDQ

 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 260 +++++++++++++++++++-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  33 +++
 2 files changed, 285 insertions(+), 8 deletions(-)

-- 
2.34.1

Re: [PATCH v2 0/2] iommu/arm-smmu-v3: Add support for ECMDQ register mode

Posted by Will Deacon 2 years, 6 months ago

On Wed, Aug 09, 2023 at 09:13:01PM +0800, thunder.leizhen@huaweicloud.com wrote:
> From: Zhen Lei <thunder.leizhen@huawei.com>
> 
> v1 --> v2:

Jason previously asked about performance numbers for ECMDQ:

https://lore.kernel.org/r/ZL6n3f01yV7tc4yH@ziepe.ca

Do you have any?

Will

Re: [PATCH v2 0/2] iommu/arm-smmu-v3: Add support for ECMDQ register mode

Posted by Leizhen (ThunderTown) 2 years, 6 months ago

On 2023/8/9 21:56, Will Deacon wrote:
> On Wed, Aug 09, 2023 at 09:13:01PM +0800, thunder.leizhen@huaweicloud.com wrote:
>> From: Zhen Lei <thunder.leizhen@huawei.com>
>>
>> v1 --> v2:
> 
> Jason previously asked about performance numbers for ECMDQ:
> 
> https://lore.kernel.org/r/ZL6n3f01yV7tc4yH@ziepe.ca
> 
> Do you have any?

I asked my colleagues in the chip department, and they said that the chip
was not commercially available and the specific data could not be disclosed.
To be sure, the performance has improved, but not by much: the gain in the
public benchmark is only about 5%. Your optimization patch was so perfect
that it ruined our jobs.

However, since Marvell also implements ECMDQ, there are at least two users.
Could we consider making it available first?

> 
> Will
> .
> 

-- 
Regards,
  Zhen Lei

Re: [PATCH v2 0/2] iommu/arm-smmu-v3: Add support for ECMDQ register mode

Posted by Nicolin Chen 2 years, 6 months ago

On Wed, Aug 09, 2023 at 07:18:36PM -0700, Leizhen (ThunderTown) wrote:
> On 2023/8/9 21:56, Will Deacon wrote:
> > On Wed, Aug 09, 2023 at 09:13:01PM +0800, thunder.leizhen@huaweicloud.com wrote:
> >> From: Zhen Lei <thunder.leizhen@huawei.com>
> >>
> >> v1 --> v2:
> >
> > Jason previously asked about performance numbers for ECMDQ:
> >
> > https://lore.kernel.org/r/ZL6n3f01yV7tc4yH@ziepe.ca
> >
> > Do you have any?
> 
> I asked my colleagues in the chip department, and they said that the chip
> was not commercially available and the specific data could not be disclosed.
> However, to be sure, the performance has improved, but not by much, the
> public benchmark is only about 5%. Your optimization patch was so perfect
> that it ruined our jobs.
> 
> However, since Marvell also implements ECMDQ, there are at least two users.
> Do we think about making it available first?

I have seen something similar (~5%) with VCMDQ on NVIDIA Grace
when running TLB flush benchmark tests concurrently on different
CPUs in the host OS.

Although VCMDQ could be slightly different from ECMDQ, both have
a multi-queue feature. The improvement in my case came from
reduced congestion when issuing commands to multiple queues
rather than to a single queue. I guess ECMDQ might get its 5%
from that too.
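
A rough way to picture the congestion argument is to count how many CPUs end
up sharing the busiest queue. The sketch below is a toy model only: the modulo
mapping, MAX_QUEUES and both function names are assumptions made up for
illustration, not how either the VCMDQ or the ECMDQ code assigns queues.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_QUEUES 64	/* arbitrary bound for this toy model */

/* Hypothetical mapping: CPU n issues its commands to queue (n % nr_queues). */
static inline size_t queue_for_cpu(size_t cpu, size_t nr_queues)
{
	assert(nr_queues > 0);
	return cpu % nr_queues;
}

/* Return how many CPUs share the most heavily shared queue. */
static inline size_t busiest_queue_sharers(size_t nr_cpus, size_t nr_queues)
{
	size_t sharers[MAX_QUEUES] = { 0 };
	size_t cpu, q, max = 0;

	assert(nr_queues > 0 && nr_queues <= MAX_QUEUES);

	/* Tally the CPUs assigned to each queue. */
	for (cpu = 0; cpu < nr_cpus; cpu++)
		sharers[queue_for_cpu(cpu, nr_queues)]++;

	/* Find the queue with the most producers contending for it. */
	for (q = 0; q < nr_queues; q++)
		if (sharers[q] > max)
			max = sharers[q];

	return max;
}
```

With 8 CPUs, a single queue has all 8 producers contending for it, while 8
queues leave each CPU alone on its own queue. The model says nothing about
the actual speedup, only about how many producers compete per queue.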

If we decide to move ECMDQ forward, perhaps we can converge some
of the functions to support both :)

Thanks
Nicolin