From: Barry Song <baohua@kernel.org>
Apply batched DMA synchronization to iommu_dma_sync_sg_for_cpu() and
iommu_dma_sync_sg_for_device(). For all buffers in an SG list, only
a single flush operation is needed.
I do not have the hardware to test this, so the patch is marked as
RFC. I would greatly appreciate any testing feedback.
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Ada Couprie Diaz <ada.coupriediaz@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Tangquan Zheng <zhengtangquan@oppo.com>
Signed-off-by: Barry Song <baohua@kernel.org>
---
drivers/iommu/dma-iommu.c | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index ffa940bdbbaf..b68dbfcb7846 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1131,10 +1131,9 @@ void iommu_dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sgl,
 			iommu_dma_sync_single_for_cpu(dev, sg_dma_address(sg),
 						      sg->length, dir);
 	} else if (!dev_is_dma_coherent(dev)) {
-		for_each_sg(sgl, sg, nelems, i) {
+		for_each_sg(sgl, sg, nelems, i)
 			arch_sync_dma_for_cpu(sg_phys(sg), sg->length, dir);
-			arch_sync_dma_flush();
-		}
+		arch_sync_dma_flush();
 	}
 }

@@ -1144,16 +1143,16 @@ void iommu_dma_sync_sg_for_device(struct device *dev, struct scatterlist *sgl,
 	struct scatterlist *sg;
 	int i;

-	if (sg_dma_is_swiotlb(sgl))
+	if (sg_dma_is_swiotlb(sgl)) {
 		for_each_sg(sgl, sg, nelems, i)
 			iommu_dma_sync_single_for_device(dev,
 							 sg_dma_address(sg),
 							 sg->length, dir);
-	else if (!dev_is_dma_coherent(dev))
-		for_each_sg(sgl, sg, nelems, i) {
+	} else if (!dev_is_dma_coherent(dev)) {
+		for_each_sg(sgl, sg, nelems, i)
 			arch_sync_dma_for_device(sg_phys(sg), sg->length, dir);
-			arch_sync_dma_flush();
-		}
+		arch_sync_dma_flush();
+	}
 }

 static phys_addr_t iommu_dma_map_swiotlb(struct device *dev, phys_addr_t phys,
--
2.43.0
On Sat, Dec 27, 2025 at 11:52:48AM +1300, Barry Song wrote:
> From: Barry Song <baohua@kernel.org>
>
> Apply batched DMA synchronization to iommu_dma_sync_sg_for_cpu() and
> iommu_dma_sync_sg_for_device(). For all buffers in an SG list, only
> a single flush operation is needed.
>
> I do not have the hardware to test this, so the patch is marked as
> RFC. I would greatly appreciate any testing feedback.
>
> Cc: Leon Romanovsky <leon@kernel.org>
> Cc: Marek Szyprowski <m.szyprowski@samsung.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Ada Couprie Diaz <ada.coupriediaz@arm.com>
> Cc: Ard Biesheuvel <ardb@kernel.org>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> Cc: Ryan Roberts <ryan.roberts@arm.com>
> Cc: Suren Baghdasaryan <surenb@google.com>
> Cc: Robin Murphy <robin.murphy@arm.com>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Tangquan Zheng <zhengtangquan@oppo.com>
> Signed-off-by: Barry Song <baohua@kernel.org>
> ---
> drivers/iommu/dma-iommu.c | 15 +++++++--------
> 1 file changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index ffa940bdbbaf..b68dbfcb7846 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -1131,10 +1131,9 @@ void iommu_dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sgl,
> iommu_dma_sync_single_for_cpu(dev, sg_dma_address(sg),
> sg->length, dir);
> } else if (!dev_is_dma_coherent(dev)) {
> - for_each_sg(sgl, sg, nelems, i) {
> + for_each_sg(sgl, sg, nelems, i)
> arch_sync_dma_for_cpu(sg_phys(sg), sg->length, dir);
> - arch_sync_dma_flush();
> - }
> + arch_sync_dma_flush();
This and previous patches should be squashed into the one which
introduced arch_sync_dma_flush().
Thanks
On Sun, Dec 28, 2025 at 9:16 AM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Sat, Dec 27, 2025 at 11:52:48AM +1300, Barry Song wrote:
> > From: Barry Song <baohua@kernel.org>
> >
> > Apply batched DMA synchronization to iommu_dma_sync_sg_for_cpu() and
> > iommu_dma_sync_sg_for_device(). For all buffers in an SG list, only
> > a single flush operation is needed.
> >
> > I do not have the hardware to test this, so the patch is marked as
> > RFC. I would greatly appreciate any testing feedback.
> >
> > Cc: Leon Romanovsky <leon@kernel.org>
> > Cc: Marek Szyprowski <m.szyprowski@samsung.com>
> > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: Will Deacon <will@kernel.org>
> > Cc: Ada Couprie Diaz <ada.coupriediaz@arm.com>
> > Cc: Ard Biesheuvel <ardb@kernel.org>
> > Cc: Marc Zyngier <maz@kernel.org>
> > Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> > Cc: Ryan Roberts <ryan.roberts@arm.com>
> > Cc: Suren Baghdasaryan <surenb@google.com>
> > Cc: Robin Murphy <robin.murphy@arm.com>
> > Cc: Joerg Roedel <joro@8bytes.org>
> > Cc: Tangquan Zheng <zhengtangquan@oppo.com>
> > Signed-off-by: Barry Song <baohua@kernel.org>
> > ---
> > drivers/iommu/dma-iommu.c | 15 +++++++--------
> > 1 file changed, 7 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> > index ffa940bdbbaf..b68dbfcb7846 100644
> > --- a/drivers/iommu/dma-iommu.c
> > +++ b/drivers/iommu/dma-iommu.c
> > @@ -1131,10 +1131,9 @@ void iommu_dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sgl,
> > iommu_dma_sync_single_for_cpu(dev, sg_dma_address(sg),
> > sg->length, dir);
> > } else if (!dev_is_dma_coherent(dev)) {
> > - for_each_sg(sgl, sg, nelems, i) {
> > + for_each_sg(sgl, sg, nelems, i)
> > arch_sync_dma_for_cpu(sg_phys(sg), sg->length, dir);
> > - arch_sync_dma_flush();
> > - }
> > + arch_sync_dma_flush();
>
> This and previous patches should be squashed into the one which
> introduced arch_sync_dma_flush().
Hi Leon,
The series is structured to first introduce no functional change by
replacing all arch_sync_dma_for_* calls with arch_sync_dma_for_* plus
arch_sync_dma_flush(). Subsequent patches then add batching for
different scenarios as separate changes.
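
As a rough illustration of the two stages described above (this is not
code from the series), the sketch below is a user-space model that only
counts calls. The names fake_sg, fake_sync_for_device(), fake_sync_flush()
and count_flushes() are hypothetical stand-ins invented for the example;
the fake_* helpers play the roles of arch_sync_dma_for_device() and
arch_sync_dma_flush() and perform no cache maintenance at all.

```c
#include <stddef.h>

/* Hypothetical stand-in for one scatterlist entry. */
struct fake_sg {
	unsigned long phys;
	unsigned int length;
};

static unsigned int maint_ops;	/* per-buffer maintenance calls issued */
static unsigned int flushes;	/* flush (completion) calls issued */

/* Stand-in for arch_sync_dma_for_device(): only counts the call. */
static void fake_sync_for_device(unsigned long phys, unsigned int length)
{
	(void)phys;
	(void)length;
	maint_ops++;
}

/* Stand-in for arch_sync_dma_flush(): only counts the call. */
static void fake_sync_flush(void)
{
	flushes++;
}

/* Stage 1 of the series: no functional change, one flush per entry. */
static void sync_sg_per_entry(const struct fake_sg *sgl, size_t nelems)
{
	for (size_t i = 0; i < nelems; i++) {
		fake_sync_for_device(sgl[i].phys, sgl[i].length);
		fake_sync_flush();
	}
}

/* Stage 2: batching, one flush after the whole list has been walked. */
static void sync_sg_batched(const struct fake_sg *sgl, size_t nelems)
{
	for (size_t i = 0; i < nelems; i++)
		fake_sync_for_device(sgl[i].phys, sgl[i].length);
	fake_sync_flush();
}

/* Run one variant over up to 8 dummy entries and report the flush count. */
static unsigned int count_flushes(int batched, size_t nelems)
{
	struct fake_sg sgl[8] = { { 0, 0 } };

	if (nelems > 8)
		nelems = 8;
	maint_ops = 0;
	flushes = 0;
	if (batched)
		sync_sg_batched(sgl, nelems);
	else
		sync_sg_per_entry(sgl, nelems);
	return flushes;
}
```

In this model both variants issue the same number of per-buffer
operations; only the number of flush calls differs (nelems versus one).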
Another issue is that I was unable to find a board that both runs
mainline and exercises the IOMMU paths affected by these changes.
As a result, patches 7 and 8 are marked as RFC, while the other
patches have been tested on a real board running mainline + changes.
Thanks
Barry
On 2025-12-27 8:59 pm, Barry Song wrote:
> On Sun, Dec 28, 2025 at 9:16 AM Leon Romanovsky <leon@kernel.org> wrote:
>>
>> On Sat, Dec 27, 2025 at 11:52:48AM +1300, Barry Song wrote:
>>> From: Barry Song <baohua@kernel.org>
>>>
>>> Apply batched DMA synchronization to iommu_dma_sync_sg_for_cpu() and
>>> iommu_dma_sync_sg_for_device(). For all buffers in an SG list, only
>>> a single flush operation is needed.
>>>
>>> I do not have the hardware to test this, so the patch is marked as
>>> RFC. I would greatly appreciate any testing feedback.
>>>
>>> Cc: Leon Romanovsky <leon@kernel.org>
>>> Cc: Marek Szyprowski <m.szyprowski@samsung.com>
>>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>>> Cc: Will Deacon <will@kernel.org>
>>> Cc: Ada Couprie Diaz <ada.coupriediaz@arm.com>
>>> Cc: Ard Biesheuvel <ardb@kernel.org>
>>> Cc: Marc Zyngier <maz@kernel.org>
>>> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
>>> Cc: Ryan Roberts <ryan.roberts@arm.com>
>>> Cc: Suren Baghdasaryan <surenb@google.com>
>>> Cc: Robin Murphy <robin.murphy@arm.com>
>>> Cc: Joerg Roedel <joro@8bytes.org>
>>> Cc: Tangquan Zheng <zhengtangquan@oppo.com>
>>> Signed-off-by: Barry Song <baohua@kernel.org>
>>> ---
>>> drivers/iommu/dma-iommu.c | 15 +++++++--------
>>> 1 file changed, 7 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>>> index ffa940bdbbaf..b68dbfcb7846 100644
>>> --- a/drivers/iommu/dma-iommu.c
>>> +++ b/drivers/iommu/dma-iommu.c
>>> @@ -1131,10 +1131,9 @@ void iommu_dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sgl,
>>> iommu_dma_sync_single_for_cpu(dev, sg_dma_address(sg),
>>> sg->length, dir);
>>> } else if (!dev_is_dma_coherent(dev)) {
>>> - for_each_sg(sgl, sg, nelems, i) {
>>> + for_each_sg(sgl, sg, nelems, i)
>>> arch_sync_dma_for_cpu(sg_phys(sg), sg->length, dir);
>>> - arch_sync_dma_flush();
>>> - }
>>> + arch_sync_dma_flush();
>>
>> This and previous patches should be squashed into the one which
>> introduced arch_sync_dma_flush().
>
> Hi Leon,
>
> The series is structured to first introduce no functional change by
> replacing all arch_sync_dma_for_* calls with arch_sync_dma_for_* plus
> arch_sync_dma_flush(). Subsequent patches then add batching for
> different scenarios as separate changes.
>
> Another issue is that I was unable to find a board that both runs
> mainline and exercises the IOMMU paths affected by these changes.
> As a result, patches 7 and 8 are marked as RFC, while the other
> patches have been tested on a real board running mainline + changes.
FWIW if you can get your hands on an M.2 NVMe for the Rock5 then that
has an SMMU in front of PCIe (and could also work to test non-coherent
SWIOTLB, with the SMMU in bypass and either some fake restrictive
dma-ranges in the DT or a hack to reduce the DMA mask in the NVMe driver.)
Cheers,
Robin.