[v1] dma-mapping: migrate to physical address-based API

[PATCH 0/8] dma-mapping: migrate to physical address-based API

Posted by Leon Romanovsky 3 months, 2 weeks ago

This series refactors the DMA mapping to use physical addresses
as the primary interface instead of page+offset parameters. This
change aligns the DMA API with the underlying hardware reality where
DMA operations work with physical addresses, not page structures.

The series consists of 8 patches that progressively convert the DMA
mapping infrastructure from page-based to physical address-based APIs:

The series maintains backward compatibility by keeping the old
page-based API as wrapper functions around the new physical
address-based implementations.

Thanks

Leon Romanovsky (8):
  dma-debug: refactor to use physical addresses for page mapping
  dma-mapping: rename trace_dma_*map_page to trace_dma_*map_phys
  iommu/dma: rename iommu_dma_*map_page to iommu_dma_*map_phys
  dma-mapping: convert dma_direct_*map_page to be phys_addr_t based
  kmsan: convert kmsan_handle_dma to use physical addresses
  dma-mapping: fail early if physical address is mapped through platform
    callback
  dma-mapping: export new dma_*map_phys() interface
  mm/hmm: migrate to physical address-based DMA mapping API

 Documentation/core-api/dma-api.rst |  4 +-
 arch/powerpc/kernel/dma-iommu.c    |  4 +-
 drivers/iommu/dma-iommu.c          | 14 +++----
 drivers/virtio/virtio_ring.c       |  4 +-
 include/linux/dma-map-ops.h        |  8 ++--
 include/linux/dma-mapping.h        | 13 ++++++
 include/linux/iommu-dma.h          |  7 ++--
 include/linux/kmsan.h              | 12 +++---
 include/trace/events/dma.h         |  4 +-
 kernel/dma/debug.c                 | 28 ++++++++-----
 kernel/dma/debug.h                 | 16 ++++---
 kernel/dma/direct.c                |  6 +--
 kernel/dma/direct.h                | 13 +++---
 kernel/dma/mapping.c               | 67 +++++++++++++++++++++---------
 kernel/dma/ops_helpers.c           |  6 +--
 mm/hmm.c                           |  8 ++--
 mm/kmsan/hooks.c                   | 36 ++++++++++++----
 tools/virtio/linux/kmsan.h         |  2 +-
 18 files changed, 159 insertions(+), 93 deletions(-)

-- 
2.49.0

Re: [PATCH 0/8] dma-mapping: migrate to physical address-based API

Posted by Robin Murphy 2 months, 2 weeks ago

On 2025-06-25 2:18 pm, Leon Romanovsky wrote:
> This series refactors the DMA mapping to use physical addresses
> as the primary interface instead of page+offset parameters. This
> change aligns the DMA API with the underlying hardware reality where
> DMA operations work with physical addresses, not page structures.

That is obvious nonsense - the DMA *API* does not exist in "hardware 
reality"; the DMA API abstracts *software* operations that must be 
performed before and after the actual hardware DMA operation in order to 
preserve memory coherency etc.

Streaming DMA API callers get their buffers from alloc_pages() or 
kmalloc(); they do not have physical addresses, they have a page or 
virtual address. The internal operations of pretty much every DMA API 
implementation that isn't a no-op also require a page and/or virtual 
address. It is 100% logical for the DMA API interfaces to take a page or 
virtual address (and since virt_to_page() is pretty trivial, we already 
consolidated the two interfaces ages ago).

Yes, once you get right down to the low-level arch_sync_dma_*() 
interfaces that passes a physical address, but that's mostly an artefact 
of them being factored out of old dma_sync_single_*() implementations 
that took a (physical) DMA address. Nearly all of them then use __va() 
or phys_to_virt() to actually consume it. Even though it's a 
phys_addr_t, the implicit guarantee that it represents page-backed 
memory is absolutely vital.

Take a step back; what do you imagine that a DMA API call on a 
non-page-backed physical address could actually *do*?

- Cache maintenance? No, it would be illogical for a P2P address to be 
cached in a CPU cache, and anyway it would almost always crash because 
it requires page-backed memory with a virtual address.

- Bounce buffering? Again no, that would be illogical, defeat the entire 
point of a P2P operation, and anyway would definitely crash because it 
requires page-backed memory with a virtual address.

- IOMMU mappings? Oh hey look that's exactly what dma_map_resource() has 
been doing for 9 years. Not to mention your new IOMMU API if callers 
want to be IOMMU-aware (although without the same guarantee of not also 
doing the crashy things.)

- Debug tracking? Again, already taken care of by dma_map_resource().

- Some entirely new concept? Well, I'm eager to be enlightened if so!

But given what we do already know of from decades of experience, obvious 
question: For the tiny minority of users who know full well when they're 
dealing with a non-page-backed physical address, what's wrong with using 
dma_map_resource?

Does it make sense to try to consolidate our p2p infrastructure so 
dma_map_resource() could return bus addresses where appropriate? Yes, 
almost certainly, if it makes it more convenient to use. And with only 
about 20 users it's not too impractical to add some extra arguments or 
even rejig the whole interface if need be. Indeed an overhaul might even 
help solve the current grey area as to when it should take dma_range_map 
into account or not for platform devices.

> The series consists of 8 patches that progressively convert the DMA
> mapping infrastructure from page-based to physical address-based APIs:

And as a result ends up making said DMA mapping infrastructure slightly 
more complicated and slightly less efficient for all its legitimate 
users, all so one or two highly specialised users can then pretend to 
call it in situations where it must be a no-op anyway? Please explain 
convincingly why that is not a giant waste of time.

Are we trying to remove struct page from the kernel altogether? If yes, 
then for goodness' sake lead with that, but even then I'd still prefer 
to see the replacements for critical related infrastructure like 
pfn_valid() in place before we start trying to reshape the DMA API to fit.

Thanks,
Robin.

> The series maintains backward compatibility by keeping the old
> page-based API as wrapper functions around the new physical
> address-based implementations.
> 
> Thanks
> 
> Leon Romanovsky (8):
>    dma-debug: refactor to use physical addresses for page mapping
>    dma-mapping: rename trace_dma_*map_page to trace_dma_*map_phys
>    iommu/dma: rename iommu_dma_*map_page to iommu_dma_*map_phys
>    dma-mapping: convert dma_direct_*map_page to be phys_addr_t based
>    kmsan: convert kmsan_handle_dma to use physical addresses
>    dma-mapping: fail early if physical address is mapped through platform
>      callback
>    dma-mapping: export new dma_*map_phys() interface
>    mm/hmm: migrate to physical address-based DMA mapping API
> 
>   Documentation/core-api/dma-api.rst |  4 +-
>   arch/powerpc/kernel/dma-iommu.c    |  4 +-
>   drivers/iommu/dma-iommu.c          | 14 +++----
>   drivers/virtio/virtio_ring.c       |  4 +-
>   include/linux/dma-map-ops.h        |  8 ++--
>   include/linux/dma-mapping.h        | 13 ++++++
>   include/linux/iommu-dma.h          |  7 ++--
>   include/linux/kmsan.h              | 12 +++---
>   include/trace/events/dma.h         |  4 +-
>   kernel/dma/debug.c                 | 28 ++++++++-----
>   kernel/dma/debug.h                 | 16 ++++---
>   kernel/dma/direct.c                |  6 +--
>   kernel/dma/direct.h                | 13 +++---
>   kernel/dma/mapping.c               | 67 +++++++++++++++++++++---------
>   kernel/dma/ops_helpers.c           |  6 +--
>   mm/hmm.c                           |  8 ++--
>   mm/kmsan/hooks.c                   | 36 ++++++++++++----
>   tools/virtio/linux/kmsan.h         |  2 +-
>   18 files changed, 159 insertions(+), 93 deletions(-)
>

Re: [PATCH 0/8] dma-mapping: migrate to physical address-based API

Posted by Jason Gunthorpe 2 months, 1 week ago

On Fri, Jul 25, 2025 at 09:05:46PM +0100, Robin Murphy wrote:

> But given what we do already know of from decades of experience, obvious
> question: For the tiny minority of users who know full well when they're
> dealing with a non-page-backed physical address, what's wrong with using
> dma_map_resource?

I was also pushing for this, that we would have two seperate paths:

- the phys_addr was guarenteed to have a KVA (and today also struct page)
- the phys_addr is non-cachable and no KVA may exist

This is basically already the distinction today between map resource
and map page.

The caller would have to look at what it is trying to map, do the P2P
evaluation and then call the cachable phys or resource path(s).

Leon, I think you should revive the work you had along these lines. It
would address my concerns with the dma_ops changes too. I continue to
think we should not push non-cachable, non-KVA MMIO down the map_page
ops, those should use the map_resource op.

> Does it make sense to try to consolidate our p2p infrastructure so
> dma_map_resource() could return bus addresses where appropriate?

For some users but not entirely :( The sg path for P2P relies on
storing information inside the scatterlist so unmap knows what to do.

Changing map_resource to return a similar flag and then having drivers
somehow store that flag and give it back to unmap is not a trivial
change. It would be a good API for simple drivers, and I think we
could build such a helper calling through the new flow. But places
like DMABUF that have more complex lists will not like it.

For them we've been following the approach of BIO where the
driver/subystem will maintain a mapping list and be aware of when the
P2P information is changing. Then it has to do different map/unmap
sequences based on its own existing tracking.

I view this as all very low level infrastructure, I'm really hoping we
can get an agreement with Chritain and build a scatterlist replacement
for DMABUF that encapsulates all this away from drivers like BIO does
for block.

But we can't start that until we have a DMA API working fully for
non-struct page P2P memory. That is being driven by this series and
the VFIO DMABUF implementation on top of it.

> Are we trying to remove struct page from the kernel altogether? 

Yes, it is a very long term project being pushed along with the
folios, memdesc conversion and so forth. It is huge, with many
aspects, but we can start to reasonably work on parts of them
independently.

A mid-term dream is to be able to go from pin_user_pages() -> DMA
without drivers needing to touch struct page at all. 

This is a huge project on its own, and we are progressing it slowly
"bottom up" by allowing phys_addr_t in the DMA API then we can build
more infrastructure for subsystems to be struct-page free, culminating
in some pin_user_phyr() and phys_addr_t bio_vec someday.

Certainly a big part of this series is influenced by requirements to
advance pin_user_pages() -> DMA, while the other part is about
allowing P2P to work using phys_addr_t without struct page.

Jason

Re: [PATCH 0/8] dma-mapping: migrate to physical address-based API

Posted by Marek Szyprowski 3 months, 1 week ago

On 25.06.2025 15:18, Leon Romanovsky wrote:
> This series refactors the DMA mapping to use physical addresses
> as the primary interface instead of page+offset parameters. This
> change aligns the DMA API with the underlying hardware reality where
> DMA operations work with physical addresses, not page structures.
>
> The series consists of 8 patches that progressively convert the DMA
> mapping infrastructure from page-based to physical address-based APIs:
>
> The series maintains backward compatibility by keeping the old
> page-based API as wrapper functions around the new physical
> address-based implementations.

Thanks for this rework! I assume that the next step is to add map_phys 
callback also to the dma_map_ops and teach various dma-mapping providers 
to use it to avoid more phys-to-page-to-phys conversions.

I only wonder if this newly introduced dma_map_phys()/dma_unmap_phys() 
API is also suitable for the recently discussed PCI P2P DMA? While 
adding a new API maybe we should take this into account? My main concern 
is the lack of the source phys addr passed to the dma_unmap_phys() 
function and I'm aware that this might complicate a bit code conversion 
from old dma_map/unmap_page() API.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

Re: [PATCH 0/8] dma-mapping: migrate to physical address-based API

Posted by Leon Romanovsky 3 months, 1 week ago

On Fri, Jun 27, 2025 at 03:44:10PM +0200, Marek Szyprowski wrote:
> On 25.06.2025 15:18, Leon Romanovsky wrote:
> > This series refactors the DMA mapping to use physical addresses
> > as the primary interface instead of page+offset parameters. This
> > change aligns the DMA API with the underlying hardware reality where
> > DMA operations work with physical addresses, not page structures.
> >
> > The series consists of 8 patches that progressively convert the DMA
> > mapping infrastructure from page-based to physical address-based APIs:
> >
> > The series maintains backward compatibility by keeping the old
> > page-based API as wrapper functions around the new physical
> > address-based implementations.
> 
> Thanks for this rework! I assume that the next step is to add map_phys 
> callback also to the dma_map_ops and teach various dma-mapping providers 
> to use it to avoid more phys-to-page-to-phys conversions.

Probably Christoph will say yes, however I personally don't see any
benefit in this. Maybe I wrong here, but all existing .map_page()
implementation platforms don't support p2p anyway. They won't benefit
from this such conversion.

> 
> I only wonder if this newly introduced dma_map_phys()/dma_unmap_phys() 
> API is also suitable for the recently discussed PCI P2P DMA? While 
> adding a new API maybe we should take this into account?

First, immediate user (not related to p2p) is blk layer:
https://lore.kernel.org/linux-nvme/bcdcb5eb-17ed-412f-bf5c-303079798fe2@nvidia.com/T/#m7e715697d4b2e3997622a3400243477c75cab406

+static bool blk_dma_map_direct(struct request *req, struct device *dma_dev,
+		struct blk_dma_iter *iter, struct phys_vec *vec)
+{
+	iter->addr = dma_map_page(dma_dev, phys_to_page(vec->paddr),
+			offset_in_page(vec->paddr), vec->len, rq_dma_dir(req));
+	if (dma_mapping_error(dma_dev, iter->addr)) {
+		iter->status = BLK_STS_RESOURCE;
+		return false;
+	}
+	iter->len = vec->len;
+	return true;
+}

Block layer started to store phys addresses instead of struct pages and
this phys_to_page() conversion in data-path will be avoided.

> My main concern is the lack of the source phys addr passed to the dma_unmap_phys() 
> function and I'm aware that this might complicate a bit code conversion 
> from old dma_map/unmap_page() API.
> 
> Best regards
> -- 
> Marek Szyprowski, PhD
> Samsung R&D Institute Poland
> 
>

Re: [PATCH 0/8] dma-mapping: migrate to physical address-based API

Posted by Leon Romanovsky 3 months ago

On Fri, Jun 27, 2025 at 08:02:13PM +0300, Leon Romanovsky wrote:
> On Fri, Jun 27, 2025 at 03:44:10PM +0200, Marek Szyprowski wrote:
> > On 25.06.2025 15:18, Leon Romanovsky wrote:
> > > This series refactors the DMA mapping to use physical addresses
> > > as the primary interface instead of page+offset parameters. This
> > > change aligns the DMA API with the underlying hardware reality where
> > > DMA operations work with physical addresses, not page structures.
> > >
> > > The series consists of 8 patches that progressively convert the DMA
> > > mapping infrastructure from page-based to physical address-based APIs:
> > >
> > > The series maintains backward compatibility by keeping the old
> > > page-based API as wrapper functions around the new physical
> > > address-based implementations.
> > 
> > Thanks for this rework! I assume that the next step is to add map_phys 
> > callback also to the dma_map_ops and teach various dma-mapping providers 
> > to use it to avoid more phys-to-page-to-phys conversions.
> 
> Probably Christoph will say yes, however I personally don't see any
> benefit in this. Maybe I wrong here, but all existing .map_page()
> implementation platforms don't support p2p anyway. They won't benefit
> from this such conversion.
> 
> > 
> > I only wonder if this newly introduced dma_map_phys()/dma_unmap_phys() 
> > API is also suitable for the recently discussed PCI P2P DMA? While 
> > adding a new API maybe we should take this into account?
> 
> First, immediate user (not related to p2p) is blk layer:
> https://lore.kernel.org/linux-nvme/bcdcb5eb-17ed-412f-bf5c-303079798fe2@nvidia.com/T/#m7e715697d4b2e3997622a3400243477c75cab406
> 
> +static bool blk_dma_map_direct(struct request *req, struct device *dma_dev,
> +		struct blk_dma_iter *iter, struct phys_vec *vec)
> +{
> +	iter->addr = dma_map_page(dma_dev, phys_to_page(vec->paddr),
> +			offset_in_page(vec->paddr), vec->len, rq_dma_dir(req));
> +	if (dma_mapping_error(dma_dev, iter->addr)) {
> +		iter->status = BLK_STS_RESOURCE;
> +		return false;
> +	}
> +	iter->len = vec->len;
> +	return true;
> +}
> 
> Block layer started to store phys addresses instead of struct pages and
> this phys_to_page() conversion in data-path will be avoided.

I almost completed main user of this dma_map_phys() callback. It is
rewrite of this patch [PATCH v3 3/3] vfio/pci: Allow MMIO regions to be exported through dma-buf
https://lore.kernel.org/all/20250307052248.405803-4-vivek.kasireddy@intel.com/

Whole populate_sgt()->dma_map_resource() block looks differently now and
it is relying on dma_map_phys() as we are exporting memory without
struct pages. It will be something like this:

   89         for (i = 0; i < priv->nr_ranges; i++) {
   90                 phys = pci_resource_start(priv->vdev->pdev,
   91                                           dma_ranges[i].region_index);
   92                 phys += dma_ranges[i].offset;
   93
   94                 if (priv->bus_addr) {
   95                         addr = pci_p2pdma_bus_addr_map(&p2pdma_state, phys);
   96                         fill_sg_entry(sgl, dma_ranges[i].length, addr);
   97                         sgl = sg_next(sgl);
   98                 } else if (dma_use_iova(&priv->state)) {
   99                         ret = dma_iova_link(attachment->dev, &priv->state, phys,
  100                                             priv->mapped_len,
  101                                             dma_ranges[i].length, dir, attrs);
  102                         if (ret)
  103                                 goto err_unmap_dma;
  104
  105                         priv->mapped_len += dma_ranges[i].length;
  106                 } else {
  107                         addr = dma_map_phys(attachment->dev, phys, 0,
  108                                             dma_ranges[i].length, dir, attrs);
  109                         ret = dma_mapping_error(attachment->dev, addr);
  110                         if (ret)
  111                                 goto unmap_dma_buf;
  112
  113                         fill_sg_entry(sgl, dma_ranges[i].length, addr);
  114                         sgl = sg_next(sgl);
  115                 }
  116         }
  117
  118         if (dma_use_iova(&priv->state) && !priv->bus_addr) {
  119                 ret = dma_iova_sync(attachment->dev, &pri->state, 0,
  120                                     priv->mapped_len);
  121                 if (ret)
  122                         goto err_unmap_dma;
  123
  124                 fill_sg_entry(sgl, priv->mapped_len, priv->state.addr);
  125         }

> 
> > My main concern is the lack of the source phys addr passed to the dma_unmap_phys() 
> > function and I'm aware that this might complicate a bit code conversion 
> > from old dma_map/unmap_page() API.

It is not needed for now, all p2p logic is external to DMA API.

Thanks

> > 
> > Best regards
> > -- 
> > Marek Szyprowski, PhD
> > Samsung R&D Institute Poland
> > 
> > 
>

Re: [PATCH 0/8] dma-mapping: migrate to physical address-based API

Posted by Christoph Hellwig 3 months, 1 week ago

On Fri, Jun 27, 2025 at 08:02:13PM +0300, Leon Romanovsky wrote:
> > Thanks for this rework! I assume that the next step is to add map_phys 
> > callback also to the dma_map_ops and teach various dma-mapping providers 
> > to use it to avoid more phys-to-page-to-phys conversions.
> 
> Probably Christoph will say yes, however I personally don't see any
> benefit in this. Maybe I wrong here, but all existing .map_page()
> implementation platforms don't support p2p anyway. They won't benefit
> from this such conversion.

I think that conversion should eventually happen, and rather sooner than
later.

Re: [PATCH 0/8] dma-mapping: migrate to physical address-based API

Posted by Marek Szyprowski 3 months ago

On 30.06.2025 15:38, Christoph Hellwig wrote:
> On Fri, Jun 27, 2025 at 08:02:13PM +0300, Leon Romanovsky wrote:
>>> Thanks for this rework! I assume that the next step is to add map_phys
>>> callback also to the dma_map_ops and teach various dma-mapping providers
>>> to use it to avoid more phys-to-page-to-phys conversions.
>> Probably Christoph will say yes, however I personally don't see any
>> benefit in this. Maybe I wrong here, but all existing .map_page()
>> implementation platforms don't support p2p anyway. They won't benefit
>> from this such conversion.
> I think that conversion should eventually happen, and rather sooner than
> later.

Agreed.

Applied patches 1-7 to my dma-mapping-next branch. Let me know if one 
needs a stable branch with it.

Leon, it would be great if You could also prepare an incremental patch 
adding map_phys callback to the dma_maps_ops, so the individual 
arch-specific dma-mapping providers can be then converted (or simplified 
in many cases) too.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

Re: [PATCH 0/8] dma-mapping: migrate to physical address-based API

Posted by Robin Murphy 2 months, 1 week ago

On 2025-07-08 11:27 am, Marek Szyprowski wrote:
> On 30.06.2025 15:38, Christoph Hellwig wrote:
>> On Fri, Jun 27, 2025 at 08:02:13PM +0300, Leon Romanovsky wrote:
>>>> Thanks for this rework! I assume that the next step is to add map_phys
>>>> callback also to the dma_map_ops and teach various dma-mapping providers
>>>> to use it to avoid more phys-to-page-to-phys conversions.
>>> Probably Christoph will say yes, however I personally don't see any
>>> benefit in this. Maybe I wrong here, but all existing .map_page()
>>> implementation platforms don't support p2p anyway. They won't benefit
>>> from this such conversion.
>> I think that conversion should eventually happen, and rather sooner than
>> later.
> 
> Agreed.
> 
> Applied patches 1-7 to my dma-mapping-next branch. Let me know if one
> needs a stable branch with it.

As the maintainer of iommu-dma, please drop the iommu-dma patch because 
it is broken. It does not in any way remove the struct page dependency 
from iommu-dma, it merely hides it so things can crash more easily in 
circumstances that clearly nobody's bothered to test.

> Leon, it would be great if You could also prepare an incremental patch
> adding map_phys callback to the dma_maps_ops, so the individual
> arch-specific dma-mapping providers can be then converted (or simplified
> in many cases) too.

Marek, I'm surprised that even you aren't seeing why that would at best 
be pointless churn. The fundamental design of dma_map_page() operating 
on struct page is that it sits in between alloc_pages() at the caller 
and kmap_atomic() deep down in the DMA API implementation (which also 
subsumes any dependencies on having a kernel virtual address at the 
implementation end). The natural working unit for whatever replaces 
dma_map_page() will be whatever the replacement for alloc_pages() 
returns, and the replacement for kmap_atomic() operates on. Until that 
exists (and I simply cannot believe it would be an unadorned physical 
address) there cannot be any *meaningful* progress made towards removing 
the struct page dependency from the DMA API. If there is also a goal to 
kill off highmem before then, then logically we should just wait for 
that to land, then revert back to dma_map_single() being the first-class 
interface, and dma_map_page() can turn into a trivial page_to_virt() 
wrapper for the long tail of caller conversions.

Simply obfuscating the struct page dependency today by dressing it up as 
a phys_addr_t with implicit baggage is not not in any way helpful. It 
only makes the code harder to understand and more bug-prone. Despite the 
disingenuous claims, it is quite blatantly the opposite of "efficient" 
for callers to do extra work to throw away useful information with 
page_to_phys(), and the implementation then have to re-derive that 
information with pfn_valid()/phys_to_page().

And by "bug-prone" I also include greater distractions like this 
misguided idea that the same API could somehow work for non-memory 
addresses too, so then everyone can move on bikeshedding VFIO while 
overlooking the fundamental flaws in the whole premise. I mean, besides 
all the issues I've already pointed out in that regard, not least the 
glaring fact that it's literally just a worse version of *an API we 
already have*, as DMA API maintainer do you *really* approve of a design 
that depends on callers abusing DMA_ATTR_SKIP_CPU_SYNC, yet will still 
readily blow up if they did then call a dma_sync op?

Thanks,
Robin.

Re: [PATCH 0/8] dma-mapping: migrate to physical address-based API

Posted by Matthew Wilcox 2 months, 1 week ago

Hi Robin,

I don't know the DMA mapping code well and haven't reviewed this
patch set in particular, but I wanted to comment on some of the things
you say here.

> Marek, I'm surprised that even you aren't seeing why that would at best be
> pointless churn. The fundamental design of dma_map_page() operating on
> struct page is that it sits in between alloc_pages() at the caller and
> kmap_atomic() deep down in the DMA API implementation (which also subsumes
> any dependencies on having a kernel virtual address at the implementation
> end). The natural working unit for whatever replaces dma_map_page() will be
> whatever the replacement for alloc_pages() returns, and the replacement for
> kmap_atomic() operates on. Until that exists (and I simply cannot believe it
> would be an unadorned physical address) there cannot be any *meaningful*
> progress made towards removing the struct page dependency from the DMA API.
> If there is also a goal to kill off highmem before then, then logically we
> should just wait for that to land, then revert back to dma_map_single()
> being the first-class interface, and dma_map_page() can turn into a trivial
> page_to_virt() wrapper for the long tail of caller conversions.

While I'm sure we'd all love to kill off highmem, that's not a realistic
goal for another ten years or so.  There are meaningful improvements we
can make, for example pulling page tables out of highmem, but we need to
keep file data and anonymous memory in highmem, so we'll need to support
DMA to highmem for the foreseeable future.

The replacement for kmap_atomic() is already here -- it's
kmap_(atomic|local)_pfn().  If a simple wrapper like kmap_local_phys()
would make this more palatable, that would be fine by me.  Might save
a bit of messing around with calculating offsets in each caller.

As far as replacing alloc_pages() goes, some callers will still use
alloc_pages().  Others will use folio_alloc() or have used kmalloc().
Or maybe the caller won't have used any kind of page allocation because
they're doing I/O to something that isn't part of Linux's memory at all.
Part of the Grand Plan here is for Linux to catch up with Xen's ability
to do I/O to guests without allocating struct pages for every page of
memory in the guests.

You say that a physical address will need some adornment -- can you
elaborate on that for me?  It may be that I'm missing something
important here.

Re: [PATCH 0/8] dma-mapping: migrate to physical address-based API

Posted by Jason Gunthorpe 2 months ago

On Thu, Jul 31, 2025 at 06:37:11PM +0100, Matthew Wilcox wrote:

> The replacement for kmap_atomic() is already here -- it's
> kmap_(atomic|local)_pfn().  If a simple wrapper like kmap_local_phys()
> would make this more palatable, that would be fine by me.  Might save
> a bit of messing around with calculating offsets in each caller.

I think that makes the general plan clearer. We should be removing the
struct pages entirely from the insides of DMA API layer and use the
phys_addr_t, kmap_XX_phys(), phys_to_virt(), and so on.

The request from Christoph and Marek to clean up the dma_ops makes
sense in that context, we'd have to go into the ops and replace the
struct page kmaps/etc with the phys based ones.

This hides the struct page requirement to get to a KVA inside the core
mm code only and that sort of modularity is exactly the sort of thing
that could help entirely remove a struct page requirement for some
kinds of DMA someday.

Matthew, do you think it makes sense to introduce types to make this
clearer? We have two kinds of values that a phys_addr_t can store -
something compatible with kmap_XX_phys(), and something that isn't.

This was recently a long discussion in ARM KVM as well which had a
similar confusion that a phys_addr_t was actually two very different
things inside its logic.

So what about some dedicated types:
 kphys_addr_t - A physical address that can be passed to
     kmap_XX_phys(), phys_to_virt(), etc.

 raw_phys_addr_t - A physical address that may not be cachable, may
     not be DRAM, and does not work with kmap_XX_phys()/etc.

We clearly have these two different ideas floating around in code,
page tables, etc.

I read some of Robin's concern that the struct page provided a certain
amount of type safety in the DMA API, this could provide similar.

Thanks,
Jason

Re: [PATCH 0/8] dma-mapping: migrate to physical address-based API

Posted by Matthew Wilcox 2 months ago

On Sun, Aug 03, 2025 at 12:59:06PM -0300, Jason Gunthorpe wrote:
> Matthew, do you think it makes sense to introduce types to make this
> clearer? We have two kinds of values that a phys_addr_t can store -
> something compatible with kmap_XX_phys(), and something that isn't.

I was with you up until this point.  And then you said "What if we have
a raccoon that isn't a raccoon" and my brain derailed.

> This was recently a long discussion in ARM KVM as well which had a
> similar confusion that a phys_addr_t was actually two very different
> things inside its logic.

No.  A phys_addr_t is a phys_addr_t.  If something's abusing a
phys_addr_t to store something entirely different then THAT is what
should be using a different type.  We've defined what a phys_addr_t
is.  That was in Documentation/core-api/bus-virt-phys-mapping.rst
before Arnd removed it; to excerpt the relevant bit:

---

- CPU untranslated.  This is the "physical" address.  Physical address
  0 is what the CPU sees when it drives zeroes on the memory bus.

[...]
So why do we care about the physical address at all? We do need the physical
address in some cases, it's just not very often in normal code.  The physical
address is needed if you use memory mappings, for example, because the
"remap_pfn_range()" mm function wants the physical address of the memory to
be remapped as measured in units of pages, a.k.a. the pfn.

---

So if somebody is stuffing something else into phys_addr_t, *THAT* is
what needs to be fixed, not adding a new sub-type of phys_addr_t for
things which are actually phys_addr_t.

> We clearly have these two different ideas floating around in code,
> page tables, etc.

No.  No, we don't.  I've never heard of this asininity before.

Re: [PATCH 0/8] dma-mapping: migrate to physical address-based API

Posted by Jason Gunthorpe 2 months ago

On Mon, Aug 04, 2025 at 04:37:56AM +0100, Matthew Wilcox wrote:
> On Sun, Aug 03, 2025 at 12:59:06PM -0300, Jason Gunthorpe wrote:
> > Matthew, do you think it makes sense to introduce types to make this
> > clearer? We have two kinds of values that a phys_addr_t can store -
> > something compatible with kmap_XX_phys(), and something that isn't.
> 
> I was with you up until this point.  And then you said "What if we have
> a raccoon that isn't a raccoon" and my brain derailed.

I though it was clear..

   kmap_local_pfn(phys >> PAGE_SHIFT)
   phys_to_virt(phys)

Does not work for all values of phys. It definately illegal for
non-cachable MMIO. Agree?

There is a subset of phys that is cachable and has struct page that is
usable with kmap_local_pfn()/etc

phys is always this:

> - CPU untranslated.  This is the "physical" address.  Physical address
>   0 is what the CPU sees when it drives zeroes on the memory bus.

But that is a pure HW perspective. It doesn't say which of our SW APIs
are allowed to use this address.

We have callchains in DMA API land that want to do a kmap at the
bottom. It would be nice to mark the whole call chain that the
phys_addr being passed around is actually required to be kmappable.

Because if you pass a non-kmappable MMIO backed phys it will explode
in some way on some platforms.

> > We clearly have these two different ideas floating around in code,
> > page tables, etc.

> No.  No, we don't.  I've never heard of this asininity before.

Welcome to the fun world of cachable and non-cachable memory.

Consider, today we can create struct pages of type
MEMORY_DEVICE_PCI_P2PDMA for non-cachable MMIO. I think today you
"can" use kmap to establish a cachable mapping in the vmap.

But it is *illegal* to establish a cachable CPU mapping of MMIO. Archs
are free to MCE if you do this - speculative cache line load of MMIO
can just error in HW inside the interconnect.

So, the phys_addr is always a "CPU untranslated physical address" but
the cachable/non-cachable cases, or DRAM vs MMIO, are sometimes
semantically very different things for the SW!

Jason

Re: [PATCH 0/8] dma-mapping: migrate to physical address-based API

Posted by Marek Szyprowski 2 months, 1 week ago

On 30.07.2025 13:11, Robin Murphy wrote:
> On 2025-07-08 11:27 am, Marek Szyprowski wrote:
>> On 30.06.2025 15:38, Christoph Hellwig wrote:
>>> On Fri, Jun 27, 2025 at 08:02:13PM +0300, Leon Romanovsky wrote:
>>>>> Thanks for this rework! I assume that the next step is to add 
>>>>> map_phys
>>>>> callback also to the dma_map_ops and teach various dma-mapping 
>>>>> providers
>>>>> to use it to avoid more phys-to-page-to-phys conversions.
>>>> Probably Christoph will say yes, however I personally don't see any
>>>> benefit in this. Maybe I wrong here, but all existing .map_page()
>>>> implementation platforms don't support p2p anyway. They won't benefit
>>>> from this such conversion.
>>> I think that conversion should eventually happen, and rather sooner 
>>> than
>>> later.
>>
>> Agreed.
>>
>> Applied patches 1-7 to my dma-mapping-next branch. Let me know if one
>> needs a stable branch with it.
>
> As the maintainer of iommu-dma, please drop the iommu-dma patch 
> because it is broken. It does not in any way remove the struct page 
> dependency from iommu-dma, it merely hides it so things can crash more 
> easily in circumstances that clearly nobody's bothered to test.
>
>> Leon, it would be great if You could also prepare an incremental patch
>> adding map_phys callback to the dma_maps_ops, so the individual
>> arch-specific dma-mapping providers can be then converted (or simplified
>> in many cases) too.
>
> Marek, I'm surprised that even you aren't seeing why that would at 
> best be pointless churn. The fundamental design of dma_map_page() 
> operating on struct page is that it sits in between alloc_pages() at 
> the caller and kmap_atomic() deep down in the DMA API implementation 
> (which also subsumes any dependencies on having a kernel virtual 
> address at the implementation end). The natural working unit for 
> whatever replaces dma_map_page() will be whatever the replacement for 
> alloc_pages() returns, and the replacement for kmap_atomic() operates 
> on. Until that exists (and I simply cannot believe it would be an 
> unadorned physical address) there cannot be any *meaningful* progress 
> made towards removing the struct page dependency from the DMA API. If 
> there is also a goal to kill off highmem before then, then logically 
> we should just wait for that to land, then revert back to 
> dma_map_single() being the first-class interface, and dma_map_page() 
> can turn into a trivial page_to_virt() wrapper for the long tail of 
> caller conversions.
>
> Simply obfuscating the struct page dependency today by dressing it up 
> as a phys_addr_t with implicit baggage is not not in any way helpful. 
> It only makes the code harder to understand and more bug-prone. 
> Despite the disingenuous claims, it is quite blatantly the opposite of 
> "efficient" for callers to do extra work to throw away useful 
> information with page_to_phys(), and the implementation then have to 
> re-derive that information with pfn_valid()/phys_to_page().
>
> And by "bug-prone" I also include greater distractions like this 
> misguided idea that the same API could somehow work for non-memory 
> addresses too, so then everyone can move on bikeshedding VFIO while 
> overlooking the fundamental flaws in the whole premise. I mean, 
> besides all the issues I've already pointed out in that regard, not 
> least the glaring fact that it's literally just a worse version of *an 
> API we already have*, as DMA API maintainer do you *really* approve of 
> a design that depends on callers abusing DMA_ATTR_SKIP_CPU_SYNC, yet 
> will still readily blow up if they did then call a dma_sync op?
>
Robin, Your concerns are right. I missed the fact that making everything 
depend on phys_addr_t would make DMA-mapping API prone for various 
abuses. I need to think a bit more on this and try to understand more 
the PCI P2P case, what means that I will probably miss this merge 
window. I'm sorry for the lack of being active in the discussion, but I 
just got back from my holidays and I'm trying to catch up.


Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

Re: [PATCH 0/8] dma-mapping: migrate to physical address-based API

Posted by Leon Romanovsky 2 months, 1 week ago

On Wed, Jul 30, 2025 at 12:11:32PM +0100, Robin Murphy wrote:
> On 2025-07-08 11:27 am, Marek Szyprowski wrote:
> > On 30.06.2025 15:38, Christoph Hellwig wrote:
> > > On Fri, Jun 27, 2025 at 08:02:13PM +0300, Leon Romanovsky wrote:
> > > > > Thanks for this rework! I assume that the next step is to add map_phys
> > > > > callback also to the dma_map_ops and teach various dma-mapping providers
> > > > > to use it to avoid more phys-to-page-to-phys conversions.
> > > > Probably Christoph will say yes, however I personally don't see any
> > > > benefit in this. Maybe I wrong here, but all existing .map_page()
> > > > implementation platforms don't support p2p anyway. They won't benefit
> > > > from this such conversion.
> > > I think that conversion should eventually happen, and rather sooner than
> > > later.
> > 
> > Agreed.
> > 
> > Applied patches 1-7 to my dma-mapping-next branch. Let me know if one
> > needs a stable branch with it.
> 
> As the maintainer of iommu-dma, please drop the iommu-dma patch because it
> is broken. It does not in any way remove the struct page dependency from
> iommu-dma, it merely hides it so things can crash more easily in
> circumstances that clearly nobody's bothered to test.
> 
> > Leon, it would be great if You could also prepare an incremental patch
> > adding map_phys callback to the dma_maps_ops, so the individual
> > arch-specific dma-mapping providers can be then converted (or simplified
> > in many cases) too.
> 
> Marek, I'm surprised that even you aren't seeing why that would at best be
> pointless churn. The fundamental design of dma_map_page() operating on
> struct page is that it sits in between alloc_pages() at the caller and
> kmap_atomic() deep down in the DMA API implementation (which also subsumes
> any dependencies on having a kernel virtual address at the implementation
> end). The natural working unit for whatever replaces dma_map_page() will be
> whatever the replacement for alloc_pages() returns, and the replacement for
> kmap_atomic() operates on. Until that exists (and I simply cannot believe it
> would be an unadorned physical address) there cannot be any *meaningful*
> progress made towards removing the struct page dependency from the DMA API.
> If there is also a goal to kill off highmem before then, then logically we
> should just wait for that to land, then revert back to dma_map_single()
> being the first-class interface, and dma_map_page() can turn into a trivial
> page_to_virt() wrapper for the long tail of caller conversions.
> 
> Simply obfuscating the struct page dependency today by dressing it up as a
> phys_addr_t with implicit baggage is not not in any way helpful. It only
> makes the code harder to understand and more bug-prone. Despite the
> disingenuous claims, it is quite blatantly the opposite of "efficient" for
> callers to do extra work to throw away useful information with
> page_to_phys(), and the implementation then have to re-derive that
> information with pfn_valid()/phys_to_page().
> 
> And by "bug-prone" I also include greater distractions like this misguided
> idea that the same API could somehow work for non-memory addresses too, so
> then everyone can move on bikeshedding VFIO while overlooking the
> fundamental flaws in the whole premise. I mean, besides all the issues I've
> already pointed out in that regard, not least the glaring fact that it's
> literally just a worse version of *an API we already have*, as DMA API
> maintainer do you *really* approve of a design that depends on callers
> abusing DMA_ATTR_SKIP_CPU_SYNC, yet will still readily blow up if they did
> then call a dma_sync op?

Robin, Marek

I would like to ask you to do not drop this series and allow me to
gradually change the code during my VFIO DMABUF adventure.

The most reasonable way to prevent DMA_ATTR_SKIP_CPU_SYNC leakage is to
introduce new DMA attribute (let's call it DMA_ATTR_MMIO for now) and
pass it to both dma_map_phys() and dma_iova_link(). This flag will
indicate that p2p type is PCI_P2PDMA_MAP_THRU_HOST_BRIDGE and call to
right callbacks which will set IOMMU_MMIO flag and skip CPU sync,

dma_map_phys() isn't entirely wrong, it just needs an extra tweaks.

Thanks

> 
> Thanks,
> Robin.
>

Re: [PATCH 0/8] dma-mapping: migrate to physical address-based API

Posted by Jason Gunthorpe 2 months, 1 week ago

On Wed, Jul 30, 2025 at 04:40:26PM +0300, Leon Romanovsky wrote:

> > The natural working unit for whatever replaces dma_map_page() will be
> > whatever the replacement for alloc_pages() returns, and the replacement for
> > kmap_atomic() operates on. Until that exists (and I simply cannot believe it
> > would be an unadorned physical address) there cannot be any
> > *meaningful*

alloc_pages becomes legacy.

There will be some new API 'memdesc alloc'. If I understand Matthew's
plan properly - here is a sketch of changing iommu-pages:

--- a/drivers/iommu/iommu-pages.c
+++ b/drivers/iommu/iommu-pages.c
@@ -36,9 +36,10 @@ static_assert(sizeof(struct ioptdesc) <= sizeof(struct page));
  */
 void *iommu_alloc_pages_node_sz(int nid, gfp_t gfp, size_t size)
 {
+       struct ioptdesc *desc;
        unsigned long pgcnt;
-       struct folio *folio;
        unsigned int order;
+       void *addr;

        /* This uses page_address() on the memory. */
        if (WARN_ON(gfp & __GFP_HIGHMEM))
@@ -56,8 +57,8 @@ void *iommu_alloc_pages_node_sz(int nid, gfp_t gfp, size_t size)
        if (nid == NUMA_NO_NODE)
                nid = numa_mem_id();

-       folio = __folio_alloc_node(gfp | __GFP_ZERO, order, nid);
-       if (unlikely(!folio))
+       addr = memdesc_alloc_pages(&desc, gfp | __GFP_ZERO, order, nid);
+       if (unlikely(!addr))
                return NULL;

        /*
@@ -73,7 +74,7 @@ void *iommu_alloc_pages_node_sz(int nid, gfp_t gfp, size_t size)
        mod_node_page_state(folio_pgdat(folio), NR_IOMMU_PAGES, pgcnt);
        lruvec_stat_mod_folio(folio, NR_SECONDARY_PAGETABLE, pgcnt);

-       return folio_address(folio);
+       return addr;
 }

Where the memdesc_alloc_pages() will kmalloc a 'struct ioptdesc' and
some other change so that virt_to_ioptdesc() indirects through a new
memdesc. See here:

https://kernelnewbies.org/MatthewWilcox/Memdescs

We don't end up with some kind of catch-all struct to mean 'cachable
CPU memory' anymore because every user gets their own unique "struct
XXXdesc". So the thinking has been that the phys_addr_t is the best
option. I guess the alternative would be the memdesc as a handle, but
I'm not sure that is such a good idea. 

People still express a desire to be able to do IO to cachable memory
that has a KVA through phys_to_virt but no memdesc/page allocation. I
don't know if this will happen but it doesn't seem like a good idea to
make it impossible by forcing memdesc types into low level APIs that
don't use them.

Also, the bio/scatterlist code between pin_user_pages() and DMA
mapping is consolidating physical contiguity. This runs faster if you
don't have to to page_to_phys() because everything is already
phys_addr_t.

> > progress made towards removing the struct page dependency from the DMA API.
> > If there is also a goal to kill off highmem before then, then logically we
> > should just wait for that to land, then revert back to dma_map_single()
> > being the first-class interface, and dma_map_page() can turn into a trivial
> > page_to_virt() wrapper for the long tail of caller conversions.

As I said there are many many projects related here and we can
meaningfully make progress in parts. It is not functionally harmful to
do the phys to page conversion before calling the legacy
dma_ops/SWIOTLB etc. This avoids creating patch dependencies with
highmem removal and other projects.

So long as the legacy things (highmem, dma_ops, etc) continue to work
I think it is OK to accept some obfuscation to allow the modern things
to work better. The majority flow - no highmem, no dma ops, no
swiotlb, does not require struct page. Having to do

  PTE -> phys -> page -> phys -> DMA

Does have a cost.

> The most reasonable way to prevent DMA_ATTR_SKIP_CPU_SYNC leakage is to
> introduce new DMA attribute (let's call it DMA_ATTR_MMIO for now) and
> pass it to both dma_map_phys() and dma_iova_link(). This flag will
> indicate that p2p type is PCI_P2PDMA_MAP_THRU_HOST_BRIDGE and call to
> right callbacks which will set IOMMU_MMIO flag and skip CPU sync,

So the idea is if the memory is non-cachable, no-KVA you'd call
dma_iova_link(phys_addr, DMA_ATTR_MMIO) and dma_map_phys(phys_addr,
DMA_ATTR_MMIO) ?

And then internally the dma_ops and dma_iommu would use the existing
map_page/map_resource variations based on the flag, thus ensuring that
MMIO is never kmap'd or cache flushed?

dma_map_resource is really then just
dma_map_phys(phys_addr, DMA_ATTR_MMIO)?

I like this, I think it well addresses the concerns.

Jason

Re: [PATCH 0/8] dma-mapping: migrate to physical address-based API

Posted by Leon Romanovsky 2 months, 1 week ago

On Wed, Jul 30, 2025 at 11:28:18AM -0300, Jason Gunthorpe wrote:
> On Wed, Jul 30, 2025 at 04:40:26PM +0300, Leon Romanovsky wrote:

<...>

> > The most reasonable way to prevent DMA_ATTR_SKIP_CPU_SYNC leakage is to
> > introduce new DMA attribute (let's call it DMA_ATTR_MMIO for now) and
> > pass it to both dma_map_phys() and dma_iova_link(). This flag will
> > indicate that p2p type is PCI_P2PDMA_MAP_THRU_HOST_BRIDGE and call to
> > right callbacks which will set IOMMU_MMIO flag and skip CPU sync,
> 
> So the idea is if the memory is non-cachable, no-KVA you'd call
> dma_iova_link(phys_addr, DMA_ATTR_MMIO) and dma_map_phys(phys_addr,
> DMA_ATTR_MMIO) ?

Yes

> 
> And then internally the dma_ops and dma_iommu would use the existing
> map_page/map_resource variations based on the flag, thus ensuring that
> MMIO is never kmap'd or cache flushed?
> 
> dma_map_resource is really then just
> dma_map_phys(phys_addr, DMA_ATTR_MMIO)?
> 
> I like this, I think it well addresses the concerns.

Yes, I had this idea and implementation before. :(

> 
> Jason
>

Re: [PATCH 0/8] dma-mapping: migrate to physical address-based API

Posted by Leon Romanovsky 3 months ago

On Tue, Jul 08, 2025 at 12:27:09PM +0200, Marek Szyprowski wrote:
> On 30.06.2025 15:38, Christoph Hellwig wrote:
> > On Fri, Jun 27, 2025 at 08:02:13PM +0300, Leon Romanovsky wrote:
> >>> Thanks for this rework! I assume that the next step is to add map_phys
> >>> callback also to the dma_map_ops and teach various dma-mapping providers
> >>> to use it to avoid more phys-to-page-to-phys conversions.
> >> Probably Christoph will say yes, however I personally don't see any
> >> benefit in this. Maybe I wrong here, but all existing .map_page()
> >> implementation platforms don't support p2p anyway. They won't benefit
> >> from this such conversion.
> > I think that conversion should eventually happen, and rather sooner than
> > later.
> 
> Agreed.
> 
> Applied patches 1-7 to my dma-mapping-next branch. Let me know if one 
> needs a stable branch with it.

Thanks a lot, I don't think that stable branch is needed. Realistically
speaking, my VFIO DMA work won't be merged this cycle, We are in -rc5,
it is complete rewrite from RFC version and touches pci-p2p code (to
remove dependency on struct page) in addition to VFIO, so it will take
time.

Regarding, last patch (hmm), it will be great if you can take it.
We didn't touch anything in hmm.c this cycle and have no plans to send PR.
It can safely go through your tree.

> 
> Leon, it would be great if You could also prepare an incremental patch 
> adding map_phys callback to the dma_maps_ops, so the individual 
> arch-specific dma-mapping providers can be then converted (or simplified 
> in many cases) too.

Sure, will do.

> 
> Best regards
> -- 
> Marek Szyprowski, PhD
> Samsung R&D Institute Poland
>

Re: [PATCH 0/8] dma-mapping: migrate to physical address-based API

Posted by Marek Szyprowski 3 months ago

On 08.07.2025 13:00, Leon Romanovsky wrote:
> On Tue, Jul 08, 2025 at 12:27:09PM +0200, Marek Szyprowski wrote:
>> On 30.06.2025 15:38, Christoph Hellwig wrote:
>>> On Fri, Jun 27, 2025 at 08:02:13PM +0300, Leon Romanovsky wrote:
>>>>> Thanks for this rework! I assume that the next step is to add map_phys
>>>>> callback also to the dma_map_ops and teach various dma-mapping providers
>>>>> to use it to avoid more phys-to-page-to-phys conversions.
>>>> Probably Christoph will say yes, however I personally don't see any
>>>> benefit in this. Maybe I wrong here, but all existing .map_page()
>>>> implementation platforms don't support p2p anyway. They won't benefit
>>>> from this such conversion.
>>> I think that conversion should eventually happen, and rather sooner than
>>> later.
>> Agreed.
>>
>> Applied patches 1-7 to my dma-mapping-next branch. Let me know if one
>> needs a stable branch with it.
> Thanks a lot, I don't think that stable branch is needed. Realistically
> speaking, my VFIO DMA work won't be merged this cycle, We are in -rc5,
> it is complete rewrite from RFC version and touches pci-p2p code (to
> remove dependency on struct page) in addition to VFIO, so it will take
> time.
>
> Regarding, last patch (hmm), it will be great if you can take it.
> We didn't touch anything in hmm.c this cycle and have no plans to send PR.
> It can safely go through your tree.

Okay, then I would like to get an explicit ack from Jérôme for this.

>> Leon, it would be great if You could also prepare an incremental patch
>> adding map_phys callback to the dma_maps_ops, so the individual
>> arch-specific dma-mapping providers can be then converted (or simplified
>> in many cases) too.
> Sure, will do.

Thanks!

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

Re: [PATCH 0/8] dma-mapping: migrate to physical address-based API

Posted by Leon Romanovsky 3 months ago

On Tue, Jul 08, 2025 at 01:45:20PM +0200, Marek Szyprowski wrote:
> On 08.07.2025 13:00, Leon Romanovsky wrote:
> > On Tue, Jul 08, 2025 at 12:27:09PM +0200, Marek Szyprowski wrote:
> >> On 30.06.2025 15:38, Christoph Hellwig wrote:
> >>> On Fri, Jun 27, 2025 at 08:02:13PM +0300, Leon Romanovsky wrote:
> >>>>> Thanks for this rework! I assume that the next step is to add map_phys
> >>>>> callback also to the dma_map_ops and teach various dma-mapping providers
> >>>>> to use it to avoid more phys-to-page-to-phys conversions.
> >>>> Probably Christoph will say yes, however I personally don't see any
> >>>> benefit in this. Maybe I wrong here, but all existing .map_page()
> >>>> implementation platforms don't support p2p anyway. They won't benefit
> >>>> from this such conversion.
> >>> I think that conversion should eventually happen, and rather sooner than
> >>> later.
> >> Agreed.
> >>
> >> Applied patches 1-7 to my dma-mapping-next branch. Let me know if one
> >> needs a stable branch with it.
> > Thanks a lot, I don't think that stable branch is needed. Realistically
> > speaking, my VFIO DMA work won't be merged this cycle, We are in -rc5,
> > it is complete rewrite from RFC version and touches pci-p2p code (to
> > remove dependency on struct page) in addition to VFIO, so it will take
> > time.
> >
> > Regarding, last patch (hmm), it will be great if you can take it.
> > We didn't touch anything in hmm.c this cycle and have no plans to send PR.
> > It can safely go through your tree.
> 
> Okay, then I would like to get an explicit ack from Jérôme for this.

Jerome is not active in HMM world for a long time already.
HMM tree is managed by us (RDMA) https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/log/?h=hmm
➜  kernel git:(m/dmabuf-vfio) git log --merges mm/hmm.c
...
Pull HMM updates from Jason Gunthorpe:
...

https://web.git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=58ba80c4740212c29a1cf9b48f588e60a7612209
+hmm		git	git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git#hmm

We just never bothered to reflect current situation in MAINTAINERS file.

Thanks

Re: [PATCH 0/8] dma-mapping: migrate to physical address-based API

Posted by Marek Szyprowski 3 months ago

On 08.07.2025 14:06, Leon Romanovsky wrote:
> On Tue, Jul 08, 2025 at 01:45:20PM +0200, Marek Szyprowski wrote:
>> On 08.07.2025 13:00, Leon Romanovsky wrote:
>>> On Tue, Jul 08, 2025 at 12:27:09PM +0200, Marek Szyprowski wrote:
>>>> On 30.06.2025 15:38, Christoph Hellwig wrote:
>>>>> On Fri, Jun 27, 2025 at 08:02:13PM +0300, Leon Romanovsky wrote:
>>>>>>> Thanks for this rework! I assume that the next step is to add map_phys
>>>>>>> callback also to the dma_map_ops and teach various dma-mapping providers
>>>>>>> to use it to avoid more phys-to-page-to-phys conversions.
>>>>>> Probably Christoph will say yes, however I personally don't see any
>>>>>> benefit in this. Maybe I wrong here, but all existing .map_page()
>>>>>> implementation platforms don't support p2p anyway. They won't benefit
>>>>>> from this such conversion.
>>>>> I think that conversion should eventually happen, and rather sooner than
>>>>> later.
>>>> Agreed.
>>>>
>>>> Applied patches 1-7 to my dma-mapping-next branch. Let me know if one
>>>> needs a stable branch with it.
>>> Thanks a lot, I don't think that stable branch is needed. Realistically
>>> speaking, my VFIO DMA work won't be merged this cycle, We are in -rc5,
>>> it is complete rewrite from RFC version and touches pci-p2p code (to
>>> remove dependency on struct page) in addition to VFIO, so it will take
>>> time.
>>>
>>> Regarding, last patch (hmm), it will be great if you can take it.
>>> We didn't touch anything in hmm.c this cycle and have no plans to send PR.
>>> It can safely go through your tree.
>> Okay, then I would like to get an explicit ack from Jérôme for this.
> Jerome is not active in HMM world for a long time already.
> HMM tree is managed by us (RDMA) https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/log/?h=hmm
> ➜  kernel git:(m/dmabuf-vfio) git log --merges mm/hmm.c
> ...
> Pull HMM updates from Jason Gunthorpe:
> ...
>
> https://web.git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=58ba80c4740212c29a1cf9b48f588e60a7612209
> +hmm		git	git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git#hmm
>
> We just never bothered to reflect current situation in MAINTAINERS file.

Maybe this is the time to update it :)

I was just a bit confused that no-one commented the HMM patch, but if 
You maintain it, then this is okay.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

Re: [PATCH 0/8] dma-mapping: migrate to physical address-based API

Posted by Marek Szyprowski 3 months ago

On 08.07.2025 14:56, Marek Szyprowski wrote:
> On 08.07.2025 14:06, Leon Romanovsky wrote:
>> On Tue, Jul 08, 2025 at 01:45:20PM +0200, Marek Szyprowski wrote:
>>> On 08.07.2025 13:00, Leon Romanovsky wrote:
>>>> On Tue, Jul 08, 2025 at 12:27:09PM +0200, Marek Szyprowski wrote:
>>>>> On 30.06.2025 15:38, Christoph Hellwig wrote:
>>>>>> On Fri, Jun 27, 2025 at 08:02:13PM +0300, Leon Romanovsky wrote:
>>>>>>>> Thanks for this rework! I assume that the next step is to add 
>>>>>>>> map_phys
>>>>>>>> callback also to the dma_map_ops and teach various dma-mapping 
>>>>>>>> providers
>>>>>>>> to use it to avoid more phys-to-page-to-phys conversions.
>>>>>>> Probably Christoph will say yes, however I personally don't see any
>>>>>>> benefit in this. Maybe I wrong here, but all existing .map_page()
>>>>>>> implementation platforms don't support p2p anyway. They won't 
>>>>>>> benefit
>>>>>>> from this such conversion.
>>>>>> I think that conversion should eventually happen, and rather 
>>>>>> sooner than
>>>>>> later.
>>>>> Agreed.
>>>>>
>>>>> Applied patches 1-7 to my dma-mapping-next branch. Let me know if one
>>>>> needs a stable branch with it.
>>>> Thanks a lot, I don't think that stable branch is needed. 
>>>> Realistically
>>>> speaking, my VFIO DMA work won't be merged this cycle, We are in -rc5,
>>>> it is complete rewrite from RFC version and touches pci-p2p code (to
>>>> remove dependency on struct page) in addition to VFIO, so it will take
>>>> time.
>>>>
>>>> Regarding, last patch (hmm), it will be great if you can take it.
>>>> We didn't touch anything in hmm.c this cycle and have no plans to 
>>>> send PR.
>>>> It can safely go through your tree.
>>> Okay, then I would like to get an explicit ack from Jérôme for this.
>> Jerome is not active in HMM world for a long time already.
>> HMM tree is managed by us (RDMA) 
>> https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/log/?h=hmm
>> ➜  kernel git:(m/dmabuf-vfio) git log --merges mm/hmm.c
>> ...
>> Pull HMM updates from Jason Gunthorpe:
>> ...
>>
>> https://web.git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=58ba80c4740212c29a1cf9b48f588e60a7612209 
>>
>> +hmm        git 
>> git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git#hmm
>>
>> We just never bothered to reflect current situation in MAINTAINERS file.
>
> Maybe this is the time to update it :)
>
> I was just a bit confused that no-one commented the HMM patch, but if 
> You maintain it, then this is okay.


I've applied the last patch to dma-mapping-for-next branch.


Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland