RDMA: Enable runs with DMA debug enabled

[PATCH 2/3] dma-mapping: Clarify valid conditions for CPU cache line overlap

Posted by Leon Romanovsky 1 month ago

From: Leon Romanovsky <leonro@nvidia.com>

Rename the DMA_ATTR_CPU_CACHE_CLEAN attribute to reflect that it allows
CPU cache overlaps to exist, and document a slightly different but still
valid use case involving overlapping CPU cache lines.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 Documentation/core-api/dma-attributes.rst | 26 ++++++++++++++++++--------
 drivers/virtio/virtio_ring.c              |  4 ++--
 include/linux/dma-mapping.h               |  8 ++++----
 kernel/dma/debug.c                        |  2 +-
 4 files changed, 25 insertions(+), 15 deletions(-)

diff --git a/Documentation/core-api/dma-attributes.rst b/Documentation/core-api/dma-attributes.rst
index 1d7bfad73b1c7..6b73d92c62721 100644
--- a/Documentation/core-api/dma-attributes.rst
+++ b/Documentation/core-api/dma-attributes.rst
@@ -149,11 +149,21 @@ For architectures that require cache flushing for DMA coherence
 DMA_ATTR_MMIO will not perform any cache flushing. The address
 provided must never be mapped cacheable into the CPU.
 
-DMA_ATTR_CPU_CACHE_CLEAN
-------------------------
-
-This attribute indicates the CPU will not dirty any cacheline overlapping this
-DMA_FROM_DEVICE/DMA_BIDIRECTIONAL buffer while it is mapped. This allows
-multiple small buffers to safely share a cacheline without risk of data
-corruption, suppressing DMA debug warnings about overlapping mappings.
-All mappings sharing a cacheline should have this attribute.
+DMA_ATTR_CPU_CACHE_OVERLAP
+--------------------------
+
+This attribute indicates that CPU cache lines may overlap for buffers mapped
+with DMA_FROM_DEVICE or DMA_BIDIRECTIONAL.
+
+Such overlap may occur when callers map multiple small buffers that reside
+within the same cache line. In this case, callers must guarantee that the CPU
+will not dirty these cache lines after the mappings are established. When this
+condition is met, multiple buffers can safely share a cache line without risking
+data corruption.
+
+Another valid use case is on systems that are CPU-coherent and do not use
+SWIOTLB, where the caller can guarantee that no cache maintenance operations
+(such as flushes) will be performed that could overwrite shared cache lines.
+
+All mappings that share a cache line must set this attribute to suppress DMA
+debug warnings about overlapping mappings.
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 335692d41617a..bf51ae9a39169 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2912,7 +2912,7 @@ EXPORT_SYMBOL_GPL(virtqueue_add_inbuf);
  * @data: the token identifying the buffer.
  * @gfp: how to do memory allocations (if necessary).
  *
- * Same as virtqueue_add_inbuf but passes DMA_ATTR_CPU_CACHE_CLEAN to indicate
+ * Same as virtqueue_add_inbuf but passes DMA_ATTR_CPU_CACHE_OVERLAP to indicate
  * that the CPU will not dirty any cacheline overlapping this buffer while it
  * is available, and to suppress overlapping cacheline warnings in DMA debug
  * builds.
@@ -2928,7 +2928,7 @@ int virtqueue_add_inbuf_cache_clean(struct virtqueue *vq,
 				    gfp_t gfp)
 {
 	return virtqueue_add(vq, &sg, num, 0, 1, data, NULL, false, gfp,
-			     DMA_ATTR_CPU_CACHE_CLEAN);
+			     DMA_ATTR_CPU_CACHE_OVERLAP);
 }
 EXPORT_SYMBOL_GPL(virtqueue_add_inbuf_cache_clean);
 
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 29973baa05816..45efede1a6cce 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -80,11 +80,11 @@
 #define DMA_ATTR_MMIO		(1UL << 10)
 
 /*
- * DMA_ATTR_CPU_CACHE_CLEAN: Indicates the CPU will not dirty any cacheline
- * overlapping this buffer while it is mapped for DMA. All mappings sharing
- * a cacheline must have this attribute for this to be considered safe.
+ * DMA_ATTR_CPU_CACHE_OVERLAP: Indicates the CPU cache line can be overlapped.
+ * All mappings sharing a cacheline must have this attribute for this
+ * to be considered safe.
  */
-#define DMA_ATTR_CPU_CACHE_CLEAN	(1UL << 11)
+#define DMA_ATTR_CPU_CACHE_OVERLAP	(1UL << 11)
 
 /*
  * A dma_addr_t can hold any valid DMA or bus address for the platform.  It can
diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index be207be749968..603be342063f1 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -601,7 +601,7 @@ static void add_dma_entry(struct dma_debug_entry *entry, unsigned long attrs)
 	unsigned long flags;
 	int rc;
 
-	entry->is_cache_clean = !!(attrs & DMA_ATTR_CPU_CACHE_CLEAN);
+	entry->is_cache_clean = attrs & DMA_ATTR_CPU_CACHE_OVERLAP;
 
 	bucket = get_hash_bucket(entry, &flags);
 	hash_bucket_add(bucket, entry);

-- 
2.53.0
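
The rule stated in the documentation hunk above (every mapping that shares a cache line must carry the attribute, otherwise DMA debug warns) can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the kernel's dma-debug implementation: the `model_*` names, the 64-byte line size, and the fixed-size table are all hypothetical. Only the bit position `(1UL << 11)` is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumptions for illustration only: line size and table size are made up. */
#define MODEL_CACHELINE_SIZE		64u
#define MODEL_MAX_ENTRIES		16
/* Same bit position as the attribute in the patch; the name is hypothetical. */
#define MODEL_ATTR_CPU_CACHE_OVERLAP	(1UL << 11)

struct model_entry {
	uint64_t first_line;	/* first cache line covered by the mapping */
	uint64_t last_line;	/* last cache line covered by the mapping */
	bool overlap_ok;	/* mapping was created with the attribute */
};

struct model_tracker {
	struct model_entry entries[MODEL_MAX_ENTRIES];
	size_t count;
};

/*
 * Record a mapping and return true if this model would emit an
 * "overlapping cache line" warning: the new mapping shares a line with an
 * existing one and at least one of the two lacks the attribute.
 */
bool model_add_mapping(struct model_tracker *t, uint64_t addr, size_t len,
		       unsigned long attrs)
{
	struct model_entry e;
	bool warn = false;
	size_t i;

	assert(len > 0 && t->count < MODEL_MAX_ENTRIES);

	e.first_line = addr / MODEL_CACHELINE_SIZE;
	e.last_line = (addr + len - 1) / MODEL_CACHELINE_SIZE;
	/* Normalize the flag bit to a boolean explicitly. */
	e.overlap_ok = (attrs & MODEL_ATTR_CPU_CACHE_OVERLAP) != 0;

	for (i = 0; i < t->count; i++) {
		const struct model_entry *o = &t->entries[i];
		bool shares = e.first_line <= o->last_line &&
			      o->first_line <= e.last_line;

		if (shares && !(e.overlap_ok && o->overlap_ok))
			warn = true;
	}

	t->entries[t->count++] = e;
	return warn;
}
```

In this model, two 16-byte buffers at 0x1000 and 0x1010 that both pass the attribute produce no warning, while a third buffer at 0x1020 mapped without it does. The explicit `!= 0` mirrors the `!!` of the pre-patch code: in C, dropping the normalization is harmless when the destination is a `bool`, but bit 11 would be truncated away if the value were stored in a narrower integer field.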

Re: [PATCH 2/3] dma-mapping: Clarify valid conditions for CPU cache line overlap

Posted by Jason Gunthorpe 1 month ago

On Sat, Mar 07, 2026 at 06:49:56PM +0200, Leon Romanovsky wrote:

> -This attribute indicates the CPU will not dirty any cacheline overlapping this
> -DMA_FROM_DEVICE/DMA_BIDIRECTIONAL buffer while it is mapped. This allows
> -multiple small buffers to safely share a cacheline without risk of data
> -corruption, suppressing DMA debug warnings about overlapping mappings.
> -All mappings sharing a cacheline should have this attribute.
> +DMA_ATTR_CPU_CACHE_OVERLAP

This is a very specific and well defined use case that allows some cache
flushing behaviors to work only under the promise that the CPU doesn't
touch the memory to cause cache inconsistencies.

> +Another valid use case is on systems that are CPU-coherent and do not use
> +SWIOTLB, where the caller can guarantee that no cache maintenance operations
> +(such as flushes) will be performed that could overwrite shared cache lines.

This is something completely unrelated. 

What I would really like is a new DMA_ATTR_REQUIRE_COHERENT which
fails any mapping request that would use SWIOTLB or cache
flushing.

It should only be used by callers like RDMA/DRM/etc where they have
historical uAPI that has never supported incoherent DMA operation and
are an exception to the normal DMA API requirements.

The problem is to limit the use of that flag to only a few approved
places. I fear adding such a flag wide open would open the door to
widespread driver abuse. These days we have 'export symbol for module'
so maybe there is a way to do it with safety?

I'd really like this right now because CC systems are forcing SWIOTLB
and things like RDMA userspace are unfixably broken with SWIOTLB. The
uAPI it has simply cannot work with it. I'd much rather fail immediately
than suffer data corruption. Jiri was looking at adding some
hacky "is cc" check, but I'd far prefer a proper flag that covered all
the uAPI breaking cases.

Jason

Re: [PATCH 2/3] dma-mapping: Clarify valid conditions for CPU cache line overlap

Posted by Leon Romanovsky 1 month ago

On Sun, Mar 08, 2026 at 03:19:20PM -0300, Jason Gunthorpe wrote:
> On Sat, Mar 07, 2026 at 06:49:56PM +0200, Leon Romanovsky wrote:
> 
> > -This attribute indicates the CPU will not dirty any cacheline overlapping this
> > -DMA_FROM_DEVICE/DMA_BIDIRECTIONAL buffer while it is mapped. This allows
> > -multiple small buffers to safely share a cacheline without risk of data
> > -corruption, suppressing DMA debug warnings about overlapping mappings.
> > -All mappings sharing a cacheline should have this attribute.
> > +DMA_ATTR_CPU_CACHE_OVERLAP
> 
> This is a very specific and well defined use case that allows some cache
> flushing behaviors to work only under the promise that the CPU doesn't
> touch the memory to cause cache inconsistencies.
> 
> > +Another valid use case is on systems that are CPU-coherent and do not use
> > +SWIOTLB, where the caller can guarantee that no cache maintenance operations
> > +(such as flushes) will be performed that could overwrite shared cache lines.
> 
> This is something completely unrelated. 

I disagree. The situation is equivalent in that callers guarantee the
CPU cache will not be overwritten. For the RDMA case, this results in
the same behavior as with virtio. For our case, it addresses and
clears the debug warnings.

> 
> What I would really like is a new DMA_ATTR_REQUIRE_COHERENT which
> fails any mappings requests that would use any SWIOTLB or cache
> flushing.

You are proposing something orthogonal that operates at a different layer
(DMA mapping). However, for DMA debugging, your new attribute will be
equivalent to DMA_ATTR_CPU_CACHE_OVERLAP.

Thanks

Re: [PATCH 2/3] dma-mapping: Clarify valid conditions for CPU cache line overlap

Posted by Jason Gunthorpe 1 month ago

On Sun, Mar 08, 2026 at 08:49:02PM +0200, Leon Romanovsky wrote:
> On Sun, Mar 08, 2026 at 03:19:20PM -0300, Jason Gunthorpe wrote:
> > On Sat, Mar 07, 2026 at 06:49:56PM +0200, Leon Romanovsky wrote:
> > 
> > > -This attribute indicates the CPU will not dirty any cacheline overlapping this
> > > -DMA_FROM_DEVICE/DMA_BIDIRECTIONAL buffer while it is mapped. This allows
> > > -multiple small buffers to safely share a cacheline without risk of data
> > > -corruption, suppressing DMA debug warnings about overlapping mappings.
> > > -All mappings sharing a cacheline should have this attribute.
> > > +DMA_ATTR_CPU_CACHE_OVERLAP
> > 
> > This is a very specific and well defined use case that allows some cache
> > flushing behaviors to work only under the promise that the CPU doesn't
> > touch the memory to cause cache inconsistencies.
> > 
> > > +Another valid use case is on systems that are CPU-coherent and do not use
> > > +SWIOTLB, where the caller can guarantee that no cache maintenance operations
> > > +(such as flushes) will be performed that could overwrite shared cache lines.
> > 
> > This is something completely unrelated. 
> 
> I disagree. The situation is equivalent in that callers guarantee the
> CPU cache will not be overwritten.

The RDMA callers do no such thing, they just don't work at all if
there is non-coherence in the mapping which is why it is not a bug.

virtio looks like it does actually keep the caches clean for different
mappings (and probably also in practice forced coherent as well given
qemu is coherent with the VM and VFIO doesn't allow non-coherent DMA
devices)

> > What I would really like is a new DMA_ATTR_REQUIRE_COHERENT which
> > fails any mappings requests that would use any SWIOTLB or cache
> > flushing.
> 
> You are proposing something orthogonal that operates at a different layer
> (DMA mapping). However, for DMA debugging, your new attribute will be
> equivalent to DMA_ATTR_CPU_CACHE_OVERLAP.

DMA_ATTR is a DMA mapping flag. If you want some weird DMA debugging
flag, it should be called DMA_ATTR_DEBUGGING_IGNORE_CACHELINES, with
some kind of statement at each user explaining why it is OK.

Jason

Re: [PATCH 2/3] dma-mapping: Clarify valid conditions for CPU cache line overlap

Posted by Leon Romanovsky 1 month ago

On Sun, Mar 08, 2026 at 08:09:16PM -0300, Jason Gunthorpe wrote:
> On Sun, Mar 08, 2026 at 08:49:02PM +0200, Leon Romanovsky wrote:
> > On Sun, Mar 08, 2026 at 03:19:20PM -0300, Jason Gunthorpe wrote:
> > > On Sat, Mar 07, 2026 at 06:49:56PM +0200, Leon Romanovsky wrote:
> > > 
> > > > -This attribute indicates the CPU will not dirty any cacheline overlapping this
> > > > -DMA_FROM_DEVICE/DMA_BIDIRECTIONAL buffer while it is mapped. This allows
> > > > -multiple small buffers to safely share a cacheline without risk of data
> > > > -corruption, suppressing DMA debug warnings about overlapping mappings.
> > > > -All mappings sharing a cacheline should have this attribute.
> > > > +DMA_ATTR_CPU_CACHE_OVERLAP
> > > 
> > > This is a very specific and well defined use case that allows some cache
> > > flushing behaviors to work only under the promise that the CPU doesn't
> > > touch the memory to cause cache inconsistencies.
> > > 
> > > > +Another valid use case is on systems that are CPU-coherent and do not use
> > > > +SWIOTLB, where the caller can guarantee that no cache maintenance operations
> > > > +(such as flushes) will be performed that could overwrite shared cache lines.
> > > 
> > > This is something completely unrelated. 
> > 
> > I disagree. The situation is equivalent in that callers guarantee the
> > CPU cache will not be overwritten.
> 
> The RDMA callers do no such thing, they just don't work at all if
> there is non-coherence in the mapping which is why it is not a bug.
> 
> virtio looks like it does actually keep the caches clean for different
> mappings (and probably also in practice forced coherent as well given
> qemu is coherent with the VM and VFIO doesn't allow non-coherent DMA
> devices)
> 
> > > What I would really like is a new DMA_ATTR_REQUIRE_COHERENT which
> > > fails any mappings requests that would use any SWIOTLB or cache
> > > flushing.
> > 
> > You are proposing something orthogonal that operates at a different layer
> > (DMA mapping). However, for DMA debugging, your new attribute will be
> > equivalent to DMA_ATTR_CPU_CACHE_OVERLAP.
> 
> DMA_ATTR is a dma mapping flag, if you want some weird dma debugging
> flag it should be called DMA_ATTR_DEBUGGING_IGNORE_CACHELINES with
> some kind of statement at the user why it is OK.

And this is the issue: the existing DMA_ATTR_CPU_CACHE_CLEAN is essentially
a debug-oriented attribute. The upper layers are already handled through
__dma_from_device_group_begin()/end(), which pad cache lines on
non-coherent systems.

Marek,

What do you see as the right path forward here? RDMA has a legitimate use
case where CPU cache lines may overlap. The underlying reason differs from
VirtIO, but the outcome is the same. Should I keep the current name? Should
we rename it to the proposed DMA_ATTR_CPU_CACHE_OVERLAP or
DMA_ATTR_DEBUGGING_IGNORE_CACHELINES? Should we introduce a new
DMA_ATTR_REQUIRE_COHERENT attribute instead? Or do you have another
recommendation?

Thanks

> 
> Jason
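
The padding idea mentioned above (keeping device-written buffers on cache lines that the CPU does not write) can be illustrated with plain C11 alignment. This is a sketch under assumptions, not the definition of `__dma_from_device_group_begin()/end()`: the struct, its field names, and the 64-byte line size are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cache line size, for illustration only. */
#define MODEL_CACHELINE 64

struct model_request {
	int cpu_state;				/* CPU-written bookkeeping */

	/* Start of the device-written group: begins on its own line. */
	_Alignas(MODEL_CACHELINE) unsigned char status[4];
	unsigned char response[8];

	/* End of the group: the next CPU-written field starts a new line. */
	_Alignas(MODEL_CACHELINE) int more_cpu_state;
};

/* The device-written group must start exactly on a cache line boundary. */
static_assert(offsetof(struct model_request, status) % MODEL_CACHELINE == 0,
	      "device-written group must be cache line aligned");

/* Cache line index of a byte offset inside a line-aligned object. */
size_t model_line_of(size_t offset)
{
	return offset / MODEL_CACHELINE;
}
```

Here `status` and `response` deliberately share one line with each other, but not with `cpu_state` or `more_cpu_state`, so a CPU write to the bookkeeping fields cannot dirty the line the device writes into. The offsets only translate to real cache lines if the object itself is allocated with 64-byte alignment.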

Re: [PATCH 2/3] dma-mapping: Clarify valid conditions for CPU cache line overlap

Posted by Marek Szyprowski 1 month ago

On 09.03.2026 10:03, Leon Romanovsky wrote:
> On Sun, Mar 08, 2026 at 08:09:16PM -0300, Jason Gunthorpe wrote:
>> On Sun, Mar 08, 2026 at 08:49:02PM +0200, Leon Romanovsky wrote:
>>> On Sun, Mar 08, 2026 at 03:19:20PM -0300, Jason Gunthorpe wrote:
>>>> On Sat, Mar 07, 2026 at 06:49:56PM +0200, Leon Romanovsky wrote:
>>>>
>>>>> -This attribute indicates the CPU will not dirty any cacheline overlapping this
>>>>> -DMA_FROM_DEVICE/DMA_BIDIRECTIONAL buffer while it is mapped. This allows
>>>>> -multiple small buffers to safely share a cacheline without risk of data
>>>>> -corruption, suppressing DMA debug warnings about overlapping mappings.
>>>>> -All mappings sharing a cacheline should have this attribute.
>>>>> +DMA_ATTR_CPU_CACHE_OVERLAP
>>>> This is a very specific and well defined use case that allows some cache
>>>> flushing behaviors to work only under the promise that the CPU doesn't
>>>> touch the memory to cause cache inconsistencies.
>>>>
>>>>> +Another valid use case is on systems that are CPU-coherent and do not use
>>>>> +SWIOTLB, where the caller can guarantee that no cache maintenance operations
>>>>> +(such as flushes) will be performed that could overwrite shared cache lines.
>>>> This is something completely unrelated.
>>> I disagree. The situation is equivalent in that callers guarantee the
>>> CPU cache will not be overwritten.
>> The RDMA callers do no such thing, they just don't work at all if
>> there is non-coherence in the mapping which is why it is not a bug.
>>
>> virtio looks like it does actually keep the caches clean for different
>> mappings (and probably also in practice forced coherent as well given
>> qemu is coherent with the VM and VFIO doesn't allow non-coherent DMA
>> devices)
>>
>>>> What I would really like is a new DMA_ATTR_REQUIRE_COHERENT which
>>>> fails any mappings requests that would use any SWIOTLB or cache
>>>> flushing.
>>> You are proposing something orthogonal that operates at a different layer
>>> (DMA mapping). However, for DMA debugging, your new attribute will be
>>> equivalent to DMA_ATTR_CPU_CACHE_OVERLAP.
>> DMA_ATTR is a dma mapping flag, if you want some weird dma debugging
>> flag it should be called DMA_ATTR_DEBUGGING_IGNORE_CACHELINES with
>> some kind of statement at the user why it is OK.
> And this is the issue: the existing DMA_ATTR_CPU_CACHE_CLEAN is essentially
> a debug-oriented attribute. The upper layers are already handled through
> __dma_from_device_group_begin()/end(), which pad cache lines on
> non-coherent systems.
>
> Marek,
>
> What do you see as the right path forward here? RDMA has a legitimate use
> case where CPU cache lines may overlap. The underlying reason differs from
> VirtIO, but the outcome is the same. Should I keep the current name? Should
> we rename it to the proposed DMA_ATTR_CPU_CACHE_OVERLAP or
> DMA_ATTR_DEBUGGING_IGNORE_CACHELINES? Should we introduce a new
> DMA_ATTR_REQUIRE_COHERENT attribute instead? Or do you have another
> recommendation?

My question here is whether RDMA works on any non-coherent DMA systems. If
not, then it should fail early (during init or probe?) to avoid potential
data corruption, and new DMA attributes won't help it. On the other hand,
the DMA_ATTR_CPU_CACHE_OVERLAP attribute is a bit more descriptive to me
than DMA_ATTR_CPU_CACHE_CLEAN, but this indeed looks like a separate
issue from the RDMA case.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

Re: [PATCH 2/3] dma-mapping: Clarify valid conditions for CPU cache line overlap

Posted by Leon Romanovsky 1 month ago

On Mon, Mar 09, 2026 at 01:30:24PM +0100, Marek Szyprowski wrote:
> On 09.03.2026 10:03, Leon Romanovsky wrote:
> > On Sun, Mar 08, 2026 at 08:09:16PM -0300, Jason Gunthorpe wrote:
> >> On Sun, Mar 08, 2026 at 08:49:02PM +0200, Leon Romanovsky wrote:
> >>> On Sun, Mar 08, 2026 at 03:19:20PM -0300, Jason Gunthorpe wrote:
> >>>> On Sat, Mar 07, 2026 at 06:49:56PM +0200, Leon Romanovsky wrote:
> >>>>
> >>>>> -This attribute indicates the CPU will not dirty any cacheline overlapping this
> >>>>> -DMA_FROM_DEVICE/DMA_BIDIRECTIONAL buffer while it is mapped. This allows
> >>>>> -multiple small buffers to safely share a cacheline without risk of data
> >>>>> -corruption, suppressing DMA debug warnings about overlapping mappings.
> >>>>> -All mappings sharing a cacheline should have this attribute.
> >>>>> +DMA_ATTR_CPU_CACHE_OVERLAP
> >>>> This is a very specific and well defined use case that allows some cache
> >>>> flushing behaviors to work only under the promise that the CPU doesn't
> >>>> touch the memory to cause cache inconsistencies.
> >>>>
> >>>>> +Another valid use case is on systems that are CPU-coherent and do not use
> >>>>> +SWIOTLB, where the caller can guarantee that no cache maintenance operations
> >>>>> +(such as flushes) will be performed that could overwrite shared cache lines.
> >>>> This is something completely unrelated.
> >>> I disagree. The situation is equivalent in that callers guarantee the
> >>> CPU cache will not be overwritten.
> >> The RDMA callers do no such thing, they just don't work at all if
> >> there is non-coherence in the mapping which is why it is not a bug.
> >>
> >> virtio looks like it does actually keep the caches clean for different
> >> mappings (and probably also in practice forced coherent as well given
> >> qemu is coherent with the VM and VFIO doesn't allow non-coherent DMA
> >> devices)
> >>
> >>>> What I would really like is a new DMA_ATTR_REQUIRE_COHERENT which
> >>>> fails any mappings requests that would use any SWIOTLB or cache
> >>>> flushing.
> >>> You are proposing something orthogonal that operates at a different layer
> >>> (DMA mapping). However, for DMA debugging, your new attribute will be
> >>> equivalent to DMA_ATTR_CPU_CACHE_OVERLAP.
> >> DMA_ATTR is a dma mapping flag, if you want some weird dma debugging
> >> flag it should be called DMA_ATTR_DEBUGGING_IGNORE_CACHELINES with
> >> some kind of statement at the user why it is OK.
> > And this is the issue: the existing DMA_ATTR_CPU_CACHE_CLEAN is essentially
> > a debug-oriented attribute. The upper layers are already handled through
> > __dma_from_device_group_begin()/end(), which pad cache lines on
> > non-coherent systems.
> >
> > Marek,
> >
> > What do you see as the right path forward here? RDMA has a legitimate use
> > case where CPU cache lines may overlap. The underlying reason differs from
> > VirtIO, but the outcome is the same. Should I keep the current name? Should
> > we rename it to the proposed DMA_ATTR_CPU_CACHE_OVERLAP or
> > DMA_ATTR_DEBUGGING_IGNORE_CACHELINES? Should we introduce a new
> > DMA_ATTR_REQUIRE_COHERENT attribute instead? Or do you have another
> > recommendation?
> 
> My question here is if RDMA works on any non-coherent DMA systems? If 
> not then it should fail early (during init or probe?) to avoid potential 
> data corruption and new DMA attributes won't help it.

Like Jason wrote, our user‑visible API does not work on non‑coherent
systems, and this is where I'm using the DMA_ATTR_CPU_CACHE_OVERLAP
attribute.

Regarding failure on unsupported systems, I have tried more than once to
make RDMA fail when the device is known to take the SWIOTLB path
and cannot operate correctly, but each attempt was met with a
cold reception:
https://lore.kernel.org/all/d18c454636bf3cfdba9b66b7cc794d713eadc4a5.1719909395.git.leon@kernel.org/

I'm afraid the outcome will be the same this time as well.

> On the other hand, the DMA_ATTR_CPU_CACHE_OVERLAP attribute is a bit more
> descriptive to me than DMA_ATTR_CPU_CACHE_CLEAN, but this indeed looks
> like a separate issue from the RDMA case.
> 
> Best regards
> -- 
> Marek Szyprowski, PhD
> Samsung R&D Institute Poland
> 
>

Re: [PATCH 2/3] dma-mapping: Clarify valid conditions for CPU cache line overlap

Posted by Jason Gunthorpe 1 month ago

On Mon, Mar 09, 2026 at 05:05:02PM +0200, Leon Romanovsky wrote:

> Regarding failure on unsupported systems, I have tried more than once to
> make the RDMA fail when the device is known to take the SWIOTLB path
> in RDMA and cannot operate correctly, but each attempt was met with a
> cold reception:
> https://lore.kernel.org/all/d18c454636bf3cfdba9b66b7cc794d713eadc4a5.1719909395.git.leon@kernel.org/

I think a lot of that is the APIs used there. It is hard to determine
whether SWIOTLB or coherent operation is possible; I've also hit these
things in VFIO and gave up.

However, DMA_ATTR_REQUIRE_COHERENCE can be done properly and not leak
a lot of dangerous APIs to drivers (beyond itself).

It is also more important now with CC systems, I think.

Jason

Re: [PATCH 2/3] dma-mapping: Clarify valid conditions for CPU cache line overlap

Posted by Marek Szyprowski 1 month ago

On 09.03.2026 16:13, Jason Gunthorpe wrote:
> On Mon, Mar 09, 2026 at 05:05:02PM +0200, Leon Romanovsky wrote:
>> Regarding failure on unsupported systems, I have tried more than once to
>> make the RDMA fail when the device is known to take the SWIOTLB path
>> in RDMA and cannot operate correctly, but each attempt was met with a
>> cold reception:
>> https://lore.kernel.org/all/d18c454636bf3cfdba9b66b7cc794d713eadc4a5.1719909395.git.leon@kernel.org/
> I think alot of that is the APIs used there. It is hard to determine
> if SWIOTLB is possible or coherent is possible, I've also hit these
> things in VFIO and gave up.
>
> However, DMA_ATTR_REQUIRE_COHERENCE can be done properly and not leak
> alot of dangerous APIs to drivers (beyond itself).
>
> It is also more important now with CC systems, I think.

Jason is right. Indeed the rdma/uverbs case needs some extension to
ensure that a coherent mapping is used, which is not possible now. This
however doesn't mean that DMA_ATTR_CPU_CACHE_OVERLAP is not needed
for that use case too. I'm open to accepting both. The only question I have
is which name we should use. We already have DMA_ATTR_CPU_CACHE_CLEAN,
while DMA_ATTR_CPU_CACHE_OVERLAP and
DMA_ATTR_DEBUGGING_IGNORE_CACHELINES were proposed here. The last seems
to be the most descriptive.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

Re: [PATCH 2/3] dma-mapping: Clarify valid conditions for CPU cache line overlap

Posted by Jason Gunthorpe 1 month ago

On Tue, Mar 10, 2026 at 10:45:38AM +0100, Marek Szyprowski wrote:
> Jason is right. Indeed the rdma/uverbs case needs some extension to 
> ensure that the coherent mapping is used, what is not possible now. This 
> however doesn't mean that the DMA_ATTR_CPU_CACHE_OVERLAP is not needed 
> for that use case too. I'm open to accept both. The only question I have 
> is which name should we use? We already have DMA_ATTR_CPU_CACHE_CLEAN, 
> while DMA_ATTR_CPU_CACHE_OVERLAP and 
> DMA_ATTR_DEBUGGING_IGNORE_CACHELINES were proposed here. The last seems 
> to be most descriptive.

If we do DMA_ATTR_REQUIRE_COHERENCE then I imagine it would internally
also set DMA_ATTR_DEBUGGING_IGNORE_CACHELINES, but I'd prefer that
detail not leak into the callers.

Jason

Re: [PATCH 2/3] dma-mapping: Clarify valid conditions for CPU cache line overlap

Posted by Marek Szyprowski 4 weeks, 1 day ago

On 10.03.2026 13:34, Jason Gunthorpe wrote:
> On Tue, Mar 10, 2026 at 10:45:38AM +0100, Marek Szyprowski wrote:
>> Jason is right. Indeed the rdma/uverbs case needs some extension to
>> ensure that the coherent mapping is used, what is not possible now. This
>> however doesn't mean that the DMA_ATTR_CPU_CACHE_OVERLAP is not needed
>> for that use case too. I'm open to accept both. The only question I have
>> is which name should we use? We already have DMA_ATTR_CPU_CACHE_CLEAN,
>> while DMA_ATTR_CPU_CACHE_OVERLAP and
>> DMA_ATTR_DEBUGGING_IGNORE_CACHELINES were proposed here. The last seems
>> to be most descriptive.
> If we do DMA_ATTR_REQUIRE_COHERENCE then I imagine it would internally
> also set DMA_ATTR_DEBUGGING_IGNORE_CACHELINES, but I'd prefer that
> detail not leak into the callers.

Why should DMA_ATTR_REQUIRE_COHERENCE imply
DMA_ATTR_DEBUGGING_IGNORE_CACHELINES?

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

Re: [PATCH 2/3] dma-mapping: Clarify valid conditions for CPU cache line overlap

Posted by Jason Gunthorpe 4 weeks, 1 day ago

On Tue, Mar 10, 2026 at 10:08:38PM +0100, Marek Szyprowski wrote:
> On 10.03.2026 13:34, Jason Gunthorpe wrote:
> > On Tue, Mar 10, 2026 at 10:45:38AM +0100, Marek Szyprowski wrote:
> >> Jason is right. Indeed the rdma/uverbs case needs some extension to
> >> ensure that the coherent mapping is used, what is not possible now. This
> >> however doesn't mean that the DMA_ATTR_CPU_CACHE_OVERLAP is not needed
> >> for that use case too. I'm open to accept both. The only question I have
> >> is which name should we use? We already have DMA_ATTR_CPU_CACHE_CLEAN,
> >> while DMA_ATTR_CPU_CACHE_OVERLAP and
> >> DMA_ATTR_DEBUGGING_IGNORE_CACHELINES were proposed here. The last seems
> >> to be most descriptive.
> > If we do DMA_ATTR_REQUIRE_COHERENCE then I imagine it would internally
> > also set DMA_ATTR_DEBUGGING_IGNORE_CACHELINES, but I'd prefer that
> > detail not leak into the callers.
> 
> Why DMA_ATTR_REQUIRE_COHERENCE should imply 
> DMA_ATTR_DEBUGGING_IGNORE_CACHELINES?

AFAICT the purpose of the DMA API debugging cacheline tracking is to
ensure that drivers are mapping things properly such that the cache
flushing in incoherent systems can properly cache flush them without
creating bugs (i.e. a dirty line overwriting DMA'd data or something).

If the mapping is REQUIRE_COHERENCE then it is prevented from running
on systems where these cache artifacts can cause corruption, so we
don't need to track them and we don't need the strict restrictions on
what can be mapped.

That tracking trips up and gives false positives for cases like RDMA, DRM,
etc. that allow userspace to multi-map userspace memory.

Jason
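
The behaviour argued for above can be sketched as a userspace model: a mapping that requires coherence fails outright on a device that would need SWIOTLB bouncing or cache maintenance, and otherwise has the debug-only flag added internally so callers never pass it themselves. Everything here is hypothetical; neither attribute exists at the time of this thread, and the `model_*` names, bit values, and error code are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical attribute bits; the real values, if any, are undecided. */
#define MODEL_ATTR_REQUIRE_COHERENCE		(1UL << 0)
#define MODEL_ATTR_DEBUGGING_IGNORE_CACHELINES	(1UL << 1)

/* Hypothetical error code for "this mapping cannot be made coherent". */
#define MODEL_ERR_NOT_COHERENT			(-1)

struct model_device {
	bool dma_coherent;	/* no cache maintenance needed for DMA */
	bool needs_swiotlb;	/* mappings would be bounced through SWIOTLB */
};

/*
 * Decide whether a mapping may proceed. On success, store the attributes
 * the lower layers (including the debug tracking) would see in *effective.
 */
int model_map(const struct model_device *dev, unsigned long attrs,
	      unsigned long *effective)
{
	assert(dev != NULL && effective != NULL);

	if (attrs & MODEL_ATTR_REQUIRE_COHERENCE) {
		/* Fail early instead of risking silent data corruption. */
		if (!dev->dma_coherent || dev->needs_swiotlb)
			return MODEL_ERR_NOT_COHERENT;

		/*
		 * Coherent and unbounced: cache line overlap cannot corrupt
		 * data, so the debug tracking can be relaxed internally.
		 */
		attrs |= MODEL_ATTR_DEBUGGING_IGNORE_CACHELINES;
	}

	*effective = attrs;
	return 0;
}
```

Keeping the debug flag as an internal consequence of the coherence requirement means a caller states only the property it depends on, and the relaxation of cacheline tracking cannot be requested on its own through this path.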

Re: [PATCH 2/3] dma-mapping: Clarify valid conditions for CPU cache line overlap

Posted by Leon Romanovsky 4 weeks, 1 day ago

On Tue, Mar 10, 2026 at 09:34:05AM -0300, Jason Gunthorpe wrote:
> On Tue, Mar 10, 2026 at 10:45:38AM +0100, Marek Szyprowski wrote:
> > Jason is right. Indeed the rdma/uverbs case needs some extension to 
> > ensure that the coherent mapping is used, what is not possible now. This 
> > however doesn't mean that the DMA_ATTR_CPU_CACHE_OVERLAP is not needed 
> > for that use case too. I'm open to accept both. The only question I have 
> > is which name should we use? We already have DMA_ATTR_CPU_CACHE_CLEAN, 
> > while DMA_ATTR_CPU_CACHE_OVERLAP and 
> > DMA_ATTR_DEBUGGING_IGNORE_CACHELINES were proposed here. The last seems 
> > to be most descriptive.
> 
> If we do DMA_ATTR_REQUIRE_COHERENCE then I imagine it would internally
> also set DMA_ATTR_DEBUGGING_IGNORE_CACHELINES, but I'd prefer that
> detail not leak into the callers.

Yes, this is how I implemented it in my v2, which I haven't sent yet :).

Thanks

> 
> Jason

Re: [PATCH 2/3] dma-mapping: Clarify valid conditions for CPU cache line overlap

Posted by Jason Gunthorpe 1 month ago

On Mon, Mar 09, 2026 at 01:30:24PM +0100, Marek Szyprowski wrote:

> My question here is if RDMA works on any non-coherent DMA systems? 

The in kernel components do work, like storage, nvme over fabrics, netdev.

The user API (uverbs) does not work at all, and has never worked.

I think DRM has similar issues too, where most of their DMA API usage
is OK but some places where they interact with pin_user_pages() have
the same issues as RDMA.

This is why I'd like a new attribute DMA_ATTR_REQUIRE_COHERENCE that
these special cases can use to fail instead of corrupting data.

Jason