[PATCH v1] ntb_transport: replace wmb() with dma_wmb() in TX paths

Xingbang Liu posted 1 patch 1 month, 3 weeks ago
drivers/ntb/ntb_transport.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
[PATCH v1] ntb_transport: replace wmb() with dma_wmb() in TX paths
Posted by Xingbang Liu 1 month, 3 weeks ago
The TX paths currently use wmb() after memcpy_toio() to ensure that
writes to the peer memory window are visible before subsequent
notification.

The actual ordering requirement is to ensure that data written to the
peer memory is observed before the associated notification or status
update. This matches the semantics of dma_wmb(), which orders memory
writes with respect to external observers.

The peer memory window is mapped with ioremap_wc(), allowing
write-combining. On weakly ordered architectures such as arm64,
dma_wmb() ensures that prior writes are committed from write-combining
buffers and become visible to the peer before subsequent operations.

This aligns with common patterns in PCIe-based drivers, where it is
used to order descriptor or payload writes before notification, while
avoiding the stronger ordering semantics of wmb().

The ordering with respect to peer observation is completed by the
subsequent notification mechanism.

No functional change is intended.

Signed-off-by: Xingbang Liu <liu.airalert@gmail.com>
---
 drivers/ntb/ntb_transport.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
index 7cabc82305d6..c8ef46ddcc57 100644
--- a/drivers/ntb/ntb_transport.c
+++ b/drivers/ntb/ntb_transport.c
@@ -1791,7 +1791,7 @@ static void ntb_memcpy_tx_on_stack(struct ntb_queue_entry *entry, void __iomem *
 #endif
 
 	/* Ensure that the data is fully copied out before setting the flags */
-	wmb();
+	dma_wmb();
 
 	ntb_tx_copy_callback(entry, NULL);
 }
-- 
2.34.1
Re: [PATCH v1] ntb_transport: replace wmb() with dma_wmb() in TX paths
Posted by Dave Jiang 1 month, 3 weeks ago

On 4/24/26 2:23 AM, Xingbang Liu wrote:
> The TX paths currently use wmb() after memcpy_toio() to ensure that
> writes to the peer memory window are visible before subsequent
> notification.
> 
> The actual ordering requirement is to ensure that data written to the
> peer memory is observed before the associated notification or status
> update. This matches the semantics of dma_wmb(), which orders memory
> writes with respect to external observers.
> 
> The peer memory window is mapped with ioremap_wc(), allowing
> write-combining. On weakly ordered architectures such as arm64,
> dma_wmb() ensures that prior writes are committed from write-combining
> buffers and become visible to the peer before subsequent operations.
> 
> This aligns with common patterns in PCIe-based drivers, where it is
> used to order descriptor or payload writes before notification, while
> avoiding the stronger ordering semantics of wmb().
> 
> The ordering with respect to peer observation is completed by the
> subsequent notification mechanism.
> 
> No functional change is intended.
> 
> Signed-off-by: Xingbang Liu <liu.airalert@gmail.com>

NAK. dma_wmb() is used for shared memory between CPU and a device. For example:

memcpy(dma_buffer, data, size);
dma_wmb(); /* Ensure CPU writes visible to device DMA */
writel(DOORBELL, device->reg + DB_OFFSET); /* trigger device DMA from dma_buffer */

In this case we are copying data to MMIO and therefore wmb() is needed to ensure all MMIO writes have happened before we proceed.

DJ

> ---
>  drivers/ntb/ntb_transport.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
> index 7cabc82305d6..c8ef46ddcc57 100644
> --- a/drivers/ntb/ntb_transport.c
> +++ b/drivers/ntb/ntb_transport.c
> @@ -1791,7 +1791,7 @@ static void ntb_memcpy_tx_on_stack(struct ntb_queue_entry *entry, void __iomem *
>  #endif
>  
>  	/* Ensure that the data is fully copied out before setting the flags */
> -	wmb();
> +	dma_wmb();
>  
>  	ntb_tx_copy_callback(entry, NULL);
>  }
Re: [PATCH v1] ntb_transport: replace wmb() with dma_wmb() in TX paths
Posted by xingbang liu 1 month, 3 weeks ago
Hi DJ ,

Thanks for the review and the explanation.

I understand that dma_wmb() is intended for CPU → device DMA visibility, and
since this path writes to MMIO (peer memory window), a full wmb() is required
to guarantee that all writes are visible before the notification/doorbell.

I will withdraw this patch and keep the wmb() in place.

Thanks,
Xingbang Liu

Dave Jiang <dave.jiang@intel.com> 于2026年4月24日周五 23:32写道:
>
>
>
> On 4/24/26 2:23 AM, Xingbang Liu wrote:
> > The TX paths currently use wmb() after memcpy_toio() to ensure that
> > writes to the peer memory window are visible before subsequent
> > notification.
> >
> > The actual ordering requirement is to ensure that data written to the
> > peer memory is observed before the associated notification or status
> > update. This matches the semantics of dma_wmb(), which orders memory
> > writes with respect to external observers.
> >
> > The peer memory window is mapped with ioremap_wc(), allowing
> > write-combining. On weakly ordered architectures such as arm64,
> > dma_wmb() ensures that prior writes are committed from write-combining
> > buffers and become visible to the peer before subsequent operations.
> >
> > This aligns with common patterns in PCIe-based drivers, where it is
> > used to order descriptor or payload writes before notification, while
> > avoiding the stronger ordering semantics of wmb().
> >
> > The ordering with respect to peer observation is completed by the
> > subsequent notification mechanism.
> >
> > No functional change is intended.
> >
> > Signed-off-by: Xingbang Liu <liu.airalert@gmail.com>
>
> NAK. dma_wmb() is used for shared memory between CPU and a device. For example:
>
> memcpy(dma_buffer, data, size);
> dma_wmb(); /* Ensure CPU writes visible to device DMA */
> writel(DOORBELL, device->reg + DB_OFFSET); /* trigger device DMA from dma_buffer */
>
> In this case we are copying data to MMIO and therefore wmb() is needed to ensure all MMIO writes have happened before we proceed.
>
> DJ
>
> > ---
> >  drivers/ntb/ntb_transport.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
> > index 7cabc82305d6..c8ef46ddcc57 100644
> > --- a/drivers/ntb/ntb_transport.c
> > +++ b/drivers/ntb/ntb_transport.c
> > @@ -1791,7 +1791,7 @@ static void ntb_memcpy_tx_on_stack(struct ntb_queue_entry *entry, void __iomem *
> >  #endif
> >
> >       /* Ensure that the data is fully copied out before setting the flags */
> > -     wmb();
> > +     dma_wmb();
> >
> >       ntb_tx_copy_callback(entry, NULL);
> >  }
>