[PATCH 0/2] NTB: Allow drivers to provide DMA mapping device

Koichiro Den posted 2 patches 1 month, 1 week ago
There is a newer version of this series
drivers/ntb/ntb_transport.c | 14 +++++++-------
include/linux/ntb.h         | 23 +++++++++++++++++++++++
2 files changed, 30 insertions(+), 7 deletions(-)
Some NTB implementations are backed by a "virtual" PCI device, while the
actual DMA mapping context (IOMMU domain) belongs to a different device.

One example is vNTB, where the NTB device is represented as a virtual
PCI endpoint function, but DMA operations must be performed against the
EPC parent device, which owns the IOMMU context.

Today, ntb_transport implicitly relies on the NTB device's parent device
as the DMA mapping device. This works for most PCIe NTB hardware, but
breaks implementations where the NTB PCI function is not the correct
device to use for DMA API operations.

This small series introduces an optional .get_dma_dev() callback in
struct ntb_dev_ops, together with a helper ntb_get_dma_dev(). If the
callback is not implemented, the helper falls back to the existing
default behavior. Drivers that implement .get_dma_dev() must return a
non-NULL struct device.

- Patch 1/2: Add .get_dma_dev() to struct ntb_dev_ops and provide
             ntb_get_dma_dev().

- Patch 2/2: Switch ntb_transport coherent allocations and frees to use
             ntb_get_dma_dev().

No functional changes are intended by this series itself.

A follow-up patch implementing .get_dma_dev() for the vNTB EPF driver
(drivers/pci/endpoint/functions/pci-epf-vntb.c) will be submitted
separately to the PCI Endpoint subsystem tree. That will enable
ntb_transport to work correctly in IOMMU-backed EPC setups.

Best regards,
Koichiro


Koichiro Den (2):
  NTB: core: Add .get_dma_dev() callback to ntb_dev_ops
  NTB: ntb_transport: Use ntb_get_dma_dev() for DMA buffers

 drivers/ntb/ntb_transport.c | 14 +++++++-------
 include/linux/ntb.h         | 23 +++++++++++++++++++++++
 2 files changed, 30 insertions(+), 7 deletions(-)

-- 
2.51.0
Re: [PATCH 0/2] NTB: Allow drivers to provide DMA mapping device
Posted by Dave Jiang 1 month, 1 week ago

On 3/2/26 7:45 AM, Koichiro Den wrote:
> Some NTB implementations are backed by a "virtual" PCI device, while the
> actual DMA mapping context (IOMMU domain) belongs to a different device.
> 
> One example is vNTB, where the NTB device is represented as a virtual
> PCI endpoint function, but DMA operations must be performed against the
> EPC parent device, which owns the IOMMU context.
> 
> Today, ntb_transport implicitly relies on the NTB device's parent device
> as the DMA mapping device. This works for most PCIe NTB hardware, but
> breaks implementations where the NTB PCI function is not the correct
> device to use for DMA API operations.

Actually it doesn't quite work. This resulted in 061a785a114f ("ntb: Force
physically contiguous allocation of rx ring buffers"). As you can see it
tries to get around the issue as a temp measure. The main issue is the
memory window buffer is allocated before the dmaengine devices are allocated.
So the buffer is mapped against the NTB device rather than the DMA device.
So I think we may need to come up with a better scheme to clean up this
issue as some of the current NTBs can utilize this change as well.

The per queue DMA device presents an initialization hierarchy challenge with the
memory window context. I'm open to suggestions.  

DJ

> 
> This small series introduces an optional .get_dma_dev() callback in
> struct ntb_dev_ops, together with a helper ntb_get_dma_dev(). If the
> callback is not implemented, the helper falls back to the existing
> default behavior. Drivers that implement .get_dma_dev() must return a
> non-NULL struct device.
> 
> - Patch 1/2: Add .get_dma_dev() to struct ntb_dev_ops and provide
>              ntb_get_dma_dev().
> 
> - Patch 2/2: Switch ntb_transport coherent allocations and frees to use
>              ntb_get_dma_dev().
> 
> No functional changes are intended by this series itself.
> 
> A follow-up patch implementing .get_dma_dev() for the vNTB EPF driver
> (drivers/pci/endpoint/functions/pci-epf-vntb.c) will be submitted
> separately to the PCI Endpoint subsystem tree. That will enable
> ntb_transport to work correctly in IOMMU-backed EPC setups.
> 
> Best regards,
> Koichiro
> 
> 
> Koichiro Den (2):
>   NTB: core: Add .get_dma_dev() callback to ntb_dev_ops
>   NTB: ntb_transport: Use ntb_get_dma_dev() for DMA buffers
> 
>  drivers/ntb/ntb_transport.c | 14 +++++++-------
>  include/linux/ntb.h         | 23 +++++++++++++++++++++++
>  2 files changed, 30 insertions(+), 7 deletions(-)
>
Re: [PATCH 0/2] NTB: Allow drivers to provide DMA mapping device
Posted by Koichiro Den 1 month, 1 week ago
On Mon, Mar 02, 2026 at 09:52:08AM -0700, Dave Jiang wrote:
> 
> 
> On 3/2/26 7:45 AM, Koichiro Den wrote:
> > Some NTB implementations are backed by a "virtual" PCI device, while the
> > actual DMA mapping context (IOMMU domain) belongs to a different device.
> > 
> > One example is vNTB, where the NTB device is represented as a virtual
> > PCI endpoint function, but DMA operations must be performed against the
> > EPC parent device, which owns the IOMMU context.
> > 
> > Today, ntb_transport implicitly relies on the NTB device's parent device
> > as the DMA mapping device. This works for most PCIe NTB hardware, but
> > breaks implementations where the NTB PCI function is not the correct
> > device to use for DMA API operations.
> 
> Actually it doesn't quite work. This resulted in 061a785a114f ("ntb: Force
> physically contiguous allocation of rx ring buffers"). As you can see it
> tries to get around the issue as a temp measure. The main issue is the
> memory window buffer is allocated before the dmaengine devices are allocated.
> So the buffer is mapped against the NTB device rather than the DMA device.
> So I think we may need to come up with a better scheme to clean up this
> issue as some of the current NTBs can utilize this change as well.

Thanks for the feedback.

I think there are two issues which are related but separable:

- 1). Ensuring the correct DMA-mapping device is used for the MW translation
      (i.e. inbound accesses from the peer).
- 2). RX-side DMA memcpy re-maps the MW source buffer against the dmaengine
      device ("double mapping").

(1) is what this series is addressing. I think this series does not worsen (2).
I agree that (2) should be improved eventually.

(Note that in some setups such as vNTB, the device returned by ntb_get_dma_dev()
can be the same as chan->device->dev, in which case the double mapping could be
optimized away. However, I understand that you are talking about a more
fundamental improvement.)

> 
> The per queue DMA device presents an initialization hierarchy challenge with the
> memory window context. I'm open to suggestions.  

In my view, what is written in 061a785a114f looks like the most viable long-term
direction:

    A potential future solution may be having the DMA mapping API providing a
    way to alias an existing IOVA mapping to a new device perhaps.

I do not immediately see a more practical alternative. E.g., deferring MW
inbound mapping until ntb_transport_create_queue() would require a substantial
rework, since dma_chan is determined per-QP at that stage and the mapping would
become dynamic per subrange. I doubt it would be worth doing or acceptable.
Pre-allocating dma_chans only for this purpose also seems excessive.

So I agree that (2) needs a clean-up eventually. However, in my opinion the
problem this series tries to solve is independent, and the approach here does
not interfere with that direction.

Best regards,
Koichiro

> 
> DJ
> 
> > 
> > This small series introduces an optional .get_dma_dev() callback in
> > struct ntb_dev_ops, together with a helper ntb_get_dma_dev(). If the
> > callback is not implemented, the helper falls back to the existing
> > default behavior. Drivers that implement .get_dma_dev() must return a
> > non-NULL struct device.
> > 
> > - Patch 1/2: Add .get_dma_dev() to struct ntb_dev_ops and provide
> >              ntb_get_dma_dev().
> > 
> > - Patch 2/2: Switch ntb_transport coherent allocations and frees to use
> >              ntb_get_dma_dev().
> > 
> > No functional changes are intended by this series itself.
> > 
> > A follow-up patch implementing .get_dma_dev() for the vNTB EPF driver
> > (drivers/pci/endpoint/functions/pci-epf-vntb.c) will be submitted
> > separately to the PCI Endpoint subsystem tree. That will enable
> > ntb_transport to work correctly in IOMMU-backed EPC setups.
> > 
> > Best regards,
> > Koichiro
> > 
> > 
> > Koichiro Den (2):
> >   NTB: core: Add .get_dma_dev() callback to ntb_dev_ops
> >   NTB: ntb_transport: Use ntb_get_dma_dev() for DMA buffers
> > 
> >  drivers/ntb/ntb_transport.c | 14 +++++++-------
> >  include/linux/ntb.h         | 23 +++++++++++++++++++++++
> >  2 files changed, 30 insertions(+), 7 deletions(-)
> > 
>
Re: [PATCH 0/2] NTB: Allow drivers to provide DMA mapping device
Posted by Dave Jiang 1 month ago

On 3/2/26 9:56 PM, Koichiro Den wrote:
> On Mon, Mar 02, 2026 at 09:52:08AM -0700, Dave Jiang wrote:
>>
>>
>> On 3/2/26 7:45 AM, Koichiro Den wrote:
>>> Some NTB implementations are backed by a "virtual" PCI device, while the
>>> actual DMA mapping context (IOMMU domain) belongs to a different device.
>>>
>>> One example is vNTB, where the NTB device is represented as a virtual
>>> PCI endpoint function, but DMA operations must be performed against the
>>> EPC parent device, which owns the IOMMU context.
>>>
>>> Today, ntb_transport implicitly relies on the NTB device's parent device
>>> as the DMA mapping device. This works for most PCIe NTB hardware, but
>>> breaks implementations where the NTB PCI function is not the correct
>>> device to use for DMA API operations.
>>
>> Actually it doesn't quite work. This resulted in 061a785a114f ("ntb: Force
>> physically contiguous allocation of rx ring buffers"). As you can see it
>> tries to get around the issue as a temp measure. The main issue is the
>> memory window buffer is allocated before the dmaengine devices are allocated.
>> So the buffer is mapped against the NTB device rather than the DMA device.
>> So I think we may need to come up with a better scheme to clean up this
>> issue as some of the current NTBs can utilize this change as well.
> 
> Thanks for the feedback.
> 
> I think there are two issues which are related but separable:
> 
> - 1). Ensuring the correct DMA-mapping device is used for the MW translation
>       (i.e. inbound accesses from the peer).
> - 2). RX-side DMA memcpy re-maps the MW source buffer against the dmaengine
>       device ("double mapping").
> 
> (1) is what this series is addressing. I think this series does not worsen (2).
> I agree that (2) should be improved eventually.
> 
> (Note that in some setups such as vNTB, the device returned by ntb_get_dma_dev()
> can be the same as chan->device->dev, in which case the double mapping could be
> optimized away. However, I understand that you are talking about a more
> fundamental improvement.)
> 
>>
>> The per queue DMA device presents an initialization hierarchy challenge with the
>> memory window context. I'm open to suggestions.  
> 
> In my view, what is written in 061a785a114f looks like the most viable long-term
> direction:
> 
>     A potential future solution may be having the DMA mapping API providing a
>     way to alias an existing IOVA mapping to a new device perhaps.
> 
> I do not immediately see a more practical alternative. E.g., deferring MW
> inbound mapping until ntb_transport_create_queue() would require a substantial
> rework, since dma_chan is determined per-QP at that stage and the mapping would
> become dynamic per subrange. I doubt it would be worth doing or acceptable.
> Pre-allocating dma_chans only for this purpose also seems excessive.
> 
> So I agree that (2) needs a clean-up eventually. However, in my opinion the
> problem this series tries to solve is independent, and the approach here does
> not interfere with that direction.

Fair assessment. For the series:
Reviewed-by: Dave Jiang <dave.jiang@intel.com>

> 
> Best regards,
> Koichiro
> 
>>
>> DJ
>>
>>>
>>> This small series introduces an optional .get_dma_dev() callback in
>>> struct ntb_dev_ops, together with a helper ntb_get_dma_dev(). If the
>>> callback is not implemented, the helper falls back to the existing
>>> default behavior. Drivers that implement .get_dma_dev() must return a
>>> non-NULL struct device.
>>>
>>> - Patch 1/2: Add .get_dma_dev() to struct ntb_dev_ops and provide
>>>              ntb_get_dma_dev().
>>>
>>> - Patch 2/2: Switch ntb_transport coherent allocations and frees to use
>>>              ntb_get_dma_dev().
>>>
>>> No functional changes are intended by this series itself.
>>>
>>> A follow-up patch implementing .get_dma_dev() for the vNTB EPF driver
>>> (drivers/pci/endpoint/functions/pci-epf-vntb.c) will be submitted
>>> separately to the PCI Endpoint subsystem tree. That will enable
>>> ntb_transport to work correctly in IOMMU-backed EPC setups.
>>>
>>> Best regards,
>>> Koichiro
>>>
>>>
>>> Koichiro Den (2):
>>>   NTB: core: Add .get_dma_dev() callback to ntb_dev_ops
>>>   NTB: ntb_transport: Use ntb_get_dma_dev() for DMA buffers
>>>
>>>  drivers/ntb/ntb_transport.c | 14 +++++++-------
>>>  include/linux/ntb.h         | 23 +++++++++++++++++++++++
>>>  2 files changed, 30 insertions(+), 7 deletions(-)
>>>
>>
>
Re: [PATCH 0/2] NTB: Allow drivers to provide DMA mapping device
Posted by Koichiro Den 1 month ago
On Tue, Mar 03, 2026 at 08:42:53AM -0700, Dave Jiang wrote:
> 
> 
> On 3/2/26 9:56 PM, Koichiro Den wrote:
> > On Mon, Mar 02, 2026 at 09:52:08AM -0700, Dave Jiang wrote:
> >>
> >>
> >> On 3/2/26 7:45 AM, Koichiro Den wrote:
> >>> Some NTB implementations are backed by a "virtual" PCI device, while the
> >>> actual DMA mapping context (IOMMU domain) belongs to a different device.
> >>>
> >>> One example is vNTB, where the NTB device is represented as a virtual
> >>> PCI endpoint function, but DMA operations must be performed against the
> >>> EPC parent device, which owns the IOMMU context.
> >>>
> >>> Today, ntb_transport implicitly relies on the NTB device's parent device
> >>> as the DMA mapping device. This works for most PCIe NTB hardware, but
> >>> breaks implementations where the NTB PCI function is not the correct
> >>> device to use for DMA API operations.
> >>
> >> Actually it doesn't quite work. This resulted in 061a785a114f ("ntb: Force
> >> physically contiguous allocation of rx ring buffers"). As you can see it
> >> tries to get around the issue as a temp measure. The main issue is the
> >> memory window buffer is allocated before the dmaengine devices are allocated.
> >> So the buffer is mapped against the NTB device rather than the DMA device.
> >> So I think we may need to come up with a better scheme to clean up this
> >> issue as some of the current NTBs can utilize this change as well.
> > 
> > Thanks for the feedback.
> > 
> > I think there are two issues which are related but separable:
> > 
> > - 1). Ensuring the correct DMA-mapping device is used for the MW translation
> >       (i.e. inbound accesses from the peer).
> > - 2). RX-side DMA memcpy re-maps the MW source buffer against the dmaengine
> >       device ("double mapping").
> > 
> > (1) is what this series is addressing. I think this series does not worsen (2).
> > I agree that (2) should be improved eventually.
> > 
> > (Note that in some setups such as vNTB, the device returned by ntb_get_dma_dev()
> > can be the same as chan->device->dev, in which case the double mapping could be
> > optimized away. However, I understand that you are talking about a more
> > fundamental improvement.)
> > 
> >>
> >> The per queue DMA device presents an initialization hierarchy challenge with the
> >> memory window context. I'm open to suggestions.  
> > 
> > In my view, what is written in 061a785a114f looks like the most viable long-term
> > direction:
> > 
> >     A potential future solution may be having the DMA mapping API providing a
> >     way to alias an existing IOVA mapping to a new device perhaps.
> > 
> > I do not immediately see a more practical alternative. E.g., deferring MW
> > inbound mapping until ntb_transport_create_queue() would require a substantial
> > rework, since dma_chan is determined per-QP at that stage and the mapping would
> > become dynamic per subrange. I doubt it would be worth doing or acceptable.
> > Pre-allocating dma_chans only for this purpose also seems excessive.
> > 
> > So I agree that (2) needs a clean-up eventually. However, in my opinion the
> > problem this series tries to solve is independent, and the approach here does
> > not interfere with that direction.
> 
> Fair assessment. For the series:
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>

Thanks for the review.

Once this looks good to Jon as well and gets queued in the NTB tree, I'll submit
a small patch to PCI EP for vNTB (the real user of the interface), something
like the following:


diff --git a/drivers/pci/endpoint/functions/pci-epf-vntb.c b/drivers/pci/endpoint/functions/pci-epf-vntb.c
index be6c03f4516e..8aeacbae8b77 100644
--- a/drivers/pci/endpoint/functions/pci-epf-vntb.c
+++ b/drivers/pci/endpoint/functions/pci-epf-vntb.c
@@ -1501,6 +1501,15 @@ static int vntb_epf_link_disable(struct ntb_dev *ntb)
        return 0;
 }

+static struct device *vntb_epf_get_dma_dev(struct ntb_dev *ndev)
+{
+       struct epf_ntb *ntb = ntb_ndev(ndev);
+
+       if (!ntb || !ntb->epf)
+               return NULL;
+       return ntb->epf->epc->dev.parent;
+}
+
 static const struct ntb_dev_ops vntb_epf_ops = {
        .mw_count               = vntb_epf_mw_count,
        .spad_count             = vntb_epf_spad_count,
@@ -1522,6 +1531,7 @@ static const struct ntb_dev_ops vntb_epf_ops = {
        .db_clear_mask          = vntb_epf_db_clear_mask,
        .db_clear               = vntb_epf_db_clear,
        .link_disable           = vntb_epf_link_disable,
+       .get_dma_dev            = vntb_epf_get_dma_dev,
 };

 static int pci_vntb_probe(struct pci_dev *pdev, const struct pci_device_id *id)


Best regards,
Koichiro

> 
> > 
> > Best regards,
> > Koichiro
> > 
> >>
> >> DJ
> >>
> >>>
> >>> This small series introduces an optional .get_dma_dev() callback in
> >>> struct ntb_dev_ops, together with a helper ntb_get_dma_dev(). If the
> >>> callback is not implemented, the helper falls back to the existing
> >>> default behavior. Drivers that implement .get_dma_dev() must return a
> >>> non-NULL struct device.
> >>>
> >>> - Patch 1/2: Add .get_dma_dev() to struct ntb_dev_ops and provide
> >>>              ntb_get_dma_dev().
> >>>
> >>> - Patch 2/2: Switch ntb_transport coherent allocations and frees to use
> >>>              ntb_get_dma_dev().
> >>>
> >>> No functional changes are intended by this series itself.
> >>>
> >>> A follow-up patch implementing .get_dma_dev() for the vNTB EPF driver
> >>> (drivers/pci/endpoint/functions/pci-epf-vntb.c) will be submitted
> >>> separately to the PCI Endpoint subsystem tree. That will enable
> >>> ntb_transport to work correctly in IOMMU-backed EPC setups.
> >>>
> >>> Best regards,
> >>> Koichiro
> >>>
> >>>
> >>> Koichiro Den (2):
> >>>   NTB: core: Add .get_dma_dev() callback to ntb_dev_ops
> >>>   NTB: ntb_transport: Use ntb_get_dma_dev() for DMA buffers
> >>>
> >>>  drivers/ntb/ntb_transport.c | 14 +++++++-------
> >>>  include/linux/ntb.h         | 23 +++++++++++++++++++++++
> >>>  2 files changed, 30 insertions(+), 7 deletions(-)
> >>>
> >>
> > 
>
Re: [PATCH 0/2] NTB: Allow drivers to provide DMA mapping device
Posted by Dave Jiang 1 month ago

On 3/4/26 8:56 AM, Koichiro Den wrote:
> On Tue, Mar 03, 2026 at 08:42:53AM -0700, Dave Jiang wrote:
>>
>>
>> On 3/2/26 9:56 PM, Koichiro Den wrote:
>>> On Mon, Mar 02, 2026 at 09:52:08AM -0700, Dave Jiang wrote:
>>>>
>>>>
>>>> On 3/2/26 7:45 AM, Koichiro Den wrote:
>>>>> Some NTB implementations are backed by a "virtual" PCI device, while the
>>>>> actual DMA mapping context (IOMMU domain) belongs to a different device.
>>>>>
>>>>> One example is vNTB, where the NTB device is represented as a virtual
>>>>> PCI endpoint function, but DMA operations must be performed against the
>>>>> EPC parent device, which owns the IOMMU context.
>>>>>
>>>>> Today, ntb_transport implicitly relies on the NTB device's parent device
>>>>> as the DMA mapping device. This works for most PCIe NTB hardware, but
>>>>> breaks implementations where the NTB PCI function is not the correct
>>>>> device to use for DMA API operations.
>>>>
>>>> Actually it doesn't quite work. This resulted in 061a785a114f ("ntb: Force
>>>> physically contiguous allocation of rx ring buffers"). As you can see it
>>>> tries to get around the issue as a temp measure. The main issue is the
>>>> memory window buffer is allocated before the dmaengine devices are allocated.
>>>> So the buffer is mapped against the NTB device rather than the DMA device.
>>>> So I think we may need to come up with a better scheme to clean up this
>>>> issue as some of the current NTBs can utilize this change as well.
>>>
>>> Thanks for the feedback.
>>>
>>> I think there are two issues which are related but separable:
>>>
>>> - 1). Ensuring the correct DMA-mapping device is used for the MW translation
>>>       (i.e. inbound accesses from the peer).
>>> - 2). RX-side DMA memcpy re-maps the MW source buffer against the dmaengine
>>>       device ("double mapping").
>>>
>>> (1) is what this series is addressing. I think this series does not worsen (2).
>>> I agree that (2) should be improved eventually.
>>>
>>> (Note that in some setups such as vNTB, the device returned by ntb_get_dma_dev()
>>> can be the same as chan->device->dev, in which case the double mapping could be
>>> optimized away. However, I understand that you are talking about a more
>>> fundamental improvement.)
>>>
>>>>
>>>> The per queue DMA device presents an initialization hierarchy challenge with the
>>>> memory window context. I'm open to suggestions.  
>>>
>>> In my view, what is written in 061a785a114f looks like the most viable long-term
>>> direction:
>>>
>>>     A potential future solution may be having the DMA mapping API providing a
>>>     way to alias an existing IOVA mapping to a new device perhaps.
>>>
>>> I do not immediately see a more practical alternative. E.g., deferring MW
>>> inbound mapping until ntb_transport_create_queue() would require a substantial
>>> rework, since dma_chan is determined per-QP at that stage and the mapping would
>>> become dynamic per subrange. I doubt it would be worth doing or acceptable.
>>> Pre-allocating dma_chans only for this purpose also seems excessive.
>>>
>>> So I agree that (2) needs a clean-up eventually. However, in my opinion the
>>> problem this series tries to solve is independent, and the approach here does
>>> not interfere with that direction.
>>
>> Fair assessment. For the series:
>> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> 
> Thanks for the review.
> 
> Once this looks good to Jon as well and gets queued in the NTB tree, I'll submit
> a small patch to PCI EP for vNTB (the real user of the interface), something
> like the following:
> 
> 
> diff --git a/drivers/pci/endpoint/functions/pci-epf-vntb.c b/drivers/pci/endpoint/functions/pci-epf-vntb.c
> index be6c03f4516e..8aeacbae8b77 100644
> --- a/drivers/pci/endpoint/functions/pci-epf-vntb.c
> +++ b/drivers/pci/endpoint/functions/pci-epf-vntb.c
> @@ -1501,6 +1501,15 @@ static int vntb_epf_link_disable(struct ntb_dev *ntb)
>         return 0;
>  }
> 
> +static struct device *vntb_epf_get_dma_dev(struct ntb_dev *ndev)
> +{
> +       struct epf_ntb *ntb = ntb_ndev(ndev);
> +
> +       if (!ntb || !ntb->epf)
> +               return NULL;
> +       return ntb->epf->epc->dev.parent;
> +}
> +
>  static const struct ntb_dev_ops vntb_epf_ops = {
>         .mw_count               = vntb_epf_mw_count,
>         .spad_count             = vntb_epf_spad_count,
> @@ -1522,6 +1531,7 @@ static const struct ntb_dev_ops vntb_epf_ops = {
>         .db_clear_mask          = vntb_epf_db_clear_mask,
>         .db_clear               = vntb_epf_db_clear,
>         .link_disable           = vntb_epf_link_disable,
> +       .get_dma_dev            = vntb_epf_get_dma_dev,
>  };
> 
>  static int pci_vntb_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> 
> 

Probably should include it with this series if it's small. Having the user with new code is usually preferred.
Re: [PATCH 0/2] NTB: Allow drivers to provide DMA mapping device
Posted by Koichiro Den 1 month ago
On Wed, Mar 04, 2026 at 09:53:42AM -0700, Dave Jiang wrote:
> 
> 
> On 3/4/26 8:56 AM, Koichiro Den wrote:
> > On Tue, Mar 03, 2026 at 08:42:53AM -0700, Dave Jiang wrote:
> >>
> >>
> >> On 3/2/26 9:56 PM, Koichiro Den wrote:
> >>> On Mon, Mar 02, 2026 at 09:52:08AM -0700, Dave Jiang wrote:
> >>>>
> >>>>
> >>>> On 3/2/26 7:45 AM, Koichiro Den wrote:
> >>>>> Some NTB implementations are backed by a "virtual" PCI device, while the
> >>>>> actual DMA mapping context (IOMMU domain) belongs to a different device.
> >>>>>
> >>>>> One example is vNTB, where the NTB device is represented as a virtual
> >>>>> PCI endpoint function, but DMA operations must be performed against the
> >>>>> EPC parent device, which owns the IOMMU context.
> >>>>>
> >>>>> Today, ntb_transport implicitly relies on the NTB device's parent device
> >>>>> as the DMA mapping device. This works for most PCIe NTB hardware, but
> >>>>> breaks implementations where the NTB PCI function is not the correct
> >>>>> device to use for DMA API operations.
> >>>>
> >>>> Actually it doesn't quite work. This resulted in 061a785a114f ("ntb: Force
> >>>> physically contiguous allocation of rx ring buffers"). As you can see it
> >>>> tries to get around the issue as a temp measure. The main issue is the
> >>>> memory window buffer is allocated before the dmaengine devices are allocated.
> >>>> So the buffer is mapped against the NTB device rather than the DMA device.
> >>>> So I think we may need to come up with a better scheme to clean up this
> >>>> issue as some of the current NTBs can utilize this change as well.
> >>>
> >>> Thanks for the feedback.
> >>>
> >>> I think there are two issues which are related but separable:
> >>>
> >>> - 1). Ensuring the correct DMA-mapping device is used for the MW translation
> >>>       (i.e. inbound accesses from the peer).
> >>> - 2). RX-side DMA memcpy re-maps the MW source buffer against the dmaengine
> >>>       device ("double mapping").
> >>>
> >>> (1) is what this series is addressing. I think this series does not worsen (2).
> >>> I agree that (2) should be improved eventually.
> >>>
> >>> (Note that in some setups such as vNTB, the device returned by ntb_get_dma_dev()
> >>> can be the same as chan->device->dev, in which case the double mapping could be
> >>> optimized away. However, I understand that you are talking about a more
> >>> fundamental improvement.)
> >>>
> >>>>
> >>>> The per queue DMA device presents an initialization hierarchy challenge with the
> >>>> memory window context. I'm open to suggestions.  
> >>>
> >>> In my view, what is written in 061a785a114f looks like the most viable long-term
> >>> direction:
> >>>
> >>>     A potential future solution may be having the DMA mapping API providing a
> >>>     way to alias an existing IOVA mapping to a new device perhaps.
> >>>
> >>> I do not immediately see a more practical alternative. E.g., deferring MW
> >>> inbound mapping until ntb_transport_create_queue() would require a substantial
> >>> rework, since dma_chan is determined per-QP at that stage and the mapping would
> >>> become dynamic per subrange. I doubt it would be worth doing or acceptable.
> >>> Pre-allocating dma_chans only for this purpose also seems excessive.
> >>>
> >>> So I agree that (2) needs a clean-up eventually. However, in my opinion the
> >>> problem this series tries to solve is independent, and the approach here does
> >>> not interfere with that direction.
> >>
> >> Fair assessment. For the series:
> >> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> > 
> > Thanks for the review.
> > 
> > Once this looks good to Jon as well and gets queued in the NTB tree, I'll submit
> > a small patch to PCI EP for vNTB (the real user of the interface), something
> > like the following:
> > 
> > 
> > diff --git a/drivers/pci/endpoint/functions/pci-epf-vntb.c b/drivers/pci/endpoint/functions/pci-epf-vntb.c
> > index be6c03f4516e..8aeacbae8b77 100644
> > --- a/drivers/pci/endpoint/functions/pci-epf-vntb.c
> > +++ b/drivers/pci/endpoint/functions/pci-epf-vntb.c
> > @@ -1501,6 +1501,15 @@ static int vntb_epf_link_disable(struct ntb_dev *ntb)
> >         return 0;
> >  }
> > 
> > +static struct device *vntb_epf_get_dma_dev(struct ntb_dev *ndev)
> > +{
> > +       struct epf_ntb *ntb = ntb_ndev(ndev);
> > +
> > +       if (!ntb || !ntb->epf)
> > +               return NULL;
> > +       return ntb->epf->epc->dev.parent;
> > +}
> > +
> >  static const struct ntb_dev_ops vntb_epf_ops = {
> >         .mw_count               = vntb_epf_mw_count,
> >         .spad_count             = vntb_epf_spad_count,
> > @@ -1522,6 +1531,7 @@ static const struct ntb_dev_ops vntb_epf_ops = {
> >         .db_clear_mask          = vntb_epf_db_clear_mask,
> >         .db_clear               = vntb_epf_db_clear,
> >         .link_disable           = vntb_epf_link_disable,
> > +       .get_dma_dev            = vntb_epf_get_dma_dev,
> >  };
> > 
> >  static int pci_vntb_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> > 
> > 
> 
> Probably should include it with this series if it's small. Having the user with new code is usually preferred.

I thought that, since the vNTB patch wouldn't work until the NTB changes are in,
asking both the NTB and PCI EP maintainers to coordinate the apply order might
be a bit awkward.

That said, if preferable, I can include the vNTB change in this series and
explicitly ask the PCI EP maintainers not to pick up (new) Patch 3 until the NTB
maintainers have acked and applied Patches 1-2.

I'd also appreciate any thoughts from Jon or others on this (i.e. keeping
this series NTB tree-only vs. including the vNTB change as well), as well
as any feedback on this v1 series itself.

P.S. I sent a corrected code snippet a few minutes after my original post. The
original snippet above was wrong, as it would violate the kernel-doc in Patch 1:

  "Drivers that implement .get_dma_dev() must return a non-NULL pointer."

Best regards,
Koichiro

>
Re: [PATCH 0/2] NTB: Allow drivers to provide DMA mapping device
Posted by Dave Jiang 1 month ago

On 3/4/26 8:23 PM, Koichiro Den wrote:
> On Wed, Mar 04, 2026 at 09:53:42AM -0700, Dave Jiang wrote:

<snip>

>> Probably should include it with this series if it's small. Having the user with new code is usually preferred.
> 
> I thought that, since the vNTB patch wouldn't work until the NTB changes are in,
> asking both the NTB and PCI EP maintainers to coordinate the apply order might
> be a bit awkward.
> 
> That said, if preferable, I can include the vNTB change in this series and
> explicitly ask the PCI EP maintainers not to pick up (new) Patch 3 until the NTB
> maintainers have acked and applied Patch 1-2.

Given that most of the patches are PCI EP, I think with acks from NTB, the whole thing can go through PCI EP if that works for you.

DJ

> 
> I'd also appreciate any thoughts from Jon or others on this (i.e. keeping
> this series NTB tree-only vs. including the vNTB change as well), as well
> as any feedback on this v1 series itself.
> 
> P.S. I sent a corrected code snippet a few minutes after my original post. The
> original snippet above was wrong, as it would violate the kernel-doc in Patch 1:
> 
>   "Drivers that implement .get_dma_dev() must return a non-NULL pointer."
> 
> Best regards,
> Koichiro
> 
Re: [PATCH 0/2] NTB: Allow drivers to provide DMA mapping device
Posted by Koichiro Den 1 month ago
On Thu, Mar 05, 2026 at 12:56:12AM +0900, Koichiro Den wrote:
> On Tue, Mar 03, 2026 at 08:42:53AM -0700, Dave Jiang wrote:
> > 
> > 
> > On 3/2/26 9:56 PM, Koichiro Den wrote:
> > > On Mon, Mar 02, 2026 at 09:52:08AM -0700, Dave Jiang wrote:
> > >>
> > >>
> > >> On 3/2/26 7:45 AM, Koichiro Den wrote:
> > >>> Some NTB implementations are backed by a "virtual" PCI device, while the
> > >>> actual DMA mapping context (IOMMU domain) belongs to a different device.
> > >>>
> > >>> One example is vNTB, where the NTB device is represented as a virtual
> > >>> PCI endpoint function, but DMA operations must be performed against the
> > >>> EPC parent device, which owns the IOMMU context.
> > >>>
> > >>> Today, ntb_transport implicitly relies on the NTB device's parent device
> > >>> as the DMA mapping device. This works for most PCIe NTB hardware, but
> > >>> breaks implementations where the NTB PCI function is not the correct
> > >>> device to use for DMA API operations.
> > >>
> > >> Actually it doesn't quite work. This resulted in 061a785a114f ("ntb: Force
> > >> physically contiguous allocation of rx ring buffers"). As you can see it
> > >> tries to get around the issue as a temp measure. The main issue is the
> > >> memory window buffer is allocated before the dmaengine devices are allocated.
> > >> So the buffer is mapped against the NTB device rather than the DMA device.
> > >> So I think we may need to come up with a better scheme to clean up this
> > >> issue as some of the current NTBs can utilize this change as well.
> > > 
> > > Thanks for the feedback.
> > > 
> > > I think there are two issues which are related but separable:
> > > 
> > > - 1). Ensuring the correct DMA-mapping device is used for the MW translation
> > >       (i.e. inbound accesses from the peer).
> > > - 2). RX-side DMA memcpy re-maps the MW source buffer against the dmaengine
> > >       device ("double mapping").
> > > 
> > > (1) is what this series is addressing. I think this series does not worsen (2).
> > > I agree that (2) should be improved eventually.
> > > 
> > > (Note that in some setups such as vNTB, the device returned by ntb_get_dma_dev()
> > > can be the same as chan->device->dev, in which case the double mapping could be
> > > optimized away. However, I understand that you are talking about a more
> > > fundamental improvement.)
> > > 
> > >>
> > >> The per queue DMA device presents an initialization hierarchy challenge with the
> > >> memory window context. I'm open to suggestions.  
> > > 
> > > In my view, what is written in 061a785a114f looks like the most viable long-term
> > > direction:
> > > 
> > >     A potential future solution may be having the DMA mapping API providing a
> > >     way to alias an existing IOVA mapping to a new device perhaps.
> > > 
> > > I do not immediately see a more practical alternative. E.g., deferring MW
> > > inbound mapping until ntb_transport_create_queue() would require a substantial
> > > rework, since dma_chan is determined per-QP at that stage and the mapping would
> > > become dynamic per subrange. I doubt it would be worth doing or acceptable.
> > > Pre-allocating dma_chans only for this purpose also seems excessive.
> > > 
> > > So I agree that (2) needs a clean-up eventually. However, in my opinion the
> > > problem this series tries to solve is independent, and the approach here does
> > > not interfere with that direction.
> > 
> > Fair assessment. For the series:
> > Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> 
> Thanks for the review.
> 
> Once this looks good to Jon as well and gets queued in the NTB tree, I'll submit
> a small patch to PCI EP for vNTB (the real user of the interface), something
> like the following:
> 
> 
> diff --git a/drivers/pci/endpoint/functions/pci-epf-vntb.c b/drivers/pci/endpoint/functions/pci-epf-vntb.c
> index be6c03f4516e..8aeacbae8b77 100644
> --- a/drivers/pci/endpoint/functions/pci-epf-vntb.c
> +++ b/drivers/pci/endpoint/functions/pci-epf-vntb.c
> @@ -1501,6 +1501,15 @@ static int vntb_epf_link_disable(struct ntb_dev *ntb)
>         return 0;
>  }
> 
> +static struct device *vntb_epf_get_dma_dev(struct ntb_dev *ndev)
> +{
> +       struct epf_ntb *ntb = ntb_ndev(ndev);
> +
> +       if (!ntb || !ntb->epf)
> +               return NULL;
> +       return ntb->epf->epc->dev.parent;
> +}
> +
>  static const struct ntb_dev_ops vntb_epf_ops = {
>         .mw_count               = vntb_epf_mw_count,
>         .spad_count             = vntb_epf_spad_count,
> @@ -1522,6 +1531,7 @@ static const struct ntb_dev_ops vntb_epf_ops = {
>         .db_clear_mask          = vntb_epf_db_clear_mask,
>         .db_clear               = vntb_epf_db_clear,
>         .link_disable           = vntb_epf_link_disable,
> +       .get_dma_dev            = vntb_epf_get_dma_dev,
>  };
> 
>  static int pci_vntb_probe(struct pci_dev *pdev, const struct pci_device_id *id)

No, sorry, my mistake. That was incorrect. It should look like the following:


diff --git a/drivers/pci/endpoint/functions/pci-epf-vntb.c b/drivers/pci/endpoint/functions/pci-epf-vntb.c
index 20a400e83439..e5433404f573 100644
--- a/drivers/pci/endpoint/functions/pci-epf-vntb.c
+++ b/drivers/pci/endpoint/functions/pci-epf-vntb.c
@@ -1436,6 +1436,14 @@ static int vntb_epf_link_disable(struct ntb_dev *ntb)
        return 0;
 }

+static struct device *vntb_epf_get_dma_dev(struct ntb_dev *ndev)
+{
+       struct epf_ntb *ntb = ntb_ndev(ndev);
+       struct pci_epc *epc = ntb->epf->epc;
+
+       return epc->dev.parent;
+}
+
 static const struct ntb_dev_ops vntb_epf_ops = {
        .mw_count               = vntb_epf_mw_count,
        .spad_count             = vntb_epf_spad_count,
@@ -1457,6 +1465,7 @@ static const struct ntb_dev_ops vntb_epf_ops = {
        .db_clear_mask          = vntb_epf_db_clear_mask,
        .db_clear               = vntb_epf_db_clear,
        .link_disable           = vntb_epf_link_disable,
+       .get_dma_dev            = vntb_epf_get_dma_dev,
 };

 static int pci_vntb_probe(struct pci_dev *pdev, const struct pci_device_id *id)


Sorry for the noise.

Best regards,
Koichiro

> 
> 
> Best regards,
> Koichiro
> 
> > 
> > > 
> > > Best regards,
> > > Koichiro
> > > 
> > >>
> > >> DJ
> > >>
> > >>>
> > >>> This small series introduces an optional .get_dma_dev() callback in
> > >>> struct ntb_dev_ops, together with a helper ntb_get_dma_dev(). If the
> > >>> callback is not implemented, the helper falls back to the existing
> > >>> default behavior. Drivers that implement .get_dma_dev() must return a
> > >>> non-NULL struct device.
> > >>>
> > >>> - Patch 1/2: Add .get_dma_dev() to struct ntb_dev_ops and provide
> > >>>              ntb_get_dma_dev().
> > >>>
> > >>> - Patch 2/2: Switch ntb_transport coherent allocations and frees to use
> > >>>              ntb_get_dma_dev().
> > >>>
> > >>> No functional changes are intended by this series itself.
> > >>>
> > >>> A follow-up patch implementing .get_dma_dev() for the vNTB EPF driver
> > >>> (drivers/pci/endpoint/functions/pci-epf-vntb.c) will be submitted
> > >>> separately to the PCI Endpoint subsystem tree. That will enable
> > >>> ntb_transport to work correctly in IOMMU-backed EPC setups.
> > >>>
> > >>> Best regards,
> > >>> Koichiro
> > >>>
> > >>>
> > >>> Koichiro Den (2):
> > >>>   NTB: core: Add .get_dma_dev() callback to ntb_dev_ops
> > >>>   NTB: ntb_transport: Use ntb_get_dma_dev() for DMA buffers
> > >>>
> > >>>  drivers/ntb/ntb_transport.c | 14 +++++++-------
> > >>>  include/linux/ntb.h         | 23 +++++++++++++++++++++++
> > >>>  2 files changed, 30 insertions(+), 7 deletions(-)
> > >>>
> > >>
> > > 
> >