[RFC PATCH] nvme: enable PCI P2PDMA support for RDMA transport

Shivaji Kant posted 1 patch 10 hours ago
Posted by Shivaji Kant 10 hours ago
Enable BLK_FEAT_PCI_P2PDMA on the NVMe namespace queue when the underlying
RDMA controller supports it.

blk_stack_limits() currently filters out this feature bit because it is
absent from BLK_FEAT_INHERIT_MASK. Manually re-assert the capability
in nvme_update_ns_info() after the stacking operation.

Hardware reachability remains enforced by late-stage distance checks
during DMA mapping.

Suggested-by: Pranjal Shrivastava <praan@google.com>
Signed-off-by: Shivaji Kant <shivajikant@google.com>
---
 drivers/nvme/host/core.c |  9 +++++++++
 drivers/nvme/host/rdma.c | 15 +++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index a2da54f974fa..0d7b0f286895 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2205,6 +2205,15 @@ static int nvme_update_ns_info_generic(struct nvme_ns *ns,
 	nvme_set_ctrl_limits(ns->ctrl, &lim, false);
 
 	memflags = blk_mq_freeze_queue(ns->disk->queue);
+
+	/*
+	 * Explicitly check for P2PDMA support as BLK_FEAT_PCI_P2PDMA
+	 * is filtered out by queue_limits_stack_bdev().
+	 */
+	if (ns->ctrl->ops->supports_pci_p2pdma &&
+	    ns->ctrl->ops->supports_pci_p2pdma(ns->ctrl))
+		lim.features |= BLK_FEAT_PCI_P2PDMA;
+
 	ret = queue_limits_commit_update(ns->disk->queue, &lim);
 	set_disk_ro(ns->disk, nvme_ns_is_readonly(ns, info));
 	blk_mq_unfreeze_queue(ns->disk->queue, memflags);
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 35c0822edb2d..3ce6f3e476b0 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -2189,6 +2189,20 @@ static void nvme_rdma_reset_ctrl_work(struct work_struct *work)
 	nvme_rdma_reconnect_or_remove(ctrl, ret);
 }
 
+static bool nvme_rdma_supports_pci_p2pdma(struct nvme_ctrl *ctrl)
+{
+	struct nvme_rdma_ctrl *r_ctrl = to_rdma_ctrl(ctrl);
+	bool supported = false;
+
+	if (r_ctrl && r_ctrl->device)
+		supported = ib_dma_pci_p2p_dma_supported(r_ctrl->device->dev);
+
+	dev_dbg(ctrl->device, "PCI P2PDMA support result: %s\n",
+			supported ? "PASSED" : "FAILED (HW/Driver restriction)");
+
+	return supported;
+}
+
 static const struct nvme_ctrl_ops nvme_rdma_ctrl_ops = {
 	.name			= "rdma",
 	.module			= THIS_MODULE,
@@ -2203,6 +2217,7 @@ static const struct nvme_ctrl_ops nvme_rdma_ctrl_ops = {
 	.get_address		= nvmf_get_address,
 	.stop_ctrl		= nvme_rdma_stop_ctrl,
 	.get_virt_boundary	= nvme_get_virt_boundary,
+	.supports_pci_p2pdma	= nvme_rdma_supports_pci_p2pdma,
 };
 
 /*
-- 
2.53.0.1213.gd9a14994de-goog
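
For readers following the masking argument in the commit message, here is a
minimal userspace sketch of it. The bit values, the struct, and every helper
name below are hypothetical stand-ins chosen for illustration; only the idea
(stacking ORs in the lower device's features ANDed with an inherit mask, so a
bit outside the mask has to be set again afterwards) is taken from the
description above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values for illustration only; not the kernel's. */
#define FEAT_ROTATIONAL   (1u << 0)
#define FEAT_NOWAIT       (1u << 1)
#define FEAT_PCI_P2PDMA   (1u << 2)

/* Stand-in for BLK_FEAT_INHERIT_MASK: the P2PDMA bit is absent. */
#define FEAT_INHERIT_MASK (FEAT_ROTATIONAL | FEAT_NOWAIT)

/* Hypothetical, trimmed stand-in for struct queue_limits. */
struct limits {
	uint32_t features;
};

/* Stacking: the top device inherits only the bits present in the mask. */
static void stack_limits(struct limits *top, const struct limits *bottom)
{
	top->features |= bottom->features & FEAT_INHERIT_MASK;
}

/* Set the bit again after stacking, gated on a capability probe result. */
static void reassert_p2pdma(struct limits *top, int supported)
{
	if (supported)
		top->features |= FEAT_PCI_P2PDMA;
}

/* Stack onto an empty top device, then optionally re-assert P2PDMA. */
static uint32_t stacked_features(uint32_t bottom_features, int supported)
{
	struct limits top = { 0 };
	struct limits bottom = { bottom_features };

	stack_limits(&top, &bottom);
	reassert_p2pdma(&top, supported);
	return top.features;
}
```

With FEAT_PCI_P2PDMA | FEAT_NOWAIT on the lower device, stacked_features()
yields only FEAT_NOWAIT unless the probe result is non-zero. The sketch models
the feature bit only; it says nothing about whether a given peer device is
reachable.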
Re: [RFC PATCH] nvme: enable PCI P2PDMA support for RDMA transport
Posted by Christoph Hellwig 6 hours ago
On Wed, Apr 01, 2026 at 10:34:41AM +0000, Shivaji Kant wrote:
> Enable BLK_FEAT_PCI_P2PDMA on the NVMe namespace queue when the underlying
> RDMA controller supports it.
> 
> blk_stack_limits() currently filters out this feature bit because it is
> absent from BLK_FEAT_INHERIT_MASK. Manually re-assert the capability
> in nvme_update_ns_info() after the stacking operation.

This is really two different features/fixes and should be two patches.
Note that Chaitanya also has an outstanding patch about p2p on multipath,
so please work with him.

> Hardware reachability remains enforced by late-stage distance checks
> during DMA mapping.

I don't know what this is supposed to mean.  Callers need to check the
reachability first before submitting P2P I/O.

> +static bool nvme_rdma_supports_pci_p2pdma(struct nvme_ctrl *ctrl)
> +{
> +	struct nvme_rdma_ctrl *r_ctrl = to_rdma_ctrl(ctrl);
> +	bool supported = false;
> +
> +	if (r_ctrl && r_ctrl->device)

to_rdma_ctrl is a wrapper around container_of, so r_ctrl can't be
NULL for a non-NULL ctrl.  ->device also should not be NULL because it
is set up before namespaces are probed.

> +		supported = ib_dma_pci_p2p_dma_supported(r_ctrl->device->dev);
> +
> +	dev_dbg(ctrl->device, "PCI P2PDMA support result: %s\n",
> +			supported ? "PASSED" : "FAILED (HW/Driver restriction)");

Overly long line, and screaming isn't really something we do in our
messages.  We also don't do that debug message in PCI, so please just
drop it.  If you think this is important enough, add a tracepoint in the
core code in a separate patch.
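
Taken together, the two code comments above could be sketched in userspace as
follows. This is an illustration under assumptions, not the actual v2: the
struct layouts are trimmed stand-ins, the p2p_capable field is hypothetical,
and ib_dma_pci_p2p_dma_supported() is stubbed here rather than being the real
verbs helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Trimmed stand-ins; only the fields this sketch touches. */
struct ib_device { bool p2p_capable; };	/* hypothetical field */
struct nvme_rdma_device { struct ib_device *dev; };
struct nvme_ctrl { int unused; };
struct nvme_rdma_ctrl {
	struct nvme_rdma_device *device;
	struct nvme_ctrl ctrl;		/* embedded, not a pointer */
};

/* Stub standing in for the kernel helper of the same name. */
static bool ib_dma_pci_p2p_dma_supported(struct ib_device *dev)
{
	return dev->p2p_capable;
}

/* Pure pointer arithmetic: a non-NULL ctrl never maps to NULL. */
static struct nvme_rdma_ctrl *to_rdma_ctrl(struct nvme_ctrl *ctrl)
{
	return container_of(ctrl, struct nvme_rdma_ctrl, ctrl);
}

/* No NULL checks and no debug print, following the review comments. */
static bool nvme_rdma_supports_pci_p2pdma(struct nvme_ctrl *ctrl)
{
	return ib_dma_pci_p2p_dma_supported(to_rdma_ctrl(ctrl)->device->dev);
}

/* The embedded member's address maps back to its enclosing struct. */
static bool round_trip_ok(void)
{
	struct nvme_rdma_ctrl r = { .device = NULL };

	return to_rdma_ctrl(&r.ctrl) == &r;
}

/* Build a fully wired controller and query it. */
static bool probe(bool capable)
{
	struct ib_device ibdev = { .p2p_capable = capable };
	struct nvme_rdma_device rdev = { .dev = &ibdev };
	struct nvme_rdma_ctrl r = { .device = &rdev };

	return nvme_rdma_supports_pci_p2pdma(&r.ctrl);
}
```

probe() stands in for a controller whose ->device is already set up, which is
the state the callback is expected to run in; the sketch would dereference
NULL otherwise, which is exactly the case the review says cannot occur.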
Re: [RFC PATCH] nvme: enable PCI P2PDMA support for RDMA transport
Posted by Pranjal Shrivastava 3 hours ago
On Wed, Apr 01, 2026 at 04:17:06PM +0200, Christoph Hellwig wrote:
> On Wed, Apr 01, 2026 at 10:34:41AM +0000, Shivaji Kant wrote:
> > Enable BLK_FEAT_PCI_P2PDMA on the NVMe namespace queue when the underlying
> > RDMA controller supports it.
> > 
> > blk_stack_limits() currently filters out this feature bit because it is
> > absent from BLK_FEAT_INHERIT_MASK. Manually re-assert the capability
> > in nvme_update_ns_info() after the stacking operation.
> 
> This is really two different features/fixes and should be two patches.
> Note that Chaitanya also has an outstanding patch about p2p on multipath,
> so please work with him.
> 

Ack.
Shivaji, I believe [1] is the patch Christoph is referring to.

> > Hardware reachability remains enforced by late-stage distance checks
> > during DMA mapping.
> 
> I don't know what this is supposed to mean.  Callers need to check the
> reachability first before submitting P2P I/O.
> 
> > +static bool nvme_rdma_supports_pci_p2pdma(struct nvme_ctrl *ctrl)
> > +{
> > +	struct nvme_rdma_ctrl *r_ctrl = to_rdma_ctrl(ctrl);
> > +	bool supported = false;
> > +
> > +	if (r_ctrl && r_ctrl->device)
> 
> to_rdma_ctrl is a wrapper around container_of, so r_ctrl can't be
> NULL for a non-NULL ctrl.  ->device also should not be NULL because it
> is set up before namespaces are probed.
> 
> > +		supported = ib_dma_pci_p2p_dma_supported(r_ctrl->device->dev);
> > +
> > +	dev_dbg(ctrl->device, "PCI P2PDMA support result: %s\n",
> > +			supported ? "PASSED" : "FAILED (HW/Driver restriction)");
> 
> Overly long line, and screaming isn't really something we do in our
> messages.  We also don't do that debug message in PCI, so please just
> drop it.  If you think this is important enough, add a tracepoint in the
> core code in a separate patch.
> 

+1, we should drop the log and add a TP if necessary.

Thanks,
Praan

[1] https://lore.kernel.org/all/20260323234416.46944-3-kch@nvidia.com/
Re: [RFC PATCH] nvme: enable PCI P2PDMA support for RDMA transport
Posted by Shivaji Kant 2 hours ago
Hi,
Thanks for the reviews.

On Wed, Apr 1, 2026 at 11:13 PM Pranjal Shrivastava <praan@google.com> wrote:
>
> On Wed, Apr 01, 2026 at 04:17:06PM +0200, Christoph Hellwig wrote:
> > On Wed, Apr 01, 2026 at 10:34:41AM +0000, Shivaji Kant wrote:
> > > Enable BLK_FEAT_PCI_P2PDMA on the NVMe namespace queue when the underlying
> > > RDMA controller supports it.
> > >
> > > blk_stack_limits() currently filters out this feature bit because it is
> > > absent from BLK_FEAT_INHERIT_MASK. Manually re-assert the capability
> > > in nvme_update_ns_info() after the stacking operation.
> >
> > This is really two different features/fixes and should be two patches.
> > Note that Chaitanya also has an outstanding patch about p2p on multipath,
> > so please work with him.
> >
>
> Ack.
> Shivaji, I believe [1] is the patch Christoph is referring to.

Ack. Let me work with this.

>
> > > Hardware reachability remains enforced by late-stage distance checks
> > > during DMA mapping.
> >
> > I don't know what this is supposed to mean.  Callers need to check the
> > reachability first before submitting P2P I/O.
> >
> > > +static bool nvme_rdma_supports_pci_p2pdma(struct nvme_ctrl *ctrl)
> > > +{
> > > +   struct nvme_rdma_ctrl *r_ctrl = to_rdma_ctrl(ctrl);
> > > +   bool supported = false;
> > > +
> > > +   if (r_ctrl && r_ctrl->device)
> >
> > to_rdma_ctrl is a wrapper around container_of, so r_ctrl can't be
> > NULL for a non-NULL ctrl.  ->device also should not be NULL because it
> > is set up before namespaces are probed.
> >
> > > +           supported = ib_dma_pci_p2p_dma_supported(r_ctrl->device->dev);
> > > +
> > > +   dev_dbg(ctrl->device, "PCI P2PDMA support result: %s\n",
> > > +                   supported ? "PASSED" : "FAILED (HW/Driver restriction)");
> >
> > Overly long line, and screaming isn't really something we do in our
> > messages.  We also don't do that debug message in PCI, so please just
> > drop it.  If you think this is important enough, add a tracepoint in the
> > core code in a separate patch.
> >
>
> +1, we should drop the log and add a TP if necessary.

Sure, sounds good. I will incorporate these changes in v2. Thanks.

>
> Thanks,
> Praan
>
> [1] https://lore.kernel.org/all/20260323234416.46944-3-kch@nvidia.com/

Regards
Shivaji