From nobody Thu Oct 2 21:39:17 2025
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id A4A4C1D5146;
	Thu, 11 Sep 2025 11:33:53 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1757590433; cv=none; b=lYASKfU2JmDQfXmSA09bxHyjb7myOW49JhI7TpSfWYfvVIZT3aHIr9gz9AUEZW+2EBimoG+pL5bhay82QSq1t0VUuzIQI4BWtSCzGz4Nb+bJMFcpPQDcMaOg3US5JXwbUNRhNR0xWphNJFBBlIjc1wkMmAlkmlN4qwYk54r3Upk=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1757590433; c=relaxed/simple;
	bh=/jWnckJcyHdF3R+tvIGXmaDkojDLmzwRv9Qj7ZAK2Iw=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	 MIME-Version; b=cFY/HWz6X7EPaOwJMLUsfdHxwTLpIINgFRnnmjCe7LvJSeYtSiEXXVvTk0Es3YUp55UfZGh9gmXkU/iSx3kQRGi9SzwZHVKQremTuc0UlfcES65vyol012EXKaUU945r9R+/4NRZTerPXYYRBnj4/NGfLQgLbYrKrxIoNWjRgRk=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=XGE9VT7K; arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="XGE9VT7K"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 96608C4CEF0;
	Thu, 11 Sep 2025 11:33:52 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1757590433;
	bh=/jWnckJcyHdF3R+tvIGXmaDkojDLmzwRv9Qj7ZAK2Iw=;
	h=From:To:Cc:Subject:Date:In-Reply-To:References:From;
	b=XGE9VT7KJjAu2tRs6J1Zms4gR3Ujryz6sFXPRbUg1pZJCkSold7ED7XNZVQYxkyN7
	KNhmVOvpQv/xZcDdU1sLYJ9bP6EjsRVu2S/zBqecECEehCEUxCzuyg5/7R3A2Intaj
	ZECk3YarsNGSXJe8Kg4U985Ph453wRa+y0A2xI4f0KzXi43sTQUUT/e6Ea60wubMc9
	VBbeDnZ98U0zsVtnV9tGCi2rMA3eTWtqevZCgWOpqKcEhN8Cn2YDcMtkcHveJ3KFGU
	7blZktY5W10rpc+XFQXEDt5nnDWj9kH8s5Rw9LmZjtllc+u4rX6Jys6qxLfH8ZvgQ8
	aaeEgANRUGKRw==
From: Leon Romanovsky
To: Alex Williamson
Cc: Leon Romanovsky,
	Jason Gunthorpe,
	Andrew Morton,
	Bjorn Helgaas,
	Christian König,
	dri-devel@lists.freedesktop.org,
	iommu@lists.linux.dev,
	Jens Axboe,
	Joerg Roedel,
	kvm@vger.kernel.org,
	linaro-mm-sig@lists.linaro.org,
	linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	linux-media@vger.kernel.org,
	linux-mm@kvack.org,
	linux-pci@vger.kernel.org,
	Logan Gunthorpe,
	Marek Szyprowski,
	Robin Murphy,
	Sumit Semwal,
	Vivek Kasireddy,
	Will Deacon
Subject: [PATCH v2 03/10] PCI/P2PDMA: Refactor to separate core P2P functionality from memory allocation
Date: Thu, 11 Sep 2025 14:33:07 +0300
Message-ID: <1e2cb89ea76a92949d06a804e3ab97478e7cacbb.1757589589.git.leon@kernel.org>
X-Mailer: git-send-email 2.51.0
In-Reply-To:
References:
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Leon Romanovsky

Refactor the PCI P2PDMA subsystem to separate the core peer-to-peer DMA
functionality from the optional memory allocation layer. This creates a
two-tier architecture:

The core layer provides P2P mapping functionality for physical addresses
based on PCI device MMIO BARs and integrates with the DMA API for
mapping operations. This layer is required for all P2PDMA users.

The optional upper layer provides memory allocation capabilities:
the gen_pool allocator, struct page support, and the sysfs interface
for user space access.

This separation allows subsystems like VFIO to use only the core P2P
mapping functionality without the overhead of memory allocation features
they don't need. The core functionality is now available through the
new pcim_p2pdma_enable() function, which returns the p2pdma_provider
structure for the requested BAR.

Signed-off-by: Leon Romanovsky
---
 drivers/pci/p2pdma.c       | 129 +++++++++++++++++++++++++++----------
 include/linux/pci-p2pdma.h |   5 ++
 2 files changed, 100 insertions(+), 34 deletions(-)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index 176a99232fdca..c22cbb3a26030 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -25,11 +25,12 @@ struct pci_p2pdma {
 	struct gen_pool *pool;
 	bool p2pmem_published;
 	struct xarray map_types;
+	struct p2pdma_provider mem[PCI_STD_NUM_BARS];
 };
 
 struct pci_p2pdma_pagemap {
 	struct dev_pagemap pgmap;
-	struct p2pdma_provider mem;
+	struct p2pdma_provider *mem;
 };
 
 static struct pci_p2pdma_pagemap *to_p2p_pgmap(struct dev_pagemap *pgmap)
@@ -204,7 +205,7 @@ static void p2pdma_page_free(struct page *page)
 	struct pci_p2pdma_pagemap *pgmap = to_p2p_pgmap(page_pgmap(page));
 	/* safe to dereference while a reference is held to the percpu ref */
 	struct pci_p2pdma *p2pdma = rcu_dereference_protected(
-		to_pci_dev(pgmap->mem.owner)->p2pdma, 1);
+		to_pci_dev(pgmap->mem->owner)->p2pdma, 1);
 	struct percpu_ref *ref;
 
 	gen_pool_free_owner(p2pdma->pool, (uintptr_t)page_to_virt(page),
@@ -227,44 +228,93 @@ static void pci_p2pdma_release(void *data)
 
 	/* Flush and disable pci_alloc_p2p_mem() */
 	pdev->p2pdma = NULL;
-	synchronize_rcu();
+	if (p2pdma->pool)
+		synchronize_rcu();
+	xa_destroy(&p2pdma->map_types);
+
+	if (!p2pdma->pool)
+		return;
 
 	gen_pool_destroy(p2pdma->pool);
 	sysfs_remove_group(&pdev->dev.kobj, &p2pmem_group);
-	xa_destroy(&p2pdma->map_types);
 }
 
-static int pci_p2pdma_setup(struct pci_dev *pdev)
+/**
+ * pcim_p2pdma_enable - Enable peer-to-peer DMA support for a PCI device
+ * @pdev: The PCI device to enable P2PDMA for
+ * @bar: BAR index to get provider
+ *
+ * This function initializes the peer-to-peer DMA infrastructure for a PCI
+ * device. It allocates and sets up the necessary data structures to support
+ * P2PDMA operations, including mapping type tracking.
+ */
+struct p2pdma_provider *pcim_p2pdma_enable(struct pci_dev *pdev, int bar)
 {
-	int error = -ENOMEM;
 	struct pci_p2pdma *p2p;
+	int i, ret;
+
+	p2p = rcu_dereference_protected(pdev->p2pdma, 1);
+	if (p2p)
+		/* PCI device was "rebound" to the driver */
+		return &p2p->mem[bar];
 
 	p2p = devm_kzalloc(&pdev->dev, sizeof(*p2p), GFP_KERNEL);
 	if (!p2p)
-		return -ENOMEM;
+		return ERR_PTR(-ENOMEM);
 
 	xa_init(&p2p->map_types);
+	/*
+	 * Iterate over all standard PCI BARs and record only those that
+	 * correspond to MMIO regions. Skip non-memory resources (e.g. I/O
+	 * port BARs) since they cannot be used for peer-to-peer (P2P)
+	 * transactions.
+	 */
+	for (i = 0; i < PCI_STD_NUM_BARS; i++) {
+		if (!(pci_resource_flags(pdev, i) & IORESOURCE_MEM))
+			continue;
 
-	p2p->pool = gen_pool_create(PAGE_SHIFT, dev_to_node(&pdev->dev));
-	if (!p2p->pool)
-		goto out;
+		p2p->mem[i].owner = &pdev->dev;
+		p2p->mem[i].bus_offset =
+			pci_bus_address(pdev, i) - pci_resource_start(pdev, i);
+	}
 
-	error = devm_add_action_or_reset(&pdev->dev, pci_p2pdma_release, pdev);
-	if (error)
-		goto out_pool_destroy;
+	ret = devm_add_action_or_reset(&pdev->dev, pci_p2pdma_release, pdev);
+	if (ret)
+		goto out_p2p;
 
-	error = sysfs_create_group(&pdev->dev.kobj, &p2pmem_group);
-	if (error)
+	rcu_assign_pointer(pdev->p2pdma, p2p);
+	return &p2p->mem[bar];
+
+out_p2p:
+	devm_kfree(&pdev->dev, p2p);
+	return ERR_PTR(ret);
+}
+EXPORT_SYMBOL_GPL(pcim_p2pdma_enable);
+
+static int pci_p2pdma_setup_pool(struct pci_dev *pdev)
+{
+	struct pci_p2pdma *p2pdma;
+	int ret;
+
+	p2pdma = rcu_dereference_protected(pdev->p2pdma, 1);
+	if (p2pdma->pool)
+		/* We already set up the pool, do nothing. */
+		return 0;
+
+	p2pdma->pool = gen_pool_create(PAGE_SHIFT, dev_to_node(&pdev->dev));
+	if (!p2pdma->pool)
+		return -ENOMEM;
+
+	ret = sysfs_create_group(&pdev->dev.kobj, &p2pmem_group);
+	if (ret)
 		goto out_pool_destroy;
 
-	rcu_assign_pointer(pdev->p2pdma, p2p);
 	return 0;
 
 out_pool_destroy:
-	gen_pool_destroy(p2p->pool);
-out:
-	devm_kfree(&pdev->dev, p2p);
-	return error;
+	gen_pool_destroy(p2pdma->pool);
+	p2pdma->pool = NULL;
+	return ret;
 }
 
 static void pci_p2pdma_unmap_mappings(void *data)
@@ -276,7 +326,7 @@ static void pci_p2pdma_unmap_mappings(void *data)
 	 * unmap_mapping_range() on the inode, teardown any existing userspace
 	 * mappings and prevent new ones from being created.
 	 */
-	sysfs_remove_file_from_group(&p2p_pgmap->mem.owner->kobj,
+	sysfs_remove_file_from_group(&p2p_pgmap->mem->owner->kobj,
 				     &p2pmem_alloc_attr.attr,
 				     p2pmem_group.name);
 }
@@ -295,6 +345,7 @@ int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
 			    u64 offset)
 {
 	struct pci_p2pdma_pagemap *p2p_pgmap;
+	struct p2pdma_provider *mem;
 	struct dev_pagemap *pgmap;
 	struct pci_p2pdma *p2pdma;
 	void *addr;
@@ -312,15 +363,25 @@ int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
 	if (size + offset > pci_resource_len(pdev, bar))
 		return -EINVAL;
 
-	if (!pdev->p2pdma) {
-		error = pci_p2pdma_setup(pdev);
+	p2pdma = rcu_dereference_protected(pdev->p2pdma, 1);
+	if (!p2pdma) {
+		mem = pcim_p2pdma_enable(pdev, bar);
+		if (IS_ERR(mem))
+			return PTR_ERR(mem);
+
+		error = pci_p2pdma_setup_pool(pdev);
 		if (error)
 			return error;
-	}
+
+		p2pdma = rcu_dereference_protected(pdev->p2pdma, 1);
+	} else
+		mem = &p2pdma->mem[bar];
 
 	p2p_pgmap = devm_kzalloc(&pdev->dev, sizeof(*p2p_pgmap), GFP_KERNEL);
-	if (!p2p_pgmap)
-		return -ENOMEM;
+	if (!p2p_pgmap) {
+		error = -ENOMEM;
+		goto free_pool;
+	}
 
 	pgmap = &p2p_pgmap->pgmap;
 	pgmap->range.start = pci_resource_start(pdev, bar) + offset;
@@ -328,9 +389,7 @@ int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
 	pgmap->nr_range = 1;
 	pgmap->type = MEMORY_DEVICE_PCI_P2PDMA;
 	pgmap->ops = &p2pdma_pgmap_ops;
-	p2p_pgmap->mem.owner = &pdev->dev;
-	p2p_pgmap->mem.bus_offset =
-		pci_bus_address(pdev, bar) - pci_resource_start(pdev, bar);
+	p2p_pgmap->mem = mem;
 
 	addr = devm_memremap_pages(&pdev->dev, pgmap);
 	if (IS_ERR(addr)) {
@@ -343,7 +402,6 @@ int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
 	if (error)
 		goto pages_free;
 
-	p2pdma = rcu_dereference_protected(pdev->p2pdma, 1);
 	error = gen_pool_add_owner(p2pdma->pool, (unsigned long)addr,
 			pci_bus_address(pdev, bar) + offset,
 			range_len(&pgmap->range), dev_to_node(&pdev->dev),
@@ -359,7 +417,10 @@ int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
 pages_free:
 	devm_memunmap_pages(&pdev->dev, pgmap);
 pgmap_free:
-	devm_kfree(&pdev->dev, pgmap);
+	devm_kfree(&pdev->dev, p2p_pgmap);
+free_pool:
+	sysfs_remove_group(&pdev->dev.kobj, &p2pmem_group);
+	gen_pool_destroy(p2pdma->pool);
 	return error;
 }
 EXPORT_SYMBOL_GPL(pci_p2pdma_add_resource);
@@ -1008,11 +1069,11 @@ void __pci_p2pdma_update_state(struct pci_p2pdma_map_state *state,
 {
 	struct pci_p2pdma_pagemap *p2p_pgmap = to_p2p_pgmap(page_pgmap(page));
 
-	if (state->mem == &p2p_pgmap->mem)
+	if (state->mem == p2p_pgmap->mem)
 		return;
 
-	state->mem = &p2p_pgmap->mem;
-	state->map = pci_p2pdma_map_type(&p2p_pgmap->mem, dev);
+	state->mem = p2p_pgmap->mem;
+	state->map = pci_p2pdma_map_type(p2p_pgmap->mem, dev);
 }
 
 /**
diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h
index eef96636c67e6..888ad7b0c54cf 100644
--- a/include/linux/pci-p2pdma.h
+++ b/include/linux/pci-p2pdma.h
@@ -27,6 +27,7 @@ struct p2pdma_provider {
 };
 
 #ifdef CONFIG_PCI_P2PDMA
+struct p2pdma_provider *pcim_p2pdma_enable(struct pci_dev *pdev, int bar);
 int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
 		u64 offset);
 int pci_p2pdma_distance_many(struct pci_dev *provider, struct device **clients,
@@ -45,6 +46,10 @@ int pci_p2pdma_enable_store(const char *page, struct pci_dev **p2p_dev,
 ssize_t pci_p2pdma_enable_show(char *page, struct pci_dev *p2p_dev,
 			       bool use_p2pdma);
 #else /* CONFIG_PCI_P2PDMA */
+static inline struct p2pdma_provider *pcim_p2pdma_enable(struct pci_dev *pdev, int bar)
+{
+	return ERR_PTR(-EOPNOTSUPP);
+}
 static inline int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar,
 		size_t size, u64 offset)
 {
-- 
2.51.0
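
For readers following the per-BAR setup loop in pcim_p2pdma_enable(), the
bookkeeping can be sketched in user space roughly as below. This is only an
illustrative model, not kernel code and not part of the patch. The names
model_bar, model_provider, model_init_providers() and model_get_provider()
are hypothetical, and the BAR table stands in for what pci_resource_flags(),
pci_resource_start() and pci_bus_address() would report. The bounds check on
the BAR index is an assumption of the model; the patch itself does not
validate @bar.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_NUM_BARS 6 /* mirrors PCI_STD_NUM_BARS */

/* Hypothetical description of one BAR as seen by the model. */
struct model_bar {
	bool is_mem;        /* true for MMIO BARs, false for I/O port BARs */
	uint64_t cpu_start; /* CPU physical start of the BAR */
	uint64_t bus_addr;  /* address of the BAR as seen on the PCI bus */
};

/* Hypothetical stand-in for struct p2pdma_provider. */
struct model_provider {
	const void *owner;   /* owning device, NULL if the BAR was skipped */
	uint64_t bus_offset; /* bus address minus CPU physical address */
};

/*
 * Fill one provider per BAR. Only MMIO BARs get an owner and an offset;
 * other entries stay zeroed, as they would after a zeroing allocation.
 */
static void model_init_providers(struct model_provider prov[MODEL_NUM_BARS],
				 const struct model_bar bars[MODEL_NUM_BARS],
				 const void *owner)
{
	for (size_t i = 0; i < MODEL_NUM_BARS; i++) {
		prov[i].owner = NULL;
		prov[i].bus_offset = 0;

		if (!bars[i].is_mem)
			continue;

		prov[i].owner = owner;
		/* Unsigned subtraction: wraps if bus_addr < cpu_start. */
		prov[i].bus_offset = bars[i].bus_addr - bars[i].cpu_start;
	}
}

/* Look up the provider for a BAR index (bounds check is a model addition). */
static const struct model_provider *
model_get_provider(const struct model_provider prov[MODEL_NUM_BARS], int bar)
{
	assert(bar >= 0 && bar < MODEL_NUM_BARS);
	return &prov[bar];
}
```

In this model a caller fills the table once and then looks providers up by
BAR index. An I/O port BAR yields an entry whose owner is NULL, so the caller
is expected to check the owner before using the provider.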