From nobody Mon Oct 6 08:22:08 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B9B572E5B2F; Wed, 23 Jul 2025 13:02:00 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1753275720; cv=none; b=RFmXc6jzv3KOhJG2lJFSPhV92sWrl8AXhdjVw05+4LHWm7fJkdDRqNy+TjdHqlWin/mZNgztDQes7c7R5Ijgq3TxOOwiawdJO0+PUmmhR1UuO0YUUMYyjfzAcWvXNj8Wi5POhjG9MzeJcY4HldvWKeBxyIN7tju0Yc2pa5pH0Zs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1753275720; c=relaxed/simple; bh=DdEPBcDoyZyAyQmgxVcA/USPwfaH12G0OefCJEr49ic=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=PZCR194sXZBA8vAWt+FbE+TP/8knJtta7HK/RKYpDyJ+mBQvesXwKbYLYWOOYRLN2sPGh5yOwPc9IZgLjlqt8VhvgRwmpRlzASdE4tpfQxpWSyvg0a5PXeSTPE+Ym6SBZqKJutW8fljyBDOVFmyqq8h8m4rKUmDuLAv9+UNVU2I= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=ETK2qZcN; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="ETK2qZcN" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 762A9C4CEE7; Wed, 23 Jul 2025 13:01:59 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1753275720; bh=DdEPBcDoyZyAyQmgxVcA/USPwfaH12G0OefCJEr49ic=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=ETK2qZcN6+sQYJHSkoLVV0k/yH8twcRhKOYJgr93JPgv0nrXxZCzfFwSW1/G62IK+ dHXoqPt/Wx9t/L3G4sdDyxTDD3f4GLS20xqlO/uiD8S/BJbeMCdXOYPXBGSX5rWv8h uU36XgMIC3kZdDlUYy2xEFpyQkWd7Ao9AI7MS+qm7TPCbsSNBulQ2TAtVt47DFfyvb 2rHJLTEmxms3qV80raVb15Ef5YlYHRpwCIAejwL3MXphMFCzgKfNnc/M/MphZcoMgV 
 btntopHEIl6egN2Hxfn87m1H+IhrWppCwky4m3zkIqrXkZG2T894pcTMD+L6pjBS5w +ULHryOUF5DNQ==
From: Leon Romanovsky
To: Alex Williamson
Cc: Leon Romanovsky, Christoph Hellwig, Jason Gunthorpe, Andrew Morton,
 Bjorn Helgaas, Christian König, dri-devel@lists.freedesktop.org,
 iommu@lists.linux.dev, Jens Axboe, Jérôme Glisse, Joerg Roedel,
 kvm@vger.kernel.org, linaro-mm-sig@lists.linaro.org,
 linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-media@vger.kernel.org, linux-mm@kvack.org, linux-pci@vger.kernel.org,
 Logan Gunthorpe, Marek Szyprowski, Robin Murphy, Sumit Semwal,
 Vivek Kasireddy, Will Deacon
Subject: [PATCH 01/10] PCI/P2PDMA: Remove redundant bus_offset from map state
Date: Wed, 23 Jul 2025 16:00:02 +0300
Message-ID:
X-Mailer: git-send-email 2.50.1
In-Reply-To:
References:
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Leon Romanovsky

Remove the bus_off field from pci_p2pdma_map_state since it duplicates
information already available through the pgmap pointer stored in the
same structure. The field is only read in one location,
pci_p2pdma_bus_addr_map(), and is always identical to
to_p2p_pgmap(state->pgmap)->bus_offset.

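To make the redundancy argument above concrete, here is a minimal
userspace sketch, not kernel code. The type and function names
(struct p2p_pgmap, struct map_state, bus_addr_map()) are hypothetical
stand-ins that only mirror the shape of the kernel structures; the real
layouts are assumed to differ.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel typedefs (assumption). */
typedef uint64_t phys_addr_t;
typedef uint64_t dma_addr_t;

/* Stand-in for the P2P pagemap: it already owns the bus offset. */
struct p2p_pgmap {
	uint64_t bus_offset;	/* bus address minus CPU physical address */
};

enum map_type { MAP_NONE, MAP_BUS_ADDR, MAP_THRU_HOST_BRIDGE };

/* Map state without a cached bus_off copy: the pgmap pointer is enough. */
struct map_state {
	const struct p2p_pgmap *pgmap;
	enum map_type map;
};

/* Hypothetical analogue of pci_p2pdma_bus_addr_map(). */
static inline dma_addr_t bus_addr_map(const struct map_state *state,
				       phys_addr_t paddr)
{
	assert(state->map == MAP_BUS_ADDR);
	/* Read the offset from its single source instead of a copy. */
	return paddr + state->pgmap->bus_offset;
}
```

Because the offset is read through the pointer at the single point of
use, there is no second copy that could go stale.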
Signed-off-by: Jason Gunthorpe Signed-off-by: Leon Romanovsky Reviewed-by: Christoph Hellwig --- drivers/pci/p2pdma.c | 1 - include/linux/pci-p2pdma.h | 3 +-- 2 files changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c index 8d955c25aed36..fe347ed7fd8f4 100644 --- a/drivers/pci/p2pdma.c +++ b/drivers/pci/p2pdma.c @@ -1009,7 +1009,6 @@ void __pci_p2pdma_update_state(struct pci_p2pdma_map_= state *state, { state->pgmap =3D page_pgmap(page); state->map =3D pci_p2pdma_map_type(state->pgmap, dev); - state->bus_off =3D to_p2p_pgmap(state->pgmap)->bus_offset; } =20 /** diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h index 075c20b161d98..b502fc8b49bf9 100644 --- a/include/linux/pci-p2pdma.h +++ b/include/linux/pci-p2pdma.h @@ -146,7 +146,6 @@ enum pci_p2pdma_map_type { struct pci_p2pdma_map_state { struct dev_pagemap *pgmap; enum pci_p2pdma_map_type map; - u64 bus_off; }; =20 /* helper for pci_p2pdma_state(), do not use directly */ @@ -186,7 +185,7 @@ static inline dma_addr_t pci_p2pdma_bus_addr_map(struct pci_p2pdma_map_state *state, phys_addr_t pa= ddr) { WARN_ON_ONCE(state->map !=3D PCI_P2PDMA_MAP_BUS_ADDR); - return paddr + state->bus_off; + return paddr + to_p2p_pgmap(state->pgmap)->bus_offsetf; } =20 #endif /* _LINUX_PCI_P2P_H */ --=20 2.50.1 From nobody Mon Oct 6 08:22:08 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 883702E973C; Wed, 23 Jul 2025 13:02:12 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1753275732; cv=none; 
b=YS6Q/1Cn4Rm/ArPZZfVWvx9Vm+LSwAM7O3To/H0JcGJZ/dFhe8bER37qmckkC18asF4YZwcwHhXna8kxKlmZcSTjRadnISJkFn1NAMAFyRbB8cmmme56sqaasV3KHYWnvgmbGP4ylsbvd5a6eWv+tT19MRgd0TCb42ALmAAb3HE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1753275732; c=relaxed/simple; bh=GXlQnmpexeTqci2NeyVI9G56rXBq53z16W1gp+u+W34=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=idhVM8/XTpRiyYFsg2eirdoUaAiramzfYtRaXuSyvPRgvgcbFpktHtMqUzs75qbkszH59HKz23nVQ2muV/V97QWBaGAS5wkmRKfaxsdDhP8zzJR6pdmC2pVUmMs1B/uABR+v6nuCu0T2g+X/IARW5UIYIEx0UXIgyljM+0i76LY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=e+o46CZ+; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="e+o46CZ+" Received: by smtp.kernel.org (Postfix) with ESMTPSA id B394AC4CEE7; Wed, 23 Jul 2025 13:02:11 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1753275732; bh=GXlQnmpexeTqci2NeyVI9G56rXBq53z16W1gp+u+W34=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=e+o46CZ+1BVXjwu+WWdHlJVcgB06PCQN9DTipLM1BXLZibSweX/79i0ktABh5Wsp7 WKbql3hFitkoH6uxbDjEwyXruYzw4YY5M6fSdyLtQzznD6f+h9bSbaFlVvYVZossoc m0AFtWGU9sMGGGEVCcD2Lbvp+Fli76ujFDDtB/w0BkVQlxlcs15pAQKSl30k4y9kdi 7aG1VmGwS8/feMwD4P3W0v6mLUlPwWheztEcUAB+WsxA7u5cq/tIlZpOZa/QqX3IEF oHfNuID6jKxDBaWswKnKFSNnaBx+iG59Ww1QmDUuufZ8TfQavpmmt4LIqoTDSbknte sekB5LxhhhZHA== From: Leon Romanovsky To: Alex Williamson Cc: Leon Romanovsky , Christoph Hellwig , Jason Gunthorpe , Andrew Morton , Bjorn Helgaas , =?UTF-8?q?Christian=20K=C3=B6nig?= , dri-devel@lists.freedesktop.org, iommu@lists.linux.dev, Jens Axboe , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Joerg Roedel , kvm@vger.kernel.org, linaro-mm-sig@lists.linaro.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-media@vger.kernel.org, 
 linux-mm@kvack.org, linux-pci@vger.kernel.org, Logan Gunthorpe,
 Marek Szyprowski, Robin Murphy, Sumit Semwal, Vivek Kasireddy, Will Deacon
Subject: [PATCH 02/10] PCI/P2PDMA: Introduce p2pdma_provider structure for cleaner abstraction
Date: Wed, 23 Jul 2025 16:00:03 +0300
Message-ID:
X-Mailer: git-send-email 2.50.1
In-Reply-To:
References:
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Leon Romanovsky

Extract the core P2PDMA provider information (device owner and bus
offset) from the dev_pagemap into a dedicated p2pdma_provider structure.
This creates a cleaner separation between the memory management layer
and the P2PDMA functionality.

The new p2pdma_provider structure contains:
- owner: pointer to the providing device
- bus_offset: computed offset for non-host transactions

This refactoring simplifies the P2PDMA state management by removing the
need to access pgmap internals directly. The pci_p2pdma_map_state now
stores a pointer to the provider instead of the pgmap, making the API
more explicit and easier to understand.

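The provider-based state described above can be sketched in userspace
as follows. This is an illustration under assumptions, not the kernel
code: struct device is reduced to a name, the dev_pagemap member is
omitted, and lookup_map_type() is a hypothetical placeholder for the
real topology lookup done by pci_p2pdma_map_type().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct device { const char *name; };	/* stand-in, not the kernel type */

struct p2pdma_provider {
	struct device *owner;	/* providing device */
	uint64_t bus_offset;	/* offset for non-host transactions */
};

/* Stand-in for pci_p2pdma_pagemap with the dev_pagemap member omitted. */
struct p2p_pagemap {
	struct p2pdma_provider mem;
};

enum map_type { MAP_UNKNOWN, MAP_BUS_ADDR, MAP_THRU_HOST_BRIDGE };

struct map_state {
	struct p2pdma_provider *mem;	/* provider pointer, not the pgmap */
	enum map_type map;
};

static int lookups;	/* counts how often the slow path runs */

/* Hypothetical placeholder for the real map-type computation. */
static enum map_type lookup_map_type(struct p2pdma_provider *provider)
{
	(void)provider;
	lookups++;
	return MAP_BUS_ADDR;
}

/* Refresh the cached state only when the provider changes. */
static void update_state(struct map_state *state, struct p2p_pagemap *pgmap)
{
	assert(state && pgmap);
	if (state->mem == &pgmap->mem)
		return;

	state->mem = &pgmap->mem;
	state->map = lookup_map_type(&pgmap->mem);
}
```

Calling update_state() repeatedly for pages of the same provider runs
the lookup once, and consumers only need state->mem->bus_offset.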
Signed-off-by: Jason Gunthorpe Signed-off-by: Leon Romanovsky --- drivers/pci/p2pdma.c | 42 +++++++++++++++++++++----------------- include/linux/pci-p2pdma.h | 18 ++++++++++++---- 2 files changed, 37 insertions(+), 23 deletions(-) diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c index fe347ed7fd8f4..5a310026bd24f 100644 --- a/drivers/pci/p2pdma.c +++ b/drivers/pci/p2pdma.c @@ -28,9 +28,8 @@ struct pci_p2pdma { }; =20 struct pci_p2pdma_pagemap { - struct pci_dev *provider; - u64 bus_offset; struct dev_pagemap pgmap; + struct p2pdma_provider mem; }; =20 static struct pci_p2pdma_pagemap *to_p2p_pgmap(struct dev_pagemap *pgmap) @@ -204,8 +203,8 @@ static void p2pdma_page_free(struct page *page) { struct pci_p2pdma_pagemap *pgmap =3D to_p2p_pgmap(page_pgmap(page)); /* safe to dereference while a reference is held to the percpu ref */ - struct pci_p2pdma *p2pdma =3D - rcu_dereference_protected(pgmap->provider->p2pdma, 1); + struct pci_p2pdma *p2pdma =3D rcu_dereference_protected( + to_pci_dev(pgmap->mem.owner)->p2pdma, 1); struct percpu_ref *ref; =20 gen_pool_free_owner(p2pdma->pool, (uintptr_t)page_to_virt(page), @@ -270,14 +269,15 @@ static int pci_p2pdma_setup(struct pci_dev *pdev) =20 static void pci_p2pdma_unmap_mappings(void *data) { - struct pci_dev *pdev =3D data; + struct pci_p2pdma_pagemap *p2p_pgmap =3D data; =20 /* * Removing the alloc attribute from sysfs will call * unmap_mapping_range() on the inode, teardown any existing userspace * mappings and prevent new ones from being created. 
*/ - sysfs_remove_file_from_group(&pdev->dev.kobj, &p2pmem_alloc_attr.attr, + sysfs_remove_file_from_group(&p2p_pgmap->mem.owner->kobj, + &p2pmem_alloc_attr.attr, p2pmem_group.name); } =20 @@ -328,10 +328,9 @@ int pci_p2pdma_add_resource(struct pci_dev *pdev, int = bar, size_t size, pgmap->nr_range =3D 1; pgmap->type =3D MEMORY_DEVICE_PCI_P2PDMA; pgmap->ops =3D &p2pdma_pgmap_ops; - - p2p_pgmap->provider =3D pdev; - p2p_pgmap->bus_offset =3D pci_bus_address(pdev, bar) - - pci_resource_start(pdev, bar); + p2p_pgmap->mem.owner =3D &pdev->dev; + p2p_pgmap->mem.bus_offset =3D + pci_bus_address(pdev, bar) - pci_resource_start(pdev, bar); =20 addr =3D devm_memremap_pages(&pdev->dev, pgmap); if (IS_ERR(addr)) { @@ -340,7 +339,7 @@ int pci_p2pdma_add_resource(struct pci_dev *pdev, int b= ar, size_t size, } =20 error =3D devm_add_action_or_reset(&pdev->dev, pci_p2pdma_unmap_mappings, - pdev); + p2p_pgmap); if (error) goto pages_free; =20 @@ -973,16 +972,16 @@ void pci_p2pmem_publish(struct pci_dev *pdev, bool pu= blish) } EXPORT_SYMBOL_GPL(pci_p2pmem_publish); =20 -static enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pg= map, - struct device *dev) +static enum pci_p2pdma_map_type +pci_p2pdma_map_type(struct p2pdma_provider *provider, struct device *dev) { enum pci_p2pdma_map_type type =3D PCI_P2PDMA_MAP_NOT_SUPPORTED; - struct pci_dev *provider =3D to_p2p_pgmap(pgmap)->provider; + struct pci_dev *pdev =3D to_pci_dev(provider->owner); struct pci_dev *client; struct pci_p2pdma *p2pdma; int dist; =20 - if (!provider->p2pdma) + if (!pdev->p2pdma) return PCI_P2PDMA_MAP_NOT_SUPPORTED; =20 if (!dev_is_pci(dev)) @@ -991,7 +990,7 @@ static enum pci_p2pdma_map_type pci_p2pdma_map_type(str= uct dev_pagemap *pgmap, client =3D to_pci_dev(dev); =20 rcu_read_lock(); - p2pdma =3D rcu_dereference(provider->p2pdma); + p2pdma =3D rcu_dereference(pdev->p2pdma); =20 if (p2pdma) type =3D xa_to_value(xa_load(&p2pdma->map_types, @@ -999,7 +998,7 @@ static enum pci_p2pdma_map_type 
pci_p2pdma_map_type(str= uct dev_pagemap *pgmap, rcu_read_unlock(); =20 if (type =3D=3D PCI_P2PDMA_MAP_UNKNOWN) - return calc_map_type_and_dist(provider, client, &dist, true); + return calc_map_type_and_dist(pdev, client, &dist, true); =20 return type; } @@ -1007,8 +1006,13 @@ static enum pci_p2pdma_map_type pci_p2pdma_map_type(= struct dev_pagemap *pgmap, void __pci_p2pdma_update_state(struct pci_p2pdma_map_state *state, struct device *dev, struct page *page) { - state->pgmap =3D page_pgmap(page); - state->map =3D pci_p2pdma_map_type(state->pgmap, dev); + struct pci_p2pdma_pagemap *p2p_pgmap =3D to_p2p_pgmap(page_pgmap(page)); + + if (state->mem =3D=3D &p2p_pgmap->mem) + return; + + state->mem =3D &p2p_pgmap->mem; + state->map =3D pci_p2pdma_map_type(&p2p_pgmap->mem, dev); } =20 /** diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h index b502fc8b49bf9..27a2c399f47da 100644 --- a/include/linux/pci-p2pdma.h +++ b/include/linux/pci-p2pdma.h @@ -16,6 +16,16 @@ struct block_device; struct scatterlist; =20 +/** + * struct p2pdma_provider + * + * A p2pdma provider is a range of MMIO address space available to the CPU. 
+ */ +struct p2pdma_provider { + struct device *owner; + u64 bus_offset; +}; + #ifdef CONFIG_PCI_P2PDMA int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size, u64 offset); @@ -144,10 +154,11 @@ enum pci_p2pdma_map_type { }; =20 struct pci_p2pdma_map_state { - struct dev_pagemap *pgmap; + struct p2pdma_provider *mem; enum pci_p2pdma_map_type map; }; =20 + /* helper for pci_p2pdma_state(), do not use directly */ void __pci_p2pdma_update_state(struct pci_p2pdma_map_state *state, struct device *dev, struct page *page); @@ -166,8 +177,7 @@ pci_p2pdma_state(struct pci_p2pdma_map_state *state, st= ruct device *dev, struct page *page) { if (IS_ENABLED(CONFIG_PCI_P2PDMA) && is_pci_p2pdma_page(page)) { - if (state->pgmap !=3D page_pgmap(page)) - __pci_p2pdma_update_state(state, dev, page); + __pci_p2pdma_update_state(state, dev, page); return state->map; } return PCI_P2PDMA_MAP_NONE; @@ -185,7 +195,7 @@ static inline dma_addr_t pci_p2pdma_bus_addr_map(struct pci_p2pdma_map_state *state, phys_addr_t pa= ddr) { WARN_ON_ONCE(state->map !=3D PCI_P2PDMA_MAP_BUS_ADDR); - return paddr + to_p2p_pgmap(state->pgmap)->bus_offsetf; + return paddr + state->mem->bus_offset; } =20 #endif /* _LINUX_PCI_P2P_H */ --=20 2.50.1 From nobody Mon Oct 6 08:22:08 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 662ED2E7629; Wed, 23 Jul 2025 13:02:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1753275724; cv=none; b=lClwl8iBpTTEfZ0acJF2nPcj6g8hZVdDHtAUmoP2b85uairAPz1GnMOEhQmGgLKoDKQRCDOAwIqeIJhSgEaNfUu4mibN6N5iInUNnx7yHjcg/HLVGwqvyOJ81MmKQlpQjT7FzkcvgCnuR55+z9NLayBp3TE6PQKlfKRpLIP5XgA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; 
s=arc-20240116; t=1753275724; c=relaxed/simple; bh=uuCb+V2zNc5XSrBnwxrMawyQmHST6eRwzZ5HxKmtUAU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=KJJGILTG+u3bs3pa9l7K4luxWcZ9yo7srh25CGl7/Qj71+/TNPNEsB/tTSK8CgUsJfYWK5ba0bsFDfsTw/SPppaN5Y4jUOo48ZVPRcFvhumuoELQ3uhb+N/hs2yyflVO+UI9r/Qnn4CDuo0GUzVmFaHV8NkeMvCaur0kG6Y/DO0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=CMXvNedL; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="CMXvNedL" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 4D21EC4CEF5; Wed, 23 Jul 2025 13:02:03 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1753275724; bh=uuCb+V2zNc5XSrBnwxrMawyQmHST6eRwzZ5HxKmtUAU=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=CMXvNedLSRoRNIwJaHFwymWQJoLQyy5HeCJn+iVUJAlK00FGokYtqVBrXV4haQEzj 4iM1gvpoEs6p9cZdfKk01lySbx1k78BYRLrfwsTbzRoWU+Q+XH+Qd9DgCFFmNzMZs5 L4oDttcRXsp/Ta90c0k8VhrbYHR9/CeOaCy2K7cOOav1SR+AaaCAhZ8YMlUBM1+C3G YmlkbaQZzVl0iP3yUEOECQOLzv8Tu8kLt0j1LO5zFl7MasnoEyvYAVwA7owgkFzqoV AkQjTLVdtMNW/QuID+PRdNgGe47BkgzJUFzNMwQOk/GEInoTws1TVgMkin8/vgBYB5 3BbS2oJqswGwA== From: Leon Romanovsky To: Alex Williamson Cc: Leon Romanovsky , Christoph Hellwig , Jason Gunthorpe , Andrew Morton , Bjorn Helgaas , =?UTF-8?q?Christian=20K=C3=B6nig?= , dri-devel@lists.freedesktop.org, iommu@lists.linux.dev, Jens Axboe , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Joerg Roedel , kvm@vger.kernel.org, linaro-mm-sig@lists.linaro.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-media@vger.kernel.org, linux-mm@kvack.org, linux-pci@vger.kernel.org, Logan Gunthorpe , Marek Szyprowski , Robin Murphy , Sumit Semwal , Vivek Kasireddy , Will Deacon Subject: [PATCH 03/10] PCI/P2PDMA: Simplify bus address mapping API Date: Wed, 23 Jul 2025 
16:00:04 +0300 Message-ID: <30640b5e4ec975f928e685b92aaaf3e2e5e08f72.1753274085.git.leonro@nvidia.com> X-Mailer: git-send-email 2.50.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Leon Romanovsky Update the pci_p2pdma_bus_addr_map() function to take a direct pointer to the p2pdma_provider structure instead of the pci_p2pdma_map_state. This simplifies the API by removing the need for callers to extract the provider from the state structure. The change updates all callers across the kernel (block layer, IOMMU, DMA direct, and HMM) to pass the provider pointer directly, making the code more explicit and reducing unnecessary indirection. This also removes the runtime warning check since callers now have direct control over which provider they use. Signed-off-by: Leon Romanovsky --- block/blk-mq-dma.c | 2 +- drivers/iommu/dma-iommu.c | 4 ++-- include/linux/pci-p2pdma.h | 7 +++---- kernel/dma/direct.c | 4 ++-- mm/hmm.c | 2 +- 5 files changed, 9 insertions(+), 10 deletions(-) diff --git a/block/blk-mq-dma.c b/block/blk-mq-dma.c index 37e2142be4f7d..eeac653e3f3bd 100644 --- a/block/blk-mq-dma.c +++ b/block/blk-mq-dma.c @@ -79,7 +79,7 @@ static inline bool blk_can_dma_map_iova(struct request *r= eq, =20 static bool blk_dma_map_bus(struct blk_dma_iter *iter, struct phys_vec *ve= c) { - iter->addr =3D pci_p2pdma_bus_addr_map(&iter->p2pdma, vec->paddr); + iter->addr =3D pci_p2pdma_bus_addr_map(iter->p2pdma.mem, vec->paddr); iter->len =3D vec->len; return true; } diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index cd4bc22efa966..1853a969e1978 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -1427,8 +1427,8 @@ int iommu_dma_map_sg(struct device *dev, struct scatt= erlist *sg, int nents, * as a bus address, __finalise_sg() will copy the dma * 
address into the output segment. */ - s->dma_address =3D pci_p2pdma_bus_addr_map(&p2pdma_state, - sg_phys(s)); + s->dma_address =3D pci_p2pdma_bus_addr_map( + p2pdma_state.mem, sg_phys(s)); sg_dma_len(s) =3D sg->length; sg_dma_mark_bus_address(s); continue; diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h index 27a2c399f47da..eef96636c67e6 100644 --- a/include/linux/pci-p2pdma.h +++ b/include/linux/pci-p2pdma.h @@ -186,16 +186,15 @@ pci_p2pdma_state(struct pci_p2pdma_map_state *state, = struct device *dev, /** * pci_p2pdma_bus_addr_map - Translate a physical address to a bus address * for a PCI_P2PDMA_MAP_BUS_ADDR transfer. - * @state: P2P state structure + * @provider: P2P provider structure * @paddr: physical address to map * * Map a physically contiguous PCI_P2PDMA_MAP_BUS_ADDR transfer. */ static inline dma_addr_t -pci_p2pdma_bus_addr_map(struct pci_p2pdma_map_state *state, phys_addr_t pa= ddr) +pci_p2pdma_bus_addr_map(struct p2pdma_provider *provider, phys_addr_t padd= r) { - WARN_ON_ONCE(state->map !=3D PCI_P2PDMA_MAP_BUS_ADDR); - return paddr + state->mem->bus_offset; + return paddr + provider->bus_offset; } =20 #endif /* _LINUX_PCI_P2P_H */ diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c index fa75e30700730..de34ee5903766 100644 --- a/kernel/dma/direct.c +++ b/kernel/dma/direct.c @@ -484,8 +484,8 @@ int dma_direct_map_sg(struct device *dev, struct scatte= rlist *sgl, int nents, } break; case PCI_P2PDMA_MAP_BUS_ADDR: - sg->dma_address =3D pci_p2pdma_bus_addr_map(&p2pdma_state, - sg_phys(sg)); + sg->dma_address =3D pci_p2pdma_bus_addr_map( + p2pdma_state.mem, sg_phys(sg)); sg_dma_mark_bus_address(sg); continue; default: diff --git a/mm/hmm.c b/mm/hmm.c index 9354fae3ae06f..f9970b0e527ed 100644 --- a/mm/hmm.c +++ b/mm/hmm.c @@ -755,7 +755,7 @@ dma_addr_t hmm_dma_map_pfn(struct device *dev, struct h= mm_dma_map *map, break; case PCI_P2PDMA_MAP_BUS_ADDR: pfns[idx] |=3D HMM_PFN_P2PDMA_BUS | HMM_PFN_DMA_MAPPED; - return 
pci_p2pdma_bus_addr_map(p2pdma_state, paddr); + return pci_p2pdma_bus_addr_map(p2pdma_state->mem, paddr); default: return DMA_MAPPING_ERROR; } --=20 2.50.1 From nobody Mon Oct 6 08:22:08 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 91D922E7634; Wed, 23 Jul 2025 13:02:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1753275728; cv=none; b=SHKuRxlJ1HVcaZo0y+qlOWvev4wNII+cRem79oajPlrbFqXMFI03ElQ/A/adCoU/JGZSY53vYh6+0ET+3XbxrdKnvqAN+GWJ/UBEUC3dbWjOQsXT4k1t8vGGJ13DA+jNxk889EaV/AqfsSy+s9ODs8VlF1pdvs8lIFKbTsHjD3I= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1753275728; c=relaxed/simple; bh=QlCxMBTRZAiP5c3NMYNkdFJpkHArvyU+1xEwTfBZhP0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=mOXtFeBhgqAiocJpKMaM7ruuUVuF3oCAiAOxJUPruaq19cAnO3COuSPBPfD7QfsanVEvscC5/3cj4Q7k1HgMKbYJhHgncRHGfXj5TfFPZm1FQfaNErbX6aS9Bczb2/5WAZK4wBiRIltH71MgwR45fXWBM8Z0/ZRBnrUy8cDEQdU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=bbL7PJy7; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="bbL7PJy7" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 32C6BC4CEE7; Wed, 23 Jul 2025 13:02:07 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1753275728; bh=QlCxMBTRZAiP5c3NMYNkdFJpkHArvyU+1xEwTfBZhP0=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=bbL7PJy7z/i/XYjufuxpMmeDaHyyR3daAHP5he1hjDMcrAag/gp4J2o2k/v8skMr3 
R25bAkiEdPLj98TYfsFae7gsqoML6FEk3vEK6KtB7tf4gyjSQzojNS8lNOskp4bTx9 9Qoq8aK9CGCvELFh9o6r0NtfFvwpoxyIKNem5uT/B2PHJbhWSYi/smgEYkXOPUd31d vxUdLB2JEvLdGfC2iA6AH0Y8RDM3aJ830005MyHWxiRE6KVxWhQzqrkgM1j6ZL9yGm LYqSUfpT2Pb9LrpTlSakKBN8oyEUIMe/SmczIAeN5jjmtgOuY9bm7baIuqvXRLjOVA s3xLDVtvj6HzA== From: Leon Romanovsky To: Alex Williamson Cc: Leon Romanovsky , Christoph Hellwig , Jason Gunthorpe , Andrew Morton , Bjorn Helgaas , =?UTF-8?q?Christian=20K=C3=B6nig?= , dri-devel@lists.freedesktop.org, iommu@lists.linux.dev, Jens Axboe , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Joerg Roedel , kvm@vger.kernel.org, linaro-mm-sig@lists.linaro.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-media@vger.kernel.org, linux-mm@kvack.org, linux-pci@vger.kernel.org, Logan Gunthorpe , Marek Szyprowski , Robin Murphy , Sumit Semwal , Vivek Kasireddy , Will Deacon Subject: [PATCH 04/10] PCI/P2PDMA: Refactor to separate core P2P functionality from memory allocation Date: Wed, 23 Jul 2025 16:00:05 +0300 Message-ID: X-Mailer: git-send-email 2.50.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Leon Romanovsky Refactor the PCI P2PDMA subsystem to separate the core peer-to-peer DMA functionality from the optional memory allocation layer. This creates a two-tier architecture: The core layer provides P2P mapping functionality for physical addresses based on PCI device MMIO BARs and integrates with the DMA API for mapping operations. This layer is required for all P2PDMA users. The optional upper layer provides memory allocation capabilities including gen_pool allocator, struct page support, and sysfs interface for user space access. This separation allows subsystems like VFIO to use only the core P2P mapping functionality without the overhead of memory allocation features they don't need. 
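The two-tier split described above can be pictured with a small
userspace mock. It is only a sketch: struct p2p_dev, p2p_enable() and
p2p_setup_pool() are hypothetical names, the pool is reduced to a flag
standing in for gen_pool plus sysfs, and making the enable step
idempotent is an assumption of this example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

struct p2pdma_provider {
	uint64_t bus_offset;
};

/* Per-device P2P state: core part is mandatory, the pool is optional. */
struct p2p_dev {
	bool enabled;			/* core layer initialised */
	struct p2pdma_provider mem;	/* core layer: mapping information */
	bool has_pool;			/* optional layer: allocator present */
};

/* Core layer: set up mapping information and hand out the provider. */
static struct p2pdma_provider *p2p_enable(struct p2p_dev *dev,
					  uint64_t bus_offset)
{
	assert(dev);
	if (!dev->enabled) {
		dev->mem.bus_offset = bus_offset;
		dev->enabled = true;
	}
	return &dev->mem;
}

/* Optional layer: only users that allocate P2P memory call this. */
static int p2p_setup_pool(struct p2p_dev *dev)
{
	assert(dev);
	if (!dev->enabled)
		return -EINVAL;	/* the pool depends on the core layer */
	dev->has_pool = true;
	return 0;
}
```

A mapping-only user stops after p2p_enable(); a user that also
allocates P2P memory goes on to call p2p_setup_pool().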
The core functionality is now available through the new pci_p2pdma_enable() function that returns a p2pdma_provider structure. Signed-off-by: Leon Romanovsky --- drivers/pci/p2pdma.c | 108 +++++++++++++++++++++++++------------ include/linux/pci-p2pdma.h | 5 ++ 2 files changed, 80 insertions(+), 33 deletions(-) diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c index 5a310026bd24f..8e2525618d922 100644 --- a/drivers/pci/p2pdma.c +++ b/drivers/pci/p2pdma.c @@ -25,11 +25,12 @@ struct pci_p2pdma { struct gen_pool *pool; bool p2pmem_published; struct xarray map_types; + struct p2pdma_provider mem; }; =20 struct pci_p2pdma_pagemap { struct dev_pagemap pgmap; - struct p2pdma_provider mem; + struct p2pdma_provider *mem; }; =20 static struct pci_p2pdma_pagemap *to_p2p_pgmap(struct dev_pagemap *pgmap) @@ -204,7 +205,7 @@ static void p2pdma_page_free(struct page *page) struct pci_p2pdma_pagemap *pgmap =3D to_p2p_pgmap(page_pgmap(page)); /* safe to dereference while a reference is held to the percpu ref */ struct pci_p2pdma *p2pdma =3D rcu_dereference_protected( - to_pci_dev(pgmap->mem.owner)->p2pdma, 1); + to_pci_dev(pgmap->mem->owner)->p2pdma, 1); struct percpu_ref *ref; =20 gen_pool_free_owner(p2pdma->pool, (uintptr_t)page_to_virt(page), @@ -227,44 +228,77 @@ static void pci_p2pdma_release(void *data) =20 /* Flush and disable pci_alloc_p2p_mem() */ pdev->p2pdma =3D NULL; - synchronize_rcu(); + if (p2pdma->pool) + synchronize_rcu(); + xa_destroy(&p2pdma->map_types); + + if (!p2pdma->pool) + return; =20 gen_pool_destroy(p2pdma->pool); sysfs_remove_group(&pdev->dev.kobj, &p2pmem_group); - xa_destroy(&p2pdma->map_types); } =20 -static int pci_p2pdma_setup(struct pci_dev *pdev) +/** + * pci_p2pdma_enable - Enable peer-to-peer DMA support for a PCI device + * @pdev: The PCI device to enable P2PDMA for + * + * This function initializes the peer-to-peer DMA infrastructure for a PCI + * device. 
It allocates and sets up the necessary data structures to suppo= rt + * P2PDMA operations, including mapping type tracking. + */ +struct p2pdma_provider *pci_p2pdma_enable(struct pci_dev *pdev) { - int error =3D -ENOMEM; struct pci_p2pdma *p2p; + int ret; =20 p2p =3D devm_kzalloc(&pdev->dev, sizeof(*p2p), GFP_KERNEL); if (!p2p) - return -ENOMEM; + return ERR_PTR(-ENOMEM); =20 xa_init(&p2p->map_types); + p2p->mem.owner =3D &pdev->dev; + /* On all p2p platforms bus_offset is the same for all BARs */ + p2p->mem.bus_offset =3D + pci_bus_address(pdev, 0) - pci_resource_start(pdev, 0); =20 - p2p->pool =3D gen_pool_create(PAGE_SHIFT, dev_to_node(&pdev->dev)); - if (!p2p->pool) - goto out; + ret =3D devm_add_action_or_reset(&pdev->dev, pci_p2pdma_release, pdev); + if (ret) + goto out_p2p; =20 - error =3D devm_add_action_or_reset(&pdev->dev, pci_p2pdma_release, pdev); - if (error) - goto out_pool_destroy; + rcu_assign_pointer(pdev->p2pdma, p2p); + return &p2p->mem; =20 - error =3D sysfs_create_group(&pdev->dev.kobj, &p2pmem_group); - if (error) +out_p2p: + devm_kfree(&pdev->dev, p2p); + return ERR_PTR(ret); +} +EXPORT_SYMBOL_GPL(pci_p2pdma_enable); + +static int pci_p2pdma_setup_pool(struct pci_dev *pdev) +{ + struct pci_p2pdma *p2pdma; + int ret; + + p2pdma =3D rcu_dereference_protected(pdev->p2pdma, 1); + if (p2pdma->pool) + /* We already setup pools, do nothing, */ + return 0; + + p2pdma->pool =3D gen_pool_create(PAGE_SHIFT, dev_to_node(&pdev->dev)); + if (!p2pdma->pool) + return -ENOMEM; + + ret =3D sysfs_create_group(&pdev->dev.kobj, &p2pmem_group); + if (ret) goto out_pool_destroy; =20 - rcu_assign_pointer(pdev->p2pdma, p2p); return 0; =20 out_pool_destroy: - gen_pool_destroy(p2p->pool); -out: - devm_kfree(&pdev->dev, p2p); - return error; + gen_pool_destroy(p2pdma->pool); + p2pdma->pool =3D NULL; + return ret; } =20 static void pci_p2pdma_unmap_mappings(void *data) @@ -276,7 +310,7 @@ static void pci_p2pdma_unmap_mappings(void *data) * unmap_mapping_range() on the 
inode, teardown any existing userspace * mappings and prevent new ones from being created. */ - sysfs_remove_file_from_group(&p2p_pgmap->mem.owner->kobj, + sysfs_remove_file_from_group(&p2p_pgmap->mem->owner->kobj, &p2pmem_alloc_attr.attr, p2pmem_group.name); } @@ -295,6 +329,7 @@ int pci_p2pdma_add_resource(struct pci_dev *pdev, int b= ar, size_t size, u64 offset) { struct pci_p2pdma_pagemap *p2p_pgmap; + struct p2pdma_provider *mem; struct dev_pagemap *pgmap; struct pci_p2pdma *p2pdma; void *addr; @@ -312,15 +347,22 @@ int pci_p2pdma_add_resource(struct pci_dev *pdev, int= bar, size_t size, if (size + offset > pci_resource_len(pdev, bar)) return -EINVAL; =20 - if (!pdev->p2pdma) { - error =3D pci_p2pdma_setup(pdev); + p2pdma =3D rcu_dereference_protected(pdev->p2pdma, 1); + if (!p2pdma) { + mem =3D pci_p2pdma_enable(pdev); + if (IS_ERR(mem)) + return PTR_ERR(mem); + + error =3D pci_p2pdma_setup_pool(pdev); if (error) return error; } =20 p2p_pgmap =3D devm_kzalloc(&pdev->dev, sizeof(*p2p_pgmap), GFP_KERNEL); - if (!p2p_pgmap) - return -ENOMEM; + if (!p2p_pgmap) { + error =3D -ENOMEM; + goto free_pool; + } =20 pgmap =3D &p2p_pgmap->pgmap; pgmap->range.start =3D pci_resource_start(pdev, bar) + offset; @@ -328,9 +370,7 @@ int pci_p2pdma_add_resource(struct pci_dev *pdev, int b= ar, size_t size, pgmap->nr_range =3D 1; pgmap->type =3D MEMORY_DEVICE_PCI_P2PDMA; pgmap->ops =3D &p2pdma_pgmap_ops; - p2p_pgmap->mem.owner =3D &pdev->dev; - p2p_pgmap->mem.bus_offset =3D - pci_bus_address(pdev, bar) - pci_resource_start(pdev, bar); + p2p_pgmap->mem =3D mem; =20 addr =3D devm_memremap_pages(&pdev->dev, pgmap); if (IS_ERR(addr)) { @@ -343,7 +383,6 @@ int pci_p2pdma_add_resource(struct pci_dev *pdev, int b= ar, size_t size, if (error) goto pages_free; =20 - p2pdma =3D rcu_dereference_protected(pdev->p2pdma, 1); error =3D gen_pool_add_owner(p2pdma->pool, (unsigned long)addr, pci_bus_address(pdev, bar) + offset, range_len(&pgmap->range), dev_to_node(&pdev->dev), @@ -359,7 +398,10 
@@ int pci_p2pdma_add_resource(struct pci_dev *pdev, int = bar, size_t size, pages_free: devm_memunmap_pages(&pdev->dev, pgmap); pgmap_free: - devm_kfree(&pdev->dev, pgmap); + devm_kfree(&pdev->dev, p2p_pgmap); +free_pool: + sysfs_remove_group(&pdev->dev.kobj, &p2pmem_group); + gen_pool_destroy(p2pdma->pool); return error; } EXPORT_SYMBOL_GPL(pci_p2pdma_add_resource); @@ -1008,11 +1050,11 @@ void __pci_p2pdma_update_state(struct pci_p2pdma_ma= p_state *state, { struct pci_p2pdma_pagemap *p2p_pgmap =3D to_p2p_pgmap(page_pgmap(page)); =20 - if (state->mem =3D=3D &p2p_pgmap->mem) + if (state->mem =3D=3D p2p_pgmap->mem) return; =20 - state->mem =3D &p2p_pgmap->mem; - state->map =3D pci_p2pdma_map_type(&p2p_pgmap->mem, dev); + state->mem =3D p2p_pgmap->mem; + state->map =3D pci_p2pdma_map_type(p2p_pgmap->mem, dev); } =20 /** diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h index eef96636c67e6..83f11dc8659a7 100644 --- a/include/linux/pci-p2pdma.h +++ b/include/linux/pci-p2pdma.h @@ -27,6 +27,7 @@ struct p2pdma_provider { }; =20 #ifdef CONFIG_PCI_P2PDMA +struct p2pdma_provider *pci_p2pdma_enable(struct pci_dev *pdev); int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size, u64 offset); int pci_p2pdma_distance_many(struct pci_dev *provider, struct device **cli= ents, @@ -45,6 +46,10 @@ int pci_p2pdma_enable_store(const char *page, struct pci= _dev **p2p_dev, ssize_t pci_p2pdma_enable_show(char *page, struct pci_dev *p2p_dev, bool use_p2pdma); #else /* CONFIG_PCI_P2PDMA */ +static inline struct p2pdma_provider *pci_p2pdma_enable(struct pci_dev *pd= ev) +{ + return ERR_PTR(-EOPNOTSUPP); +} static inline int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size, u64 offset) { --=20 2.50.1 From nobody Mon Oct 6 08:22:08 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by 
smtp.subspace.kernel.org (Postfix) with ESMTPS id B64C02EA733; Wed, 23 Jul 2025 13:02:20 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1753275740; cv=none; b=co4QcJOdekame6iElGjZjWNZKnnK1CEXesXYasjTurdriZf01hZvgAQuVsDS9v9LcMSxWkBMgurvZZYqZRk7xTVHZxJUcuPfcQraXzr+/z1qkn5Nhxc711dX1gX8yQ09mjfXN3NyeW3idlZDnVNAmUD54ucqNLw2PyFGu4uQfWs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1753275740; c=relaxed/simple; bh=nlUJ1hQe0rB1BLNxXOxhzCPvBdRkQwroi+XicJov0lY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=FyMF9b1JGCGLwzkpA+BuR8df5ID4qkp5T/z++QQtvO2LNfYiuXkBV9GvkvyRIJNK1x3PT7fEaW+Lv2O1n9q1bhtOshi2mT68T1UGfEqxnrrRpOVpo65VRRaNhOnSjk41pcUhk7X10Ag/SakByeXz9W7v+UhjG7uMzcWV5ehTBuA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=lnZbBneV; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="lnZbBneV" Received: by smtp.kernel.org (Postfix) with ESMTPSA id CA817C4CEF5; Wed, 23 Jul 2025 13:02:19 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1753275740; bh=nlUJ1hQe0rB1BLNxXOxhzCPvBdRkQwroi+XicJov0lY=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=lnZbBneVQfUXSDdBql+EZe6zUPnDCYl40gxPGQaQTpTisdj7QdzBIfdxeIS9iqGf5 6puO+R1/00D4satUiVhf4j97G1kDFuoNFKSoXuoQqzvIY0CT2wGNSAsqOGN6P9vTrP zgoCsH624S2mClgyibPDHZkjWn1v/u+99YU31N6M1oUyu6dq/jOIIU7fY+rNI8uNdk aVeid0uxsaaxYFwZecPCjTct9p66eTfDPOdhPl11oR1Ivj87NdTex/7PH36QXDPkLh zDxMqqF0YDhJFnjasAZJfayzwGKvYlBvIZpwQ4n6LdpzK1TGYThmZ+NAQDx33bCMpL RCwOpMO9Ex61w== From: Leon Romanovsky To: Alex Williamson Cc: Leon Romanovsky , Christoph Hellwig , Jason Gunthorpe , Andrew Morton , Bjorn Helgaas , 
=?UTF-8?q?Christian=20K=C3=B6nig?= , dri-devel@lists.freedesktop.org, iommu@lists.linux.dev, Jens Axboe , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Joerg Roedel , kvm@vger.kernel.org, linaro-mm-sig@lists.linaro.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-media@vger.kernel.org, linux-mm@kvack.org, linux-pci@vger.kernel.org, Logan Gunthorpe , Marek Szyprowski , Robin Murphy , Sumit Semwal , Vivek Kasireddy , Will Deacon Subject: [PATCH 05/10] PCI/P2PDMA: Export pci_p2pdma_map_type() function Date: Wed, 23 Jul 2025 16:00:06 +0300 Message-ID: <82e62eb59afcd39b68ae143573d5ed113a92344e.1753274085.git.leonro@nvidia.com> X-Mailer: git-send-email 2.50.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Leon Romanovsky Export the pci_p2pdma_map_type() function to allow external modules and subsystems to determine the appropriate mapping type for P2PDMA transfers between a provider and target device. The function determines whether peer-to-peer DMA transfers can be done directly through PCI switches (PCI_P2PDMA_MAP_BUS_ADDR) or must go through the host bridge (PCI_P2PDMA_MAP_THRU_HOST_BRIDGE), or if the transfer is not supported at all. This export enables subsystems like VFIO to properly handle P2PDMA operations by querying the mapping type before attempting transfers, ensuring correct DMA address programming and error handling. 
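As a rough illustration of how a caller might act on the returned mapping type, here is a minimal userspace sketch. The enum is copied from include/linux/pci-p2pdma.h as it appears in this patch; example_p2p_dma_addr() and its parameters are hypothetical and not part of this series. It assumes the bus address is the physical address plus the provider's bus_offset, and that host_dma_addr stands in for whatever the normal DMA mapping path would return.

```c
#include <errno.h>
#include <stdint.h>

/* Userspace copy of the enum from include/linux/pci-p2pdma.h, for illustration. */
enum pci_p2pdma_map_type {
	PCI_P2PDMA_MAP_UNKNOWN = 0,
	PCI_P2PDMA_MAP_NONE,
	PCI_P2PDMA_MAP_NOT_SUPPORTED,
	PCI_P2PDMA_MAP_BUS_ADDR,
	PCI_P2PDMA_MAP_THRU_HOST_BRIDGE,
};

/*
 * Hypothetical helper (not part of this series): pick the address to
 * program into the DMA engine for a given mapping type.
 */
static int example_p2p_dma_addr(enum pci_p2pdma_map_type type, uint64_t paddr,
				uint64_t bus_offset, uint64_t host_dma_addr,
				uint64_t *out)
{
	switch (type) {
	case PCI_P2PDMA_MAP_BUS_ADDR:
		/* Traffic stays below the host bridge: use the PCI bus address. */
		*out = paddr + bus_offset;
		return 0;
	case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
		/* Normal mapping: CPU physical address or IOVA from the DMA API. */
		*out = host_dma_addr;
		return 0;
	default:
		/* NOT_SUPPORTED and anything unexpected: refuse the transfer. */
		return -EOPNOTSUPP;
	}
}
```

The default branch treats every value other than the two usable mapping types as an error, which matches the enum comments saying that DMA mapping routines should fail on PCI_P2PDMA_MAP_NOT_SUPPORTED.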
Signed-off-by: Leon Romanovsky --- drivers/pci/p2pdma.c | 15 ++++++- include/linux/pci-p2pdma.h | 85 +++++++++++++++++++++----------------- 2 files changed, 59 insertions(+), 41 deletions(-) diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c index 8e2525618d922..326c7d88a1690 100644 --- a/drivers/pci/p2pdma.c +++ b/drivers/pci/p2pdma.c @@ -1014,8 +1014,18 @@ void pci_p2pmem_publish(struct pci_dev *pdev, bool p= ublish) } EXPORT_SYMBOL_GPL(pci_p2pmem_publish); =20 -static enum pci_p2pdma_map_type -pci_p2pdma_map_type(struct p2pdma_provider *provider, struct device *dev) +/** + * pci_p2pdma_map_type - Determine the mapping type for P2PDMA transfers + * @provider: P2PDMA provider structure + * @dev: Target device for the transfer + * + * Determines how peer-to-peer DMA transfers should be mapped between + * the provider and the target device. The mapping type indicates whether + * the transfer can be done directly through PCI switches or must go + * through the host bridge. + */ +enum pci_p2pdma_map_type pci_p2pdma_map_type(struct p2pdma_provider *provi= der, + struct device *dev) { enum pci_p2pdma_map_type type =3D PCI_P2PDMA_MAP_NOT_SUPPORTED; struct pci_dev *pdev =3D to_pci_dev(provider->owner); @@ -1044,6 +1054,7 @@ pci_p2pdma_map_type(struct p2pdma_provider *provider,= struct device *dev) =20 return type; } +EXPORT_SYMBOL_GPL(pci_p2pdma_map_type); =20 void __pci_p2pdma_update_state(struct pci_p2pdma_map_state *state, struct device *dev, struct page *page) diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h index 83f11dc8659a7..dea98baee5ce2 100644 --- a/include/linux/pci-p2pdma.h +++ b/include/linux/pci-p2pdma.h @@ -26,6 +26,45 @@ struct p2pdma_provider { u64 bus_offset; }; =20 +enum pci_p2pdma_map_type { + /* + * PCI_P2PDMA_MAP_UNKNOWN: Used internally as an initial state before + * the mapping type has been calculated. Exported routines for the API + * will never return this value. 
+ */ + PCI_P2PDMA_MAP_UNKNOWN =3D 0, + + /* + * Not a PCI P2PDMA transfer. + */ + PCI_P2PDMA_MAP_NONE, + + /* + * PCI_P2PDMA_MAP_NOT_SUPPORTED: Indicates the transaction will + * traverse the host bridge and the host bridge is not in the + * allowlist. DMA Mapping routines should return an error when + * this is returned. + */ + PCI_P2PDMA_MAP_NOT_SUPPORTED, + + /* + * PCI_P2PDMA_MAP_BUS_ADDR: Indicates that two devices can talk to + * each other directly through a PCI switch and the transaction will + * not traverse the host bridge. Such a mapping should program + * the DMA engine with PCI bus addresses. + */ + PCI_P2PDMA_MAP_BUS_ADDR, + + /* + * PCI_P2PDMA_MAP_THRU_HOST_BRIDGE: Indicates two devices can talk + * to each other, but the transaction traverses a host bridge on the + * allowlist. In this case, a normal mapping either with CPU physical + * addresses (in the case of dma-direct) or IOVA addresses (in the + * case of IOMMUs) should be used to program the DMA engine. + */ + PCI_P2PDMA_MAP_THRU_HOST_BRIDGE, +}; + #ifdef CONFIG_PCI_P2PDMA struct p2pdma_provider *pci_p2pdma_enable(struct pci_dev *pdev); int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size, @@ -45,6 +84,8 @@ int pci_p2pdma_enable_store(const char *page, struct pci_= dev **p2p_dev, bool *use_p2pdma); ssize_t pci_p2pdma_enable_show(char *page, struct pci_dev *p2p_dev, bool use_p2pdma); +enum pci_p2pdma_map_type pci_p2pdma_map_type(struct p2pdma_provider *provi= der, + struct device *dev); #else /* CONFIG_PCI_P2PDMA */ static inline struct p2pdma_provider *pci_p2pdma_enable(struct pci_dev *pd= ev) { @@ -105,6 +146,11 @@ static inline ssize_t pci_p2pdma_enable_show(char *pag= e, { return sprintf(page, "none\n"); } +static inline enum pci_p2pdma_map_type +pci_p2pdma_map_type(struct p2pdma_provider *provider, struct device *dev) +{ + return PCI_P2PDMA_MAP_NOT_SUPPORTED; +} #endif /* CONFIG_PCI_P2PDMA */ =20 =20 @@ -119,45 +165,6 @@ static inline struct pci_dev 
*pci_p2pmem_find(struct d= evice *client) return pci_p2pmem_find_many(&client, 1); } =20 -enum pci_p2pdma_map_type { - /* - * PCI_P2PDMA_MAP_UNKNOWN: Used internally as an initial state before - * the mapping type has been calculated. Exported routines for the API - * will never return this value. - */ - PCI_P2PDMA_MAP_UNKNOWN =3D 0, - - /* - * Not a PCI P2PDMA transfer. - */ - PCI_P2PDMA_MAP_NONE, - - /* - * PCI_P2PDMA_MAP_NOT_SUPPORTED: Indicates the transaction will - * traverse the host bridge and the host bridge is not in the - * allowlist. DMA Mapping routines should return an error when - * this is returned. - */ - PCI_P2PDMA_MAP_NOT_SUPPORTED, - - /* - * PCI_P2PDMA_MAP_BUS_ADDR: Indicates that two devices can talk to - * each other directly through a PCI switch and the transaction will - * not traverse the host bridge. Such a mapping should program - * the DMA engine with PCI bus addresses. - */ - PCI_P2PDMA_MAP_BUS_ADDR, - - /* - * PCI_P2PDMA_MAP_THRU_HOST_BRIDGE: Indicates two devices can talk - * to each other, but the transaction traverses a host bridge on the - * allowlist. In this case, a normal mapping either with CPU physical - * addresses (in the case of dma-direct) or IOVA addresses (in the - * case of IOMMUs) should be used to program the DMA engine. 
- */ - PCI_P2PDMA_MAP_THRU_HOST_BRIDGE, -}; - struct pci_p2pdma_map_state { struct p2pdma_provider *mem; enum pci_p2pdma_map_type map; --=20 2.50.1 From nobody Mon Oct 6 08:22:08 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C99F02E973C; Wed, 23 Jul 2025 13:02:16 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1753275736; cv=none; b=tdXQkm6XqAjKRRbuOQkA6YfVfzqGpHK7vPeRuC2LxvkAS7ZGC4CaQxmlUSDRbqxnenMr6z/jnZ5Q0eMa8azt1k3kiOxmjF7Gp22KC+CQg3+hGnbHVWQMB89zUe9598zbhl1iCvHet+/DWxX7GBratFUuPyDRP5B5hq8LXPNU6xo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1753275736; c=relaxed/simple; bh=pIL0Fal3Z10DtV64LPo4xcXkDTiYeS1pLWRBh1Fmrp0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=AzoJIEJigO9o/f0jNQ7zV+iZ4BvwefndBsJIE3zTmhFkxYYIk5j3IUJTLNgG6AaMytahuefUXjq70UV4scfgj7LB+GesaujcwGZhexEKo9OQuMadCrcfEmjRNV0t+Ulnp9vuQ4C42osA1DkU+GfOBUuidFlbt3IyonCbUL9CryM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=P9okemrw; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="P9okemrw" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 9A398C4CEE7; Wed, 23 Jul 2025 13:02:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1753275736; bh=pIL0Fal3Z10DtV64LPo4xcXkDTiYeS1pLWRBh1Fmrp0=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=P9okemrwBrufuczFpKQ4z/nyvZOuAUsniIYG3cqafaRjEVFv93g9VIuiDBgki0mYx 
pE5Q7Y4okUQrm4OGrjVPt/WqPmzs8WpxvQy5y4AmoWiW9YvwIevxDj3bqFCe7zJ1OL jHB5YeZIv6vBqbCXx1aL1GOiXbIKbUIewNTPuHTVWFYub4JPm+R40hj37X7on8IX04 QfALWITdB8aVKkdbV1hXj0S25+yVC4ZcbptopIBhEKrks82E4rExb+hSHqt3Kd7sL6 WlQFvqGfRHlnM8Brd6WJQa5xVhrU1/FK2oXlr249aAs3HbLWwGcfYL8+6dfBu+bdhP 9EC5hFbBQkFOw== From: Leon Romanovsky To: Alex Williamson Cc: Leon Romanovsky , Christoph Hellwig , Jason Gunthorpe , Andrew Morton , Bjorn Helgaas , =?UTF-8?q?Christian=20K=C3=B6nig?= , dri-devel@lists.freedesktop.org, iommu@lists.linux.dev, Jens Axboe , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Joerg Roedel , kvm@vger.kernel.org, linaro-mm-sig@lists.linaro.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-media@vger.kernel.org, linux-mm@kvack.org, linux-pci@vger.kernel.org, Logan Gunthorpe , Marek Szyprowski , Robin Murphy , Sumit Semwal , Vivek Kasireddy , Will Deacon Subject: [PATCH 06/10] types: move phys_vec definition to common header Date: Wed, 23 Jul 2025 16:00:07 +0300 Message-ID: X-Mailer: git-send-email 2.50.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Leon Romanovsky Move the struct phys_vec definition from block/blk-mq-dma.c to include/linux/types.h to make it available for use across the kernel. The phys_vec structure represents a physical address range with a length, which is used by the new physical address-based DMA mapping API. This structure is already used by the block layer and will be needed by upcoming VFIO patches for dma-buf operations. Moving this definition to types.h provides a centralized location for this common data structure and eliminates code duplication across subsystems that need to work with physical address ranges. 
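To make the shape of the structure concrete, the following userspace sketch mirrors struct phys_vec and shows one way a consumer might coalesce physically contiguous ranges. The phys_addr_t typedef assumes 64-bit physical addresses, and example_phys_vec_try_merge() is a hypothetical helper, not something this patch adds.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;	/* assumption: 64-bit physical addresses */

/* Same layout as the structure moved to include/linux/types.h. */
struct phys_vec {
	phys_addr_t paddr;
	uint32_t len;
};

/*
 * Hypothetical helper: extend @vec by [paddr, paddr + len) when the new
 * range starts exactly where @vec ends and the 32-bit length would not
 * overflow. Returns true if @vec was extended.
 */
static bool example_phys_vec_try_merge(struct phys_vec *vec,
				       phys_addr_t paddr, uint32_t len)
{
	if (vec->paddr + vec->len != paddr)
		return false;
	if (len > UINT32_MAX - vec->len)
		return false;
	vec->len += len;
	return true;
}
```

Since len is a u32, a single phys_vec cannot describe a range of 4 GiB or more, so callers with larger ranges would need several entries.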
Signed-off-by: Leon Romanovsky --- block/blk-mq-dma.c | 5 ----- include/linux/types.h | 5 +++++ 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/block/blk-mq-dma.c b/block/blk-mq-dma.c index eeac653e3f3bd..b0fa53c353d9d 100644 --- a/block/blk-mq-dma.c +++ b/block/blk-mq-dma.c @@ -5,11 +5,6 @@ #include #include "blk.h" =20 -struct phys_vec { - phys_addr_t paddr; - u32 len; -}; - static bool blk_map_iter_next(struct request *req, struct req_iterator *it= er, struct phys_vec *vec) { diff --git a/include/linux/types.h b/include/linux/types.h index 6dfdb8e8e4c35..2bc56681b2e62 100644 --- a/include/linux/types.h +++ b/include/linux/types.h @@ -170,6 +170,11 @@ typedef u64 phys_addr_t; typedef u32 phys_addr_t; #endif =20 +struct phys_vec { + phys_addr_t paddr; + u32 len; +}; + typedef phys_addr_t resource_size_t; =20 /* --=20 2.50.1 From nobody Mon Oct 6 08:22:08 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 231452E7F11; Wed, 23 Jul 2025 13:02:33 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1753275754; cv=none; b=rRCCg7QXPk+61UxeVWiRl4VyD7KGM+gR3iGaLK9vFVczvwSms+i/wiBKxVsq/JjZExaLhJKlcFVPsrfdINAq1eOu2HgMXBJRKBXU0HAI/TsKFI8BNwqSKURKQZNDQ3ELTMjOJXCVjfKYLLSbOgBmSFlm9K9ae74yNd4e16909kg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1753275754; c=relaxed/simple; bh=BDaDnmGJ1hOXfba5peMH/vLVvGA3352Dpn0pSCnGLcQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=aOtztIcs3C7CKYa4uoPUzEKwL/z/I218QG8kYQPvCI43JT29fBcjkP4ksrGA5jkU/AnDNcBk8oxCXTBoEk2fAROBZvsucvJwp4ap55A9zd0RpcrRbjfv0mm/X6m7EP3S2qxDD90aDedbds4OFRMBJb7d2uF+TVeK7n2mTM9glWo= ARC-Authentication-Results: i=1; 
smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=M9geGo42; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="M9geGo42" Received: by smtp.kernel.org (Postfix) with ESMTPSA id C0BF9C4CEE7; Wed, 23 Jul 2025 13:02:32 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1753275753; bh=BDaDnmGJ1hOXfba5peMH/vLVvGA3352Dpn0pSCnGLcQ=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=M9geGo424bXd6/uG8WL7qxlYMBC2Osxy/EUu6rWhoOMyPRB3P0sBI+IWLSoGSSu2y DHoKgy21QJrxKbiT4TLE5rauohLbNtI1oJbcsqXmrSrrnDtW6q1CSvfjicb0vzsW4q 01nSwQFjX2ienHTrFSxXCw7XuSMUG8nXZd0FQpX9HNWnVxfDKhKwp5NjZ35b67Cdo4 tw9drIN8LE4pYl2eWMyH9QrLB8XG0ZtoYXJmDdICtgYVa1PwtdaJOcXtmOzkkhP3R4 crLQcIZlQjY6BsNh8WPH33PT3gcOF0zU5GS6aQ3pi4Cv+w+fHT6w7TaJQf5iTmpPto 6yR8Blx/5U8hA== From: Leon Romanovsky To: Alex Williamson Cc: Vivek Kasireddy , Christoph Hellwig , Jason Gunthorpe , Andrew Morton , Bjorn Helgaas , =?UTF-8?q?Christian=20K=C3=B6nig?= , dri-devel@lists.freedesktop.org, iommu@lists.linux.dev, Jens Axboe , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Joerg Roedel , kvm@vger.kernel.org, linaro-mm-sig@lists.linaro.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-media@vger.kernel.org, linux-mm@kvack.org, linux-pci@vger.kernel.org, Logan Gunthorpe , Marek Szyprowski , Robin Murphy , Sumit Semwal , Will Deacon Subject: [PATCH 07/10] vfio: Export vfio device get and put registration helpers Date: Wed, 23 Jul 2025 16:00:08 +0300 Message-ID: <045df5fc463bbac4c669413fabd4d22e54b58c87.1753274085.git.leonro@nvidia.com> X-Mailer: git-send-email 2.50.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Vivek Kasireddy 
These helpers are useful for managing additional references taken on the device from other associated VFIO modules. Signed-off-by: Jason Gunthorpe Signed-off-by: Vivek Kasireddy Signed-off-by: Leon Romanovsky --- drivers/vfio/vfio_main.c | 2 ++ include/linux/vfio.h | 2 ++ 2 files changed, 4 insertions(+) diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c index 1fd261efc582d..620a3ee5d04db 100644 --- a/drivers/vfio/vfio_main.c +++ b/drivers/vfio/vfio_main.c @@ -171,11 +171,13 @@ void vfio_device_put_registration(struct vfio_device = *device) if (refcount_dec_and_test(&device->refcount)) complete(&device->comp); } +EXPORT_SYMBOL_GPL(vfio_device_put_registration); =20 bool vfio_device_try_get_registration(struct vfio_device *device) { return refcount_inc_not_zero(&device->refcount); } +EXPORT_SYMBOL_GPL(vfio_device_try_get_registration); =20 /* * VFIO driver API diff --git a/include/linux/vfio.h b/include/linux/vfio.h index 707b00772ce1f..ba65bbdffd0b2 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -293,6 +293,8 @@ static inline void vfio_put_device(struct vfio_device *= device) int vfio_register_group_dev(struct vfio_device *device); int vfio_register_emulated_iommu_dev(struct vfio_device *device); void vfio_unregister_group_dev(struct vfio_device *device); +bool vfio_device_try_get_registration(struct vfio_device *device); +void vfio_device_put_registration(struct vfio_device *device); =20 int vfio_assign_device_set(struct vfio_device *device, void *set_id); unsigned int vfio_device_set_open_count(struct vfio_device_set *dev_set); --=20 2.50.1 From nobody Mon Oct 6 08:22:08 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 12B872EB5AD; Wed, 23 Jul 2025 13:02:24 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none 
smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1753275745; cv=none; b=j1WEPFuPCPbpqBxhgPPDExPDomcekNgODhYmOP0/Xzki6VzXXqvXZO6NXXdfbCckEdPqxqCVvKfFggBlQehpxvztqMMWkPjXES2huMgv22PKq0YO4wBwM4uif2leyfx9Tm51/+egO8NMVFWB6oJkz9jGOSiiUUZPoCLLXShQEzM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1753275745; c=relaxed/simple; bh=6nRA10ywpK7SSRO/jFMyVT/cEz8hEByS2jaY5xZiXc8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=RIORRtyR6agCy8LYRDXIzmDJYuNC/w3SQkxqjNmFRnx+QC5mYbkHlKgIch1BmHzHEzdaBAsxG0K7k8C7XBr8PeMKPFqSC1JwfRMLuf/65y0PA8NlTznYaf3zpVLRZcmAAgOVaPEd1Tq5LBoWOMOKwTUGitDvFIivgC8IqlOhGiE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=to57+Gtb; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="to57+Gtb" Received: by smtp.kernel.org (Postfix) with ESMTPSA id EC426C4CEE7; Wed, 23 Jul 2025 13:02:23 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1753275744; bh=6nRA10ywpK7SSRO/jFMyVT/cEz8hEByS2jaY5xZiXc8=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=to57+Gtb/kqx61u9wLvouYUY2r3rfu/wxYrNo5FftZKZzUMaadwbeJO7Fvzil43/3 AWa8IK+h1CT0qK5BjiHdWXKAbRXPY8yBWh2yelV64TCR8e7XICKlREhYh5x5HhCzTD yv7uitdjloEM0UjQRFtyMTUY2BnLpDGa+mOPdgvSC8gki/HU0WdgMJN+OjLqJWiNa1 dfLitFrC/BRt3xcJ+C0G7LPmMwJbLTQUGj/ysphAitQuFV/pKbp8GK9fi3A47SLxl6 perEVWMTCR66ZhAhr2ZD35eqO0aHF3fNBcZl74gC3Yw/qyyElwragyVKfbzDkkcr6N yZWP5Z2A8oaMQ== From: Leon Romanovsky To: Alex Williamson Cc: Leon Romanovsky , Christoph Hellwig , Jason Gunthorpe , Andrew Morton , Bjorn Helgaas , =?UTF-8?q?Christian=20K=C3=B6nig?= , dri-devel@lists.freedesktop.org, iommu@lists.linux.dev, Jens Axboe , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Joerg Roedel , kvm@vger.kernel.org, 
linaro-mm-sig@lists.linaro.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-media@vger.kernel.org, linux-mm@kvack.org, linux-pci@vger.kernel.org, Logan Gunthorpe , Marek Szyprowski , Robin Murphy , Sumit Semwal , Vivek Kasireddy , Will Deacon Subject: [PATCH 08/10] vfio/pci: Enable peer-to-peer DMA transactions by default Date: Wed, 23 Jul 2025 16:00:09 +0300 Message-ID: X-Mailer: git-send-email 2.50.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Leon Romanovsky Make sure that all VFIO PCI devices have peer-to-peer capabilities enabled, so that their MMIO memory can be exported through DMABUF. Signed-off-by: Leon Romanovsky --- drivers/vfio/pci/vfio_pci_core.c | 4 ++++ include/linux/vfio_pci_core.h | 1 + 2 files changed, 5 insertions(+) diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_c= ore.c index 6328c3a05bcdd..1e675daab5753 100644 --- a/drivers/vfio/pci/vfio_pci_core.c +++ b/drivers/vfio/pci/vfio_pci_core.c @@ -29,6 +29,7 @@ #include #include #include +#include #if IS_ENABLED(CONFIG_EEH) #include #endif @@ -2091,6 +2092,9 @@ int vfio_pci_core_init_dev(struct vfio_device *core_v= dev) INIT_LIST_HEAD(&vdev->dummy_resources_list); INIT_LIST_HEAD(&vdev->ioeventfds_list); INIT_LIST_HEAD(&vdev->sriov_pfs_item); + vdev->provider =3D pci_p2pdma_enable(vdev->pdev); + if (IS_ERR(vdev->provider)) + return PTR_ERR(vdev->provider); init_rwsem(&vdev->memory_lock); xa_init(&vdev->ctx); =20 diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h index fbb472dd99b36..b017fae251811 100644 --- a/include/linux/vfio_pci_core.h +++ b/include/linux/vfio_pci_core.h @@ -94,6 +94,7 @@ struct vfio_pci_core_device { struct vfio_pci_core_device *sriov_pf_core_dev; struct notifier_block nb; struct rw_semaphore memory_lock; +
struct p2pdma_provider *provider; }; =20 /* Will be exported for vfio pci drivers usage */ --=20 2.50.1 From nobody Mon Oct 6 08:22:08 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BC1E4221736; Wed, 23 Jul 2025 13:02:29 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1753275749; cv=none; b=qvKjPmT+NK7nLvBD6surHKvGtuzwJH8JOFGRBqT+nstXEeiOvYs3ixRf8I6T7h1/Ivxf1lBqtDVKP1CZSoex1ZgZX6zc7sdoH0uQ6Zn/ZInxJUQXw+ju33VjUa6YjJMqwpRN2zkYIBJ1veh/xXW3mlHFls2TMZt49H17g2fWZTg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1753275749; c=relaxed/simple; bh=7Kcn9ETUYUd7AsaGO4xG8HMvQRFSS/0rTVnJwHyaiSw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=TbjjY1FM/dpVfhZo8PwoygC33jvhrUPmjYFfngGD/JB+drnIShNNPllN7iscKllzRdIdMU9IaLftG331rwGidrHl0S3tCtqlv0BkYsWeMoQ6hClH88fLAnG7rxW/akA9l+kWUXm4Nhx0w4ArzRFxbQrvcl/VFgZeW/HONcvv1KQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=ORtsZg0P; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="ORtsZg0P" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 9858EC4CEE7; Wed, 23 Jul 2025 13:02:28 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1753275749; bh=7Kcn9ETUYUd7AsaGO4xG8HMvQRFSS/0rTVnJwHyaiSw=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=ORtsZg0Pr0EAkwLD8inVYan9Yg3EV9m8dNuCQ9Ux5gvKaOk5yZ6TjxaLF6k6sc0vG YMBPpdNF8O0Ok8SDlkDBaz9S2q+nXrhcV/rnHwNZ1OHWD9ijib5veFrX3/OrVL83Ok 
GRO1DlD2baQG8S60057KIoBaj1Kf3s06ZAIJKc93i+oCl+L1Z3prFNTNLNw8TfCAOv 6D1qkdw8RkO3tXPvgIdL8vnxk5O76IvcDBmcOCsncwrfQZMZqYa9y5IBKvJNrfRrnw HoX26giRnsA3Fi0l93YwXsppFVJesaZamwYT9P1qdg8A8iTpLcGFykRtCddZAfq7aU ss+vvycNBLTvA== From: Leon Romanovsky To: Alex Williamson Cc: Vivek Kasireddy , Christoph Hellwig , Jason Gunthorpe , Andrew Morton , Bjorn Helgaas , =?UTF-8?q?Christian=20K=C3=B6nig?= , dri-devel@lists.freedesktop.org, iommu@lists.linux.dev, Jens Axboe , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Joerg Roedel , kvm@vger.kernel.org, linaro-mm-sig@lists.linaro.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-media@vger.kernel.org, linux-mm@kvack.org, linux-pci@vger.kernel.org, Logan Gunthorpe , Marek Szyprowski , Robin Murphy , Sumit Semwal , Will Deacon Subject: [PATCH 09/10] vfio/pci: Share the core device pointer while invoking feature functions Date: Wed, 23 Jul 2025 16:00:10 +0300 Message-ID: <19f71a0f4d1a5db8c712cb4d094ccf2f10dc22c5.1753274085.git.leonro@nvidia.com> X-Mailer: git-send-email 2.50.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Vivek Kasireddy There is no need to share the main device pointer (struct vfio_device *) with all the feature functions as they only need the core device pointer. Therefore, extract the core device pointer once in the caller (vfio_pci_core_ioctl_feature) and share it instead. 
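The pattern described above can be sketched in userspace as follows. The two structures are heavily simplified stand-ins for struct vfio_device and struct vfio_pci_core_device (the fields id and bar_count are invented for illustration), container_of() is a plain offsetof()-based version rather than the kernel macro, and both example_*() functions are hypothetical.

```c
#include <stddef.h>

/* Simplified userspace container_of(); the kernel macro adds type checks. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-ins for the real VFIO structures; fields are invented. */
struct vfio_device {
	int id;
};

struct vfio_pci_core_device {
	int bar_count;
	struct vfio_device vdev;	/* embedded, as in the real structure */
};

/* Feature handlers take the core device directly. */
static int example_feature(struct vfio_pci_core_device *vdev)
{
	return vdev->bar_count;
}

/* The dispatcher converts once and hands the core pointer to each handler. */
static int example_ioctl_feature(struct vfio_device *device)
{
	struct vfio_pci_core_device *vdev =
		container_of(device, struct vfio_pci_core_device, vdev);

	return example_feature(vdev);
}
```

Doing the conversion once in the dispatcher removes the repeated container_of() boilerplate from every handler, which is where most of the deleted lines in the diffstat come from.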
Signed-off-by: Vivek Kasireddy Signed-off-by: Leon Romanovsky --- drivers/vfio/pci/vfio_pci_core.c | 30 +++++++++++++----------------- 1 file changed, 13 insertions(+), 17 deletions(-) diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_c= ore.c index 1e675daab5753..5512d13bb8899 100644 --- a/drivers/vfio/pci/vfio_pci_core.c +++ b/drivers/vfio/pci/vfio_pci_core.c @@ -301,11 +301,9 @@ static int vfio_pci_runtime_pm_entry(struct vfio_pci_c= ore_device *vdev, return 0; } =20 -static int vfio_pci_core_pm_entry(struct vfio_device *device, u32 flags, +static int vfio_pci_core_pm_entry(struct vfio_pci_core_device *vdev, u32 f= lags, void __user *arg, size_t argsz) { - struct vfio_pci_core_device *vdev =3D - container_of(device, struct vfio_pci_core_device, vdev); int ret; =20 ret =3D vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_SET, 0); @@ -322,12 +320,10 @@ static int vfio_pci_core_pm_entry(struct vfio_device = *device, u32 flags, } =20 static int vfio_pci_core_pm_entry_with_wakeup( - struct vfio_device *device, u32 flags, + struct vfio_pci_core_device *vdev, u32 flags, struct vfio_device_low_power_entry_with_wakeup __user *arg, size_t argsz) { - struct vfio_pci_core_device *vdev =3D - container_of(device, struct vfio_pci_core_device, vdev); struct vfio_device_low_power_entry_with_wakeup entry; struct eventfd_ctx *efdctx; int ret; @@ -378,11 +374,9 @@ static void vfio_pci_runtime_pm_exit(struct vfio_pci_c= ore_device *vdev) up_write(&vdev->memory_lock); } =20 -static int vfio_pci_core_pm_exit(struct vfio_device *device, u32 flags, +static int vfio_pci_core_pm_exit(struct vfio_pci_core_device *vdev, u32 fl= ags, void __user *arg, size_t argsz) { - struct vfio_pci_core_device *vdev =3D - container_of(device, struct vfio_pci_core_device, vdev); int ret; =20 ret =3D vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_SET, 0); @@ -1475,11 +1469,10 @@ long vfio_pci_core_ioctl(struct vfio_device *core_v= dev, unsigned int cmd, } 
EXPORT_SYMBOL_GPL(vfio_pci_core_ioctl); =20 -static int vfio_pci_core_feature_token(struct vfio_device *device, u32 fla= gs, - uuid_t __user *arg, size_t argsz) +static int vfio_pci_core_feature_token(struct vfio_pci_core_device *vdev, + u32 flags, uuid_t __user *arg, + size_t argsz) { - struct vfio_pci_core_device *vdev =3D - container_of(device, struct vfio_pci_core_device, vdev); uuid_t uuid; int ret; =20 @@ -1506,16 +1499,19 @@ static int vfio_pci_core_feature_token(struct vfio_= device *device, u32 flags, int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags, void __user *arg, size_t argsz) { + struct vfio_pci_core_device *vdev =3D + container_of(device, struct vfio_pci_core_device, vdev); + switch (flags & VFIO_DEVICE_FEATURE_MASK) { case VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY: - return vfio_pci_core_pm_entry(device, flags, arg, argsz); + return vfio_pci_core_pm_entry(vdev, flags, arg, argsz); case VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP: - return vfio_pci_core_pm_entry_with_wakeup(device, flags, + return vfio_pci_core_pm_entry_with_wakeup(vdev, flags, arg, argsz); case VFIO_DEVICE_FEATURE_LOW_POWER_EXIT: - return vfio_pci_core_pm_exit(device, flags, arg, argsz); + return vfio_pci_core_pm_exit(vdev, flags, arg, argsz); case VFIO_DEVICE_FEATURE_PCI_VF_TOKEN: - return vfio_pci_core_feature_token(device, flags, arg, argsz); + return vfio_pci_core_feature_token(vdev, flags, arg, argsz); default: return -ENOTTY; } --=20 2.50.1 From nobody Mon Oct 6 08:22:08 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4B6432E7F1E; Wed, 23 Jul 2025 13:02:38 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1753275758; cv=none; 
b=Hh1FbfT9r86g9q8AumhJvXdk2/AEzT4IlpPd9PBRFaRwL3oUgX5IVBBtnzndQ1uSTxDU51pYcc2PVonqkcJH7TaNvZJ+k9+ZVWZ4odfGWX6vItT+yz2IyV506bkphSuwYsvX9kQ9CCqIHx41zAyF7AFVW6UhOIuZOudv4FcGym8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1753275758; c=relaxed/simple; bh=WQkbGmYGpObUrlKamk1PamdKVn756geIiOx0Z29EgM8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=kgqofOQfjJVm4m69E/jnEvorQY0Mgo4ExYmM0WhbOr2ZiPy+mcMleNFXQJVFwhZZSEH3cpKDq7N6xc8itTYPs9w2OpQIpp1or4Xux4nm+GX2t7ar8Ba8dSfv5IEwc5lFLZ8vcFt6iWtLcUFAW33yBLUIuqN1a7XpyWUzM9f1jOY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=pe9W9Oa4; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="pe9W9Oa4" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 3F958C4CEE7; Wed, 23 Jul 2025 13:02:37 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1753275758; bh=WQkbGmYGpObUrlKamk1PamdKVn756geIiOx0Z29EgM8=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=pe9W9Oa4XbwBP6NPf2zE9sEK+qqPjus9XvPDU5y37o3WUekhj2mrUOhuj51dyWFsa PF9zYms+k72DhSdBFl4NM3EAo/y8PstbjFK62j5Kqlz5CdfP552nmZjQJw+HYyGqh1 CvAfgeROUL6jo/QoOSFYtt9s3fZA0oaYpcW0JiFGsiofprAvwRtVOvc/H0vIia7sYr v5+2zfVf6jiDXk+OjFFVzfxSW2Zmcv/NlZJhsrUUzxPetSN1M10IGX/Fh3OCIy6EAI ZJQXuhMnka4o7qlUAZwM0sd3/D2UFErS5nyo47//TrMKEzb0h32wNRQrRx1dk6wq+w 3k3oUq3WVOCxQ== From: Leon Romanovsky To: Alex Williamson Cc: Leon Romanovsky , Christoph Hellwig , Jason Gunthorpe , Andrew Morton , Bjorn Helgaas , =?UTF-8?q?Christian=20K=C3=B6nig?= , dri-devel@lists.freedesktop.org, iommu@lists.linux.dev, Jens Axboe , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Joerg Roedel , kvm@vger.kernel.org, linaro-mm-sig@lists.linaro.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-media@vger.kernel.org, 
linux-mm@kvack.org, linux-pci@vger.kernel.org, Logan Gunthorpe , Marek Szyprowski , Robin Murphy , Sumit Semwal , Vivek Kasireddy , Will Deacon Subject: [PATCH 10/10] vfio/pci: Add dma-buf export support for MMIO regions Date: Wed, 23 Jul 2025 16:00:11 +0300 Message-ID: X-Mailer: git-send-email 2.50.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Leon Romanovsky Add support for exporting PCI device MMIO regions through dma-buf, enabling safe sharing of non-struct page memory with controlled lifetime management. This allows RDMA and other subsystems to import dma-buf FDs and build them into memory regions for PCI P2P operations. The implementation provides a revocable attachment mechanism using dma-buf move operations. MMIO regions are normally pinned as BARs don't change physical addresses, but access is revoked when the VFIO device is closed or a PCI reset is issued. This ensures kernel self-defense against potentially hostile userspace. Signed-off-by: Jason Gunthorpe Signed-off-by: Vivek Kasireddy Signed-off-by: Leon Romanovsky --- drivers/vfio/pci/Kconfig | 20 ++ drivers/vfio/pci/Makefile | 2 + drivers/vfio/pci/vfio_pci_config.c | 22 +- drivers/vfio/pci/vfio_pci_core.c | 25 ++- drivers/vfio/pci/vfio_pci_dmabuf.c | 321 +++++++++++++++++++++++++++++ drivers/vfio/pci/vfio_pci_priv.h | 23 +++ include/linux/dma-buf.h | 1 + include/linux/vfio_pci_core.h | 3 + include/uapi/linux/vfio.h | 19 ++ 9 files changed, 431 insertions(+), 5 deletions(-) create mode 100644 drivers/vfio/pci/vfio_pci_dmabuf.c diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig index 2b0172f546652..55ae888bf26ae 100644 --- a/drivers/vfio/pci/Kconfig +++ b/drivers/vfio/pci/Kconfig @@ -55,6 +55,26 @@ config VFIO_PCI_ZDEV_KVM =20 To enable s390x KVM vfio-pci extensions, say Y. 
=20 +config VFIO_PCI_DMABUF + bool "VFIO PCI extensions for DMA-BUF" + depends on VFIO_PCI_CORE + depends on PCI_P2PDMA && DMA_SHARED_BUFFER + default y + help + Enable support for VFIO PCI extensions that allow exporting + device MMIO regions as DMA-BUFs for peer devices to access via + peer-to-peer (P2P) DMA. + + This feature enables a VFIO-managed PCI device to export a portion + of its MMIO BAR as a DMA-BUF file descriptor, which can be passed + to other userspace drivers or kernel subsystems capable of + initiating DMA to that region. + + Say Y here if you want to enable VFIO DMABUF-based MMIO export + support for peer-to-peer DMA use cases. + + If unsure, say N. + source "drivers/vfio/pci/mlx5/Kconfig" =20 source "drivers/vfio/pci/hisilicon/Kconfig" diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile index cf00c0a7e55c8..f9155e9c5f630 100644 --- a/drivers/vfio/pci/Makefile +++ b/drivers/vfio/pci/Makefile @@ -2,7 +2,9 @@ =20 vfio-pci-core-y :=3D vfio_pci_core.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio= _pci_config.o vfio-pci-core-$(CONFIG_VFIO_PCI_ZDEV_KVM) +=3D vfio_pci_zdev.o + obj-$(CONFIG_VFIO_PCI_CORE) +=3D vfio-pci-core.o +vfio-pci-core-$(CONFIG_VFIO_PCI_DMABUF) +=3D vfio_pci_dmabuf.o =20 vfio-pci-y :=3D vfio_pci.o vfio-pci-$(CONFIG_VFIO_PCI_IGD) +=3D vfio_pci_igd.o diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci= _config.c index 8f02f236b5b4b..7e23387a43b4d 100644 --- a/drivers/vfio/pci/vfio_pci_config.c +++ b/drivers/vfio/pci/vfio_pci_config.c @@ -589,10 +589,12 @@ static int vfio_basic_config_write(struct vfio_pci_co= re_device *vdev, int pos, virt_mem =3D !!(le16_to_cpu(*virt_cmd) & PCI_COMMAND_MEMORY); new_mem =3D !!(new_cmd & PCI_COMMAND_MEMORY); =20 - if (!new_mem) + if (!new_mem) { vfio_pci_zap_and_down_write_memory_lock(vdev); - else + vfio_pci_dma_buf_move(vdev, true); + } else { down_write(&vdev->memory_lock); + } =20 /* * If the user is writing mem/io enable (new_mem/io) and we @@ -627,6 +629,8 @@ 
static int vfio_basic_config_write(struct vfio_pci_core= _device *vdev, int pos, *virt_cmd &=3D cpu_to_le16(~mask); *virt_cmd |=3D cpu_to_le16(new_cmd & mask); =20 + if (__vfio_pci_memory_enabled(vdev)) + vfio_pci_dma_buf_move(vdev, false); up_write(&vdev->memory_lock); } =20 @@ -707,12 +711,16 @@ static int __init init_pci_cap_basic_perm(struct perm= _bits *perm) static void vfio_lock_and_set_power_state(struct vfio_pci_core_device *vde= v, pci_power_t state) { - if (state >=3D PCI_D3hot) + if (state >=3D PCI_D3hot) { vfio_pci_zap_and_down_write_memory_lock(vdev); - else + vfio_pci_dma_buf_move(vdev, true); + } else { down_write(&vdev->memory_lock); + } =20 vfio_pci_set_power_state(vdev, state); + if (__vfio_pci_memory_enabled(vdev)) + vfio_pci_dma_buf_move(vdev, false); up_write(&vdev->memory_lock); } =20 @@ -900,7 +908,10 @@ static int vfio_exp_config_write(struct vfio_pci_core_= device *vdev, int pos, =20 if (!ret && (cap & PCI_EXP_DEVCAP_FLR)) { vfio_pci_zap_and_down_write_memory_lock(vdev); + vfio_pci_dma_buf_move(vdev, true); pci_try_reset_function(vdev->pdev); + if (__vfio_pci_memory_enabled(vdev)) + vfio_pci_dma_buf_move(vdev, false); up_write(&vdev->memory_lock); } } @@ -982,7 +993,10 @@ static int vfio_af_config_write(struct vfio_pci_core_d= evice *vdev, int pos, =20 if (!ret && (cap & PCI_AF_CAP_FLR) && (cap & PCI_AF_CAP_TP)) { vfio_pci_zap_and_down_write_memory_lock(vdev); + vfio_pci_dma_buf_move(vdev, true); pci_try_reset_function(vdev->pdev); + if (__vfio_pci_memory_enabled(vdev)) + vfio_pci_dma_buf_move(vdev, false); up_write(&vdev->memory_lock); } } diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_c= ore.c index 5512d13bb8899..e5ab5d1cafd9c 100644 --- a/drivers/vfio/pci/vfio_pci_core.c +++ b/drivers/vfio/pci/vfio_pci_core.c @@ -29,7 +29,9 @@ #include #include #include +#ifdef CONFIG_VFIO_PCI_DMABUF #include +#endif #if IS_ENABLED(CONFIG_EEH) #include #endif @@ -288,6 +290,8 @@ static int vfio_pci_runtime_pm_entry(struct 
vfio_pci_co= re_device *vdev, * semaphore. */ vfio_pci_zap_and_down_write_memory_lock(vdev); + vfio_pci_dma_buf_move(vdev, true); + if (vdev->pm_runtime_engaged) { up_write(&vdev->memory_lock); return -EINVAL; @@ -371,6 +375,8 @@ static void vfio_pci_runtime_pm_exit(struct vfio_pci_co= re_device *vdev) */ down_write(&vdev->memory_lock); __vfio_pci_runtime_pm_exit(vdev); + if (__vfio_pci_memory_enabled(vdev)) + vfio_pci_dma_buf_move(vdev, false); up_write(&vdev->memory_lock); } =20 @@ -691,6 +697,8 @@ void vfio_pci_core_close_device(struct vfio_device *cor= e_vdev) #endif vfio_pci_core_disable(vdev); =20 + vfio_pci_dma_buf_cleanup(vdev); + mutex_lock(&vdev->igate); if (vdev->err_trigger) { eventfd_ctx_put(vdev->err_trigger); @@ -1223,7 +1231,10 @@ static int vfio_pci_ioctl_reset(struct vfio_pci_core= _device *vdev, */ vfio_pci_set_power_state(vdev, PCI_D0); =20 + vfio_pci_dma_buf_move(vdev, true); ret =3D pci_try_reset_function(vdev->pdev); + if (__vfio_pci_memory_enabled(vdev)) + vfio_pci_dma_buf_move(vdev, false); up_write(&vdev->memory_lock); =20 return ret; @@ -1512,6 +1523,8 @@ int vfio_pci_core_ioctl_feature(struct vfio_device *d= evice, u32 flags, return vfio_pci_core_pm_exit(vdev, flags, arg, argsz); case VFIO_DEVICE_FEATURE_PCI_VF_TOKEN: return vfio_pci_core_feature_token(vdev, flags, arg, argsz); + case VFIO_DEVICE_FEATURE_DMA_BUF: + return vfio_pci_core_feature_dma_buf(vdev, flags, arg, argsz); default: return -ENOTTY; } @@ -2088,9 +2101,13 @@ int vfio_pci_core_init_dev(struct vfio_device *core_= vdev) INIT_LIST_HEAD(&vdev->dummy_resources_list); INIT_LIST_HEAD(&vdev->ioeventfds_list); INIT_LIST_HEAD(&vdev->sriov_pfs_item); +#ifdef CONFIG_VFIO_PCI_DMABUF vdev->provider =3D pci_p2pdma_enable(vdev->pdev); if (IS_ERR(vdev->provider)) return PTR_ERR(vdev->provider); + + INIT_LIST_HEAD(&vdev->dmabufs); +#endif init_rwsem(&vdev->memory_lock); xa_init(&vdev->ctx); =20 @@ -2473,11 +2490,17 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_d= evice_set 
*dev_set, * cause the PCI config space reset without restoring the original * state (saved locally in 'vdev->pm_save'). */ - list_for_each_entry(vdev, &dev_set->device_list, vdev.dev_set_list) + list_for_each_entry(vdev, &dev_set->device_list, vdev.dev_set_list) { + vfio_pci_dma_buf_move(vdev, true); vfio_pci_set_power_state(vdev, PCI_D0); + } =20 ret =3D pci_reset_bus(pdev); =20 + list_for_each_entry(vdev, &dev_set->device_list, vdev.dev_set_list) + if (__vfio_pci_memory_enabled(vdev)) + vfio_pci_dma_buf_move(vdev, false); + vdev =3D list_last_entry(&dev_set->device_list, struct vfio_pci_core_device, vdev.dev_set_list); =20 diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci= _dmabuf.c new file mode 100644 index 0000000000000..5fefcdecd1329 --- /dev/null +++ b/drivers/vfio/pci/vfio_pci_dmabuf.c @@ -0,0 +1,321 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. + */ +#include +#include +#include + +#include "vfio_pci_priv.h" + +MODULE_IMPORT_NS("DMA_BUF"); + +struct vfio_pci_dma_buf { + struct dma_buf *dmabuf; + struct vfio_pci_core_device *vdev; + struct list_head dmabufs_elm; + struct phys_vec phys_vec; + u8 revoked : 1; +}; + +static int vfio_pci_dma_buf_attach(struct dma_buf *dmabuf, + struct dma_buf_attachment *attachment) +{ + struct vfio_pci_dma_buf *priv =3D dmabuf->priv; + + if (!attachment->peer2peer) + return -EOPNOTSUPP; + + if (priv->revoked) + return -ENODEV; + + switch (pci_p2pdma_map_type(priv->vdev->provider, attachment->dev)) { + case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE: + break; + case PCI_P2PDMA_MAP_BUS_ADDR: + /* + * There is no need in IOVA at all for this flow. + * We rely on attachment->priv =3D=3D NULL as a marker + * for this mode. 
+ */ + return 0; + default: + return -EINVAL; + } + + attachment->priv =3D kzalloc(sizeof(struct dma_iova_state), GFP_KERNEL); + if (!attachment->priv) + return -ENOMEM; + + dma_iova_try_alloc(attachment->dev, attachment->priv, 0, priv->phys_vec.l= en); + return 0; +} + +static void vfio_pci_dma_buf_detach(struct dma_buf *dmabuf, + struct dma_buf_attachment *attachment) +{ + kfree(attachment->priv); +} + +static void fill_sg_entry(struct scatterlist *sgl, unsigned int length, + dma_addr_t addr) +{ + sg_set_page(sgl, NULL, length, 0); + sg_dma_address(sgl) =3D addr; + sg_dma_len(sgl) =3D length; +} + +static struct sg_table * +vfio_pci_dma_buf_map(struct dma_buf_attachment *attachment, + enum dma_data_direction dir) +{ + struct vfio_pci_dma_buf *priv =3D attachment->dmabuf->priv; + struct p2pdma_provider *provider =3D priv->vdev->provider; + struct dma_iova_state *state =3D attachment->priv; + struct phys_vec *phys_vec =3D &priv->phys_vec; + struct scatterlist *sgl; + struct sg_table *sgt; + dma_addr_t addr; + int ret; + + dma_resv_assert_held(priv->dmabuf->resv); + + sgt =3D kzalloc(sizeof(*sgt), GFP_KERNEL); + if (!sgt) + return ERR_PTR(-ENOMEM); + + ret =3D sg_alloc_table(sgt, 1, GFP_KERNEL | __GFP_ZERO); + if (ret) + goto err_kfree_sgt; + + sgl =3D sgt->sgl; + + if (!state) { + addr =3D pci_p2pdma_bus_addr_map(provider, phys_vec->paddr); + } else if (dma_use_iova(state)) { + ret =3D dma_iova_link(attachment->dev, state, phys_vec->paddr, 0, + phys_vec->len, dir, DMA_ATTR_SKIP_CPU_SYNC); + if (ret) + goto err_free_table; + + ret =3D dma_iova_sync(attachment->dev, state, 0, phys_vec->len); + if (ret) + goto err_unmap_dma; + + addr =3D state->addr; + } else { + addr =3D dma_map_phys(attachment->dev, phys_vec->paddr, + phys_vec->len, dir, DMA_ATTR_SKIP_CPU_SYNC); + ret =3D dma_mapping_error(attachment->dev, addr); + if (ret) + goto err_free_table; + } + + fill_sg_entry(sgl, phys_vec->len, addr); + return sgt; + +err_unmap_dma: + dma_iova_destroy(attachment->dev, 
state, phys_vec->len, dir, + DMA_ATTR_SKIP_CPU_SYNC); +err_free_table: + sg_free_table(sgt); +err_kfree_sgt: + kfree(sgt); + return ERR_PTR(ret); +} + +static void vfio_pci_dma_buf_unmap(struct dma_buf_attachment *attachment, + struct sg_table *sgt, + enum dma_data_direction dir) +{ + struct vfio_pci_dma_buf *priv =3D attachment->dmabuf->priv; + struct dma_iova_state *state =3D attachment->priv; + struct scatterlist *sgl; + int i; + + if (!state) + ; /* Do nothing */ + else if (dma_use_iova(state)) + dma_iova_destroy(attachment->dev, state, priv->phys_vec.len, + dir, DMA_ATTR_SKIP_CPU_SYNC); + else + for_each_sgtable_dma_sg(sgt, sgl, i) + dma_unmap_phys(attachment->dev, sg_dma_address(sgl), + sg_dma_len(sgl), dir, + DMA_ATTR_SKIP_CPU_SYNC); + + sg_free_table(sgt); + kfree(sgt); +} + +static void vfio_pci_dma_buf_release(struct dma_buf *dmabuf) +{ + struct vfio_pci_dma_buf *priv =3D dmabuf->priv; + + /* + * Either this or vfio_pci_dma_buf_cleanup() will remove from the list. + * The refcount prevents both. 
+ */ + if (priv->vdev) { + down_write(&priv->vdev->memory_lock); + list_del_init(&priv->dmabufs_elm); + up_write(&priv->vdev->memory_lock); + vfio_device_put_registration(&priv->vdev->vdev); + } + kfree(priv); +} + +static const struct dma_buf_ops vfio_pci_dmabuf_ops =3D { + .attach =3D vfio_pci_dma_buf_attach, + .detach =3D vfio_pci_dma_buf_detach, + .map_dma_buf =3D vfio_pci_dma_buf_map, + .release =3D vfio_pci_dma_buf_release, + .unmap_dma_buf =3D vfio_pci_dma_buf_unmap, +}; + +static void dma_ranges_to_p2p_phys(struct vfio_pci_dma_buf *priv, + struct vfio_device_feature_dma_buf *dma_buf) +{ + struct pci_dev *pdev =3D priv->vdev->pdev; + + priv->phys_vec.len =3D dma_buf->length; + priv->phys_vec.paddr =3D pci_resource_start(pdev, dma_buf->region_index); + priv->phys_vec.paddr +=3D dma_buf->offset; +} + +static int validate_dmabuf_input(struct vfio_pci_core_device *vdev, + struct vfio_device_feature_dma_buf *dma_buf) +{ + struct pci_dev *pdev =3D vdev->pdev; + u32 bar =3D dma_buf->region_index; + u64 offset =3D dma_buf->offset; + u64 len =3D dma_buf->length; + resource_size_t bar_size; + u64 sum; + + /* + * For PCI the region_index is the BAR number like everything else. 
+ */ + if (bar >=3D VFIO_PCI_ROM_REGION_INDEX) + return -ENODEV; + + if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM)) + return -EINVAL; + + if (!PAGE_ALIGNED(offset) || !PAGE_ALIGNED(len)) + return -EINVAL; + + bar_size =3D pci_resource_len(pdev, bar); + if (check_add_overflow(offset, len, &sum) || sum > bar_size) + return -EINVAL; + + return 0; +} + +int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 f= lags, + struct vfio_device_feature_dma_buf __user *arg, + size_t argsz) +{ + struct vfio_device_feature_dma_buf get_dma_buf =3D {}; + DEFINE_DMA_BUF_EXPORT_INFO(exp_info); + struct vfio_pci_dma_buf *priv; + int ret; + + ret =3D vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_GET, + sizeof(get_dma_buf)); + if (ret !=3D 1) + return ret; + + if (copy_from_user(&get_dma_buf, arg, sizeof(get_dma_buf))) + return -EFAULT; + + ret =3D validate_dmabuf_input(vdev, &get_dma_buf); + if (ret) + return ret; + + priv =3D kzalloc(sizeof(*priv), GFP_KERNEL); + if (!priv) + return -ENOMEM; + + priv->vdev =3D vdev; + dma_ranges_to_p2p_phys(priv, &get_dma_buf); + + if (!vfio_device_try_get_registration(&vdev->vdev)) { + ret =3D -ENODEV; + goto err_free_priv; + } + + exp_info.ops =3D &vfio_pci_dmabuf_ops; + exp_info.size =3D priv->phys_vec.len; + exp_info.flags =3D get_dma_buf.open_flags; + exp_info.priv =3D priv; + + priv->dmabuf =3D dma_buf_export(&exp_info); + if (IS_ERR(priv->dmabuf)) { + ret =3D PTR_ERR(priv->dmabuf); + goto err_dev_put; + } + + /* dma_buf_put() now frees priv */ + INIT_LIST_HEAD(&priv->dmabufs_elm); + down_write(&vdev->memory_lock); + dma_resv_lock(priv->dmabuf->resv, NULL); + priv->revoked =3D !__vfio_pci_memory_enabled(vdev); + list_add_tail(&priv->dmabufs_elm, &vdev->dmabufs); + dma_resv_unlock(priv->dmabuf->resv); + up_write(&vdev->memory_lock); + + /* + * dma_buf_fd() consumes the reference, when the file closes the dmabuf + * will be released. 
+ */ + return dma_buf_fd(priv->dmabuf, get_dma_buf.open_flags); + +err_dev_put: + vfio_device_put_registration(&vdev->vdev); +err_free_priv: + kfree(priv); + return ret; +} + +void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked) +{ + struct vfio_pci_dma_buf *priv; + struct vfio_pci_dma_buf *tmp; + + lockdep_assert_held_write(&vdev->memory_lock); + + list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) { + if (!get_file_active(&priv->dmabuf->file)) + continue; + + if (priv->revoked !=3D revoked) { + dma_resv_lock(priv->dmabuf->resv, NULL); + priv->revoked =3D revoked; + dma_buf_move_notify(priv->dmabuf); + dma_resv_unlock(priv->dmabuf->resv); + } + dma_buf_put(priv->dmabuf); + } +} + +void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev) +{ + struct vfio_pci_dma_buf *priv; + struct vfio_pci_dma_buf *tmp; + + down_write(&vdev->memory_lock); + list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) { + if (!get_file_active(&priv->dmabuf->file)) + continue; + + dma_resv_lock(priv->dmabuf->resv, NULL); + list_del_init(&priv->dmabufs_elm); + priv->vdev =3D NULL; + priv->revoked =3D true; + dma_buf_move_notify(priv->dmabuf); + dma_resv_unlock(priv->dmabuf->resv); + vfio_device_put_registration(&vdev->vdev); + dma_buf_put(priv->dmabuf); + } + up_write(&vdev->memory_lock); +} diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_p= riv.h index a9972eacb2936..28a405f8b97c9 100644 --- a/drivers/vfio/pci/vfio_pci_priv.h +++ b/drivers/vfio/pci/vfio_pci_priv.h @@ -107,4 +107,27 @@ static inline bool vfio_pci_is_vga(struct pci_dev *pde= v) return (pdev->class >> 8) =3D=3D PCI_CLASS_DISPLAY_VGA; } =20 +#ifdef CONFIG_VFIO_PCI_DMABUF +int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 f= lags, + struct vfio_device_feature_dma_buf __user *arg, + size_t argsz); +void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev); +void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, 
bool revoked= ); +#else +static inline int +vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags, + struct vfio_device_feature_dma_buf __user *arg, + size_t argsz) +{ + return -ENOTTY; +} +static inline void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *v= dev) +{ +} +static inline void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, + bool revoked) +{ +} +#endif + #endif diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h index d58e329ac0e71..f14b413aae48d 100644 --- a/include/linux/dma-buf.h +++ b/include/linux/dma-buf.h @@ -483,6 +483,7 @@ struct dma_buf_attach_ops { * @dev: device attached to the buffer. * @node: list of dma_buf_attachment, protected by dma_resv lock of the dm= abuf. * @peer2peer: true if the importer can handle peer resources without page= s. + * @state: DMA structure to provide support for physical addresses DMA int= erface * @priv: exporter specific attachment data. * @importer_ops: importer operations for this attachment, if provided * dma_buf_map/unmap_attachment() must be called with the dma_resv lock he= ld. diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h index b017fae251811..548cbb51bf146 100644 --- a/include/linux/vfio_pci_core.h +++ b/include/linux/vfio_pci_core.h @@ -94,7 +94,10 @@ struct vfio_pci_core_device { struct vfio_pci_core_device *sriov_pf_core_dev; struct notifier_block nb; struct rw_semaphore memory_lock; +#ifdef CONFIG_VFIO_PCI_DMABUF struct p2pdma_provider *provider; + struct list_head dmabufs; +#endif }; =20 /* Will be exported for vfio pci drivers usage */ diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h index 5764f315137f9..ad8e303697f97 100644 --- a/include/uapi/linux/vfio.h +++ b/include/uapi/linux/vfio.h @@ -1468,6 +1468,25 @@ struct vfio_device_feature_bus_master { }; #define VFIO_DEVICE_FEATURE_BUS_MASTER 10 =20 +/** + * Upon VFIO_DEVICE_FEATURE_GET create a dma_buf fd for the + * regions selected. 
+ * + * region_index selects the region to export; for PCI it is the BAR number. + * open_flags are the typical flags passed to open(2), e.g. O_RDWR, O_CLOEXEC, + * etc. offset/length specify a slice of the region to create the dmabuf from; + * both must be page aligned and the slice must fit inside the region. + * + * Return: The fd number on success; on failure, -1 with errno set. + */ +#define VFIO_DEVICE_FEATURE_DMA_BUF 11 + +struct vfio_device_feature_dma_buf { + __u32 region_index; + __u32 open_flags; + __u64 offset; + __u64 length; +}; + /* -------- API for Type1 VFIO IOMMU -------- */ =20 /** --=20 2.50.1