From nobody Mon Feb 9 00:47:19 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 29F8F1C84A2; Mon, 5 May 2025 07:02:03 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1746428524; cv=none; b=EhLxTVQmp4L/r0O0/Fgtu0faFCw/WAzNSetRofaN1sZvEWaMWEod6nGdxZcnYFd3tZcHwJwnCEtGWS3PpgkmO1LpwELEs7glTfzkUbIJS91H3C+YjMfb5ubMOLFujVGVx3Z2e2mmQXpkWe1XNTenRiOEa7cbC8eWU0RA4xDelQE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1746428524; c=relaxed/simple; bh=cQaIUm+FQmM/wZJB1sUOj5G5SwgnrSJNVVGXSNmOMjY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=A9lx7fRiMDGqBLkjEYdP0BifosxbHr3lqt1xvxrrnJ0rREy/aZnEo47bdev6l2kvZnIIslYAKmbUfjq6d5SUGVZG45W55QWF88rPVa6SspHChv4j3xVu4BaVral6fsEHgu41aEfyHmLcPm0OfFQtk2Dlq1rxSnO2CCGSD9lXUp0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=W7Ll5dbo; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="W7Ll5dbo" Received: by smtp.kernel.org (Postfix) with ESMTPSA id B9938C4CEE4; Mon, 5 May 2025 07:02:02 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1746428523; bh=cQaIUm+FQmM/wZJB1sUOj5G5SwgnrSJNVVGXSNmOMjY=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=W7Ll5dboceuJlkXHjq/byPWK9fQKavM6x3nAWN7Cw/C0jqzJyAhGYgYQhd74lkhON SYpLqFhFJYWKcbjftE/sAprgt63KgAWLSAEEouutcZMMFp63DtNv3K735QGzeBvlA7 FWCUJvWbmzK+tPm69PD1d0kObZtkIfPP+if8LNVW1rNysJKlStdh0oXx/398J5zb8o mEjUFomzShb3ijkloyALD9fZX6wqtllWuAnR1jkGdgoL28QdE33QiErllxKcdlJaME ygffGnotzg1S8rss5RqFSe7GHO3LZJnCs0LW2YJv03p3A3KitACy4Is5pfxBWANV2G /rnbV4hsjSFbw== From: Leon Romanovsky To: Cc: Christoph Hellwig , Jens Axboe , Keith Busch , Jake Edge , Jonathan Corbet , Jason Gunthorpe , Zhu Yanjun , Robin Murphy , Joerg Roedel , Will Deacon , Sagi Grimberg , Bjorn Helgaas , Logan Gunthorpe , Yishai Hadas , Shameer Kolothum , Kevin Tian , Alex Williamson , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Andrew Morton , linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, linux-rdma@vger.kernel.org, iommu@lists.linux.dev, linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, Niklas Schnelle , Chuck Lever , Luis Chamberlain , Matthew Wilcox , Dan Williams , Kanchan Joshi , Chaitanya Kulkarni , Lu Baolu , Leon Romanovsky Subject: [PATCH v11 1/9] PCI/P2PDMA: Refactor the p2pdma mapping helpers Date: Mon, 5 May 2025 10:01:38 +0300 Message-ID: X-Mailer: git-send-email 2.49.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Christoph Hellwig The current scheme with a single helper to determine the P2P status and map a scatterlist segment force users to always use the map_sg helper to DMA map, which we're trying to get away from because they are very cache inefficient. 
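For illustration, a minimal sketch of the calling pattern this split enables, using the two helpers introduced below (pci_p2pdma_state() and pci_p2pdma_bus_addr_map()). The example_map_pages() helper is hypothetical and not part of this patch; the DMA direction is hardcoded and unwinding of already-mapped pages is elided:

	/* Hypothetical per-page mapping loop modeled on the dma_direct_map_sg()
	 * conversion in this patch; len/offset fixed to one full page. */
	static int example_map_pages(struct device *dev, struct page **pages,
				     unsigned int npages, dma_addr_t *dma_addrs)
	{
		struct pci_p2pdma_map_state p2pdma_state = {};
		unsigned int i;

		for (i = 0; i < npages; i++) {
			switch (pci_p2pdma_state(&p2pdma_state, dev, pages[i])) {
			case PCI_P2PDMA_MAP_BUS_ADDR:
				/* P2P through a switch: use the PCI bus address directly. */
				dma_addrs[i] = pci_p2pdma_bus_addr_map(&p2pdma_state,
							page_to_phys(pages[i]));
				break;
			case PCI_P2PDMA_MAP_NONE:
			case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
				/* Not P2P, or P2P via an allowed host bridge: map normally. */
				dma_addrs[i] = dma_map_page(dev, pages[i], 0, PAGE_SIZE,
							    DMA_TO_DEVICE);
				if (dma_mapping_error(dev, dma_addrs[i]))
					return -EIO;
				break;
			default:
				/* P2P not supported between these devices. */
				return -EREMOTEIO;
			}
		}
		return 0;
	}
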
Refactor the code so that there is a single helper that checks the P2P state for a page, including the result that it is not a P2P page to simplify the callers, and a second one to perform the address translation for a bus mapped P2P transfer that does not depend on the scatterlist structure. Signed-off-by: Christoph Hellwig Reviewed-by: Logan Gunthorpe Acked-by: Bjorn Helgaas Tested-by: Jens Axboe Reviewed-by: Luis Chamberlain Reviewed-by: Lu Baolu Signed-off-by: Leon Romanovsky --- drivers/iommu/dma-iommu.c | 47 +++++++++++++++++----------------- drivers/pci/p2pdma.c | 38 ++++----------------------- include/linux/dma-map-ops.h | 51 +++++++++++++++++++++++++++++-------- kernel/dma/direct.c | 43 +++++++++++++++---------------- 4 files changed, 91 insertions(+), 88 deletions(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index a775e4dbe06f..8a89e63c5973 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -1359,7 +1359,6 @@ int iommu_dma_map_sg(struct device *dev, struct scatt= erlist *sg, int nents, struct scatterlist *s, *prev =3D NULL; int prot =3D dma_info_to_prot(dir, dev_is_dma_coherent(dev), attrs); struct pci_p2pdma_map_state p2pdma_state =3D {}; - enum pci_p2pdma_map_type map; dma_addr_t iova; size_t iova_len =3D 0; unsigned long mask =3D dma_get_seg_boundary(dev); @@ -1389,28 +1388,30 @@ int iommu_dma_map_sg(struct device *dev, struct sca= tterlist *sg, int nents, size_t s_length =3D s->length; size_t pad_len =3D (mask - iova_len + 1) & mask; =20 - if (is_pci_p2pdma_page(sg_page(s))) { - map =3D pci_p2pdma_map_segment(&p2pdma_state, dev, s); - switch (map) { - case PCI_P2PDMA_MAP_BUS_ADDR: - /* - * iommu_map_sg() will skip this segment as - * it is marked as a bus address, - * __finalise_sg() will copy the dma address - * into the output segment. - */ - continue; - case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE: - /* - * Mapping through host bridge should be - * mapped with regular IOVAs, thus we - * do nothing here and continue below. - */ - break; - default: - ret =3D -EREMOTEIO; - goto out_restore_sg; - } + switch (pci_p2pdma_state(&p2pdma_state, dev, sg_page(s))) { + case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE: + /* + * Mapping through host bridge should be mapped with + * regular IOVAs, thus we do nothing here and continue + * below. + */ + break; + case PCI_P2PDMA_MAP_NONE: + break; + case PCI_P2PDMA_MAP_BUS_ADDR: + /* + * iommu_map_sg() will skip this segment as it is marked + * as a bus address, __finalise_sg() will copy the dma + * address into the output segment. + */ + s->dma_address =3D pci_p2pdma_bus_addr_map(&p2pdma_state, + sg_phys(s)); + sg_dma_len(s) =3D sg->length; + sg_dma_mark_bus_address(s); + continue; + default: + ret =3D -EREMOTEIO; + goto out_restore_sg; } =20 sg_dma_address(s) =3D s_iova_off; diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c index 19214ec81fbb..8d955c25aed3 100644 --- a/drivers/pci/p2pdma.c +++ b/drivers/pci/p2pdma.c @@ -1004,40 +1004,12 @@ static enum pci_p2pdma_map_type pci_p2pdma_map_type= (struct dev_pagemap *pgmap, return type; } =20 -/** - * pci_p2pdma_map_segment - map an sg segment determining the mapping type - * @state: State structure that should be declared outside of the for_each= _sg() - * loop and initialized to zero. - * @dev: DMA device that's doing the mapping operation - * @sg: scatterlist segment to map - * - * This is a helper to be used by non-IOMMU dma_map_sg() implementations w= here - * the sg segment is the same for the page_link and the dma_address. 
- * - * Attempt to map a single segment in an SGL with the PCI bus address. - * The segment must point to a PCI P2PDMA page and thus must be - * wrapped in a is_pci_p2pdma_page(sg_page(sg)) check. - * - * Returns the type of mapping used and maps the page if the type is - * PCI_P2PDMA_MAP_BUS_ADDR. - */ -enum pci_p2pdma_map_type -pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state, struct device *= dev, - struct scatterlist *sg) +void __pci_p2pdma_update_state(struct pci_p2pdma_map_state *state, + struct device *dev, struct page *page) { - if (state->pgmap !=3D page_pgmap(sg_page(sg))) { - state->pgmap =3D page_pgmap(sg_page(sg)); - state->map =3D pci_p2pdma_map_type(state->pgmap, dev); - state->bus_off =3D to_p2p_pgmap(state->pgmap)->bus_offset; - } - - if (state->map =3D=3D PCI_P2PDMA_MAP_BUS_ADDR) { - sg->dma_address =3D sg_phys(sg) + state->bus_off; - sg_dma_len(sg) =3D sg->length; - sg_dma_mark_bus_address(sg); - } - - return state->map; + state->pgmap =3D page_pgmap(page); + state->map =3D pci_p2pdma_map_type(state->pgmap, dev); + state->bus_off =3D to_p2p_pgmap(state->pgmap)->bus_offset; } =20 /** diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h index e172522cd936..c3086edeccc6 100644 --- a/include/linux/dma-map-ops.h +++ b/include/linux/dma-map-ops.h @@ -443,6 +443,11 @@ enum pci_p2pdma_map_type { */ PCI_P2PDMA_MAP_UNKNOWN =3D 0, =20 + /* + * Not a PCI P2PDMA transfer. + */ + PCI_P2PDMA_MAP_NONE, + /* * PCI_P2PDMA_MAP_NOT_SUPPORTED: Indicates the transaction will * traverse the host bridge and the host bridge is not in the @@ -471,21 +476,47 @@ enum pci_p2pdma_map_type { =20 struct pci_p2pdma_map_state { struct dev_pagemap *pgmap; - int map; + enum pci_p2pdma_map_type map; u64 bus_off; }; =20 -#ifdef CONFIG_PCI_P2PDMA -enum pci_p2pdma_map_type -pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state, struct device *= dev, - struct scatterlist *sg); -#else /* CONFIG_PCI_P2PDMA */ +/* helper for pci_p2pdma_state(), do not use directly */ +void __pci_p2pdma_update_state(struct pci_p2pdma_map_state *state, + struct device *dev, struct page *page); + +/** + * pci_p2pdma_state - check the P2P transfer state of a page + * @state: P2P state structure + * @dev: device to transfer to/from + * @page: page to map + * + * Check if @page is a PCI P2PDMA page, and if yes of what kind. Returns = the + * map type, and updates @state with all information needed for a P2P tran= sfer. + */ static inline enum pci_p2pdma_map_type -pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state, struct device *= dev, - struct scatterlist *sg) +pci_p2pdma_state(struct pci_p2pdma_map_state *state, struct device *dev, + struct page *page) +{ + if (IS_ENABLED(CONFIG_PCI_P2PDMA) && is_pci_p2pdma_page(page)) { + if (state->pgmap !=3D page_pgmap(page)) + __pci_p2pdma_update_state(state, dev, page); + return state->map; + } + return PCI_P2PDMA_MAP_NONE; +} + +/** + * pci_p2pdma_bus_addr_map - map a PCI_P2PDMA_MAP_BUS_ADDR P2P transfer + * @state: P2P state structure + * @paddr: physical address to map + * + * Map a physically contiguous PCI_P2PDMA_MAP_BUS_ADDR transfer. 
+ */ +static inline dma_addr_t +pci_p2pdma_bus_addr_map(struct pci_p2pdma_map_state *state, phys_addr_t pa= ddr) { - return PCI_P2PDMA_MAP_NOT_SUPPORTED; + WARN_ON_ONCE(state->map !=3D PCI_P2PDMA_MAP_BUS_ADDR); + return paddr + state->bus_off; } -#endif /* CONFIG_PCI_P2PDMA */ =20 #endif /* _LINUX_DMA_MAP_OPS_H */ diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c index b8fe0b3d0ffb..cec43cd5ed62 100644 --- a/kernel/dma/direct.c +++ b/kernel/dma/direct.c @@ -462,34 +462,33 @@ int dma_direct_map_sg(struct device *dev, struct scat= terlist *sgl, int nents, enum dma_data_direction dir, unsigned long attrs) { struct pci_p2pdma_map_state p2pdma_state =3D {}; - enum pci_p2pdma_map_type map; struct scatterlist *sg; int i, ret; =20 for_each_sg(sgl, sg, nents, i) { - if (is_pci_p2pdma_page(sg_page(sg))) { - map =3D pci_p2pdma_map_segment(&p2pdma_state, dev, sg); - switch (map) { - case PCI_P2PDMA_MAP_BUS_ADDR: - continue; - case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE: - /* - * Any P2P mapping that traverses the PCI - * host bridge must be mapped with CPU physical - * address and not PCI bus addresses. This is - * done with dma_direct_map_page() below. - */ - break; - default: - ret =3D -EREMOTEIO; + switch (pci_p2pdma_state(&p2pdma_state, dev, sg_page(sg))) { + case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE: + /* + * Any P2P mapping that traverses the PCI host bridge + * must be mapped with CPU physical address and not PCI + * bus addresses. + */ + break; + case PCI_P2PDMA_MAP_NONE: + sg->dma_address =3D dma_direct_map_page(dev, sg_page(sg), + sg->offset, sg->length, dir, attrs); + if (sg->dma_address =3D=3D DMA_MAPPING_ERROR) { + ret =3D -EIO; goto out_unmap; } - } - - sg->dma_address =3D dma_direct_map_page(dev, sg_page(sg), - sg->offset, sg->length, dir, attrs); - if (sg->dma_address =3D=3D DMA_MAPPING_ERROR) { - ret =3D -EIO; + break; + case PCI_P2PDMA_MAP_BUS_ADDR: + sg->dma_address =3D pci_p2pdma_bus_addr_map(&p2pdma_state, + sg_phys(sg)); + sg_dma_mark_bus_address(sg); + continue; + default: + ret =3D -EREMOTEIO; goto out_unmap; } sg_dma_len(sg) =3D sg->length; --=20 2.49.0 From nobody Mon Feb 9 00:47:19 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CFD0A1D8E10; Mon, 5 May 2025 07:02:12 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1746428533; cv=none; b=TyfTh1a3AMCIzLefXEOTc7pcRTB/6bG0tuvzAjduPzVIe6ee1xwFrjO9/ZJvDGan9ouh86NfS9rriJSofMex5hJzrOTIJ+60xSpzg6GGRMrY6aIKr75iSp8/eCsuXcIga5wB0SnwdwdkjfQYXj/KMQTjvNHCJzc6lYFk2yQWbuI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1746428533; c=relaxed/simple; bh=o2NB7hQ0vCeWU9CMSs4xzU9VH0loVcu8sfz5tFdLksQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=VOSpL9O+dCV6JCKA9RyXQu3JnfEwlZRDyGEVn0Pu9cNWOlt40KqCa8eZCVl45RG5DL6FvqzwLG67B7czLH7Q1SmPfHNA/Z5cUmGKv2S1Yr80nGjCroq5iBLpMfJS97ejesbsoSGpmz1ecK45pE7jpn5A7NJyTDxbTbKAPuv1J7s= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=UBGiMsZh; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="UBGiMsZh" Received: by smtp.kernel.org 
(Postfix) with ESMTPSA id E3FFCC4CEE4; Mon, 5 May 2025 07:02:11 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1746428532; bh=o2NB7hQ0vCeWU9CMSs4xzU9VH0loVcu8sfz5tFdLksQ=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=UBGiMsZh02q0IoD7DcJfAn0JTe8JlE/bF1jWijDkcXehBGfgQlWXEfK2MGaYVCnkY uIQvU/ZrvJxfwRR5xKU+wwFmUv6+snZL8RtB0jUjX3sR08O10SZGLqfU0JskElbLR2 AWNH2ZD0D2RbdOeK/004Ve1GIEUATFBa5dKFwrOtWGMieg3hofmZWduu8iNN1sWrW+ HY9XhMlDbNEBRFenHGtkXavdqEs/ojy9HvVxbqgtwoEp7gQcXVZgY31VOjyhA5Lh8X JdYBapTiqH8tB/6LXgkwg10bbfjM+d6cV0JX1b952hhSsxW1g6GAxnkFsCMIFeIVWS RtV7pZMkZGq9A== From: Leon Romanovsky To: Cc: Christoph Hellwig , Jens Axboe , Keith Busch , Jake Edge , Jonathan Corbet , Jason Gunthorpe , Zhu Yanjun , Robin Murphy , Joerg Roedel , Will Deacon , Sagi Grimberg , Bjorn Helgaas , Logan Gunthorpe , Yishai Hadas , Shameer Kolothum , Kevin Tian , Alex Williamson , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Andrew Morton , linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, linux-rdma@vger.kernel.org, iommu@lists.linux.dev, linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, Niklas Schnelle , Chuck Lever , Luis Chamberlain , Matthew Wilcox , Dan Williams , Kanchan Joshi , Chaitanya Kulkarni , Lu Baolu , Leon Romanovsky Subject: [PATCH v11 2/9] dma-mapping: move the PCI P2PDMA mapping helpers to pci-p2pdma.h Date: Mon, 5 May 2025 10:01:39 +0300 Message-ID: <09b90e787d1bb16429642350515cf364cd92530f.1746424934.git.leon@kernel.org> X-Mailer: git-send-email 2.49.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Christoph Hellwig To support the upcoming non-scatterlist mapping helpers, we need to go back to have them called outside of the DMA API. Thus move them out of dma-map-ops.h, which is only for DMA API implementations to pci-p2pdma.h, which is for driver use. Note that the core helper is still not exported as the mapping is expected to be done only by very highlevel subsystem code at least for now. Signed-off-by: Christoph Hellwig Reviewed-by: Logan Gunthorpe Acked-by: Bjorn Helgaas Tested-by: Jens Axboe Reviewed-by: Luis Chamberlain Reviewed-by: Lu Baolu Signed-off-by: Leon Romanovsky --- drivers/iommu/dma-iommu.c | 1 + include/linux/dma-map-ops.h | 85 ------------------------------------- include/linux/pci-p2pdma.h | 85 +++++++++++++++++++++++++++++++++++++ kernel/dma/direct.c | 1 + 4 files changed, 87 insertions(+), 85 deletions(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 8a89e63c5973..9ba8d8bc0ce9 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -27,6 +27,7 @@ #include #include #include +#include #include #include #include diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h index c3086edeccc6..f48e5fb88bd5 100644 --- a/include/linux/dma-map-ops.h +++ b/include/linux/dma-map-ops.h @@ -434,89 +434,4 @@ static inline void debug_dma_dump_mappings(struct devi= ce *dev) #endif /* CONFIG_DMA_API_DEBUG */ =20 extern const struct dma_map_ops dma_dummy_ops; - -enum pci_p2pdma_map_type { - /* - * PCI_P2PDMA_MAP_UNKNOWN: Used internally for indicating the mapping - * type hasn't been calculated yet. Functions that return this enum - * never return this value. 
- */ - PCI_P2PDMA_MAP_UNKNOWN =3D 0, - - /* - * Not a PCI P2PDMA transfer. - */ - PCI_P2PDMA_MAP_NONE, - - /* - * PCI_P2PDMA_MAP_NOT_SUPPORTED: Indicates the transaction will - * traverse the host bridge and the host bridge is not in the - * allowlist. DMA Mapping routines should return an error when - * this is returned. - */ - PCI_P2PDMA_MAP_NOT_SUPPORTED, - - /* - * PCI_P2PDMA_BUS_ADDR: Indicates that two devices can talk to - * each other directly through a PCI switch and the transaction will - * not traverse the host bridge. Such a mapping should program - * the DMA engine with PCI bus addresses. - */ - PCI_P2PDMA_MAP_BUS_ADDR, - - /* - * PCI_P2PDMA_MAP_THRU_HOST_BRIDGE: Indicates two devices can talk - * to each other, but the transaction traverses a host bridge on the - * allowlist. In this case, a normal mapping either with CPU physical - * addresses (in the case of dma-direct) or IOVA addresses (in the - * case of IOMMUs) should be used to program the DMA engine. - */ - PCI_P2PDMA_MAP_THRU_HOST_BRIDGE, -}; - -struct pci_p2pdma_map_state { - struct dev_pagemap *pgmap; - enum pci_p2pdma_map_type map; - u64 bus_off; -}; - -/* helper for pci_p2pdma_state(), do not use directly */ -void __pci_p2pdma_update_state(struct pci_p2pdma_map_state *state, - struct device *dev, struct page *page); - -/** - * pci_p2pdma_state - check the P2P transfer state of a page - * @state: P2P state structure - * @dev: device to transfer to/from - * @page: page to map - * - * Check if @page is a PCI P2PDMA page, and if yes of what kind. Returns = the - * map type, and updates @state with all information needed for a P2P tran= sfer. - */ -static inline enum pci_p2pdma_map_type -pci_p2pdma_state(struct pci_p2pdma_map_state *state, struct device *dev, - struct page *page) -{ - if (IS_ENABLED(CONFIG_PCI_P2PDMA) && is_pci_p2pdma_page(page)) { - if (state->pgmap !=3D page_pgmap(page)) - __pci_p2pdma_update_state(state, dev, page); - return state->map; - } - return PCI_P2PDMA_MAP_NONE; -} - -/** - * pci_p2pdma_bus_addr_map - map a PCI_P2PDMA_MAP_BUS_ADDR P2P transfer - * @state: P2P state structure - * @paddr: physical address to map - * - * Map a physically contiguous PCI_P2PDMA_MAP_BUS_ADDR transfer. - */ -static inline dma_addr_t -pci_p2pdma_bus_addr_map(struct pci_p2pdma_map_state *state, phys_addr_t pa= ddr) -{ - WARN_ON_ONCE(state->map !=3D PCI_P2PDMA_MAP_BUS_ADDR); - return paddr + state->bus_off; -} - #endif /* _LINUX_DMA_MAP_OPS_H */ diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h index 2c07aa6b7665..075c20b161d9 100644 --- a/include/linux/pci-p2pdma.h +++ b/include/linux/pci-p2pdma.h @@ -104,4 +104,89 @@ static inline struct pci_dev *pci_p2pmem_find(struct d= evice *client) return pci_p2pmem_find_many(&client, 1); } =20 +enum pci_p2pdma_map_type { + /* + * PCI_P2PDMA_MAP_UNKNOWN: Used internally as an initial state before + * the mapping type has been calculated. Exported routines for the API + * will never return this value. + */ + PCI_P2PDMA_MAP_UNKNOWN =3D 0, + + /* + * Not a PCI P2PDMA transfer. + */ + PCI_P2PDMA_MAP_NONE, + + /* + * PCI_P2PDMA_MAP_NOT_SUPPORTED: Indicates the transaction will + * traverse the host bridge and the host bridge is not in the + * allowlist. DMA Mapping routines should return an error when + * this is returned. + */ + PCI_P2PDMA_MAP_NOT_SUPPORTED, + + /* + * PCI_P2PDMA_MAP_BUS_ADDR: Indicates that two devices can talk to + * each other directly through a PCI switch and the transaction will + * not traverse the host bridge. 
Such a mapping should program + * the DMA engine with PCI bus addresses. + */ + PCI_P2PDMA_MAP_BUS_ADDR, + + /* + * PCI_P2PDMA_MAP_THRU_HOST_BRIDGE: Indicates two devices can talk + * to each other, but the transaction traverses a host bridge on the + * allowlist. In this case, a normal mapping either with CPU physical + * addresses (in the case of dma-direct) or IOVA addresses (in the + * case of IOMMUs) should be used to program the DMA engine. + */ + PCI_P2PDMA_MAP_THRU_HOST_BRIDGE, +}; + +struct pci_p2pdma_map_state { + struct dev_pagemap *pgmap; + enum pci_p2pdma_map_type map; + u64 bus_off; +}; + +/* helper for pci_p2pdma_state(), do not use directly */ +void __pci_p2pdma_update_state(struct pci_p2pdma_map_state *state, + struct device *dev, struct page *page); + +/** + * pci_p2pdma_state - check the P2P transfer state of a page + * @state: P2P state structure + * @dev: device to transfer to/from + * @page: page to map + * + * Check if @page is a PCI P2PDMA page, and if yes of what kind. Returns = the + * map type, and updates @state with all information needed for a P2P tran= sfer. + */ +static inline enum pci_p2pdma_map_type +pci_p2pdma_state(struct pci_p2pdma_map_state *state, struct device *dev, + struct page *page) +{ + if (IS_ENABLED(CONFIG_PCI_P2PDMA) && is_pci_p2pdma_page(page)) { + if (state->pgmap !=3D page_pgmap(page)) + __pci_p2pdma_update_state(state, dev, page); + return state->map; + } + return PCI_P2PDMA_MAP_NONE; +} + +/** + * pci_p2pdma_bus_addr_map - Translate a physical address to a bus address + * for a PCI_P2PDMA_MAP_BUS_ADDR transfer. + * @state: P2P state structure + * @paddr: physical address to map + * + * Map a physically contiguous PCI_P2PDMA_MAP_BUS_ADDR transfer. + */ +static inline dma_addr_t +pci_p2pdma_bus_addr_map(struct pci_p2pdma_map_state *state, phys_addr_t pa= ddr) +{ + WARN_ON_ONCE(state->map !=3D PCI_P2PDMA_MAP_BUS_ADDR); + return paddr + state->bus_off; +} + #endif /* _LINUX_PCI_P2P_H */ diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c index cec43cd5ed62..24c359d9c879 100644 --- a/kernel/dma/direct.c +++ b/kernel/dma/direct.c @@ -13,6 +13,7 @@ #include #include #include +#include #include "direct.h" =20 /* --=20 2.49.0 From nobody Mon Feb 9 00:47:19 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8477E1D63E6; Mon, 5 May 2025 07:02:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1746428528; cv=none; b=WfThpTODh4KidSV3U77Jq8TaXRDhEsNYfj5cFgAL6TWARhU0ehWg3FBFdvOu+m2J/4QUaOYyi80YlS6vf1U24fgZxomSS5OAAwjspiimMtUYDewDtzQChSv14rX/5iIlOHrSovKnMHDkRwEmEBZxjSdFdPQc3kS039xPXr0Tkjg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1746428528; c=relaxed/simple; bh=NSjvHXdN9Auhws02PZWMgoFw38+PyYsmwJSV2Mf4l+E=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=S2iHKasIDRPdxaSeYYdxXFQnNbexAdbcbO7mpb27FanwrYMu4NJkes7qwqXBErWEKXmsKIXIT3bMUJfLTNlytITJn5m499SE1W9NCfGa9mSqYCYTGtk/WdkrD+HQIVS0Kg1LVZfgwY9ei0GFeY/cEHx1+587+d3zhgBiOqXjGxg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=Pbjoh9io; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; 
dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="Pbjoh9io" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 29FF0C4CEEE; Mon, 5 May 2025 07:02:07 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1746428527; bh=NSjvHXdN9Auhws02PZWMgoFw38+PyYsmwJSV2Mf4l+E=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=Pbjoh9io2FXWocROWFlAoZmys25rTcsa+c5jcyV02z+qfdxpaGggNeM5/ovA+uLIq YTfuULDJ2SCvLZa3z74b1gbNPwbmtmxEWxgHCoNMaSIsbSX6BmEEpzOLY1NYgBC/w2 ZqNacybVwcJohe2ddFlWXCQrYQdcZ9lgPr7G7K6PoqQ1UbFlvOV5FdZU0Mc/at2Ev7 c6nVY2zfL8RGLUuoADrBm2E1blIRTwIW+2NG6E3oMJWQNXzO95ASIxuLQqOawg7ENn +1y1oZwpg0HK2NwEme6kcyyZVaKiHkWLDjHxfc4f1hTJIVP0BK0vsX+q5mhRTreW+Z geAWqLOQPt77w== From: Leon Romanovsky To: Cc: Christoph Hellwig , Jens Axboe , Keith Busch , Jake Edge , Jonathan Corbet , Jason Gunthorpe , Zhu Yanjun , Robin Murphy , Joerg Roedel , Will Deacon , Sagi Grimberg , Bjorn Helgaas , Logan Gunthorpe , Yishai Hadas , Shameer Kolothum , Kevin Tian , Alex Williamson , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Andrew Morton , linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, linux-rdma@vger.kernel.org, iommu@lists.linux.dev, linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, Niklas Schnelle , Chuck Lever , Luis Chamberlain , Matthew Wilcox , Dan Williams , Kanchan Joshi , Chaitanya Kulkarni , Jason Gunthorpe , Leon Romanovsky Subject: [PATCH v11 3/9] iommu: generalize the batched sync after map interface Date: Mon, 5 May 2025 10:01:40 +0300 Message-ID: X-Mailer: git-send-email 2.49.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Christoph Hellwig For the upcoming IOVA-based DMA API we want to batch the ops->iotlb_sync_map() call after mapping multiple IOVAs from dma-iommu without having a scatterlist. Improve the API. Add a wrapper for the map_sync as iommu_sync_map() so that callers don't need to poke into the methods directly. Formalize __iommu_map() into iommu_map_nosync() which requires the caller to call iommu_sync_map() after all maps are completed. Refactor the existing sanity checks from all the different layers into iommu_map_nosync(). 
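A minimal sketch of the intended batched usage (hypothetical caller, not part of this patch; len is assumed to be aligned to a supported IOMMU page size): several iommu_map_nosync() calls covering a contiguous IOVA range, a single iommu_sync_map() at the end, and an unmap of whatever was mapped on failure, mirroring the iommu_map_sg() conversion below.

	static int example_map_two_ranges(struct iommu_domain *domain,
					  unsigned long iova, phys_addr_t a,
					  phys_addr_t b, size_t len, int prot)
	{
		size_t mapped = 0;
		int ret;

		ret = iommu_map_nosync(domain, iova, a, len, prot, GFP_KERNEL);
		if (ret)
			return ret;
		mapped = len;

		ret = iommu_map_nosync(domain, iova + mapped, b, len, prot,
				       GFP_KERNEL);
		if (ret)
			goto out_unmap;
		mapped += len;

		/* One IOTLB sync for all mappings created above. */
		ret = iommu_sync_map(domain, iova, mapped);
		if (ret)
			goto out_unmap;
		return 0;

	out_unmap:
		/* Undo only what was successfully mapped. */
		iommu_unmap(domain, iova, mapped);
		return ret;
	}
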
Signed-off-by: Christoph Hellwig Acked-by: Will Deacon Tested-by: Jens Axboe Reviewed-by: Jason Gunthorpe Reviewed-by: Luis Chamberlain Signed-off-by: Leon Romanovsky --- drivers/iommu/iommu.c | 65 +++++++++++++++++++------------------------ include/linux/iommu.h | 4 +++ 2 files changed, 33 insertions(+), 36 deletions(-) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 4f91a740c15f..02960585b8d4 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -2443,8 +2443,8 @@ static size_t iommu_pgsize(struct iommu_domain *domai= n, unsigned long iova, return pgsize; } =20 -static int __iommu_map(struct iommu_domain *domain, unsigned long iova, - phys_addr_t paddr, size_t size, int prot, gfp_t gfp) +int iommu_map_nosync(struct iommu_domain *domain, unsigned long iova, + phys_addr_t paddr, size_t size, int prot, gfp_t gfp) { const struct iommu_domain_ops *ops =3D domain->ops; unsigned long orig_iova =3D iova; @@ -2453,12 +2453,19 @@ static int __iommu_map(struct iommu_domain *domain,= unsigned long iova, phys_addr_t orig_paddr =3D paddr; int ret =3D 0; =20 + might_sleep_if(gfpflags_allow_blocking(gfp)); + if (unlikely(!(domain->type & __IOMMU_DOMAIN_PAGING))) return -EINVAL; =20 if (WARN_ON(!ops->map_pages || domain->pgsize_bitmap =3D=3D 0UL)) return -ENODEV; =20 + /* Discourage passing strange GFP flags */ + if (WARN_ON_ONCE(gfp & (__GFP_COMP | __GFP_DMA | __GFP_DMA32 | + __GFP_HIGHMEM))) + return -EINVAL; + /* find out the minimum page size supported */ min_pagesz =3D 1 << __ffs(domain->pgsize_bitmap); =20 @@ -2506,31 +2513,27 @@ static int __iommu_map(struct iommu_domain *domain,= unsigned long iova, return ret; } =20 -int iommu_map(struct iommu_domain *domain, unsigned long iova, - phys_addr_t paddr, size_t size, int prot, gfp_t gfp) +int iommu_sync_map(struct iommu_domain *domain, unsigned long iova, size_t= size) { const struct iommu_domain_ops *ops =3D domain->ops; - int ret; - - might_sleep_if(gfpflags_allow_blocking(gfp)); =20 - /* Discourage passing strange GFP flags */ - if (WARN_ON_ONCE(gfp & (__GFP_COMP | __GFP_DMA | __GFP_DMA32 | - __GFP_HIGHMEM))) - return -EINVAL; + if (!ops->iotlb_sync_map) + return 0; + return ops->iotlb_sync_map(domain, iova, size); +} =20 - ret =3D __iommu_map(domain, iova, paddr, size, prot, gfp); - if (ret =3D=3D 0 && ops->iotlb_sync_map) { - ret =3D ops->iotlb_sync_map(domain, iova, size); - if (ret) - goto out_err; - } +int iommu_map(struct iommu_domain *domain, unsigned long iova, + phys_addr_t paddr, size_t size, int prot, gfp_t gfp) +{ + int ret; =20 - return ret; + ret =3D iommu_map_nosync(domain, iova, paddr, size, prot, gfp); + if (ret) + return ret; =20 -out_err: - /* undo mappings already done */ - iommu_unmap(domain, iova, size); + ret =3D iommu_sync_map(domain, iova, size); + if (ret) + iommu_unmap(domain, iova, size); =20 return ret; } @@ -2630,26 +2633,17 @@ ssize_t iommu_map_sg(struct iommu_domain *domain, u= nsigned long iova, struct scatterlist *sg, unsigned int nents, int prot, gfp_t gfp) { - const struct iommu_domain_ops *ops =3D domain->ops; size_t len =3D 0, mapped =3D 0; phys_addr_t start; unsigned int i =3D 0; int ret; =20 - might_sleep_if(gfpflags_allow_blocking(gfp)); - - /* Discourage passing strange GFP flags */ - if (WARN_ON_ONCE(gfp & (__GFP_COMP | __GFP_DMA | __GFP_DMA32 | - __GFP_HIGHMEM))) - return -EINVAL; - while (i <=3D nents) { phys_addr_t s_phys =3D sg_phys(sg); =20 if (len && s_phys !=3D start + len) { - ret =3D __iommu_map(domain, iova + mapped, start, + ret =3D iommu_map_nosync(domain, iova + 
mapped, start, len, prot, gfp); - if (ret) goto out_err; =20 @@ -2672,11 +2666,10 @@ ssize_t iommu_map_sg(struct iommu_domain *domain, u= nsigned long iova, sg =3D sg_next(sg); } =20 - if (ops->iotlb_sync_map) { - ret =3D ops->iotlb_sync_map(domain, iova, mapped); - if (ret) - goto out_err; - } + ret =3D iommu_sync_map(domain, iova, mapped); + if (ret) + goto out_err; + return mapped; =20 out_err: diff --git a/include/linux/iommu.h b/include/linux/iommu.h index ccce8a751e2a..ce472af8e9c3 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -872,6 +872,10 @@ extern struct iommu_domain *iommu_get_domain_for_dev(s= truct device *dev); extern struct iommu_domain *iommu_get_dma_domain(struct device *dev); extern int iommu_map(struct iommu_domain *domain, unsigned long iova, phys_addr_t paddr, size_t size, int prot, gfp_t gfp); +int iommu_map_nosync(struct iommu_domain *domain, unsigned long iova, + phys_addr_t paddr, size_t size, int prot, gfp_t gfp); +int iommu_sync_map(struct iommu_domain *domain, unsigned long iova, + size_t size); extern size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size); extern size_t iommu_unmap_fast(struct iommu_domain *domain, --=20 2.49.0 From nobody Mon Feb 9 00:47:19 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DE7231DC994; Mon, 5 May 2025 07:02:21 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1746428542; cv=none; b=Ywpw1M9h3Ctu11fbJCr1+vUHUQvvM6sSjepqaxhtJUYd8H6jN5Vbzu2eVU0eBueKP61M6bfjnkzJhMxP5pSzRG0Lz+gDpkcRjq8gnrcjAAdqo+0QuxD0iRlxZW8YYVKcedvTjsC2tKWWPikkkqC3JJ80imPRHR8C+1ZBki6HOmU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1746428542; c=relaxed/simple; bh=IJofnoNmHQxgrT5KEQtJlhLI7w0UbAaAkqcCSrH3C0w=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=gl63OoARyYzJkQhdRkZQLzXY7ErAHztlCFfAOqq1ANLDJdv3zl3mt/8Qf5HIY724gr2e+ucpMfA97ooPVOrpk0r+dOiTOAX1ioySdgQMrWIvhFjQx6tUytwJAlpB3Y2lItPHOnyqQD1jU7ZxKtJQRxrvg9aymbfo9ObohdjdYGU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=koIjUkdj; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="koIjUkdj" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 7FBF8C4CEE4; Mon, 5 May 2025 07:02:20 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1746428541; bh=IJofnoNmHQxgrT5KEQtJlhLI7w0UbAaAkqcCSrH3C0w=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=koIjUkdjseN17qIdTillVBxxTU12WgfbLBLnPhJv+cIP1nNZVR/W4O4kk84lrol0+ NP9ak5VHjxDx7vaYw90ux3V3L+MIjgNdOxgputMMfmao8yjC37iQuxHpmYI7FQwbt3 DG1erHZhY4F5KQA4yCmapa2FSDPhNVhpOTSbwc6J16rV9Vi6pAtg6AbC+l//X/AWYu SXLp2SMsB4Yb5gO66XMDoILFdNYhrJ07NV9ptVvdD3CLzFdqMxhiYfD6TQGibqSjvD 3wkWpjNVwfQN1mm5CaqJ6joe+M7FqqT6tXdaaMKCekTKw6EG9T949sfmZlakhHqG/T M4XxxJBKnf18g== From: Leon Romanovsky To: Cc: Leon Romanovsky , Jens Axboe , Christoph Hellwig , Keith Busch , Jake Edge , Jonathan Corbet , Jason Gunthorpe , Zhu Yanjun , Robin Murphy , Joerg Roedel , Will Deacon , Sagi Grimberg , Bjorn Helgaas , Logan Gunthorpe , Yishai 
Hadas , Shameer Kolothum , Kevin Tian , Alex Williamson , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Andrew Morton , linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, linux-rdma@vger.kernel.org, iommu@lists.linux.dev, linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, Niklas Schnelle , Chuck Lever , Luis Chamberlain , Matthew Wilcox , Dan Williams , Kanchan Joshi , Chaitanya Kulkarni , Jason Gunthorpe , Lu Baolu Subject: [PATCH v11 4/9] iommu: add kernel-doc for iommu_unmap_fast Date: Mon, 5 May 2025 10:01:41 +0300 Message-ID: <7535d8f4364c5413293bb963c41f18298d2344d0.1746424934.git.leon@kernel.org> X-Mailer: git-send-email 2.49.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Leon Romanovsky Add kernel-doc section for iommu_unmap_fast to document existing limitation of underlying functions which can't split individual ranges. Suggested-by: Jason Gunthorpe Acked-by: Will Deacon Reviewed-by: Christoph Hellwig Tested-by: Jens Axboe Reviewed-by: Jason Gunthorpe Reviewed-by: Luis Chamberlain Reviewed-by: Lu Baolu Signed-off-by: Leon Romanovsky --- drivers/iommu/iommu.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 02960585b8d4..8619c355ef9c 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -2621,6 +2621,25 @@ size_t iommu_unmap(struct iommu_domain *domain, } EXPORT_SYMBOL_GPL(iommu_unmap); =20 +/** + * iommu_unmap_fast() - Remove mappings from a range of IOVA without IOTLB= sync + * @domain: Domain to manipulate + * @iova: IO virtual address to start + * @size: Length of the range starting from @iova + * @iotlb_gather: range information for a pending IOTLB flush + * + * iommu_unmap_fast() will remove a translation created by iommu_map(). + * It can't subdivide a mapping created by iommu_map(), so it should be + * called with IOVA ranges that match what was passed to iommu_map(). The + * range can aggregate contiguous iommu_map() calls so long as no individu= al + * range is split. + * + * Basically iommu_unmap_fast() is the same as iommu_unmap() but for calle= rs + * which manage the IOTLB flushing externally to perform a batched sync. + * + * Returns: Number of bytes of IOVA unmapped. iova + res will be the point + * unmapping stopped. 
+ */ size_t iommu_unmap_fast(struct iommu_domain *domain, unsigned long iova, size_t size, struct iommu_iotlb_gather *iotlb_gather) --=20 2.49.0 From nobody Mon Feb 9 00:47:19 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7A0761D8E10; Mon, 5 May 2025 07:02:17 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1746428537; cv=none; b=YYCzQfBBLECKlqE+aSovNH11rg3noIJsczs9eQXzpg4PcFg0PqMzVW2kGIYZh/OPbyW9apOam2nlh6kpwdymBFDTSFnGcHHq3ZKBfWFWpv8ivy2XwQwzTtq0wEbxOxmBggr6ll34MjioqGfWtAi+prrYOIgGBjNl+SV7KwTtcDk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1746428537; c=relaxed/simple; bh=sqmkRxicYJxF1SYtskeYCHqqOrD/BpTRn7fQVhqbMqA=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=lDijq5q1oia0eb+6b/4Ygksoyt9SA+HC0x6P/SS1FIBe6VnKXzvQG2/2RZW3ZF76rtZTLIEjDQB5XS8UyJw2g3MVJRSx6H0CyflDicoOjjbseD+V6nCTOPDVDJApNDmfbvvNJ9vEnM0tR4Kj1F/pnmGi4cKntFc7oPVB+h0ZLaU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=VBa15hKL; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="VBa15hKL" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 24A96C4CEE4; Mon, 5 May 2025 07:02:16 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1746428536; bh=sqmkRxicYJxF1SYtskeYCHqqOrD/BpTRn7fQVhqbMqA=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=VBa15hKLluHSENRQz9wdx4LLSeftr/H0LMOsMSDrCQdf5Ha6WE2N+KSj9tEDRxPEd Tbez50sNxvl9+sDhc/wxAak8u4puWWWjIMNfVEWCvkoLo0AZ1860Y7TX1eFUlpklQU vdVj5x/10BBl/3h9rXJcSADpUk6jqQiXE0V7b6tM9yph2HZ0ErMX52lQ6ithFcfUrN 1hVWnDthNUTZPG4ft3VxPal3btjy1YigIsl/1KVl2fbQ9g0v+Sxqv4V0fP7fOtNl+9 DzC/WA+/nfOKEQc11urOJnIfmD7eG/PzUibjglUkfH35n0+AQcKSKSopBXKTnB/xih +I7VX0sOOI40Q== From: Leon Romanovsky To: Cc: Leon Romanovsky , Jens Axboe , Christoph Hellwig , Keith Busch , Jake Edge , Jonathan Corbet , Jason Gunthorpe , Zhu Yanjun , Robin Murphy , Joerg Roedel , Will Deacon , Sagi Grimberg , Bjorn Helgaas , Logan Gunthorpe , Yishai Hadas , Shameer Kolothum , Kevin Tian , Alex Williamson , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Andrew Morton , linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, linux-rdma@vger.kernel.org, iommu@lists.linux.dev, linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, Niklas Schnelle , Chuck Lever , Luis Chamberlain , Matthew Wilcox , Dan Williams , Kanchan Joshi , Chaitanya Kulkarni Subject: [PATCH v11 5/9] dma-mapping: Provide an interface to allow allocate IOVA Date: Mon, 5 May 2025 10:01:42 +0300 Message-ID: <0b56de6f3e50550a14fd21c98ab3a37d8668cd65.1746424934.git.leon@kernel.org> X-Mailer: git-send-email 2.49.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Leon Romanovsky The existing .map_pages() callback provides both allocating of IOVA and linking DMA pages. 
That combination works great for most of the callers who use it in control paths, but is less effective in fast paths where there may be multiple calls to map_page(). These advanced callers already manage their data in some sort of database and can perform IOVA allocation in advance, leaving range linkage operation to be in fast path. Provide an interface to allocate/deallocate IOVA and next patch link/unlink DMA ranges to that specific IOVA. In the new API a DMA mapping transaction is identified by a struct dma_iova_state, which holds some recomputed information for the transaction which does not change for each page being mapped, so add a check if IOVA can be used for the specific transaction. The API is exported from dma-iommu as it is the only implementation supported, the namespace is clearly different from iommu_* functions which are not allowed to be used. This code layout allows us to save function call per API call used in datapath as well as a lot of boilerplate code. Reviewed-by: Christoph Hellwig Tested-by: Jens Axboe Reviewed-by: Luis Chamberlain Signed-off-by: Leon Romanovsky --- drivers/iommu/dma-iommu.c | 86 +++++++++++++++++++++++++++++++++++++ include/linux/dma-mapping.h | 48 +++++++++++++++++++++ 2 files changed, 134 insertions(+) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 9ba8d8bc0ce9..d3211a8d755e 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -1723,6 +1723,92 @@ size_t iommu_dma_max_mapping_size(struct device *dev) return SIZE_MAX; } =20 +/** + * dma_iova_try_alloc - Try to allocate an IOVA space + * @dev: Device to allocate the IOVA space for + * @state: IOVA state + * @phys: physical address + * @size: IOVA size + * + * Check if @dev supports the IOVA-based DMA API, and if yes allocate IOVA= space + * for the given base address and size. + * + * Note: @phys is only used to calculate the IOVA alignment. Callers that = always + * do PAGE_SIZE aligned transfers can safely pass 0 here. + * + * Returns %true if the IOVA-based DMA API can be used and IOVA space has = been + * allocated, or %false if the regular DMA API should be used. + */ +bool dma_iova_try_alloc(struct device *dev, struct dma_iova_state *state, + phys_addr_t phys, size_t size) +{ + struct iommu_dma_cookie *cookie; + struct iommu_domain *domain; + struct iova_domain *iovad; + size_t iova_off; + dma_addr_t addr; + + memset(state, 0, sizeof(*state)); + if (!use_dma_iommu(dev)) + return false; + + domain =3D iommu_get_dma_domain(dev); + cookie =3D domain->iova_cookie; + iovad =3D &cookie->iovad; + iova_off =3D iova_offset(iovad, phys); + + if (static_branch_unlikely(&iommu_deferred_attach_enabled) && + iommu_deferred_attach(dev, iommu_get_domain_for_dev(dev))) + return false; + + if (WARN_ON_ONCE(!size)) + return false; + + /* + * DMA_IOVA_USE_SWIOTLB is flag which is set by dma-iommu + * internals, make sure that caller didn't set it and/or + * didn't use this interface to map SIZE_MAX. + */ + if (WARN_ON_ONCE((u64)size & DMA_IOVA_USE_SWIOTLB)) + return false; + + addr =3D iommu_dma_alloc_iova(domain, + iova_align(iovad, size + iova_off), + dma_get_mask(dev), dev); + if (!addr) + return false; + + state->addr =3D addr + iova_off; + state->__size =3D size; + return true; +} +EXPORT_SYMBOL_GPL(dma_iova_try_alloc); + +/** + * dma_iova_free - Free an IOVA space + * @dev: Device to free the IOVA space for + * @state: IOVA state + * + * Undoes a successful dma_try_iova_alloc(). + * + * Note that all dma_iova_link() calls need to be undone first. 
For calle= rs + * that never call dma_iova_unlink(), dma_iova_destroy() can be used inste= ad + * which unlinks all ranges and frees the IOVA space in a single efficient + * operation. + */ +void dma_iova_free(struct device *dev, struct dma_iova_state *state) +{ + struct iommu_domain *domain =3D iommu_get_dma_domain(dev); + struct iommu_dma_cookie *cookie =3D domain->iova_cookie; + struct iova_domain *iovad =3D &cookie->iovad; + size_t iova_start_pad =3D iova_offset(iovad, state->addr); + size_t size =3D dma_iova_size(state); + + iommu_dma_free_iova(domain, state->addr - iova_start_pad, + iova_align(iovad, size + iova_start_pad), NULL); +} +EXPORT_SYMBOL_GPL(dma_iova_free); + void iommu_setup_dma_ops(struct device *dev) { struct iommu_domain *domain =3D iommu_get_domain_for_dev(dev); diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h index b79925b1c433..de7f73810d54 100644 --- a/include/linux/dma-mapping.h +++ b/include/linux/dma-mapping.h @@ -72,6 +72,22 @@ =20 #define DMA_BIT_MASK(n) (((n) =3D=3D 64) ? ~0ULL : ((1ULL<<(n))-1)) =20 +struct dma_iova_state { + dma_addr_t addr; + u64 __size; +}; + +/* + * Use the high bit to mark if we used swiotlb for one or more ranges. + */ +#define DMA_IOVA_USE_SWIOTLB (1ULL << 63) + +static inline size_t dma_iova_size(struct dma_iova_state *state) +{ + /* Casting is needed for 32-bits systems */ + return (size_t)(state->__size & ~DMA_IOVA_USE_SWIOTLB); +} + #ifdef CONFIG_DMA_API_DEBUG void debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr); void debug_dma_map_single(struct device *dev, const void *addr, @@ -277,6 +293,38 @@ static inline int dma_mmap_noncontiguous(struct device= *dev, } #endif /* CONFIG_HAS_DMA */ =20 +#ifdef CONFIG_IOMMU_DMA +/** + * dma_use_iova - check if the IOVA API is used for this state + * @state: IOVA state + * + * Return %true if the DMA transfers uses the dma_iova_*() calls or %false= if + * they can't be used. 
+ */ +static inline bool dma_use_iova(struct dma_iova_state *state) +{ + return state->__size !=3D 0; +} + +bool dma_iova_try_alloc(struct device *dev, struct dma_iova_state *state, + phys_addr_t phys, size_t size); +void dma_iova_free(struct device *dev, struct dma_iova_state *state); +#else /* CONFIG_IOMMU_DMA */ +static inline bool dma_use_iova(struct dma_iova_state *state) +{ + return false; +} +static inline bool dma_iova_try_alloc(struct device *dev, + struct dma_iova_state *state, phys_addr_t phys, size_t size) +{ + return false; +} +static inline void dma_iova_free(struct device *dev, + struct dma_iova_state *state) +{ +} +#endif /* CONFIG_IOMMU_DMA */ + #if defined(CONFIG_HAS_DMA) && defined(CONFIG_DMA_NEED_SYNC) void __dma_sync_single_for_cpu(struct device *dev, dma_addr_t addr, size_t= size, enum dma_data_direction dir); --=20 2.49.0 From nobody Mon Feb 9 00:47:19 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 83AAA1DE3DF; Mon, 5 May 2025 07:02:34 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1746428554; cv=none; b=EbP6F/H3nL39HBfXO5rYRmp04j0KVheZGa1XZcejTOKMTtFS0AdRx7AGkkr5JOTE1+qHa+HmIJ7PXepUhg7fqODk1JS1rsdXHEb13o4vAkfGd9ecV+ypwxIIZcFZ9IgiHqMWCwWqzC19rBMEl2/CLdy2Hr/hnMi1XRRNZkC/UX4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1746428554; c=relaxed/simple; bh=FHhYCIQH3fyl1RZsVHoG+qc1CnBKO2xtISEdd+41zhA=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=GBUJ5lRdSqWyFaPot5FZ7bXOk3nuQHvPVSqqth7t0swf3Y2XNbnKdhP8aUIe4FjfrC72E4/EDKldItJvuGG14czvOW5dnTpHXXf+REGnAeJGKf3lx3ME6HRFRpNMj19eHLZn0FFAOIa/RuPeH+qfvDLHpohT62mBNFKJJB7Lcwc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=rya3XWBx; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="rya3XWBx" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 18AECC4CEEE; Mon, 5 May 2025 07:02:33 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1746428554; bh=FHhYCIQH3fyl1RZsVHoG+qc1CnBKO2xtISEdd+41zhA=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=rya3XWBxNoSpNlJz32e3z2UANL3UGlWjrmlMLD1i8sq7XwjQL5SolDu5qj6zHwrST p+e95Ie6o3sLtUvfK1cVVr5ALs36JL8DDHOYgv+WCCwrNiAv1/58RBHCh7g7h/4t/J /MTaH3EZouT7FyjGILcWE/K7nxeP4PdOk0+rU4wWi4XWtJ4R9LqVMPMrM+D3lAQ854 mmKer1TMEPeKi7N4+02LDgeKQGYLidTwaXjUvznHGQMZRrOCCdHX3eWH3oXd5AY6B6 JoTX4+ac1NY6xVYXVe4J6Npm0rBh6bRObr36/EQEdIn54YZ84ngMEXiDEHUZ2oHK21 fyThRUNwjlYkg== From: Leon Romanovsky To: Cc: Christoph Hellwig , Jens Axboe , Keith Busch , Jake Edge , Jonathan Corbet , Jason Gunthorpe , Zhu Yanjun , Robin Murphy , Joerg Roedel , Will Deacon , Sagi Grimberg , Bjorn Helgaas , Logan Gunthorpe , Yishai Hadas , Shameer Kolothum , Kevin Tian , Alex Williamson , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Andrew Morton , linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, linux-rdma@vger.kernel.org, iommu@lists.linux.dev, linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, Niklas Schnelle , Chuck Lever 
, Luis Chamberlain , Matthew Wilcox , Dan Williams , Kanchan Joshi , Chaitanya Kulkarni , Lu Baolu , Leon Romanovsky Subject: [PATCH v11 6/9] iommu/dma: Factor out a iommu_dma_map_swiotlb helper Date: Mon, 5 May 2025 10:01:43 +0300 Message-ID: <6e45705027d0a90014dc253aedaee92db7f4be1f.1746424934.git.leon@kernel.org> X-Mailer: git-send-email 2.49.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Christoph Hellwig Split the iommu logic from iommu_dma_map_page into a separate helper. This not only keeps the code neatly separated, but will also allow for reuse in another caller. Signed-off-by: Christoph Hellwig Tested-by: Jens Axboe Reviewed-by: Luis Chamberlain Reviewed-by: Lu Baolu Signed-off-by: Leon Romanovsky --- drivers/iommu/dma-iommu.c | 73 ++++++++++++++++++++++----------------- 1 file changed, 41 insertions(+), 32 deletions(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index d3211a8d755e..d7684024c439 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -1138,6 +1138,43 @@ void iommu_dma_sync_sg_for_device(struct device *dev= , struct scatterlist *sgl, arch_sync_dma_for_device(sg_phys(sg), sg->length, dir); } =20 +static phys_addr_t iommu_dma_map_swiotlb(struct device *dev, phys_addr_t p= hys, + size_t size, enum dma_data_direction dir, unsigned long attrs) +{ + struct iommu_domain *domain =3D iommu_get_dma_domain(dev); + struct iova_domain *iovad =3D &domain->iova_cookie->iovad; + + if (!is_swiotlb_active(dev)) { + dev_warn_once(dev, "DMA bounce buffers are inactive, unable to map unali= gned transaction.\n"); + return (phys_addr_t)DMA_MAPPING_ERROR; + } + + trace_swiotlb_bounced(dev, phys, size); + + phys =3D swiotlb_tbl_map_single(dev, phys, size, iova_mask(iovad), dir, + attrs); + + /* + * Untrusted devices should not see padding areas with random leftover + * kernel data, so zero the pre- and post-padding. + * swiotlb_tbl_map_single() has initialized the bounce buffer proper to + * the contents of the original memory buffer. + */ + if (phys !=3D (phys_addr_t)DMA_MAPPING_ERROR && dev_is_untrusted(dev)) { + size_t start, virt =3D (size_t)phys_to_virt(phys); + + /* Pre-padding */ + start =3D iova_align_down(iovad, virt); + memset((void *)start, 0, virt - start); + + /* Post-padding */ + start =3D virt + size; + memset((void *)start, 0, iova_align(iovad, start) - start); + } + + return phys; +} + dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page, unsigned long offset, size_t size, enum dma_data_direction dir, unsigned long attrs) @@ -1151,42 +1188,14 @@ dma_addr_t iommu_dma_map_page(struct device *dev, s= truct page *page, dma_addr_t iova, dma_mask =3D dma_get_mask(dev); =20 /* - * If both the physical buffer start address and size are - * page aligned, we don't need to use a bounce page. + * If both the physical buffer start address and size are page aligned, + * we don't need to use a bounce page. 
*/ if (dev_use_swiotlb(dev, size, dir) && iova_offset(iovad, phys | size)) { - if (!is_swiotlb_active(dev)) { - dev_warn_once(dev, "DMA bounce buffers are inactive, unable to map unal= igned transaction.\n"); - return DMA_MAPPING_ERROR; - } - - trace_swiotlb_bounced(dev, phys, size); - - phys =3D swiotlb_tbl_map_single(dev, phys, size, - iova_mask(iovad), dir, attrs); - - if (phys =3D=3D DMA_MAPPING_ERROR) + phys =3D iommu_dma_map_swiotlb(dev, phys, size, dir, attrs); + if (phys =3D=3D (phys_addr_t)DMA_MAPPING_ERROR) return DMA_MAPPING_ERROR; - - /* - * Untrusted devices should not see padding areas with random - * leftover kernel data, so zero the pre- and post-padding. - * swiotlb_tbl_map_single() has initialized the bounce buffer - * proper to the contents of the original memory buffer. - */ - if (dev_is_untrusted(dev)) { - size_t start, virt =3D (size_t)phys_to_virt(phys); - - /* Pre-padding */ - start =3D iova_align_down(iovad, virt); - memset((void *)start, 0, virt - start); - - /* Post-padding */ - start =3D virt + size; - memset((void *)start, 0, - iova_align(iovad, start) - start); - } } =20 if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) --=20 2.49.0 From nobody Mon Feb 9 00:47:19 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EAFA31DC98B; Mon, 5 May 2025 07:02:26 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1746428547; cv=none; b=cIwY9DQkNJnRQqZL1iJEeyB5Awhxor6lZvc67VDCYDdvSokDdfRjuYoM00ubcr8Pn02q94x6DYyn7YYYUyZOCw3JMvWy+1kSAhv+xqnNBM2/K5781IfWIyKsEQIia7AJunO3x/L0WA/FBuAEOw8Pwao/E5JEgba0RLIJR/epWgw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1746428547; c=relaxed/simple; bh=OqVY+n6v+7S7pyZo8PdJ2+LmT5/B6VKI0mwJdkJCFKg=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=nXKB47ZezfqMc4fRemOgMk4XrgmO6KSZDTvjhktCLOyCm3VckMcGn5s3QIMHqKNfMYAcPQmX1mFkU89SnGjD4vagIZpWaJZJlYafSp0r7bMhwQImiMQdGGOhP2S1VHnXMfjU0OyXWSl+HSe4L+VtRZpYfOueyLVheaEK4SzbtmA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=YEBsdGBI; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="YEBsdGBI" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 4017BC4CEEF; Mon, 5 May 2025 07:02:25 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1746428546; bh=OqVY+n6v+7S7pyZo8PdJ2+LmT5/B6VKI0mwJdkJCFKg=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=YEBsdGBIzt1tmoFUlPgwxbUt6Map7qYLunTdNuUjPrzcerFGyL4Kzyk+0TVD5BoXz euz6+X/5aUNGTTTNo2iZH9Bp2cV57g2pEcb71sD1lM67DrMFb9y6WGJ33HDm7LXTKZ VPfqcDgFb5srO3QSzcLSLYGV7wYSycu557udKIxEgQrzOEkUjp7oGXtqAIm6WQFxUu M6CH2lB+Zz4je8OGbLpqwEFC6u9jAEUG4ZZRQcRp1ntqpfK7g/PJYdhcx/AIbAcGTI 69NnZuPVEFSAqQ59+zTjweuZZohzp9W3GlFe9DXAkZyi2jyER/WXFwe9p06nek7jUm gi9tU66ZaiXuA== From: Leon Romanovsky To: Cc: Leon Romanovsky , Jens Axboe , Christoph Hellwig , Keith Busch , Jake Edge , Jonathan Corbet , Jason Gunthorpe , Zhu Yanjun , Robin Murphy , Joerg Roedel , Will Deacon , Sagi Grimberg , Bjorn Helgaas , Logan Gunthorpe , Yishai Hadas , Shameer Kolothum , Kevin Tian 
, Alex Williamson , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Andrew Morton , linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, linux-rdma@vger.kernel.org, iommu@lists.linux.dev, linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, Niklas Schnelle , Chuck Lever , Luis Chamberlain , Matthew Wilcox , Dan Williams , Kanchan Joshi , Chaitanya Kulkarni Subject: [PATCH v11 7/9] dma-mapping: Implement link/unlink ranges API Date: Mon, 5 May 2025 10:01:44 +0300 Message-ID: <41f0281051375512df1304abed642dcea2ae1e6b.1746424934.git.leon@kernel.org> X-Mailer: git-send-email 2.49.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Leon Romanovsky Introduce new DMA APIs to perform DMA linkage of buffers in layers higher than DMA. In proposed API, the callers will perform the following steps. In map path: if (dma_can_use_iova(...)) dma_iova_alloc() for (page in range) dma_iova_link_next(...) dma_iova_sync(...) else /* Fallback to legacy map pages */ for (all pages) dma_map_page(...) In unmap path: if (dma_can_use_iova(...)) dma_iova_destroy() else for (all pages) dma_unmap_page(...) Reviewed-by: Christoph Hellwig Tested-by: Jens Axboe Reviewed-by: Luis Chamberlain Signed-off-by: Leon Romanovsky --- drivers/iommu/dma-iommu.c | 275 +++++++++++++++++++++++++++++++++++- include/linux/dma-mapping.h | 32 +++++ 2 files changed, 306 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index d7684024c439..98f7205ec8fb 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -1175,6 +1175,17 @@ static phys_addr_t iommu_dma_map_swiotlb(struct devi= ce *dev, phys_addr_t phys, return phys; } =20 +/* + * Checks if a physical buffer has unaligned boundaries with respect to + * the IOMMU granule. Returns non-zero if either the start or end + * address is not aligned to the granule boundary. + */ +static inline size_t iova_unaligned(struct iova_domain *iovad, phys_addr_t= phys, + size_t size) +{ + return iova_offset(iovad, phys | size); +} + dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page, unsigned long offset, size_t size, enum dma_data_direction dir, unsigned long attrs) @@ -1192,7 +1203,7 @@ dma_addr_t iommu_dma_map_page(struct device *dev, str= uct page *page, * we don't need to use a bounce page. 
*/ if (dev_use_swiotlb(dev, size, dir) && - iova_offset(iovad, phys | size)) { + iova_unaligned(iovad, phys, size)) { phys =3D iommu_dma_map_swiotlb(dev, phys, size, dir, attrs); if (phys =3D=3D (phys_addr_t)DMA_MAPPING_ERROR) return DMA_MAPPING_ERROR; @@ -1818,6 +1829,268 @@ void dma_iova_free(struct device *dev, struct dma_i= ova_state *state) } EXPORT_SYMBOL_GPL(dma_iova_free); =20 +static int __dma_iova_link(struct device *dev, dma_addr_t addr, + phys_addr_t phys, size_t size, enum dma_data_direction dir, + unsigned long attrs) +{ + bool coherent =3D dev_is_dma_coherent(dev); + + if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) + arch_sync_dma_for_device(phys, size, dir); + + return iommu_map_nosync(iommu_get_dma_domain(dev), addr, phys, size, + dma_info_to_prot(dir, coherent, attrs), GFP_ATOMIC); +} + +static int iommu_dma_iova_bounce_and_link(struct device *dev, dma_addr_t a= ddr, + phys_addr_t phys, size_t bounce_len, + enum dma_data_direction dir, unsigned long attrs, + size_t iova_start_pad) +{ + struct iommu_domain *domain =3D iommu_get_dma_domain(dev); + struct iova_domain *iovad =3D &domain->iova_cookie->iovad; + phys_addr_t bounce_phys; + int error; + + bounce_phys =3D iommu_dma_map_swiotlb(dev, phys, bounce_len, dir, attrs); + if (bounce_phys =3D=3D DMA_MAPPING_ERROR) + return -ENOMEM; + + error =3D __dma_iova_link(dev, addr - iova_start_pad, + bounce_phys - iova_start_pad, + iova_align(iovad, bounce_len), dir, attrs); + if (error) + swiotlb_tbl_unmap_single(dev, bounce_phys, bounce_len, dir, + attrs); + return error; +} + +static int iommu_dma_iova_link_swiotlb(struct device *dev, + struct dma_iova_state *state, phys_addr_t phys, size_t offset, + size_t size, enum dma_data_direction dir, unsigned long attrs) +{ + struct iommu_domain *domain =3D iommu_get_dma_domain(dev); + struct iommu_dma_cookie *cookie =3D domain->iova_cookie; + struct iova_domain *iovad =3D &cookie->iovad; + size_t iova_start_pad =3D iova_offset(iovad, phys); + size_t iova_end_pad =3D iova_offset(iovad, phys + size); + dma_addr_t addr =3D state->addr + offset; + size_t mapped =3D 0; + int error; + + if (iova_start_pad) { + size_t bounce_len =3D min(size, iovad->granule - iova_start_pad); + + error =3D iommu_dma_iova_bounce_and_link(dev, addr, phys, + bounce_len, dir, attrs, iova_start_pad); + if (error) + return error; + state->__size |=3D DMA_IOVA_USE_SWIOTLB; + + mapped +=3D bounce_len; + size -=3D bounce_len; + if (!size) + return 0; + } + + size -=3D iova_end_pad; + error =3D __dma_iova_link(dev, addr + mapped, phys + mapped, size, dir, + attrs); + if (error) + goto out_unmap; + mapped +=3D size; + + if (iova_end_pad) { + error =3D iommu_dma_iova_bounce_and_link(dev, addr + mapped, + phys + mapped, iova_end_pad, dir, attrs, 0); + if (error) + goto out_unmap; + state->__size |=3D DMA_IOVA_USE_SWIOTLB; + } + + return 0; + +out_unmap: + dma_iova_unlink(dev, state, 0, mapped, dir, attrs); + return error; +} + +/** + * dma_iova_link - Link a range of IOVA space + * @dev: DMA device + * @state: IOVA state + * @phys: physical address to link + * @offset: offset into the IOVA state to map into + * @size: size of the buffer + * @dir: DMA direction + * @attrs: attributes of mapping properties + * + * Link a range of IOVA space for the given IOVA state without IOTLB sync. + * This function is used to link multiple physical addresses in contiguous + * IOVA space without performing costly IOTLB sync. + * + * The caller is responsible to call to dma_iova_sync() to sync IOTLB at + * the end of linkage. 
+ */ +int dma_iova_link(struct device *dev, struct dma_iova_state *state, + phys_addr_t phys, size_t offset, size_t size, + enum dma_data_direction dir, unsigned long attrs) +{ + struct iommu_domain *domain =3D iommu_get_dma_domain(dev); + struct iommu_dma_cookie *cookie =3D domain->iova_cookie; + struct iova_domain *iovad =3D &cookie->iovad; + size_t iova_start_pad =3D iova_offset(iovad, phys); + + if (WARN_ON_ONCE(iova_start_pad && offset > 0)) + return -EIO; + + if (dev_use_swiotlb(dev, size, dir) && + iova_unaligned(iovad, phys, size)) + return iommu_dma_iova_link_swiotlb(dev, state, phys, offset, + size, dir, attrs); + + return __dma_iova_link(dev, state->addr + offset - iova_start_pad, + phys - iova_start_pad, + iova_align(iovad, size + iova_start_pad), dir, attrs); +} +EXPORT_SYMBOL_GPL(dma_iova_link); + +/** + * dma_iova_sync - Sync IOTLB + * @dev: DMA device + * @state: IOVA state + * @offset: offset into the IOVA state to sync + * @size: size of the buffer + * + * Sync IOTLB for the given IOVA state. This function should be called on + * the IOVA-contiguous range created by one ore more dma_iova_link() calls + * to sync the IOTLB. + */ +int dma_iova_sync(struct device *dev, struct dma_iova_state *state, + size_t offset, size_t size) +{ + struct iommu_domain *domain =3D iommu_get_dma_domain(dev); + struct iommu_dma_cookie *cookie =3D domain->iova_cookie; + struct iova_domain *iovad =3D &cookie->iovad; + dma_addr_t addr =3D state->addr + offset; + size_t iova_start_pad =3D iova_offset(iovad, addr); + + return iommu_sync_map(domain, addr - iova_start_pad, + iova_align(iovad, size + iova_start_pad)); +} +EXPORT_SYMBOL_GPL(dma_iova_sync); + +static void iommu_dma_iova_unlink_range_slow(struct device *dev, + dma_addr_t addr, size_t size, enum dma_data_direction dir, + unsigned long attrs) +{ + struct iommu_domain *domain =3D iommu_get_dma_domain(dev); + struct iommu_dma_cookie *cookie =3D domain->iova_cookie; + struct iova_domain *iovad =3D &cookie->iovad; + size_t iova_start_pad =3D iova_offset(iovad, addr); + dma_addr_t end =3D addr + size; + + do { + phys_addr_t phys; + size_t len; + + phys =3D iommu_iova_to_phys(domain, addr); + if (WARN_ON(!phys)) + /* Something very horrible happen here */ + return; + + len =3D min_t(size_t, + end - addr, iovad->granule - iova_start_pad); + + if (!dev_is_dma_coherent(dev) && + !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) + arch_sync_dma_for_cpu(phys, len, dir); + + swiotlb_tbl_unmap_single(dev, phys, len, dir, attrs); + + addr +=3D len; + iova_start_pad =3D 0; + } while (addr < end); +} + +static void __iommu_dma_iova_unlink(struct device *dev, + struct dma_iova_state *state, size_t offset, size_t size, + enum dma_data_direction dir, unsigned long attrs, + bool free_iova) +{ + struct iommu_domain *domain =3D iommu_get_dma_domain(dev); + struct iommu_dma_cookie *cookie =3D domain->iova_cookie; + struct iova_domain *iovad =3D &cookie->iovad; + dma_addr_t addr =3D state->addr + offset; + size_t iova_start_pad =3D iova_offset(iovad, addr); + struct iommu_iotlb_gather iotlb_gather; + size_t unmapped; + + if ((state->__size & DMA_IOVA_USE_SWIOTLB) || + (!dev_is_dma_coherent(dev) && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))) + iommu_dma_iova_unlink_range_slow(dev, addr, size, dir, attrs); + + iommu_iotlb_gather_init(&iotlb_gather); + iotlb_gather.queued =3D free_iova && READ_ONCE(cookie->fq_domain); + + size =3D iova_align(iovad, size + iova_start_pad); + addr -=3D iova_start_pad; + unmapped =3D iommu_unmap_fast(domain, addr, size, &iotlb_gather); + WARN_ON(unmapped 
!=3D size); + + if (!iotlb_gather.queued) + iommu_iotlb_sync(domain, &iotlb_gather); + if (free_iova) + iommu_dma_free_iova(domain, addr, size, &iotlb_gather); +} + +/** + * dma_iova_unlink - Unlink a range of IOVA space + * @dev: DMA device + * @state: IOVA state + * @offset: offset into the IOVA state to unlink + * @size: size of the buffer + * @dir: DMA direction + * @attrs: attributes of mapping properties + * + * Unlink a range of IOVA space for the given IOVA state. + */ +void dma_iova_unlink(struct device *dev, struct dma_iova_state *state, + size_t offset, size_t size, enum dma_data_direction dir, + unsigned long attrs) +{ + __iommu_dma_iova_unlink(dev, state, offset, size, dir, attrs, false); +} +EXPORT_SYMBOL_GPL(dma_iova_unlink); + +/** + * dma_iova_destroy - Finish a DMA mapping transaction + * @dev: DMA device + * @state: IOVA state + * @mapped_len: number of bytes to unmap + * @dir: DMA direction + * @attrs: attributes of mapping properties + * + * Unlink the IOVA range up to @mapped_len and free the entire IOVA space.= The + * range of IOVA from dma_addr to @mapped_len must all be linked, and be t= he + * only linked IOVA in state. + */ +void dma_iova_destroy(struct device *dev, struct dma_iova_state *state, + size_t mapped_len, enum dma_data_direction dir, + unsigned long attrs) +{ + if (mapped_len) + __iommu_dma_iova_unlink(dev, state, 0, mapped_len, dir, attrs, + true); + else + /* + * We can be here if first call to dma_iova_link() failed and + * there is nothing to unlink, so let's be more clear. + */ + dma_iova_free(dev, state); +} +EXPORT_SYMBOL_GPL(dma_iova_destroy); + void iommu_setup_dma_ops(struct device *dev) { struct iommu_domain *domain =3D iommu_get_domain_for_dev(dev); diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h index de7f73810d54..a71e110f1e9d 100644 --- a/include/linux/dma-mapping.h +++ b/include/linux/dma-mapping.h @@ -309,6 +309,17 @@ static inline bool dma_use_iova(struct dma_iova_state = *state) bool dma_iova_try_alloc(struct device *dev, struct dma_iova_state *state, phys_addr_t phys, size_t size); void dma_iova_free(struct device *dev, struct dma_iova_state *state); +void dma_iova_destroy(struct device *dev, struct dma_iova_state *state, + size_t mapped_len, enum dma_data_direction dir, + unsigned long attrs); +int dma_iova_sync(struct device *dev, struct dma_iova_state *state, + size_t offset, size_t size); +int dma_iova_link(struct device *dev, struct dma_iova_state *state, + phys_addr_t phys, size_t offset, size_t size, + enum dma_data_direction dir, unsigned long attrs); +void dma_iova_unlink(struct device *dev, struct dma_iova_state *state, + size_t offset, size_t size, enum dma_data_direction dir, + unsigned long attrs); #else /* CONFIG_IOMMU_DMA */ static inline bool dma_use_iova(struct dma_iova_state *state) { @@ -323,6 +334,27 @@ static inline void dma_iova_free(struct device *dev, struct dma_iova_state *state) { } +static inline void dma_iova_destroy(struct device *dev, + struct dma_iova_state *state, size_t mapped_len, + enum dma_data_direction dir, unsigned long attrs) +{ +} +static inline int dma_iova_sync(struct device *dev, + struct dma_iova_state *state, size_t offset, size_t size) +{ + return -EOPNOTSUPP; +} +static inline int dma_iova_link(struct device *dev, + struct dma_iova_state *state, phys_addr_t phys, size_t offset, + size_t size, enum dma_data_direction dir, unsigned long attrs) +{ + return -EOPNOTSUPP; +} +static inline void dma_iova_unlink(struct device *dev, + struct dma_iova_state *state, size_t 
offset, size_t size, + enum dma_data_direction dir, unsigned long attrs) +{ +} #endif /* CONFIG_IOMMU_DMA */ =20 #if defined(CONFIG_HAS_DMA) && defined(CONFIG_DMA_NEED_SYNC) --=20 2.49.0 From nobody Mon Feb 9 00:47:19 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AF2951DDA39; Mon, 5 May 2025 07:02:30 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1746428550; cv=none; b=o/uEx2zWxfOy5Vp96sRHQM9JCR/gHavY7hR4+N7HAH4KTVPN0X70LC6IPd9VeB2wqHu4imqNSE0orr9n7zcqr/klWbCx35nXrHFYTotXR3dtxeVDMzR22GSCGYCa2VtDR+djx9z1mKdt63mOoOZy2CMpV9BlQSUlJeKYKXp8OEg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1746428550; c=relaxed/simple; bh=zDzlqykv86nTBfYEytc8wm5wkFShzSQKAsrmEhNbBwU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=M0iV4R31euIXK9aX566B9sTWpdlE0ANaZHCDeNAcNTowJS/chWyx9J3osbmtNcKadqtCeTCp5Eiz3mIqiQ8znNMqsvjVWwCtzX/+ea8DtnL8PPXvp6qfWJMHdp6Y2qJiyYtV+5hpJO+9QNiz7pA2MAW6Z6g4rtP+1X142stosjE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=qqtM0p4g; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="qqtM0p4g" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 824E5C4CEE4; Mon, 5 May 2025 07:02:29 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1746428550; bh=zDzlqykv86nTBfYEytc8wm5wkFShzSQKAsrmEhNbBwU=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=qqtM0p4gWP9l89f4xe03WY9yh8S81jdJOy9cNmMjR+Mi9fNCGSmR3SKl9rQnAZ6HK ifaZF9J3ukpYGHCWk8qbzAYHobwBSK6ZPE9c8zYYX98JeidbQ7xvgllEiqhKw0HOpM dzM7fO8tAGhd7WaF/P3BocqEX/LTe/YP3sRfSN9JLTsTIeDn2vlwVi/rBP3xpLyLA0 Hfk7nonUdMRXWPDCOVNaRXYIZ29to9Zuf/kf5xo6D/n5im+HZTPVbVJ8h8oioMZnvc WaFLtd5J2lafRtA3roO1fRn3KaNxX7b0rzVH9A9CgLTev2i9azW/8Bo4QESW8C3baq DP06qo478a2iw== From: Leon Romanovsky To: Cc: Christoph Hellwig , Jens Axboe , Keith Busch , Jake Edge , Jonathan Corbet , Jason Gunthorpe , Zhu Yanjun , Robin Murphy , Joerg Roedel , Will Deacon , Sagi Grimberg , Bjorn Helgaas , Logan Gunthorpe , Yishai Hadas , Shameer Kolothum , Kevin Tian , Alex Williamson , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Andrew Morton , linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, linux-rdma@vger.kernel.org, iommu@lists.linux.dev, linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, Niklas Schnelle , Chuck Lever , Luis Chamberlain , Matthew Wilcox , Dan Williams , Kanchan Joshi , Chaitanya Kulkarni , Leon Romanovsky Subject: [PATCH v11 8/9] dma-mapping: add a dma_need_unmap helper Date: Mon, 5 May 2025 10:01:45 +0300 Message-ID: <11ca5400460fa195692bd413387b06ac484e03b4.1746424934.git.leon@kernel.org> X-Mailer: git-send-email 2.49.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Christoph Hellwig Add helper that allows a driver to skip calling dma_unmap_* if the DMA layer can 
guarantee that they are no-nops. Signed-off-by: Christoph Hellwig Tested-by: Jens Axboe Reviewed-by: Luis Chamberlain Signed-off-by: Leon Romanovsky --- include/linux/dma-mapping.h | 5 +++++ kernel/dma/mapping.c | 18 ++++++++++++++++++ 2 files changed, 23 insertions(+) diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h index a71e110f1e9d..d2f358c5a25d 100644 --- a/include/linux/dma-mapping.h +++ b/include/linux/dma-mapping.h @@ -406,6 +406,7 @@ static inline bool dma_need_sync(struct device *dev, dm= a_addr_t dma_addr) { return dma_dev_need_sync(dev) ? __dma_need_sync(dev, dma_addr) : false; } +bool dma_need_unmap(struct device *dev); #else /* !CONFIG_HAS_DMA || !CONFIG_DMA_NEED_SYNC */ static inline bool dma_dev_need_sync(const struct device *dev) { @@ -431,6 +432,10 @@ static inline bool dma_need_sync(struct device *dev, d= ma_addr_t dma_addr) { return false; } +static inline bool dma_need_unmap(struct device *dev) +{ + return false; +} #endif /* !CONFIG_HAS_DMA || !CONFIG_DMA_NEED_SYNC */ =20 struct page *dma_alloc_pages(struct device *dev, size_t size, diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c index cda127027e48..3c3204ad2839 100644 --- a/kernel/dma/mapping.c +++ b/kernel/dma/mapping.c @@ -443,6 +443,24 @@ bool __dma_need_sync(struct device *dev, dma_addr_t dm= a_addr) } EXPORT_SYMBOL_GPL(__dma_need_sync); =20 +/** + * dma_need_unmap - does this device need dma_unmap_* operations + * @dev: device to check + * + * If this function returns %false, drivers can skip calling dma_unmap_* a= fter + * finishing an I/O. This function must be called after all mappings that= might + * need to be unmapped have been performed. + */ +bool dma_need_unmap(struct device *dev) +{ + if (!dma_map_direct(dev, get_dma_ops(dev))) + return true; + if (!dev->dma_skip_sync) + return true; + return IS_ENABLED(CONFIG_DMA_API_DEBUG); +} +EXPORT_SYMBOL_GPL(dma_need_unmap); + static void dma_setup_need_sync(struct device *dev) { const struct dma_map_ops *ops =3D get_dma_ops(dev); --=20 2.49.0 From nobody Mon Feb 9 00:47:19 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 30D7B1F4C8A; Mon, 5 May 2025 07:02:37 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1746428558; cv=none; b=K9PqOFwzWxOalEnbu6uJK0xhYtjEw6nzYSgSQQwsg7Tgn83ksalhp0NqGFjT/OeVdIrAC27TRBf8KVlPag0qF4PHZVQEG1E7Wnbi76JrwoRaB4I6txG97C3LqVQKOtv81KUB7VBMF4/+9fcERTXMMdySMK6I7cSQPigqk+iPyCo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1746428558; c=relaxed/simple; bh=d1+z79zg+kihe5QDvuiZPuBTqwMqD9eJrziwPARATZw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=HfvIs/uUAdqD/dz/m/Emmc/NnoUCypT3XeHf6PV3eyA2CULIqoA+PIWlooQO+jM0ZcqPZmlIGAM7qSIDdJ8VkOFvmPOt6hBnBNv6bqPLfHaONCB7BlDw7gDB5U7j1uWWhLzNRHhThUA2PJuJbcci/jgCuDaIQ+5+vVfVZiY2b7A= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=udIyTs1j; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="udIyTs1j" Received: by smtp.kernel.org (Postfix) with ESMTPSA id EA5DBC4CEE4; Mon, 5 May 2025 07:02:36 
+0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1746428557; bh=d1+z79zg+kihe5QDvuiZPuBTqwMqD9eJrziwPARATZw=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=udIyTs1jU0xkkPO2JwNPXSUyJHDJzOGOAJqZEBd18QourGwZw+baXEZ9gTe8BJ0EJ DKGtIef9GidDihg430dzb3TV4U9nSk0oH0fWDOqZMUUied876JK1wN1He2oUOPQ9+s 4talNO552304u79Njy1LPb0QYEBAPSm4JRnjt0U38ZXHn3uQXTkQ6SLZmvo5VPfcSN 4oQb+HpClX6Qepsm/ir1G8Cf48WpHUEF6GV/yNoKGVauQurcR8AKjzjpLESRp4OM8E eFEoy59bj9uf5LIDs5y+12okk38f0cZDP+qoR+lYvzBX1noadVlU3+zY/zH2A2FB5S 5rtZiYk4O/uXQ== From: Leon Romanovsky To: Cc: Christoph Hellwig , Jens Axboe , Keith Busch , Jake Edge , Jonathan Corbet , Jason Gunthorpe , Zhu Yanjun , Robin Murphy , Joerg Roedel , Will Deacon , Sagi Grimberg , Bjorn Helgaas , Logan Gunthorpe , Yishai Hadas , Shameer Kolothum , Kevin Tian , Alex Williamson , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Andrew Morton , linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, linux-rdma@vger.kernel.org, iommu@lists.linux.dev, linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, Niklas Schnelle , Chuck Lever , Luis Chamberlain , Matthew Wilcox , Dan Williams , Kanchan Joshi , Chaitanya Kulkarni , Leon Romanovsky Subject: [PATCH v11 9/9] docs: core-api: document the IOVA-based API Date: Mon, 5 May 2025 10:01:46 +0300 Message-ID: X-Mailer: git-send-email 2.49.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Christoph Hellwig Add an explanation of the newly added IOVA-based mapping API. Signed-off-by: Christoph Hellwig Tested-by: Jens Axboe Signed-off-by: Leon Romanovsky --- Documentation/core-api/dma-api.rst | 71 ++++++++++++++++++++++++++++++ 1 file changed, 71 insertions(+) diff --git a/Documentation/core-api/dma-api.rst b/Documentation/core-api/dm= a-api.rst index 8e3cce3d0a23..2ad08517e626 100644 --- a/Documentation/core-api/dma-api.rst +++ b/Documentation/core-api/dma-api.rst @@ -530,6 +530,77 @@ routines, e.g.::: .... } =20 +Part Ie - IOVA-based DMA mappings +--------------------------------- + +These APIs allow a very efficient mapping when using an IOMMU. They are an +optional path that requires extra code and are only recommended for drivers +where DMA mapping performance, or the space usage for storing the DMA addr= esses +matter. All the considerations from the previous section apply here as we= ll. + +:: + + bool dma_iova_try_alloc(struct device *dev, struct dma_iova_state *sta= te, + phys_addr_t phys, size_t size); + +Is used to try to allocate IOVA space for mapping operation. If it returns +false this API can't be used for the given device and the normal streaming +DMA mapping API should be used. The ``struct dma_iova_state`` is allocated +by the driver and must be kept around until unmap time. + +:: + + static inline bool dma_use_iova(struct dma_iova_state *state) + +Can be used by the driver to check if the IOVA-based API is used after a +call to dma_iova_try_alloc. This can be useful in the unmap path. + +:: + + int dma_iova_link(struct device *dev, struct dma_iova_state *state, + phys_addr_t phys, size_t offset, size_t size, + enum dma_data_direction dir, unsigned long attrs); + +Is used to link ranges to the IOVA previously allocated. 
The start of all +but the first call to dma_iova_link for a given state must be aligned +to the DMA merge boundary returned by ``dma_get_merge_boundary())``, and +the size of all but the last range must be aligned to the DMA merge bounda= ry +as well. + +:: + + int dma_iova_sync(struct device *dev, struct dma_iova_state *state, + size_t offset, size_t size); + +Must be called to sync the IOMMU page tables for IOVA-range mapped by one = or +more calls to ``dma_iova_link()``. + +For drivers that use a one-shot mapping, all ranges can be unmapped and the +IOVA freed by calling: + +:: + + void dma_iova_destroy(struct device *dev, struct dma_iova_state *state, + size_t mapped_len, enum dma_data_direction dir, + unsigned long attrs); + +Alternatively drivers can dynamically manage the IOVA space by unmapping +and mapping individual regions. In that case + +:: + + void dma_iova_unlink(struct device *dev, struct dma_iova_state *state, + size_t offset, size_t size, enum dma_data_direction dir, + unsigned long attrs); + +is used to unmap a range previously mapped, and + +:: + + void dma_iova_free(struct device *dev, struct dma_iova_state *state); + +is used to free the IOVA space. All regions must have been unmapped using +``dma_iova_unlink()`` before calling ``dma_iova_free()``. =20 Part II - Non-coherent DMA allocations -------------------------------------- --=20 2.49.0
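
A minimal usage sketch of the IOVA-based API documented in the last patch, assuming a driver that maps an array of page-aligned, PAGE_SIZE-sized ranges for a single I/O (so the dma_iova_link() alignment rule on the merge boundary is trivially met). The struct my_request container, its fields, and the my_map()/my_unmap() helpers are illustrative only and are not part of this series; the fallback path assumes the caller has allocated req->dma_addrs.

	#include <linux/dma-mapping.h>

	struct my_request {
		struct dma_iova_state state;	/* must stay alive until unmap time */
		dma_addr_t *dma_addrs;		/* used only on the fallback path */
		size_t mapped_len;		/* bytes successfully linked */
	};

	static int my_map(struct device *dev, struct my_request *req,
			  struct page **pages, int nr_pages,
			  enum dma_data_direction dir)
	{
		size_t len = (size_t)nr_pages << PAGE_SHIFT;
		int i, ret;

		if (dma_iova_try_alloc(dev, &req->state, page_to_phys(pages[0]), len)) {
			/* IOVA path: link every range, then sync the IOTLB once. */
			for (i = 0; i < nr_pages; i++) {
				ret = dma_iova_link(dev, &req->state,
						    page_to_phys(pages[i]),
						    (size_t)i * PAGE_SIZE,
						    PAGE_SIZE, dir, 0);
				if (ret)
					goto err_destroy;
				req->mapped_len += PAGE_SIZE;
			}
			ret = dma_iova_sync(dev, &req->state, 0, len);
			if (ret)
				goto err_destroy;
			return 0;
		}

		/* Fallback: legacy per-page streaming mappings. */
		for (i = 0; i < nr_pages; i++) {
			req->dma_addrs[i] = dma_map_page(dev, pages[i], 0,
							 PAGE_SIZE, dir);
			if (dma_mapping_error(dev, req->dma_addrs[i]))
				return -ENOMEM;	/* unwind omitted for brevity */
		}
		return 0;

	err_destroy:
		/* Unlinks [0, mapped_len) and frees the IOVA space. */
		dma_iova_destroy(dev, &req->state, req->mapped_len, dir, 0);
		return ret;
	}

	static void my_unmap(struct device *dev, struct my_request *req,
			     int nr_pages, enum dma_data_direction dir)
	{
		int i;

		if (dma_use_iova(&req->state)) {
			dma_iova_destroy(dev, &req->state, req->mapped_len, dir, 0);
			return;
		}

		if (!dma_need_unmap(dev))
			return;
		for (i = 0; i < nr_pages; i++)
			dma_unmap_page(dev, req->dma_addrs[i], PAGE_SIZE, dir);
	}

The sketch follows the one-shot model: everything is linked before a single dma_iova_sync(), and dma_iova_destroy() tears the whole mapping down. A driver that recycles sub-ranges of the allocated IOVA space would instead pair dma_iova_link()/dma_iova_unlink() per region and call dma_iova_free() once all regions are unlinked.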