From nobody Sat Nov 30 01:44:58 2024 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 53616194AD7; Thu, 12 Sep 2024 11:16:11 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726139772; cv=none; b=CQ800Uy6J9Kwd11MJ5jbemniajt/PjeFSFBcFrYGg4tDS/MnNtJvBKClqX+PP4+21BayUQggMy7cqWDMJnlUonUzky+OVQBGiLGhBU819moZ61CCp8sfwEC/oL8qX6B4wc254Ug3b+reeS/ZooJwHU/jQwrfbHqTXGJkJdo99HA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726139772; c=relaxed/simple; bh=wScFwwk329QxKWfLLj8pz7NHQQe+VUBRN1u2NOeObGA=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=D7oU1cqL2FiqlHO03oTeb3hbeaxzS/e7uxxc9CpvZ8CyBvPlGLxdtBM1wkDXkaOnLFffrw1WM0SGixiky5jkPTv31FL5o8EtJmudu6xT6zMhmwAqVhoje/3GJX4Z2nhB6o/xv1RbTx1I+wqfSmls0u2s2UWuOatkdseMmx5gggs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=ElXcUbpH; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="ElXcUbpH" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 126DEC4CEC3; Thu, 12 Sep 2024 11:16:11 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1726139771; bh=wScFwwk329QxKWfLLj8pz7NHQQe+VUBRN1u2NOeObGA=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=ElXcUbpHs5KI2K+PlHKsjpKVaEdXIxDpLVBVHKF4mWEuE2E7Ocs9+wyk4s1iPJPZl MksOnT4XNgcgbVxPfE2E5QYww/ITakbKdUQDYgbzYGmg/vX++ZK2D4Lanvdqp+YK95 ZXZcZyroMhH2X46lASJh3WD5m7PC3YPmecSgvvLBCwJeZ22yUP66AhIpUjdeWqJsPz QrVBKcD5kMCzIh+oE7Oa4e8rYNzkrM0zuv78ko8JsMq1aKACqwkFXDOZsH3otwMUTy 
p5EkgVVvI2x8IWEdGZtUfwuKhR+nBBCQZLf3AiKXGSgFqxAIegmk3Tdh282tqbQyGP +ryDDdtJImP5A== From: Leon Romanovsky To: Jens Axboe , Jason Gunthorpe , Robin Murphy , Joerg Roedel , Will Deacon , Keith Busch , Christoph Hellwig , "Zeng, Oak" , Chaitanya Kulkarni Cc: Leon Romanovsky , Sagi Grimberg , Bjorn Helgaas , Logan Gunthorpe , Yishai Hadas , Shameer Kolothum , Kevin Tian , Alex Williamson , Marek Szyprowski , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Andrew Morton , linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org, iommu@lists.linux.dev, linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org Subject: [RFC v2 01/21] iommu/dma: Provide an interface to allow preallocate IOVA Date: Thu, 12 Sep 2024 14:15:36 +0300 Message-ID: <8ae3944565cd7b140625a71b8c7e74ca466bd3ec.1726138681.git.leon@kernel.org> X-Mailer: git-send-email 2.46.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Leon Romanovsky Separate IOVA allocation to dedicated callback so it will allow cache of IOVA and reuse it in fast paths for devices which support ODP (on-demand-paging) mechanism. Signed-off-by: Leon Romanovsky --- drivers/iommu/dma-iommu.c | 57 ++++++++++++++++++++++++++++++--------- include/linux/iommu-dma.h | 11 ++++++++ 2 files changed, 56 insertions(+), 12 deletions(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 65a38b5695f9..09deea2fc86b 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -358,7 +358,7 @@ int iommu_dma_init_fq(struct iommu_domain *domain) atomic_set(&cookie->fq_timer_on, 0); /* * Prevent incomplete fq state being observable. 
Pairs with path from - * __iommu_dma_unmap() through iommu_dma_free_iova() to queue_iova() + * __iommu_dma_unmap() through __iommu_dma_free_iova() to queue_iova() */ smp_wmb(); WRITE_ONCE(cookie->fq_domain, domain); @@ -759,7 +759,7 @@ static int dma_info_to_prot(enum dma_data_direction dir= , bool coherent, } } =20 -static dma_addr_t iommu_dma_alloc_iova(struct iommu_domain *domain, +static dma_addr_t __iommu_dma_alloc_iova(struct iommu_domain *domain, size_t size, u64 dma_limit, struct device *dev) { struct iommu_dma_cookie *cookie =3D domain->iova_cookie; @@ -805,7 +805,7 @@ static dma_addr_t iommu_dma_alloc_iova(struct iommu_dom= ain *domain, return (dma_addr_t)iova << shift; } =20 -static void iommu_dma_free_iova(struct iommu_dma_cookie *cookie, +static void __iommu_dma_free_iova(struct iommu_dma_cookie *cookie, dma_addr_t iova, size_t size, struct iommu_iotlb_gather *gather) { struct iova_domain *iovad =3D &cookie->iovad; @@ -842,7 +842,7 @@ static void __iommu_dma_unmap(struct device *dev, dma_a= ddr_t dma_addr, =20 if (!iotlb_gather.queued) iommu_iotlb_sync(domain, &iotlb_gather); - iommu_dma_free_iova(cookie, dma_addr, size, &iotlb_gather); + __iommu_dma_free_iova(cookie, dma_addr, size, &iotlb_gather); } =20 static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys, @@ -865,12 +865,12 @@ static dma_addr_t __iommu_dma_map(struct device *dev,= phys_addr_t phys, =20 size =3D iova_align(iovad, size + iova_off); =20 - iova =3D iommu_dma_alloc_iova(domain, size, dma_mask, dev); + iova =3D __iommu_dma_alloc_iova(domain, size, dma_mask, dev); if (!iova) return DMA_MAPPING_ERROR; =20 if (iommu_map(domain, iova, phys - iova_off, size, prot, GFP_ATOMIC)) { - iommu_dma_free_iova(cookie, iova, size, NULL); + __iommu_dma_free_iova(cookie, iova, size, NULL); return DMA_MAPPING_ERROR; } return iova + iova_off; @@ -973,7 +973,7 @@ static struct page **__iommu_dma_alloc_noncontiguous(st= ruct device *dev, return NULL; =20 size =3D iova_align(iovad, size); - 
iova =3D iommu_dma_alloc_iova(domain, size, dev->coherent_dma_mask, dev); + iova =3D __iommu_dma_alloc_iova(domain, size, dev->coherent_dma_mask, dev= ); if (!iova) goto out_free_pages; =20 @@ -1007,7 +1007,7 @@ static struct page **__iommu_dma_alloc_noncontiguous(= struct device *dev, out_free_sg: sg_free_table(sgt); out_free_iova: - iommu_dma_free_iova(cookie, iova, size, NULL); + __iommu_dma_free_iova(cookie, iova, size, NULL); out_free_pages: __iommu_dma_free_pages(pages, count); return NULL; @@ -1434,7 +1434,7 @@ int iommu_dma_map_sg(struct device *dev, struct scatt= erlist *sg, int nents, if (!iova_len) return __finalise_sg(dev, sg, nents, 0); =20 - iova =3D iommu_dma_alloc_iova(domain, iova_len, dma_get_mask(dev), dev); + iova =3D __iommu_dma_alloc_iova(domain, iova_len, dma_get_mask(dev), dev); if (!iova) { ret =3D -ENOMEM; goto out_restore_sg; @@ -1451,7 +1451,7 @@ int iommu_dma_map_sg(struct device *dev, struct scatt= erlist *sg, int nents, return __finalise_sg(dev, sg, nents, iova); =20 out_free_iova: - iommu_dma_free_iova(cookie, iova, iova_len, NULL); + __iommu_dma_free_iova(cookie, iova, iova_len, NULL); out_restore_sg: __invalidate_sg(sg, nents); out: @@ -1710,6 +1710,39 @@ size_t iommu_dma_max_mapping_size(struct device *dev) return SIZE_MAX; } =20 +int iommu_dma_alloc_iova(struct dma_iova_state *state, phys_addr_t phys, + size_t size) +{ + struct iommu_domain *domain =3D iommu_get_dma_domain(state->dev); + struct iommu_dma_cookie *cookie =3D domain->iova_cookie; + struct iova_domain *iovad =3D &cookie->iovad; + dma_addr_t addr; + + size =3D iova_align(iovad, size + iova_offset(iovad, phys)); + addr =3D __iommu_dma_alloc_iova(domain, size, dma_get_mask(state->dev), + state->dev); + if (addr =3D=3D DMA_MAPPING_ERROR) + return -EINVAL; + + state->addr =3D addr; + state->size =3D size; + return 0; +} + +void iommu_dma_free_iova(struct dma_iova_state *state) +{ + struct iommu_domain *domain =3D iommu_get_dma_domain(state->dev); + struct iommu_dma_cookie 
*cookie =3D domain->iova_cookie; + struct iova_domain *iovad =3D &cookie->iovad; + size_t iova_off =3D iova_offset(iovad, state->addr); + struct iommu_iotlb_gather iotlb_gather; + + iommu_iotlb_gather_init(&iotlb_gather); + __iommu_dma_free_iova(cookie, state->addr - iova_off, + iova_align(iovad, state->size + iova_off), + &iotlb_gather); +} + void iommu_setup_dma_ops(struct device *dev) { struct iommu_domain *domain =3D iommu_get_domain_for_dev(dev); @@ -1746,7 +1779,7 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_p= age(struct device *dev, if (!msi_page) return NULL; =20 - iova =3D iommu_dma_alloc_iova(domain, size, dma_get_mask(dev), dev); + iova =3D __iommu_dma_alloc_iova(domain, size, dma_get_mask(dev), dev); if (!iova) goto out_free_page; =20 @@ -1760,7 +1793,7 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_p= age(struct device *dev, return msi_page; =20 out_free_iova: - iommu_dma_free_iova(cookie, iova, size, NULL); + __iommu_dma_free_iova(cookie, iova, size, NULL); out_free_page: kfree(msi_page); return NULL; diff --git a/include/linux/iommu-dma.h b/include/linux/iommu-dma.h index 13874f95d77f..698df67b152a 100644 --- a/include/linux/iommu-dma.h +++ b/include/linux/iommu-dma.h @@ -57,6 +57,9 @@ void iommu_dma_sync_sg_for_cpu(struct device *dev, struct= scatterlist *sgl, int nelems, enum dma_data_direction dir); void iommu_dma_sync_sg_for_device(struct device *dev, struct scatterlist *= sgl, int nelems, enum dma_data_direction dir); +int iommu_dma_alloc_iova(struct dma_iova_state *state, phys_addr_t phys, + size_t size); +void iommu_dma_free_iova(struct dma_iova_state *state); #else static inline bool use_dma_iommu(struct device *dev) { @@ -173,5 +176,13 @@ static inline void iommu_dma_sync_sg_for_device(struct= device *dev, enum dma_data_direction dir) { } +static inline int iommu_dma_alloc_iova(struct dma_iova_state *state, + phys_addr_t phys, size_t size) +{ + return -EOPNOTSUPP; +} +static inline void iommu_dma_free_iova(struct 
dma_iova_state *state) +{ +} #endif /* CONFIG_IOMMU_DMA */ #endif /* _LINUX_IOMMU_DMA_H */ --=20 2.46.0 From nobody Sat Nov 30 01:44:58 2024 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1DCF5145B14; Thu, 12 Sep 2024 11:16:07 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726139768; cv=none; b=U0Z/jsajQA1WoLO9CAFhLumbUYkDCpCpTTvsHSi+Nbs0VinIx0jaFt/5n87BIagnhG/sK32tMmBUhy1Hm+6mqYVk+b9lRirBD6eUiKZgxOvXKDJmp/DESCax2KrzX5ie10WNaPCOvaLS4D1Yc2QNNTomDNfwiwkDbRM32i1YQvI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726139768; c=relaxed/simple; bh=VWjabdNWVPsGz8JpMutT65B9uF9Uo/BpJGgpfDUTVak=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=o2PdtgZBGun+51GiavOvSJh/0J6cTGXwgIawR4WgBESqlSYbuTbxlYfSvFLoHshDCHpQGKITnXNPM9GKhL6KMq0oSNi33Wevb4P50zyDaU45QmZTBWB975M8tci80UKOJ3yivdPfkOH0gTnJreu7COsp8M70FI8+x5UaFH7XMSk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=EmLlNz91; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="EmLlNz91" Received: by smtp.kernel.org (Postfix) with ESMTPSA id D77C3C4CEC3; Thu, 12 Sep 2024 11:16:06 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1726139767; bh=VWjabdNWVPsGz8JpMutT65B9uF9Uo/BpJGgpfDUTVak=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=EmLlNz91vs6SxK07DPB0FeRsNdYraG/8/aqHCC+3yK4unyOBTigWl3dgYpL2FkZvP G00341+DXFortwi5A0DRM0oHHJqqoS7g8LSRwclecTKFOR6jCplrCR/SZdrKvwQrep 
5phR03LaxkxKfznDtpK311Npqk+LqTE1hfaGrQzheteDVBW+0npQPavVlKK53ZaCZV jUwtCJrepSK/qZzM0l46iOZh5zC7lj4U1Ro+C1FrUtN0yi1ZAX+AUZ4QhVVi/LHgnS 9a/vmDOidRNoehzzXg/ZzBt24/4beQiO6/rnc2z0BUmysAh3pMoMNwdyfrAPJ0MonP a9CIwLBnLrSxw== From: Leon Romanovsky To: Jens Axboe , Jason Gunthorpe , Robin Murphy , Joerg Roedel , Will Deacon , Keith Busch , Christoph Hellwig , "Zeng, Oak" , Chaitanya Kulkarni Cc: Leon Romanovsky , Sagi Grimberg , Bjorn Helgaas , Logan Gunthorpe , Yishai Hadas , Shameer Kolothum , Kevin Tian , Alex Williamson , Marek Szyprowski , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Andrew Morton , linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org, iommu@lists.linux.dev, linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org Subject: [RFC v2 02/21] iommu/dma: Implement link/unlink ranges callbacks Date: Thu, 12 Sep 2024 14:15:37 +0300 Message-ID: X-Mailer: git-send-email 2.46.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Leon Romanovsky Add an implementation of the link/unlink interface to map and unmap pages in the fast path for a pre-allocated IOVA.
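As an illustration only, the address bookkeeping done by iommu_dma_link_range() can be sketched in userspace. This is a minimal model, not the kernel implementation: every model_* name and the fixed 4 KiB MODEL_GRANULE are assumptions made for the example, and no IOMMU page table is touched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for this sketch: a fixed 4 KiB IOVA granule. */
#define MODEL_GRANULE 4096u

/* Hypothetical stand-in for the addr/range_size fields of dma_iova_state. */
struct model_iova_state {
	uint64_t addr;      /* base of the pre-allocated IOVA */
	size_t range_size;  /* bytes linked into the IOVA so far */
};

/* Offset of a physical address inside its granule. */
static size_t model_iova_offset(uint64_t phys)
{
	return (size_t)(phys & (MODEL_GRANULE - 1));
}

/* Round a size up to a whole number of granules. */
static size_t model_iova_align(size_t size)
{
	return (size + MODEL_GRANULE - 1) & ~(size_t)(MODEL_GRANULE - 1);
}

/*
 * Mirror the arithmetic of iommu_dma_link_range(): the new range is placed
 * right after everything linked so far, the consumed size is rounded up to
 * the granule, and the returned address keeps the sub-granule offset.
 */
static uint64_t model_link_range(struct model_iova_state *state,
				 uint64_t phys, size_t size)
{
	size_t off = model_iova_offset(phys);
	uint64_t addr = state->addr + state->range_size;

	state->range_size += model_iova_align(size + off);
	return addr + off;
}
```

Linking several granule-aligned ranges in a row yields consecutive DMA addresses inside the same IOVA. Only the first range may start at a non-zero offset, which is what the WARN_ON_ONCE() in the patch checks.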
Signed-off-by: Leon Romanovsky --- drivers/iommu/dma-iommu.c | 86 +++++++++++++++++++++++++++++++++++++++ include/linux/iommu-dma.h | 25 ++++++++++++ 2 files changed, 111 insertions(+) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 09deea2fc86b..72763f76b712 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -1743,6 +1743,92 @@ void iommu_dma_free_iova(struct dma_iova_state *stat= e) &iotlb_gather); } =20 +int iommu_dma_start_range(struct device *dev) +{ + struct iommu_domain *domain =3D iommu_get_dma_domain(dev); + + if (static_branch_unlikely(&iommu_deferred_attach_enabled)) + return iommu_deferred_attach(dev, domain); + + return 0; +} + +void iommu_dma_end_range(struct device *dev) +{ + /* TODO: Factor out ops->iotlb_sync_map(..) call from iommu_map() + * and put it here to provide batched iotlb sync for the range. + */ +} + +dma_addr_t iommu_dma_link_range(struct dma_iova_state *state, phys_addr_t = phys, + size_t size, unsigned long attrs) +{ + struct iommu_domain *domain =3D iommu_get_dma_domain(state->dev); + struct iommu_dma_cookie *cookie =3D domain->iova_cookie; + struct iova_domain *iovad =3D &cookie->iovad; + size_t iova_off =3D iova_offset(iovad, phys); + bool coherent =3D dev_is_dma_coherent(state->dev); + int prot =3D dma_info_to_prot(state->dir, coherent, attrs); + dma_addr_t addr =3D state->addr + state->range_size; + int ret; + + WARN_ON_ONCE(iova_off && state->range_size > 0); + + if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) + arch_sync_dma_for_device(phys, size, state->dir); + + size =3D iova_align(iovad, size + iova_off); + ret =3D iommu_map(domain, addr, phys - iova_off, size, prot, GFP_ATOMIC); + if (ret) + return ret; + + state->range_size +=3D size; + return addr + iova_off; +} + +static void iommu_sync_dma_for_cpu(struct iommu_domain *domain, + dma_addr_t start, size_t size, + enum dma_data_direction dir) +{ + size_t sync_size, unmapped =3D 0; + phys_addr_t phys; + + do { + phys =3D 
iommu_iova_to_phys(domain, start + unmapped); + if (WARN_ON(!phys)) + continue; + + sync_size =3D (unmapped + PAGE_SIZE > size) ? size % PAGE_SIZE : + PAGE_SIZE; + arch_sync_dma_for_cpu(phys, sync_size, dir); + unmapped +=3D sync_size; + } while (unmapped < size); +} + +void iommu_dma_unlink_range(struct device *dev, dma_addr_t start, size_t s= ize, + enum dma_data_direction dir, unsigned long attrs) +{ + struct iommu_domain *domain =3D iommu_get_dma_domain(dev); + struct iommu_dma_cookie *cookie =3D domain->iova_cookie; + struct iova_domain *iovad =3D &cookie->iovad; + struct iommu_iotlb_gather iotlb_gather; + bool coherent =3D dev_is_dma_coherent(dev); + size_t unmapped; + + iommu_iotlb_gather_init(&iotlb_gather); + iotlb_gather.queued =3D READ_ONCE(cookie->fq_domain); + + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) && !coherent) + iommu_sync_dma_for_cpu(domain, start, size, dir); + + size =3D iova_align(iovad, size); + unmapped =3D iommu_unmap_fast(domain, start, size, &iotlb_gather); + WARN_ON(unmapped !=3D size); + + if (!iotlb_gather.queued) + iommu_iotlb_sync(domain, &iotlb_gather); +} + void iommu_setup_dma_ops(struct device *dev) { struct iommu_domain *domain =3D iommu_get_domain_for_dev(dev); diff --git a/include/linux/iommu-dma.h b/include/linux/iommu-dma.h index 698df67b152a..21b0341f52b8 100644 --- a/include/linux/iommu-dma.h +++ b/include/linux/iommu-dma.h @@ -60,6 +60,12 @@ void iommu_dma_sync_sg_for_device(struct device *dev, st= ruct scatterlist *sgl, int iommu_dma_alloc_iova(struct dma_iova_state *state, phys_addr_t phys, size_t size); void iommu_dma_free_iova(struct dma_iova_state *state); +int iommu_dma_start_range(struct device *dev); +void iommu_dma_end_range(struct device *dev); +dma_addr_t iommu_dma_link_range(struct dma_iova_state *state, phys_addr_t = phys, + size_t size, unsigned long attrs); +void iommu_dma_unlink_range(struct device *dev, dma_addr_t start, size_t s= ize, + enum dma_data_direction dir, unsigned long attrs); #else static inline 
bool use_dma_iommu(struct device *dev) { @@ -184,5 +190,24 @@ static inline int iommu_dma_alloc_iova(struct dma_iova= _state *state, static inline void iommu_dma_free_iova(struct dma_iova_state *state) { } +static inline int iommu_dma_start_range(struct device *dev) +{ + return -EOPNOTSUPP; +} +static inline void iommu_dma_end_range(struct device *dev) +{ +} +static inline dma_addr_t iommu_dma_link_range(struct dma_iova_state *state, + phys_addr_t phys, size_t size, + unsigned long attrs) +{ + return DMA_MAPPING_ERROR; +} +static inline void iommu_dma_unlink_range(struct device *dev, dma_addr_t s= tart, + size_t size, + enum dma_data_direction dir, + unsigned long attrs) +{ +} #endif /* CONFIG_IOMMU_DMA */ #endif /* _LINUX_IOMMU_DMA_H */ --=20 2.46.0 From nobody Sat Nov 30 01:44:58 2024 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C8D061A2631; Thu, 12 Sep 2024 11:16:33 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726139793; cv=none; b=nmtN2JvktCGUa86vLJO7rFwHMwWGK6SqvKqvpC5mdFGPFLaKknPW7iyXpqJYMR+UaBk+qYDIQ8ZLIY0yO57EAXOtDkAwcYBBsY45l3G9lOXsBpK4R9ibcPV3xa+iisle/FX8fhVP7vFLq1GuSeoBoBnYenNmH1ZG2vnYFXOX83Q= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726139793; c=relaxed/simple; bh=UdWHtN/ksXEH4oEZ6oLMvfaCKk3WWuTK2xFlFpaVROQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=EeL2nWOIykG5W1P0+7ZXRkr5hDnwS55lXMwem74B97XfezOyKn88jzZgfJ+c+qZ7tVv0HGfX+UE9gm4SOG9Tw1Pf6jaXmaCx/WwKzVqZtF6QwLwlQZcuq3Z7KtiG6+/R1s0j7KRFVc0fGnEupSFzOPN6HpliQspvL7oa6w6/OPY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org 
header.b=JaEIJPQl; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="JaEIJPQl" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 8814DC4CECC; Thu, 12 Sep 2024 11:16:32 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1726139793; bh=UdWHtN/ksXEH4oEZ6oLMvfaCKk3WWuTK2xFlFpaVROQ=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=JaEIJPQlXW2dDMP6p7hqTalr5/nZfzPDNjt31Ao+a/Z+6Qd6aR11344w2a+DiulS6 iMiK8gTB/L8nSXItJGUYe9cwRmL2GgnozYUV8QmwIPwxlHyW9aemlOw4kUFJ274pAA PN0d31Risc4/iSKKWebUhSD66kfvH9LIyEwYMS4hOAKbYQxJRBmddM3wevSD+t9Lab 3P9ksYF1WRRS9QQhDgWjrPfCaOyxagFMoWzGFxcyzomruL7xcQnhG0yOGfbzjF3fR/ bCmDiQ+O2/NGR32Y9Ac9Uqzw207ZTfmgI/uaIsUSGZlxOlUgCp8lIc5BJSywScgOIC JJAYki7oZADlw== From: Leon Romanovsky To: Jens Axboe , Jason Gunthorpe , Robin Murphy , Joerg Roedel , Will Deacon , Keith Busch , Christoph Hellwig , "Zeng, Oak" , Chaitanya Kulkarni Cc: Leon Romanovsky , Sagi Grimberg , Bjorn Helgaas , Logan Gunthorpe , Yishai Hadas , Shameer Kolothum , Kevin Tian , Alex Williamson , Marek Szyprowski , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Andrew Morton , linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org, iommu@lists.linux.dev, linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org Subject: [RFC v2 03/21] iommu/dma: Add check if IOVA can be used Date: Thu, 12 Sep 2024 14:15:38 +0300 Message-ID: X-Mailer: git-send-email 2.46.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Leon Romanovsky Add a check for whether an IOVA can be used for a given page and size.
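For illustration, the order of the checks in iommu_can_use_iova() can be modelled as a pure function. This is a hedged userspace sketch: the model_* types are hypothetical, the two booleans stand in for the swiotlb and memory-encryption tests, and the enum only mimics the role of the P2PDMA map type.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of the page, loosely following pci_p2pdma_map_type. */
enum model_p2p_type {
	MODEL_NOT_P2P,              /* ordinary page, not a P2PDMA page */
	MODEL_P2P_NOT_SUPPORTED,
	MODEL_P2P_BUS_ADDR,
	MODEL_P2P_THRU_HOST_BRIDGE,
};

/* Hypothetical summary of the device and platform state the check consults. */
struct model_dev_caps {
	bool needs_swiotlb;  /* bounce buffering forced or required */
	bool mem_encrypt;    /* platform uses memory encryption */
};

/* Same decision order as the patch: swiotlb, then encryption, then P2P type. */
static bool model_can_use_iova(const struct model_dev_caps *caps,
			       enum model_p2p_type p2p)
{
	if (caps->needs_swiotlb)
		return false;
	if (caps->mem_encrypt)
		return false;
	if (p2p != MODEL_NOT_P2P)
		return p2p == MODEL_P2P_THRU_HOST_BRIDGE;
	return true;
}
```

In this model a P2PDMA page is accepted only when it is routed through the host bridge, and any need for bounce buffering or memory encryption rejects the IOVA path before the page is examined.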
Signed-off-by: Leon Romanovsky --- drivers/iommu/dma-iommu.c | 21 +++++++++++++++++++++ drivers/pci/p2pdma.c | 4 ++-- include/linux/dma-map-ops.h | 7 +++++++ include/linux/iommu-dma.h | 7 +++++++ 4 files changed, 37 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 72763f76b712..3e2e382bb502 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -11,6 +11,7 @@ #include #include #include +#include #include #include #include @@ -1829,6 +1830,26 @@ void iommu_dma_unlink_range(struct device *dev, dma_= addr_t start, size_t size, iommu_iotlb_sync(domain, &iotlb_gather); } =20 +bool iommu_can_use_iova(struct device *dev, struct page *page, size_t size, + enum dma_data_direction dir) +{ + enum pci_p2pdma_map_type map; + + if (is_swiotlb_force_bounce(dev) || dev_use_swiotlb(dev, size, dir)) + return false; + + /* TODO: Rewrite this check to rely on specific struct page flags */ + if (cc_platform_has(CC_ATTR_MEM_ENCRYPT)) + return false; + + if (page && is_pci_p2pdma_page(page)) { + map =3D pci_p2pdma_map_type(page->pgmap, dev); + return map =3D=3D PCI_P2PDMA_MAP_THRU_HOST_BRIDGE; + } + + return true; +} + void iommu_setup_dma_ops(struct device *dev) { struct iommu_domain *domain =3D iommu_get_domain_for_dev(dev); diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c index 4f47a13cb500..6ceea32bb041 100644 --- a/drivers/pci/p2pdma.c +++ b/drivers/pci/p2pdma.c @@ -964,8 +964,8 @@ void pci_p2pmem_publish(struct pci_dev *pdev, bool publ= ish) } EXPORT_SYMBOL_GPL(pci_p2pmem_publish); =20 -static enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pg= map, - struct device *dev) +enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap, + struct device *dev) { enum pci_p2pdma_map_type type =3D PCI_P2PDMA_MAP_NOT_SUPPORTED; struct pci_dev *provider =3D to_p2p_pgmap(pgmap)->provider; diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h index 
103d9c66c445..936e822e9f40 100644 --- a/include/linux/dma-map-ops.h +++ b/include/linux/dma-map-ops.h @@ -516,6 +516,8 @@ struct pci_p2pdma_map_state { enum pci_p2pdma_map_type pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state, struct device *= dev, struct scatterlist *sg); +enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap, + struct device *dev); #else /* CONFIG_PCI_P2PDMA */ static inline enum pci_p2pdma_map_type pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state, struct device *= dev, @@ -523,6 +525,11 @@ pci_p2pdma_map_segment(struct pci_p2pdma_map_state *st= ate, struct device *dev, { return PCI_P2PDMA_MAP_NOT_SUPPORTED; } +static inline enum pci_p2pdma_map_type +pci_p2pdma_map_type(struct dev_pagemap *pgmap, struct device *dev) +{ + return PCI_P2PDMA_MAP_NOT_SUPPORTED; +} #endif /* CONFIG_PCI_P2PDMA */ =20 #endif /* _LINUX_DMA_MAP_OPS_H */ diff --git a/include/linux/iommu-dma.h b/include/linux/iommu-dma.h index 21b0341f52b8..561d81b12d9c 100644 --- a/include/linux/iommu-dma.h +++ b/include/linux/iommu-dma.h @@ -66,6 +66,8 @@ dma_addr_t iommu_dma_link_range(struct dma_iova_state *st= ate, phys_addr_t phys, size_t size, unsigned long attrs); void iommu_dma_unlink_range(struct device *dev, dma_addr_t start, size_t s= ize, enum dma_data_direction dir, unsigned long attrs); +bool iommu_can_use_iova(struct device *dev, struct page *page, size_t size, + enum dma_data_direction dir); #else static inline bool use_dma_iommu(struct device *dev) { @@ -209,5 +211,10 @@ static inline void iommu_dma_unlink_range(struct devic= e *dev, dma_addr_t start, unsigned long attrs) { } +static inline bool iommu_can_use_iova(struct device *dev, struct page *pag= e, + size_t size, enum dma_data_direction dir) +{ + return false; +} #endif /* CONFIG_IOMMU_DMA */ #endif /* _LINUX_IOMMU_DMA_H */ --=20 2.46.0 From nobody Sat Nov 30 01:44:58 2024 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 
with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 845F01A38F0; Thu, 12 Sep 2024 11:16:16 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726139776; cv=none; b=dVkyfUHAEl0urBaETLSBRQIKrw2rsq0nQrVe8hiuSkZ92WdVkWvSiYvXY8hMV4pqCc2RAKKnmlNcEPFySjLSoNoUIC/KWDh/9RrntE0IXusCfjlexux3ERHew5yKuvKalKG0jMjUU685J/4DudtAz591bGIPvYlqk9+1StJT0M4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726139776; c=relaxed/simple; bh=wOAE2cOF8fAnldbeI2zuqZ+UOONgwZZm16iJWxrv2PE=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=ofkoh+z3YaCBqn726agZjHTSuc3mhdUljK85yAcIRsWzlD/FixZMHPVfiUUvK0zsA/N8Q0s6w6qC43Hv8X0yblKiuZxYz3gqpwBg7Y0yDAHLLdGRa8l/2rBNkpTTEfqdUd1GkSe9TuUfJlW2VckIPgstp/ttWo6o24RQIbQwAcM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=oqy5RZrK; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="oqy5RZrK" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 4577FC4CECE; Thu, 12 Sep 2024 11:16:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1726139776; bh=wOAE2cOF8fAnldbeI2zuqZ+UOONgwZZm16iJWxrv2PE=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=oqy5RZrK9GLnKls1wNDOUPi+7NtU2JS4w8JdmoFHMUazCDp35ubbfNk+eD2Ky3+kD c9dh8XOv6xn8eUdWkXYq/ky1ASXhJwYbPSmlWO7OggXZokzWbb0xc++iuMk1NnstOZ 9G+LadC3Vfaio8VU/li3IVUb1dbRspKdNLjwGfwSKTIlBcbij2YVopnafSpz+igh2w 8q3vtsyRvLIliApnPdT+nobPr5chNkY+1N1mtTKAQdsI0f3f4yiiv78QQi0zQHlzvS 4okcMEZtIu1uoplQgT2/NieLqCfTUhXBYZSEndngVTH5rsuHS4qYvHNtl2e/+MdkFi 0ds6D1OQO+Iuw== From: Leon Romanovsky To: Jens Axboe , Jason Gunthorpe , Robin 
Murphy , Joerg Roedel , Will Deacon , Keith Busch , Christoph Hellwig , "Zeng, Oak" , Chaitanya Kulkarni Cc: Leon Romanovsky , Sagi Grimberg , Bjorn Helgaas , Logan Gunthorpe , Yishai Hadas , Shameer Kolothum , Kevin Tian , Alex Williamson , Marek Szyprowski , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Andrew Morton , linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org, iommu@lists.linux.dev, linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org Subject: [RFC v2 04/21] dma-mapping: initialize IOVA state struct Date: Thu, 12 Sep 2024 14:15:39 +0300 Message-ID: X-Mailer: git-send-email 2.46.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Leon Romanovsky Allow callers to properly initialize the IOVA state struct by providing a new function to do so. This ensures that even users who don't zero their allocated memory get a valid IOVA state struct. Signed-off-by: Leon Romanovsky --- include/linux/dma-mapping.h | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h index f693aafe221f..285075873077 100644 --- a/include/linux/dma-mapping.h +++ b/include/linux/dma-mapping.h @@ -76,6 +76,20 @@ =20 #define DMA_BIT_MASK(n) (((n) =3D=3D 64) ?
~0ULL : ((1ULL<<(n))-1)) =20 +struct dma_iova_state { + struct device *dev; + enum dma_data_direction dir; +}; + +static inline void dma_init_iova_state(struct dma_iova_state *state, + struct device *dev, + enum dma_data_direction dir) +{ + memset(state, 0, sizeof(*state)); + state->dev =3D dev; + state->dir =3D dir; +} + #ifdef CONFIG_DMA_API_DEBUG void debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr); void debug_dma_map_single(struct device *dev, const void *addr, --=20 2.46.0 From nobody Sat Nov 30 01:44:58 2024 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id ED1581A0BF9; Thu, 12 Sep 2024 11:16:20 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726139781; cv=none; b=UAs98Be6dK8DcpXTaTd8qkYNw8W87i/O+bzFPyHrL2NbqhIMEpnNTNty3BGh6AS4qqW3lxg5zQcaPut7ZA8Xc0j0aUd2oGF/3ocJzAyX5/3aw67yhUe0GtIhtpUymCUKd1cTz451C0g21jdNcHgCg8hFyB+KiFCJT9RWvLAs5bY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726139781; c=relaxed/simple; bh=li4T+ci0miq/Ck6TZvJSh35TTfQLVZ3ez/sogkVEAqg=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Ezlwl840+NjJGGArhzx8Lvs+Y/EQ6Q86dNOMZcGizcqWZsPZuEvizDMpFqUp3mnFn5kK1cuywxO1t+m7kuuIdIKmzs+ZPhrlgBMYklX+VLaIJaamKnuRJopS0jLw+KTQ5PRrDG1t4pjjuxBF90pH/+Gltjvg99Zx32s66B+K9Rc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=DW0EiOBL; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="DW0EiOBL" Received: by smtp.kernel.org (Postfix) with ESMTPSA id ADF5FC4CED2; Thu, 12 
Sep 2024 11:16:19 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1726139780; bh=li4T+ci0miq/Ck6TZvJSh35TTfQLVZ3ez/sogkVEAqg=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=DW0EiOBLWTdXdcYL7iQeO1KEagJUzhnYtnawH9XAOccSvNYhDqCr0kKYfQM7BvY7U Z/wF8BB04Og3VC/8rARWoLu32FUWjBo1rIeCC7O48BCmC1v7BqOLzPGtwScbzU+jh5 yM16QVL6oJS9fqiYwrvdfeNXuOWuirfNAINa2A9ynEvi8yQyOZcl7Q5AiJF1Psq1wq BlV0gczTOznpsch1Z/NwAxSedcftaMBui/NE1/LqZ1tIhEFQpGA/yfhVm5TfMyPZyn iwopoujIQGMD6QhZFyOHnnS0zH8X4MT/j6CzTSHEFKa//fWWK8/8tzQhXDftpbBb7j 3hFBAtVG640fQ== From: Leon Romanovsky To: Jens Axboe , Jason Gunthorpe , Robin Murphy , Joerg Roedel , Will Deacon , Keith Busch , Christoph Hellwig , "Zeng, Oak" , Chaitanya Kulkarni Cc: Leon Romanovsky , Sagi Grimberg , Bjorn Helgaas , Logan Gunthorpe , Yishai Hadas , Shameer Kolothum , Kevin Tian , Alex Williamson , Marek Szyprowski , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Andrew Morton , linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org, iommu@lists.linux.dev, linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org Subject: [RFC v2 05/21] dma-mapping: provide an interface to allocate IOVA Date: Thu, 12 Sep 2024 14:15:40 +0300 Message-ID: X-Mailer: git-send-email 2.46.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Leon Romanovsky The existing .map_page() callback does two things at once: it allocates an IOVA and links DMA pages. That combination works well for most callers, which use it in control paths, but it is less effective in fast paths. These advanced callers already manage their data in some sort of database and can perform the IOVA allocation in advance, leaving only the range linkage operation in the fast path.
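To make the "allocate in advance" half concrete, the sketch below models only the size that gets reserved up front, following the rounding used by iommu_dma_alloc_iova() earlier in this series. It is a userspace approximation under stated assumptions: the model_* helpers are hypothetical, and the granule is passed in explicitly (and assumed to be a power of two) rather than read from an iova_domain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offset of a physical address inside its granule (granule: power of two). */
static size_t model_iova_offset(uint64_t phys, size_t granule)
{
	return (size_t)(phys & (granule - 1));
}

/* Round a size up to a whole number of granules. */
static size_t model_iova_align(size_t size, size_t granule)
{
	return (size + granule - 1) & ~(granule - 1);
}

/*
 * Size reserved for one pre-allocated IOVA: the requested length plus the
 * sub-granule offset of the first physical address, rounded up to the granule.
 */
static size_t model_reserve_size(uint64_t phys, size_t size, size_t granule)
{
	return model_iova_align(size + model_iova_offset(phys, granule), granule);
}
```

A granule-aligned start reserves exactly the rounded-up length, which corresponds to the dma_alloc_iova() wrapper passing 0 as the physical address. An unaligned start can cost one extra granule, which is why the unaligned variant takes the physical address as a parameter.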
Provide an interface to allocate and free IOVA; a later patch in this series links/unlinks DMA ranges to that specific IOVA. Signed-off-by: Leon Romanovsky --- include/linux/dma-mapping.h | 18 ++++++++++++++++++ kernel/dma/mapping.c | 35 +++++++++++++++++++++++++++++++++++ 2 files changed, 53 insertions(+) diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h index 285075873077..6a51d8e96a9d 100644 --- a/include/linux/dma-mapping.h +++ b/include/linux/dma-mapping.h @@ -78,6 +78,8 @@ =20 struct dma_iova_state { struct device *dev; + dma_addr_t addr; + size_t size; enum dma_data_direction dir; }; =20 @@ -115,6 +117,10 @@ static inline int dma_mapping_error(struct device *dev= , dma_addr_t dma_addr) return 0; } =20 +int dma_alloc_iova_unaligned(struct dma_iova_state *state, phys_addr_t phy= s, + size_t size); +void dma_free_iova(struct dma_iova_state *state); + dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page, size_t offset, size_t size, enum dma_data_direction dir, unsigned long attrs); @@ -164,6 +170,14 @@ void dma_vunmap_noncontiguous(struct device *dev, void= *vaddr); int dma_mmap_noncontiguous(struct device *dev, struct vm_area_struct *vma, size_t size, struct sg_table *sgt); #else /* CONFIG_HAS_DMA */ +static inline int dma_alloc_iova_unaligned(struct dma_iova_state *state, + phys_addr_t phys, size_t size) +{ + return -EOPNOTSUPP; +} +static inline void dma_free_iova(struct dma_iova_state *state) +{ +} static inline dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page, size_t offset, size_t size, enum dma_data_direction dir, unsigned long attrs) @@ -370,6 +384,10 @@ static inline bool dma_need_sync(struct device *dev, d= ma_addr_t dma_addr) return false; } #endif /* !CONFIG_HAS_DMA || !CONFIG_DMA_NEED_SYNC */ +static inline int dma_alloc_iova(struct dma_iova_state *state, size_t size) +{ + return dma_alloc_iova_unaligned(state, 0, size); +} =20 struct page *dma_alloc_pages(struct device *dev, size_t size, dma_addr_t 
*dma_handle, enum dma_data_direction dir, gfp_t gfp); diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c index fd9ecff8beee..4cd910f27dee 100644 --- a/kernel/dma/mapping.c +++ b/kernel/dma/mapping.c @@ -951,3 +951,38 @@ unsigned long dma_get_merge_boundary(struct device *de= v) return ops->get_merge_boundary(dev); } EXPORT_SYMBOL_GPL(dma_get_merge_boundary); + +/** + * dma_alloc_iova_unaligned - Allocate an IOVA space + * @state: IOVA state + * @phys: physical address + * @size: IOVA size + * + * Allocate an IOVA space for the given IOVA state and size. The IOVA space + * is sized for the worst case, in which the whole range is going to be used. + */ +int dma_alloc_iova_unaligned(struct dma_iova_state *state, phys_addr_t phy= s, + size_t size) +{ + if (!use_dma_iommu(state->dev)) + return 0; + + WARN_ON_ONCE(!size); + return iommu_dma_alloc_iova(state, phys, size); +} +EXPORT_SYMBOL_GPL(dma_alloc_iova_unaligned); + +/** + * dma_free_iova - Free an IOVA space + * @state: IOVA state + * + * Free the IOVA space held by the given IOVA state. 
+ */ +void dma_free_iova(struct dma_iova_state *state) +{ + if (!use_dma_iommu(state->dev)) + return; + + iommu_dma_free_iova(state); +} +EXPORT_SYMBOL_GPL(dma_free_iova); --=20 2.46.0 From nobody Sat Nov 30 01:44:58 2024 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 32C281A0BF9; Thu, 12 Sep 2024 11:16:25 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726139785; cv=none; b=lNLmUZ8NncYya2p7ZYfAZM0KcR+UXCEFCV0xYvMd5tuuQxod4gauySNj+DYoqBmZbMR+0+JW7m2yRSppmmd3PVNmkwcyGP3B/VLeCHnTqgqgoheNQ3ybxJ68UeuILL5AJ4GvqKlZMW4oO7hJm6069dQLMcZ0GTg1NT5Sr8KpePg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726139785; c=relaxed/simple; bh=VvpomHjn0PXFDu7e/jW7/gfUbs21DN2RjkYJAyY9I0w=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=XyN7rdcTKaJCMSu9/EoJMU1DB848dt/h273H61x7oAUT4V9naf2INHRkcXPM0nQFyBWH7wt4ilaPk9C7WxmEDXonJcU3Y93UkopXFKfShDmra0afvcsqCZbcyk+o21wLfP6ttJ+3f5EP7AhGHkiUB3u6b2yJezZP00TiY+HaNJE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=aLB45VxS; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="aLB45VxS" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 499D0C4CECF; Thu, 12 Sep 2024 11:16:24 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1726139785; bh=VvpomHjn0PXFDu7e/jW7/gfUbs21DN2RjkYJAyY9I0w=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=aLB45VxSAnSX1Omnd4tNbgxYwGvlO0vlHzxS5XRXs3Jf+H6Sv4g2apXm3/BjzYZcU 
jJat8T1BsFMxKIbvmMGDI3d2Wav68reJ0E3LN7sVO+ms1PwvbS5VCjz/Zp+6dn0u38 Hjyfj6EN+qrGYfmm8UXj6bzSoYbXvDIEBCuKATHLVp+eKVznmpDYlgEn6mEfIV25ac CluAye/X57cQXU1+3AFoTAvO2BnVfHu3mTsIsvPd7Y8II6FIRfLqDJj5jqYb3qQCuB HoFkBLlicit7oilmLRe8VOlKBnHex2vD837rA3akUkz25QskU2jzMySkDTK/0QW30m CBd4EkuZoW2Aw== From: Leon Romanovsky To: Jens Axboe , Jason Gunthorpe , Robin Murphy , Joerg Roedel , Will Deacon , Keith Busch , Christoph Hellwig , "Zeng, Oak" , Chaitanya Kulkarni Cc: Leon Romanovsky , Sagi Grimberg , Bjorn Helgaas , Logan Gunthorpe , Yishai Hadas , Shameer Kolothum , Kevin Tian , Alex Williamson , Marek Szyprowski , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Andrew Morton , linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org, iommu@lists.linux.dev, linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org Subject: [RFC v2 06/21] dma-mapping: set and query DMA IOVA state Date: Thu, 12 Sep 2024 14:15:41 +0300 Message-ID: <818f2fbdb80f07297ca2abe5d04443d3b665f445.1726138681.git.leon@kernel.org> X-Mailer: git-send-email 2.46.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Leon Romanovsky Provide an option to query and set if IOMMU path can be taken. Callers who supply range of pages can perform it only once as the whole range is supposed to have same memory type. 
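The "decide once per range, query per page" usage described above can be sketched in user space as follows. This is an illustration only, not kernel code and not part of this patch: the mock_* names, the has_iommu field and the special_memory predicate are hypothetical stand-ins for use_dma_iommu() and iommu_can_use_iova().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dma_iova_state; not the kernel type. */
struct mock_iova_state {
	bool has_iommu;            /* models use_dma_iommu(state->dev) */
	unsigned int use_iova : 1; /* cached decision for the whole range */
};

/*
 * Hypothetical policy standing in for iommu_can_use_iova(): a range is
 * eligible when it is non-empty and not flagged as special memory.
 */
static bool mock_range_eligible(size_t size, bool special_memory)
{
	return size != 0 && !special_memory;
}

/* Set the decision once; without an IOMMU the state is left untouched. */
void mock_set_iova_state(struct mock_iova_state *st, size_t size,
			 bool special_memory)
{
	if (!st->has_iommu)
		return;
	st->use_iova = mock_range_eligible(size, special_memory);
}

/* Cheap query that can be repeated for every page of the range. */
bool mock_can_use_iova(const struct mock_iova_state *st)
{
	if (!st->has_iommu)
		return false;
	return st->use_iova;
}

/* Decide once for the range, then count pages that take the IOVA path. */
size_t mock_count_iova_pages(struct mock_iova_state *st, size_t npages,
			     bool special_memory)
{
	size_t i, linked = 0;

	assert(st != NULL);
	mock_set_iova_state(st, npages * 4096, special_memory);
	for (i = 0; i < npages; i++)
		if (mock_can_use_iova(st))
			linked++;
	return linked;
}
```

Because the whole range is assumed to share one memory type, the sketch evaluates the policy a single time and every page reuses the cached bit.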
Signed-off-by: Leon Romanovsky --- include/linux/dma-mapping.h | 12 ++++++++++++ kernel/dma/mapping.c | 38 +++++++++++++++++++++++++++++++++++++ 2 files changed, 50 insertions(+) diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h index 6a51d8e96a9d..2c74e68b0567 100644 --- a/include/linux/dma-mapping.h +++ b/include/linux/dma-mapping.h @@ -81,6 +81,7 @@ struct dma_iova_state { dma_addr_t addr; size_t size; enum dma_data_direction dir; + u8 use_iova : 1; }; =20 static inline void dma_init_iova_state(struct dma_iova_state *state, @@ -169,6 +170,9 @@ void *dma_vmap_noncontiguous(struct device *dev, size_t= size, void dma_vunmap_noncontiguous(struct device *dev, void *vaddr); int dma_mmap_noncontiguous(struct device *dev, struct vm_area_struct *vma, size_t size, struct sg_table *sgt); +void dma_set_iova_state(struct dma_iova_state *state, struct page *page, + size_t size); +bool dma_can_use_iova(struct dma_iova_state *state); #else /* CONFIG_HAS_DMA */ static inline int dma_alloc_iova_unaligned(struct dma_iova_state *state, phys_addr_t phys, size_t size) @@ -307,6 +311,14 @@ static inline int dma_mmap_noncontiguous(struct device= *dev, { return -EINVAL; } +static inline void dma_set_iova_state(struct dma_iova_state *state, + struct page *page, size_t size) +{ +} +static inline bool dma_can_use_iova(struct dma_iova_state *state) +{ + return false; +} #endif /* CONFIG_HAS_DMA */ =20 #if defined(CONFIG_HAS_DMA) && defined(CONFIG_DMA_NEED_SYNC) diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c index 4cd910f27dee..16cb03d5d87d 100644 --- a/kernel/dma/mapping.c +++ b/kernel/dma/mapping.c @@ -6,6 +6,7 @@ * Copyright (c) 2006 Tejun Heo */ #include /* for max_pfn */ +#include #include #include #include @@ -15,6 +16,7 @@ #include #include #include +#include #include "debug.h" #include "direct.h" =20 @@ -986,3 +988,39 @@ void dma_free_iova(struct dma_iova_state *state) iommu_dma_free_iova(state); } EXPORT_SYMBOL_GPL(dma_free_iova); + +/** + * 
dma_set_iova_state - Set the IOVA state for the given page and size + * @state: IOVA state + * @page: page to check + * @size: size of the page + * + * Set the IOVA state for the given page and size. The IOVA state is set + * based on the device and the page. + */ +void dma_set_iova_state(struct dma_iova_state *state, struct page *page, + size_t size) +{ + if (!use_dma_iommu(state->dev)) + return; + + state->use_iova =3D iommu_can_use_iova(state->dev, page, size, state->dir= ); +} +EXPORT_SYMBOL_GPL(dma_set_iova_state); + +/** + * dma_can_use_iova - check if the device type is valid + * and won't take SWIOTLB path + * @state: IOVA state + * + * Return %true if the IOVA path can be used for the given IOVA state, else + * %false. + */ +bool dma_can_use_iova(struct dma_iova_state *state) +{ + if (!use_dma_iommu(state->dev)) + return false; + + return state->use_iova; +} +EXPORT_SYMBOL_GPL(dma_can_use_iova); --=20 2.46.0 From nobody Sat Nov 30 01:44:58 2024 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BDAC11A4E8F; Thu, 12 Sep 2024 11:16:29 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726139789; cv=none; b=Dxh1wt+UmeFWqiN8+Uh/n90DPlzlek6ecZi0ZZGqjGCjX2r1BuynUajDW22tZQNJXAFmg3dLXYThFYead4lg17ifv06GpoHoM69tLMMqxTd3c11P4CYdH6G46rP50IE1GPdFfcYwI1A9S9wGyAWrfwtgKM0hM/5OSICKGV4Z8zs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726139789; c=relaxed/simple; bh=YbYR1B9oX34Xv+Cehx9/tYc07da7ONVOmPeGXmHRn1c=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; 
b=Qta8zsy87l0hJ7KQq/SyFQDFKU8kXDUExesB78iCQ0LbDZ5JfezBGIOtSEZLV1iHDNUzeRGvgdaVcaDc/L0daHyjzrpEWDJGQRM+/e4hOwLn4P2/tTb3udxBURn5AyboEV2i10bReEahc63toADl0QsIvbkHeXPT5Rr5FuI2mbs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=fMCA+j/u; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="fMCA+j/u" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 89AE0C4CECC; Thu, 12 Sep 2024 11:16:28 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1726139789; bh=YbYR1B9oX34Xv+Cehx9/tYc07da7ONVOmPeGXmHRn1c=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=fMCA+j/uOJoozgNYTkAQZ9i5owgnrutNMDDvBGalOj/zk/VYWCdYuZ4BwEUgbo+Ee 8IK3/oKPT7MWMTtxPtDL2tX4K+v6ORqHmqpoXabXDx37Z0XKm5aVZRF6/WCoJ9ITzO y2gg2RxijzVwlXxmRZdMVaHnbiANFg06GCTCeNRhNlUjPZ4StdmFz7Mfcxa8G1XV1K slWv9Nbw6MJs1cRNGwbW5Pi2YtCX+5q7wtccX+oU6he+cAY4vrHTPY4NBWkdGZdWz5 rmSAiwxnbgN6fVd3O1RZFGWZ5pQ/Skqfv2XfNFsGj0rdVOoX/QhfA/JxO+QYtRiKMd e8hcOh4m/hvyg== From: Leon Romanovsky To: Jens Axboe , Jason Gunthorpe , Robin Murphy , Joerg Roedel , Will Deacon , Keith Busch , Christoph Hellwig , "Zeng, Oak" , Chaitanya Kulkarni Cc: Leon Romanovsky , Sagi Grimberg , Bjorn Helgaas , Logan Gunthorpe , Yishai Hadas , Shameer Kolothum , Kevin Tian , Alex Williamson , Marek Szyprowski , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Andrew Morton , linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org, iommu@lists.linux.dev, linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org Subject: [RFC v2 07/21] dma-mapping: implement link range API Date: Thu, 12 Sep 2024 14:15:42 +0300 Message-ID: X-Mailer: git-send-email 2.46.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: 
List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Leon Romanovsky Introduce new DMA APIs to perform DMA linkage of buffers in layers higher than DMA. In proposed API, the callers will perform the following steps: dma_alloc_iova() if (dma_can_use_iova(...)) dma_start_range(...) for (page in range) dma_link_range(...) dma_end_range(...) else /* Fallback to legacy map pages */ dma_map_page(...) Signed-off-by: Leon Romanovsky --- include/linux/dma-mapping.h | 26 ++++++++++++++++ kernel/dma/mapping.c | 60 +++++++++++++++++++++++++++++++++++++ 2 files changed, 86 insertions(+) diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h index 2c74e68b0567..bb541f8944e5 100644 --- a/include/linux/dma-mapping.h +++ b/include/linux/dma-mapping.h @@ -11,6 +11,7 @@ #include #include #include +#include =20 /** * List of possible attributes associated with a DMA mapping. The semantics @@ -82,6 +83,7 @@ struct dma_iova_state { size_t size; enum dma_data_direction dir; u8 use_iova : 1; + size_t range_size; }; =20 static inline void dma_init_iova_state(struct dma_iova_state *state, @@ -173,6 +175,11 @@ int dma_mmap_noncontiguous(struct device *dev, struct = vm_area_struct *vma, void dma_set_iova_state(struct dma_iova_state *state, struct page *page, size_t size); bool dma_can_use_iova(struct dma_iova_state *state); +int dma_start_range(struct dma_iova_state *state); +void dma_end_range(struct dma_iova_state *state); +dma_addr_t dma_link_range_attrs(struct dma_iova_state *state, phys_addr_t = phys, + size_t size, unsigned long attrs); +void dma_unlink_range_attrs(struct dma_iova_state *state, unsigned long at= trs); #else /* CONFIG_HAS_DMA */ static inline int dma_alloc_iova_unaligned(struct dma_iova_state *state, phys_addr_t phys, size_t size) @@ -319,6 +326,23 @@ static inline bool dma_can_use_iova(struct dma_iova_st= ate *state) { return false; } +static inline int dma_start_range(struct 
dma_iova_state *state) +{ + return -EOPNOTSUPP; +} +static inline void dma_end_range(struct dma_iova_state *state) +{ +} +static inline dma_addr_t dma_link_range_attrs(struct dma_iova_state *state, + phys_addr_t phys, size_t size, + unsigned long attrs) +{ + return DMA_MAPPING_ERROR; +} +static inline void dma_unlink_range_attrs(struct dma_iova_state *state, + unsigned long attrs) +{ +} #endif /* CONFIG_HAS_DMA */ =20 #if defined(CONFIG_HAS_DMA) && defined(CONFIG_DMA_NEED_SYNC) @@ -513,6 +537,8 @@ static inline void dma_sync_sgtable_for_device(struct d= evice *dev, #define dma_unmap_page(d, a, s, r) dma_unmap_page_attrs(d, a, s, r, 0) #define dma_get_sgtable(d, t, v, h, s) dma_get_sgtable_attrs(d, t, v, h, s= , 0) #define dma_mmap_coherent(d, v, c, h, s) dma_mmap_attrs(d, v, c, h, s, 0) +#define dma_link_range(d, p, o) dma_link_range_attrs(d, p, o, 0) +#define dma_unlink_range(d) dma_unlink_range_attrs(d, 0) =20 bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size); =20 diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c index 16cb03d5d87d..39fac8c21643 100644 --- a/kernel/dma/mapping.c +++ b/kernel/dma/mapping.c @@ -1024,3 +1024,63 @@ bool dma_can_use_iova(struct dma_iova_state *state) return state->use_iova; } EXPORT_SYMBOL_GPL(dma_can_use_iova); + +/** + * dma_start_range - Start a range of IOVA space + * @state: IOVA state + * + * Start a range of IOVA space for the given IOVA state. + */ +int dma_start_range(struct dma_iova_state *state) +{ + if (!state->use_iova) + return 0; + + return iommu_dma_start_range(state->dev); +} +EXPORT_SYMBOL_GPL(dma_start_range); + +/** + * dma_end_range - End a range of IOVA space + * @state: IOVA state + * + * End a range of IOVA space for the given IOVA state. 
+ */ +void dma_end_range(struct dma_iova_state *state) +{ + if (!state->use_iova) + return; + + iommu_dma_end_range(state->dev); +} +EXPORT_SYMBOL_GPL(dma_end_range); + +/** + * dma_link_range_attrs - Link a range of IOVA space + * @state: IOVA state + * @phys: physical address to link + * @size: size of the buffer + * @attrs: attributes of mapping properties + * + * Link a range of IOVA space for the given IOVA state. + */ +dma_addr_t dma_link_range_attrs(struct dma_iova_state *state, phys_addr_t = phys, + size_t size, unsigned long attrs) +{ + return iommu_dma_link_range(state, phys, size, attrs); +} +EXPORT_SYMBOL_GPL(dma_link_range_attrs); + +/** + * dma_unlink_range_attrs - Unlink a range of IOVA space + * @state: IOVA state + * @attrs: attributes of mapping properties + * + * Unlink a range of IOVA space for the given IOVA state. + */ +void dma_unlink_range_attrs(struct dma_iova_state *state, unsigned long at= trs) +{ + iommu_dma_unlink_range(state->dev, state->addr, state->range_size, + state->dir, attrs); +} +EXPORT_SYMBOL_GPL(dma_unlink_range_attrs); --=20 2.46.0 From nobody Sat Nov 30 01:44:58 2024 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0A07C1A01BF; Thu, 12 Sep 2024 11:16:53 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726139814; cv=none; b=mqM/AAs1Lr9hYS3630KJHxb+w6qTruhKPjeyHVW7a8E7z7siWwrNig6D+8i5651WNvjfIcY++oy1De/IEv/1yRRoDix9tLOK9K7KSP8Rok1Pd2aX1vNmjzBnZ+yA2e7mxjPa1A/c+HcbvZ4siEJK6YXu1IBBM4dUxUYgV8OHKhM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726139814; c=relaxed/simple; bh=yM7hzIxBdo84p+8J0W8C8RyNqM4t4h1sCQMfoWGuons=; 
h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=dqVnewrnSsZJKBYDg3p1UeZYp0MZ6FiiYzvtp+hg5tsk+cyLFZaCCU82zeUHOCBg5gYfzcvRhbmOb5ttbU+BMNMGOOrOYvXYuof9e15r6wpEZrL9Ffc2/DC0qobLICvuQ2fyQ7lr8wIAcHLqNncgnGG/OZF+Cxy5awphy61/Fx0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=KcO5hqoq; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="KcO5hqoq" Received: by smtp.kernel.org (Postfix) with ESMTPSA id D69B4C4CEC5; Thu, 12 Sep 2024 11:16:52 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1726139813; bh=yM7hzIxBdo84p+8J0W8C8RyNqM4t4h1sCQMfoWGuons=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=KcO5hqoqU35uJkwTk2yXTFB4Zx39KtK7hLnT6ftctxrL1AUdcl2c9tnGfUA2Fvw2M r5iFRNfnVbZv6yTQ48Tjd/51FH0f19FXC8BtccsPCy7E0m8ApLxF1G5F76sHNMTt08 fF3pMkKWUGBB039tUpB9ZsNe7dvQ2urhvG1SBgra3+HegAm5UPpxSCG/xYfD9p+slm iQ2atj875hpQG5yGAlNljzrlOsJXskm97Rsc6suFUTq9OBYSlV8Yzgx6F9X2SySLHy OzJA/eLp51lYQsqHxESmhnqv+PE06l4lHYcZq7dAQQz89lRv1f943RJZNMgpUqV/6n axJKYYHuNlJuw== From: Leon Romanovsky To: Jens Axboe , Jason Gunthorpe , Robin Murphy , Joerg Roedel , Will Deacon , Keith Busch , Christoph Hellwig , "Zeng, Oak" , Chaitanya Kulkarni Cc: Leon Romanovsky , Sagi Grimberg , Bjorn Helgaas , Logan Gunthorpe , Yishai Hadas , Shameer Kolothum , Kevin Tian , Alex Williamson , Marek Szyprowski , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Andrew Morton , linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org, iommu@lists.linux.dev, linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org Subject: [RFC v2 08/21] mm/hmm: let users to tag specific PFN with DMA mapped bit Date: Thu, 12 Sep 2024 14:15:43 +0300 Message-ID: 
<3c68ab13bcabe908c35388c66bf38a43f5d68c8b.1726138681.git.leon@kernel.org> X-Mailer: git-send-email 2.46.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Leon Romanovsky Introduce a new sticky flag (HMM_PFN_DMA_MAPPED), which isn't overwritten by HMM range fault. The flag allows users to record, per PFN, whether that specific PFN was already DMA mapped. Signed-off-by: Leon Romanovsky --- include/linux/hmm.h | 4 ++++ mm/hmm.c | 34 +++++++++++++++++++++------------- 2 files changed, 25 insertions(+), 13 deletions(-) diff --git a/include/linux/hmm.h b/include/linux/hmm.h index 126a36571667..2999697db83a 100644 --- a/include/linux/hmm.h +++ b/include/linux/hmm.h @@ -23,6 +23,8 @@ struct mmu_interval_notifier; * HMM_PFN_WRITE - if the page memory can be written to (requires HMM_PFN_= VALID) * HMM_PFN_ERROR - accessing the pfn is impossible and the device should * fail. ie poisoned memory, special pages, no vma, etc + * HMM_PFN_DMA_MAPPED - Flag preserved on input-to-output transformation + * to mark that the page is already DMA mapped * * On input: * 0 - Return the current state of the page, do not fault = it. 
@@ -36,6 +38,8 @@ enum hmm_pfn_flags { HMM_PFN_VALID =3D 1UL << (BITS_PER_LONG - 1), HMM_PFN_WRITE =3D 1UL << (BITS_PER_LONG - 2), HMM_PFN_ERROR =3D 1UL << (BITS_PER_LONG - 3), + /* Sticky flag, carried from Input to Output */ + HMM_PFN_DMA_MAPPED =3D 1UL << (BITS_PER_LONG - 7), HMM_PFN_ORDER_SHIFT =3D (BITS_PER_LONG - 8), =20 /* Input flags */ diff --git a/mm/hmm.c b/mm/hmm.c index 7e0229ae4a5a..2a0c34d7cb2b 100644 --- a/mm/hmm.c +++ b/mm/hmm.c @@ -44,8 +44,10 @@ static int hmm_pfns_fill(unsigned long addr, unsigned lo= ng end, { unsigned long i =3D (addr - range->start) >> PAGE_SHIFT; =20 - for (; addr < end; addr +=3D PAGE_SIZE, i++) - range->hmm_pfns[i] =3D cpu_flags; + for (; addr < end; addr +=3D PAGE_SIZE, i++) { + range->hmm_pfns[i] &=3D HMM_PFN_DMA_MAPPED; + range->hmm_pfns[i] |=3D cpu_flags; + } return 0; } =20 @@ -202,8 +204,10 @@ static int hmm_vma_handle_pmd(struct mm_walk *walk, un= signed long addr, return hmm_vma_fault(addr, end, required_fault, walk); =20 pfn =3D pmd_pfn(pmd) + ((addr & ~PMD_MASK) >> PAGE_SHIFT); - for (i =3D 0; addr < end; addr +=3D PAGE_SIZE, i++, pfn++) - hmm_pfns[i] =3D pfn | cpu_flags; + for (i =3D 0; addr < end; addr +=3D PAGE_SIZE, i++, pfn++) { + hmm_pfns[i] &=3D HMM_PFN_DMA_MAPPED; + hmm_pfns[i] |=3D pfn | cpu_flags; + } return 0; } #else /* CONFIG_TRANSPARENT_HUGEPAGE */ @@ -236,7 +240,7 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, uns= igned long addr, hmm_pte_need_fault(hmm_vma_walk, pfn_req_flags, 0); if (required_fault) goto fault; - *hmm_pfn =3D 0; + *hmm_pfn =3D *hmm_pfn & HMM_PFN_DMA_MAPPED; return 0; } =20 @@ -253,14 +257,14 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, u= nsigned long addr, cpu_flags =3D HMM_PFN_VALID; if (is_writable_device_private_entry(entry)) cpu_flags |=3D HMM_PFN_WRITE; - *hmm_pfn =3D swp_offset_pfn(entry) | cpu_flags; + *hmm_pfn =3D (*hmm_pfn & HMM_PFN_DMA_MAPPED) | swp_offset_pfn(entry) | = cpu_flags; return 0; } =20 required_fault =3D hmm_pte_need_fault(hmm_vma_walk, 
pfn_req_flags, 0); if (!required_fault) { - *hmm_pfn =3D 0; + *hmm_pfn =3D *hmm_pfn & HMM_PFN_DMA_MAPPED; return 0; } =20 @@ -304,11 +308,11 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, u= nsigned long addr, pte_unmap(ptep); return -EFAULT; } - *hmm_pfn =3D HMM_PFN_ERROR; + *hmm_pfn =3D (*hmm_pfn & HMM_PFN_DMA_MAPPED) | HMM_PFN_ERROR; return 0; } =20 - *hmm_pfn =3D pte_pfn(pte) | cpu_flags; + *hmm_pfn =3D (*hmm_pfn & HMM_PFN_DMA_MAPPED) | pte_pfn(pte) | cpu_flags; return 0; =20 fault: @@ -448,8 +452,10 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long= start, unsigned long end, } =20 pfn =3D pud_pfn(pud) + ((addr & ~PUD_MASK) >> PAGE_SHIFT); - for (i =3D 0; i < npages; ++i, ++pfn) - hmm_pfns[i] =3D pfn | cpu_flags; + for (i =3D 0; i < npages; ++i, ++pfn) { + hmm_pfns[i] &=3D HMM_PFN_DMA_MAPPED; + hmm_pfns[i] |=3D pfn | cpu_flags; + } goto out_unlock; } =20 @@ -507,8 +513,10 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsi= gned long hmask, } =20 pfn =3D pte_pfn(entry) + ((start & ~hmask) >> PAGE_SHIFT); - for (; addr < end; addr +=3D PAGE_SIZE, i++, pfn++) - range->hmm_pfns[i] =3D pfn | cpu_flags; + for (; addr < end; addr +=3D PAGE_SIZE, i++, pfn++) { + range->hmm_pfns[i] &=3D HMM_PFN_DMA_MAPPED; + range->hmm_pfns[i] |=3D pfn | cpu_flags; + } =20 spin_unlock(ptl); return 0; --=20 2.46.0 From nobody Sat Nov 30 01:44:58 2024 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D21291A2631; Thu, 12 Sep 2024 11:16:37 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726139798; cv=none; 
b=oRyZ0hOsYFT7Ny9R1+hOJ9LYLnYS0jgblJgikJFX7VERaxaYc/Ny3lc3JFqYVtgX0bJhmcNy+RSZsLhcJv4AufiYUQquQ+1Oy8SkvDtDvXSMfvM8WDAcNtuOeswsGb1TorM5HKYugnh67RsOcxcrXuwA8hC7pWrMKFZCDnMNI28= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726139798; c=relaxed/simple; bh=HCbYGqX249833/tdiRDBvFBjeuSolO9rH8wFEsCHmuw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=YdKnbLxAeyLXcSNF88gJNauvqRfctOqXiDQIAn8xPTBIGDTijzq4lztCRLb38JmfdPyH9sXcrbOz5MCm0sbhqyM9ziSjnhkjzyXllDLUHrk8qPk9fzecsBjhy9U9spgeyrM+1c3jQpEzggquwKg8Hhtk1YVGr/LGpbuNlWgVO18= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=utilRRP5; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="utilRRP5" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 95168C4CEC5; Thu, 12 Sep 2024 11:16:36 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1726139797; bh=HCbYGqX249833/tdiRDBvFBjeuSolO9rH8wFEsCHmuw=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=utilRRP5yDc5niZQdTcTGnzCnlrPS0S4L3Hb9vq/XwpoQFI0AVDbejOe0Qo42RRZR KkYQDJw1ZBH1+GhREn9IO3Aft1IbL0WfEet5f0emPRe8iuqT8w2AFuYT9rawGzAivh cQxvsrrXrX2aswijqPZVgD9EpT7cf20gG1iWYXmqsE07Ej+8c8FKOF4tQuWco24nd1 2lGYnudAW1yc+NprinXcSsQXcnqPmKubTJyZY5LKaiOnVKB5gOk1fdSCFyV62ullq/ IqZxh1M1CLXFDX8nzGm4+Sy2BP2yjU9370wTj3+xV+5K9S+8gyJXtfhxEhlqM50c+P tzPreaeQDm/dA== From: Leon Romanovsky To: Jens Axboe , Jason Gunthorpe , Robin Murphy , Joerg Roedel , Will Deacon , Keith Busch , Christoph Hellwig , "Zeng, Oak" , Chaitanya Kulkarni Cc: Leon Romanovsky , Sagi Grimberg , Bjorn Helgaas , Logan Gunthorpe , Yishai Hadas , Shameer Kolothum , Kevin Tian , Alex Williamson , Marek Szyprowski , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Andrew Morton , linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, 
linux-rdma@vger.kernel.org, iommu@lists.linux.dev, linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org Subject: [RFC v2 09/21] dma-mapping: provide callbacks to link/unlink HMM PFNs to specific IOVA Date: Thu, 12 Sep 2024 14:15:44 +0300 Message-ID: X-Mailer: git-send-email 2.46.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Leon Romanovsky Introduce new DMA link/unlink API to provide a way for HMM users to link pages to already preallocated IOVA. Signed-off-by: Leon Romanovsky --- include/linux/dma-mapping.h | 15 ++++++ kernel/dma/mapping.c | 102 ++++++++++++++++++++++++++++++++++++ 2 files changed, 117 insertions(+) diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h index bb541f8944e5..8c2a468c5420 100644 --- a/include/linux/dma-mapping.h +++ b/include/linux/dma-mapping.h @@ -123,6 +123,10 @@ static inline int dma_mapping_error(struct device *dev= , dma_addr_t dma_addr) int dma_alloc_iova_unaligned(struct dma_iova_state *state, phys_addr_t phy= s, size_t size); void dma_free_iova(struct dma_iova_state *state); +dma_addr_t dma_hmm_link_page(struct dma_iova_state *state, unsigned long *= pfn, + dma_addr_t dma_offset); +void dma_hmm_unlink_page(struct dma_iova_state *state, unsigned long *pfn, + dma_addr_t dma_offset); =20 dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page, size_t offset, size_t size, enum dma_data_direction dir, @@ -189,6 +193,17 @@ static inline int dma_alloc_iova_unaligned(struct dma_= iova_state *state, static inline void dma_free_iova(struct dma_iova_state *state) { } +static inline dma_addr_t dma_hmm_link_page(struct dma_iova_state *state, + unsigned long *pfn, + dma_addr_t dma_offset) +{ + return DMA_MAPPING_ERROR; +} +static inline void dma_hmm_unlink_page(struct 
dma_iova_state *state, + unsigned long *pfn, + dma_addr_t dma_offset) +{ +} static inline dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page, size_t offset, size_t size, enum dma_data_direction dir, unsigned long attrs) diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c index 39fac8c21643..5354ddc3ac03 100644 --- a/kernel/dma/mapping.c +++ b/kernel/dma/mapping.c @@ -17,6 +17,7 @@ #include #include #include +#include #include "debug.h" #include "direct.h" =20 @@ -1084,3 +1085,104 @@ void dma_unlink_range_attrs(struct dma_iova_state *= state, unsigned long attrs) state->dir, attrs); } EXPORT_SYMBOL_GPL(dma_unlink_range_attrs); + +/** + * dma_hmm_link_page - Link a physical HMM page to DMA address + * @state: IOVA state + * @pfn: HMM PFN + * @dma_offset: DMA offset from which this page needs to be linked + * + * dma_alloc_iova() allocates IOVA based on the size specified by the caller. + * Call this function after IOVA allocation to link the whole page behind @pfn + * and get its DMA address. Note that the very first call to this function + * will have @dma_offset set to 0 in the IOVA space allocated from + * dma_alloc_iova(). For subsequent calls to this function on the same @state, + * @dma_offset needs to be advanced by the caller with the size of the previous + * page that was linked + DMA address returned for the previous page that was + * linked by this function. 
+ */ +dma_addr_t dma_hmm_link_page(struct dma_iova_state *state, unsigned long *= pfn, + dma_addr_t dma_offset) +{ + struct device *dev =3D state->dev; + struct page *page =3D hmm_pfn_to_page(*pfn); + phys_addr_t phys =3D page_to_phys(page); + bool coherent =3D dev_is_dma_coherent(dev); + dma_addr_t addr; + int ret; + + if (*pfn & HMM_PFN_DMA_MAPPED) + /* + * We are in this flow when there is a need to resync flags, + * for example when the page was already linked in a prefetch call + * with the READ flag and now we need to add the WRITE flag. + * + * This page was already programmed to HW and we don't want/need + * to unlink and link it again just to resync flags. + * + * The DMA address calculation below is based on the fact that + * HMM doesn't work with swiotlb. + */ + return (state->addr) ? state->addr + dma_offset : + phys_to_dma(dev, phys); + + state->range_size =3D dma_offset; + + /* + * The check below is based on the assumption that HMM range users + * don't work with swiotlb and hence can be either in direct mode + * or in IOMMU mode. 
+ */ + if (!use_dma_iommu(dev)) { + if (!coherent) + arch_sync_dma_for_device(phys, PAGE_SIZE, state->dir); + + addr =3D phys_to_dma(dev, phys); + goto done; + } + + ret =3D dma_start_range(state); + if (ret) + return DMA_MAPPING_ERROR; + + addr =3D dma_link_range(state, phys, PAGE_SIZE); + dma_end_range(state); + if (dma_mapping_error(state->dev, addr)) + return addr; + +done: + kmsan_handle_dma(page, 0, PAGE_SIZE, state->dir); + *pfn |=3D HMM_PFN_DMA_MAPPED; + return addr; +} +EXPORT_SYMBOL_GPL(dma_hmm_link_page); + +/** + * dma_hmm_unlink_page - Unlink a physical HMM page from DMA address + * @state: IOVA state + * @pfn: HMM PFN + * @dma_offset: DMA offset from which this page needs to be unlinked + * from the IOVA space + */ +void dma_hmm_unlink_page(struct dma_iova_state *state, unsigned long *pfn, + dma_addr_t dma_offset) +{ + struct device *dev =3D state->dev; + struct page *page; + phys_addr_t phys; + + *pfn &=3D ~HMM_PFN_DMA_MAPPED; + + if (!use_dma_iommu(dev)) { + page =3D hmm_pfn_to_page(*pfn); + phys =3D page_to_phys(page); + + dma_direct_sync_single_for_cpu(dev, phys_to_dma(dev, phys), + PAGE_SIZE, state->dir); + return; + } + + iommu_dma_unlink_range(dev, state->addr + dma_offset, PAGE_SIZE, + state->dir, 0); +} +EXPORT_SYMBOL_GPL(dma_hmm_unlink_page); --=20 2.46.0 From nobody Sat Nov 30 01:44:58 2024 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D4F9F1A2631; Thu, 12 Sep 2024 11:16:41 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726139802; cv=none; b=okIAiniDGX56RdHfmAdwk+nYzcS1+76y8MF8eudPHaw7SkKlPg8sNw0DSWPSsbyN/RodadXqBksqaXBqtxDDdYHtXaw3B7IXAna9R9ETHoTCfyYhTbMe7+2eoPMW7DrgWm/y4HBXguR3mrOXIEN/nqmef6kTv93Xp/upn/tyUTc= 
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726139802; c=relaxed/simple; bh=cWTZ/vX5KH29DDePm3J0/PIuPOYHHrOmAmy/4Xsjr4k=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=SDMbo4ysxaMpwhYinqfduP+y08og3OZb1QYGli6kY9/vpYy23nQbj9ADmIr9mcYmxoSiglBrKDJulHMvDq2xLzS0MfsqECbofDnkE6QAyLTCkU0KwJBYaZGUwqvqq+uf8NOzuzfaKt3QnO91VNnCwO7//T7MGyuwPb0Kgm9weIU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=MaNgzGuY; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="MaNgzGuY" Received: by smtp.kernel.org (Postfix) with ESMTPSA id A155AC4CEC3; Thu, 12 Sep 2024 11:16:40 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1726139801; bh=cWTZ/vX5KH29DDePm3J0/PIuPOYHHrOmAmy/4Xsjr4k=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=MaNgzGuYutMZ4S+zhyYnCR1nQ7/cVUpL4cqJdHGuAEl1wmik2EzqsRYrKwjzPSgGy sfidRPgO4zHN72I0olnabe+xMFut7ZSLfoMfElMezA5aKgxLkIlLNFrWStvp5nrBQe WIU+RNQrJtth1GJ+9XevvCvmshuuvPMj21JSNCgv1OQQ6sBN2NPFmAViYUr2n9nfgn dbjpF0kXfzUGO92UMShptYWEvda0Nuu3fgD16HzX7heBADacYrIS4+3hYZbE1tbiNB QNt5RqCvZK/+hGQmCeSmU9XwJusDixnsAIA3mt2lH8BGiBECYIR+WZoJ5CVwVtemKh 8zM6C899KSD3A== From: Leon Romanovsky To: Jens Axboe , Jason Gunthorpe , Robin Murphy , Joerg Roedel , Will Deacon , Keith Busch , Christoph Hellwig , "Zeng, Oak" , Chaitanya Kulkarni Cc: Leon Romanovsky , Sagi Grimberg , Bjorn Helgaas , Logan Gunthorpe , Yishai Hadas , Shameer Kolothum , Kevin Tian , Alex Williamson , Marek Szyprowski , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Andrew Morton , linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org, iommu@lists.linux.dev, linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org Subject: [RFC v2 10/21] 
RDMA/umem: Preallocate and cache IOVA for UMEM ODP Date: Thu, 12 Sep 2024 14:15:45 +0300 Message-ID: X-Mailer: git-send-email 2.46.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Leon Romanovsky In preparation for providing a two-step interface to map pages, preallocate the IOVA when the UMEM is initialized. Signed-off-by: Leon Romanovsky --- drivers/infiniband/core/umem_odp.c | 13 ++++++++++++- include/rdma/ib_umem_odp.h | 1 + 2 files changed, 13 insertions(+), 1 deletion(-) diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/u= mem_odp.c index e9fa22d31c23..01cbf7f55b3a 100644 --- a/drivers/infiniband/core/umem_odp.c +++ b/drivers/infiniband/core/umem_odp.c @@ -50,6 +50,7 @@ static inline int ib_init_umem_odp(struct ib_umem_odp *umem_odp, const struct mmu_interval_notifier_ops *ops) { + struct ib_device *dev =3D umem_odp->umem.ibdev; int ret; =20 umem_odp->umem.is_odp =3D 1; @@ -87,15 +88,24 @@ static inline int ib_init_umem_odp(struct ib_umem_odp *= umem_odp, goto out_pfn_list; } =20 + dma_init_iova_state(&umem_odp->state, dev->dma_device, + DMA_BIDIRECTIONAL); + ret =3D dma_alloc_iova(&umem_odp->state, end - start); + if (ret) + goto out_dma_list; + + ret =3D mmu_interval_notifier_insert(&umem_odp->notifier, umem_odp->umem.owning_mm, start, end - start, ops); if (ret) - goto out_dma_list; + goto out_free_iova; } =20 return 0; =20 +out_free_iova: + dma_free_iova(&umem_odp->state); out_dma_list: kvfree(umem_odp->dma_list); out_pfn_list: @@ -274,6 +284,7 @@ void ib_umem_odp_release(struct ib_umem_odp *umem_odp) ib_umem_end(umem_odp)); mutex_unlock(&umem_odp->umem_mutex); mmu_interval_notifier_remove(&umem_odp->notifier); + dma_free_iova(&umem_odp->state); kvfree(umem_odp->dma_list); kvfree(umem_odp->pfn_list); } diff --git a/include/rdma/ib_umem_odp.h 
b/include/rdma/ib_umem_odp.h index 0844c1d05ac6..c0c1215925eb 100644 --- a/include/rdma/ib_umem_odp.h +++ b/include/rdma/ib_umem_odp.h @@ -23,6 +23,7 @@ struct ib_umem_odp { * See ODP_READ_ALLOWED_BIT and ODP_WRITE_ALLOWED_BIT. */ dma_addr_t *dma_list; + struct dma_iova_state state; /* * The umem_mutex protects the page_list and dma_list fields of an ODP * umem, allowing only a single thread to map/unmap pages. The mutex --=20 2.46.0 From nobody Sat Nov 30 01:44:58 2024 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CE8531A2657; Thu, 12 Sep 2024 11:16:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726139805; cv=none; b=RsExm1ONNi+QRzhtrwJNTtEL7PAXSnbkMij+6zU1x3ze8qvKXQf9YyZrhrslDVNCE1r87r8uV1G34pMf0bL1t+4ynHnGH5kKk04gzslrSNlXGtgH/pHQa2YXeMKNjzHumL/6QEiDb84YXfszVEbkm5N6W83x9Ymkvrw7eBmLSFE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726139805; c=relaxed/simple; bh=XdvLFLSs77dfdKFwvoI+1QDjheRjvPIMY5q3duojPU4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=PH5tIQB6tFuHI2DxKlT5gwlnpC8wG3hB1g5yThFC5sJMVhw0isXM529Fsrvy8O8v9ggHXpkgrvuLwPxRghxDQYDSwFpRkFZdi3f8xALKw1PrugDE2erpgskoWBzWfJs2KS7K4FHZdPKKFvXioAq/4XSLYj+qeHhejWiXGAlHAZg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=NgzO9Cd/; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="NgzO9Cd/" Received: by smtp.kernel.org (Postfix) with ESMTPSA id A435DC4CEC3; Thu, 12 Sep 2024 11:16:44 +0000 (UTC) DKIM-Signature: v=1; 
a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1726139805; bh=XdvLFLSs77dfdKFwvoI+1QDjheRjvPIMY5q3duojPU4=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=NgzO9Cd/pEYl3RXBSH3KvVWH5oLj+mgwuiuvlEuWz/LdVeoxrVjxHPYN+1RgtJ5vx uYJjvq15M3diYUAp46bypI/thcHEx+QtEexV40Maopkpc7NH/pQObOfLTMM9sedXTe HyCCIZaumupuZzn/90Sj4asgy/V2CGMFpJPauJAwyc4WeCzaSYZigiZbhfgSaAIuNl AltJNiTekFARdZbPRH/+4tfrcUAYzA0Buzi7se4BfNiYVdO4ozUObZevqOncwAYEYE P94t+8sP2sgWdP4i0Jt7pKccaJG3IY0H5c0QUNwnV7gocMrpLWBr+P/djT3WWKtrhz 5mKfa6eJbE7lA== From: Leon Romanovsky To: Jens Axboe , Jason Gunthorpe , Robin Murphy , Joerg Roedel , Will Deacon , Keith Busch , Christoph Hellwig , "Zeng, Oak" , Chaitanya Kulkarni Cc: Leon Romanovsky , Sagi Grimberg , Bjorn Helgaas , Logan Gunthorpe , Yishai Hadas , Shameer Kolothum , Kevin Tian , Alex Williamson , Marek Szyprowski , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Andrew Morton , linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org, iommu@lists.linux.dev, linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org Subject: [RFC v2 11/21] RDMA/umem: Store ODP access mask information in PFN Date: Thu, 12 Sep 2024 14:15:46 +0300 Message-ID: X-Mailer: git-send-email 2.46.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Leon Romanovsky In preparation for removing dma_list, store the access mask in the PFN rather than in the dma_addr_t. 
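For illustration only, and not part of the patch: a minimal user-space sketch of keeping the access information in flag bits of the PFN entry and deriving the device translation entry from them, in the spirit of the populate_mtt() change in this patch. All SKETCH_* names and bit positions are assumptions made for this example; they are not the kernel's HMM_PFN_* or MLX5_IB_MTT_* definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits kept in the upper bits of a PFN entry. */
#define SKETCH_PFN_VALID (UINT64_C(1) << 63)
#define SKETCH_PFN_WRITE (UINT64_C(1) << 62)

/* Hypothetical permission bits in the low bits of a device entry. */
#define SKETCH_MTT_READ  (UINT64_C(1) << 0)
#define SKETCH_MTT_WRITE (UINT64_C(1) << 1)

/*
 * Build a device translation entry from a page-aligned DMA address and
 * the flags stored in the PFN entry. The DMA address itself no longer
 * carries permission bits.
 *
 * Simplification: an entry that is not valid yields 0 here, whereas the
 * patch simply skips such entries.
 */
static uint64_t sketch_mtt_entry(uint64_t pfn_entry, uint64_t dma_addr,
				 bool downgrade)
{
	uint64_t entry;

	if (!(pfn_entry & SKETCH_PFN_VALID))
		return 0;

	/* A valid entry is always readable. */
	entry = dma_addr | SKETCH_MTT_READ;

	/* Write access needs the PFN flag and no downgrade request. */
	if ((pfn_entry & SKETCH_PFN_WRITE) && !downgrade)
		entry |= SKETCH_MTT_WRITE;

	return entry;
}
```

In this sketch a writable, valid PFN entry yields read and write permission, the same entry with downgrade set yields read only, and an entry without the valid flag yields no device entry at all.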
Signed-off-by: Leon Romanovsky --- drivers/infiniband/core/umem_odp.c | 98 +++++++++++----------------- drivers/infiniband/hw/mlx5/mlx5_ib.h | 1 + drivers/infiniband/hw/mlx5/odp.c | 37 ++++++----- include/rdma/ib_umem_odp.h | 14 +--- 4 files changed, 59 insertions(+), 91 deletions(-) diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/u= mem_odp.c index 01cbf7f55b3a..72885eca4181 100644 --- a/drivers/infiniband/core/umem_odp.c +++ b/drivers/infiniband/core/umem_odp.c @@ -307,22 +307,11 @@ EXPORT_SYMBOL(ib_umem_odp_release); static int ib_umem_odp_map_dma_single_page( struct ib_umem_odp *umem_odp, unsigned int dma_index, - struct page *page, - u64 access_mask) + struct page *page) { struct ib_device *dev =3D umem_odp->umem.ibdev; dma_addr_t *dma_addr =3D &umem_odp->dma_list[dma_index]; =20 - if (*dma_addr) { - /* - * If the page is already dma mapped it means it went through - * a non-invalidating trasition, like read-only to writable. - * Resync the flags. - */ - *dma_addr =3D (*dma_addr & ODP_DMA_ADDR_MASK) | access_mask; - return 0; - } - *dma_addr =3D ib_dma_map_page(dev, page, 0, 1 << umem_odp->page_shift, DMA_BIDIRECTIONAL); if (ib_dma_mapping_error(dev, *dma_addr)) { @@ -330,7 +319,6 @@ static int ib_umem_odp_map_dma_single_page( return -EFAULT; } umem_odp->npages++; - *dma_addr |=3D access_mask; return 0; } =20 @@ -366,9 +354,6 @@ int ib_umem_odp_map_dma_and_lock(struct ib_umem_odp *um= em_odp, u64 user_virt, struct hmm_range range =3D {}; unsigned long timeout; =20 - if (access_mask =3D=3D 0) - return -EINVAL; - if (user_virt < ib_umem_start(umem_odp) || user_virt + bcnt > ib_umem_end(umem_odp)) return -EFAULT; @@ -394,7 +379,7 @@ int ib_umem_odp_map_dma_and_lock(struct ib_umem_odp *um= em_odp, u64 user_virt, if (fault) { range.default_flags =3D HMM_PFN_REQ_FAULT; =20 - if (access_mask & ODP_WRITE_ALLOWED_BIT) + if (access_mask & HMM_PFN_WRITE) range.default_flags |=3D HMM_PFN_REQ_WRITE; } =20 @@ -426,22 +411,17 @@ int 
ib_umem_odp_map_dma_and_lock(struct ib_umem_odp *= umem_odp, u64 user_virt, for (pfn_index =3D 0; pfn_index < num_pfns; pfn_index +=3D 1 << (page_shift - PAGE_SHIFT), dma_index++) { =20 - if (fault) { - /* - * Since we asked for hmm_range_fault() to populate - * pages it shouldn't return an error entry on success. - */ - WARN_ON(range.hmm_pfns[pfn_index] & HMM_PFN_ERROR); - WARN_ON(!(range.hmm_pfns[pfn_index] & HMM_PFN_VALID)); - } else { - if (!(range.hmm_pfns[pfn_index] & HMM_PFN_VALID)) { - WARN_ON(umem_odp->dma_list[dma_index]); - continue; - } - access_mask =3D ODP_READ_ALLOWED_BIT; - if (range.hmm_pfns[pfn_index] & HMM_PFN_WRITE) - access_mask |=3D ODP_WRITE_ALLOWED_BIT; - } + /* + * Since we asked for hmm_range_fault() to populate + * pages it shouldn't return an error entry on success. + */ + WARN_ON(fault && range.hmm_pfns[pfn_index] & HMM_PFN_ERROR); + WARN_ON(fault && !(range.hmm_pfns[pfn_index] & HMM_PFN_VALID)); + if (!(range.hmm_pfns[pfn_index] & HMM_PFN_VALID)) + continue; + + if (range.hmm_pfns[pfn_index] & HMM_PFN_DMA_MAPPED) + continue; =20 hmm_order =3D hmm_pfn_to_map_order(range.hmm_pfns[pfn_index]); /* If a hugepage was detected and ODP wasn't set for, the umem @@ -456,13 +436,13 @@ int ib_umem_odp_map_dma_and_lock(struct ib_umem_odp *= umem_odp, u64 user_virt, } =20 ret =3D ib_umem_odp_map_dma_single_page( - umem_odp, dma_index, hmm_pfn_to_page(range.hmm_pfns[pfn_index]), - access_mask); + umem_odp, dma_index, hmm_pfn_to_page(range.hmm_pfns[pfn_index])); if (ret < 0) { ibdev_dbg(umem_odp->umem.ibdev, "ib_umem_odp_map_dma_single_page failed with error %d\n", ret); break; } + range.hmm_pfns[pfn_index] |=3D HMM_PFN_DMA_MAPPED; } /* upon success lock should stay on hold for the callee */ if (!ret) @@ -482,7 +462,6 @@ EXPORT_SYMBOL(ib_umem_odp_map_dma_and_lock); void ib_umem_odp_unmap_dma_pages(struct ib_umem_odp *umem_odp, u64 virt, u64 bound) { - dma_addr_t dma_addr; dma_addr_t dma; int idx; u64 addr; @@ -493,34 +472,33 @@ void 
ib_umem_odp_unmap_dma_pages(struct ib_umem_odp *= umem_odp, u64 virt, virt =3D max_t(u64, virt, ib_umem_start(umem_odp)); bound =3D min_t(u64, bound, ib_umem_end(umem_odp)); for (addr =3D virt; addr < bound; addr +=3D BIT(umem_odp->page_shift)) { + unsigned long pfn_idx =3D (addr - ib_umem_start(umem_odp)) >> PAGE_SHIFT; + struct page *page =3D hmm_pfn_to_page(umem_odp->pfn_list[pfn_idx]); + idx =3D (addr - ib_umem_start(umem_odp)) >> umem_odp->page_shift; dma =3D umem_odp->dma_list[idx]; =20 - /* The access flags guaranteed a valid DMA address in case was NULL */ - if (dma) { - unsigned long pfn_idx =3D (addr - ib_umem_start(umem_odp)) >> PAGE_SHIF= T; - struct page *page =3D hmm_pfn_to_page(umem_odp->pfn_list[pfn_idx]); - - dma_addr =3D dma & ODP_DMA_ADDR_MASK; - ib_dma_unmap_page(dev, dma_addr, - BIT(umem_odp->page_shift), - DMA_BIDIRECTIONAL); - if (dma & ODP_WRITE_ALLOWED_BIT) { - struct page *head_page =3D compound_head(page); - /* - * set_page_dirty prefers being called with - * the page lock. However, MMU notifiers are - * called sometimes with and sometimes without - * the lock. We rely on the umem_mutex instead - * to prevent other mmu notifiers from - * continuing and allowing the page mapping to - * be removed. - */ - set_page_dirty(head_page); - } - umem_odp->dma_list[idx] =3D 0; - umem_odp->npages--; + if (!(umem_odp->pfn_list[pfn_idx] & HMM_PFN_VALID)) + continue; + if (!(umem_odp->pfn_list[pfn_idx] & HMM_PFN_DMA_MAPPED)) + continue; + + ib_dma_unmap_page(dev, dma, BIT(umem_odp->page_shift), + DMA_BIDIRECTIONAL); + if (umem_odp->pfn_list[pfn_idx] & HMM_PFN_WRITE) { + struct page *head_page =3D compound_head(page); + /* + * set_page_dirty prefers being called with + * the page lock. However, MMU notifiers are + * called sometimes with and sometimes without + * the lock. We rely on the umem_mutex instead + * to prevent other mmu notifiers from + * continuing and allowing the page mapping to + * be removed. 
+ */ + set_page_dirty(head_page); } + umem_odp->npages--; } } EXPORT_SYMBOL(ib_umem_odp_unmap_dma_pages); diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/m= lx5/mlx5_ib.h index d5eb1b726675..8149b4c3d3db 100644 --- a/drivers/infiniband/hw/mlx5/mlx5_ib.h +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h @@ -347,6 +347,7 @@ struct mlx5_ib_flow_db { #define MLX5_IB_UPD_XLT_PD BIT(4) #define MLX5_IB_UPD_XLT_ACCESS BIT(5) #define MLX5_IB_UPD_XLT_INDIRECT BIT(6) +#define MLX5_IB_UPD_XLT_DOWNGRADE BIT(7) =20 /* Private QP creation flags to be passed in ib_qp_init_attr.create_flags. * diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/= odp.c index a524181f34df..4bf691fb266f 100644 --- a/drivers/infiniband/hw/mlx5/odp.c +++ b/drivers/infiniband/hw/mlx5/odp.c @@ -34,6 +34,7 @@ #include #include #include +#include =20 #include "mlx5_ib.h" #include "cmd.h" @@ -143,22 +144,12 @@ static void populate_klm(struct mlx5_klm *pklm, size_= t idx, size_t nentries, } } =20 -static u64 umem_dma_to_mtt(dma_addr_t umem_dma) -{ - u64 mtt_entry =3D umem_dma & ODP_DMA_ADDR_MASK; - - if (umem_dma & ODP_READ_ALLOWED_BIT) - mtt_entry |=3D MLX5_IB_MTT_READ; - if (umem_dma & ODP_WRITE_ALLOWED_BIT) - mtt_entry |=3D MLX5_IB_MTT_WRITE; - - return mtt_entry; -} - static void populate_mtt(__be64 *pas, size_t idx, size_t nentries, struct mlx5_ib_mr *mr, int flags) { struct ib_umem_odp *odp =3D to_ib_umem_odp(mr->umem); + bool downgrade =3D flags & MLX5_IB_UPD_XLT_DOWNGRADE; + unsigned long pfn; dma_addr_t pa; size_t i; =20 @@ -166,8 +157,17 @@ static void populate_mtt(__be64 *pas, size_t idx, size= _t nentries, return; =20 for (i =3D 0; i < nentries; i++) { + pfn =3D odp->pfn_list[idx + i]; + if (!(pfn & HMM_PFN_VALID)) + /* Initial ODP init */ + continue; + pa =3D odp->dma_list[idx + i]; - pas[i] =3D cpu_to_be64(umem_dma_to_mtt(pa)); + pa |=3D MLX5_IB_MTT_READ; + if ((pfn & HMM_PFN_WRITE) && !downgrade) + pa |=3D MLX5_IB_MTT_WRITE; + + pas[i] =3D 
cpu_to_be64(pa); } } =20 @@ -268,8 +268,7 @@ static bool mlx5_ib_invalidate_range(struct mmu_interva= l_notifier *mni, * estimate the cost of another UMR vs. the cost of bigger * UMR. */ - if (umem_odp->dma_list[idx] & - (ODP_READ_ALLOWED_BIT | ODP_WRITE_ALLOWED_BIT)) { + if (umem_odp->pfn_list[idx] & HMM_PFN_VALID) { if (!in_block) { blk_start_idx =3D idx; in_block =3D 1; @@ -555,7 +554,7 @@ static int pagefault_real_mr(struct mlx5_ib_mr *mr, str= uct ib_umem_odp *odp, { int page_shift, ret, np; bool downgrade =3D flags & MLX5_PF_FLAGS_DOWNGRADE; - u64 access_mask; + u64 access_mask =3D 0; u64 start_idx; bool fault =3D !(flags & MLX5_PF_FLAGS_SNAPSHOT); u32 xlt_flags =3D MLX5_IB_UPD_XLT_ATOMIC; @@ -563,12 +562,14 @@ static int pagefault_real_mr(struct mlx5_ib_mr *mr, s= truct ib_umem_odp *odp, if (flags & MLX5_PF_FLAGS_ENABLE) xlt_flags |=3D MLX5_IB_UPD_XLT_ENABLE; =20 + if (flags & MLX5_PF_FLAGS_DOWNGRADE) + xlt_flags |=3D MLX5_IB_UPD_XLT_DOWNGRADE; + page_shift =3D odp->page_shift; start_idx =3D (user_va - ib_umem_start(odp)) >> page_shift; - access_mask =3D ODP_READ_ALLOWED_BIT; =20 if (odp->umem.writable && !downgrade) - access_mask |=3D ODP_WRITE_ALLOWED_BIT; + access_mask |=3D HMM_PFN_WRITE; =20 np =3D ib_umem_odp_map_dma_and_lock(odp, user_va, bcnt, access_mask, faul= t); if (np < 0) diff --git a/include/rdma/ib_umem_odp.h b/include/rdma/ib_umem_odp.h index c0c1215925eb..f99911b478c4 100644 --- a/include/rdma/ib_umem_odp.h +++ b/include/rdma/ib_umem_odp.h @@ -8,6 +8,7 @@ =20 #include #include +#include =20 struct ib_umem_odp { struct ib_umem umem; @@ -68,19 +69,6 @@ static inline size_t ib_umem_odp_num_pages(struct ib_ume= m_odp *umem_odp) umem_odp->page_shift; } =20 -/* - * The lower 2 bits of the DMA address signal the R/W permissions for - * the entry. To upgrade the permissions, provide the appropriate - * bitmask to the map_dma_pages function. - * - * Be aware that upgrading a mapped address might result in change of - * the DMA address for the page. 
- */ -#define ODP_READ_ALLOWED_BIT (1<<0ULL) -#define ODP_WRITE_ALLOWED_BIT (1<<1ULL) - -#define ODP_DMA_ADDR_MASK (~(ODP_READ_ALLOWED_BIT | ODP_WRITE_ALLOWED_BIT)) - #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING =20 struct ib_umem_odp * --=20 2.46.0 From nobody Sat Nov 30 01:44:58 2024 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EE09C1B12C6; Thu, 12 Sep 2024 11:16:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726139810; cv=none; b=Io33vzJVV6Czrk07trABkpIeaPqbJKMl4FrbgZIIUBGqq1Pe+giGK2A0XjO5tUs3g8XHs4TZOJyZfm/zndzhtHM05tUX3YBGLzBvlx1LeQ/TYigI6D1feM9kJdRMjPNd5h/iLC2II0EIeMWVLaMsucW9ZNHy8uxsIcNqerUQFXg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726139810; c=relaxed/simple; bh=spVFy3m3YsiOCt+eM/7kAiwbky1QV6tGttipLIv2zOw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=choyhksEyh36jSBZoQ+fVndYUyed/5a944GFxYs5H+d4+9/RE/g1GIHV1CZKJF04Xau5l6zIbfFUuk1Zcefi9nvKnSS7sK/QUdRRo71m22BcD18HrbghY9Nljsp7jFI5LzwLFkRuvJjboSkACeBt8O+WFu4zkv4fTG/zMKnVdaU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=gxIRjDv2; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="gxIRjDv2" Received: by smtp.kernel.org (Postfix) with ESMTPSA id B3784C4CEC3; Thu, 12 Sep 2024 11:16:48 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1726139809; bh=spVFy3m3YsiOCt+eM/7kAiwbky1QV6tGttipLIv2zOw=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; 
b=gxIRjDv2xSAwxfC7Pzv93fwrrIfQ9PwEmdy9PQY9kxiAaKE9zksC7l6BVJARsx/Su 1STWOMET2EijMQV/ogv3irEMdmIuZj3nnGt4KBmBcT6BEGtZX8cjNM1Pr20esfKwTf J4gCuwNEqcuVTf4mIf24b0bSOAFHULKIEK1sxUpdhCWhDJud3jPtiwwQRwp14hntAr VSglEmDWqgoCBkDlNN7vrfcRvvwdFx4IRR1eaFLfLtGEBfPB8gWc/Xet5lmJKcD4NA zjaak/j/qqJF6x0ng9Y22CZ07xQLxhebgIlmpr7AwEEvH/7NVecTBsfEfiaTHT8btF KIbgGEdSyuStA== From: Leon Romanovsky To: Jens Axboe , Jason Gunthorpe , Robin Murphy , Joerg Roedel , Will Deacon , Keith Busch , Christoph Hellwig , "Zeng, Oak" , Chaitanya Kulkarni Cc: Leon Romanovsky , Sagi Grimberg , Bjorn Helgaas , Logan Gunthorpe , Yishai Hadas , Shameer Kolothum , Kevin Tian , Alex Williamson , Marek Szyprowski , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Andrew Morton , linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org, iommu@lists.linux.dev, linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org Subject: [RFC v2 12/21] RDMA/core: Separate DMA mapping to caching IOVA and page linkage Date: Thu, 12 Sep 2024 14:15:47 +0300 Message-ID: <32e9e95b05e49d95079dc7cbfd458b00b47b1c81.1726138681.git.leon@kernel.org> X-Mailer: git-send-email 2.46.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Leon Romanovsky Reuse the newly added DMA API to cache the IOVA and only link/unlink pages in the fast path. 
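For illustration only, and not part of the patch: a minimal user-space model of the two-step idea, where an address range is reserved once at setup and the fast path only links or unlinks a page at an offset inside that range. Every sketch_* name, the fixed slot count and the page size are assumptions made for this example; the model performs no real DMA or IOMMU programming.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE     4096u
#define SKETCH_MAX_PAGES     8u
#define SKETCH_MAPPING_ERROR UINT64_MAX

/* Hypothetical stand-in for the cached IOVA state of one umem. */
struct sketch_iova_state {
	uint64_t base;                 /* start of the reserved range */
	size_t size;                   /* number of bytes reserved */
	bool linked[SKETCH_MAX_PAGES]; /* which page slots are linked */
};

/* Slow path, done once at setup time: reserve the whole range. */
static int sketch_alloc_iova(struct sketch_iova_state *s, uint64_t base,
			     size_t size)
{
	if (size == 0 || size > (size_t)SKETCH_MAX_PAGES * SKETCH_PAGE_SIZE)
		return -1;

	s->base = base;
	s->size = size;
	for (size_t i = 0; i < SKETCH_MAX_PAGES; i++)
		s->linked[i] = false;
	return 0;
}

/* Fast path: no allocation, the address is simply base + offset. */
static uint64_t sketch_link_page(struct sketch_iova_state *s, size_t offset)
{
	if (offset % SKETCH_PAGE_SIZE || offset >= s->size)
		return SKETCH_MAPPING_ERROR;

	s->linked[offset / SKETCH_PAGE_SIZE] = true;
	return s->base + offset;
}

/* Fast path: drop one page; the reserved range itself stays cached. */
static void sketch_unlink_page(struct sketch_iova_state *s, size_t offset)
{
	if (offset % SKETCH_PAGE_SIZE == 0 && offset < s->size)
		s->linked[offset / SKETCH_PAGE_SIZE] = false;
}
```

The point of the split in this model is that sketch_alloc_iova() runs once per umem, while page faults and invalidations only touch sketch_link_page() and sketch_unlink_page().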
Signed-off-by: Leon Romanovsky --- drivers/infiniband/core/umem_odp.c | 61 +++--------------------------- drivers/infiniband/hw/mlx5/odp.c | 7 +++- include/rdma/ib_umem_odp.h | 8 +--- kernel/dma/mapping.c | 7 +--- 4 files changed, 14 insertions(+), 69 deletions(-) diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/u= mem_odp.c index 72885eca4181..7bfa1e54454c 100644 --- a/drivers/infiniband/core/umem_odp.c +++ b/drivers/infiniband/core/umem_odp.c @@ -81,19 +81,12 @@ static inline int ib_init_umem_odp(struct ib_umem_odp *= umem_odp, if (!umem_odp->pfn_list) return -ENOMEM; =20 - umem_odp->dma_list =3D kvcalloc( - ndmas, sizeof(*umem_odp->dma_list), GFP_KERNEL); - if (!umem_odp->dma_list) { - ret =3D -ENOMEM; - goto out_pfn_list; - } =20 dma_init_iova_state(&umem_odp->state, dev->dma_device, DMA_BIDIRECTIONAL); ret =3D dma_alloc_iova(&umem_odp->state, end - start); if (ret) - goto out_dma_list; - + goto out_pfn_list; =20 ret =3D mmu_interval_notifier_insert(&umem_odp->notifier, umem_odp->umem.owning_mm, @@ -106,8 +99,6 @@ static inline int ib_init_umem_odp(struct ib_umem_odp *u= mem_odp, =20 out_free_iova: dma_free_iova(&umem_odp->state); -out_dma_list: - kvfree(umem_odp->dma_list); out_pfn_list: kvfree(umem_odp->pfn_list); return ret; @@ -285,7 +276,6 @@ void ib_umem_odp_release(struct ib_umem_odp *umem_odp) mutex_unlock(&umem_odp->umem_mutex); mmu_interval_notifier_remove(&umem_odp->notifier); dma_free_iova(&umem_odp->state); - kvfree(umem_odp->dma_list); kvfree(umem_odp->pfn_list); } put_pid(umem_odp->tgid); @@ -293,40 +283,10 @@ void ib_umem_odp_release(struct ib_umem_odp *umem_odp) } EXPORT_SYMBOL(ib_umem_odp_release); =20 -/* - * Map for DMA and insert a single page into the on-demand paging page tab= les. - * - * @umem: the umem to insert the page to. - * @dma_index: index in the umem to add the dma to. - * @page: the page struct to map and add. - * @access_mask: access permissions needed for this page. 
- * - * The function returns -EFAULT if the DMA mapping operation fails. - * - */ -static int ib_umem_odp_map_dma_single_page( - struct ib_umem_odp *umem_odp, - unsigned int dma_index, - struct page *page) -{ - struct ib_device *dev =3D umem_odp->umem.ibdev; - dma_addr_t *dma_addr =3D &umem_odp->dma_list[dma_index]; - - *dma_addr =3D ib_dma_map_page(dev, page, 0, 1 << umem_odp->page_shift, - DMA_BIDIRECTIONAL); - if (ib_dma_mapping_error(dev, *dma_addr)) { - *dma_addr =3D 0; - return -EFAULT; - } - umem_odp->npages++; - return 0; -} - /** * ib_umem_odp_map_dma_and_lock - DMA map userspace memory in an ODP MR an= d lock it. * * Maps the range passed in the argument to DMA addresses. - * The DMA addresses of the mapped pages is updated in umem_odp->dma_list. * Upon success the ODP MR will be locked to let caller complete its device * page table update. * @@ -434,15 +394,6 @@ int ib_umem_odp_map_dma_and_lock(struct ib_umem_odp *u= mem_odp, u64 user_virt, __func__, hmm_order, page_shift); break; } - - ret =3D ib_umem_odp_map_dma_single_page( - umem_odp, dma_index, hmm_pfn_to_page(range.hmm_pfns[pfn_index])); - if (ret < 0) { - ibdev_dbg(umem_odp->umem.ibdev, - "ib_umem_odp_map_dma_single_page failed with error %d\n", ret); - break; - } - range.hmm_pfns[pfn_index] |=3D HMM_PFN_DMA_MAPPED; } /* upon success lock should stay on hold for the callee */ if (!ret) @@ -462,10 +413,8 @@ EXPORT_SYMBOL(ib_umem_odp_map_dma_and_lock); void ib_umem_odp_unmap_dma_pages(struct ib_umem_odp *umem_odp, u64 virt, u64 bound) { - dma_addr_t dma; int idx; u64 addr; - struct ib_device *dev =3D umem_odp->umem.ibdev; =20 lockdep_assert_held(&umem_odp->umem_mutex); =20 @@ -473,19 +422,19 @@ void ib_umem_odp_unmap_dma_pages(struct ib_umem_odp *= umem_odp, u64 virt, bound =3D min_t(u64, bound, ib_umem_end(umem_odp)); for (addr =3D virt; addr < bound; addr +=3D BIT(umem_odp->page_shift)) { unsigned long pfn_idx =3D (addr - ib_umem_start(umem_odp)) >> PAGE_SHIFT; - struct page *page =3D 
hmm_pfn_to_page(umem_odp->pfn_list[pfn_idx]); =20 idx =3D (addr - ib_umem_start(umem_odp)) >> umem_odp->page_shift; - dma =3D umem_odp->dma_list[idx]; =20 if (!(umem_odp->pfn_list[pfn_idx] & HMM_PFN_VALID)) continue; if (!(umem_odp->pfn_list[pfn_idx] & HMM_PFN_DMA_MAPPED)) continue; =20 - ib_dma_unmap_page(dev, dma, BIT(umem_odp->page_shift), - DMA_BIDIRECTIONAL); + dma_hmm_unlink_page(&umem_odp->state, + &umem_odp->pfn_list[pfn_idx], + idx * (1 << umem_odp->page_shift)); if (umem_odp->pfn_list[pfn_idx] & HMM_PFN_WRITE) { + struct page *page =3D hmm_pfn_to_page(umem_odp->pfn_list[pfn_idx]); struct page *head_page =3D compound_head(page); /* * set_page_dirty prefers being called with diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/= odp.c index 4bf691fb266f..f1fe2b941bb4 100644 --- a/drivers/infiniband/hw/mlx5/odp.c +++ b/drivers/infiniband/hw/mlx5/odp.c @@ -149,6 +149,7 @@ static void populate_mtt(__be64 *pas, size_t idx, size_= t nentries, { struct ib_umem_odp *odp =3D to_ib_umem_odp(mr->umem); bool downgrade =3D flags & MLX5_IB_UPD_XLT_DOWNGRADE; + struct ib_device *dev =3D odp->umem.ibdev; unsigned long pfn; dma_addr_t pa; size_t i; @@ -162,12 +163,16 @@ static void populate_mtt(__be64 *pas, size_t idx, siz= e_t nentries, /* Initial ODP init */ continue; =20 - pa =3D odp->dma_list[idx + i]; + pa =3D dma_hmm_link_page(&odp->state, &odp->pfn_list[idx + i], + (idx + i) * (1 << odp->page_shift)); + WARN_ON_ONCE(ib_dma_mapping_error(dev, pa)); + pa |=3D MLX5_IB_MTT_READ; if ((pfn & HMM_PFN_WRITE) && !downgrade) pa |=3D MLX5_IB_MTT_WRITE; =20 pas[i] =3D cpu_to_be64(pa); + odp->npages++; } } =20 diff --git a/include/rdma/ib_umem_odp.h b/include/rdma/ib_umem_odp.h index f99911b478c4..cb081c69fd1a 100644 --- a/include/rdma/ib_umem_odp.h +++ b/include/rdma/ib_umem_odp.h @@ -18,15 +18,9 @@ struct ib_umem_odp { /* An array of the pfns included in the on-demand paging umem. 
*/ unsigned long *pfn_list; =20 - /* - * An array with DMA addresses mapped for pfns in pfn_list. - * The lower two bits designate access permissions. - * See ODP_READ_ALLOWED_BIT and ODP_WRITE_ALLOWED_BIT. - */ - dma_addr_t *dma_list; struct dma_iova_state state; /* - * The umem_mutex protects the page_list and dma_list fields of an ODP + * The umem_mutex protects the page_list field of an ODP * umem, allowing only a single thread to map/unmap pages. The mutex * also protects access to the mmu notifier counters. */ diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c index 5354ddc3ac03..38d7b3239dbb 100644 --- a/kernel/dma/mapping.c +++ b/kernel/dma/mapping.c @@ -1108,7 +1108,7 @@ dma_addr_t dma_hmm_link_page(struct dma_iova_state *s= tate, unsigned long *pfn, struct page *page =3D hmm_pfn_to_page(*pfn); phys_addr_t phys =3D page_to_phys(page); bool coherent =3D dev_is_dma_coherent(dev); - dma_addr_t addr; + dma_addr_t addr =3D phys_to_dma(dev, phys); int ret; =20 if (*pfn & HMM_PFN_DMA_MAPPED) @@ -1123,8 +1123,7 @@ dma_addr_t dma_hmm_link_page(struct dma_iova_state *s= tate, unsigned long *pfn, * The DMA address calculation below is based on the fact that * HMM doesn't work with swiotlb. */ - return (state->addr) ? state->addr + dma_offset : - phys_to_dma(dev, phys); + return (state->addr) ? 
state->addr + dma_offset : addr; =20 state->range_size =3D dma_offset; =20 @@ -1136,8 +1135,6 @@ dma_addr_t dma_hmm_link_page(struct dma_iova_state *s= tate, unsigned long *pfn, if (!use_dma_iommu(dev)) { if (!coherent) arch_sync_dma_for_device(phys, PAGE_SIZE, state->dir); - - addr =3D phys_to_dma(dev, phys); goto done; } =20 --=20 2.46.0 From nobody Sat Nov 30 01:44:58 2024 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2960B1BDAAF; Thu, 12 Sep 2024 11:17:17 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726139838; cv=none; b=iRSQSxAe8pit+GvsSlL/wdaKsI91BUIJl4uO3SR4smYqvJxemPADJN1/2K6kekhuEOXfyoKhyRCgeTq3M42LjKOadurzWJ2iCuQRV3ffcRTm/7lKiqY/qiuLAprJqrd0HgkzdcMMjujU+aQdDpM8/QMJpAHJYpRfxbW38bdju28= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726139838; c=relaxed/simple; bh=wkFj1cXNmX5ZlhYlYscU+J0rWGL2pg30iH0xC9v8VVY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=UPOwg7vM41VyNKl4GGMMrrvp/vu6He2klpQOfgk2QyzBq1gDljFaY+1Ne5xgFK0C2yoCsVP5/d4iTWBvr7XeOKc0AYUuWLiELvzQfEIeDbenlu27QQIz7TegK2FkE7gzyBgiZzS9tllJOOimfgMQZNlDJqwUxETSfB2SZU/TlxI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=QFWaxyYp; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="QFWaxyYp" Received: by smtp.kernel.org (Postfix) with ESMTPSA id F0BF2C4CEC3; Thu, 12 Sep 2024 11:17:16 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1726139837; 
bh=wkFj1cXNmX5ZlhYlYscU+J0rWGL2pg30iH0xC9v8VVY=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=QFWaxyYpYfe2FFX+K0p+Goy6dyaOFgOpyojuT6rJgqucu1wtXJ7Ko6B4Z9/VXe95S 2I3S4vrw+Ff5zb5nynL5Vh4Yw/g8+uv0ZNC87b03g7DhVrmV3YivYg7uBbwVNe/GZG +MBvLijRXvMiwL3ZM84io5kv0P8Wbcp45n4DIbM7CtyDWAAQlTWkB20jUCGxLF406f CPRu/FtyIHE0iWOhGGjqbS+ZeLgoViJvPDk3twLQyYGb5FUYjNquCd5o/JMVAX1ObX p/Oa3w5s2HlnKu34VY+AD80OkZOKYp43R+53dQV6GuLtLI/rUJqwRY2wXjSwusplbM Yvypy/0K4VPSA== From: Leon Romanovsky To: Jens Axboe , Jason Gunthorpe , Robin Murphy , Joerg Roedel , Will Deacon , Keith Busch , Christoph Hellwig , "Zeng, Oak" , Chaitanya Kulkarni Cc: Leon Romanovsky , Sagi Grimberg , Bjorn Helgaas , Logan Gunthorpe , Yishai Hadas , Shameer Kolothum , Kevin Tian , Alex Williamson , Marek Szyprowski , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Andrew Morton , linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org, iommu@lists.linux.dev, linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org Subject: [RFC v2 13/21] RDMA/umem: Prevent UMEM ODP creation with SWIOTLB Date: Thu, 12 Sep 2024 14:15:48 +0300 Message-ID: X-Mailer: git-send-email 2.46.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Leon Romanovsky RDMA UMEM has never supported DMA addresses returned from SWIOTLB, because these addresses are programmed into the hardware, which is not aware that they are bounce buffers and not real ones. Instead of silently leaving the system broken for users who are unaware of this, be explicit and return an error. 
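For illustration only, and not part of the patch: a minimal user-space sketch of the "fail early" behaviour described above, where creation of an explicit ODP umem is refused when the device would need bounce buffers. The sketch_* names and the needs_bounce_buffer field are assumptions made for this example; the patch itself performs the check with iommu_can_use_iova().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device model: one flag says whether DMA would bounce. */
struct sketch_device {
	bool needs_bounce_buffer;
};

/* Stand-in for the capability check done at umem creation time. */
static bool sketch_can_use_iova(const struct sketch_device *dev)
{
	return !dev->needs_bounce_buffer;
}

/*
 * Initialise a umem. Implicit ODP umems return early because they map
 * nothing themselves; explicit ones are rejected up front when the
 * device cannot be given real DMA addresses.
 */
static int sketch_init_umem(const struct sketch_device *dev, bool implicit)
{
	if (implicit)
		return 0;

	if (!sketch_can_use_iova(dev))
		return -EOPNOTSUPP;

	return 0;
}
```

Returning an error at creation time means the caller learns about the unsupported configuration immediately, instead of the hardware later being handed bounce-buffer addresses.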
Signed-off-by: Leon Romanovsky --- drivers/infiniband/core/umem_odp.c | 78 +++++++++++++++--------------- drivers/iommu/dma-iommu.c | 1 + 2 files changed, 40 insertions(+), 39 deletions(-) diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/u= mem_odp.c index 7bfa1e54454c..58fc3d4bfb73 100644 --- a/drivers/infiniband/core/umem_odp.c +++ b/drivers/infiniband/core/umem_odp.c @@ -42,7 +42,7 @@ #include #include #include - +#include #include =20 #include "uverbs.h" @@ -51,49 +51,49 @@ static inline int ib_init_umem_odp(struct ib_umem_odp *= umem_odp, const struct mmu_interval_notifier_ops *ops) { struct ib_device *dev =3D umem_odp->umem.ibdev; + size_t page_size =3D 1UL << umem_odp->page_shift; + unsigned long start, end; + size_t ndmas, npfns; int ret; =20 umem_odp->umem.is_odp =3D 1; mutex_init(&umem_odp->umem_mutex); + if (umem_odp->is_implicit_odp) + return 0; + + if (!iommu_can_use_iova(dev->dma_device, NULL, page_size, + DMA_BIDIRECTIONAL)) + return -EOPNOTSUPP; + + start =3D ALIGN_DOWN(umem_odp->umem.address, page_size); + if (check_add_overflow(umem_odp->umem.address, + (unsigned long)umem_odp->umem.length, &end)) + return -EOVERFLOW; + end =3D ALIGN(end, page_size); + if (unlikely(end < page_size)) + return -EOVERFLOW; + + ndmas =3D (end - start) >> umem_odp->page_shift; + if (!ndmas) + return -EINVAL; + + npfns =3D (end - start) >> PAGE_SHIFT; + umem_odp->pfn_list =3D + kvcalloc(npfns, sizeof(*umem_odp->pfn_list), GFP_KERNEL); + if (!umem_odp->pfn_list) + return -ENOMEM; + + dma_init_iova_state(&umem_odp->state, dev->dma_device, + DMA_BIDIRECTIONAL); + ret =3D dma_alloc_iova(&umem_odp->state, end - start); + if (ret) + goto out_pfn_list; =20 - if (!umem_odp->is_implicit_odp) { - size_t page_size =3D 1UL << umem_odp->page_shift; - unsigned long start; - unsigned long end; - size_t ndmas, npfns; - - start =3D ALIGN_DOWN(umem_odp->umem.address, page_size); - if (check_add_overflow(umem_odp->umem.address, - (unsigned 
long)umem_odp->umem.length, - &end)) - return -EOVERFLOW; - end =3D ALIGN(end, page_size); - if (unlikely(end < page_size)) - return -EOVERFLOW; - - ndmas =3D (end - start) >> umem_odp->page_shift; - if (!ndmas) - return -EINVAL; - - npfns =3D (end - start) >> PAGE_SHIFT; - umem_odp->pfn_list =3D kvcalloc( - npfns, sizeof(*umem_odp->pfn_list), GFP_KERNEL); - if (!umem_odp->pfn_list) - return -ENOMEM; - - - dma_init_iova_state(&umem_odp->state, dev->dma_device, - DMA_BIDIRECTIONAL); - ret =3D dma_alloc_iova(&umem_odp->state, end - start); - if (ret) - goto out_pfn_list; - - ret =3D mmu_interval_notifier_insert(&umem_odp->notifier, - umem_odp->umem.owning_mm, - start, end - start, ops); - if (ret) - goto out_free_iova; - } + ret =3D mmu_interval_notifier_insert(&umem_odp->notifier, + umem_odp->umem.owning_mm, start, + end - start, ops); + if (ret) + goto out_free_iova; =20 return 0; =20 diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 3e2e382bb502..af3428ae150d 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -1849,6 +1849,7 @@ bool iommu_can_use_iova(struct device *dev, struct pa= ge *page, size_t size, =20 return true; } +EXPORT_SYMBOL_GPL(iommu_can_use_iova); =20 void iommu_setup_dma_ops(struct device *dev) { --=20 2.46.0 From nobody Sat Nov 30 01:44:58 2024 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DC4951B984C; Thu, 12 Sep 2024 11:16:57 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726139818; cv=none; b=QaiYq78kqUjnxmFn3QX408STIsA6BVyxG6tsEsSqcz5HdE1wh3+ISD53zdrfcYgfY4lLhk4YgVKhZ7QMsUdBWrhaU5PdUDPBj35e8F1eru7IJbhI8nPjJNpfzS+2G07K8zlJZ6BCp6gMBRE7AdS7pkpBzFiqCcqt9DJshutiHXM= 
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726139818; c=relaxed/simple; bh=cC5WVFVLsV04Ng3PmoGLp8oMvNfM8af8nEIAZfXZFwM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=LwSnHxmXmPhUsAo+qJZpIVGOLz1KROEx66MQ1xp4zFa8kdW9XYLdKTpGZfyxfHhYa06rVyynNV3vroobBGG262GOym6nmTJ0gdjQL3e5aYcNW2ZhoH3r18TT/zK+7Tpo/OFT89ujfRRfyrfvhiV1uUKlq5pZG9dCfPXIAfuSWbE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=XgkpL9dA; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="XgkpL9dA" Received: by smtp.kernel.org (Postfix) with ESMTPSA id EC6DBC4CECC; Thu, 12 Sep 2024 11:16:56 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1726139817; bh=cC5WVFVLsV04Ng3PmoGLp8oMvNfM8af8nEIAZfXZFwM=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=XgkpL9dAYw9R+tD+B3UEjGJFrVUvX1Xw+6lm2MrJQdQ2r7jtbxTWS059rlVncbbDM viAybphzEL6aZfdX67tx5qN60zahQm9sodxMVAdLuwJHQ4ZhAeHc+9lT0hho9L0Zt+ oBpgO08eoYR/nGGJg1qmcUPLU0/xHQhWvv8YK/+2OtKG7b/OMqTL43zYry+FPFQG+8 2tTdF4FeAPzQ2hPkC3ThKVPLEiaC0HCnSFhHkI1r8he/ZWRnHohIbxSBimVMY4w8VD xwP/4P1UVs4oknIzlIhf86d8YFOU8zNkaFWhwj+ZYMfzr8KOFF/pS3cyUpgyn6p2As QR+JcJ1RifusQ== From: Leon Romanovsky To: Jens Axboe , Jason Gunthorpe , Robin Murphy , Joerg Roedel , Will Deacon , Keith Busch , Christoph Hellwig , "Zeng, Oak" , Chaitanya Kulkarni Cc: Leon Romanovsky , Sagi Grimberg , Bjorn Helgaas , Logan Gunthorpe , Yishai Hadas , Shameer Kolothum , Kevin Tian , Alex Williamson , Marek Szyprowski , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Andrew Morton , linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org, iommu@lists.linux.dev, linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org Subject: [RFC v2 14/21] 
vfio/mlx5: Explicitly use number of pages instead of allocated length Date: Thu, 12 Sep 2024 14:15:49 +0300 Message-ID: <29dea17e8e4dbbd839f14d3b248f5f3d06d251fa.1726138681.git.leon@kernel.org> X-Mailer: git-send-email 2.46.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Leon Romanovsky allocated_length is always the page size multiplied by the number of pages, so let's change the functions to accept the number of pages instead. This opens an avenue to combine the receive and send paths and improves code readability. Signed-off-by: Leon Romanovsky --- drivers/vfio/pci/mlx5/cmd.c | 32 ++++++++++----------- drivers/vfio/pci/mlx5/cmd.h | 10 +++---- drivers/vfio/pci/mlx5/main.c | 56 +++++++++++++++++++++++------------- 3 files changed, 57 insertions(+), 41 deletions(-) diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c index 41a4b0cf4297..fdc3e515741f 100644 --- a/drivers/vfio/pci/mlx5/cmd.c +++ b/drivers/vfio/pci/mlx5/cmd.c @@ -318,8 +318,7 @@ static int _create_mkey(struct mlx5_core_dev *mdev, u32= pdn, struct mlx5_vhca_recv_buf *recv_buf, u32 *mkey) { - size_t npages =3D buf ? DIV_ROUND_UP(buf->allocated_length, PAGE_SIZE) : - recv_buf->npages; + size_t npages =3D buf ?
buf->npages : recv_buf->npages; int err =3D 0, inlen; __be64 *mtt; void *mkc; @@ -375,7 +374,7 @@ static int mlx5vf_dma_data_buffer(struct mlx5_vhca_data= _buffer *buf) if (mvdev->mdev_detach) return -ENOTCONN; =20 - if (buf->dmaed || !buf->allocated_length) + if (buf->dmaed || !buf->npages) return -EINVAL; =20 ret =3D dma_map_sgtable(mdev->device, &buf->table.sgt, buf->dma_dir, 0); @@ -444,7 +443,7 @@ static int mlx5vf_add_migration_pages(struct mlx5_vhca_= data_buffer *buf, =20 if (ret) goto err; - buf->allocated_length +=3D filled * PAGE_SIZE; + buf->npages +=3D filled; /* clean input for another bulk allocation */ memset(page_list, 0, filled * sizeof(*page_list)); to_fill =3D min_t(unsigned int, to_alloc, @@ -460,8 +459,7 @@ static int mlx5vf_add_migration_pages(struct mlx5_vhca_= data_buffer *buf, } =20 struct mlx5_vhca_data_buffer * -mlx5vf_alloc_data_buffer(struct mlx5_vf_migration_file *migf, - size_t length, +mlx5vf_alloc_data_buffer(struct mlx5_vf_migration_file *migf, u32 npages, enum dma_data_direction dma_dir) { struct mlx5_vhca_data_buffer *buf; @@ -473,9 +471,8 @@ mlx5vf_alloc_data_buffer(struct mlx5_vf_migration_file = *migf, =20 buf->dma_dir =3D dma_dir; buf->migf =3D migf; - if (length) { - ret =3D mlx5vf_add_migration_pages(buf, - DIV_ROUND_UP_ULL(length, PAGE_SIZE)); + if (npages) { + ret =3D mlx5vf_add_migration_pages(buf, npages); if (ret) goto end; =20 @@ -501,8 +498,8 @@ void mlx5vf_put_data_buffer(struct mlx5_vhca_data_buffe= r *buf) } =20 struct mlx5_vhca_data_buffer * -mlx5vf_get_data_buffer(struct mlx5_vf_migration_file *migf, - size_t length, enum dma_data_direction dma_dir) +mlx5vf_get_data_buffer(struct mlx5_vf_migration_file *migf, u32 npages, + enum dma_data_direction dma_dir) { struct mlx5_vhca_data_buffer *buf, *temp_buf; struct list_head free_list; @@ -517,7 +514,7 @@ mlx5vf_get_data_buffer(struct mlx5_vf_migration_file *m= igf, list_for_each_entry_safe(buf, temp_buf, &migf->avail_list, buf_elm) { if (buf->dma_dir =3D=3D dma_dir) 
{ list_del_init(&buf->buf_elm); - if (buf->allocated_length >=3D length) { + if (buf->npages >=3D npages) { spin_unlock_irq(&migf->list_lock); goto found; } @@ -531,7 +528,7 @@ mlx5vf_get_data_buffer(struct mlx5_vf_migration_file *m= igf, } } spin_unlock_irq(&migf->list_lock); - buf =3D mlx5vf_alloc_data_buffer(migf, length, dma_dir); + buf =3D mlx5vf_alloc_data_buffer(migf, npages, dma_dir); =20 found: while ((temp_buf =3D list_first_entry_or_null(&free_list, @@ -712,7 +709,7 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_d= evice *mvdev, MLX5_SET(save_vhca_state_in, in, op_mod, 0); MLX5_SET(save_vhca_state_in, in, vhca_id, mvdev->vhca_id); MLX5_SET(save_vhca_state_in, in, mkey, buf->mkey); - MLX5_SET(save_vhca_state_in, in, size, buf->allocated_length); + MLX5_SET(save_vhca_state_in, in, size, buf->npages * PAGE_SIZE); MLX5_SET(save_vhca_state_in, in, incremental, inc); MLX5_SET(save_vhca_state_in, in, set_track, track); =20 @@ -734,8 +731,11 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_= device *mvdev, } =20 if (!header_buf) { - header_buf =3D mlx5vf_get_data_buffer(migf, - sizeof(struct mlx5_vf_migration_header), DMA_NONE); + header_buf =3D mlx5vf_get_data_buffer( + migf, + DIV_ROUND_UP(sizeof(struct mlx5_vf_migration_header), + PAGE_SIZE), + DMA_NONE); if (IS_ERR(header_buf)) { err =3D PTR_ERR(header_buf); goto err_free; diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h index df421dc6de04..7d4a833b6900 100644 --- a/drivers/vfio/pci/mlx5/cmd.h +++ b/drivers/vfio/pci/mlx5/cmd.h @@ -56,7 +56,7 @@ struct mlx5_vhca_data_buffer { struct sg_append_table table; loff_t start_pos; u64 length; - u64 allocated_length; + u32 npages; u32 mkey; enum dma_data_direction dma_dir; u8 dmaed:1; @@ -217,12 +217,12 @@ int mlx5vf_cmd_alloc_pd(struct mlx5_vf_migration_file= *migf); void mlx5vf_cmd_dealloc_pd(struct mlx5_vf_migration_file *migf); void mlx5fv_cmd_clean_migf_resources(struct mlx5_vf_migration_file *migf); struct 
mlx5_vhca_data_buffer * -mlx5vf_alloc_data_buffer(struct mlx5_vf_migration_file *migf, - size_t length, enum dma_data_direction dma_dir); +mlx5vf_alloc_data_buffer(struct mlx5_vf_migration_file *migf, u32 npages, + enum dma_data_direction dma_dir); void mlx5vf_free_data_buffer(struct mlx5_vhca_data_buffer *buf); struct mlx5_vhca_data_buffer * -mlx5vf_get_data_buffer(struct mlx5_vf_migration_file *migf, - size_t length, enum dma_data_direction dma_dir); +mlx5vf_get_data_buffer(struct mlx5_vf_migration_file *migf, u32 npages, + enum dma_data_direction dma_dir); void mlx5vf_put_data_buffer(struct mlx5_vhca_data_buffer *buf); struct page *mlx5vf_get_migration_page(struct mlx5_vhca_data_buffer *buf, unsigned long offset); diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c index 61d9b0f9146d..d899cd499e27 100644 --- a/drivers/vfio/pci/mlx5/main.c +++ b/drivers/vfio/pci/mlx5/main.c @@ -308,6 +308,7 @@ static struct mlx5_vhca_data_buffer * mlx5vf_mig_file_get_stop_copy_buf(struct mlx5_vf_migration_file *migf, u8 index, size_t required_length) { + u32 npages =3D DIV_ROUND_UP(required_length, PAGE_SIZE); struct mlx5_vhca_data_buffer *buf =3D migf->buf[index]; u8 chunk_num; =20 @@ -315,12 +316,11 @@ mlx5vf_mig_file_get_stop_copy_buf(struct mlx5_vf_migr= ation_file *migf, chunk_num =3D buf->stop_copy_chunk_num; buf->migf->buf[index] =3D NULL; /* Checking whether the pre-allocated buffer can fit */ - if (buf->allocated_length >=3D required_length) + if (buf->npages >=3D npages) return buf; =20 mlx5vf_put_data_buffer(buf); - buf =3D mlx5vf_get_data_buffer(buf->migf, required_length, - DMA_FROM_DEVICE); + buf =3D mlx5vf_get_data_buffer(buf->migf, npages, DMA_FROM_DEVICE); if (IS_ERR(buf)) return buf; =20 @@ -373,7 +373,8 @@ static int mlx5vf_add_stop_copy_header(struct mlx5_vf_m= igration_file *migf, u8 *to_buff; int ret; =20 - header_buf =3D mlx5vf_get_data_buffer(migf, size, DMA_NONE); + header_buf =3D mlx5vf_get_data_buffer(migf, DIV_ROUND_UP(size, 
PAGE_SIZE), + DMA_NONE); if (IS_ERR(header_buf)) return PTR_ERR(header_buf); =20 @@ -388,7 +389,7 @@ static int mlx5vf_add_stop_copy_header(struct mlx5_vf_m= igration_file *migf, to_buff =3D kmap_local_page(page); memcpy(to_buff, &header, sizeof(header)); header_buf->length =3D sizeof(header); - data.stop_copy_size =3D cpu_to_le64(migf->buf[0]->allocated_length); + data.stop_copy_size =3D cpu_to_le64(migf->buf[0]->npages * PAGE_SIZE); memcpy(to_buff + sizeof(header), &data, sizeof(data)); header_buf->length +=3D sizeof(data); kunmap_local(to_buff); @@ -437,15 +438,20 @@ static int mlx5vf_prep_stop_copy(struct mlx5vf_pci_co= re_device *mvdev, =20 num_chunks =3D mvdev->chunk_mode ? MAX_NUM_CHUNKS : 1; for (i =3D 0; i < num_chunks; i++) { - buf =3D mlx5vf_get_data_buffer(migf, inc_state_size, DMA_FROM_DEVICE); + buf =3D mlx5vf_get_data_buffer( + migf, DIV_ROUND_UP(inc_state_size, PAGE_SIZE), + DMA_FROM_DEVICE); if (IS_ERR(buf)) { ret =3D PTR_ERR(buf); goto err; } =20 migf->buf[i] =3D buf; - buf =3D mlx5vf_get_data_buffer(migf, - sizeof(struct mlx5_vf_migration_header), DMA_NONE); + buf =3D mlx5vf_get_data_buffer( + migf, + DIV_ROUND_UP(sizeof(struct mlx5_vf_migration_header), + PAGE_SIZE), + DMA_NONE); if (IS_ERR(buf)) { ret =3D PTR_ERR(buf); goto err; @@ -553,7 +559,8 @@ static long mlx5vf_precopy_ioctl(struct file *filp, uns= igned int cmd, * We finished transferring the current state and the device has a * dirty state, save a new state to be ready for. 
*/ - buf =3D mlx5vf_get_data_buffer(migf, inc_length, DMA_FROM_DEVICE); + buf =3D mlx5vf_get_data_buffer(migf, DIV_ROUND_UP(inc_length, PAGE_SIZE), + DMA_FROM_DEVICE); if (IS_ERR(buf)) { ret =3D PTR_ERR(buf); mlx5vf_mark_err(migf); @@ -674,8 +681,8 @@ mlx5vf_pci_save_device_data(struct mlx5vf_pci_core_devi= ce *mvdev, bool track) =20 if (track) { /* leave the allocated buffer ready for the stop-copy phase */ - buf =3D mlx5vf_alloc_data_buffer(migf, - migf->buf[0]->allocated_length, DMA_FROM_DEVICE); + buf =3D mlx5vf_alloc_data_buffer(migf, migf->buf[0]->npages, + DMA_FROM_DEVICE); if (IS_ERR(buf)) { ret =3D PTR_ERR(buf); goto out_pd; @@ -918,11 +925,14 @@ static ssize_t mlx5vf_resume_write(struct file *filp,= const char __user *buf, goto out_unlock; break; case MLX5_VF_LOAD_STATE_PREP_HEADER_DATA: - if (vhca_buf_header->allocated_length < migf->record_size) { + { + u32 npages =3D DIV_ROUND_UP(migf->record_size, PAGE_SIZE); + + if (vhca_buf_header->npages < npages) { mlx5vf_free_data_buffer(vhca_buf_header); =20 - migf->buf_header[0] =3D mlx5vf_alloc_data_buffer(migf, - migf->record_size, DMA_NONE); + migf->buf_header[0] =3D mlx5vf_alloc_data_buffer( + migf, npages, DMA_NONE); if (IS_ERR(migf->buf_header[0])) { ret =3D PTR_ERR(migf->buf_header[0]); migf->buf_header[0] =3D NULL; @@ -935,6 +945,7 @@ static ssize_t mlx5vf_resume_write(struct file *filp, c= onst char __user *buf, vhca_buf_header->start_pos =3D migf->max_pos; migf->load_state =3D MLX5_VF_LOAD_STATE_READ_HEADER_DATA; break; + } case MLX5_VF_LOAD_STATE_READ_HEADER_DATA: ret =3D mlx5vf_resume_read_header_data(migf, vhca_buf_header, &buf, &len, pos, &done); @@ -945,12 +956,13 @@ static ssize_t mlx5vf_resume_write(struct file *filp,= const char __user *buf, { u64 size =3D max(migf->record_size, migf->stop_copy_prep_size); + u32 npages =3D DIV_ROUND_UP(size, PAGE_SIZE); =20 - if (vhca_buf->allocated_length < size) { + if (vhca_buf->npages < npages) { mlx5vf_free_data_buffer(vhca_buf); =20 - migf->buf[0] =3D 
mlx5vf_alloc_data_buffer(migf, - size, DMA_TO_DEVICE); + migf->buf[0] =3D mlx5vf_alloc_data_buffer( + migf, npages, DMA_TO_DEVICE); if (IS_ERR(migf->buf[0])) { ret =3D PTR_ERR(migf->buf[0]); migf->buf[0] =3D NULL; @@ -1033,8 +1045,11 @@ mlx5vf_pci_resume_device_data(struct mlx5vf_pci_core= _device *mvdev) } =20 migf->buf[0] =3D buf; - buf =3D mlx5vf_alloc_data_buffer(migf, - sizeof(struct mlx5_vf_migration_header), DMA_NONE); + buf =3D mlx5vf_alloc_data_buffer( + migf, + DIV_ROUND_UP(sizeof(struct mlx5_vf_migration_header), + PAGE_SIZE), + DMA_NONE); if (IS_ERR(buf)) { ret =3D PTR_ERR(buf); goto out_buf; @@ -1151,7 +1166,8 @@ mlx5vf_pci_step_device_state_locked(struct mlx5vf_pci= _core_device *mvdev, MLX5VF_QUERY_INC | MLX5VF_QUERY_CLEANUP); if (ret) return ERR_PTR(ret); - buf =3D mlx5vf_get_data_buffer(migf, size, DMA_FROM_DEVICE); + buf =3D mlx5vf_get_data_buffer(migf, + DIV_ROUND_UP(size, PAGE_SIZE), DMA_FROM_DEVICE); if (IS_ERR(buf)) return ERR_CAST(buf); /* pre_copy cleanup */ --=20 2.46.0 From nobody Sat Nov 30 01:44:58 2024 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D4C911A302B; Thu, 12 Sep 2024 11:17:01 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726139822; cv=none; b=oqdD2d41l1995Yp7TYHyx599ELu3ElpBzjUXfmhUEdFQNT4XXhql+0ALJ9D+tbrihirc6soZyoZIVaRjF4eYXZhBpIJgDdkEe2OOxwKxmwuY3rOLhVdb0qFdOykYL2Qmxxn/UMFDuKM5d6Wkxr5uhuzrhukcVTkkBYx3dK/7UrY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726139822; c=relaxed/simple; bh=yF378rSxOmaEB1KQyXFnlpitufhUf1qEDwDdVCzADsk=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; 
b=iTsYkqpKAnoWvSTp5194tmbu6372V/I7ziSAw3c9+OJqO7dulGWmv650i44w5xTG2FAUMrTyWNxxQIA+9ac5XU4qH9QB+fmDvXcUv9EAJPsr+jzW8rhIjo4W5zOr01EYgFI7HmHgrRMz/W0tC0of6c37pq6Gmzp15Sswv8Rfqdk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=NiV6oHmJ; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="NiV6oHmJ" Received: by smtp.kernel.org (Postfix) with ESMTPSA id EBBD2C4CEC3; Thu, 12 Sep 2024 11:17:00 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1726139821; bh=yF378rSxOmaEB1KQyXFnlpitufhUf1qEDwDdVCzADsk=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=NiV6oHmJseR+ndn1UqV+Ilrat+GhnqqpPHeKSnqhVUuIHcdziH99Uslcv+vxhwj9x ejDSH5390uYhbTFlSnL3FuYg0OqTJKwz1snVJNSTHOBVFpcKrTMPo3hLFwiKwWzVWB 0oFNUNEURS7xj82souQoVDAUnW15w6cKkAfyI/Z55bnykzizga5/J4wJPvhVIcBAsh QR7Tm+6wMUz5YibP5bRhzlRRLRCdBxSrpm+9B93GfJsK9MzlH5Cr6J4clYc2St8IlH 8lO3nz0w1UBORXviWvfXUfXtzaNj0dKrW/Pe+fFILltSD3nR2aMYpQAp1g8hzdqJX3 cEbOXRiaU7d5g== From: Leon Romanovsky To: Jens Axboe , Jason Gunthorpe , Robin Murphy , Joerg Roedel , Will Deacon , Keith Busch , Christoph Hellwig , "Zeng, Oak" , Chaitanya Kulkarni Cc: Leon Romanovsky , Sagi Grimberg , Bjorn Helgaas , Logan Gunthorpe , Yishai Hadas , Shameer Kolothum , Kevin Tian , Alex Williamson , Marek Szyprowski , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Andrew Morton , linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org, iommu@lists.linux.dev, linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org Subject: [RFC v2 15/21] vfio/mlx5: Rewrite create mkey flow to allow better code reuse Date: Thu, 12 Sep 2024 14:15:50 +0300 Message-ID: <22ea84d941b486127bd8bee0655abd2918035d81.1726138681.git.leon@kernel.org> X-Mailer: git-send-email 2.46.0 In-Reply-To: 
References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Leon Romanovsky Change the creation of the mkey to be performed in multiple steps: data allocation, DMA setup and the actual call to HW to create that mkey. In this new flow, the whole input to the MKEY command is saved, which eliminates the need to keep an array of DMA addresses for the receive list and, in future patches, for the send list too. In addition to reducing memory size and eliminating unnecessary data movement when setting the MKEY input, the code is prepared for future reuse. Signed-off-by: Leon Romanovsky --- drivers/vfio/pci/mlx5/cmd.c | 156 ++++++++++++++++++++---------------- drivers/vfio/pci/mlx5/cmd.h | 4 +- 2 files changed, 90 insertions(+), 70 deletions(-) diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c index fdc3e515741f..1832a6c1f35d 100644 --- a/drivers/vfio/pci/mlx5/cmd.c +++ b/drivers/vfio/pci/mlx5/cmd.c @@ -313,39 +313,21 @@ static int mlx5vf_cmd_get_vhca_id(struct mlx5_core_de= v *mdev, u16 function_id, return ret; } =20 -static int _create_mkey(struct mlx5_core_dev *mdev, u32 pdn, - struct mlx5_vhca_data_buffer *buf, - struct mlx5_vhca_recv_buf *recv_buf, - u32 *mkey) +static u32 *alloc_mkey_in(u32 npages, u32 pdn) { - size_t npages =3D buf ?
buf->npages : recv_buf->npages; - int err =3D 0, inlen; - __be64 *mtt; + int inlen; void *mkc; u32 *in; =20 inlen =3D MLX5_ST_SZ_BYTES(create_mkey_in) + - sizeof(*mtt) * round_up(npages, 2); + sizeof(__be64) * round_up(npages, 2); =20 - in =3D kvzalloc(inlen, GFP_KERNEL); + in =3D kvzalloc(inlen, GFP_KERNEL_ACCOUNT); if (!in) - return -ENOMEM; + return NULL; =20 MLX5_SET(create_mkey_in, in, translations_octword_actual_size, DIV_ROUND_UP(npages, 2)); - mtt =3D (__be64 *)MLX5_ADDR_OF(create_mkey_in, in, klm_pas_mtt); - - if (buf) { - struct sg_dma_page_iter dma_iter; - - for_each_sgtable_dma_page(&buf->table.sgt, &dma_iter, 0) - *mtt++ =3D cpu_to_be64(sg_page_iter_dma_address(&dma_iter)); - } else { - int i; - - for (i =3D 0; i < npages; i++) - *mtt++ =3D cpu_to_be64(recv_buf->dma_addrs[i]); - } =20 mkc =3D MLX5_ADDR_OF(create_mkey_in, in, memory_key_mkey_entry); MLX5_SET(mkc, mkc, access_mode_1_0, MLX5_MKC_ACCESS_MODE_MTT); @@ -359,9 +341,29 @@ static int _create_mkey(struct mlx5_core_dev *mdev, u3= 2 pdn, MLX5_SET(mkc, mkc, log_page_size, PAGE_SHIFT); MLX5_SET(mkc, mkc, translations_octword_size, DIV_ROUND_UP(npages, 2)); MLX5_SET64(mkc, mkc, len, npages * PAGE_SIZE); - err =3D mlx5_core_create_mkey(mdev, mkey, in, inlen); - kvfree(in); - return err; + + return in; +} + +static int create_mkey(struct mlx5_core_dev *mdev, u32 npages, + struct mlx5_vhca_data_buffer *buf, u32 *mkey_in, + u32 *mkey) +{ + __be64 *mtt; + int inlen; + + mtt =3D (__be64 *)MLX5_ADDR_OF(create_mkey_in, mkey_in, klm_pas_mtt); + if (buf) { + struct sg_dma_page_iter dma_iter; + + for_each_sgtable_dma_page(&buf->table.sgt, &dma_iter, 0) + *mtt++ =3D cpu_to_be64(sg_page_iter_dma_address(&dma_iter)); + } + + inlen =3D MLX5_ST_SZ_BYTES(create_mkey_in) + + sizeof(__be64) * round_up(npages, 2); + + return mlx5_core_create_mkey(mdev, mkey, mkey_in, inlen); } =20 static int mlx5vf_dma_data_buffer(struct mlx5_vhca_data_buffer *buf) @@ -374,20 +376,28 @@ static int mlx5vf_dma_data_buffer(struct 
mlx5_vhca_da= ta_buffer *buf) if (mvdev->mdev_detach) return -ENOTCONN; =20 - if (buf->dmaed || !buf->npages) + if (buf->mkey_in || !buf->npages) return -EINVAL; =20 ret =3D dma_map_sgtable(mdev->device, &buf->table.sgt, buf->dma_dir, 0); if (ret) return ret; =20 - ret =3D _create_mkey(mdev, buf->migf->pdn, buf, NULL, &buf->mkey); - if (ret) + buf->mkey_in =3D alloc_mkey_in(buf->npages, buf->migf->pdn); + if (!buf->mkey_in) { + ret =3D -ENOMEM; goto err; + } =20 - buf->dmaed =3D true; + ret =3D create_mkey(mdev, buf->npages, buf, buf->mkey_in, &buf->mkey); + if (ret) + goto err_create_mkey; =20 return 0; + +err_create_mkey: + kvfree(buf->mkey_in); + buf->mkey_in =3D NULL; err: dma_unmap_sgtable(mdev->device, &buf->table.sgt, buf->dma_dir, 0); return ret; @@ -401,8 +411,9 @@ void mlx5vf_free_data_buffer(struct mlx5_vhca_data_buff= er *buf) lockdep_assert_held(&migf->mvdev->state_mutex); WARN_ON(migf->mvdev->mdev_detach); =20 - if (buf->dmaed) { + if (buf->mkey_in) { mlx5_core_destroy_mkey(migf->mvdev->mdev, buf->mkey); + kvfree(buf->mkey_in); dma_unmap_sgtable(migf->mvdev->mdev->device, &buf->table.sgt, buf->dma_dir, 0); } @@ -779,7 +790,7 @@ int mlx5vf_cmd_load_vhca_state(struct mlx5vf_pci_core_d= evice *mvdev, if (mvdev->mdev_detach) return -ENOTCONN; =20 - if (!buf->dmaed) { + if (!buf->mkey_in) { err =3D mlx5vf_dma_data_buffer(buf); if (err) return err; @@ -1380,56 +1391,54 @@ static int alloc_recv_pages(struct mlx5_vhca_recv_b= uf *recv_buf, kvfree(recv_buf->page_list); return -ENOMEM; } +static void unregister_dma_pages(struct mlx5_core_dev *mdev, u32 npages, + u32 *mkey_in) +{ + dma_addr_t addr; + __be64 *mtt; + int i; + + mtt =3D (__be64 *)MLX5_ADDR_OF(create_mkey_in, mkey_in, klm_pas_mtt); + for (i =3D npages - 1; i >=3D 0; i--) { + addr =3D be64_to_cpu(mtt[i]); + dma_unmap_single(mdev->device, addr, PAGE_SIZE, + DMA_FROM_DEVICE); + } +} =20 -static int register_dma_recv_pages(struct mlx5_core_dev *mdev, - struct mlx5_vhca_recv_buf *recv_buf) +static int 
register_dma_pages(struct mlx5_core_dev *mdev, u32 npages, + struct page **page_list, u32 *mkey_in) { - int i, j; + dma_addr_t addr; + __be64 *mtt; + int i; =20 - recv_buf->dma_addrs =3D kvcalloc(recv_buf->npages, - sizeof(*recv_buf->dma_addrs), - GFP_KERNEL_ACCOUNT); - if (!recv_buf->dma_addrs) - return -ENOMEM; + mtt =3D (__be64 *)MLX5_ADDR_OF(create_mkey_in, mkey_in, klm_pas_mtt); =20 - for (i =3D 0; i < recv_buf->npages; i++) { - recv_buf->dma_addrs[i] =3D dma_map_page(mdev->device, - recv_buf->page_list[i], - 0, PAGE_SIZE, - DMA_FROM_DEVICE); - if (dma_mapping_error(mdev->device, recv_buf->dma_addrs[i])) + for (i =3D 0; i < npages; i++) { + addr =3D dma_map_page(mdev->device, page_list[i], 0, PAGE_SIZE, + DMA_FROM_DEVICE); + if (dma_mapping_error(mdev->device, addr)) goto error; + + *mtt++ =3D cpu_to_be64(addr); } + return 0; =20 error: - for (j =3D 0; j < i; j++) - dma_unmap_single(mdev->device, recv_buf->dma_addrs[j], - PAGE_SIZE, DMA_FROM_DEVICE); - - kvfree(recv_buf->dma_addrs); + unregister_dma_pages(mdev, i, mkey_in); return -ENOMEM; } =20 -static void unregister_dma_recv_pages(struct mlx5_core_dev *mdev, - struct mlx5_vhca_recv_buf *recv_buf) -{ - int i; - - for (i =3D 0; i < recv_buf->npages; i++) - dma_unmap_single(mdev->device, recv_buf->dma_addrs[i], - PAGE_SIZE, DMA_FROM_DEVICE); - - kvfree(recv_buf->dma_addrs); -} - static void mlx5vf_free_qp_recv_resources(struct mlx5_core_dev *mdev, struct mlx5_vhca_qp *qp) { struct mlx5_vhca_recv_buf *recv_buf =3D &qp->recv_buf; =20 mlx5_core_destroy_mkey(mdev, recv_buf->mkey); - unregister_dma_recv_pages(mdev, recv_buf); + unregister_dma_pages(mdev, recv_buf->npages, recv_buf->mkey_in); + kvfree(recv_buf->mkey_in); free_recv_pages(&qp->recv_buf); } =20 @@ -1445,18 +1454,29 @@ static int mlx5vf_alloc_qp_recv_resources(struct ml= x5_core_dev *mdev, if (err < 0) return err; =20 - err =3D register_dma_recv_pages(mdev, recv_buf); - if (err) + recv_buf->mkey_in =3D alloc_mkey_in(npages, pdn); + if 
(!recv_buf->mkey_in) { + err =3D -ENOMEM; goto end; + } + + err =3D register_dma_pages(mdev, npages, recv_buf->page_list, + recv_buf->mkey_in); + if (err) + goto err_register_dma; =20 - err =3D _create_mkey(mdev, pdn, NULL, recv_buf, &recv_buf->mkey); + err =3D create_mkey(mdev, npages, NULL, recv_buf->mkey_in, + &recv_buf->mkey); if (err) goto err_create_mkey; =20 return 0; =20 err_create_mkey: - unregister_dma_recv_pages(mdev, recv_buf); + unregister_dma_pages(mdev, npages, recv_buf->mkey_in); +err_register_dma: + kvfree(recv_buf->mkey_in); + recv_buf->mkey_in =3D NULL; end: free_recv_pages(recv_buf); return err; diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h index 7d4a833b6900..25dd6ff54591 100644 --- a/drivers/vfio/pci/mlx5/cmd.h +++ b/drivers/vfio/pci/mlx5/cmd.h @@ -58,8 +58,8 @@ struct mlx5_vhca_data_buffer { u64 length; u32 npages; u32 mkey; + u32 *mkey_in; enum dma_data_direction dma_dir; - u8 dmaed:1; u8 stop_copy_chunk_num; struct list_head buf_elm; struct mlx5_vf_migration_file *migf; @@ -133,8 +133,8 @@ struct mlx5_vhca_cq { struct mlx5_vhca_recv_buf { u32 npages; struct page **page_list; - dma_addr_t *dma_addrs; u32 next_rq_offset; + u32 *mkey_in; u32 mkey; }; =20 --=20 2.46.0 From nobody Sat Nov 30 01:44:58 2024 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 33F4B1BBBDC; Thu, 12 Sep 2024 11:17:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726139826; cv=none; b=B2H+QqHL6JOxObSzi614r84YtnnegOTlBReYtlf1lh8+x2gcYjd1wb7ByXGb9SGloN37PxVv8XvG6o4n0XgAUL+IgTUJo9pKZs45TbK4v96HTY7+I0uVZXLtIzph9WKvp2vuA/Y1H0yHtkpJM+ei6qLTrHE8n1kNFV3icfFckS8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; 
s=arc-20240116; t=1726139826; c=relaxed/simple; bh=bjk1/BnJ6tUGghkO4rbI4kdxFyJB6hE60cQFrbYFFeA=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Ifc9zP8ol1KcNwtBZ3BBbXUuvjCDZF8TyaLQ/s2dL8SOwIgrnI59I3B1BHI3wpUrovM+o6c2UhEIHhMAf/LnJFWaBMBo7pXykYxcHNIYxkUFoKwAmxoD6Lqm6XkI3zUWWQS4FrXj1tLl6c7HVkaQdgKjNrVwm/Ll1TlDKMsznSA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=UsRoai/G; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="UsRoai/G" Received: by smtp.kernel.org (Postfix) with ESMTPSA id E846BC4CEC3; Thu, 12 Sep 2024 11:17:04 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1726139825; bh=bjk1/BnJ6tUGghkO4rbI4kdxFyJB6hE60cQFrbYFFeA=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=UsRoai/Gl1O7Sp/5JxPcgDAMkM6ipvgzmFXZSPfGag9Gk7eyJG+VdfuCd9QQGwabJ pJBnMolBMOdTzs2oBWlCngWlKspt0aKi0IHd0NV1QihXlPw5ge9SnJv5HvlQ9XFtJ2 wvSrcUz771/xmw0ldpU61b3KgEsbAG/eWsjbov/ea9Im3sNpjnLMXT3teQeLNuWGlW zII+LQmGfNzh72GCbpz3SCfX2gX6N1CfXTsSY+Ww/DdM41MOuzoB721TYZNr8BqV4f riGijidjEDVmzfKlimoUn1I9Q+v5auRTA9QlQWa335Fob1c5NshgGTh7q6eI8Gv2/D Lza+f4KBMQA/g== From: Leon Romanovsky To: Jens Axboe , Jason Gunthorpe , Robin Murphy , Joerg Roedel , Will Deacon , Keith Busch , Christoph Hellwig , "Zeng, Oak" , Chaitanya Kulkarni Cc: Leon Romanovsky , Sagi Grimberg , Bjorn Helgaas , Logan Gunthorpe , Yishai Hadas , Shameer Kolothum , Kevin Tian , Alex Williamson , Marek Szyprowski , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Andrew Morton , linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org, iommu@lists.linux.dev, linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org Subject: [RFC v2 16/21] vfio/mlx5: Explicitly store page list Date: Thu, 12 Sep 2024 
14:15:51 +0300 Message-ID: X-Mailer: git-send-email 2.46.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Leon Romanovsky As preparation for removing the scatter-gather table and unifying the receive and send lists, explicitly store the page list. Signed-off-by: Leon Romanovsky --- drivers/vfio/pci/mlx5/cmd.c | 29 ++++++++++++----------------- drivers/vfio/pci/mlx5/cmd.h | 1 + 2 files changed, 13 insertions(+), 17 deletions(-) diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c index 1832a6c1f35d..34ae3e299a9e 100644 --- a/drivers/vfio/pci/mlx5/cmd.c +++ b/drivers/vfio/pci/mlx5/cmd.c @@ -422,6 +422,7 @@ void mlx5vf_free_data_buffer(struct mlx5_vhca_data_buff= er *buf) for_each_sgtable_page(&buf->table.sgt, &sg_iter, 0) __free_page(sg_page_iter_page(&sg_iter)); sg_free_append_table(&buf->table); + kvfree(buf->page_list); kfree(buf); } =20 @@ -434,39 +435,33 @@ static int mlx5vf_add_migration_pages(struct mlx5_vhc= a_data_buffer *buf, unsigned int to_fill; int ret; =20 - to_fill =3D min_t(unsigned int, npages, PAGE_SIZE / sizeof(*page_list)); - page_list =3D kvzalloc(to_fill * sizeof(*page_list), GFP_KERNEL_ACCOUNT); + to_fill =3D min_t(unsigned int, npages, PAGE_SIZE / sizeof(*buf->page_lis= t)); + page_list =3D kvzalloc(to_fill * sizeof(*buf->page_list), GFP_KERNEL_ACCO= UNT); if (!page_list) return -ENOMEM; =20 + buf->page_list =3D page_list; + do { filled =3D alloc_pages_bulk_array(GFP_KERNEL_ACCOUNT, to_fill, - page_list); - if (!filled) { - ret =3D -ENOMEM; - goto err; - } + buf->page_list + buf->npages); + if (!filled) + return -ENOMEM; + to_alloc -=3D filled; ret =3D sg_alloc_append_table_from_pages( - &buf->table, page_list, filled, 0, + &buf->table, buf->page_list + buf->npages, filled, 0, filled << PAGE_SHIFT, UINT_MAX, SG_MAX_SINGLE_ALLOC, GFP_KERNEL_ACCOUNT);
=20 if (ret) - goto err; + return ret; buf->npages +=3D filled; - /* clean input for another bulk allocation */ - memset(page_list, 0, filled * sizeof(*page_list)); to_fill =3D min_t(unsigned int, to_alloc, - PAGE_SIZE / sizeof(*page_list)); + PAGE_SIZE / sizeof(*buf->page_list)); } while (to_alloc > 0); =20 - kvfree(page_list); return 0; - -err: - kvfree(page_list); - return ret; } =20 struct mlx5_vhca_data_buffer * diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h index 25dd6ff54591..5b764199db53 100644 --- a/drivers/vfio/pci/mlx5/cmd.h +++ b/drivers/vfio/pci/mlx5/cmd.h @@ -53,6 +53,7 @@ struct mlx5_vf_migration_header { }; =20 struct mlx5_vhca_data_buffer { + struct page **page_list; struct sg_append_table table; loff_t start_pos; u64 length; --=20 2.46.0 From nobody Sat Nov 30 01:44:58 2024 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E04451BC9F8; Thu, 12 Sep 2024 11:17:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726139830; cv=none; b=LdalVTTN/SkezhSAKz+AFCRw38bSB3y5wriN6r9hfZfmULvPUmJJT/xGFFVnPcAlxk7U2u5kzIpvHVJrimf6G0rEZTjQQQsuQTOH0PEd4xB8LIve8ZWeru9ub0sAp4QuXcz8Q2kOTljJuw2N45X3cu9cXw+DCvR8ZThPpIPDnBI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726139830; c=relaxed/simple; bh=Dk4iS9NnVthuwT4fk3pwSykLGjgnL3XiLR0Yj7fW2pg=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=J9TJMRWYcn/wOooOVn40h6rFwsAVANyXYcjnWcys362WMAQXXFqkLqN7/y6AAWUGUF7oF2Yhpxk9LiX+o03w7NB74Hm6RUzD09p0vIJUp2XbyF8d5oTQlt0ViafbhL/MTeL1Z9HGSKRx189D9f/BvlwJrK3LnPHXf21jdXL7Pkw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org 
header.i=@kernel.org header.b=kjpRjm1h; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="kjpRjm1h" Received: by smtp.kernel.org (Postfix) with ESMTPSA id EE28CC4CEC3; Thu, 12 Sep 2024 11:17:08 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1726139829; bh=Dk4iS9NnVthuwT4fk3pwSykLGjgnL3XiLR0Yj7fW2pg=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=kjpRjm1h2bUvRuIExn5ie51IkI7Jm6dqok5PQoPIBdaKTiScuVj9poh0E5VihqlwL JjaQ3GvVH6eIHFREycnTcfFnR8B5WdUisaTBpXd95amUmnXD6U/VBhBmxrOCXvWbD/ 6VZ+PGdVF+TBqen+57t389GmQ2RDNqfz8l3I/lRG44jgVUlAaTyuO2vAXzLUl++RsP IaiugOGUZp1xGgxLVQFtR+MfHUzhg/N4H6zHKemVyqeb6wLciiRIG9IIgIwv3tWa++ P5dgq3KrKQkx0UkFhXCgHL+QUWnncEyB/uwrBtULY/llel3bfOUIKmAh2lBllYM+3R 2ozmZrOHNHgmg== From: Leon Romanovsky To: Jens Axboe , Jason Gunthorpe , Robin Murphy , Joerg Roedel , Will Deacon , Keith Busch , Christoph Hellwig , "Zeng, Oak" , Chaitanya Kulkarni Cc: Leon Romanovsky , Sagi Grimberg , Bjorn Helgaas , Logan Gunthorpe , Yishai Hadas , Shameer Kolothum , Kevin Tian , Alex Williamson , Marek Szyprowski , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Andrew Morton , linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org, iommu@lists.linux.dev, linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org Subject: [RFC v2 17/21] vfio/mlx5: Convert vfio to use DMA link API Date: Thu, 12 Sep 2024 14:15:52 +0300 Message-ID: <6369f834edd1e1144fbe11fd4b3aed3f63e33ade.1726138681.git.leon@kernel.org> X-Mailer: git-send-email 2.46.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Leon Romanovsky Remove intermediate scatter-gather table as it is not needed 
if DMA link API is used. This conversion reduces drastically the memory used to manage that table. Signed-off-by: Leon Romanovsky --- drivers/vfio/pci/mlx5/cmd.c | 211 ++++++++++++++++++----------------- drivers/vfio/pci/mlx5/cmd.h | 8 +- drivers/vfio/pci/mlx5/main.c | 33 +----- 3 files changed, 112 insertions(+), 140 deletions(-) diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c index 34ae3e299a9e..aa2f1ec326c0 100644 --- a/drivers/vfio/pci/mlx5/cmd.c +++ b/drivers/vfio/pci/mlx5/cmd.c @@ -345,25 +345,78 @@ static u32 *alloc_mkey_in(u32 npages, u32 pdn) return in; } =20 -static int create_mkey(struct mlx5_core_dev *mdev, u32 npages, - struct mlx5_vhca_data_buffer *buf, u32 *mkey_in, +static int create_mkey(struct mlx5_core_dev *mdev, u32 npages, u32 *mkey_i= n, u32 *mkey) { + int inlen =3D MLX5_ST_SZ_BYTES(create_mkey_in) + + sizeof(__be64) * round_up(npages, 2); + + return mlx5_core_create_mkey(mdev, mkey, mkey_in, inlen); +} + +static void unregister_dma_pages(struct mlx5_core_dev *mdev, u32 npages, + u32 *mkey_in, struct dma_iova_state *state) +{ + dma_addr_t addr; __be64 *mtt; - int inlen; + int i; =20 - mtt =3D (__be64 *)MLX5_ADDR_OF(create_mkey_in, mkey_in, klm_pas_mtt); - if (buf) { - struct sg_dma_page_iter dma_iter; + WARN_ON_ONCE(state->dir =3D=3D DMA_NONE); =20 - for_each_sgtable_dma_page(&buf->table.sgt, &dma_iter, 0) - *mtt++ =3D cpu_to_be64(sg_page_iter_dma_address(&dma_iter)); + if (state->use_iova) { + dma_unlink_range(state); + } else { + mtt =3D (__be64 *)MLX5_ADDR_OF(create_mkey_in, mkey_in, + klm_pas_mtt); + for (i =3D npages - 1; i >=3D 0; i--) { + addr =3D be64_to_cpu(mtt[i]); + dma_unmap_page(state->dev, addr, PAGE_SIZE, state->dir); + } } + dma_free_iova(state); +} =20 - inlen =3D MLX5_ST_SZ_BYTES(create_mkey_in) + - sizeof(__be64) * round_up(npages, 2); +static int register_dma_pages(struct mlx5_core_dev *mdev, u32 npages, + struct page **page_list, u32 *mkey_in, + struct dma_iova_state *state) +{ + dma_addr_t addr; + 
__be64 *mtt; + int i, err; =20 - return mlx5_core_create_mkey(mdev, mkey, mkey_in, inlen); + WARN_ON_ONCE(state->dir =3D=3D DMA_NONE); + + err =3D dma_alloc_iova(state, npages * PAGE_SIZE); + if (err) + return err; + + dma_set_iova_state(state, page_list[0], PAGE_SIZE); + + mtt =3D (__be64 *)MLX5_ADDR_OF(create_mkey_in, mkey_in, klm_pas_mtt); + err =3D dma_start_range(state); + if (err) { + dma_free_iova(state); + return err; + } + for (i =3D 0; i < npages; i++) { + if (state->use_iova) + addr =3D dma_link_range(state, page_to_phys(page_list[i]), + PAGE_SIZE); + else + addr =3D dma_map_page(mdev->device, page_list[i], 0, + PAGE_SIZE, state->dir); + err =3D dma_mapping_error(mdev->device, addr); + if (err) + goto error; + *mtt++ =3D cpu_to_be64(addr); + } + dma_end_range(state); + + return 0; + +error: + unregister_dma_pages(mdev, i, mkey_in, state); + return err; } =20 static int mlx5vf_dma_data_buffer(struct mlx5_vhca_data_buffer *buf) @@ -379,50 +432,56 @@ static int mlx5vf_dma_data_buffer(struct mlx5_vhca_da= ta_buffer *buf) if (buf->mkey_in || !buf->npages) return -EINVAL; =20 - ret =3D dma_map_sgtable(mdev->device, &buf->table.sgt, buf->dma_dir, 0); - if (ret) - return ret; - buf->mkey_in =3D alloc_mkey_in(buf->npages, buf->migf->pdn); - if (!buf->mkey_in) { - ret =3D -ENOMEM; - goto err; - } + if (!buf->mkey_in) + return -ENOMEM; =20 - ret =3D create_mkey(mdev, buf->npages, buf, buf->mkey_in, &buf->mkey); + ret =3D register_dma_pages(mdev, buf->npages, buf->page_list, + buf->mkey_in, &buf->state); + if (ret) + goto err_register_dma; + + ret =3D create_mkey(mdev, buf->npages, buf->mkey_in, &buf->mkey); if (ret) goto err_create_mkey; =20 return 0; =20 err_create_mkey: + unregister_dma_pages(mdev, buf->npages, buf->mkey_in, &buf->state); +err_register_dma: kvfree(buf->mkey_in); buf->mkey_in =3D NULL; -err: - dma_unmap_sgtable(mdev->device, &buf->table.sgt, buf->dma_dir, 0); return ret; } =20 +static void free_page_list(u32 npages, struct page **page_list) +{ + 
int i; + + /* Undo alloc_pages_bulk_array() */ + for (i =3D npages - 1; i >=3D 0; i--) + __free_page(page_list[i]); + + kvfree(page_list); +} + void mlx5vf_free_data_buffer(struct mlx5_vhca_data_buffer *buf) { - struct mlx5_vf_migration_file *migf =3D buf->migf; - struct sg_page_iter sg_iter; + struct mlx5vf_pci_core_device *mvdev =3D buf->migf->mvdev; + struct mlx5_core_dev *mdev =3D mvdev->mdev; =20 - lockdep_assert_held(&migf->mvdev->state_mutex); - WARN_ON(migf->mvdev->mdev_detach); + lockdep_assert_held(&mvdev->state_mutex); + WARN_ON(mvdev->mdev_detach); =20 if (buf->mkey_in) { - mlx5_core_destroy_mkey(migf->mvdev->mdev, buf->mkey); + mlx5_core_destroy_mkey(mdev, buf->mkey); + unregister_dma_pages(mdev, buf->npages, buf->mkey_in, + &buf->state); kvfree(buf->mkey_in); - dma_unmap_sgtable(migf->mvdev->mdev->device, &buf->table.sgt, - buf->dma_dir, 0); } =20 - /* Undo alloc_pages_bulk_array() */ - for_each_sgtable_page(&buf->table.sgt, &sg_iter, 0) - __free_page(sg_page_iter_page(&sg_iter)); - sg_free_append_table(&buf->table); - kvfree(buf->page_list); + free_page_list(buf->npages, buf->page_list); kfree(buf); } =20 @@ -433,7 +492,6 @@ static int mlx5vf_add_migration_pages(struct mlx5_vhca_= data_buffer *buf, struct page **page_list; unsigned long filled; unsigned int to_fill; - int ret; =20 to_fill =3D min_t(unsigned int, npages, PAGE_SIZE / sizeof(*buf->page_lis= t)); page_list =3D kvzalloc(to_fill * sizeof(*buf->page_list), GFP_KERNEL_ACCO= UNT); @@ -443,22 +501,13 @@ static int mlx5vf_add_migration_pages(struct mlx5_vhc= a_data_buffer *buf, buf->page_list =3D page_list; =20 do { - filled =3D alloc_pages_bulk_array(GFP_KERNEL_ACCOUNT, to_fill, - buf->page_list + buf->npages); + filled =3D alloc_pages_bulk_array(GFP_KERNEL_ACCOUNT, to_alloc, + buf->page_list + buf->npages); if (!filled) return -ENOMEM; =20 to_alloc -=3D filled; - ret =3D sg_alloc_append_table_from_pages( - &buf->table, buf->page_list + buf->npages, filled, 0, - filled << PAGE_SHIFT, UINT_MAX, 
SG_MAX_SINGLE_ALLOC, - GFP_KERNEL_ACCOUNT); - - if (ret) - return ret; buf->npages +=3D filled; - to_fill =3D min_t(unsigned int, to_alloc, - PAGE_SIZE / sizeof(*buf->page_list)); } while (to_alloc > 0); =20 return 0; @@ -468,6 +517,7 @@ struct mlx5_vhca_data_buffer * mlx5vf_alloc_data_buffer(struct mlx5_vf_migration_file *migf, u32 npages, enum dma_data_direction dma_dir) { + struct mlx5_core_dev *mdev =3D migf->mvdev->mdev; struct mlx5_vhca_data_buffer *buf; int ret; =20 @@ -475,7 +525,7 @@ mlx5vf_alloc_data_buffer(struct mlx5_vf_migration_file = *migf, u32 npages, if (!buf) return ERR_PTR(-ENOMEM); =20 - buf->dma_dir =3D dma_dir; + dma_init_iova_state(&buf->state, mdev->device, dma_dir); buf->migf =3D migf; if (npages) { ret =3D mlx5vf_add_migration_pages(buf, npages); @@ -518,7 +568,7 @@ mlx5vf_get_data_buffer(struct mlx5_vf_migration_file *m= igf, u32 npages, =20 spin_lock_irq(&migf->list_lock); list_for_each_entry_safe(buf, temp_buf, &migf->avail_list, buf_elm) { - if (buf->dma_dir =3D=3D dma_dir) { + if (buf->state.dir =3D=3D dma_dir) { list_del_init(&buf->buf_elm); if (buf->npages >=3D npages) { spin_unlock_irq(&migf->list_lock); @@ -1340,17 +1390,6 @@ static void mlx5vf_destroy_qp(struct mlx5_core_dev *= mdev, kfree(qp); } =20 -static void free_recv_pages(struct mlx5_vhca_recv_buf *recv_buf) -{ - int i; - - /* Undo alloc_pages_bulk_array() */ - for (i =3D 0; i < recv_buf->npages; i++) - __free_page(recv_buf->page_list[i]); - - kvfree(recv_buf->page_list); -} - static int alloc_recv_pages(struct mlx5_vhca_recv_buf *recv_buf, unsigned int npages) { @@ -1386,45 +1425,6 @@ static int alloc_recv_pages(struct mlx5_vhca_recv_bu= f *recv_buf, kvfree(recv_buf->page_list); return -ENOMEM; } -static void unregister_dma_pages(struct mlx5_core_dev *mdev, u32 npages, - u32 *mkey_in) -{ - dma_addr_t addr; - __be64 *mtt; - int i; - - mtt =3D (__be64 *)MLX5_ADDR_OF(create_mkey_in, mkey_in, klm_pas_mtt); - for (i =3D npages - 1; i >=3D 0; i--) { - addr =3D 
be64_to_cpu(mtt[i]); - dma_unmap_single(mdev->device, addr, PAGE_SIZE, - DMA_FROM_DEVICE); - } -} - -static int register_dma_pages(struct mlx5_core_dev *mdev, u32 npages, - struct page **page_list, u32 *mkey_in) -{ - dma_addr_t addr; - __be64 *mtt; - int i; - - mtt =3D (__be64 *)MLX5_ADDR_OF(create_mkey_in, mkey_in, klm_pas_mtt); - - for (i =3D 0; i < npages; i++) { - addr =3D dma_map_page(mdev->device, page_list[i], 0, PAGE_SIZE, - DMA_FROM_DEVICE); - if (dma_mapping_error(mdev->device, addr)) - goto error; - - *mtt++ =3D cpu_to_be64(addr); - } - - return 0; - -error: - unregister_dma_pages(mdev, i, mkey_in); - return -ENOMEM; -} =20 static void mlx5vf_free_qp_recv_resources(struct mlx5_core_dev *mdev, struct mlx5_vhca_qp *qp) @@ -1432,9 +1432,10 @@ static void mlx5vf_free_qp_recv_resources(struct mlx= 5_core_dev *mdev, struct mlx5_vhca_recv_buf *recv_buf =3D &qp->recv_buf; =20 mlx5_core_destroy_mkey(mdev, recv_buf->mkey); - unregister_dma_pages(mdev, recv_buf->npages, recv_buf->mkey_in); + unregister_dma_pages(mdev, recv_buf->npages, recv_buf->mkey_in, + &recv_buf->state); kvfree(recv_buf->mkey_in); - free_recv_pages(&qp->recv_buf); + free_page_list(recv_buf->npages, recv_buf->page_list); } =20 static int mlx5vf_alloc_qp_recv_resources(struct mlx5_core_dev *mdev, @@ -1455,25 +1456,25 @@ static int mlx5vf_alloc_qp_recv_resources(struct ml= x5_core_dev *mdev, goto end; } =20 + recv_buf->state.dir =3D DMA_FROM_DEVICE; err =3D register_dma_pages(mdev, npages, recv_buf->page_list, - recv_buf->mkey_in); + recv_buf->mkey_in, &recv_buf->state); if (err) goto err_register_dma; =20 - err =3D create_mkey(mdev, npages, NULL, recv_buf->mkey_in, - &recv_buf->mkey); + err =3D create_mkey(mdev, npages, recv_buf->mkey_in, &recv_buf->mkey); if (err) goto err_create_mkey; =20 return 0; =20 err_create_mkey: - unregister_dma_pages(mdev, npages, recv_buf->mkey_in); + unregister_dma_pages(mdev, npages, recv_buf->mkey_in, &recv_buf->state); err_register_dma: kvfree(recv_buf->mkey_in); 
recv_buf->mkey_in =3D NULL; end: - free_recv_pages(recv_buf); + free_page_list(npages, recv_buf->page_list); return err; } =20 diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h index 5b764199db53..8b0cd0ee11a0 100644 --- a/drivers/vfio/pci/mlx5/cmd.h +++ b/drivers/vfio/pci/mlx5/cmd.h @@ -54,20 +54,15 @@ struct mlx5_vf_migration_header { =20 struct mlx5_vhca_data_buffer { struct page **page_list; - struct sg_append_table table; + struct dma_iova_state state; loff_t start_pos; u64 length; u32 npages; u32 mkey; u32 *mkey_in; - enum dma_data_direction dma_dir; u8 stop_copy_chunk_num; struct list_head buf_elm; struct mlx5_vf_migration_file *migf; - /* Optimize mlx5vf_get_migration_page() for sequential access */ - struct scatterlist *last_offset_sg; - unsigned int sg_last_entry; - unsigned long last_offset; }; =20 struct mlx5vf_async_data { @@ -134,6 +129,7 @@ struct mlx5_vhca_cq { struct mlx5_vhca_recv_buf { u32 npages; struct page **page_list; + struct dma_iova_state state; u32 next_rq_offset; u32 *mkey_in; u32 mkey; diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c index d899cd499e27..f395b526e0ef 100644 --- a/drivers/vfio/pci/mlx5/main.c +++ b/drivers/vfio/pci/mlx5/main.c @@ -34,35 +34,10 @@ static struct mlx5vf_pci_core_device *mlx5vf_drvdata(st= ruct pci_dev *pdev) core_device); } =20 -struct page * -mlx5vf_get_migration_page(struct mlx5_vhca_data_buffer *buf, - unsigned long offset) +struct page *mlx5vf_get_migration_page(struct mlx5_vhca_data_buffer *buf, + unsigned long offset) { - unsigned long cur_offset =3D 0; - struct scatterlist *sg; - unsigned int i; - - /* All accesses are sequential */ - if (offset < buf->last_offset || !buf->last_offset_sg) { - buf->last_offset =3D 0; - buf->last_offset_sg =3D buf->table.sgt.sgl; - buf->sg_last_entry =3D 0; - } - - cur_offset =3D buf->last_offset; - - for_each_sg(buf->last_offset_sg, sg, - buf->table.sgt.orig_nents - buf->sg_last_entry, i) { - if (offset < sg->length + 
cur_offset) { - buf->last_offset_sg =3D sg; - buf->sg_last_entry +=3D i; - buf->last_offset =3D cur_offset; - return nth_page(sg_page(sg), - (offset - cur_offset) / PAGE_SIZE); - } - cur_offset +=3D sg->length; - } - return NULL; + return buf->page_list[offset / PAGE_SIZE]; } =20 static void mlx5vf_disable_fd(struct mlx5_vf_migration_file *migf) @@ -121,7 +96,7 @@ static void mlx5vf_buf_read_done(struct mlx5_vhca_data_b= uffer *vhca_buf) struct mlx5_vf_migration_file *migf =3D vhca_buf->migf; =20 if (vhca_buf->stop_copy_chunk_num) { - bool is_header =3D vhca_buf->dma_dir =3D=3D DMA_NONE; + bool is_header =3D vhca_buf->state.dir =3D=3D DMA_NONE; u8 chunk_num =3D vhca_buf->stop_copy_chunk_num; size_t next_required_umem_size =3D 0; =20 --=20 2.46.0 From nobody Sat Nov 30 01:44:58 2024 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E22571BD4E7; Thu, 12 Sep 2024 11:17:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726139834; cv=none; b=gTsX02G95J/YpN0gChPOC6laKg0f7TozW1O/n9nX2e31uTrtQASUvGmh/DVO1gFtnYmgYgh2zn1DEIsg1bK7TGlHvH6d1rameBKxpQr3XzpR6Cqb666BgNX5dC7MZPdQopHeuIs1AsXN4AF5ZoKfwrwKmF0uqFOKQsMo2+hml34= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726139834; c=relaxed/simple; bh=6/tc2ugkTwWAFbLVbxAp24kBUNCe2t7dWg8SzqJRz7s=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=BAVyB+rBx9Rkbo/cpOIjX4j/3FoB6Mv+W9XObe+zenwFFWeUcfRm95xarM9SzlYBk7w2tICF9cZWuFaujdCUjOk+T8+xgva5fVp0+dMo0HLZ5QPlNRsmlM3AeXS1UwXdZHdSvo9oOsNPqhaqdRbsCvI2Ty1LsWNqcY9qmPlUw88= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org 
header.b=Xr8vEFr1; arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="Xr8vEFr1"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 02A2FC4CEC3;
	Thu, 12 Sep 2024 11:17:13 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1726139833;
	bh=6/tc2ugkTwWAFbLVbxAp24kBUNCe2t7dWg8SzqJRz7s=;
	h=From:To:Cc:Subject:Date:In-Reply-To:References:From;
	b=Xr8vEFr1DBAnFhLFzCX/ZunFrp1xgq07DQNQ6E4SLge/45rmHGLi/xIM/74GFsJFy
	 DblrLX/Paxj097l/XVrGmAEAGjBmNAPhhYMAYhI75lkR60W7Rv92gMkz81H6ZIwBGw
	 AAFtnY/3JoE6a4DOYz8fqOV3WqwNG8ugzOPPEA6BL4yTXDI2WsJ0p569Ke0Qjc6Tls
	 Y9acUkiss7PZkIFREINyhR2/pEWITABWRwwBwbuLSAd2W41RsvHksDG9Lln6FB6B97
	 ukDvlvhuUGueTTORLCsLbcVzFqp9H/eO2Wg7hI+EmTp+BvHdbdrfwpoJ1hqzhrJUKG
	 FYCii5JzIKsAQ==
From: Leon Romanovsky
To: Jens Axboe, Jason Gunthorpe, Robin Murphy, Joerg Roedel, Will Deacon,
	Keith Busch, Christoph Hellwig, "Zeng, Oak", Chaitanya Kulkarni
Cc: Leon Romanovsky, Sagi Grimberg, Bjorn Helgaas, Logan Gunthorpe,
	Yishai Hadas, Shameer Kolothum, Kevin Tian, Alex Williamson,
	Marek Szyprowski, Jérôme Glisse, Andrew Morton,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-rdma@vger.kernel.org, iommu@lists.linux.dev,
	linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org,
	kvm@vger.kernel.org, linux-mm@kvack.org
Subject: [RFC v2 18/21] nvme-pci: remove optimizations for single DMA entry
Date: Thu, 12 Sep 2024 14:15:53 +0300
Message-ID: <875d92e2c453649e9d95080f27e631f196270008.1726138681.git.leon@kernel.org>
X-Mailer: git-send-email 2.46.0
In-Reply-To:
References:
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Leon Romanovsky

Future patches will remove SG table allocation from the NVMe PCI code,
which these single DMA entries tried to optimize. As a preparation, let's
remove them to unify the DMA mapping code.

Signed-off-by: Leon Romanovsky
---
 drivers/nvme/host/pci.c | 69 -----------------------------------------
 1 file changed, 69 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 6cd9395ba9ec..a9a66f184138 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -233,7 +233,6 @@ struct nvme_iod {
 	bool aborted;
 	s8 nr_allocations;	/* PRP list pool allocations. 0 means small
 				   pool in use */
-	unsigned int dma_len;	/* length of single DMA segment mapping */
 	dma_addr_t first_dma;
 	dma_addr_t meta_dma;
 	struct sg_table sgt;
@@ -541,12 +540,6 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
 {
 	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
 
-	if (iod->dma_len) {
-		dma_unmap_page(dev->dev, iod->first_dma, iod->dma_len,
-			       rq_dma_dir(req));
-		return;
-	}
-
 	WARN_ON_ONCE(!iod->sgt.nents);
 
 	dma_unmap_sgtable(dev->dev, &iod->sgt, rq_dma_dir(req), 0);
@@ -696,11 +689,6 @@ static blk_status_t nvme_pci_setup_sgls(struct nvme_dev *dev,
 	/* setting the transfer type as SGL */
 	cmd->flags = NVME_CMD_SGL_METABUF;
 
-	if (entries == 1) {
-		nvme_pci_sgl_set_data(&cmd->dptr.sgl, sg);
-		return BLK_STS_OK;
-	}
-
 	if (entries <= (256 / sizeof(struct nvme_sgl_desc))) {
 		pool = dev->prp_small_pool;
 		iod->nr_allocations = 0;
@@ -727,45 +715,6 @@ static blk_status_t nvme_pci_setup_sgls(struct nvme_dev *dev,
 	return BLK_STS_OK;
 }
 
-static blk_status_t nvme_setup_prp_simple(struct nvme_dev *dev,
-		struct request *req, struct nvme_rw_command *cmnd,
-		struct bio_vec *bv)
-{
-	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
-	unsigned int offset = bv->bv_offset & (NVME_CTRL_PAGE_SIZE - 1);
-	unsigned int first_prp_len = NVME_CTRL_PAGE_SIZE - offset;
-
-	iod->first_dma = dma_map_bvec(dev->dev, bv, rq_dma_dir(req), 0);
-	if (dma_mapping_error(dev->dev, iod->first_dma))
-		return BLK_STS_RESOURCE;
-	iod->dma_len = bv->bv_len;
-
-	cmnd->dptr.prp1 = cpu_to_le64(iod->first_dma);
-	if (bv->bv_len > first_prp_len)
-		cmnd->dptr.prp2 = cpu_to_le64(iod->first_dma + first_prp_len);
-	else
-		cmnd->dptr.prp2 = 0;
-	return BLK_STS_OK;
-}
-
-static blk_status_t nvme_setup_sgl_simple(struct nvme_dev *dev,
-		struct request *req, struct nvme_rw_command *cmnd,
-		struct bio_vec *bv)
-{
-	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
-
-	iod->first_dma = dma_map_bvec(dev->dev, bv, rq_dma_dir(req), 0);
-	if (dma_mapping_error(dev->dev, iod->first_dma))
-		return BLK_STS_RESOURCE;
-	iod->dma_len = bv->bv_len;
-
-	cmnd->flags = NVME_CMD_SGL_METABUF;
-	cmnd->dptr.sgl.addr = cpu_to_le64(iod->first_dma);
-	cmnd->dptr.sgl.length = cpu_to_le32(iod->dma_len);
-	cmnd->dptr.sgl.type = NVME_SGL_FMT_DATA_DESC << 4;
-	return BLK_STS_OK;
-}
-
 static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
 		struct nvme_command *cmnd)
 {
@@ -773,24 +722,6 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
 	blk_status_t ret = BLK_STS_RESOURCE;
 	int rc;
 
-	if (blk_rq_nr_phys_segments(req) == 1) {
-		struct nvme_queue *nvmeq = req->mq_hctx->driver_data;
-		struct bio_vec bv = req_bvec(req);
-
-		if (!is_pci_p2pdma_page(bv.bv_page)) {
-			if ((bv.bv_offset & (NVME_CTRL_PAGE_SIZE - 1)) +
-			     bv.bv_len <= NVME_CTRL_PAGE_SIZE * 2)
-				return nvme_setup_prp_simple(dev, req,
-							     &cmnd->rw, &bv);
-
-			if (nvmeq->qid && sgl_threshold &&
-			    nvme_ctrl_sgl_supported(&dev->ctrl))
-				return nvme_setup_sgl_simple(dev, req,
-							     &cmnd->rw, &bv);
-		}
-	}
-
-	iod->dma_len = 0;
 	iod->sgt.sgl = mempool_alloc(dev->iod_mempool, GFP_ATOMIC);
 	if (!iod->sgt.sgl)
 		return BLK_STS_RESOURCE;
-- 
2.46.0

From nobody Sat Nov 30 01:44:58 2024
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org
(Postfix) with ESMTPS id 4D0101BBBCB; Thu, 12 Sep 2024 11:17:29 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726139850; cv=none; b=sHfOuN4fR0Oq+D3V6FWCudeKmNoS3KZ5lmwPp3nEVhBhBvducIBr/lD517gFsPv56y3Pz6xKWzpNHFIBFmA8ojK/RQpOOiFQxJAhyHsJ9nq0EGUtahCqUaXVy6U3hATeRwsVBTtf/YE/7TEwzfT/7jxJ1lY296mX4zs42s85CY0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726139850; c=relaxed/simple; bh=Ch307cuLMtu9IkxYPq9fEPmV/1GlNElaatF0p6+K/3o=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=fpym9wnWzy48ngUNVe2+2Mazkp9ae4Lh6KSykxQtPULo2+U5AZmWTPXkXMwbmez3p9ZH5pKWECs/VP89Bx9z4R0PAWfAVmSU2LeN3npFFFxFWc1eSBnP4esgrmy01GwZA3UYrtrx5M3kQ61VUPQ+2B+tvKDds7o9gykfrSW4/sA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=rjQEVR0W; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="rjQEVR0W" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 0B911C4CECE; Thu, 12 Sep 2024 11:17:29 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1726139849; bh=Ch307cuLMtu9IkxYPq9fEPmV/1GlNElaatF0p6+K/3o=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=rjQEVR0WNIDFIumqiRWFBZ/LeaRIQrWt7cRNRpwd5Qr3Ery2891OThDNpX5O70OkA 3u+K7SybOwx4RJVpNWxWVcp9dPJOARjRDtQ5bfDWc6ZBdwjeLjroi/LGRBt7k8iEnk rrKGRP00PzsLM6BQjTbD0bOTaSYxwnYo6ipA4GhON7Q3sfr8WoiFk8o1QO2Fv+ILhF /xfjffj8UsYtxpX9Wuzy0IcN8Uxvpj7UQIuJSgTBOPVfFQgt+1eyO4AqZDO/l5tLYd 5CdaCvA6GAmBErDw8MkhJzDPoXCGNeg9rN3Ks4+pahOUrr4xwY6R/1BbLfmuF5yF5c mxeUvTEss0paA== From: Leon Romanovsky To: Jens Axboe , Jason Gunthorpe , Robin Murphy , Joerg Roedel , Will Deacon , Keith Busch , Christoph Hellwig , "Zeng, Oak" , Chaitanya Kulkarni Cc: Leon 
Romanovsky, Sagi Grimberg, Bjorn Helgaas, Logan Gunthorpe,
	Yishai Hadas, Shameer Kolothum, Kevin Tian, Alex Williamson,
	Marek Szyprowski, Jérôme Glisse, Andrew Morton,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-rdma@vger.kernel.org, iommu@lists.linux.dev,
	linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org,
	kvm@vger.kernel.org, linux-mm@kvack.org
Subject: [RFC v2 19/21] nvme-pci: precalculate number of DMA entries for each command
Date: Thu, 12 Sep 2024 14:15:54 +0300
Message-ID: <8c5b0e5ab1716166fc93e76cb2d3e01ca9cf8769.1726138681.git.leon@kernel.org>
X-Mailer: git-send-email 2.46.0
In-Reply-To:
References:
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Leon Romanovsky

Calculate the number of DMA entries for each command in the request in
advance.

Signed-off-by: Leon Romanovsky
---
 drivers/nvme/host/pci.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index a9a66f184138..2b236b1d209e 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -231,6 +231,7 @@ struct nvme_iod {
 	struct nvme_request req;
 	struct nvme_command cmd;
 	bool aborted;
+	u8 nr_dmas;
 	s8 nr_allocations;	/* PRP list pool allocations. 0 means small
 				   pool in use */
 	dma_addr_t first_dma;
@@ -766,6 +767,23 @@ static blk_status_t nvme_map_metadata(struct nvme_dev *dev, struct request *req,
 	return BLK_STS_OK;
 }
 
+static u8 nvme_calc_num_dmas(struct request *req)
+{
+	struct bio_vec bv;
+	u8 nr_dmas;
+
+	if (blk_rq_nr_phys_segments(req) == 0)
+		return 0;
+
+	nr_dmas = DIV_ROUND_UP(blk_rq_payload_bytes(req), NVME_CTRL_PAGE_SIZE);
+	bv = req_bvec(req);
+	if (bv.bv_offset && (bv.bv_offset + bv.bv_len) >= NVME_CTRL_PAGE_SIZE)
+		/* Accommodate for unaligned first page */
+		nr_dmas++;
+
+	return nr_dmas;
+}
+
 static blk_status_t nvme_prep_rq(struct nvme_dev *dev, struct request *req)
 {
 	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
@@ -779,6 +797,8 @@ static blk_status_t nvme_prep_rq(struct nvme_dev *dev, struct request *req)
 	if (ret)
 		return ret;
 
+	iod->nr_dmas = nvme_calc_num_dmas(req);
+
 	if (blk_rq_nr_phys_segments(req)) {
 		ret = nvme_map_data(dev, req, &iod->cmd);
 		if (ret)
-- 
2.46.0

From nobody Sat Nov 30 01:44:58 2024
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id E597C1BE238;
	Thu, 12 Sep 2024 11:17:21 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1726139842; cv=none; b=pbI+iv+jHUHBlf9J0RF99ENzitT2W/4WmrsC5JGd+2S+4PUQsDskCQ0GqhDoIDpObk2m4uVBsZO3DvRjOsv5NMbKfRv8Y7YQTGp9YVRA2AwvB9q7kTEZsWdZ0dHZMMfV6/mrEpCq9OIp+hCWdplnn2Ps5Y9h3j3FXaX80viz5TE=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1726139842; c=relaxed/simple;
	bh=lOZJXJxWXIBcVAPdAmMr2ULkiHyK5NnvqvNZOrmoKEg=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type;
b=nHD2oJT1AayeJPgENBQ5zE3QB1ALhnVypA2juKIElrqUKVPt8lRT8dMJ2ZgBzkiyBqLcIH4y/SUR7k+id9EfrmJIPjrOeml+uNBGvMzv1W2xLypjajCUx8oLWaU3i2ryuaa/Vv16NDflPuqckCG2fJbk3sB5/nvYgmGraLWaRqc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=Aa4jqYp4; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="Aa4jqYp4" Received: by smtp.kernel.org (Postfix) with ESMTPSA id EBFD2C4CEC3; Thu, 12 Sep 2024 11:17:20 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1726139841; bh=lOZJXJxWXIBcVAPdAmMr2ULkiHyK5NnvqvNZOrmoKEg=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=Aa4jqYp4u1ExCOP7/w9kgMgTLanlFjcSENpCvHwLxZJG24/4JEMDB8WrWR/ZXE8kY Juyav+p0VQovEVj9JAlq+j6aeQwGcbFAnb01BLl/tXzu7uKtwokGJL0ZctSl0LoCs7 P3QuWaV5EUXW7Wouq24VW1LbPHzZdmmoKaQGTyPu2gT81+/pLyiftxi0OCxKlymzGX oU3F0T4vwXclQxcMSER6kRBjONpvJzo7+ESdvUT5S/Y5RorNZdQCwAdOoxlqTbphSz Ik8xt5/NptqmHyfH7KdmzxpRq1LwwSbYqDHqX56CkwE+ey2gSFTwy5a0IjjPmgrzgt eRh1dP9PiSeNg== From: Leon Romanovsky To: Jens Axboe , Jason Gunthorpe , Robin Murphy , Joerg Roedel , Will Deacon , Keith Busch , Christoph Hellwig , "Zeng, Oak" , Chaitanya Kulkarni Cc: Leon Romanovsky , Sagi Grimberg , Bjorn Helgaas , Logan Gunthorpe , Yishai Hadas , Shameer Kolothum , Kevin Tian , Alex Williamson , Marek Szyprowski , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Andrew Morton , linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org, iommu@lists.linux.dev, linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org Subject: [RFC v2 20/21] nvme-pci: use new dma API Date: Thu, 12 Sep 2024 14:15:55 +0300 Message-ID: X-Mailer: git-send-email 2.46.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: 
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

From: Leon Romanovsky

This demonstrates how the new DMA API can fit into the NVMe driver and
replace the old DMA APIs.

As this is an RFC, I expect more robust error handling, optimizations,
and in-depth testing for the final version once we agree on the DMA API
architecture.

The following is the performance comparison for the existing DMA API case
with sg_table and with dma_map. Once we have agreement on the new DMA API
design, I intend to get similar profiling numbers for the new DMA API.

sgl (sg_table + old DMA API) vs no_sgl (iod_dma_map + new DMA API):

block size                            IOPS (k)    Average of 3
4K
--------------------------------------------------------------
sg-list-fio-perf.bs-4k-1.fio:         68.6
sg-list-fio-perf.bs-4k-2.fio:         68          68.36
sg-list-fio-perf.bs-4k-3.fio:         68.5

no-sg-list-fio-perf.bs-4k-1.fio:      68.7
no-sg-list-fio-perf.bs-4k-2.fio:      68.5        68.43
no-sg-list-fio-perf.bs-4k-3.fio:      68.1

% Change default vs new DMA API = +0.0975%

8K
--------------------------------------------------------------
sg-list-fio-perf.bs-8k-1.fio:         67
sg-list-fio-perf.bs-8k-2.fio:         67.1        67.03
sg-list-fio-perf.bs-8k-3.fio:         67

no-sg-list-fio-perf.bs-8k-1.fio:      66.7
no-sg-list-fio-perf.bs-8k-2.fio:      66.7        66.7
no-sg-list-fio-perf.bs-8k-3.fio:      66.7

% Change default vs new DMA API = +0.4993%

16K
--------------------------------------------------------------
sg-list-fio-perf.bs-16k-1.fio:        63.8
sg-list-fio-perf.bs-16k-2.fio:        63.4        63.5
sg-list-fio-perf.bs-16k-3.fio:        63.3

no-sg-list-fio-perf.bs-16k-1.fio:     63.5
no-sg-list-fio-perf.bs-16k-2.fio:     63.4        63.33
no-sg-list-fio-perf.bs-16k-3.fio:     63.1

% Change default vs new DMA API = -0.2632%

32K
--------------------------------------------------------------
sg-list-fio-perf.bs-32k-1.fio:        59.3
sg-list-fio-perf.bs-32k-2.fio:        59.3        59.36
sg-list-fio-perf.bs-32k-3.fio:        59.5

no-sg-list-fio-perf.bs-32k-1.fio:     59.5
no-sg-list-fio-perf.bs-32k-2.fio:     59.6        59.43
no-sg-list-fio-perf.bs-32k-3.fio:     59.2

% Change default vs new DMA API = +0.1122%

64K
--------------------------------------------------------------
sg-list-fio-perf.bs-64k-1.fio:        53.7
sg-list-fio-perf.bs-64k-2.fio:        53.4        53.56
sg-list-fio-perf.bs-64k-3.fio:        53.6

no-sg-list-fio-perf.bs-64k-1.fio:     53.5
no-sg-list-fio-perf.bs-64k-2.fio:     53.8        53.63
no-sg-list-fio-perf.bs-64k-3.fio:     53.6

% Change default vs new DMA API = +0.1246%

128K
--------------------------------------------------------------
sg-list-fio-perf/bs-128k-1.fio:       48
sg-list-fio-perf/bs-128k-2.fio:       46.4        47.13
sg-list-fio-perf/bs-128k-3.fio:       47

no-sg-list-fio-perf/bs-128k-1.fio:    46.6
no-sg-list-fio-perf/bs-128k-2.fio:    47          46.9
no-sg-list-fio-perf/bs-128k-3.fio:    47.1

% Change default vs new DMA API = -0.495%

256K
--------------------------------------------------------------
sg-list-fio-perf/bs-256k-1.fio:       37
sg-list-fio-perf/bs-256k-2.fio:       41          39.93
sg-list-fio-perf/bs-256k-3.fio:       41.8

no-sg-list-fio-perf/bs-256k-1.fio:    37.5
no-sg-list-fio-perf/bs-256k-2.fio:    41.4        40.5
no-sg-list-fio-perf/bs-256k-3.fio:    42.6

% Change default vs new DMA API = +1.42%

512K
--------------------------------------------------------------
sg-list-fio-perf/bs-512k-1.fio:       28.5
sg-list-fio-perf/bs-512k-2.fio:       28.2        28.4
sg-list-fio-perf/bs-512k-3.fio:       28.5

no-sg-list-fio-perf/bs-512k-1.fio:    28.7
no-sg-list-fio-perf/bs-512k-2.fio:    28.6        28.7
no-sg-list-fio-perf/bs-512k-3.fio:    28.8

% Change default vs new DMA API = +1.06%

Signed-off-by: Chaitanya Kulkarni
Signed-off-by: Leon Romanovsky
---
 drivers/nvme/host/pci.c | 354 ++++++++++++++++++++++++----------------
 1 file changed, 215 insertions(+), 139 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 2b236b1d209e..881cbf2c0cac 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -221,6 +221,12 @@ union nvme_descriptor {
 	__le64 *prp_list;
 };
 
+/* TODO: move to common header */
+struct dma_entry {
+	dma_addr_t addr;
+	unsigned
int len; +}; + /* * The nvme_iod describes the data in an I/O. * @@ -234,9 +240,11 @@ struct nvme_iod { u8 nr_dmas; s8 nr_allocations; /* PRP list pool allocations. 0 means small pool in use */ + struct dma_iova_state state; + struct dma_entry dma; + struct dma_entry *map; dma_addr_t first_dma; dma_addr_t meta_dma; - struct sg_table sgt; union nvme_descriptor list[NVME_MAX_NR_ALLOCATIONS]; }; =20 @@ -540,10 +548,9 @@ static void nvme_free_prps(struct nvme_dev *dev, struc= t request *req) static void nvme_unmap_data(struct nvme_dev *dev, struct request *req) { struct nvme_iod *iod =3D blk_mq_rq_to_pdu(req); - - WARN_ON_ONCE(!iod->sgt.nents); - - dma_unmap_sgtable(dev->dev, &iod->sgt, rq_dma_dir(req), 0); + struct req_iterator iter; + struct bio_vec bv; + int cnt =3D 0; =20 if (iod->nr_allocations =3D=3D 0) dma_pool_free(dev->prp_small_pool, iod->list[0].sg_list, @@ -553,20 +560,17 @@ static void nvme_unmap_data(struct nvme_dev *dev, str= uct request *req) iod->first_dma); else nvme_free_prps(dev, req); - mempool_free(iod->sgt.sgl, dev->iod_mempool); -} =20 -static void nvme_print_sgl(struct scatterlist *sgl, int nents) -{ - int i; - struct scatterlist *sg; - - for_each_sg(sgl, sg, nents, i) { - dma_addr_t phys =3D sg_phys(sg); - pr_warn("sg[%d] phys_addr:%pad offset:%d length:%d " - "dma_address:%pad dma_length:%d\n", - i, &phys, sg->offset, sg->length, &sg_dma_address(sg), - sg_dma_len(sg)); + if (iod->map) { + rq_for_each_bvec(bv, req, iter) { + dma_unmap_page(dev->dev, iod->map[cnt].addr, + iod->map[cnt].len, rq_dma_dir(req)); + cnt++; + } + kfree(iod->map); + } else { + dma_unlink_range(&iod->state); + dma_free_iova(&iod->state); } } =20 @@ -574,97 +578,63 @@ static blk_status_t nvme_pci_setup_prps(struct nvme_d= ev *dev, struct request *req, struct nvme_rw_command *cmnd) { struct nvme_iod *iod =3D blk_mq_rq_to_pdu(req); - struct dma_pool *pool; - int length =3D blk_rq_payload_bytes(req); - struct scatterlist *sg =3D iod->sgt.sgl; - int dma_len =3D 
sg_dma_len(sg); - u64 dma_addr =3D sg_dma_address(sg); - int offset =3D dma_addr & (NVME_CTRL_PAGE_SIZE - 1); - __le64 *prp_list; - dma_addr_t prp_dma; - int nprps, i; - - length -=3D (NVME_CTRL_PAGE_SIZE - offset); - if (length <=3D 0) { - iod->first_dma =3D 0; - goto done; - } - - dma_len -=3D (NVME_CTRL_PAGE_SIZE - offset); - if (dma_len) { - dma_addr +=3D (NVME_CTRL_PAGE_SIZE - offset); - } else { - sg =3D sg_next(sg); - dma_addr =3D sg_dma_address(sg); - dma_len =3D sg_dma_len(sg); - } + __le64 *prp_list =3D iod->list[0].prp_list; + int i =3D 0, idx =3D 0; + struct bio_vec bv; + struct req_iterator iter; + dma_addr_t offset =3D 0; =20 - if (length <=3D NVME_CTRL_PAGE_SIZE) { - iod->first_dma =3D dma_addr; - goto done; + if (iod->nr_dmas <=3D 2) { + i =3D iod->nr_dmas; + /* We can use the inline PRP/SG list */ + goto set_addr; } =20 - nprps =3D DIV_ROUND_UP(length, NVME_CTRL_PAGE_SIZE); - if (nprps <=3D (256 / 8)) { - pool =3D dev->prp_small_pool; - iod->nr_allocations =3D 0; - } else { - pool =3D dev->prp_page_pool; - iod->nr_allocations =3D 1; - } + rq_for_each_bvec(bv, req, iter) { + dma_addr_t addr; =20 - prp_list =3D dma_pool_alloc(pool, GFP_ATOMIC, &prp_dma); - if (!prp_list) { - iod->nr_allocations =3D -1; - return BLK_STS_RESOURCE; - } - iod->list[0].prp_list =3D prp_list; - iod->first_dma =3D prp_dma; - i =3D 0; - for (;;) { - if (i =3D=3D NVME_CTRL_PAGE_SIZE >> 3) { - __le64 *old_prp_list =3D prp_list; - prp_list =3D dma_pool_alloc(pool, GFP_ATOMIC, &prp_dma); - if (!prp_list) - goto free_prps; - iod->list[iod->nr_allocations++].prp_list =3D prp_list; - prp_list[0] =3D old_prp_list[i - 1]; - old_prp_list[i - 1] =3D cpu_to_le64(prp_dma); - i =3D 1; + if (iod->map) + offset =3D 0; + + while (offset < bv.bv_len) { + if (iod->map) + addr =3D iod->map[i].addr; + else + addr =3D iod->dma.addr; + + prp_list[idx] =3D cpu_to_le64(addr + offset); + offset +=3D NVME_CTRL_PAGE_SIZE; + idx++; } - prp_list[i++] =3D cpu_to_le64(dma_addr); - dma_len -=3D 
NVME_CTRL_PAGE_SIZE; - dma_addr +=3D NVME_CTRL_PAGE_SIZE; - length -=3D NVME_CTRL_PAGE_SIZE; - if (length <=3D 0) - break; - if (dma_len > 0) - continue; - if (unlikely(dma_len < 0)) - goto bad_sgl; - sg =3D sg_next(sg); - dma_addr =3D sg_dma_address(sg); - dma_len =3D sg_dma_len(sg); - } -done: - cmnd->dptr.prp1 =3D cpu_to_le64(sg_dma_address(iod->sgt.sgl)); - cmnd->dptr.prp2 =3D cpu_to_le64(iod->first_dma); + i++; + } + +set_addr: + if (iod->map) + cmnd->dptr.prp1 =3D cpu_to_le64(iod->map[0].addr); + else + cmnd->dptr.prp1 =3D cpu_to_le64(iod->dma.addr); + if (idx =3D=3D 1 && i =3D=3D 1) + cmnd->dptr.prp2 =3D 0; + else if (idx =3D=3D 2 && i =3D=3D 2) + if (iod->map) + cmnd->dptr.prp2 =3D + cpu_to_le64((iod->map[0].addr + NVME_CTRL_PAGE_SIZE) & + ~(NVME_CTRL_PAGE_SIZE - 1)); + else + cmnd->dptr.prp2 =3D + cpu_to_le64((iod->dma.addr + NVME_CTRL_PAGE_SIZE) & + ~(NVME_CTRL_PAGE_SIZE - 1)); + else + cmnd->dptr.prp2 =3D cpu_to_le64(iod->first_dma); return BLK_STS_OK; -free_prps: - nvme_free_prps(dev, req); - return BLK_STS_RESOURCE; -bad_sgl: - WARN(DO_ONCE(nvme_print_sgl, iod->sgt.sgl, iod->sgt.nents), - "Invalid SGL for payload:%d nents:%d\n", - blk_rq_payload_bytes(req), iod->sgt.nents); - return BLK_STS_IOERR; } =20 -static void nvme_pci_sgl_set_data(struct nvme_sgl_desc *sge, - struct scatterlist *sg) +static void nvme_pci_sgl_set_data(struct nvme_sgl_desc *sge, dma_addr_t ad= dr, + int len) { - sge->addr =3D cpu_to_le64(sg_dma_address(sg)); - sge->length =3D cpu_to_le32(sg_dma_len(sg)); + sge->addr =3D cpu_to_le64(addr); + sge->length =3D cpu_to_le32(len); sge->type =3D NVME_SGL_FMT_DATA_DESC << 4; } =20 @@ -680,17 +650,77 @@ static blk_status_t nvme_pci_setup_sgls(struct nvme_d= ev *dev, struct request *req, struct nvme_rw_command *cmd) { struct nvme_iod *iod =3D blk_mq_rq_to_pdu(req); - struct dma_pool *pool; - struct nvme_sgl_desc *sg_list; - struct scatterlist *sg =3D iod->sgt.sgl; - unsigned int entries =3D iod->sgt.nents; - dma_addr_t sgl_dma; - int i =3D 
0; + struct nvme_sgl_desc *sg_list =3D iod->list[0].sg_list; + struct bio_vec bv =3D req_bvec(req); + struct req_iterator iter; + int i =3D 0, idx =3D 0; + dma_addr_t offset =3D 0; =20 /* setting the transfer type as SGL */ cmd->flags =3D NVME_CMD_SGL_METABUF; =20 - if (entries <=3D (256 / sizeof(struct nvme_sgl_desc))) { + if (iod->nr_dmas <=3D 1) + /* We can use the inline PRP/SG list */ + goto set_addr; + + rq_for_each_bvec(bv, req, iter) { + dma_addr_t addr; + + if (iod->map) + offset =3D 0; + + while (offset < bv.bv_len) { + if (iod->map) + addr =3D iod->map[i].addr; + else + addr =3D iod->dma.addr; + + nvme_pci_sgl_set_data(&sg_list[idx], addr + offset, + bv.bv_len); + offset +=3D NVME_CTRL_PAGE_SIZE; + idx++; + } + i++; + } + +set_addr: + nvme_pci_sgl_set_seg(&cmd->dptr.sgl, iod->first_dma, + blk_rq_nr_phys_segments(req)); + return BLK_STS_OK; +} + +static void nvme_pci_free_pool(struct nvme_dev *dev, struct request *req) +{ + struct nvme_iod *iod =3D blk_mq_rq_to_pdu(req); + + if (iod->nr_allocations =3D=3D 0) + dma_pool_free(dev->prp_small_pool, iod->list[0].sg_list, + iod->first_dma); + else if (iod->nr_allocations =3D=3D 1) + dma_pool_free(dev->prp_page_pool, iod->list[0].sg_list, + iod->first_dma); + else + nvme_free_prps(dev, req); +} + +static blk_status_t nvme_pci_setup_pool(struct nvme_dev *dev, + struct request *req, bool is_sgl) +{ + struct nvme_iod *iod =3D blk_mq_rq_to_pdu(req); + struct dma_pool *pool; + size_t entry_sz; + dma_addr_t addr; + u8 entries; + void *list; + + if (iod->nr_dmas <=3D 2) + /* Do nothing, we can use the inline PRP/SG list */ + return BLK_STS_OK; + + /* First DMA address goes to prp1 anyway */ + entries =3D iod->nr_dmas - 1; + entry_sz =3D (is_sgl) ? 
sizeof(struct nvme_sgl_desc) : sizeof(__le64); + if (entries <=3D (256 / entry_sz)) { pool =3D dev->prp_small_pool; iod->nr_allocations =3D 0; } else { @@ -698,21 +728,20 @@ static blk_status_t nvme_pci_setup_sgls(struct nvme_d= ev *dev, iod->nr_allocations =3D 1; } =20 - sg_list =3D dma_pool_alloc(pool, GFP_ATOMIC, &sgl_dma); - if (!sg_list) { + /* TBD: allocate mulitple pools and chain them */ + WARN_ON(entries > 512); + + list =3D dma_pool_alloc(pool, GFP_ATOMIC, &addr); + if (!list) { iod->nr_allocations =3D -1; return BLK_STS_RESOURCE; } =20 - iod->list[0].sg_list =3D sg_list; - iod->first_dma =3D sgl_dma; - - nvme_pci_sgl_set_seg(&cmd->dptr.sgl, sgl_dma, entries); - do { - nvme_pci_sgl_set_data(&sg_list[i++], sg); - sg =3D sg_next(sg); - } while (--entries > 0); - + if (is_sgl) + iod->list[0].sg_list =3D list; + else + iod->list[0].prp_list =3D list; + iod->first_dma =3D addr; return BLK_STS_OK; } =20 @@ -721,36 +750,84 @@ static blk_status_t nvme_map_data(struct nvme_dev *de= v, struct request *req, { struct nvme_iod *iod =3D blk_mq_rq_to_pdu(req); blk_status_t ret =3D BLK_STS_RESOURCE; - int rc; - - iod->sgt.sgl =3D mempool_alloc(dev->iod_mempool, GFP_ATOMIC); - if (!iod->sgt.sgl) + unsigned short n_segments =3D blk_rq_nr_phys_segments(req); + struct bio_vec bv =3D req_bvec(req); + struct req_iterator iter; + dma_addr_t dma_addr; + int rc, cnt =3D 0; + bool is_sgl; + + dma_init_iova_state(&iod->state, dev->dev, rq_dma_dir(req)); + dma_set_iova_state(&iod->state, bv.bv_page, bv.bv_len); + + rc =3D dma_start_range(&iod->state); + if (rc) return BLK_STS_RESOURCE; - sg_init_table(iod->sgt.sgl, blk_rq_nr_phys_segments(req)); - iod->sgt.orig_nents =3D blk_rq_map_sg(req->q, req, iod->sgt.sgl); - if (!iod->sgt.orig_nents) - goto out_free_sg; =20 - rc =3D dma_map_sgtable(dev->dev, &iod->sgt, rq_dma_dir(req), - DMA_ATTR_NO_WARN); - if (rc) { - if (rc =3D=3D -EREMOTEIO) - ret =3D BLK_STS_TARGET; - goto out_free_sg; + iod->dma.len =3D 0; + iod->dma.addr =3D 0; + + if 
(dma_can_use_iova(&iod->state)) { + iod->map =3D NULL; + rc =3D dma_alloc_iova_unaligned(&iod->state, bvec_phys(&bv), + blk_rq_payload_bytes(req)); + if (rc) + return BLK_STS_RESOURCE; + + rq_for_each_bvec(bv, req, iter) { + dma_addr =3D dma_link_range(&iod->state, bvec_phys(&bv), + bv.bv_len); + if (dma_mapping_error(dev->dev, dma_addr)) + goto out_free; + + if (!iod->dma.addr) + iod->dma.addr =3D dma_addr; + } + WARN_ON(blk_rq_payload_bytes(req) !=3D iod->state.range_size); + } else { + iod->map =3D kmalloc_array(n_segments, sizeof(*iod->map), + GFP_ATOMIC); + if (!iod->map) + return BLK_STS_RESOURCE; + + rq_for_each_bvec(bv, req, iter) { + dma_addr =3D dma_map_bvec(dev->dev, &bv, rq_dma_dir(req), 0); + if (dma_mapping_error(dev->dev, dma_addr)) + goto out_free; + + iod->map[cnt].addr =3D dma_addr; + iod->map[cnt].len =3D bv.bv_len; + cnt++; + } } + dma_end_range(&iod->state); =20 - if (nvme_pci_use_sgls(dev, req, iod->sgt.nents)) + is_sgl =3D nvme_pci_use_sgls(dev, req, n_segments); + ret =3D nvme_pci_setup_pool(dev, req, is_sgl); + if (ret !=3D BLK_STS_OK) + goto out_free; + + if (is_sgl) ret =3D nvme_pci_setup_sgls(dev, req, &cmnd->rw); else ret =3D nvme_pci_setup_prps(dev, req, &cmnd->rw); if (ret !=3D BLK_STS_OK) - goto out_unmap_sg; + goto out_free_pool; + return BLK_STS_OK; =20 -out_unmap_sg: - dma_unmap_sgtable(dev->dev, &iod->sgt, rq_dma_dir(req), 0); -out_free_sg: - mempool_free(iod->sgt.sgl, dev->iod_mempool); +out_free_pool: + nvme_pci_free_pool(dev, req); +out_free: + if (iod->map) { + while (cnt--) + dma_unmap_page(dev->dev, iod->map[cnt].addr, + iod->map[cnt].len, rq_dma_dir(req)); + kfree(iod->map); + } else { + dma_unlink_range(&iod->state); + dma_free_iova(&iod->state); + } return ret; } =20 @@ -791,7 +868,6 @@ static blk_status_t nvme_prep_rq(struct nvme_dev *dev, = struct request *req) =20 iod->aborted =3D false; iod->nr_allocations =3D -1; - iod->sgt.nents =3D 0; =20 ret =3D nvme_setup_cmd(req->q->queuedata, req); if (ret) --=20 2.46.0 From 
nobody Sat Nov 30 01:44:58 2024 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DA2361BF335; Thu, 12 Sep 2024 11:17:25 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726139846; cv=none; b=Myxsxv3MpNYmtflcuFPcSr61wxn0mfEuY8ZlfZ3x8URBuTGHwZeD2s5j6EewBf4rYVNo0JcBbGC8QkSsIQo6yrfO9C4yY4BfykyWycvzBMxzrQwa8+ftAY68AV8xsC4Wcd4RN44BJgu2PvVktqdnhmdXqZvcik3Vb5X15tbVMgE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726139846; c=relaxed/simple; bh=HzFyylaMePR5bS7VyKQApbkFy4MWvNmh8/Ifgo8KsU0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=X3k83cql+lGRLfEG2mznFjbvqmxe2GBmSacn8UrcLCoAa81N0pdiByeY+QUZOZl8GOoBQa1ewTZFSbWl7UtD3OyVfnaClBhtJdQ7DwRx+MfCR5z2s5ODNeB0B4/I01s6n9RIPLDKTqlcr6e8kG0iZw+Vmui3SzpDEhPnFc/A5Ro= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=iMt/9VHS; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="iMt/9VHS" Received: by smtp.kernel.org (Postfix) with ESMTPSA id EFE14C4CEC5; Thu, 12 Sep 2024 11:17:24 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1726139845; bh=HzFyylaMePR5bS7VyKQApbkFy4MWvNmh8/Ifgo8KsU0=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=iMt/9VHSYIlDpVGdX26n1N0CdVrTRm10It8Sa8NMHcf+17B/DEKPd5jcRIQstDF5l y/zknkTz0QWWnnaDNoU/w5MrYILx1JHP+V4tz9u84Cewb0+/xN0X7Xobu9RYdt4sE3 8QSp8DqCuKUgnnC5JLgioLFGolGO2jWjErhnFWAI64U90eRPEY/a80PM1URWvna2tv EjnP1bBa2i6W+ZV7YFLYJDEIpcKOoFD0i83H5rqEd6sFjATC3qHX2bD88Wi/OivMQx 
jvCqxXIec8J/zMqtF5Uhjik7uSRXJU4fi3neuttBSeo6DVEN/1c89HG77KYw9oQArh FWiOq1isCI2iQ== From: Leon Romanovsky To: Jens Axboe , Jason Gunthorpe , Robin Murphy , Joerg Roedel , Will Deacon , Keith Busch , Christoph Hellwig , "Zeng, Oak" , Chaitanya Kulkarni Cc: Leon Romanovsky , Sagi Grimberg , Bjorn Helgaas , Logan Gunthorpe , Yishai Hadas , Shameer Kolothum , Kevin Tian , Alex Williamson , Marek Szyprowski , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Andrew Morton , linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org, iommu@lists.linux.dev, linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org Subject: [RFC v2 21/21] nvme-pci: don't allow mapping of bvecs with offset Date: Thu, 12 Sep 2024 14:15:56 +0300 Message-ID: <63cdbb87e1b08464705fa343b65e561eb3abd5f9.1726138681.git.leon@kernel.org> X-Mailer: git-send-email 2.46.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Leon Romanovsky It is a hack, but direct DMA works now. Signed-off-by: Leon Romanovsky --- drivers/nvme/host/pci.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index 881cbf2c0cac..1872fa91ac76 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -791,6 +791,9 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev,= struct request *req, return BLK_STS_RESOURCE; =20 rq_for_each_bvec(bv, req, iter) { + if (bv.bv_offset !=3D 0) + goto out_free; + dma_addr =3D dma_map_bvec(dev->dev, &bv, rq_dma_dir(req), 0); if (dma_mapping_error(dev->dev, dma_addr)) goto out_free; --=20 2.46.0