From nobody Wed Dec 17 12:06:56 2025 Received: from mail-pl1-f177.google.com (mail-pl1-f177.google.com [209.85.214.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 060DE18871F for ; Tue, 18 Mar 2025 03:59:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.177 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1742270388; cv=none; b=m9zA/USod6ZDq4cjvgVlgbNlCa6fke4G1W6VlU4mh6FAd0gCjLPeguwVJYrLR4U9e29xwpr1MHS1/pmgpcgsDRcg1T7Xqr2rryh4ln0y0Lsd5N8PJJ1ZHCg+p8GTLtlvDclS6UXYlQ2BTqJasxJP/6fZXqRfD7P52RTpBA7TugQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1742270388; c=relaxed/simple; bh=/USjbKKq5YfDfRJ6A+DM8l4Ia7tqauRSn5x7aeDJHKY=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=Hb57bYzTQGi/53XIPLsy1nGIg3vndJdjAy0CCrdsnoXEyG8ISCx+P/prnC5DjLV0707A8SeQ5wmtbbK8mBoI5/PMKw7uZdgxRZJ7BP1QUr110wBw+0xiaGxpUc2hfUz6buFNRz6uhxACe0oWZ/DeLf4BldaQ/Cl29CbEWtTaFsk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com; spf=pass smtp.mailfrom=bytedance.com; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b=HbnvhFYp; arc=none smtp.client-ip=209.85.214.177 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bytedance.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b="HbnvhFYp" Received: by mail-pl1-f177.google.com with SMTP id d9443c01a7336-22359001f1aso50273215ad.3 for ; Mon, 17 Mar 2025 20:59:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1742270385; x=1742875185; darn=vger.kernel.org; 
h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=8OMb2TInCE9Qb7KUVvNgWIsY4Uxx9EaJDiVCyGMWyMw=; b=HbnvhFYpAeD7SPnNRisFWXl9tHpHw18JWbyxaA+Zo/MBJK2V5DoIzlclQGkMrKzrrF ZfP1xtmxDc55TF+tOeOAWzjL+xlwgRCPwB7H7aMqYEz2zs0O383/v2f3fdwD1VtqU1UW pCh5LfVPNTzD5+LqJ6suFH5lGv/0GynzbmcQ/cOyVgGFJpu5UBuugQXuF/wYNXVDDl6g 5eQ7RYCWG9rlq6vym/tP4XR88EIkNV+n6nATErNfFwpmRAxFKt/3DLJyZs3Rwe/jd/xT +HU1mQNrj/IKbcE2CoRCsxQb9KJmjsvR0GShzHatohgjGcuG4TLonafs1KldKiZ8ZvrY RFug== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1742270385; x=1742875185; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=8OMb2TInCE9Qb7KUVvNgWIsY4Uxx9EaJDiVCyGMWyMw=; b=tmVZ4rNOjMMdtqR4yyz8jZq2wKhWwA+eRs2YjgnKAimmqGXYl6U5GMIMegrtUgwNiF OkQlXmM+gTv1oyZkaq1BPoUaMF6CFQzcqNCU2w6XrtCqvyQIPTOiqWFnZlOrJkgAm42e u+tXvsrsLJE/ykSyo/+Xj5d7Ip+OHh5ARstGh+u4ZFDdBUgNqOMHoe4yEw8NQemlOkLm 9CTgxuPDriQWdZ6lpsOWbJ31VvL7vT0QdxcgVdv/KArTj7kA6i8wJm7IPZhXawYeHt+o McXBXt7inf7sp4wJu12dux2qmf8Av4rduuqsKhPd+lHRJAlULjMd6v1D11XlG9VPLYcL zVPg== X-Forwarded-Encrypted: i=1; AJvYcCXqtlmukuTIdUc+Npz+K1mmHSzSW2KdrM14X0qDCBjHOyVUEyhEhOwGUhvUuXf1gl7Ov1spLrPftv2NtFk=@vger.kernel.org X-Gm-Message-State: AOJu0YwHGE/1f++3Yrgf9MvwC8F37aHyOub86idKjfIAzS0pPL3PpGHO QeMNl/3wB5Ie9f831EbQ/L57pPOtA1gq/GIzejykWDtjNrszcbyLlyDD+vqA89U= X-Gm-Gg: ASbGncsL+mmpfrB7omU8pGn5WpWCq0+BAc70b28MxBFOsNXWwauFXQlEqEbfRHgt8bC g/WaiWgfqdNcK/BO/8sgY2lqWVC7m/D9kgc6AjvTBg4PeFuyMa/avahSei7ArEjXn2PuWxo+ChP /biODFef66PTDpBVtCh2PaEZGFAttFrqoFrP+d+euShc4GfmDUX834fQ8J+R4SfeIoI5VbqNFRJ 8au2cluJ+kOMPtsodLNK3UHu5KfuIHmKMNmzsblIcX3YG+A+Y397rj+wQt07yLtuiFTePXQZlzN QQzhSHf3WGm9gY22F/gHdVBB01hET0zzIYE0rRPkdSRdP/WmZsmAT/R4k/+nYuGyrMCC/G0pLSe m6u5xP3dGrftesptT2SKmQHmXQyU= X-Google-Smtp-Source: 
AGHT+IGHO8cx8ckfDIHxWMgqvQhdq4sCCYAYAML9a+Udl9ONIKirtb2WHArjW9djfVtwLHqHp3jjYg== X-Received: by 2002:a17:903:1ca:b0:21f:b483:2ad5 with SMTP id d9443c01a7336-2262c555e78mr19060495ad.20.1742270385373; Mon, 17 Mar 2025 20:59:45 -0700 (PDT) Received: from J9GPGXL7NT.bytedance.net ([61.213.176.55]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-225c6bd4b30sm83720135ad.235.2025.03.17.20.59.40 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Mon, 17 Mar 2025 20:59:44 -0700 (PDT) From: Xu Lu To: akpm@linux-foundation.org, jhubbard@nvidia.com, kirill.shutemov@linux.intel.com, tjeznach@rivosinc.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com Cc: lihangjing@bytedance.com, xieyongji@bytedance.com, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Xu Lu Subject: [PATCH RESEND v2 1/4] mm/gup: Add huge pte handling logic in follow_page_pte() Date: Tue, 18 Mar 2025 11:59:27 +0800 Message-Id: <20250318035930.11855-2-luxu.kernel@bytedance.com> X-Mailer: git-send-email 2.39.5 (Apple Git-154) In-Reply-To: <20250318035930.11855-1-luxu.kernel@bytedance.com> References: <20250318035930.11855-1-luxu.kernel@bytedance.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"

A page mapped at the pte level can also be part of a huge page when ARM CONT_PTE or RISC-V Svnapot is in use. The lack of huge pte handling logic in follow_page_pte() may lead to both performance and correctness issues.

For example, on RISC-V all ptes that map the same 64K huge page hold the same value, so follow_page_pte() derives the same page from pte_pfn() for every address in that range. __get_user_pages() then returns an array of pages that all have the same pfn, and whoever maps these pages ends up addressing the wrong physical memory.
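The arithmetic behind this failure can be modelled in user space. The sketch below is not kernel code: the helper names are hypothetical, and it assumes 4K base pages and a naturally aligned order-4 (64K) folio. naive_pfn() mimics what pte_pfn() alone yields for a Svnapot range, while subpage_pfn() and page_mask() mirror the offset computation this patch adds to follow_page_pte().

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12u /* assumption: 4K base pages */

/*
 * Model of the unpatched behaviour: every pte of a Svnapot range encodes
 * the same base pfn, so the address is effectively ignored.
 */
static uint64_t naive_pfn(uint64_t folio_base_pfn, uint64_t address)
{
	(void)address;
	return folio_base_pfn;
}

/*
 * Model of the patched behaviour: add the index of the 4K subpage that
 * `address` falls into, i.e. (address & (folio_size - 1)) >> PAGE_SHIFT.
 */
static uint64_t subpage_pfn(uint64_t folio_base_pfn, unsigned int order,
			    uint64_t address)
{
	uint64_t folio_size = (uint64_t)1 << (PAGE_SHIFT + order);

	return folio_base_pfn + ((address & (folio_size - 1)) >> PAGE_SHIFT);
}

/* Model of ctx->page_mask: number of base pages in the folio, minus one. */
static uint64_t page_mask(unsigned int order)
{
	return ((uint64_t)1 << order) - 1;
}
```

With a 64K folio whose first pfn is 0x80000 mapped at 0x10000000, the address 0x10003000 resolves to pfn 0x80003 in the patched model but to 0x80000 in the naive one.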
The error can be triggered from user space by the following code:

	void *addr =3D mmap(NULL, 0x10000, PROT_READ | PROT_WRITE,
			  MAP_ANONYMOUS | MAP_PRIVATE | MAP_HUGETLB | MAP_HUGE_64KB, -1, 0);

	struct vfio_iommu_type1_dma_map dma_map =3D {
		.argsz =3D sizeof(dma_map),
		.flags =3D VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
		.vaddr =3D (uint64_t)addr,
		.size =3D 0x10000,
	};

	ioctl(vfio_container_fd, VFIO_IOMMU_MAP_DMA, &dma_map);

This commit adds huge pte handling logic to follow_page_pte() to avoid such problems.

Signed-off-by: Xu Lu
--- arch/riscv/include/asm/pgtable.h | 6 ++++++ include/linux/pgtable.h | 8 ++++++++ mm/gup.c | 17 +++++++++++------ 3 files changed, 25 insertions(+), 6 deletions(-) diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgta= ble.h index 050fdc49b5ad7..40ae5979dd82c 100644 --- a/arch/riscv/include/asm/pgtable.h +++ b/arch/riscv/include/asm/pgtable.h @@ -800,6 +800,12 @@ static inline bool pud_user_accessible_page(pud_t pud) #endif =20 #ifdef CONFIG_TRANSPARENT_HUGEPAGE +#define pte_trans_huge pte_trans_huge +static inline int pte_trans_huge(pte_t pte) +{ + return pte_huge(pte) && pte_napot(pte); +} + static inline int pmd_trans_huge(pmd_t pmd) { return pmd_leaf(pmd); diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 94d267d02372e..3f57ee6dcf017 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -1584,6 +1584,14 @@ static inline unsigned long my_zero_pfn(unsigned lon= g addr) =20 #ifdef CONFIG_MMU =20 +#if (defined(CONFIG_TRANSPARENT_HUGEPAGE) && !defined(pte_trans_huge)) || \ + (!defined(CONFIG_TRANSPARENT_HUGEPAGE)) +static inline int pte_trans_huge(pte_t pte) +{ + return 0; +} +#endif + #ifndef CONFIG_TRANSPARENT_HUGEPAGE static inline int pmd_trans_huge(pmd_t pmd) { diff --git a/mm/gup.c b/mm/gup.c index 3883b307780ea..67981ee28df86 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -838,7 +838,7 @@ static inline bool can_follow_write_pte(pte_t pte, stru= ct page *page, =20 static struct page 
*follow_page_pte(struct vm_area_struct *vma, unsigned long address, pmd_t *pmd, unsigned int flags, - struct dev_pagemap **pgmap) + struct follow_page_context *ctx) { struct mm_struct *mm =3D vma->vm_mm; struct folio *folio; @@ -879,8 +879,8 @@ static struct page *follow_page_pte(struct vm_area_stru= ct *vma, * case since they are only valid while holding the pgmap * reference. */ - *pgmap =3D get_dev_pagemap(pte_pfn(pte), *pgmap); - if (*pgmap) + ctx->pgmap =3D get_dev_pagemap(pte_pfn(pte), ctx->pgmap); + if (ctx->pgmap) page =3D pte_page(pte); else goto no_page; @@ -940,6 +940,11 @@ static struct page *follow_page_pte(struct vm_area_str= uct *vma, */ folio_mark_accessed(folio); } + if (is_vm_hugetlb_page(vma) || pte_trans_huge(pte)) { + ctx->page_mask =3D (1 << folio_order(folio)) - 1; + page =3D folio_page(folio, 0) + + ((address & (folio_size(folio) - 1)) >> PAGE_SHIFT); + } out: pte_unmap_unlock(ptep, ptl); return page; @@ -975,7 +980,7 @@ static struct page *follow_pmd_mask(struct vm_area_stru= ct *vma, return no_page_table(vma, flags, address); } if (likely(!pmd_leaf(pmdval))) - return follow_page_pte(vma, address, pmd, flags, &ctx->pgmap); + return follow_page_pte(vma, address, pmd, flags, ctx); =20 if (pmd_protnone(pmdval) && !gup_can_follow_protnone(vma, flags)) return no_page_table(vma, flags, address); @@ -988,14 +993,14 @@ static struct page *follow_pmd_mask(struct vm_area_st= ruct *vma, } if (unlikely(!pmd_leaf(pmdval))) { spin_unlock(ptl); - return follow_page_pte(vma, address, pmd, flags, &ctx->pgmap); + return follow_page_pte(vma, address, pmd, flags, ctx); } if (pmd_trans_huge(pmdval) && (flags & FOLL_SPLIT_PMD)) { spin_unlock(ptl); split_huge_pmd(vma, pmd, address); /* If pmd was left empty, stuff a page table in there quickly */ return pte_alloc(mm, pmd) ? 
ERR_PTR(-ENOMEM) : - follow_page_pte(vma, address, pmd, flags, &ctx->pgmap); + follow_page_pte(vma, address, pmd, flags, ctx); } page =3D follow_huge_pmd(vma, address, pmd, flags, ctx); spin_unlock(ptl); --=20 2.20.1 From nobody Wed Dec 17 12:06:56 2025 Received: from mail-pl1-f178.google.com (mail-pl1-f178.google.com [209.85.214.178]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 61C721BC099 for ; Tue, 18 Mar 2025 03:59:51 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.178 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1742270393; cv=none; b=OAkxWA93SqkBN01Fnf8H51W+x/VwahWaJUcggFRcmFtiJ6BDeiWmLlrkoErpcll72VzZrt07df8PCp4/Y79xHhi5k1Y7l8VUaIEMVGdtt1tWmN408/JneEdAgHYq+UPvt4l9nBhlekx4wVHJ+cPZea1YqXGSXmWutVU8pfOt2sY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1742270393; c=relaxed/simple; bh=yjkhHriFUj/pjJLTsHrBMQnNQw7+EpDNpDVbDZBqKTg=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=HCbNLfcDKVED/0cPWxyFpcD3HfQ2PYUYDqsnvKdWYRURg7w0pdJlgYf3e7+1dNTpuw4G+f4wobZTjfevGDx+kfZjzkBnzheX8SVnrhmWHaIJP2CrLcNSXxNPGbI9M1OSX4bzrr8Ro3rHweWw3xpLeMaa+ShcQMlpk5ydmt4XXUQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com; spf=pass smtp.mailfrom=bytedance.com; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b=k140h19y; arc=none smtp.client-ip=209.85.214.178 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bytedance.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b="k140h19y" Received: by mail-pl1-f178.google.com with SMTP id 
d9443c01a7336-22398e09e39so109859485ad.3 for ; Mon, 17 Mar 2025 20:59:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1742270390; x=1742875190; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=4RzuTPLpNm7preMNxFl4olDUAY05FyWZH15sy7wmcR0=; b=k140h19yIS0k+9Om3zkwuECp43DxOKxa+N8F4tRzoEXQVIU+IpaytNBnXiLIpOQfjd LgaDTarfIWzhlSQbyBAkYEBDNlqvh1KGgeGSZ1kzU0n9Yg3kQNUanTkKC4LEJ/73COkB Rl6iEuduvwkmFFUNiWx226cRX/Br4dYsMcuV90ylWzC6uY1VjAkvQ4E2Eyy5me/LqCRr qK/3wEeAB/VKpExVTMKHqwPKr8ELp5lpUJTR6f2yLi8e3VIxUd3sYZq61Hw8L1vRDl5J nNz4FO+7DWxH3EwGFYQubCnwDvi41AHIZBU56mNIKa7R7yCobk4Sg+hcxdiaP2fCUHm7 NliQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1742270390; x=1742875190; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=4RzuTPLpNm7preMNxFl4olDUAY05FyWZH15sy7wmcR0=; b=lHMmo1UqYjH2h+uuL3lLMsqeyUgYM1UzmLoODJGZ7qXWQCfPX/Hh1t/Sg3AiOqJCe3 EdLsA19gsJ731Ic4Wwkg0xyt+Z6+RzBnyOjpN2rT2xQjk8sL04cwqi3ZbmcVSH1pXRLH eZYiwJ0gXSjBeqACkzNh0nSNmncliW6AKlynx5Cb9J0I6E0h1+VKwNEJe+dq4twtRvoL 2y2mIDhPAvfU8vr48rDKh/Wda7Vq+p7/x2Xvmv9Q/F+z+CLwpMITsSIhIahgct9dxuTW Dw2grfk1+6ge3nMt7r1nRC/vQ9gZM3I/nH/HMFvSw54ji2M4VpDnBtns+hA8Ly6LrQDZ EOWg== X-Forwarded-Encrypted: i=1; AJvYcCXoGL9PzbusJRwI57wD03GAY9gRHLq43fc7XBpZ/R+o6j6Sok3FXihjiOV6natygtPru5AQoFe9tG8jQdU=@vger.kernel.org X-Gm-Message-State: AOJu0YwHlOOeKVsSmBLCS7pUHiWXggAq6DsUKMcnxlkuPUxJH4XoBBbP 5QLlzuc2+m8iK0qQmbkRZ6oDlTl/3HU737jKkAswHPVHJkt6Sl9+IPYgE2pc3Z8= X-Gm-Gg: ASbGnct34FW/i6H88p+9K2nz1LYUFnnxfZOt32BCW44VGonhlIK+832dOF0GMD9umPI 1DKXHjwjeWar8hRjMRI40OvA50nc2px1hCxAV+/ln3Rcm1eqDdLNblvvywfCZhj3Jm+InWNZ6Pk PuQYtI+12Hf6yu2DI64Uh7BS/spPZUr0glSYt4EdG+5npv5PA9t0ISZ/Y8l3CRwH1X5rnxT9wZx 
FkD/+N6FOA2U0oWU4Kg/wEJnllFQ/1NiYP8grmkBHchoAonW2W5afG4UgczJNbp7oJqbDcMHSrk ga2yJk8Lfrb5sImLeunCBEl9ge5YMqK0e8cUy281ur6tvChFbUIjlk1etODa6OaNtcSku0/+HaZ BqgbycKF2uy63WvbPKGxIbIlOgII= X-Google-Smtp-Source: AGHT+IFE368rAoYFIO0lZVoCVYA7ODibd0bQTYRx4G4UreI6PSfoC7ExcjehLoXI4QO5c2ehzPanCg== X-Received: by 2002:a17:902:e886:b0:211:e812:3948 with SMTP id d9443c01a7336-225e08597f2mr204021375ad.0.1742270390531; Mon, 17 Mar 2025 20:59:50 -0700 (PDT) Received: from J9GPGXL7NT.bytedance.net ([61.213.176.55]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-225c6bd4b30sm83720135ad.235.2025.03.17.20.59.45 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Mon, 17 Mar 2025 20:59:49 -0700 (PDT) From: Xu Lu To: akpm@linux-foundation.org, jhubbard@nvidia.com, kirill.shutemov@linux.intel.com, tjeznach@rivosinc.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com Cc: lihangjing@bytedance.com, xieyongji@bytedance.com, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Xu Lu Subject: [PATCH RESEND v2 2/4] iommu/riscv: Use pte_t to represent page table entry Date: Tue, 18 Mar 2025 11:59:28 +0800 Message-Id: <20250318035930.11855-3-luxu.kernel@bytedance.com> X-Mailer: git-send-email 2.39.5 (Apple Git-154) In-Reply-To: <20250318035930.11855-1-luxu.kernel@bytedance.com> References: <20250318035930.11855-1-luxu.kernel@bytedance.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"

The RISC-V IOMMU uses the same pte format and translation process as the MMU, as specified in the RISC-V Privileged specification. Represent IOMMU ptes with pte_t as well, so that the existing pte helper functions can be reused.
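To illustrate why a typed wrapper helps, here is a minimal user-space sketch of the pattern. It is an assumption-laden model, not the driver code: io_pte_t, io_mk_pte() and io_pte_val() are hypothetical stand-ins for the kernel's pte_t, __pte() and pte_val(), and the bit layout is written out by hand after the 64-bit RISC-V definitions (PFN in bits 53:10, R/W/X in bits 3:1, PROT_NONE aliased to the G bit).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Opaque wrapper: a raw integer can no longer be passed by accident. */
typedef struct { uint64_t pte; } io_pte_t;

static inline io_pte_t io_mk_pte(uint64_t val) { return (io_pte_t){ val }; }
static inline uint64_t io_pte_val(io_pte_t pte) { return pte.pte; }

/* Assumed bit layout, modelled on 64-bit RISC-V. */
#define IO_PAGE_PRESENT   ((uint64_t)1 << 0)
#define IO_PAGE_READ      ((uint64_t)1 << 1)
#define IO_PAGE_WRITE     ((uint64_t)1 << 2)
#define IO_PAGE_EXEC      ((uint64_t)1 << 3)
#define IO_PAGE_PROT_NONE ((uint64_t)1 << 5)
#define IO_PAGE_TABLE     IO_PAGE_PRESENT
#define IO_PAGE_LEAF      (IO_PAGE_READ | IO_PAGE_WRITE | IO_PAGE_EXEC)
#define IO_PAGE_PFN_SHIFT 10
#define IO_PAGE_PFN_MASK  ((((uint64_t)1 << 54) - 1) & ~(((uint64_t)1 << 10) - 1))

/* Build an entry from a page frame number and protection bits. */
static inline io_pte_t io_pte_entry(uint64_t pfn, uint64_t prot)
{
	return io_mk_pte(((pfn << IO_PAGE_PFN_SHIFT) & IO_PAGE_PFN_MASK) | prot);
}

static inline bool io_pte_present(io_pte_t pte)
{
	return io_pte_val(pte) & (IO_PAGE_PRESENT | IO_PAGE_PROT_NONE);
}

/* A leaf has at least one of R/W/X set; a table pointer has none. */
static inline bool io_pte_leaf(io_pte_t pte)
{
	return io_pte_val(pte) & IO_PAGE_LEAF;
}

static inline bool io_pte_none(io_pte_t pte)
{
	return io_pte_val(pte) == 0;
}

static inline uint64_t io_pte_pfn(io_pte_t pte)
{
	return (io_pte_val(pte) & IO_PAGE_PFN_MASK) >> IO_PAGE_PFN_SHIFT;
}
```

Because every accessor takes io_pte_t, mixing up a raw table pointer and an entry becomes a compile-time error, which is the same benefit the patch obtains by switching from unsigned long to pte_t.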
Signed-off-by: Xu Lu --- drivers/iommu/riscv/iommu.c | 79 ++++++++++++++++++------------------- 1 file changed, 39 insertions(+), 40 deletions(-) diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c index 8f049d4a0e2cb..3b0c934decd08 100644 --- a/drivers/iommu/riscv/iommu.c +++ b/drivers/iommu/riscv/iommu.c @@ -812,7 +812,7 @@ struct riscv_iommu_domain { bool amo_enabled; int numa_node; unsigned int pgd_mode; - unsigned long *pgd_root; + pte_t *pgd_root; }; =20 #define iommu_domain_to_riscv(iommu_domain) \ @@ -1081,27 +1081,29 @@ static void riscv_iommu_iotlb_sync(struct iommu_dom= ain *iommu_domain, =20 #define PT_SHIFT (PAGE_SHIFT - ilog2(sizeof(pte_t))) =20 -#define _io_pte_present(pte) ((pte) & (_PAGE_PRESENT | _PAGE_PROT_NONE)) -#define _io_pte_leaf(pte) ((pte) & _PAGE_LEAF) -#define _io_pte_none(pte) ((pte) =3D=3D 0) -#define _io_pte_entry(pn, prot) ((_PAGE_PFN_MASK & ((pn) << _PAGE_PFN_SHIF= T)) | (prot)) +#define _io_pte_present(pte) (pte_val(pte) & (_PAGE_PRESENT | _PAGE_PROT_N= ONE)) +#define _io_pte_leaf(pte) (pte_val(pte) & _PAGE_LEAF) +#define _io_pte_none(pte) (pte_val(pte) =3D=3D 0) +#define _io_pte_entry(pn, prot) (__pte((_PAGE_PFN_MASK & ((pn) << _PAGE_PF= N_SHIFT)) | (prot))) =20 static void riscv_iommu_pte_free(struct riscv_iommu_domain *domain, - unsigned long pte, struct list_head *freelist) + pte_t pte, struct list_head *freelist) { - unsigned long *ptr; + pte_t *ptr; int i; =20 if (!_io_pte_present(pte) || _io_pte_leaf(pte)) return; =20 - ptr =3D (unsigned long *)pfn_to_virt(__page_val_to_pfn(pte)); + ptr =3D (pte_t *)pfn_to_virt(pte_pfn(pte)); =20 /* Recursively free all sub page table pages */ for (i =3D 0; i < PTRS_PER_PTE; i++) { - pte =3D READ_ONCE(ptr[i]); - if (!_io_pte_none(pte) && cmpxchg_relaxed(ptr + i, pte, 0) =3D=3D pte) + pte =3D ptr[i]; + if (!_io_pte_none(pte)) { + ptr[i] =3D __pte(0); riscv_iommu_pte_free(domain, pte, freelist); + } } =20 if (freelist) @@ -1110,12 +1112,12 @@ static void 
riscv_iommu_pte_free(struct riscv_iommu= _domain *domain, iommu_free_page(ptr); } =20 -static unsigned long *riscv_iommu_pte_alloc(struct riscv_iommu_domain *dom= ain, - unsigned long iova, size_t pgsize, - gfp_t gfp) +static pte_t *riscv_iommu_pte_alloc(struct riscv_iommu_domain *domain, + unsigned long iova, size_t pgsize, + gfp_t gfp) { - unsigned long *ptr =3D domain->pgd_root; - unsigned long pte, old; + pte_t *ptr =3D domain->pgd_root; + pte_t pte, old; int level =3D domain->pgd_mode - RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV39 + 2; void *addr; =20 @@ -1131,7 +1133,7 @@ static unsigned long *riscv_iommu_pte_alloc(struct ri= scv_iommu_domain *domain, if (((size_t)1 << shift) =3D=3D pgsize) return ptr; pte_retry: - pte =3D READ_ONCE(*ptr); + pte =3D ptep_get(ptr); /* * This is very likely incorrect as we should not be adding * new mapping with smaller granularity on top @@ -1147,38 +1149,37 @@ static unsigned long *riscv_iommu_pte_alloc(struct = riscv_iommu_domain *domain, addr =3D iommu_alloc_page_node(domain->numa_node, gfp); if (!addr) return NULL; - old =3D pte; - pte =3D _io_pte_entry(virt_to_pfn(addr), _PAGE_TABLE); - if (cmpxchg_relaxed(ptr, old, pte) !=3D old) { - iommu_free_page(addr); + old =3D ptep_get(ptr); + if (!_io_pte_none(old)) goto pte_retry; - } + pte =3D _io_pte_entry(virt_to_pfn(addr), _PAGE_TABLE); + set_pte(ptr, pte); } - ptr =3D (unsigned long *)pfn_to_virt(__page_val_to_pfn(pte)); + ptr =3D (pte_t *)pfn_to_virt(pte_pfn(pte)); } while (level-- > 0); =20 return NULL; } =20 -static unsigned long *riscv_iommu_pte_fetch(struct riscv_iommu_domain *dom= ain, - unsigned long iova, size_t *pte_pgsize) +static pte_t *riscv_iommu_pte_fetch(struct riscv_iommu_domain *domain, + unsigned long iova, size_t *pte_pgsize) { - unsigned long *ptr =3D domain->pgd_root; - unsigned long pte; + pte_t *ptr =3D domain->pgd_root; + pte_t pte; int level =3D domain->pgd_mode - RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV39 + 2; =20 do { const int shift =3D PAGE_SHIFT + PT_SHIFT * 
level; =20 ptr +=3D ((iova >> shift) & (PTRS_PER_PTE - 1)); - pte =3D READ_ONCE(*ptr); + pte =3D ptep_get(ptr); if (_io_pte_present(pte) && _io_pte_leaf(pte)) { *pte_pgsize =3D (size_t)1 << shift; return ptr; } if (_io_pte_none(pte)) return NULL; - ptr =3D (unsigned long *)pfn_to_virt(__page_val_to_pfn(pte)); + ptr =3D (pte_t *)pfn_to_virt(pte_pfn(pte)); } while (level-- > 0); =20 return NULL; @@ -1191,8 +1192,9 @@ static int riscv_iommu_map_pages(struct iommu_domain = *iommu_domain, { struct riscv_iommu_domain *domain =3D iommu_domain_to_riscv(iommu_domain); size_t size =3D 0; - unsigned long *ptr; - unsigned long pte, old, pte_prot; + pte_t *ptr; + pte_t pte, old; + unsigned long pte_prot; int rc =3D 0; LIST_HEAD(freelist); =20 @@ -1210,10 +1212,9 @@ static int riscv_iommu_map_pages(struct iommu_domain= *iommu_domain, break; } =20 - old =3D READ_ONCE(*ptr); + old =3D ptep_get(ptr); pte =3D _io_pte_entry(phys_to_pfn(phys), pte_prot); - if (cmpxchg_relaxed(ptr, old, pte) !=3D old) - continue; + set_pte(ptr, pte); =20 riscv_iommu_pte_free(domain, old, &freelist); =20 @@ -1247,7 +1248,7 @@ static size_t riscv_iommu_unmap_pages(struct iommu_do= main *iommu_domain, { struct riscv_iommu_domain *domain =3D iommu_domain_to_riscv(iommu_domain); size_t size =3D pgcount << __ffs(pgsize); - unsigned long *ptr, old; + pte_t *ptr; size_t unmapped =3D 0; size_t pte_size; =20 @@ -1260,9 +1261,7 @@ static size_t riscv_iommu_unmap_pages(struct iommu_do= main *iommu_domain, if (iova & (pte_size - 1)) return unmapped; =20 - old =3D READ_ONCE(*ptr); - if (cmpxchg_relaxed(ptr, old, 0) !=3D old) - continue; + set_pte(ptr, __pte(0)); =20 iommu_iotlb_gather_add_page(&domain->domain, gather, iova, pte_size); @@ -1279,13 +1278,13 @@ static phys_addr_t riscv_iommu_iova_to_phys(struct = iommu_domain *iommu_domain, { struct riscv_iommu_domain *domain =3D iommu_domain_to_riscv(iommu_domain); size_t pte_size; - unsigned long *ptr; + pte_t *ptr; =20 ptr =3D riscv_iommu_pte_fetch(domain, iova, 
&pte_size); - if (_io_pte_none(*ptr) || !_io_pte_present(*ptr)) + if (_io_pte_none(ptep_get(ptr)) || !_io_pte_present(ptep_get(ptr))) return 0; =20 - return pfn_to_phys(__page_val_to_pfn(*ptr)) | (iova & (pte_size - 1)); + return pfn_to_phys(pte_pfn(ptep_get(ptr))) | (iova & (pte_size - 1)); } =20 static void riscv_iommu_free_paging_domain(struct iommu_domain *iommu_doma= in) --=20 2.20.1 From nobody Wed Dec 17 12:06:56 2025 Received: from mail-pl1-f180.google.com (mail-pl1-f180.google.com [209.85.214.180]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 860401C3BF7 for ; Tue, 18 Mar 2025 03:59:56 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.180 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1742270398; cv=none; b=BEbXkboeH1g9JXes3bZgoPkWNyO20JVKWwc4L/ylEM8k5u5QCy/0+9wDpST/MNa1O29o/C8I6cmq8e7UNJAKJhLEa+ubofsCjjhyySwdI7v8Mab7U1X2CZIc1mDGe5DjxKZCWNVYB/Rw/zrJ7l+eKKatDITQhwE5BNYZy+xsu04= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1742270398; c=relaxed/simple; bh=+wRpuN0A/HjofcnYPeltvF9g4Pbwgb7sI3k1r3yDo7A=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=tCHWMQXe6V2mkJ7XgROoaL/ZGNIwl/fAmAgMz79rH1SQAOdb/UeV3zbHJrQ/JclbYGNoyjGibiEk0dsPJqQE9BMACn/2BpB6Roqt/aBHU4dM9VMovF2H2nOkEA/FDfJTJY0rzt0gjGz1ge1+pN9fVrJI2VP6mmnWkSXaxZT4xXA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com; spf=pass smtp.mailfrom=bytedance.com; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b=ABoydGab; arc=none smtp.client-ip=209.85.214.180 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bytedance.com 
Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b="ABoydGab" Received: by mail-pl1-f180.google.com with SMTP id d9443c01a7336-2241053582dso47153595ad.1 for ; Mon, 17 Mar 2025 20:59:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1742270396; x=1742875196; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=xaRB3Z7Zn9kGyUKfUpl8WYZaXTZf5T6vpq1GZf0TWl4=; b=ABoydGab2xRkj0xcnWwrvCHU9yOm/4PQVDimTTBJTEs6Rti/axjNRJkMGLHGcIU83u w00s+n9RMAJxrhm3mbP3Yax+PgFHdATR6fmNxvhSVxTsnYMXTGaAB61+s4i/t1tihyj+ J0ExMtN0yN9EmFouOfRjdbK33flCigCb/OOQMCy4z4XS0JYaZaksB1EtRjvq6UXPb/3M GDDHk2ulUcFEMIScq1qDLQUq/VR7W/kpS+uoGpfqvQmm1iwH/iO7O6dYK2n46HXNqosz jwRfw4aHVMvyfLUOwCU4ofQdMBl9SMa5ixcQhDxR++z3RjLQ9ZFjKhFgAO/xoS/P41rx O0eA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1742270396; x=1742875196; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=xaRB3Z7Zn9kGyUKfUpl8WYZaXTZf5T6vpq1GZf0TWl4=; b=j3q6M976ubr9jt9cfrwfkZzFzb0OQwR5rxUDTrF0tf9L7+/uIyElzNqh6k3VFTZTky OD2lDLg6oo1O6RdigUOKODaZQfHrfUEay/vKg+8Ndv5f/ojxpNqjSSRCl5u74YrhdoYc srA4OKFm2EUIPKwAVK/NgBfvAldzjUxzUv3GjBtUtdCvsq96w69VIk7Lftr/2JYevC7+ S7NTcW8tR4RF1cb/PR37rqIoK6qx85Xl9NdYAm5nqWkHlVNuJuG1ApBwuPhOKYA7+HyO WtaTUncFNQrnu/FIumX0W3v9BB0wy/BDAmEeDv7qvY0MipXxxgmr4T6Wmm58RhSTk38v HC/g== X-Forwarded-Encrypted: i=1; AJvYcCXXDHxXVWC45QHE+aKRJ/F11KKrJS+fGdYcPTVjDsUB0S53JouwdiuksHQdx965r+5QbHkLjOnR5CU9UTA=@vger.kernel.org X-Gm-Message-State: AOJu0Yz/kc7XzqiIJ1VkGykO1/03o8cDomp2BxUU+jghZlflAQe4Efac XweZqsFY0aLZqygIEus3SvIz9Cx9oUqD2tHuwPuKgX+ILVHVtJu4/1xpI4/KRvI= X-Gm-Gg: ASbGncvbI3YLmdMVSwMEvW0BFTBYXkoyre3aexYvKnGKs8WEP6Ev6sEc1pEkqAJ+q8H 
w0TgL7a/FQG5gc/+GbUL5wNmpl+RYbe48L6E+JWClqf8V46vqxJAuX1sxHbgP8nhnm/9QXU+Z3F 1ScP5iPGGdmb4T3qzv23UvsJFuv+QybZL4vyHqNLgD925vwYXiKDMB1wwQsaM/7f65BSbYTVSK7 5cg9+6gSNAI9S29DfvltiNyDKaSh21aYXRnvEj6UuRKXV6SPuxVVT1iIXkLyDeEYa5pAlE+Yfbe HAgNBiETFTrr+OMYxZhv53hAuEJocadKms1wtSy4C3gMfYXD+s6FZ0pTKBSp1ZW4jeCU2SVPIEU vog096bSxoVyIlv5MQc9K8OOimG0= X-Google-Smtp-Source: AGHT+IExX+iDrDCGHA8Aq12SWZuDNNckGnu8pihoRkyjLOg0HP1RuQ216rkdL14FWV3g+VDeX680Yg== X-Received: by 2002:a17:903:238b:b0:223:6180:1bf7 with SMTP id d9443c01a7336-225e0b2fba7mr186404745ad.42.1742270395708; Mon, 17 Mar 2025 20:59:55 -0700 (PDT) Received: from J9GPGXL7NT.bytedance.net ([61.213.176.55]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-225c6bd4b30sm83720135ad.235.2025.03.17.20.59.50 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Mon, 17 Mar 2025 20:59:54 -0700 (PDT) From: Xu Lu To: akpm@linux-foundation.org, jhubbard@nvidia.com, kirill.shutemov@linux.intel.com, tjeznach@rivosinc.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com Cc: lihangjing@bytedance.com, xieyongji@bytedance.com, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Xu Lu Subject: [PATCH RESEND v2 3/4] iommu/riscv: Introduce IOMMU page table lock Date: Tue, 18 Mar 2025 11:59:29 +0800 Message-Id: <20250318035930.11855-4-luxu.kernel@bytedance.com> X-Mailer: git-send-email 2.39.5 (Apple Git-154) In-Reply-To: <20250318035930.11855-1-luxu.kernel@bytedance.com> References: <20250318035930.11855-1-luxu.kernel@bytedance.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"

Introduce a page table lock to avoid races when several PTEs have to be modified together, for example when applying Svnapot. Fine-grained page table locks are used to minimize lock contention.
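The level-based lock selection can be sketched in user space as follows. This is only a model under stated assumptions: 4K pages with 8-byte PTEs (so PT_SHIFT is 9), Sv57 as the deepest mode (top level 4), and a hypothetical lock_for_level() that returns which kind of lock would be taken instead of taking one. The `level > 0` guard in pgsize_to_level() is an addition of this sketch, so that sizes below 4K cannot drive the shift negative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4K pages */
#define PT_SHIFT   9  /* PAGE_SHIFT - ilog2(8) for 8-byte PTEs */
#define TOP_LEVEL  4  /* assumption: Sv57, levels numbered 4..0 */
#define PMD_LEVEL  1  /* mirrors RISCV_IOMMU_PMD_LEVEL */

enum lock_kind {
	LOCK_PER_TABLE_PAGE, /* split lock stored with the table page */
	LOCK_DOMAIN_WIDE,    /* single lock shared by the whole domain */
};

/* Walk down from the top level until one entry covers at most pgsize. */
static int pgsize_to_level(uint64_t pgsize)
{
	int level = TOP_LEVEL;
	int shift = PAGE_SHIFT + PT_SHIFT * level;

	while (level > 0 && pgsize < ((uint64_t)1 << shift)) {
		shift -= PT_SHIFT;
		level--;
	}

	return level;
}

/*
 * Low levels hold most entries and see most updates, so they get a lock
 * per table page when split locks are available; upper levels fall back
 * to the domain-wide lock.
 */
static enum lock_kind lock_for_level(int level, bool split_ptlocks)
{
	if (split_ptlocks && level <= PMD_LEVEL)
		return LOCK_PER_TABLE_PAGE;
	return LOCK_DOMAIN_WIDE;
}
```

Under these assumptions a 4K or 64K mapping resolves to level 0 and a 2M mapping to level 1, both of which use a per-table-page lock, while a 1G mapping at level 2 takes the domain-wide lock.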
Signed-off-by: Xu Lu --- drivers/iommu/riscv/iommu.c | 123 +++++++++++++++++++++++++++++++----- 1 file changed, 107 insertions(+), 16 deletions(-) diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c index 3b0c934decd08..ce4cf6569ffb4 100644 --- a/drivers/iommu/riscv/iommu.c +++ b/drivers/iommu/riscv/iommu.c @@ -808,6 +808,7 @@ struct riscv_iommu_domain { struct iommu_domain domain; struct list_head bonds; spinlock_t lock; /* protect bonds list updates. */ + spinlock_t page_table_lock; /* protect page table updates. */ int pscid; bool amo_enabled; int numa_node; @@ -1086,8 +1087,80 @@ static void riscv_iommu_iotlb_sync(struct iommu_doma= in *iommu_domain, #define _io_pte_none(pte) (pte_val(pte) =3D=3D 0) #define _io_pte_entry(pn, prot) (__pte((_PAGE_PFN_MASK & ((pn) << _PAGE_PF= N_SHIFT)) | (prot))) =20 +#define RISCV_IOMMU_PMD_LEVEL 1 + +static bool riscv_iommu_ptlock_init(struct ptdesc *ptdesc, int level) +{ + if (level <=3D RISCV_IOMMU_PMD_LEVEL) + return ptlock_init(ptdesc); + return true; +} + +static void riscv_iommu_ptlock_free(struct ptdesc *ptdesc, int level) +{ + if (level <=3D RISCV_IOMMU_PMD_LEVEL) + ptlock_free(ptdesc); +} + +static spinlock_t *riscv_iommu_ptlock(struct riscv_iommu_domain *domain, + pte_t *pte, int level) +{ + spinlock_t *ptl; /* page table page lock */ + +#ifdef CONFIG_SPLIT_PTE_PTLOCKS + if (level <=3D RISCV_IOMMU_PMD_LEVEL) + ptl =3D ptlock_ptr(page_ptdesc(virt_to_page(pte))); + else +#endif + ptl =3D &domain->page_table_lock; + spin_lock(ptl); + + return ptl; +} + +static void *riscv_iommu_alloc_pagetable_node(int numa_node, gfp_t gfp, in= t level) +{ + struct ptdesc *ptdesc; + void *addr; + + addr =3D iommu_alloc_page_node(numa_node, gfp); + if (!addr) + return NULL; + + ptdesc =3D page_ptdesc(virt_to_page(addr)); + if (!riscv_iommu_ptlock_init(ptdesc, level)) { + iommu_free_page(addr); + addr =3D NULL; + } + + return addr; +} + +static void riscv_iommu_free_pagetable(void *addr, int level) +{ + struct ptdesc 
*ptdesc =3D page_ptdesc(virt_to_page(addr)); + + riscv_iommu_ptlock_free(ptdesc, level); + iommu_free_page(addr); +} + +static int pgsize_to_level(size_t pgsize) +{ + int level =3D RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV57 - + RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV39 + 2; + int shift =3D PAGE_SHIFT + PT_SHIFT * level; + + while (pgsize < ((size_t)1 << shift)) { + shift -=3D PT_SHIFT; + level--; + } + + return level; +} + static void riscv_iommu_pte_free(struct riscv_iommu_domain *domain, - pte_t pte, struct list_head *freelist) + pte_t pte, int level, + struct list_head *freelist) { pte_t *ptr; int i; @@ -1102,10 +1175,11 @@ static void riscv_iommu_pte_free(struct riscv_iommu= _domain *domain, pte =3D ptr[i]; if (!_io_pte_none(pte)) { ptr[i] =3D __pte(0); - riscv_iommu_pte_free(domain, pte, freelist); + riscv_iommu_pte_free(domain, pte, level - 1, freelist); } } =20 + riscv_iommu_ptlock_free(page_ptdesc(virt_to_page(ptr)), level); if (freelist) list_add_tail(&virt_to_page(ptr)->lru, freelist); else @@ -1117,8 +1191,9 @@ static pte_t *riscv_iommu_pte_alloc(struct riscv_iomm= u_domain *domain, gfp_t gfp) { pte_t *ptr =3D domain->pgd_root; - pte_t pte, old; + pte_t pte; int level =3D domain->pgd_mode - RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV39 + 2; + spinlock_t *ptl; /* page table page lock */ void *addr; =20 do { @@ -1146,14 +1221,21 @@ static pte_t *riscv_iommu_pte_alloc(struct riscv_io= mmu_domain *domain, * page table. This might race with other mappings, retry. 
*/ if (_io_pte_none(pte)) { - addr =3D iommu_alloc_page_node(domain->numa_node, gfp); + addr =3D riscv_iommu_alloc_pagetable_node(domain->numa_node, gfp, + level - 1); if (!addr) return NULL; - old =3D ptep_get(ptr); - if (!_io_pte_none(old)) + + ptl =3D riscv_iommu_ptlock(domain, ptr, level); + pte =3D ptep_get(ptr); + if (!_io_pte_none(pte)) { + spin_unlock(ptl); + riscv_iommu_free_pagetable(addr, level - 1); goto pte_retry; + } pte =3D _io_pte_entry(virt_to_pfn(addr), _PAGE_TABLE); set_pte(ptr, pte); + spin_unlock(ptl); } ptr =3D (pte_t *)pfn_to_virt(pte_pfn(pte)); } while (level-- > 0); @@ -1193,9 +1275,10 @@ static int riscv_iommu_map_pages(struct iommu_domain= *iommu_domain, struct riscv_iommu_domain *domain =3D iommu_domain_to_riscv(iommu_domain); size_t size =3D 0; pte_t *ptr; - pte_t pte, old; + pte_t pte; unsigned long pte_prot; - int rc =3D 0; + int rc =3D 0, level; + spinlock_t *ptl; /* page table page lock */ LIST_HEAD(freelist); =20 if (!(prot & IOMMU_WRITE)) @@ -1212,11 +1295,12 @@ static int riscv_iommu_map_pages(struct iommu_domai= n *iommu_domain, break; } =20 - old =3D ptep_get(ptr); + level =3D pgsize_to_level(pgsize); + ptl =3D riscv_iommu_ptlock(domain, ptr, level); + riscv_iommu_pte_free(domain, ptep_get(ptr), level, &freelist); pte =3D _io_pte_entry(phys_to_pfn(phys), pte_prot); set_pte(ptr, pte); - - riscv_iommu_pte_free(domain, old, &freelist); + spin_unlock(ptl); =20 size +=3D pgsize; iova +=3D pgsize; @@ -1251,6 +1335,7 @@ static size_t riscv_iommu_unmap_pages(struct iommu_do= main *iommu_domain, pte_t *ptr; size_t unmapped =3D 0; size_t pte_size; + spinlock_t *ptl; /* page table page lock */ =20 while (unmapped < size) { ptr =3D riscv_iommu_pte_fetch(domain, iova, &pte_size); @@ -1261,7 +1346,9 @@ static size_t riscv_iommu_unmap_pages(struct iommu_do= main *iommu_domain, if (iova & (pte_size - 1)) return unmapped; =20 + ptl =3D riscv_iommu_ptlock(domain, ptr, pgsize_to_level(pte_size)); set_pte(ptr, __pte(0)); + spin_unlock(ptl); =20 
 		iommu_iotlb_gather_add_page(&domain->domain, gather, iova,
 					    pte_size);
@@ -1291,13 +1378,14 @@ static void riscv_iommu_free_paging_domain(struct iommu_domain *iommu_domain)
 {
 	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
 	const unsigned long pfn = virt_to_pfn(domain->pgd_root);
+	int level = domain->pgd_mode - RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV39 + 2;
 
 	WARN_ON(!list_empty(&domain->bonds));
 
 	if ((int)domain->pscid > 0)
 		ida_free(&riscv_iommu_pscids, domain->pscid);
 
-	riscv_iommu_pte_free(domain, _io_pte_entry(pfn, _PAGE_TABLE), NULL);
+	riscv_iommu_pte_free(domain, _io_pte_entry(pfn, _PAGE_TABLE), level, NULL);
 	kfree(domain);
 }
 
@@ -1358,7 +1446,7 @@ static struct iommu_domain *riscv_iommu_alloc_paging_domain(struct device *dev)
 	struct riscv_iommu_device *iommu;
 	unsigned int pgd_mode;
 	dma_addr_t va_mask;
-	int va_bits;
+	int va_bits, level;
 
 	iommu = dev_to_iommu(dev);
 	if (iommu->caps & RISCV_IOMMU_CAPABILITIES_SV57) {
@@ -1381,11 +1469,14 @@ static struct iommu_domain *riscv_iommu_alloc_paging_domain(struct device *dev)
 
 	INIT_LIST_HEAD_RCU(&domain->bonds);
 	spin_lock_init(&domain->lock);
+	spin_lock_init(&domain->page_table_lock);
 	domain->numa_node = dev_to_node(iommu->dev);
 	domain->amo_enabled = !!(iommu->caps & RISCV_IOMMU_CAPABILITIES_AMO_HWAD);
 	domain->pgd_mode = pgd_mode;
-	domain->pgd_root = iommu_alloc_page_node(domain->numa_node,
-						 GFP_KERNEL_ACCOUNT);
+	level = domain->pgd_mode - RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV39 + 2;
+	domain->pgd_root = riscv_iommu_alloc_pagetable_node(domain->numa_node,
+							    GFP_KERNEL_ACCOUNT,
+							    level);
 	if (!domain->pgd_root) {
 		kfree(domain);
 		return ERR_PTR(-ENOMEM);
@@ -1394,7 +1485,7 @@ static struct iommu_domain *riscv_iommu_alloc_paging_domain(struct device *dev)
 	domain->pscid = ida_alloc_range(&riscv_iommu_pscids, 1,
 					RISCV_IOMMU_MAX_PSCID, GFP_KERNEL);
 	if (domain->pscid < 0) {
-		iommu_free_page(domain->pgd_root);
+		riscv_iommu_free_pagetable(domain->pgd_root, level);
 		kfree(domain);
 		return ERR_PTR(-ENOMEM);
 	}
-- 
2.20.1

From nobody Wed Dec 17 12:06:56 2025
From: Xu Lu
To: akpm@linux-foundation.org, jhubbard@nvidia.com,
	kirill.shutemov@linux.intel.com, tjeznach@rivosinc.com,
	joro@8bytes.org, will@kernel.org, robin.murphy@arm.com
Cc: lihangjing@bytedance.com, xieyongji@bytedance.com,
	linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, Xu Lu
Subject: [PATCH RESEND v2 4/4] iommu/riscv: Add support for Svnapot
Date: Tue, 18 Mar 2025 11:59:30 +0800
Message-Id: <20250318035930.11855-5-luxu.kernel@bytedance.com>
X-Mailer: git-send-email 2.39.5 (Apple Git-154)
In-Reply-To: <20250318035930.11855-1-luxu.kernel@bytedance.com>
References: <20250318035930.11855-1-luxu.kernel@bytedance.com>

Add Svnapot size as supported page size and apply Svnapot when it is
possible.
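For readers less familiar with Svnapot, the size arithmetic the diff
below relies on can be sketched in standalone C. This is only an
illustration: the sketch_* names are hypothetical local stand-ins for
the kernel helpers used in the patch (napot_cont_size(),
napot_pte_num(), napot_size_to_order()), and the constants (4 KiB base
pages, a single NAPOT order of 4, i.e. 64 KiB) are assumptions, not
values copied from this series.

```c
#include <assert.h>
#include <stddef.h>

/* Assumptions for illustration only: 4 KiB base pages, one NAPOT order. */
#define SKETCH_PAGE_SHIFT       12
#define SKETCH_NAPOT_ORDER_BASE 4	/* 2^4 base pages = 64 KiB */
#define SKETCH_NAPOT_ORDER_MAX  5	/* exclusive upper bound */

/* Bytes covered by one NAPOT mapping of the given order. */
static size_t sketch_napot_cont_size(unsigned long order)
{
	return (size_t)1 << (order + SKETCH_PAGE_SHIFT);
}

/* Number of consecutive base-level PTEs that share the NAPOT encoding. */
static unsigned long sketch_napot_pte_num(unsigned long order)
{
	return 1UL << order;
}

/* Returns the matching NAPOT order, or 0 if size is not a NAPOT size. */
static unsigned long sketch_napot_size_to_order(size_t size)
{
	unsigned long order;

	for (order = SKETCH_NAPOT_ORDER_BASE;
	     order < SKETCH_NAPOT_ORDER_MAX; order++) {
		if (size == sketch_napot_cont_size(order))
			return order;
	}

	return 0;
}

/* Small self-check of the arithmetic under the assumptions above. */
static void sketch_self_check(void)
{
	assert(sketch_napot_cont_size(4) == 65536);
	assert(sketch_napot_pte_num(4) == 16);
	assert(sketch_napot_size_to_order(65536) == 4);
	assert(sketch_napot_size_to_order(4096) == 0);
}
```

Under these assumptions a 64 KiB mapping has order 4 and occupies 16
consecutive PTEs at the 4 KiB level, which is why the map and unmap
paths in the diff loop napot_pte_num(order) times over ptr.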
Signed-off-by: Xu Lu
---
 drivers/iommu/riscv/iommu.c | 86 +++++++++++++++++++++++++++++++++----
 1 file changed, 77 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
index ce4cf6569ffb4..7cc736abd2a61 100644
--- a/drivers/iommu/riscv/iommu.c
+++ b/drivers/iommu/riscv/iommu.c
@@ -1158,6 +1158,26 @@ static int pgsize_to_level(size_t pgsize)
 	return level;
 }
 
+static unsigned long napot_size_to_order(unsigned long size)
+{
+	unsigned long order;
+
+	if (!has_svnapot())
+		return 0;
+
+	for_each_napot_order(order) {
+		if (size == napot_cont_size(order))
+			return order;
+	}
+
+	return 0;
+}
+
+static bool is_napot_size(unsigned long size)
+{
+	return napot_size_to_order(size) != 0;
+}
+
 static void riscv_iommu_pte_free(struct riscv_iommu_domain *domain,
 				 pte_t pte, int level,
 				 struct list_head *freelist)
@@ -1205,7 +1225,8 @@ static pte_t *riscv_iommu_pte_alloc(struct riscv_iommu_domain *domain,
 		 * existing mapping with smaller granularity. Up to the caller
 		 * to replace and invalidate.
 		 */
-		if (((size_t)1 << shift) == pgsize)
+		if ((((size_t)1 << shift) == pgsize) ||
+		    (is_napot_size(pgsize) && pgsize_to_level(pgsize) == level))
 			return ptr;
 pte_retry:
 		pte = ptep_get(ptr);
@@ -1256,7 +1277,10 @@ static pte_t *riscv_iommu_pte_fetch(struct riscv_iommu_domain *domain,
 		ptr += ((iova >> shift) & (PTRS_PER_PTE - 1));
 		pte = ptep_get(ptr);
 		if (_io_pte_present(pte) && _io_pte_leaf(pte)) {
-			*pte_pgsize = (size_t)1 << shift;
+			if (pte_napot(pte))
+				*pte_pgsize = napot_cont_size(napot_cont_order(pte));
+			else
+				*pte_pgsize = (size_t)1 << shift;
 			return ptr;
 		}
 		if (_io_pte_none(pte))
@@ -1274,13 +1298,18 @@ static int riscv_iommu_map_pages(struct iommu_domain *iommu_domain,
 {
 	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
 	size_t size = 0;
-	pte_t *ptr;
-	pte_t pte;
-	unsigned long pte_prot;
-	int rc = 0, level;
+	pte_t *ptr, old, pte;
+	unsigned long pte_prot, order = 0;
+	int rc = 0, level, i;
 	spinlock_t *ptl; /* page table page lock */
 	LIST_HEAD(freelist);
 
+	if (iova & (pgsize - 1))
+		return -EINVAL;
+
+	if (is_napot_size(pgsize))
+		order = napot_size_to_order(pgsize);
+
 	if (!(prot & IOMMU_WRITE))
 		pte_prot = _PAGE_BASE | _PAGE_READ;
 	else if (domain->amo_enabled)
@@ -1297,9 +1326,27 @@ static int riscv_iommu_map_pages(struct iommu_domain *iommu_domain,
 
 		level = pgsize_to_level(pgsize);
 		ptl = riscv_iommu_ptlock(domain, ptr, level);
-		riscv_iommu_pte_free(domain, ptep_get(ptr), level, &freelist);
+
+		old = ptep_get(ptr);
+		if (pte_napot(old) && napot_cont_size(napot_cont_order(old)) > pgsize) {
+			spin_unlock(ptl);
+			rc = -EFAULT;
+			break;
+		}
+
 		pte = _io_pte_entry(phys_to_pfn(phys), pte_prot);
-		set_pte(ptr, pte);
+		if (order) {
+			pte = pte_mknapot(pte, order);
+			for (i = 0; i < napot_pte_num(order); i++, ptr++) {
+				old = ptep_get(ptr);
+				riscv_iommu_pte_free(domain, old, level, &freelist);
+				set_pte(ptr, pte);
+			}
+		} else {
+			riscv_iommu_pte_free(domain, old, level, &freelist);
+			set_pte(ptr, pte);
+		}
+
 		spin_unlock(ptl);
 
 		size += pgsize;
@@ -1336,6 +1383,9 @@ static size_t riscv_iommu_unmap_pages(struct iommu_domain *iommu_domain,
 	size_t unmapped = 0;
 	size_t pte_size;
 	spinlock_t *ptl; /* page table page lock */
+	unsigned long pte_num;
+	pte_t pte;
+	int i;
 
 	while (unmapped < size) {
 		ptr = riscv_iommu_pte_fetch(domain, iova, &pte_size);
@@ -1347,7 +1397,21 @@ static size_t riscv_iommu_unmap_pages(struct iommu_domain *iommu_domain,
 			return unmapped;
 
 		ptl = riscv_iommu_ptlock(domain, ptr, pgsize_to_level(pte_size));
-		set_pte(ptr, __pte(0));
+		if (is_napot_size(pte_size)) {
+			pte = ptep_get(ptr);
+
+			if (!pte_napot(pte) ||
+			    napot_cont_size(napot_cont_order(pte)) != pte_size) {
+				spin_unlock(ptl);
+				return unmapped;
+			}
+
+			pte_num = napot_pte_num(napot_cont_order(pte));
+			for (i = 0; i < pte_num; i++, ptr++)
+				set_pte(ptr, __pte(0));
+		} else {
+			set_pte(ptr, __pte(0));
+		}
 		spin_unlock(ptl);
 
 		iommu_iotlb_gather_add_page(&domain->domain, gather, iova,
@@ -1447,6 +1511,7 @@ static struct iommu_domain *riscv_iommu_alloc_paging_domain(struct device *dev)
 	unsigned int pgd_mode;
 	dma_addr_t va_mask;
 	int va_bits, level;
+	size_t order;
 
 	iommu = dev_to_iommu(dev);
 	if (iommu->caps & RISCV_IOMMU_CAPABILITIES_SV57) {
@@ -1506,6 +1571,9 @@ static struct iommu_domain *riscv_iommu_alloc_paging_domain(struct device *dev)
 	domain->domain.geometry.aperture_end = va_mask;
 	domain->domain.geometry.force_aperture = true;
 	domain->domain.pgsize_bitmap = va_mask & (SZ_4K | SZ_2M | SZ_1G | SZ_512G);
+	if (has_svnapot())
+		for_each_napot_order(order)
+			domain->domain.pgsize_bitmap |= napot_cont_size(order) & va_mask;
 
 	domain->domain.ops = &riscv_iommu_paging_domain_ops;
 
-- 
2.20.1