From: Xu Lu
To: akpm@linux-foundation.org, tjeznach@rivosinc.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com
Cc: lihangjing@bytedance.com, xieyongji@bytedance.com, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, Xu Lu
Subject: [PATCH 1/4] mm/gup: Handle huge pte for follow_page_pte()
Date: Mon, 10 Mar 2025 21:22:19 +0800
Message-Id: <20250310132222.58378-2-luxu.kernel@bytedance.com>
In-Reply-To: <20250310132222.58378-1-luxu.kernel@bytedance.com>
References: <20250310132222.58378-1-luxu.kernel@bytedance.com>

A page mapped at PTE level can also be a huge page when ARM CONT_PTE or
RISC-V Svnapot is enabled. Handle this scenario in follow_page_pte().
Signed-off-by: Xu Lu
---
 arch/riscv/include/asm/pgtable.h |  6 ++++++
 include/linux/pgtable.h          |  8 ++++++++
 mm/gup.c                         | 22 ++++++++++++++++------
 3 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 050fdc49b5ad7..40ae5979dd82c 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -800,6 +800,12 @@ static inline bool pud_user_accessible_page(pud_t pud)
 #endif
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#define pte_trans_huge pte_trans_huge
+static inline int pte_trans_huge(pte_t pte)
+{
+	return pte_huge(pte) && pte_napot(pte);
+}
+
 static inline int pmd_trans_huge(pmd_t pmd)
 {
 	return pmd_leaf(pmd);
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 94d267d02372e..3f57ee6dcf017 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1584,6 +1584,14 @@ static inline unsigned long my_zero_pfn(unsigned long addr)
 
 #ifdef CONFIG_MMU
 
+#if (defined(CONFIG_TRANSPARENT_HUGEPAGE) && !defined(pte_trans_huge)) || \
+	(!defined(CONFIG_TRANSPARENT_HUGEPAGE))
+static inline int pte_trans_huge(pte_t pte)
+{
+	return 0;
+}
+#endif
+
 #ifndef CONFIG_TRANSPARENT_HUGEPAGE
 static inline int pmd_trans_huge(pmd_t pmd)
 {
diff --git a/mm/gup.c b/mm/gup.c
index 3883b307780ea..84710896f42eb 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -838,11 +838,12 @@ static inline bool can_follow_write_pte(pte_t pte, struct page *page,
 
 static struct page *follow_page_pte(struct vm_area_struct *vma,
 		unsigned long address, pmd_t *pmd, unsigned int flags,
-		struct dev_pagemap **pgmap)
+		struct follow_page_context *ctx)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	struct folio *folio;
 	struct page *page;
+	struct hstate *h;
 	spinlock_t *ptl;
 	pte_t *ptep, pte;
 	int ret;
@@ -879,8 +880,8 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 		 * case since they are only valid while holding the pgmap
 		 * reference.
 		 */
-		*pgmap = get_dev_pagemap(pte_pfn(pte), *pgmap);
-		if (*pgmap)
+		ctx->pgmap = get_dev_pagemap(pte_pfn(pte), ctx->pgmap);
+		if (ctx->pgmap)
 			page = pte_page(pte);
 		else
 			goto no_page;
@@ -940,6 +941,15 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 		 */
 		folio_mark_accessed(folio);
 	}
+	if (is_vm_hugetlb_page(vma)) {
+		h = hstate_vma(vma);
+		WARN_ON_ONCE(page_size(page) != huge_page_size(h));
+		page += (address & (huge_page_size(h) - 1)) >> PAGE_SHIFT;
+		ctx->page_mask = (1 << huge_page_order(h)) - 1;
+	} else if (pte_trans_huge(pte)) {
+		page += (address & (page_size(page) - 1)) >> PAGE_SHIFT;
+		ctx->page_mask = (page_size(page) >> PAGE_SHIFT) - 1;
+	}
 out:
 	pte_unmap_unlock(ptep, ptl);
 	return page;
@@ -975,7 +985,7 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
 		return no_page_table(vma, flags, address);
 	}
 	if (likely(!pmd_leaf(pmdval)))
-		return follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
+		return follow_page_pte(vma, address, pmd, flags, ctx);
 
 	if (pmd_protnone(pmdval) && !gup_can_follow_protnone(vma, flags))
 		return no_page_table(vma, flags, address);
@@ -988,14 +998,14 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
 	}
 	if (unlikely(!pmd_leaf(pmdval))) {
 		spin_unlock(ptl);
-		return follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
+		return follow_page_pte(vma, address, pmd, flags, ctx);
 	}
 	if (pmd_trans_huge(pmdval) && (flags & FOLL_SPLIT_PMD)) {
 		spin_unlock(ptl);
 		split_huge_pmd(vma, pmd, address);
 		/* If pmd was left empty, stuff a page table in there quickly */
 		return pte_alloc(mm, pmd) ? ERR_PTR(-ENOMEM) :
-			follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
+			follow_page_pte(vma, address, pmd, flags, ctx);
 	}
 	page = follow_huge_pmd(vma, address, pmd, flags, ctx);
 	spin_unlock(ptl);
-- 
2.20.1

From: Xu Lu
To: akpm@linux-foundation.org, tjeznach@rivosinc.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com
Cc: lihangjing@bytedance.com, xieyongji@bytedance.com, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, Xu Lu
Subject: [PATCH 2/4] iommu/riscv: Use pte_t to represent page table entry
Date: Mon, 10 Mar 2025 21:22:20 +0800
Message-Id: <20250310132222.58378-3-luxu.kernel@bytedance.com>
In-Reply-To: <20250310132222.58378-1-luxu.kernel@bytedance.com>
References: <20250310132222.58378-1-luxu.kernel@bytedance.com>

Since the RISC-V IOMMU uses the same PTE format and translation process as
the MMU, as specified in the RISC-V Privileged specification, use pte_t to
represent IOMMU PTEs as well, so that the existing PTE helper functions can
be reused.
Signed-off-by: Xu Lu
---
 drivers/iommu/riscv/iommu.c | 66 ++++++++++++++++++-------------------
 1 file changed, 33 insertions(+), 33 deletions(-)

diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
index 8f049d4a0e2cb..f752096989a79 100644
--- a/drivers/iommu/riscv/iommu.c
+++ b/drivers/iommu/riscv/iommu.c
@@ -812,7 +812,7 @@ struct riscv_iommu_domain {
 	bool amo_enabled;
 	int numa_node;
 	unsigned int pgd_mode;
-	unsigned long *pgd_root;
+	pte_t *pgd_root;
 };
 
 #define iommu_domain_to_riscv(iommu_domain) \
@@ -1081,27 +1081,29 @@ static void riscv_iommu_iotlb_sync(struct iommu_domain *iommu_domain,
 
 #define PT_SHIFT (PAGE_SHIFT - ilog2(sizeof(pte_t)))
 
-#define _io_pte_present(pte)	((pte) & (_PAGE_PRESENT | _PAGE_PROT_NONE))
-#define _io_pte_leaf(pte)	((pte) & _PAGE_LEAF)
-#define _io_pte_none(pte)	((pte) == 0)
-#define _io_pte_entry(pn, prot)	((_PAGE_PFN_MASK & ((pn) << _PAGE_PFN_SHIFT)) | (prot))
+#define _io_pte_present(pte)	(pte_val(pte) & (_PAGE_PRESENT | _PAGE_PROT_NONE))
+#define _io_pte_leaf(pte)	(pte_val(pte) & _PAGE_LEAF)
+#define _io_pte_none(pte)	(pte_val(pte) == 0)
+#define _io_pte_entry(pn, prot)	(__pte((_PAGE_PFN_MASK & ((pn) << _PAGE_PFN_SHIFT)) | (prot)))
 
 static void riscv_iommu_pte_free(struct riscv_iommu_domain *domain,
-				 unsigned long pte, struct list_head *freelist)
+				 pte_t pte, struct list_head *freelist)
 {
-	unsigned long *ptr;
+	pte_t *ptr;
 	int i;
 
 	if (!_io_pte_present(pte) || _io_pte_leaf(pte))
 		return;
 
-	ptr = (unsigned long *)pfn_to_virt(__page_val_to_pfn(pte));
+	ptr = (pte_t *)pfn_to_virt(pte_pfn(pte));
 
 	/* Recursively free all sub page table pages */
 	for (i = 0; i < PTRS_PER_PTE; i++) {
-		pte = READ_ONCE(ptr[i]);
-		if (!_io_pte_none(pte) && cmpxchg_relaxed(ptr + i, pte, 0) == pte)
+		pte = ptr[i];
+		if (!_io_pte_none(pte)) {
+			ptr[i] = __pte(0);
 			riscv_iommu_pte_free(domain, pte, freelist);
+		}
 	}
 
 	if (freelist)
@@ -1110,12 +1112,12 @@ static void riscv_iommu_pte_free(struct riscv_iommu_domain *domain,
 		iommu_free_page(ptr);
 }
 
-static unsigned long *riscv_iommu_pte_alloc(struct riscv_iommu_domain *domain,
+static pte_t *riscv_iommu_pte_alloc(struct riscv_iommu_domain *domain,
 					    unsigned long iova, size_t pgsize,
 					    gfp_t gfp)
 {
-	unsigned long *ptr = domain->pgd_root;
-	unsigned long pte, old;
+	pte_t *ptr = domain->pgd_root;
+	pte_t pte, old;
 	int level = domain->pgd_mode - RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV39 + 2;
 	void *addr;
 
@@ -1131,7 +1133,7 @@ static unsigned long *riscv_iommu_pte_alloc(struct riscv_iommu_domain *domain,
 		if (((size_t)1 << shift) == pgsize)
 			return ptr;
 pte_retry:
-		pte = READ_ONCE(*ptr);
+		pte = ptep_get(ptr);
 		/*
 		 * This is very likely incorrect as we should not be adding
 		 * new mapping with smaller granularity on top
@@ -1154,31 +1156,31 @@ static unsigned long *riscv_iommu_pte_alloc(struct riscv_iommu_domain *domain,
 				goto pte_retry;
 			}
 		}
-		ptr = (unsigned long *)pfn_to_virt(__page_val_to_pfn(pte));
+		ptr = (pte_t *)pfn_to_virt(pte_pfn(pte));
 	} while (level-- > 0);
 
 	return NULL;
 }
 
-static unsigned long *riscv_iommu_pte_fetch(struct riscv_iommu_domain *domain,
-					    unsigned long iova, size_t *pte_pgsize)
+static pte_t *riscv_iommu_pte_fetch(struct riscv_iommu_domain *domain,
+				    unsigned long iova, size_t *pte_pgsize)
 {
-	unsigned long *ptr = domain->pgd_root;
-	unsigned long pte;
+	pte_t *ptr = domain->pgd_root;
+	pte_t pte;
 	int level = domain->pgd_mode - RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV39 + 2;
 
 	do {
 		const int shift = PAGE_SHIFT + PT_SHIFT * level;
 
 		ptr += ((iova >> shift) & (PTRS_PER_PTE - 1));
-		pte = READ_ONCE(*ptr);
+		pte = ptep_get(ptr);
 		if (_io_pte_present(pte) && _io_pte_leaf(pte)) {
 			*pte_pgsize = (size_t)1 << shift;
 			return ptr;
 		}
 		if (_io_pte_none(pte))
 			return NULL;
-		ptr = (unsigned long *)pfn_to_virt(__page_val_to_pfn(pte));
+		ptr = (pte_t *)pfn_to_virt(pte_pfn(pte));
 	} while (level-- > 0);
 
 	return NULL;
@@ -1191,8 +1193,9 @@ static int riscv_iommu_map_pages(struct iommu_domain *iommu_domain,
 {
 	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
 	size_t size = 0;
-	unsigned long *ptr;
-	unsigned long pte, old, pte_prot;
+	pte_t *ptr;
+	pte_t pte, old;
+	unsigned long pte_prot;
 	int rc = 0;
 	LIST_HEAD(freelist);
 
@@ -1210,10 +1213,9 @@ static int riscv_iommu_map_pages(struct iommu_domain *iommu_domain,
 			break;
 		}
 
-		old = READ_ONCE(*ptr);
+		old = ptep_get(ptr);
 		pte = _io_pte_entry(phys_to_pfn(phys), pte_prot);
-		if (cmpxchg_relaxed(ptr, old, pte) != old)
-			continue;
+		set_pte(ptr, pte);
 
 		riscv_iommu_pte_free(domain, old, &freelist);
 
@@ -1247,7 +1249,7 @@ static size_t riscv_iommu_unmap_pages(struct iommu_domain *iommu_domain,
 {
 	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
 	size_t size = pgcount << __ffs(pgsize);
-	unsigned long *ptr, old;
+	pte_t *ptr;
 	size_t unmapped = 0;
 	size_t pte_size;
 
@@ -1260,9 +1262,7 @@ static size_t riscv_iommu_unmap_pages(struct iommu_domain *iommu_domain,
 		if (iova & (pte_size - 1))
 			return unmapped;
 
-		old = READ_ONCE(*ptr);
-		if (cmpxchg_relaxed(ptr, old, 0) != old)
-			continue;
+		set_pte(ptr, __pte(0));
 
 		iommu_iotlb_gather_add_page(&domain->domain, gather, iova,
 					    pte_size);
@@ -1279,13 +1279,13 @@ static phys_addr_t riscv_iommu_iova_to_phys(struct iommu_domain *iommu_domain,
 {
 	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
 	size_t pte_size;
-	unsigned long *ptr;
+	pte_t *ptr;
 
 	ptr = riscv_iommu_pte_fetch(domain, iova, &pte_size);
-	if (_io_pte_none(*ptr) || !_io_pte_present(*ptr))
+	if (_io_pte_none(ptep_get(ptr)) || !_io_pte_present(ptep_get(ptr)))
 		return 0;
 
-	return pfn_to_phys(__page_val_to_pfn(*ptr)) | (iova & (pte_size - 1));
+	return pfn_to_phys(pte_pfn(ptep_get(ptr))) | (iova & (pte_size - 1));
 }
 
 static void riscv_iommu_free_paging_domain(struct iommu_domain *iommu_domain)
-- 
2.20.1

From: Xu Lu
To: akpm@linux-foundation.org, tjeznach@rivosinc.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com
Cc: lihangjing@bytedance.com, xieyongji@bytedance.com, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, Xu Lu
Subject: [PATCH 3/4] iommu/riscv: Introduce IOMMU page table lock
Date: Mon, 10 Mar 2025 21:22:21 +0800
Message-Id: <20250310132222.58378-4-luxu.kernel@bytedance.com>
In-Reply-To: <20250310132222.58378-1-luxu.kernel@bytedance.com>
References: <20250310132222.58378-1-luxu.kernel@bytedance.com>

Introduce a page table lock to address races when several PTEs must be
modified together, for example when applying Svnapot. Use fine-grained
page table locks to minimize lock contention.

Signed-off-by: Xu Lu
---
 drivers/iommu/riscv/iommu.c | 126 ++++++++++++++++++++++++++++++------
 1 file changed, 108 insertions(+), 18 deletions(-)

diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
index f752096989a79..ffc474987a075 100644
--- a/drivers/iommu/riscv/iommu.c
+++ b/drivers/iommu/riscv/iommu.c
@@ -808,6 +808,7 @@ struct riscv_iommu_domain {
 	struct iommu_domain domain;
 	struct list_head bonds;
 	spinlock_t lock;		/* protect bonds list updates. */
+	spinlock_t page_table_lock;	/* protect page table updates. */
 	int pscid;
 	bool amo_enabled;
 	int numa_node;
@@ -1086,8 +1087,80 @@ static void riscv_iommu_iotlb_sync(struct iommu_domain *iommu_domain,
 #define _io_pte_none(pte)	(pte_val(pte) == 0)
 #define _io_pte_entry(pn, prot)	(__pte((_PAGE_PFN_MASK & ((pn) << _PAGE_PFN_SHIFT)) | (prot)))
 
+#define RISCV_IOMMU_PMD_LEVEL		1
+
+static bool riscv_iommu_ptlock_init(struct ptdesc *ptdesc, int level)
+{
+	if (level <= RISCV_IOMMU_PMD_LEVEL)
+		return ptlock_init(ptdesc);
+	return true;
+}
+
+static void riscv_iommu_ptlock_free(struct ptdesc *ptdesc, int level)
+{
+	if (level <= RISCV_IOMMU_PMD_LEVEL)
+		ptlock_free(ptdesc);
+}
+
+static spinlock_t *riscv_iommu_ptlock(struct riscv_iommu_domain *domain,
+				      pte_t *pte, int level)
+{
+	spinlock_t *ptl;
+
+#ifdef CONFIG_SPLIT_PTE_PTLOCKS
+	if (level <= RISCV_IOMMU_PMD_LEVEL)
+		ptl = ptlock_ptr(page_ptdesc(virt_to_page(pte)));
+	else
+#endif
+		ptl = &domain->page_table_lock;
+	spin_lock(ptl);
+
+	return ptl;
+}
+
+static void *riscv_iommu_alloc_pagetable_node(int numa_node, gfp_t gfp, int level)
+{
+	struct ptdesc *ptdesc;
+	void *addr;
+
+	addr = iommu_alloc_page_node(numa_node, gfp);
+	if (!addr)
+		return NULL;
+
+	ptdesc = page_ptdesc(virt_to_page(addr));
+	if (!riscv_iommu_ptlock_init(ptdesc, level)) {
+		iommu_free_page(addr);
+		addr = NULL;
+	}
+
+	return addr;
+}
+
+static void riscv_iommu_free_pagetable(void *addr, int level)
+{
+	struct ptdesc *ptdesc = page_ptdesc(virt_to_page(addr));
+
+	riscv_iommu_ptlock_free(ptdesc, level);
+	iommu_free_page(addr);
+}
+
+static int pgsize_to_level(size_t pgsize)
+{
+	int level = RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV57 -
+		    RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV39 + 2;
+	int shift = PAGE_SHIFT + PT_SHIFT * level;
+
+	while (pgsize < ((size_t)1 << shift)) {
+		shift -= PT_SHIFT;
+		level--;
+	}
+
+	return level;
+}
+
 static void riscv_iommu_pte_free(struct riscv_iommu_domain *domain,
-				 pte_t pte, struct list_head *freelist)
+				 pte_t pte, int level,
+				 struct list_head *freelist)
 {
 	pte_t *ptr;
 	int i;
@@ -1102,10 +1175,11 @@ static void riscv_iommu_pte_free(struct riscv_iommu_domain *domain,
 		pte = ptr[i];
 		if (!_io_pte_none(pte)) {
 			ptr[i] = __pte(0);
-			riscv_iommu_pte_free(domain, pte, freelist);
+			riscv_iommu_pte_free(domain, pte, level - 1, freelist);
 		}
 	}
 
+	riscv_iommu_ptlock_free(page_ptdesc(virt_to_page(ptr)), level);
 	if (freelist)
 		list_add_tail(&virt_to_page(ptr)->lru, freelist);
 	else
@@ -1117,8 +1191,9 @@ static pte_t *riscv_iommu_pte_alloc(struct riscv_iommu_domain *domain,
 					    gfp_t gfp)
 {
 	pte_t *ptr = domain->pgd_root;
-	pte_t pte, old;
+	pte_t pte;
 	int level = domain->pgd_mode - RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV39 + 2;
+	spinlock_t *ptl;
 	void *addr;
 
 	do {
@@ -1146,15 +1221,21 @@ static pte_t *riscv_iommu_pte_alloc(struct riscv_iommu_domain *domain,
 		 * page table. This might race with other mappings, retry.
 		 */
 		if (_io_pte_none(pte)) {
-			addr = iommu_alloc_page_node(domain->numa_node, gfp);
+			addr = riscv_iommu_alloc_pagetable_node(domain->numa_node, gfp,
+								level - 1);
 			if (!addr)
 				return NULL;
-			old = pte;
-			pte = _io_pte_entry(virt_to_pfn(addr), _PAGE_TABLE);
-			if (cmpxchg_relaxed(ptr, old, pte) != old) {
-				iommu_free_page(addr);
+
+			ptl = riscv_iommu_ptlock(domain, ptr, level);
+			pte = ptep_get(ptr);
+			if (!_io_pte_none(pte)) {
+				spin_unlock(ptl);
+				riscv_iommu_free_pagetable(addr, level - 1);
 				goto pte_retry;
 			}
+			pte = _io_pte_entry(virt_to_pfn(addr), _PAGE_TABLE);
+			set_pte(ptr, pte);
+			spin_unlock(ptl);
 		}
 		ptr = (pte_t *)pfn_to_virt(pte_pfn(pte));
 	} while (level-- > 0);
@@ -1194,9 +1275,10 @@ static int riscv_iommu_map_pages(struct iommu_domain *iommu_domain,
 	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
 	size_t size = 0;
 	pte_t *ptr;
-	pte_t pte, old;
+	pte_t pte;
 	unsigned long pte_prot;
-	int rc = 0;
+	int rc = 0, level;
+	spinlock_t *ptl;
 	LIST_HEAD(freelist);
 
 	if (!(prot & IOMMU_WRITE))
@@ -1213,11 +1295,12 @@ static int riscv_iommu_map_pages(struct iommu_domain *iommu_domain,
 			break;
 		}
 
-		old = ptep_get(ptr);
+		level = pgsize_to_level(pgsize);
+		ptl = riscv_iommu_ptlock(domain, ptr, level);
+		riscv_iommu_pte_free(domain, ptep_get(ptr), level, &freelist);
 		pte = _io_pte_entry(phys_to_pfn(phys), pte_prot);
 		set_pte(ptr, pte);
-
-		riscv_iommu_pte_free(domain, old, &freelist);
+		spin_unlock(ptl);
 
 		size += pgsize;
 		iova += pgsize;
@@ -1252,6 +1335,7 @@ static size_t riscv_iommu_unmap_pages(struct iommu_domain *iommu_domain,
 	pte_t *ptr;
 	size_t unmapped = 0;
 	size_t pte_size;
+	spinlock_t *ptl;
 
 	while (unmapped < size) {
 		ptr = riscv_iommu_pte_fetch(domain, iova, &pte_size);
@@ -1262,7 +1346,9 @@ static size_t riscv_iommu_unmap_pages(struct iommu_domain *iommu_domain,
 		if (iova & (pte_size - 1))
 			return unmapped;
 
+		ptl = riscv_iommu_ptlock(domain, ptr, pgsize_to_level(pte_size));
 		set_pte(ptr, __pte(0));
+		spin_unlock(ptl);
 
 		iommu_iotlb_gather_add_page(&domain->domain, gather, iova,
 					    pte_size);
@@ -1292,13 +1378,14 @@ static void riscv_iommu_free_paging_domain(struct iommu_domain *iommu_domain)
 {
 	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
 	const unsigned long pfn = virt_to_pfn(domain->pgd_root);
+	int level = domain->pgd_mode - RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV39 + 2;
 
 	WARN_ON(!list_empty(&domain->bonds));
 
 	if ((int)domain->pscid > 0)
 		ida_free(&riscv_iommu_pscids, domain->pscid);
 
-	riscv_iommu_pte_free(domain, _io_pte_entry(pfn, _PAGE_TABLE), NULL);
+	riscv_iommu_pte_free(domain, _io_pte_entry(pfn, _PAGE_TABLE), level, NULL);
 	kfree(domain);
 }
 
@@ -1359,7 +1446,7 @@ static struct iommu_domain *riscv_iommu_alloc_paging_domain(struct device *dev)
 	struct riscv_iommu_device *iommu;
 	unsigned int pgd_mode;
 	dma_addr_t va_mask;
-	int va_bits;
+	int va_bits, level;
 
 	iommu = dev_to_iommu(dev);
 	if (iommu->caps & RISCV_IOMMU_CAPABILITIES_SV57) {
@@ -1382,11 +1469,14 @@ static struct iommu_domain *riscv_iommu_alloc_paging_domain(struct device *dev)
 
 	INIT_LIST_HEAD_RCU(&domain->bonds);
 	spin_lock_init(&domain->lock);
+	spin_lock_init(&domain->page_table_lock);
 	domain->numa_node = dev_to_node(iommu->dev);
 	domain->amo_enabled = !!(iommu->caps & RISCV_IOMMU_CAPABILITIES_AMO_HWAD);
 	domain->pgd_mode = pgd_mode;
-	domain->pgd_root = iommu_alloc_page_node(domain->numa_node,
-						 GFP_KERNEL_ACCOUNT);
+	level = domain->pgd_mode - RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV39 + 2;
+	domain->pgd_root = riscv_iommu_alloc_pagetable_node(domain->numa_node,
+							    GFP_KERNEL_ACCOUNT,
+							    level);
 	if (!domain->pgd_root) {
 		kfree(domain);
 		return ERR_PTR(-ENOMEM);
@@ -1395,7 +1485,7 @@ static struct iommu_domain *riscv_iommu_alloc_paging_domain(struct device *dev)
 	domain->pscid = ida_alloc_range(&riscv_iommu_pscids, 1,
 					RISCV_IOMMU_MAX_PSCID, GFP_KERNEL);
 	if (domain->pscid < 0) {
-		iommu_free_page(domain->pgd_root);
+		riscv_iommu_free_pagetable(domain->pgd_root, level);
 		kfree(domain);
 		return ERR_PTR(-ENOMEM);
 	}
-- 
2.20.1
From: Xu Lu
To: akpm@linux-foundation.org, tjeznach@rivosinc.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com
Cc: lihangjing@bytedance.com, xieyongji@bytedance.com, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, Xu Lu
Subject: [PATCH 4/4] iommu/riscv: Add support for Svnapot
Date: Mon, 10 Mar 2025 21:22:22 +0800
Message-Id: <20250310132222.58378-5-luxu.kernel@bytedance.com>
X-Mailer: git-send-email 2.39.5 (Apple Git-154)
In-Reply-To: <20250310132222.58378-1-luxu.kernel@bytedance.com>
References: <20250310132222.58378-1-luxu.kernel@bytedance.com>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Add the Svnapot sizes as supported page sizes and use Svnapot mappings
when possible.

Signed-off-by: Xu Lu
---
 drivers/iommu/riscv/iommu.c | 85 +++++++++++++++++++++++++++++++++----
 1 file changed, 76 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
index ffc474987a075..379875d637901 100644
--- a/drivers/iommu/riscv/iommu.c
+++ b/drivers/iommu/riscv/iommu.c
@@ -1158,6 +1158,26 @@ static int pgsize_to_level(size_t pgsize)
 	return level;
 }
 
+static unsigned long napot_size_to_order(unsigned long size)
+{
+	unsigned long order;
+
+	if (!has_svnapot())
+		return 0;
+
+	for_each_napot_order(order) {
+		if (size == napot_cont_size(order))
+			return order;
+	}
+
+	return 0;
+}
+
+static bool is_napot_size(unsigned long size)
+{
+	return napot_size_to_order(size) != 0;
+}
+
 static void riscv_iommu_pte_free(struct riscv_iommu_domain *domain,
 				 pte_t pte, int level,
 				 struct list_head *freelist)
@@ -1205,7 +1225,8 @@ static pte_t *riscv_iommu_pte_alloc(struct riscv_iommu_domain *domain,
 		 * existing mapping with smaller granularity. Up to the caller
 		 * to replace and invalidate.
 		 */
-		if (((size_t)1 << shift) == pgsize)
+		if ((((size_t)1 << shift) == pgsize) ||
+		    (is_napot_size(pgsize) && pgsize_to_level(pgsize) == level))
 			return ptr;
 pte_retry:
 		pte = ptep_get(ptr);
@@ -1256,7 +1277,10 @@ static pte_t *riscv_iommu_pte_fetch(struct riscv_iommu_domain *domain,
 		ptr += ((iova >> shift) & (PTRS_PER_PTE - 1));
 		pte = ptep_get(ptr);
 		if (_io_pte_present(pte) && _io_pte_leaf(pte)) {
-			*pte_pgsize = (size_t)1 << shift;
+			if (pte_napot(pte))
+				*pte_pgsize = napot_cont_size(napot_cont_order(pte));
+			else
+				*pte_pgsize = (size_t)1 << shift;
 			return ptr;
 		}
 		if (_io_pte_none(pte))
@@ -1274,13 +1298,18 @@ static int riscv_iommu_map_pages(struct iommu_domain *iommu_domain,
 {
 	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
 	size_t size = 0;
-	pte_t *ptr;
-	pte_t pte;
-	unsigned long pte_prot;
-	int rc = 0, level;
+	pte_t *ptr, old, pte;
+	unsigned long pte_prot, order = 0;
+	int rc = 0, level, i;
 	spinlock_t *ptl;
 	LIST_HEAD(freelist);
 
+	if (iova & (pgsize - 1))
+		return -EINVAL;
+
+	if (is_napot_size(pgsize))
+		order = napot_size_to_order(pgsize);
+
 	if (!(prot & IOMMU_WRITE))
 		pte_prot = _PAGE_BASE | _PAGE_READ;
 	else if (domain->amo_enabled)
@@ -1297,9 +1326,27 @@ static int riscv_iommu_map_pages(struct iommu_domain *iommu_domain,
 
 		level = pgsize_to_level(pgsize);
 		ptl = riscv_iommu_ptlock(domain, ptr, level);
-		riscv_iommu_pte_free(domain, ptep_get(ptr), level, &freelist);
+
+		old = ptep_get(ptr);
+		if (pte_napot(old) && napot_cont_size(napot_cont_order(old)) > pgsize) {
+			spin_unlock(ptl);
+			rc = -EFAULT;
+			break;
+		}
+
 		pte = _io_pte_entry(phys_to_pfn(phys), pte_prot);
-		set_pte(ptr, pte);
+		if (order) {
+			pte = pte_mknapot(pte, order);
+			for (i = 0; i < napot_pte_num(order); i++, ptr++) {
+				old = ptep_get(ptr);
+				riscv_iommu_pte_free(domain, old, level, &freelist);
+				set_pte(ptr, pte);
+			}
+		} else {
+			riscv_iommu_pte_free(domain, old, level, &freelist);
+			set_pte(ptr, pte);
+		}
+
 		spin_unlock(ptl);
 
 		size += pgsize;
@@ -1336,6 +1383,9 @@ static size_t riscv_iommu_unmap_pages(struct iommu_domain *iommu_domain,
 	size_t unmapped = 0;
 	size_t pte_size;
 	spinlock_t *ptl;
+	unsigned long pte_num;
+	pte_t pte;
+	int i;
 
 	while (unmapped < size) {
 		ptr = riscv_iommu_pte_fetch(domain, iova, &pte_size);
@@ -1347,7 +1397,20 @@ static size_t riscv_iommu_unmap_pages(struct iommu_domain *iommu_domain,
 			return unmapped;
 
 		ptl = riscv_iommu_ptlock(domain, ptr, pgsize_to_level(pte_size));
-		set_pte(ptr, __pte(0));
+		if (is_napot_size(pte_size)) {
+			pte = ptep_get(ptr);
+
+			if (!pte_napot(pte) ||
+			    napot_cont_size(napot_cont_order(pte)) != pte_size) {
+				spin_unlock(ptl);
+				return unmapped;
+			}
+
+			pte_num = napot_pte_num(napot_cont_order(pte));
+			for (i = 0; i < pte_num; i++, ptr++)
+				set_pte(ptr, __pte(0));
+		} else
+			set_pte(ptr, __pte(0));
 		spin_unlock(ptl);
 
 		iommu_iotlb_gather_add_page(&domain->domain, gather, iova,
@@ -1447,6 +1510,7 @@ static struct iommu_domain *riscv_iommu_alloc_paging_domain(struct device *dev)
 	unsigned int pgd_mode;
 	dma_addr_t va_mask;
 	int va_bits, level;
+	size_t order;
 
 	iommu = dev_to_iommu(dev);
 	if (iommu->caps & RISCV_IOMMU_CAPABILITIES_SV57) {
@@ -1506,6 +1570,9 @@ static struct iommu_domain *riscv_iommu_alloc_paging_domain(struct device *dev)
 	domain->domain.geometry.aperture_end = va_mask;
 	domain->domain.geometry.force_aperture = true;
 	domain->domain.pgsize_bitmap = va_mask & (SZ_4K | SZ_2M | SZ_1G | SZ_512G);
+	if (has_svnapot())
+		for_each_napot_order(order)
+			domain->domain.pgsize_bitmap |= napot_cont_size(order) & va_mask;
 
 	domain->domain.ops = &riscv_iommu_paging_domain_ops;
 
-- 
2.20.1
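For readers unfamiliar with the Svnapot helpers this patch calls, the following user-space sketch models napot_size_to_order(), is_napot_size(), pte_mknapot(), napot_cont_order() and the pgsize_bitmap extension. It is a sketch under stated assumptions, not the kernel implementation: the constants (4 KiB pages, 64 KiB as the only NAPOT order, PFN field starting at bit 10, N bit at bit 63) are our reading of the RISC-V Svnapot encoding and should be checked against the arch/riscv headers, it uses the GCC/Clang builtin __builtin_ctzll, and example_pgsize_bitmap() is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * ASSUMPTION: these constants model the RISC-V Svnapot encoding as we
 * understand it; verify against the real kernel headers.
 */
#define PAGE_SHIFT            12
#define PTE_PFN_SHIFT         10           /* PFN field starts at bit 10 */
#define PTE_NAPOT_BIT         (1ULL << 63) /* Svnapot "N" bit */
#define NAPOT_CONT_ORDER_BASE 4            /* 16 x 4 KiB = 64 KiB */
#define NAPOT_ORDER_MAX       5            /* only the 64 KiB order */

#define SZ_4K   (1ULL << 12)
#define SZ_2M   (1ULL << 21)
#define SZ_1G   (1ULL << 30)
#define SZ_512G (1ULL << 39)

#define for_each_napot_order(order) \
	for ((order) = NAPOT_CONT_ORDER_BASE; (order) < NAPOT_ORDER_MAX; (order)++)

static inline uint64_t napot_cont_size(unsigned long order)
{
	return 1ULL << (order + PAGE_SHIFT);
}

/* Number of identical PTEs that together form one NAPOT mapping. */
static inline uint64_t napot_pte_num(unsigned long order)
{
	return 1ULL << order;
}

/*
 * Same shape as the patch's helper, minus the has_svnapot() check.
 * 0 means "not a NAPOT size"; it can never be a real order (base is 4).
 */
unsigned long napot_size_to_order(uint64_t size)
{
	unsigned long order;

	for_each_napot_order(order) {
		if (size == napot_cont_size(order))
			return order;
	}
	return 0;
}

bool is_napot_size(uint64_t size)
{
	return napot_size_to_order(size) != 0;
}

/* Encode: clear PFN bits [0, order), set PFN bit (order - 1), set N. */
uint64_t pte_mknapot(uint64_t pte, unsigned int order)
{
	unsigned int pos = order - 1 + PTE_PFN_SHIFT;
	uint64_t mask = ((1ULL << (pos + 1)) - 1) & ~((1ULL << PTE_PFN_SHIFT) - 1);

	return (pte & ~mask) | (1ULL << pos) | PTE_NAPOT_BIT;
}

/* Decode: index of the lowest set PFN bit, plus one, is the order. */
unsigned long napot_cont_order(uint64_t pte)
{
	return (unsigned long)__builtin_ctzll((pte >> PTE_PFN_SHIFT) << 1);
}

/* Hypothetical helper mirroring the pgsize_bitmap hunk above. */
uint64_t example_pgsize_bitmap(uint64_t va_mask, bool svnapot)
{
	uint64_t bitmap = va_mask & (SZ_4K | SZ_2M | SZ_1G | SZ_512G);
	unsigned long order;

	if (svnapot)
		for_each_napot_order(order)
			bitmap |= napot_cont_size(order) & va_mask;
	return bitmap;
}
```

Under these assumptions a 64 KiB mapping is stored as napot_pte_num(4) == 16 identical leaf PTEs, which is why riscv_iommu_map_pages() writes napot_pte_num(order) entries and riscv_iommu_unmap_pages() clears the same number. Returning 0 as a "not NAPOT" sentinel works only because the smallest NAPOT order is non-zero.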
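The riscv_iommu_pte_alloc() hunk in the page-table-lock patch earlier in this thread replaces the cmpxchg_relaxed() publish with a different pattern: allocate the child table outside the lock, take the lock, re-read the entry, then either install the new table or free it and retry. A minimal user-space sketch of that pattern follows, assuming a single pthread mutex stands in for whatever lock riscv_iommu_ptlock() returns (its implementation is not part of this excerpt); struct pt_slot and pt_slot_get_or_alloc() are hypothetical names.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

#define PTRS_PER_TABLE 512 /* 4 KiB table of 8-byte entries */

/*
 * Hypothetical stand-in for one non-leaf PTE slot plus its lock.
 * child == NULL plays the role of an empty (_io_pte_none) entry.
 */
struct pt_slot {
	pthread_mutex_t lock;
	void *child;
};

/* Return the child table for @slot, allocating it on first use. */
void *pt_slot_get_or_alloc(struct pt_slot *slot)
{
	void *cur, *fresh;

	for (;;) {
		/* The kernel reads with ptep_get() and no lock; we lock to stay race-free in C. */
		pthread_mutex_lock(&slot->lock);
		cur = slot->child;
		pthread_mutex_unlock(&slot->lock);
		if (cur)
			return cur;

		/* Allocate outside the lock, like riscv_iommu_alloc_pagetable_node(). */
		fresh = calloc(PTRS_PER_TABLE, sizeof(uint64_t));
		if (!fresh)
			return NULL;

		/* Re-check under the lock before publishing the new table. */
		pthread_mutex_lock(&slot->lock);
		if (slot->child) {
			/* Lost the race: drop our table and retry (the "goto pte_retry" path). */
			pthread_mutex_unlock(&slot->lock);
			free(fresh);
			continue;
		}
		slot->child = fresh;
		pthread_mutex_unlock(&slot->lock);
		return fresh;
	}
}
```

Compared with a lock-free compare-and-swap, the lock-and-recheck form lets the same lock also serialize the later leaf updates in map and unmap, which appears to be the point of routing every page-table write through riscv_iommu_ptlock().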