From nobody Mon Dec 29 20:12:48 2025
From: Xu Lu
To: paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu,
    ardb@kernel.org, anup@brainfault.org, atishp@atishpatra.org
Cc: dengliang.1214@bytedance.com, xieyongji@bytedance.com,
    lihangjing@bytedance.com, songmuchun@bytedance.com,
    punit.agrawal@bytedance.com, linux-kernel@vger.kernel.org,
    linux-riscv@lists.infradead.org, Xu Lu
Subject: [RFC PATCH V1 01/11] mm: Fix misused APIs on huge pte
Date: Thu, 23 Nov 2023 14:56:58 +0800
Message-Id: <20231123065708.91345-2-luxu.kernel@bytedance.com>
X-Mailer: git-send-email
 2.39.3 (Apple Git-145)
In-Reply-To: <20231123065708.91345-1-luxu.kernel@bytedance.com>
References: <20231123065708.91345-1-luxu.kernel@bytedance.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Several paths read the value of a huge pte via the normal pte API
ptep_get() instead of the huge pte API huge_ptep_get(). This commit
corrects these misused APIs.

Signed-off-by: Xu Lu
---
 arch/riscv/mm/hugetlbpage.c   |  2 +-
 fs/proc/task_mmu.c            |  2 +-
 include/asm-generic/hugetlb.h |  7 +++++++
 mm/hugetlb.c                  |  2 +-
 mm/migrate.c                  |  5 ++++-
 mm/mprotect.c                 |  2 +-
 mm/rmap.c                     | 10 ++++++++--
 mm/vmalloc.c                  |  3 ++-
 8 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/arch/riscv/mm/hugetlbpage.c b/arch/riscv/mm/hugetlbpage.c
index b52f0210481f..d7cf8e2d3c5b 100644
--- a/arch/riscv/mm/hugetlbpage.c
+++ b/arch/riscv/mm/hugetlbpage.c
@@ -74,7 +74,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
 
 out:
 	if (pte) {
-		pte_t pteval = ptep_get_lockless(pte);
+		pte_t pteval = huge_ptep_get_lockless(pte);
 
 		WARN_ON_ONCE(pte_present(pteval) && !pte_huge(pteval));
 	}
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index ef2eb12906da..0fe9d23aa062 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -726,7 +726,7 @@ static int smaps_hugetlb_range(pte_t *pte, unsigned long hmask,
 	struct mem_size_stats *mss = walk->private;
 	struct vm_area_struct *vma = walk->vma;
 	struct page *page = NULL;
-	pte_t ptent = ptep_get(pte);
+	pte_t ptent = huge_ptep_get(pte);
 
 	if (pte_present(ptent)) {
 		page = vm_normal_page(vma, addr, ptent);
diff --git a/include/asm-generic/hugetlb.h b/include/asm-generic/hugetlb.h
index 6dcf4d576970..52c299db971a 100644
--- a/include/asm-generic/hugetlb.h
+++ b/include/asm-generic/hugetlb.h
@@ -150,6 +150,13 @@ static inline pte_t huge_ptep_get(pte_t *ptep)
 }
 #endif
 
+#ifndef __HAVE_ARCH_HUGE_PTEP_GET_LOCKLESS
+static inline pte_t huge_ptep_get_lockless(pte_t *ptep)
+{
+	return huge_ptep_get(ptep);
+}
+#endif
+
 #ifndef __HAVE_ARCH_GIGANTIC_PAGE_RUNTIME_SUPPORTED
 static inline bool gigantic_page_runtime_supported(void)
 {
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 1169ef2f2176..9f773eb95b3b 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -7406,7 +7406,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 	}
 
 	if (pte) {
-		pte_t pteval = ptep_get_lockless(pte);
+		pte_t pteval = huge_ptep_get_lockless(pte);
 
 		BUG_ON(pte_present(pteval) && !pte_huge(pteval));
 	}
diff --git a/mm/migrate.c b/mm/migrate.c
index 35a88334bb3c..d0daf58e486e 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -210,7 +210,10 @@ static bool remove_migration_pte(struct folio *folio,
 
 		folio_get(folio);
 		pte = mk_pte(new, READ_ONCE(vma->vm_page_prot));
-		old_pte = ptep_get(pvmw.pte);
+		if (folio_test_hugetlb(folio))
+			old_pte = huge_ptep_get(pvmw.pte);
+		else
+			old_pte = ptep_get(pvmw.pte);
 		if (pte_swp_soft_dirty(old_pte))
 			pte = pte_mksoft_dirty(pte);
 
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 81991102f785..b9129c03f451 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -555,7 +555,7 @@ static int prot_none_hugetlb_entry(pte_t *pte, unsigned long hmask,
 				     unsigned long addr, unsigned long next,
 				     struct mm_walk *walk)
 {
-	return pfn_modify_allowed(pte_pfn(ptep_get(pte)),
+	return pfn_modify_allowed(pte_pfn(huge_ptep_get(pte)),
 				  *(pgprot_t *)(walk->private)) ? 0 : -EACCES;
 }
diff --git a/mm/rmap.c b/mm/rmap.c
index 7a27a2b41802..d93c6dabbdf4 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1577,7 +1577,10 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			break;
 		}
 
-		pfn = pte_pfn(ptep_get(pvmw.pte));
+		if (folio_test_hugetlb(folio))
+			pfn = pte_pfn(huge_ptep_get(pvmw.pte));
+		else
+			pfn = pte_pfn(ptep_get(pvmw.pte));
 		subpage = folio_page(folio, pfn - folio_pfn(folio));
 		address = pvmw.address;
 		anon_exclusive = folio_test_anon(folio) &&
@@ -1931,7 +1934,10 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 		/* Unexpected PMD-mapped THP? */
 		VM_BUG_ON_FOLIO(!pvmw.pte, folio);
 
-		pfn = pte_pfn(ptep_get(pvmw.pte));
+		if (folio_test_hugetlb(folio))
+			pfn = pte_pfn(huge_ptep_get(pvmw.pte));
+		else
+			pfn = pte_pfn(ptep_get(pvmw.pte));
 
 		if (folio_is_zone_device(folio)) {
 			/*
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index d12a17fc0c17..1a451b82a7ac 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -103,7 +103,6 @@ static int vmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 	if (!pte)
 		return -ENOMEM;
 	do {
-		BUG_ON(!pte_none(ptep_get(pte)));
 
 #ifdef CONFIG_HUGETLB_PAGE
 		size = arch_vmap_pte_range_map_size(addr, end, pfn, max_page_shift);
@@ -111,11 +110,13 @@ static int vmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 			pte_t entry = pfn_pte(pfn, prot);
 
 			entry = arch_make_huge_pte(entry, ilog2(size), 0);
+			BUG_ON(!pte_none(huge_ptep_get(pte)));
 			set_huge_pte_at(&init_mm, addr, pte, entry, size);
 			pfn += PFN_DOWN(size);
 			continue;
 		}
 #endif
+		BUG_ON(!pte_none(ptep_get(pte)));
 		set_pte_at(&init_mm, addr, pte, pfn_pte(pfn, prot));
 		pfn++;
 	} while (pte += PFN_DOWN(size), addr += size, addr != end);
-- 
2.20.1

From nobody Mon Dec 29 20:12:48 2025
From: Xu Lu
To: paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu,
    ardb@kernel.org, anup@brainfault.org, atishp@atishpatra.org
Cc: dengliang.1214@bytedance.com, xieyongji@bytedance.com,
    lihangjing@bytedance.com, songmuchun@bytedance.com,
    punit.agrawal@bytedance.com, linux-kernel@vger.kernel.org,
    linux-riscv@lists.infradead.org, Xu Lu
Subject: [RFC PATCH V1 02/11] riscv: Introduce concept of hardware base page
Date: Thu, 23 Nov 2023 14:56:59 +0800
Message-Id: <20231123065708.91345-3-luxu.kernel@bytedance.com>
X-Mailer: git-send-email 2.39.3 (Apple Git-145)
In-Reply-To: <20231123065708.91345-1-luxu.kernel@bytedance.com>
References: <20231123065708.91345-1-luxu.kernel@bytedance.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

The key idea behind implementing a base page larger than the 4K page the
MMU supports is to decouple the MMU page from the software page as seen
by kernel mm. In contrast to the software page, we denote the MMU page
as the hardware page.

To decouple these two kinds of pages, memory should still be managed,
allocated and mapped at the granularity of the software page, which is
exactly what the existing mm code does. Page table operations, however,
should configure page table entries at the granularity of the hardware
page, which is the responsibility of arch code.

This commit introduces the concept of the hardware base page for RISC-V.
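
As a reviewer aid (not part of the patch), the relation between the two
granularities can be sketched in a few lines of plain C. The 64K software
page size below is only an assumed example (this patch still leaves both
shifts at 12), and PTES_PER_PAGE mirrors the macro added later in this
series:

	#include <stdio.h>

	#define HW_PAGE_SHIFT	12	/* page size the MMU actually supports */
	#define PAGE_SHIFT	16	/* software page size seen by mm (assumed example) */
	#define PTES_PER_PAGE	(1UL << (PAGE_SHIFT - HW_PAGE_SHIFT))

	int main(void)
	{
		unsigned long pfn = 3;	/* software page frame number */
		unsigned long hwpfn = pfn << (PAGE_SHIFT - HW_PAGE_SHIFT);

		/* One software page is backed by PTES_PER_PAGE contiguous hardware pages. */
		printf("sw pfn %lu -> hw pfn %lu..%lu (%lu hardware pages)\n",
		       pfn, hwpfn, hwpfn + PTES_PER_PAGE - 1, PTES_PER_PAGE);
		return 0;
	}

With the assumed 64K/4K split this prints "sw pfn 3 -> hw pfn 48..63
(16 hardware pages)", i.e. every software pfn expands into a run of
contiguous hardware pfns.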

Signed-off-by: Xu Lu
---
 arch/riscv/Kconfig            | 8 ++++++++
 arch/riscv/include/asm/page.h | 6 +++++-
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 95a2a06acc6a..105cbb3ca797 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -221,6 +221,14 @@ config PAGE_OFFSET
 	default 0x80000000 if !MMU
 	default 0xff60000000000000 if 64BIT
 
+config RISCV_HW_PAGE_SHIFT
+	int
+	default 12
+
+config RISCV_PAGE_SHIFT
+	int
+	default 12
+
 config KASAN_SHADOW_OFFSET
 	hex
 	depends on KASAN_GENERIC
diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
index 57e887bfa34c..a8c59d80683c 100644
--- a/arch/riscv/include/asm/page.h
+++ b/arch/riscv/include/asm/page.h
@@ -12,7 +12,11 @@
 #include
 #include
 
-#define PAGE_SHIFT	(12)
+#define HW_PAGE_SHIFT	CONFIG_RISCV_HW_PAGE_SHIFT
+#define HW_PAGE_SIZE	(_AC(1, UL) << HW_PAGE_SHIFT)
+#define HW_PAGE_MASK	(~(HW_PAGE_SIZE - 1))
+
+#define PAGE_SHIFT	CONFIG_RISCV_PAGE_SHIFT
 #define PAGE_SIZE	(_AC(1, UL) << PAGE_SHIFT)
 #define PAGE_MASK	(~(PAGE_SIZE - 1))
 
-- 
2.20.1

From nobody Mon Dec 29 20:12:48 2025
From: Xu Lu
To: paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu,
    ardb@kernel.org, anup@brainfault.org, atishp@atishpatra.org
Cc: dengliang.1214@bytedance.com, xieyongji@bytedance.com,
    lihangjing@bytedance.com, songmuchun@bytedance.com,
    punit.agrawal@bytedance.com, linux-kernel@vger.kernel.org,
    linux-riscv@lists.infradead.org, Xu Lu
Subject: [RFC PATCH V1 03/11] riscv: Adapt pte struct to gap between hw page and sw page
Date: Thu, 23 Nov 2023 14:57:00 +0800
Message-Id: <20231123065708.91345-4-luxu.kernel@bytedance.com>
X-Mailer: git-send-email 2.39.3 (Apple Git-145)
In-Reply-To: <20231123065708.91345-1-luxu.kernel@bytedance.com>
References: <20231123065708.91345-1-luxu.kernel@bytedance.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

The pte_t struct maps a virtual page to a physical page, both of which
are software pages from the point of view of kernel mm. Each page table
entry, by contrast, refers to a single hardware page. When the software
page is larger than the hardware page, the existing pte_t struct holding
only one page table entry can no longer represent a software page.

This commit extends the pte_t struct to contain an array of page table
entries. The pte_t struct now maps a software page to the exact number
of hardware pages whose total size matches the software page size.
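
To make the layout change easier to review, here is a reduced user-space
model of the extended pte_t (illustration only: the present/NAPOT checks
of the real __pte() are omitted, PFN_SHIFT is a hypothetical stand-in for
_PAGE_PFN_SHIFT, and a 64K software page on a 4K MMU is assumed so that
PTES_PER_PAGE is greater than one):

	#include <stdio.h>

	#define HW_PAGE_SHIFT	12
	#define PAGE_SHIFT	16	/* assumed example */
	#define PTES_PER_PAGE	(1 << (PAGE_SHIFT - HW_PAGE_SHIFT))
	#define PFN_SHIFT	10	/* hypothetical stand-in for _PAGE_PFN_SHIFT */

	/* A software pte is now an array of hardware page table entries. */
	typedef struct { unsigned long ptes[PTES_PER_PAGE]; } pte_t;

	/*
	 * Model of the new __pte(): replicate the value into every slot,
	 * advancing the hardware pfn by one hardware page per slot.
	 */
	static pte_t mk_pte_model(unsigned long pteval)
	{
		pte_t pte;
		int i;

		for (i = 0; i < PTES_PER_PAGE; i++) {
			pte.ptes[i] = pteval;
			pteval += 1UL << PFN_SHIFT;
		}
		return pte;
	}

	int main(void)
	{
		pte_t pte = mk_pte_model(0x1000UL << PFN_SHIFT);
		int i;

		for (i = 0; i < PTES_PER_PAGE; i++)
			printf("ptes[%d] -> hw pfn 0x%lx\n", i, pte.ptes[i] >> PFN_SHIFT);
		return 0;
	}

Each slot ends up pointing at the next 4K hardware frame (0x1000 through
0x100f here), which is the invariant the later patches maintain for a
present, non-NAPOT software pte.
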
Signed-off-by: Xu Lu --- arch/riscv/include/asm/page.h | 7 ++- arch/riscv/include/asm/pgtable-64.h | 3 +- arch/riscv/include/asm/pgtable.h | 91 ++++++++++++++++++++++++++--- arch/riscv/kernel/efi.c | 2 +- arch/riscv/mm/hugetlbpage.c | 2 +- arch/riscv/mm/pageattr.c | 2 +- 6 files changed, 92 insertions(+), 15 deletions(-) diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h index a8c59d80683c..cbaa7e027f9a 100644 --- a/arch/riscv/include/asm/page.h +++ b/arch/riscv/include/asm/page.h @@ -68,9 +68,11 @@ typedef struct { unsigned long pgd; } pgd_t; =20 +#define PTES_PER_PAGE (1 << (PAGE_SHIFT - HW_PAGE_SHIFT)) + /* Page Table entry */ typedef struct { - unsigned long pte; + unsigned long ptes[PTES_PER_PAGE]; } pte_t; =20 typedef struct { @@ -79,11 +81,10 @@ typedef struct { =20 typedef struct page *pgtable_t; =20 -#define pte_val(x) ((x).pte) +#define pte_val(x) ((x).ptes[0]) #define pgd_val(x) ((x).pgd) #define pgprot_val(x) ((x).pgprot) =20 -#define __pte(x) ((pte_t) { (x) }) #define __pgd(x) ((pgd_t) { (x) }) #define __pgprot(x) ((pgprot_t) { (x) }) =20 diff --git a/arch/riscv/include/asm/pgtable-64.h b/arch/riscv/include/asm/p= gtable-64.h index 9a2c780a11e9..c08db54594a9 100644 --- a/arch/riscv/include/asm/pgtable-64.h +++ b/arch/riscv/include/asm/pgtable-64.h @@ -99,7 +99,8 @@ enum napot_cont_order { #define for_each_napot_order_rev(order) \ for (order =3D NAPOT_ORDER_MAX - 1; \ order >=3D NAPOT_CONT_ORDER_BASE; order--) -#define napot_cont_order(val) (__builtin_ctzl((val.pte >> _PAGE_PFN_SHIFT)= << 1)) +#define napot_cont_order(val) \ + (__builtin_ctzl((pte_val(val) >> _PAGE_PFN_SHIFT) << 1)) =20 #define napot_cont_shift(order) ((order) + PAGE_SHIFT) #define napot_cont_size(order) BIT(napot_cont_shift(order)) diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgta= ble.h index 294044429e8e..342be2112fd2 100644 --- a/arch/riscv/include/asm/pgtable.h +++ b/arch/riscv/include/asm/pgtable.h @@ -212,6 +212,31 @@ extern pgd_t swapper_pg_dir[]; extern pgd_t trampoline_pg_dir[]; extern pgd_t early_pg_dir[]; =20 +static __always_inline int __pte_present(unsigned long pteval) +{ + return (pteval & (_PAGE_PRESENT | _PAGE_PROT_NONE)); +} + +static __always_inline unsigned long __pte_napot(unsigned long pteval) +{ + return pteval & _PAGE_NAPOT; +} + +static inline pte_t __pte(unsigned long pteval) +{ + pte_t pte; + unsigned int i; + + for (i =3D 0; i < PTES_PER_PAGE; i++) { + pte.ptes[i] =3D pteval; + if (__pte_present(pteval) && !__pte_napot(pteval)) + pteval +=3D 1 << _PAGE_PFN_SHIFT; + } + + return pte; +} +#define __pte __pte + #ifdef CONFIG_TRANSPARENT_HUGEPAGE static inline int pmd_present(pmd_t pmd) { @@ -300,7 +325,7 @@ static __always_inline bool has_svnapot(void) =20 static inline unsigned long pte_napot(pte_t pte) { - return pte_val(pte) & _PAGE_NAPOT; + return __pte_napot(pte_val(pte)); } =20 static inline pte_t pte_mknapot(pte_t pte, unsigned int order) @@ -350,7 +375,7 @@ static inline pte_t pfn_pte(unsigned long pfn, pgprot_t= prot) =20 static inline int pte_present(pte_t pte) { - return (pte_val(pte) & (_PAGE_PRESENT | _PAGE_PROT_NONE)); + return __pte_present(pte_val(pte)); } =20 static inline int pte_none(pte_t pte) @@ -439,6 +464,36 @@ static inline pte_t pte_mkhuge(pte_t pte) return pte; } =20 +static inline pte_t ptep_get(pte_t *ptep) +{ + unsigned int i; + pte_t pte =3D *ptep; + + for (i =3D 0; i < PTES_PER_PAGE; i++) { + if (pte.ptes[i] & _PAGE_DIRTY) { + pte =3D pte_mkdirty(pte); + break; + } + } + for (i =3D 0; i < PTES_PER_PAGE; 
i++) { + if (pte.ptes[i] & _PAGE_ACCESSED) { + pte =3D pte_mkyoung(pte); + break; + } + } + + return pte; +} +#define ptep_get ptep_get + +static inline pte_t ptep_get_lockless(pte_t *ptep) +{ + unsigned long pteval =3D READ_ONCE(ptep->ptes[0]); + + return __pte(pteval); +} +#define ptep_get_lockless ptep_get_lockless + #ifdef CONFIG_NUMA_BALANCING /* * See the comment in include/asm-generic/pgtable.h @@ -526,6 +581,8 @@ static inline void __set_pte_at(pte_t *ptep, pte_t ptev= al) static inline void set_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_t pteval, unsigned int nr) { + unsigned int i; + page_table_check_ptes_set(mm, ptep, pteval, nr); =20 for (;;) { @@ -533,7 +590,10 @@ static inline void set_ptes(struct mm_struct *mm, unsi= gned long addr, if (--nr =3D=3D 0) break; ptep++; - pte_val(pteval) +=3D 1 << _PAGE_PFN_SHIFT; + if (pte_present(pteval) && !pte_napot(pteval)) { + for (i =3D 0; i < PTES_PER_PAGE; i++) + pteval.ptes[i] +=3D PTES_PER_PAGE << _PAGE_PFN_SHIFT; + } } } #define set_ptes set_ptes @@ -562,7 +622,11 @@ static inline int ptep_set_access_flags(struct vm_area= _struct *vma, static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long address, pte_t *ptep) { - pte_t pte =3D __pte(atomic_long_xchg((atomic_long_t *)ptep, 0)); + pte_t pte; + unsigned int i; + + for (i =3D 0; i < PTES_PER_PAGE; i++) + pte.ptes[i] =3D atomic_long_xchg((atomic_long_t *)(&ptep->ptes[i]), 0); =20 page_table_check_pte_clear(mm, pte); =20 @@ -574,16 +638,27 @@ static inline int ptep_test_and_clear_young(struct vm= _area_struct *vma, unsigned long address, pte_t *ptep) { - if (!pte_young(*ptep)) + int ret =3D 0; + unsigned int i; + + if (!pte_young(ptep_get(ptep))) return 0; - return test_and_clear_bit(_PAGE_ACCESSED_OFFSET, &pte_val(*ptep)); + + for (i =3D 0; i < PTES_PER_PAGE; i++) + ret |=3D test_and_clear_bit(_PAGE_ACCESSED_OFFSET, &ptep->ptes[i]); + + return ret; } =20 #define __HAVE_ARCH_PTEP_SET_WRPROTECT static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long address, pte_t *ptep) { - atomic_long_and(~(unsigned long)_PAGE_WRITE, (atomic_long_t *)ptep); + unsigned int i; + + for (i =3D 0; i < PTES_PER_PAGE; i++) + atomic_long_and(~(unsigned long)_PAGE_WRITE, + (atomic_long_t *)(&ptep->ptes[i])); } =20 #define __HAVE_ARCH_PTEP_CLEAR_YOUNG_FLUSH @@ -829,7 +904,7 @@ extern pmd_t pmdp_collapse_flush(struct vm_area_struct = *vma, ((offset) << __SWP_OFFSET_SHIFT) }) =20 #define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) }) -#define __swp_entry_to_pte(x) ((pte_t) { (x).val }) +#define __swp_entry_to_pte(x) __pte((x).val) =20 static inline int pte_swp_exclusive(pte_t pte) { diff --git a/arch/riscv/kernel/efi.c b/arch/riscv/kernel/efi.c index aa6209a74c83..b64bf1624a05 100644 --- a/arch/riscv/kernel/efi.c +++ b/arch/riscv/kernel/efi.c @@ -60,7 +60,7 @@ int __init efi_create_mapping(struct mm_struct *mm, efi_m= emory_desc_t *md) static int __init set_permissions(pte_t *ptep, unsigned long addr, void *d= ata) { efi_memory_desc_t *md =3D data; - pte_t pte =3D READ_ONCE(*ptep); + pte_t pte =3D ptep_get(ptep); unsigned long val; =20 if (md->attribute & EFI_MEMORY_RO) { diff --git a/arch/riscv/mm/hugetlbpage.c b/arch/riscv/mm/hugetlbpage.c index d7cf8e2d3c5b..67fd71c36853 100644 --- a/arch/riscv/mm/hugetlbpage.c +++ b/arch/riscv/mm/hugetlbpage.c @@ -293,7 +293,7 @@ void huge_pte_clear(struct mm_struct *mm, pte_t *ptep, unsigned long sz) { - pte_t pte =3D READ_ONCE(*ptep); + pte_t pte =3D ptep_get(ptep); int i, pte_num; =20 if (!pte_napot(pte)) { 
diff --git a/arch/riscv/mm/pageattr.c b/arch/riscv/mm/pageattr.c index fc5fc4f785c4..b8e30df2e7df 100644 --- a/arch/riscv/mm/pageattr.c +++ b/arch/riscv/mm/pageattr.c @@ -68,7 +68,7 @@ static int pageattr_pmd_entry(pmd_t *pmd, unsigned long a= ddr, static int pageattr_pte_entry(pte_t *pte, unsigned long addr, unsigned long next, struct mm_walk *walk) { - pte_t val =3D READ_ONCE(*pte); + pte_t val =3D ptep_get(pte); =20 val =3D __pte(set_pageattr_masks(pte_val(val), walk)); set_pte(pte, val); --=20 2.20.1 From nobody Mon Dec 29 20:12:48 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A021FC5AD4C for ; Thu, 23 Nov 2023 06:58:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1344841AbjKWG6R (ORCPT ); Thu, 23 Nov 2023 01:58:17 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40736 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234118AbjKWG6E (ORCPT ); Thu, 23 Nov 2023 01:58:04 -0500 Received: from mail-pf1-x433.google.com (mail-pf1-x433.google.com [IPv6:2607:f8b0:4864:20::433]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D7D84D4F for ; Wed, 22 Nov 2023 22:57:47 -0800 (PST) Received: by mail-pf1-x433.google.com with SMTP id d2e1a72fcca58-6cb66fbc63dso448122b3a.0 for ; Wed, 22 Nov 2023 22:57:47 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1700722667; x=1701327467; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=FqqTxLyku7Pb+ctpWN2ROyQtTus2hT3Qjo5g/GoNKE8=; b=bSdmBoyatsC70jP2qVbTwOQEQL5FewEnZDj5ewfzd9O+GYmgfcwMO4DfWW4LsEdzpE 17S5rtxQYklrnmejg5nIPAVqbBQ3fbhJqunPlzyzQATDDArIyAjgA+Qrhc0LOqO3ODPL d5EVLAhSnTYO0D5cXmlw1cKJxgnQXKoi37bwGcQcIzfZALrt/tv9gFh/lbIzXBqjW6LG kD67MbGfDZpw30db3aHHcyyQg+Dcr71paYHrCxc1KwFKnbE/t0tI8C7JndgqRG8ntx6I RWa/o8UQA/GEXt8Pu6NAP5s72r0nNR6D5FD1uCrfBo5jgZGQjMg/8mtcopHFw5u0Dq47 Wa1g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1700722667; x=1701327467; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=FqqTxLyku7Pb+ctpWN2ROyQtTus2hT3Qjo5g/GoNKE8=; b=VGvfpned0xVJ1AOONUKNnK0qYPPxDiwaylC5wzFudF9uPoRcs/3gIARmE2W07hmOvv CMckJ/lZsqwI74RBTmu6VvqV0waIMfK5zo58JfPNj4pQ30qZZ7lgSJxorUh2GhjskmgA JcMun9nEgoIcGoVA7+/YckGnroRLbBvhK9NXnUlSQIP2ZsaxA1R+rOm4e1sVujNqh1Ax sruhb4aZx33IyI/JBIp1tbxpi8qyWVxAt6Dmp7P44nhGh7OrN6GhKO3qpTvehBJni/4S /8KdjWmsrGF8+DRVqcfVGHtsIufJbNzqUcyxHLidjOd/ur/wQUzxOIs+iI1DK+mEgxdx IkCQ== X-Gm-Message-State: AOJu0YwoxvPAEgJuDVqM/fyysxMRWVvO+REyQEvawKHu3XDu0UtA/6ug 9SpELMqlq+qaEvATAgcoReV8Aw== X-Google-Smtp-Source: AGHT+IFPkQpg+ey3ertHh/TcNBgUDX7fnhdQHqcFtkkyI3tzskgpcz/a4p2oxDN3tA6VRXfdP6SYzg== X-Received: by 2002:a05:6a00:a11:b0:68f:c078:b0c9 with SMTP id p17-20020a056a000a1100b0068fc078b0c9mr3267435pfh.11.1700722667294; Wed, 22 Nov 2023 22:57:47 -0800 (PST) Received: from J9GPGXL7NT.bytedance.net ([139.177.225.230]) by smtp.gmail.com with ESMTPSA id w37-20020a634765000000b005bd2b3a03eesm615437pgk.6.2023.11.22.22.57.42 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Wed, 22 Nov 2023 22:57:46 -0800 (PST) From: Xu Lu To: 
paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu, ardb@kernel.org, anup@brainfault.org, atishp@atishpatra.org Cc: dengliang.1214@bytedance.com, xieyongji@bytedance.com, lihangjing@bytedance.com, songmuchun@bytedance.com, punit.agrawal@bytedance.com, linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org, Xu Lu Subject: [RFC PATCH V1 04/11] riscv: Adapt pte operations to gap between hw page and sw page Date: Thu, 23 Nov 2023 14:57:01 +0800 Message-Id: <20231123065708.91345-5-luxu.kernel@bytedance.com> X-Mailer: git-send-email 2.39.3 (Apple Git-145) In-Reply-To: <20231123065708.91345-1-luxu.kernel@bytedance.com> References: <20231123065708.91345-1-luxu.kernel@bytedance.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" MMU handles pages at a granularity of hardware page. That is, for 4K MMU, the pfn decoded from page table entry will be regarded as 4K page frame number, no matter how large the software page is. Thus, page table entries should always be encoded at the granularity of hardware page. This commit makes pte operations aware of the gap between hw page and sw page. All pte operations now configure page table entries via hardware page frame number. Signed-off-by: Xu Lu --- arch/riscv/include/asm/page.h | 3 +++ arch/riscv/include/asm/pgalloc.h | 21 ++++++++++----- arch/riscv/include/asm/pgtable-32.h | 2 +- arch/riscv/include/asm/pgtable-64.h | 40 ++++++++++++++++++----------- arch/riscv/include/asm/pgtable.h | 19 +++++++------- arch/riscv/mm/init.c | 18 ++++++------- 6 files changed, 62 insertions(+), 41 deletions(-) diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h index cbaa7e027f9a..12f2e73ed55b 100644 --- a/arch/riscv/include/asm/page.h +++ b/arch/riscv/include/asm/page.h @@ -177,6 +177,9 @@ extern phys_addr_t __phys_addr_symbol(unsigned long x); #define __pa(x) __virt_to_phys((unsigned long)(x)) #define __va(x) ((void *)__pa_to_va_nodebug((phys_addr_t)(x))) =20 +#define pfn_to_hwpfn(pfn) (pfn << (PAGE_SHIFT - HW_PAGE_SHIFT)) +#define hwpfn_to_pfn(hwpfn) (hwpfn >> (PAGE_SHIFT - HW_PAGE_SHIFT)) + #define phys_to_pfn(phys) (PFN_DOWN(phys)) #define pfn_to_phys(pfn) (PFN_PHYS(pfn)) =20 diff --git a/arch/riscv/include/asm/pgalloc.h b/arch/riscv/include/asm/pgal= loc.h index d169a4f41a2e..eab75d5f7093 100644 --- a/arch/riscv/include/asm/pgalloc.h +++ b/arch/riscv/include/asm/pgalloc.h @@ -19,32 +19,36 @@ static inline void pmd_populate_kernel(struct mm_struct= *mm, pmd_t *pmd, pte_t *pte) { unsigned long pfn =3D virt_to_pfn(pte); + unsigned long hwpfn =3D pfn_to_hwpfn(pfn); =20 - set_pmd(pmd, __pmd((pfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE)); + set_pmd(pmd, __pmd((hwpfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE)); } =20 static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmd, pgtable_t pte) { unsigned long pfn =3D virt_to_pfn(page_address(pte)); + unsigned long hwpfn =3D pfn_to_hwpfn(pfn); =20 - set_pmd(pmd, __pmd((pfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE)); + set_pmd(pmd, __pmd((hwpfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE)); } =20 #ifndef __PAGETABLE_PMD_FOLDED static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *p= md) { unsigned long pfn =3D virt_to_pfn(pmd); + unsigned long hwpfn =3D pfn_to_hwpfn(pfn); =20 - set_pud(pud, __pud((pfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE)); + set_pud(pud, __pud((hwpfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE)); } =20 static inline void p4d_populate(struct mm_struct *mm, 
p4d_t *p4d, pud_t *p= ud) { if (pgtable_l4_enabled) { unsigned long pfn =3D virt_to_pfn(pud); + unsigned long hwpfn =3D pfn_to_hwpfn(pfn); =20 - set_p4d(p4d, __p4d((pfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE)); + set_p4d(p4d, __p4d((hwpfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE)); } } =20 @@ -53,9 +57,10 @@ static inline void p4d_populate_safe(struct mm_struct *m= m, p4d_t *p4d, { if (pgtable_l4_enabled) { unsigned long pfn =3D virt_to_pfn(pud); + unsigned long hwpfn =3D pfn_to_hwpfn(pfn); =20 set_p4d_safe(p4d, - __p4d((pfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE)); + __p4d((hwpfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE)); } } =20 @@ -63,8 +68,9 @@ static inline void pgd_populate(struct mm_struct *mm, pgd= _t *pgd, p4d_t *p4d) { if (pgtable_l5_enabled) { unsigned long pfn =3D virt_to_pfn(p4d); + unsigned long hwpfn =3D pfn_to_hwpfn(pfn); =20 - set_pgd(pgd, __pgd((pfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE)); + set_pgd(pgd, __pgd((hwpfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE)); } } =20 @@ -73,9 +79,10 @@ static inline void pgd_populate_safe(struct mm_struct *m= m, pgd_t *pgd, { if (pgtable_l5_enabled) { unsigned long pfn =3D virt_to_pfn(p4d); + unsigned long hwpfn =3D pfn_to_hwpfn(pfn); =20 set_pgd_safe(pgd, - __pgd((pfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE)); + __pgd((hwpfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE)); } } =20 diff --git a/arch/riscv/include/asm/pgtable-32.h b/arch/riscv/include/asm/p= gtable-32.h index 00f3369570a8..dec436e146ae 100644 --- a/arch/riscv/include/asm/pgtable-32.h +++ b/arch/riscv/include/asm/pgtable-32.h @@ -20,7 +20,7 @@ /* * rv32 PTE format: * | XLEN-1 10 | 9 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 - * PFN reserved for SW D A G U X W R V + * HW_PFN reserved for SW D A G U X W R V */ #define _PAGE_PFN_MASK GENMASK(31, 10) =20 diff --git a/arch/riscv/include/asm/pgtable-64.h b/arch/riscv/include/asm/p= gtable-64.h index c08db54594a9..1926727698fc 100644 --- a/arch/riscv/include/asm/pgtable-64.h +++ b/arch/riscv/include/asm/pgtable-64.h @@ -50,7 +50,7 @@ typedef struct { =20 #define p4d_val(x) ((x).p4d) #define __p4d(x) ((p4d_t) { (x) }) -#define PTRS_PER_P4D (PAGE_SIZE / sizeof(p4d_t)) +#define PTRS_PER_P4D (HW_PAGE_SIZE / sizeof(p4d_t)) =20 /* Page Upper Directory entry */ typedef struct { @@ -59,7 +59,7 @@ typedef struct { =20 #define pud_val(x) ((x).pud) #define __pud(x) ((pud_t) { (x) }) -#define PTRS_PER_PUD (PAGE_SIZE / sizeof(pud_t)) +#define PTRS_PER_PUD (HW_PAGE_SIZE / sizeof(pud_t)) =20 /* Page Middle Directory entry */ typedef struct { @@ -69,12 +69,12 @@ typedef struct { #define pmd_val(x) ((x).pmd) #define __pmd(x) ((pmd_t) { (x) }) =20 -#define PTRS_PER_PMD (PAGE_SIZE / sizeof(pmd_t)) +#define PTRS_PER_PMD (HW_PAGE_SIZE / sizeof(pmd_t)) =20 /* * rv64 PTE format: * | 63 | 62 61 | 60 54 | 53 10 | 9 8 | 7 | 6 | 5 | 4 | 3 | 2= | 1 | 0 - * N MT RSV PFN reserved for SW D A G U X W= R V + * N MT RSV HW_PFN reserved for SW D A G U X W= R V */ #define _PAGE_PFN_MASK GENMASK(53, 10) =20 @@ -94,13 +94,23 @@ enum napot_cont_order { NAPOT_ORDER_MAX, }; =20 -#define for_each_napot_order(order) \ - for (order =3D NAPOT_CONT_ORDER_BASE; order < NAPOT_ORDER_MAX; order++) -#define for_each_napot_order_rev(order) \ - for (order =3D NAPOT_ORDER_MAX - 1; \ - order >=3D NAPOT_CONT_ORDER_BASE; order--) -#define napot_cont_order(val) \ - (__builtin_ctzl((pte_val(val) >> _PAGE_PFN_SHIFT) << 1)) +#define NAPOT_PAGE_ORDER_BASE \ + ((NAPOT_CONT_ORDER_BASE >=3D (PAGE_SHIFT - HW_PAGE_SHIFT)) ? 
\ + (NAPOT_CONT_ORDER_BASE - (PAGE_SHIFT - HW_PAGE_SHIFT)) : 1) +#define NAPOT_PAGE_ORDER_MAX \ + ((NAPOT_ORDER_MAX > (PAGE_SHIFT - HW_PAGE_SHIFT)) ? \ + (NAPOT_ORDER_MAX - (PAGE_SHIFT - HW_PAGE_SHIFT)) : \ + NAPOT_PAGE_ORDER_BASE) + +#define for_each_napot_order(order) \ + for (order =3D NAPOT_PAGE_ORDER_BASE; \ + order < NAPOT_PAGE_ORDER_MAX; order++) +#define for_each_napot_order_rev(order) \ + for (order =3D NAPOT_PAGE_ORDER_MAX - 1; \ + order >=3D NAPOT_PAGE_ORDER_BASE; order--) +#define napot_cont_order(val) \ + (__builtin_ctzl((pte_val(val) >> _PAGE_PFN_SHIFT) << 1) \ + - (PAGE_SHIFT - HW_PAGE_SHIFT)) =20 #define napot_cont_shift(order) ((order) + PAGE_SHIFT) #define napot_cont_size(order) BIT(napot_cont_shift(order)) @@ -108,7 +118,7 @@ enum napot_cont_order { #define napot_pte_num(order) BIT(order) =20 #ifdef CONFIG_RISCV_ISA_SVNAPOT -#define HUGE_MAX_HSTATE (2 + (NAPOT_ORDER_MAX - NAPOT_CONT_ORDER_BASE)) +#define HUGE_MAX_HSTATE (2 + (NAPOT_ORDER_MAX - NAPOT_PAGE_ORDER_BASE)) #else #define HUGE_MAX_HSTATE 2 #endif @@ -213,7 +223,7 @@ static inline void pud_clear(pud_t *pudp) =20 static inline pud_t pfn_pud(unsigned long pfn, pgprot_t prot) { - return __pud((pfn << _PAGE_PFN_SHIFT) | pgprot_val(prot)); + return __pud((pfn_to_hwpfn(pfn) << _PAGE_PFN_SHIFT) | pgprot_val(prot)); } =20 static inline unsigned long _pud_pfn(pud_t pud) @@ -257,7 +267,7 @@ static inline pmd_t pfn_pmd(unsigned long pfn, pgprot_t= prot) =20 ALT_THEAD_PMA(prot_val); =20 - return __pmd((pfn << _PAGE_PFN_SHIFT) | prot_val); + return __pmd((pfn_to_hwpfn(pfn) << _PAGE_PFN_SHIFT) | prot_val); } =20 static inline unsigned long _pmd_pfn(pmd_t pmd) @@ -316,7 +326,7 @@ static inline void p4d_clear(p4d_t *p4d) =20 static inline p4d_t pfn_p4d(unsigned long pfn, pgprot_t prot) { - return __p4d((pfn << _PAGE_PFN_SHIFT) | pgprot_val(prot)); + return __p4d((pfn_to_hwpfn(pfn) << _PAGE_PFN_SHIFT) | pgprot_val(prot)); } =20 static inline unsigned long _p4d_pfn(p4d_t p4d) diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgta= ble.h index 342be2112fd2..d50c4588c1ed 100644 --- a/arch/riscv/include/asm/pgtable.h +++ b/arch/riscv/include/asm/pgtable.h @@ -26,9 +26,9 @@ #endif =20 /* Number of entries in the page global directory */ -#define PTRS_PER_PGD (PAGE_SIZE / sizeof(pgd_t)) +#define PTRS_PER_PGD (HW_PAGE_SIZE / sizeof(pgd_t)) /* Number of entries in the page table */ -#define PTRS_PER_PTE (PAGE_SIZE / sizeof(pte_t)) +#define PTRS_PER_PTE (HW_PAGE_SIZE / sizeof(pte_t)) =20 /* * Half of the kernel address space (1/4 of the entries of the page global @@ -118,7 +118,8 @@ #include #include =20 -#define __page_val_to_pfn(_val) (((_val) & _PAGE_PFN_MASK) >> _PAGE_PFN_S= HIFT) +#define __page_val_to_hwpfn(_val) (((_val) & _PAGE_PFN_MASK) >> _PAGE_PFN_= SHIFT) +#define __page_val_to_pfn(_val) hwpfn_to_pfn(__page_val_to_hwpfn(_val)) =20 #ifdef CONFIG_64BIT #include @@ -287,7 +288,7 @@ static inline pgd_t pfn_pgd(unsigned long pfn, pgprot_t= prot) =20 ALT_THEAD_PMA(prot_val); =20 - return __pgd((pfn << _PAGE_PFN_SHIFT) | prot_val); + return __pgd((pfn_to_hwpfn(pfn) << _PAGE_PFN_SHIFT) | prot_val); } =20 static inline unsigned long _pgd_pfn(pgd_t pgd) @@ -351,12 +352,12 @@ static inline unsigned long pte_napot(pte_t pte) /* Yields the page frame number (PFN) of a page table entry */ static inline unsigned long pte_pfn(pte_t pte) { - unsigned long res =3D __page_val_to_pfn(pte_val(pte)); + unsigned long res =3D __page_val_to_hwpfn(pte_val(pte)); =20 if (has_svnapot() && pte_napot(pte)) res =3D res & (res - 1UL); 
=20 - return res; + return hwpfn_to_pfn(res); } =20 #define pte_page(x) pfn_to_page(pte_pfn(x)) @@ -368,7 +369,7 @@ static inline pte_t pfn_pte(unsigned long pfn, pgprot_t= prot) =20 ALT_THEAD_PMA(prot_val); =20 - return __pte((pfn << _PAGE_PFN_SHIFT) | prot_val); + return __pte((pfn_to_hwpfn(pfn) << _PAGE_PFN_SHIFT) | prot_val); } =20 #define mk_pte(page, prot) pfn_pte(page_to_pfn(page), prot) @@ -723,14 +724,14 @@ static inline pmd_t pmd_mkinvalid(pmd_t pmd) return __pmd(pmd_val(pmd) & ~(_PAGE_PRESENT|_PAGE_PROT_NONE)); } =20 -#define __pmd_to_phys(pmd) (__page_val_to_pfn(pmd_val(pmd)) << PAGE_SHIFT) +#define __pmd_to_phys(pmd) (__page_val_to_hwpfn(pmd_val(pmd)) << HW_PAGE_= SHIFT) =20 static inline unsigned long pmd_pfn(pmd_t pmd) { return ((__pmd_to_phys(pmd) & PMD_MASK) >> PAGE_SHIFT); } =20 -#define __pud_to_phys(pud) (__page_val_to_pfn(pud_val(pud)) << PAGE_SHIFT) +#define __pud_to_phys(pud) (__page_val_to_hwpfn(pud_val(pud)) << HW_PAGE_= SHIFT) =20 static inline unsigned long pud_pfn(pud_t pud) { diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c index 2e011cbddf3a..a768b2b3ff05 100644 --- a/arch/riscv/mm/init.c +++ b/arch/riscv/mm/init.c @@ -466,7 +466,7 @@ static void __init create_pmd_mapping(pmd_t *pmdp, pte_phys =3D pt_ops.alloc_pte(va); pmdp[pmd_idx] =3D pfn_pmd(PFN_DOWN(pte_phys), PAGE_TABLE); ptep =3D pt_ops.get_pte_virt(pte_phys); - memset(ptep, 0, PAGE_SIZE); + memset(ptep, 0, PTRS_PER_PTE * sizeof(pte_t)); } else { pte_phys =3D PFN_PHYS(_pmd_pfn(pmdp[pmd_idx])); ptep =3D pt_ops.get_pte_virt(pte_phys); @@ -569,7 +569,7 @@ static void __init create_pud_mapping(pud_t *pudp, next_phys =3D pt_ops.alloc_pmd(va); pudp[pud_index] =3D pfn_pud(PFN_DOWN(next_phys), PAGE_TABLE); nextp =3D pt_ops.get_pmd_virt(next_phys); - memset(nextp, 0, PAGE_SIZE); + memset(nextp, 0, PTRS_PER_PMD * sizeof(pmd_t)); } else { next_phys =3D PFN_PHYS(_pud_pfn(pudp[pud_index])); nextp =3D pt_ops.get_pmd_virt(next_phys); @@ -596,7 +596,7 @@ static void __init create_p4d_mapping(p4d_t *p4dp, next_phys =3D pt_ops.alloc_pud(va); p4dp[p4d_index] =3D pfn_p4d(PFN_DOWN(next_phys), PAGE_TABLE); nextp =3D pt_ops.get_pud_virt(next_phys); - memset(nextp, 0, PAGE_SIZE); + memset(nextp, 0, PTRS_PER_PUD * sizeof(pud_t)); } else { next_phys =3D PFN_PHYS(_p4d_pfn(p4dp[p4d_index])); nextp =3D pt_ops.get_pud_virt(next_phys); @@ -654,7 +654,7 @@ void __init create_pgd_mapping(pgd_t *pgdp, next_phys =3D alloc_pgd_next(va); pgdp[pgd_idx] =3D pfn_pgd(PFN_DOWN(next_phys), PAGE_TABLE); nextp =3D get_pgd_next_virt(next_phys); - memset(nextp, 0, PAGE_SIZE); + memset(nextp, 0, PTRS_PER_P4D * sizeof(p4d_t)); } else { next_phys =3D PFN_PHYS(_pgd_pfn(pgdp[pgd_idx])); nextp =3D get_pgd_next_virt(next_phys); @@ -815,16 +815,16 @@ static __init void set_satp_mode(uintptr_t dtb_pa) if (hw_satp !=3D identity_satp) { if (pgtable_l5_enabled) { disable_pgtable_l5(); - memset(early_pg_dir, 0, PAGE_SIZE); + memset(early_pg_dir, 0, PTRS_PER_PGD * sizeof(pgd_t)); goto retry; } disable_pgtable_l4(); } =20 - memset(early_pg_dir, 0, PAGE_SIZE); - memset(early_p4d, 0, PAGE_SIZE); - memset(early_pud, 0, PAGE_SIZE); - memset(early_pmd, 0, PAGE_SIZE); + memset(early_pg_dir, 0, PTRS_PER_PGD * sizeof(pgd_t)); + memset(early_p4d, 0, PTRS_PER_P4D * sizeof(p4d_t)); + memset(early_pud, 0, PTRS_PER_PUD * sizeof(pud_t)); + memset(early_pmd, 0, PTRS_PER_PMD * sizeof(pmd_t)); } #endif =20 --=20 2.20.1 From nobody Mon Dec 29 20:12:48 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org 
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B82AEC5AD4C for ; Thu, 23 Nov 2023 06:58:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1344838AbjKWG5y (ORCPT ); Thu, 23 Nov 2023 01:57:54 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48270 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234724AbjKWG5s (ORCPT ); Thu, 23 Nov 2023 01:57:48 -0500 Received: from mail-oi1-x231.google.com (mail-oi1-x231.google.com [IPv6:2607:f8b0:4864:20::231]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B6232B0 for ; Wed, 22 Nov 2023 22:57:53 -0800 (PST) Received: by mail-oi1-x231.google.com with SMTP id 5614622812f47-3b83fc26e4cso383936b6e.2 for ; Wed, 22 Nov 2023 22:57:53 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1700722673; x=1701327473; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=EBzEAwR1u2ognGkokbPIcjTus0Wt1Liku1PeiEAYDPA=; b=S/H2AlAr/1MUQFwrBdW46YqR5RbP5zbByBEvYryy9hz3u34Pv81eJKK+qi02859+jS DbQ9UIxxydshX0/q9N/kON4EYMyICgjCgMLAk7KSVHJ8sl9HvbYou44iyks68izHqW8B EgWOt6XoDLIdvtqp59Gz9Mcaz2YDO7TjGMyU5pn8LCZrON7MVhWPVgY2gGuPvqOnpoh9 NagoWjczCmci290KpkMSaChqhGGMvEXMTCxuuw9EW9u+/5FiU1rVZVzQhbneFHhMiy8j Av90HkcANyMvai646X2uoLV0lYa5Zv0T0ayvmckt3kC4+rQPqRE0ayPFIpJK3HmcrRxL Z+QQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1700722673; x=1701327473; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=EBzEAwR1u2ognGkokbPIcjTus0Wt1Liku1PeiEAYDPA=; b=DNAWKjlMJIerkWtHpKDm+al3mB/Pnjm590MsprFSKAN+rrMmdPsT5DlSp985qCwg8q E7ZdhVkekKXdRS806Gd18NbC/+zuJFZz06QW64cFvvaDuIV1g5xPK+C4fVN1lCF84iKB uAl49DV/FU7y3dQvraCfSZf+Y6TGARaEQ63vo08XfMcuy3i3VhynZDN4bUSOTCij7fNQ DFExRb8Z237zA9hPv8FYqjLVkj6teaSJy01530QJjdwgMERAvvTCgnvuVxrd2L8gdvKF uRkCw7frTp+97KvfahCzioDLFqlNZhNSvWvATidutVSROrTQ3qWKYyX+ioRWIxJZAiKz XyqA== X-Gm-Message-State: AOJu0YyUdCz2YWeOTteUGnOk5vA+mJZ2X8+PhpSUUzpm3qOJOaEmbkiE shv+GcRv5959x+tdusE3Kg6rUw== X-Google-Smtp-Source: AGHT+IGHKuk0EJS7EQTCzidInz2QrgpsRuY4bxBytkNm9OOr3Q4F5OcNYuewzXuJd5Dzw7DfQuJfmg== X-Received: by 2002:a54:4885:0:b0:3af:b6d3:cda0 with SMTP id r5-20020a544885000000b003afb6d3cda0mr5487295oic.40.1700722673045; Wed, 22 Nov 2023 22:57:53 -0800 (PST) Received: from J9GPGXL7NT.bytedance.net ([139.177.225.230]) by smtp.gmail.com with ESMTPSA id w37-20020a634765000000b005bd2b3a03eesm615437pgk.6.2023.11.22.22.57.47 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Wed, 22 Nov 2023 22:57:52 -0800 (PST) From: Xu Lu To: paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu, ardb@kernel.org, anup@brainfault.org, atishp@atishpatra.org Cc: dengliang.1214@bytedance.com, xieyongji@bytedance.com, lihangjing@bytedance.com, songmuchun@bytedance.com, punit.agrawal@bytedance.com, linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org, Xu Lu Subject: [RFC PATCH V1 05/11] riscv: Decouple pmd operations and pte operations Date: Thu, 23 Nov 2023 14:57:02 +0800 Message-Id: <20231123065708.91345-6-luxu.kernel@bytedance.com> X-Mailer: git-send-email 2.39.3 (Apple Git-145) In-Reply-To: <20231123065708.91345-1-luxu.kernel@bytedance.com> References: 
<20231123065708.91345-1-luxu.kernel@bytedance.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Existing pmd operations are usually implemented via pte operations. For example, the pmd_mkdirty function, which is used to mark a pmd_t struct as dirty, will transfer pmd_t struct to pte_t struct via pmd_pte first, mark the generated pte_t as dirty then, and finally transfer it back to pmd_t struct via pte_pmd function. Such implementation introduces unnecessary overhead of struct transferring. Also, Now that pte_t struct is a number of page table entries, which can be larger than pmd_t struct, functions like set_pmd_at implemented via set_pte_at will cause write amplifications. This commit decouples pmd operations and pte operations. Pmd operations are now implemented independently of pte operations. Signed-off-by: Xu Lu --- arch/riscv/include/asm/pgtable-64.h | 6 ++ arch/riscv/include/asm/pgtable.h | 124 +++++++++++++++++++++------- include/asm-generic/pgtable-nopmd.h | 1 + include/linux/pgtable.h | 6 ++ 4 files changed, 108 insertions(+), 29 deletions(-) diff --git a/arch/riscv/include/asm/pgtable-64.h b/arch/riscv/include/asm/p= gtable-64.h index 1926727698fc..95e785f2160c 100644 --- a/arch/riscv/include/asm/pgtable-64.h +++ b/arch/riscv/include/asm/pgtable-64.h @@ -206,6 +206,12 @@ static inline int pud_leaf(pud_t pud) return pud_present(pud) && (pud_val(pud) & _PAGE_LEAF); } =20 +#define pud_exec pud_exec +static inline int pud_exec(pud_t pud) +{ + return pud_val(pud) & _PAGE_EXEC; +} + static inline int pud_user(pud_t pud) { return pud_val(pud) & _PAGE_USER; diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgta= ble.h index d50c4588c1ed..9f81fe046cb8 100644 --- a/arch/riscv/include/asm/pgtable.h +++ b/arch/riscv/include/asm/pgtable.h @@ -272,6 +272,18 @@ static inline int pmd_leaf(pmd_t pmd) return pmd_present(pmd) && (pmd_val(pmd) & _PAGE_LEAF); } =20 +#define pmd_exec pmd_exec +static inline int pmd_exec(pmd_t pmd) +{ + return pmd_val(pmd) & _PAGE_EXEC; +} + +#define __HAVE_ARCH_PMD_SAME +static inline int pmd_same(pmd_t pmd_a, pmd_t pmd_b) +{ + return pmd_val(pmd_a) =3D=3D pmd_val(pmd_b); +} + static inline void set_pmd(pmd_t *pmdp, pmd_t pmd) { *pmdp =3D pmd; @@ -506,7 +518,7 @@ static inline int pte_protnone(pte_t pte) =20 static inline int pmd_protnone(pmd_t pmd) { - return pte_protnone(pmd_pte(pmd)); + return (pmd_val(pmd) & (_PAGE_PRESENT | _PAGE_PROT_NONE)) =3D=3D _PAGE_PR= OT_NONE; } #endif =20 @@ -740,73 +752,95 @@ static inline unsigned long pud_pfn(pud_t pud) =20 static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot) { - return pte_pmd(pte_modify(pmd_pte(pmd), newprot)); + unsigned long newprot_val =3D pgprot_val(newprot); + + ALT_THEAD_PMA(newprot_val); + + return __pmd((pmd_val(pmd) & _PAGE_CHG_MASK) | newprot_val); } =20 #define pmd_write pmd_write static inline int pmd_write(pmd_t pmd) { - return pte_write(pmd_pte(pmd)); + return pmd_val(pmd) & _PAGE_WRITE; } =20 static inline int pmd_dirty(pmd_t pmd) { - return pte_dirty(pmd_pte(pmd)); + return pmd_val(pmd) & _PAGE_DIRTY; } =20 #define pmd_young pmd_young static inline int pmd_young(pmd_t pmd) { - return pte_young(pmd_pte(pmd)); + return pmd_val(pmd) & _PAGE_ACCESSED; } =20 static inline int pmd_user(pmd_t pmd) { - return pte_user(pmd_pte(pmd)); + return pmd_val(pmd) & _PAGE_USER; } =20 static inline pmd_t pmd_mkold(pmd_t pmd) { - return 
pte_pmd(pte_mkold(pmd_pte(pmd))); + return __pmd(pmd_val(pmd) & ~(_PAGE_ACCESSED)); } =20 static inline pmd_t pmd_mkyoung(pmd_t pmd) { - return pte_pmd(pte_mkyoung(pmd_pte(pmd))); + return __pmd(pmd_val(pmd) | _PAGE_ACCESSED); } =20 static inline pmd_t pmd_mkwrite_novma(pmd_t pmd) { - return pte_pmd(pte_mkwrite_novma(pmd_pte(pmd))); + return __pmd(pmd_val(pmd) | _PAGE_WRITE); } =20 static inline pmd_t pmd_wrprotect(pmd_t pmd) { - return pte_pmd(pte_wrprotect(pmd_pte(pmd))); + return __pmd(pmd_val(pmd) & (~_PAGE_WRITE)); } =20 static inline pmd_t pmd_mkclean(pmd_t pmd) { - return pte_pmd(pte_mkclean(pmd_pte(pmd))); + return __pmd(pmd_val(pmd) & (~_PAGE_DIRTY)); } =20 static inline pmd_t pmd_mkdirty(pmd_t pmd) { - return pte_pmd(pte_mkdirty(pmd_pte(pmd))); + return __pmd(pmd_val(pmd) | _PAGE_DIRTY); +} + +#define pmd_accessible(mm, pmd) ((void)(pmd), 1) + +static inline void __set_pmd_at(pmd_t *pmdp, pmd_t pmd) +{ + if (pmd_present(pmd) && pmd_exec(pmd)) + flush_icache_pte(pmd_pte(pmd)); + + set_pmd(pmdp, pmd); } =20 static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr, pmd_t *pmdp, pmd_t pmd) { page_table_check_pmd_set(mm, pmdp, pmd); - return __set_pte_at((pte_t *)pmdp, pmd_pte(pmd)); + return __set_pmd_at(pmdp, pmd); +} + +static inline void __set_pud_at(pud_t *pudp, pud_t pud) +{ + if (pud_present(pud) && pud_exec(pud)) + flush_icache_pte(pud_pte(pud)); + + set_pud(pudp, pud); } =20 static inline void set_pud_at(struct mm_struct *mm, unsigned long addr, pud_t *pudp, pud_t pud) { page_table_check_pud_set(mm, pudp, pud); - return __set_pte_at((pte_t *)pudp, pud_pte(pud)); + return __set_pud_at(pudp, pud); } =20 #ifdef CONFIG_PAGE_TABLE_CHECK @@ -826,25 +860,64 @@ static inline bool pud_user_accessible_page(pud_t pud) } #endif =20 -#ifdef CONFIG_TRANSPARENT_HUGEPAGE -static inline int pmd_trans_huge(pmd_t pmd) -{ - return pmd_leaf(pmd); -} - #define __HAVE_ARCH_PMDP_SET_ACCESS_FLAGS static inline int pmdp_set_access_flags(struct vm_area_struct *vma, unsigned long address, pmd_t *pmdp, pmd_t entry, int dirty) { - return ptep_set_access_flags(vma, address, (pte_t *)pmdp, pmd_pte(entry),= dirty); + if (!pmd_same(*pmdp, entry)) + set_pmd_at(vma->vm_mm, address, pmdp, entry); + /* + * update_mmu_cache will unconditionally execute, handling both + * the case that the PMD changed and the spurious fault case. 
+ */ + return true; +} + +#define __HAVE_ARCH_PMDP_GET_AND_CLEAR +static inline pmd_t pmdp_get_and_clear(struct mm_struct *mm, + unsigned long address, pmd_t *pmdp) +{ + pmd_t pmd =3D __pmd(atomic_long_xchg((atomic_long_t *)pmdp, 0)); + + page_table_check_pmd_clear(mm, pmd); + + return pmd; +} + +#define __HAVE_ARCH_PMDP_SET_WRPROTECT +static inline void pmdp_set_wrprotect(struct mm_struct *mm, + unsigned long address, pmd_t *pmdp) +{ + atomic_long_and(~(unsigned long)_PAGE_WRITE, (atomic_long_t *)pmdp); +} + +#define __HAVE_ARCH_PMDP_CLEAR_FLUSH +static inline pmd_t pmdp_clear_flush(struct vm_area_struct *vma, + unsigned long address, pmd_t *pmdp) +{ + struct mm_struct *mm =3D (vma)->vm_mm; + pmd_t pmd =3D pmdp_get_and_clear(mm, address, pmdp); + + if (pmd_accessible(mm, pmd)) + flush_tlb_page(vma, address); + + return pmd; } =20 #define __HAVE_ARCH_PMDP_TEST_AND_CLEAR_YOUNG static inline int pmdp_test_and_clear_young(struct vm_area_struct *vma, unsigned long address, pmd_t *pmdp) { - return ptep_test_and_clear_young(vma, address, (pte_t *)pmdp); + if (!pmd_young(*pmdp)) + return 0; + return test_and_clear_bit(_PAGE_ACCESSED_OFFSET, &pmd_val(*pmdp)); +} + +#ifdef CONFIG_TRANSPARENT_HUGEPAGE +static inline int pmd_trans_huge(pmd_t pmd) +{ + return pmd_leaf(pmd); } =20 #define __HAVE_ARCH_PMDP_HUGE_GET_AND_CLEAR @@ -858,13 +931,6 @@ static inline pmd_t pmdp_huge_get_and_clear(struct mm_= struct *mm, return pmd; } =20 -#define __HAVE_ARCH_PMDP_SET_WRPROTECT -static inline void pmdp_set_wrprotect(struct mm_struct *mm, - unsigned long address, pmd_t *pmdp) -{ - ptep_set_wrprotect(mm, address, (pte_t *)pmdp); -} - #define pmdp_establish pmdp_establish static inline pmd_t pmdp_establish(struct vm_area_struct *vma, unsigned long address, pmd_t *pmdp, pmd_t pmd) diff --git a/include/asm-generic/pgtable-nopmd.h b/include/asm-generic/pgta= ble-nopmd.h index 8ffd64e7a24c..acef201b29f5 100644 --- a/include/asm-generic/pgtable-nopmd.h +++ b/include/asm-generic/pgtable-nopmd.h @@ -32,6 +32,7 @@ static inline int pud_bad(pud_t pud) { return 0; } static inline int pud_present(pud_t pud) { return 1; } static inline int pud_user(pud_t pud) { return 0; } static inline int pud_leaf(pud_t pud) { return 0; } +static inline int pud_exec(pud_t pud) { return 0; } static inline void pud_clear(pud_t *pud) { } #define pmd_ERROR(pmd) (pud_ERROR((pmd).pud)) =20 diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index af7639c3b0a3..b8d6e39fefc2 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -1630,9 +1630,15 @@ typedef unsigned int pgtbl_mod_mask; #ifndef pud_leaf #define pud_leaf(x) 0 #endif +#ifndef pud_exec +#define pud_exec(x) 0 +#endif #ifndef pmd_leaf #define pmd_leaf(x) 0 #endif +#ifndef pmd_exec +#define pmd_exec(x) 0 +#endif =20 #ifndef pgd_leaf_size #define pgd_leaf_size(x) (1ULL << PGDIR_SHIFT) --=20 2.20.1 From nobody Mon Dec 29 20:12:48 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 88AF4C5AD4C for ; Thu, 23 Nov 2023 06:58:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235327AbjKWG6M (ORCPT ); Thu, 23 Nov 2023 01:58:12 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59340 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1344827AbjKWG5x (ORCPT ); Thu, 23 Nov 2023 01:57:53 -0500 Received: 
from mail-pf1-x429.google.com (mail-pf1-x429.google.com [IPv6:2607:f8b0:4864:20::429]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2432DD41 for ; Wed, 22 Nov 2023 22:57:59 -0800 (PST) Received: by mail-pf1-x429.google.com with SMTP id d2e1a72fcca58-6cb66fbc63dso448210b3a.0 for ; Wed, 22 Nov 2023 22:57:59 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1700722678; x=1701327478; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=C94qAqj8zbtSDuRHlmOHD8Q2VGFM2PRqxiBp3mJrf0U=; b=hRS1ozPttVwREi3cgIqo1ZBMGs5YBFKMRccL5C9fsNzxoPMop6qlfHH0n+R0WBmpRj eeYCPEHF+0cdUAiyHoDmisc4prGPN3ODdbNuE97ETxv7iHVKdbOUcylIqqK0HeoUKzm3 OfhmU0jiJ58HscrObFIRxscE4TN00avEhoqKWlo1+EydGrAmTtQbrzGmzfP6sFUq6lkd V/b/nqF6rQeutXZzNbaCBIfVOwCgTwPyfytewz3wl3BrXsc7YFjKzAYNhVWhc3701evm lg98r1jj1HG90SMlatiNQjthYgXJsGrZuT5Q65xfJ+HkuG6D9gFifhMUQF2VnYPwsyVb UycQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1700722678; x=1701327478; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=C94qAqj8zbtSDuRHlmOHD8Q2VGFM2PRqxiBp3mJrf0U=; b=Sfz0ZNPsdqOTvblPrC1PLi5aAV4USOSBPXgdt70E9xWuTZD1SEpbc3KmpRdEHN4x7y DagJs5FdFz+Cuhq7MI8OtJlwy8My7m2n194Ly1Lxu2e6OquILEDBuKm09T6vPv5a2ZFk U5ywKOkry6ZIDkDGRgytz4E8JLXGhviEOkts8aNzNDbtWTS0/dY5VrdHWakKJLUo2zDL CNNaqMe2fJpNf2QYx2YNfMZuPyyKkWaeQcn7jpoyZhHwCzio3ATJamHLO1b3IOLY6QqJ z7janF82+117caiyXiojNu5Y/jXlqzBFL5yHxDYAxujMp+D/hJCxOZCrCE0zkTCukDEt kW5A== X-Gm-Message-State: AOJu0Yw0P/GhKBIYAR3Xft4Z1OtlLPSlQV4SD3DHfya8coPJsbqOY1Dv aQIrw72DwOl0xZz3ucz/N7rv2w== X-Google-Smtp-Source: AGHT+IGnHCkqYYh5NefAhg3MFCZ+EXB4xUd+vxXu7NalDhM8+VhB4KPw88X/a6YywB+kIuA/EhnA0w== X-Received: by 2002:a05:6a00:2d94:b0:6cb:db40:4568 with SMTP id fb20-20020a056a002d9400b006cbdb404568mr2701949pfb.17.1700722678597; Wed, 22 Nov 2023 22:57:58 -0800 (PST) Received: from J9GPGXL7NT.bytedance.net ([139.177.225.230]) by smtp.gmail.com with ESMTPSA id w37-20020a634765000000b005bd2b3a03eesm615437pgk.6.2023.11.22.22.57.53 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Wed, 22 Nov 2023 22:57:58 -0800 (PST) From: Xu Lu To: paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu, ardb@kernel.org, anup@brainfault.org, atishp@atishpatra.org Cc: dengliang.1214@bytedance.com, xieyongji@bytedance.com, lihangjing@bytedance.com, songmuchun@bytedance.com, punit.agrawal@bytedance.com, linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org, Xu Lu Subject: [RFC PATCH V1 06/11] riscv: Distinguish pmd huge pte and napot huge pte Date: Thu, 23 Nov 2023 14:57:03 +0800 Message-Id: <20231123065708.91345-7-luxu.kernel@bytedance.com> X-Mailer: git-send-email 2.39.3 (Apple Git-145) In-Reply-To: <20231123065708.91345-1-luxu.kernel@bytedance.com> References: <20231123065708.91345-1-luxu.kernel@bytedance.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" There exist two kinds of huge pte on RISC-V: pmd and napot pte. For pmd kind huge pte, the huge page it represents is much larger than base page. Thus pmd kind huge pte can be represented via a single pmd entry and needs no special handling. 
For napot kind huge pte, it is actually several normal pte entries encoded in Svnapot format. Thus napot kind huge pte should be represented via pte_t struct and handled via pte operations. This commit distinguishes these two kinds of huge pte and handles them with different operations. Signed-off-by: Xu Lu --- arch/riscv/include/asm/hugetlb.h | 71 +++++++++++++++++++++++++++++++- arch/riscv/mm/hugetlbpage.c | 40 ++++++++++++------ 2 files changed, 97 insertions(+), 14 deletions(-) diff --git a/arch/riscv/include/asm/hugetlb.h b/arch/riscv/include/asm/huge= tlb.h index 4c5b0e929890..1cdd5a26e6d4 100644 --- a/arch/riscv/include/asm/hugetlb.h +++ b/arch/riscv/include/asm/hugetlb.h @@ -4,6 +4,7 @@ =20 #include #include +#include =20 static inline void arch_clear_hugepage_flags(struct page *page) { @@ -12,6 +13,7 @@ static inline void arch_clear_hugepage_flags(struct page = *page) #define arch_clear_hugepage_flags arch_clear_hugepage_flags =20 #ifdef CONFIG_RISCV_ISA_SVNAPOT + #define __HAVE_ARCH_HUGE_PTE_CLEAR void huge_pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep, unsigned long sz); @@ -41,10 +43,77 @@ int huge_ptep_set_access_flags(struct vm_area_struct *v= ma, #define __HAVE_ARCH_HUGE_PTEP_GET pte_t huge_ptep_get(pte_t *ptep); =20 +#define __HAVE_ARCH_HUGE_PTEP_GET_LOCKLESS +static inline pte_t huge_ptep_get_lockless(pte_t *ptep) +{ + unsigned long pteval =3D READ_ONCE(ptep->ptes[0]); + + return __pte(pteval); +} + pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags= ); #define arch_make_huge_pte arch_make_huge_pte =20 -#endif /*CONFIG_RISCV_ISA_SVNAPOT*/ +#else /* CONFIG_RISCV_ISA_SVNAPOT */ + +#define __HAVE_ARCH_HUGE_PTEP_GET +static inline pte_t huge_ptep_get(pte_t *ptep) +{ + pmd_t *pmdp =3D (pmd_t *)ptep; + + return pmd_pte(pmdp_get(pdmp)); +} + +#define __HAVE_ARCH_HUGE_PTEP_GET_LOCKLESS +static inline pte_t huge_ptep_get_lockless(pte_t *ptep) +{ + return huge_ptep_get(ptep); +} + +#define __HAVE_ARCH_HUGE_SET_HUGE_PTE_AT +static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long add= r, + pte_t *ptep, pte_t pte) +{ + set_pmd_at(mm, addr, (pmd_t *)ptep, pte_pmd(pte)); +} + +#define __HAVE_ARCH_HUGE_PTEP_SET_ACCESS_FLAGS +static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep, + pte_t pte, int dirty) +{ + return pmdp_set_access_flags(vma, addr, (pmd_t *)ptep, pte_pmd(pte), dirt= y); +} + +#define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR +static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm, + unsigned long addr, pte_t *ptep) +{ + return pmd_pte(pmdp_get_and_clear(mm, addr, (pmd_t *)ptep)); +} + +#define __HAVE_ARCH_HUGE_PTEP_SET_WRPROTECT +static inline void huge_ptep_set_wrprotect(struct mm_struct *mm, + unsigned long addr, pte_t *ptep) +{ + pmdp_set_wrprotect(mm, addr, (pmd_t *)ptep); +} + +#define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH +static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep) +{ + return pmd_pte(pmdp_clear_flush(vma, addr, (pmd_t *)ptep)); +} + +#define __HAVE_ARCH_HUGE_PTE_CLEAR +static inline void huge_pte_clear(struct mm_struct *mm, unsigned long addr, + pte_t *ptep, unsigned long sz) +{ + pmd_clear((pmd_t *)ptep); +} + +#endif /* CONFIG_RISCV_ISA_SVNAPOT */ =20 #include =20 diff --git a/arch/riscv/mm/hugetlbpage.c b/arch/riscv/mm/hugetlbpage.c index 67fd71c36853..4a2ad8657502 100644 --- a/arch/riscv/mm/hugetlbpage.c +++ b/arch/riscv/mm/hugetlbpage.c @@ -7,8 +7,13 @@ pte_t huge_ptep_get(pte_t *ptep) { 
unsigned long pte_num; int i; - pte_t orig_pte =3D ptep_get(ptep); + pmd_t *pmdp =3D (pmd_t *)ptep; + pte_t orig_pte =3D pmd_pte(pmdp_get(pmdp)); =20 + /* + * Non napot pte indicates a middle page table entry and + * should be treated as a pmd. + */ if (!pte_present(orig_pte) || !pte_napot(orig_pte)) return orig_pte; =20 @@ -198,6 +203,8 @@ void set_huge_pte_at(struct mm_struct *mm, hugepage_shift =3D PAGE_SHIFT; =20 pte_num =3D sz >> hugepage_shift; + if (pte_num =3D=3D 1) + set_pmd_at(mm, addr, (pmd_t *)ptep, pte_pmd(pte)); for (i =3D 0; i < pte_num; i++, ptep++, addr +=3D (1 << hugepage_shift)) set_pte_at(mm, addr, ptep, pte); } @@ -214,7 +221,8 @@ int huge_ptep_set_access_flags(struct vm_area_struct *v= ma, int i, pte_num; =20 if (!pte_napot(pte)) - return ptep_set_access_flags(vma, addr, ptep, pte, dirty); + return pmdp_set_access_flags(vma, addr, (pmd_t *)ptep, + pte_pmd(pte), dirty); =20 order =3D napot_cont_order(pte); pte_num =3D napot_pte_num(order); @@ -237,11 +245,12 @@ pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep) { - pte_t orig_pte =3D ptep_get(ptep); + pmd_t *pmdp =3D (pmd_t *)ptep; + pte_t orig_pte =3D pmd_pte(pmdp_get(pmdp)); int pte_num; =20 if (!pte_napot(orig_pte)) - return ptep_get_and_clear(mm, addr, ptep); + return pmd_pte(pmdp_get_and_clear(mm, addr, pmdp)); =20 pte_num =3D napot_pte_num(napot_cont_order(orig_pte)); =20 @@ -252,13 +261,14 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep) { - pte_t pte =3D ptep_get(ptep); + pmd_t *pmdp =3D (pmd_t *)ptep; + pte_t pte =3D pmd_pte(pmdp_get(pmdp)); unsigned long order; pte_t orig_pte; int i, pte_num; =20 if (!pte_napot(pte)) { - ptep_set_wrprotect(mm, addr, ptep); + pmdp_set_wrprotect(mm, addr, pmdp); return; } =20 @@ -277,11 +287,12 @@ pte_t huge_ptep_clear_flush(struct vm_area_struct *vm= a, unsigned long addr, pte_t *ptep) { - pte_t pte =3D ptep_get(ptep); + pmd_t *pmdp =3D (pmd_t *)ptep; + pte_t pte =3D pmd_pte(pmdp_get(pmdp)); int pte_num; =20 if (!pte_napot(pte)) - return ptep_clear_flush(vma, addr, ptep); + return pmd_pte(pmdp_clear_flush(vma, addr, pmdp)); =20 pte_num =3D napot_pte_num(napot_cont_order(pte)); =20 @@ -293,11 +304,12 @@ void huge_pte_clear(struct mm_struct *mm, pte_t *ptep, unsigned long sz) { - pte_t pte =3D ptep_get(ptep); + pmd_t *pmdp =3D (pmd_t *)ptep; + pte_t pte =3D pmd_pte(pmdp_get(pmdp)); int i, pte_num; =20 if (!pte_napot(pte)) { - pte_clear(mm, addr, ptep); + pmd_clear(pmdp); return; } =20 @@ -325,8 +337,10 @@ static __init int napot_hugetlbpages_init(void) if (has_svnapot()) { unsigned long order; =20 - for_each_napot_order(order) - hugetlb_add_hstate(order); + for_each_napot_order(order) { + if (napot_cont_shift(order) > PAGE_SHIFT) + hugetlb_add_hstate(order); + } } return 0; } @@ -357,7 +371,7 @@ bool __init arch_hugetlb_valid_size(unsigned long size) return true; else if (IS_ENABLED(CONFIG_64BIT) && size =3D=3D PUD_SIZE) return true; - else if (is_napot_size(size)) + else if (is_napot_size(size) && size > PAGE_SIZE) return true; else return false; --=20 2.20.1 From nobody Mon Dec 29 20:12:48 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B7CFAC61D85 for ; Thu, 23 Nov 2023 06:58:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235318AbjKWG6s (ORCPT ); Thu, 23 Nov 2023 01:58:48 -0500 
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49192 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235234AbjKWG6a (ORCPT ); Thu, 23 Nov 2023 01:58:30 -0500 Received: from mail-pf1-x434.google.com (mail-pf1-x434.google.com [IPv6:2607:f8b0:4864:20::434]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 889C710EA for ; Wed, 22 Nov 2023 22:58:04 -0800 (PST) Received: by mail-pf1-x434.google.com with SMTP id d2e1a72fcca58-6cbb71c3020so1197838b3a.1 for ; Wed, 22 Nov 2023 22:58:04 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1700722684; x=1701327484; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=WlPuhP739U6c+OrO0VAY/Be/8JPEzg6mBNajKv/ssXQ=; b=MWj3t1HpN0T63MipFgSnipVRxf1ih7zKLOOlkA5Rwxh9CZjwq5u7RaduC6I9/qGRtD L/yWLpP0aZra7tH1NAsWobtz2rjVXPUNeyzj/YcmXAE0tifgBi+G1f37pLfn+1cGdueR /lX/uS0OK+aY47RnifR3kV2j2hfvIoGe36DceubcBeS5VlxLgu8jGhlS/oZJFQgRN+lF diqnAvAYnXMwItkqhaaB5DKklvm8T9vCAK01awmTf9XCI/Cei7bOzvL+XV9rdEmfVyMv Pej3sZcOvPGT4NNZ8Cokgne+Wsr06ZAWfRNqRuKzr/DFHxCzgeig/A8U2Ukwv+N3VFMj TJ9A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1700722684; x=1701327484; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=WlPuhP739U6c+OrO0VAY/Be/8JPEzg6mBNajKv/ssXQ=; b=uzDzyhoPZDc3toOcX5tcowGzfEAZlgfI2fYovgO2T5Tpx61SBMJl4jIOh9wBNq4dyA cAs4GgzI+0jxshfm1DlE+qHmMjPxLMLvaCpCjiS0QlYeEtKNr814sevWDszxliEh2oOm sZE994X4DRQgzBcXMFF9vQQmuYKwHBL//BRtNW1YeptZKmcjNPK1/ikF+aV1N/l5021v RkoweRpzBdPI+6ApQEG7S92rXbvza10v3WrCG5eS4uWU2wdHKozHywrwF5hH6oU56Hh+ uKyhElSaQTNjK2ocqboqt8cEXJGqyhdBB7DnOXjyfH1fOS4yq93KIyJ21SM/LgzpkNLe VTCw== X-Gm-Message-State: AOJu0YzUp93/gr2BX7GjmfjgrPK+aCYPeTF6mp0z047Pph9m3TJXnHF2 8hC9s5O/zbM58jrIjIxaaU+Hu1LirFyemLUpf6I= X-Google-Smtp-Source: AGHT+IE8riuXo5L/cjCywKIfFVpKle7h5wIh28b47x4mdmNZCzuyeGv68fbHbrOiwUZ6PsJ9tMJIhw== X-Received: by 2002:a05:6a20:7484:b0:187:72e7:6d98 with SMTP id p4-20020a056a20748400b0018772e76d98mr2523467pzd.3.1700722683901; Wed, 22 Nov 2023 22:58:03 -0800 (PST) Received: from J9GPGXL7NT.bytedance.net ([139.177.225.230]) by smtp.gmail.com with ESMTPSA id w37-20020a634765000000b005bd2b3a03eesm615437pgk.6.2023.11.22.22.57.59 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Wed, 22 Nov 2023 22:58:03 -0800 (PST) From: Xu Lu To: paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu, ardb@kernel.org, anup@brainfault.org, atishp@atishpatra.org Cc: dengliang.1214@bytedance.com, xieyongji@bytedance.com, lihangjing@bytedance.com, songmuchun@bytedance.com, punit.agrawal@bytedance.com, linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org, Xu Lu Subject: [RFC PATCH V1 07/11] riscv: Adapt satp operations to gap between hw page and sw page Date: Thu, 23 Nov 2023 14:57:04 +0800 Message-Id: <20231123065708.91345-8-luxu.kernel@bytedance.com> X-Mailer: git-send-email 2.39.3 (Apple Git-145) In-Reply-To: <20231123065708.91345-1-luxu.kernel@bytedance.com> References: <20231123065708.91345-1-luxu.kernel@bytedance.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" The control register CSR_SATP on RISC-V, which points to the root page table page, is 
used by MMU to translate va to pa when TLB miss happens. Thus it should be encoded at a granularity of hardware page, while existing code usually encodes it via software page frame number. This commit corrects encoding operations of CSR_SATP register. To get developers rid of the annoying encoding format of CSR_SATP and the conversion between sw pfn and hw pfn, we abstract the encoding operations of CSR_SATP into a specific function. Signed-off-by: Xu Lu --- arch/riscv/include/asm/pgtable.h | 7 +++++++ arch/riscv/kernel/head.S | 4 ++-- arch/riscv/kernel/hibernate.c | 3 ++- arch/riscv/mm/context.c | 7 +++---- arch/riscv/mm/fault.c | 1 + arch/riscv/mm/init.c | 7 +++++-- arch/riscv/mm/kasan_init.c | 7 +++++-- 7 files changed, 25 insertions(+), 11 deletions(-) diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgta= ble.h index 9f81fe046cb8..56366f07985d 100644 --- a/arch/riscv/include/asm/pgtable.h +++ b/arch/riscv/include/asm/pgtable.h @@ -213,6 +213,13 @@ extern pgd_t swapper_pg_dir[]; extern pgd_t trampoline_pg_dir[]; extern pgd_t early_pg_dir[]; =20 +static inline unsigned long make_satp(unsigned long pfn, + unsigned long asid, unsigned long satp_mode) +{ + return (pfn_to_hwpfn(pfn) | + ((asid & SATP_ASID_MASK) << SATP_ASID_SHIFT) | satp_mode); +} + static __always_inline int __pte_present(unsigned long pteval) { return (pteval & (_PAGE_PRESENT | _PAGE_PROT_NONE)); diff --git a/arch/riscv/kernel/head.S b/arch/riscv/kernel/head.S index b77397432403..dace2e4e6164 100644 --- a/arch/riscv/kernel/head.S +++ b/arch/riscv/kernel/head.S @@ -87,7 +87,7 @@ relocate_enable_mmu: csrw CSR_TVEC, a2 =20 /* Compute satp for kernel page tables, but don't load it yet */ - srl a2, a0, PAGE_SHIFT + srl a2, a0, HW_PAGE_SHIFT la a1, satp_mode REG_L a1, 0(a1) or a2, a2, a1 @@ -100,7 +100,7 @@ relocate_enable_mmu: */ la a0, trampoline_pg_dir XIP_FIXUP_OFFSET a0 - srl a0, a0, PAGE_SHIFT + srl a0, a0, HW_PAGE_SHIFT or a0, a0, a1 sfence.vma csrw CSR_SATP, a0 diff --git a/arch/riscv/kernel/hibernate.c b/arch/riscv/kernel/hibernate.c index 671b686c0158..155be6b1d32c 100644 --- a/arch/riscv/kernel/hibernate.c +++ b/arch/riscv/kernel/hibernate.c @@ -395,7 +395,8 @@ int swsusp_arch_resume(void) if (ret) return ret; =20 - hibernate_restore_image(resume_hdr.saved_satp, (PFN_DOWN(__pa(resume_pg_d= ir)) | satp_mode), + hibernate_restore_image(resume_hdr.saved_satp, + make_satp(PFN_DOWN(__pa(resume_pg_dir)), 0, satp_mode), resume_hdr.restore_cpu_addr); =20 return 0; diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c index 217fd4de6134..2ecf87433dfc 100644 --- a/arch/riscv/mm/context.c +++ b/arch/riscv/mm/context.c @@ -190,9 +190,8 @@ static void set_mm_asid(struct mm_struct *mm, unsigned = int cpu) raw_spin_unlock_irqrestore(&context_lock, flags); =20 switch_mm_fast: - csr_write(CSR_SATP, virt_to_pfn(mm->pgd) | - ((cntx & asid_mask) << SATP_ASID_SHIFT) | - satp_mode); + csr_write(CSR_SATP, make_satp(virt_to_pfn(mm->pgd), (cntx & asid_mask), + satp_mode)); =20 if (need_flush_tlb) local_flush_tlb_all(); @@ -201,7 +200,7 @@ static void set_mm_asid(struct mm_struct *mm, unsigned = int cpu) static void set_mm_noasid(struct mm_struct *mm) { /* Switch the page table and blindly nuke entire local TLB */ - csr_write(CSR_SATP, virt_to_pfn(mm->pgd) | satp_mode); + csr_write(CSR_SATP, make_satp(virt_to_pfn(mm->pgd), 0, satp_mode)); local_flush_tlb_all(); } =20 diff --git a/arch/riscv/mm/fault.c b/arch/riscv/mm/fault.c index 90d4ba36d1d0..026ac007febf 100644 --- a/arch/riscv/mm/fault.c +++ 
b/arch/riscv/mm/fault.c @@ -133,6 +133,7 @@ static inline void vmalloc_fault(struct pt_regs *regs, = int code, unsigned long a */ index =3D pgd_index(addr); pfn =3D csr_read(CSR_SATP) & SATP_PPN; + pfn =3D hwpfn_to_pfn(pfn); pgd =3D (pgd_t *)pfn_to_virt(pfn) + index; pgd_k =3D init_mm.pgd + index; =20 diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c index a768b2b3ff05..c33a90d0c51d 100644 --- a/arch/riscv/mm/init.c +++ b/arch/riscv/mm/init.c @@ -805,7 +805,7 @@ static __init void set_satp_mode(uintptr_t dtb_pa) (uintptr_t)early_p4d : (uintptr_t)early_pud, PGDIR_SIZE, PAGE_TABLE); =20 - identity_satp =3D PFN_DOWN((uintptr_t)&early_pg_dir) | satp_mode; + identity_satp =3D make_satp(PFN_DOWN((uintptr_t)&early_pg_dir), 0, satp_m= ode); =20 local_flush_tlb_all(); csr_write(CSR_SATP, identity_satp); @@ -1285,6 +1285,8 @@ static void __init create_linear_mapping_page_table(v= oid) =20 static void __init setup_vm_final(void) { + unsigned long satp; + /* Setup swapper PGD for fixmap */ #if !defined(CONFIG_64BIT) /* @@ -1318,7 +1320,8 @@ static void __init setup_vm_final(void) clear_fixmap(FIX_P4D); =20 /* Move to swapper page table */ - csr_write(CSR_SATP, PFN_DOWN(__pa_symbol(swapper_pg_dir)) | satp_mode); + satp =3D make_satp(PFN_DOWN(__pa_symbol(swapper_pg_dir)), 0, satp_mode); + csr_write(CSR_SATP, satp); local_flush_tlb_all(); =20 pt_ops_set_late(); diff --git a/arch/riscv/mm/kasan_init.c b/arch/riscv/mm/kasan_init.c index 5e39dcf23fdb..72269e9f1964 100644 --- a/arch/riscv/mm/kasan_init.c +++ b/arch/riscv/mm/kasan_init.c @@ -471,11 +471,13 @@ static void __init create_tmp_mapping(void) =20 void __init kasan_init(void) { + unsigned long satp; phys_addr_t p_start, p_end; u64 i; =20 create_tmp_mapping(); - csr_write(CSR_SATP, PFN_DOWN(__pa(tmp_pg_dir)) | satp_mode); + satp =3D make_satp(PFN_DOWN(__pa(tmp_pg_dir)), 0, satp_mode); + csr_write(CSR_SATP, satp); =20 kasan_early_clear_pgd(pgd_offset_k(KASAN_SHADOW_START), KASAN_SHADOW_START, KASAN_SHADOW_END); @@ -520,6 +522,7 @@ void __init kasan_init(void) memset(kasan_early_shadow_page, KASAN_SHADOW_INIT, PAGE_SIZE); init_task.kasan_depth =3D 0; =20 - csr_write(CSR_SATP, PFN_DOWN(__pa(swapper_pg_dir)) | satp_mode); + satp =3D make_satp(PFN_DOWN(__pa(swapper_pg_dir)), 0, satp_mode); + csr_write(CSR_SATP, satp); local_flush_tlb_all(); } --=20 2.20.1 From nobody Mon Dec 29 20:12:48 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7B385C5AD4C for ; Thu, 23 Nov 2023 06:59:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235322AbjKWG6y (ORCPT ); Thu, 23 Nov 2023 01:58:54 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49480 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234509AbjKWG6f (ORCPT ); Thu, 23 Nov 2023 01:58:35 -0500 Received: from mail-pf1-x431.google.com (mail-pf1-x431.google.com [IPv6:2607:f8b0:4864:20::431]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DB109D53 for ; Wed, 22 Nov 2023 22:58:09 -0800 (PST) Received: by mail-pf1-x431.google.com with SMTP id d2e1a72fcca58-6cbe716b511so212930b3a.3 for ; Wed, 22 Nov 2023 22:58:09 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1700722689; x=1701327489; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to 
:message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=LItBiu3njgwOLf534DBFrKwLVF0r3rYUowPJh3TGf00=; b=UzGdsOSz9JSDfHRi5KKnT4dlZtIi2nhA7QGLNCAgBQOMZQKID2pPvTO29f/rPAFOXq EK7NkQN3aL+S8AtA/bpAD9ehTJAz+JwOGcsKBapLl81gCz/IpCOqVRS9Oi9JaJyUbHCP Sn127m6mYRPrTCbXtvIoFwqaHDm0aDV4f1SnTmtsOFYQmw2vC6Nfi/RX+BclfoyX3OB8 mCAqes1YYDgFwLzs0v4U2UnvejN/E+9kNTdBa5sgFgdRUNsWGCFzS8rAYx6SfQs5rPcp 6sv0RJu36k5rxfLIJaKG1ZfBbQIUBJkSxTrYnPUB9ty8+3/athaoDftvkM/9vUyzNl0+ FGGw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1700722689; x=1701327489; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=LItBiu3njgwOLf534DBFrKwLVF0r3rYUowPJh3TGf00=; b=PEnjSCEyovcy3U8z3dEC2i+m3prDBY1EWc+7aFk/FJzL/tjiSRCvGs2b5yE1Dj+CcC DjUuvECFMH3OefINqFpsh0h5SBOflxKW4VSqLfySUGAXk8b6EXG0x8sQOVefPYCQGcIk miJcghsgcX9R2ciwXYe8maXfnYi0yrKBeVhuQ/mDNlsVSNA+9csSv69cL6wjsj+WGnuR EnnjDr8zB4fNfEqWE7EmMu+JxrpFdDRapdt5D6sYhr5Tg/W3Da3dRd7F3QZTJNONFfJ+ s8tknts+I0dueCLyZidFvm8uB/xbFdYLPntCLC2ffcIIz95H41aS0OGbTTL7dNqzOFRT 4ckw== X-Gm-Message-State: AOJu0Yy2qwwNEiLqj8R4jsXlv1Obf6cE2dFUJm8eKWTGUAR/6nR/FjDN c2n26LoZUXzU9z8jhg/atIjJgw== X-Google-Smtp-Source: AGHT+IHDftjTLrWgyN3MNpT75XgaN32WdbFQKAWGC+MLKZT2N1z3VTeVkMs+OclMTvua0pcY3LDEGg== X-Received: by 2002:a05:6a00:35cf:b0:6cb:8c70:4790 with SMTP id dc15-20020a056a0035cf00b006cb8c704790mr5824334pfb.1.1700722689341; Wed, 22 Nov 2023 22:58:09 -0800 (PST) Received: from J9GPGXL7NT.bytedance.net ([139.177.225.230]) by smtp.gmail.com with ESMTPSA id w37-20020a634765000000b005bd2b3a03eesm615437pgk.6.2023.11.22.22.58.04 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Wed, 22 Nov 2023 22:58:09 -0800 (PST) From: Xu Lu To: paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu, ardb@kernel.org, anup@brainfault.org, atishp@atishpatra.org Cc: dengliang.1214@bytedance.com, xieyongji@bytedance.com, lihangjing@bytedance.com, songmuchun@bytedance.com, punit.agrawal@bytedance.com, linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org, Xu Lu Subject: [RFC PATCH V1 08/11] riscv: Apply Svnapot for base page mapping Date: Thu, 23 Nov 2023 14:57:05 +0800 Message-Id: <20231123065708.91345-9-luxu.kernel@bytedance.com> X-Mailer: git-send-email 2.39.3 (Apple Git-145) In-Reply-To: <20231123065708.91345-1-luxu.kernel@bytedance.com> References: <20231123065708.91345-1-luxu.kernel@bytedance.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" The Svnapot extension on RISC-V is like contiguous PTE on ARM64. It allows ptes of a naturally aligned power-of 2 (NAPOT) memory range be encoded in the same format to save the TLB space. This commit applies Svnapot for each base page's mapping. This commit is the key to achieving larger base page's performance optimization. 
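As an illustration of the encoding this relies on, the standalone sketch below mirrors the __pte_mknapot() helper added further down. The constant values (_PAGE_PFN_SHIFT of 10, NAPOT flag in bit 63) and the 64-bit unsigned long are assumptions matching riscv64 rather than text quoted from the kernel headers, so treat it as a sketch only: for a 64K software page built from 16 contiguous 4K hardware pages (order 4), the low four pfn bits are replaced by the single bit that encodes the range size.

/*
 * Standalone sketch (not part of the patch): mirrors __pte_mknapot().
 * _PAGE_PFN_SHIFT = 10 and the NAPOT flag in bit 63 are assumptions
 * taken from riscv64; GENMASK/BIT are redefined for user space.
 */
#include <stdio.h>

#define BIT(nr)		(1UL << (nr))
#define GENMASK(h, l)	(((~0UL) << (l)) & (~0UL >> (63 - (h))))

#define _PAGE_PFN_SHIFT	10
#define _PAGE_NAPOT	BIT(63)

static unsigned long pte_mknapot_sketch(unsigned long pteval, unsigned int order)
{
	int pos = order - 1 + _PAGE_PFN_SHIFT;
	unsigned long napot_bit = BIT(pos);
	unsigned long napot_mask = ~GENMASK(pos, _PAGE_PFN_SHIFT);

	/*
	 * Clear the pfn bits covered by the 2^order range, then set the
	 * bit that encodes the order plus the NAPOT flag itself.
	 */
	return (pteval & napot_mask) | napot_bit | _PAGE_NAPOT;
}

int main(void)
{
	/* pfn 0x12340 (16-aligned) with some illustrative protection bits */
	unsigned long pte = (0x12340UL << _PAGE_PFN_SHIFT) | 0xef;

	/* order 4: one 64K software page = 16 contiguous 4K hardware pages */
	printf("plain pte: 0x%016lx\n", pte);
	printf("napot pte: 0x%016lx\n", pte_mknapot_sketch(pte, 4));
	return 0;
}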
Signed-off-by: Xu Lu --- arch/riscv/include/asm/pgtable.h | 34 +++++++++++++++++++++++++++----- 1 file changed, 29 insertions(+), 5 deletions(-) diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgta= ble.h index 56366f07985d..803dc5fb6314 100644 --- a/arch/riscv/include/asm/pgtable.h +++ b/arch/riscv/include/asm/pgtable.h @@ -230,6 +230,16 @@ static __always_inline unsigned long __pte_napot(unsig= ned long pteval) return pteval & _PAGE_NAPOT; } =20 +static __always_inline unsigned long __pte_mknapot(unsigned long pteval, + unsigned int order) +{ + int pos =3D order - 1 + _PAGE_PFN_SHIFT; + unsigned long napot_bit =3D BIT(pos); + unsigned long napot_mask =3D ~GENMASK(pos, _PAGE_PFN_SHIFT); + + return (pteval & napot_mask) | napot_bit | _PAGE_NAPOT; +} + static inline pte_t __pte(unsigned long pteval) { pte_t pte; @@ -348,13 +358,11 @@ static inline unsigned long pte_napot(pte_t pte) return __pte_napot(pte_val(pte)); } =20 -static inline pte_t pte_mknapot(pte_t pte, unsigned int order) +static inline pte_t pte_mknapot(pte_t pte, unsigned int page_order) { - int pos =3D order - 1 + _PAGE_PFN_SHIFT; - unsigned long napot_bit =3D BIT(pos); - unsigned long napot_mask =3D ~GENMASK(pos, _PAGE_PFN_SHIFT); + unsigned int hw_page_order =3D page_order + (PAGE_SHIFT - HW_PAGE_SHIFT); =20 - return __pte((pte_val(pte) & napot_mask) | napot_bit | _PAGE_NAPOT); + return __pte(__pte_mknapot(pte_val(pte), hw_page_order)); } =20 #else @@ -366,6 +374,11 @@ static inline unsigned long pte_napot(pte_t pte) return 0; } =20 +static inline pte_t pte_mknapot(pte_t pte, unsigned int page_order) +{ + return pte; +} + #endif /* CONFIG_RISCV_ISA_SVNAPOT */ =20 /* Yields the page frame number (PFN) of a page table entry */ @@ -585,6 +598,17 @@ static inline int pte_same(pte_t pte_a, pte_t pte_b) */ static inline void set_pte(pte_t *ptep, pte_t pteval) { + unsigned long order; + + /* + * has_svnapot() always return false before riscv_isa is initialized. 
+ */ + if (has_svnapot() && pte_present(pteval) && !pte_napot(pteval)) { + for_each_napot_order(order) { + if (napot_cont_shift(order) =3D=3D PAGE_SHIFT) + pteval =3D pte_mknapot(pteval, order); + } + } *ptep =3D pteval; } =20 --=20 2.20.1 From nobody Mon Dec 29 20:12:48 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4E4B4C5AD4C for ; Thu, 23 Nov 2023 06:59:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233019AbjKWG7J (ORCPT ); Thu, 23 Nov 2023 01:59:09 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50044 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233005AbjKWG6m (ORCPT ); Thu, 23 Nov 2023 01:58:42 -0500 Received: from mail-oi1-x22b.google.com (mail-oi1-x22b.google.com [IPv6:2607:f8b0:4864:20::22b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 89E8D173A for ; Wed, 22 Nov 2023 22:58:15 -0800 (PST) Received: by mail-oi1-x22b.google.com with SMTP id 5614622812f47-3b2f4a5ccebso397698b6e.3 for ; Wed, 22 Nov 2023 22:58:15 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1700722695; x=1701327495; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=Xy98WgGNwj9rOwWzhsI6VzEfMnk+JFUolM0E3kCQh54=; b=MB/rSC/3YZADVKTafFmvFV3jrY93Z0k8QNx3OSQcfLT/4ksxpUwtrwR3SmnG8TPxYx fq/ayY6XLaToLZ7QVLJoqphbgRrgYVtjqHEA7j2kvLBi+2OwhIt7fK3kUnBGv4z36PYb RTj2fTGlHpY/hExfVb3KJOhTIZS9Kxb8mqa+Ow7NKwov2UHX8M/6FCjZmI4k4yIdmHsP E38IHohxg3UvBdQoP78c8wiNIltyvITUyh8euXvqbxNzEU6G7ZK5F665jEK9VYgb5z33 XtnAKDtuKujnojVMEaV+YSsmiN2gbRiiNpFV4EXAHcXeuSS8nNVw7cFY0R9Aa9+QkLV8 1ekQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1700722695; x=1701327495; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Xy98WgGNwj9rOwWzhsI6VzEfMnk+JFUolM0E3kCQh54=; b=EW8C7hTHuHb/XJLbGkliqKV5aVZFSfyp68KrSx6rEoYMY1rZy/3OjnLrgF78ebaQ8K DFMUtaf1S/erYqWb/198Iaz4S7xGpvd9Oj/1G/aBzwf5poOF2ca9pIks+s8lQVA2hVeK Y3/0vd/iembImm5JJ34t7ctxYdTxp/O6SeSWv/RNtUFY/KPgKKyDS3Pjtr47a8rCjBid rmjFsKtAOV2SlftZMsWfBy6eG25YVtUw6+dlm5dhjlSJGFUzACokD/8EZGXhfWRuYE5g rFWvCPhyCuB1Uymz9MG2KSgZi1Zrxt1Vmv/r6G0BpwKYLuUxsjpRjYXW6xoLJE6wJHlG kEcA== X-Gm-Message-State: AOJu0Yw5+79+K70raMuyMfkVBBncbC/gWbpgbSJIVQoXtO2fmiuHNn3S evxJ/iHscl/yaNJxHncK8jCJNg== X-Google-Smtp-Source: AGHT+IEh9UjPGyaaQjuaIBLcYWiSd7ay7EOeP118dTtKlJ+CAM34oaGt+k986nYAGOfCpnNSdObjog== X-Received: by 2002:a05:6808:ab8:b0:3b2:f557:666e with SMTP id r24-20020a0568080ab800b003b2f557666emr5093225oij.19.1700722694829; Wed, 22 Nov 2023 22:58:14 -0800 (PST) Received: from J9GPGXL7NT.bytedance.net ([139.177.225.230]) by smtp.gmail.com with ESMTPSA id w37-20020a634765000000b005bd2b3a03eesm615437pgk.6.2023.11.22.22.58.09 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Wed, 22 Nov 2023 22:58:14 -0800 (PST) From: Xu Lu To: paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu, ardb@kernel.org, anup@brainfault.org, atishp@atishpatra.org Cc: dengliang.1214@bytedance.com, xieyongji@bytedance.com, lihangjing@bytedance.com, songmuchun@bytedance.com, punit.agrawal@bytedance.com, 
linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org, Xu Lu Subject: [RFC PATCH V1 09/11] riscv: Adjust fix_btmap slots number to match variable page size Date: Thu, 23 Nov 2023 14:57:06 +0800 Message-Id: <20231123065708.91345-10-luxu.kernel@bytedance.com> X-Mailer: git-send-email 2.39.3 (Apple Git-145) In-Reply-To: <20231123065708.91345-1-luxu.kernel@bytedance.com> References: <20231123065708.91345-1-luxu.kernel@bytedance.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" The existing fixmap slot number will cause the fixmap size to exceed FIX_FDT_SIZE when base page becomes larger than 4K. This patch adjusts the slot number to make them always match. Signed-off-by: Xu Lu --- arch/riscv/include/asm/fixmap.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/riscv/include/asm/fixmap.h b/arch/riscv/include/asm/fixma= p.h index 0a55099bb734..17bf31334bd5 100644 --- a/arch/riscv/include/asm/fixmap.h +++ b/arch/riscv/include/asm/fixmap.h @@ -44,7 +44,8 @@ enum fixed_addresses { * before ioremap() is functional. */ #define NR_FIX_BTMAPS (SZ_256K / PAGE_SIZE) -#define FIX_BTMAPS_SLOTS 7 +#define FIX_BTMAPS_SIZE (FIXADDR_SIZE - ((FIX_BTMAP_END + 1) << PAGE_SHIF= T)) +#define FIX_BTMAPS_SLOTS (FIX_BTMAPS_SIZE / SZ_256K) #define TOTAL_FIX_BTMAPS (NR_FIX_BTMAPS * FIX_BTMAPS_SLOTS) =20 FIX_BTMAP_END =3D __end_of_permanent_fixed_addresses, --=20 2.20.1 From nobody Mon Dec 29 20:12:48 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 60405C61D85 for ; Thu, 23 Nov 2023 06:59:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235352AbjKWG7Y (ORCPT ); Thu, 23 Nov 2023 01:59:24 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49378 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235294AbjKWG6x (ORCPT ); Thu, 23 Nov 2023 01:58:53 -0500 Received: from mail-ot1-x335.google.com (mail-ot1-x335.google.com [IPv6:2607:f8b0:4864:20::335]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 019141991 for ; Wed, 22 Nov 2023 22:58:20 -0800 (PST) Received: by mail-ot1-x335.google.com with SMTP id 46e09a7af769-6d7f225819eso357307a34.1 for ; Wed, 22 Nov 2023 22:58:20 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1700722700; x=1701327500; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=VzxcE+HP292pep10tUlqLGBeuvx8vBA9uQw9Wbs1mQo=; b=TjgpT4e6kpbUdqTL+xtyfLfShYeFP0N8kCdceChql3R4ffJwzf8MbEJlru1egG+Yk2 OG4wx7A6QCv2HP75asEE/KGp9f9OSzJg1+C0PsJGljdDDTZAkpIPUhBhwTOrYz5qn5ni P8sJ/es0IXS+xGp1Znc2ZlA/JYTxmBG0U1gBwO+zh+qegN4T7A+Fu0BAcZ5FI7Yt2Svi 2K3rLLbM2dSaiTURQAbiQ7YqpDBegL+uvH5aZQIVsd20A80rsipoBMbPTjMRJMPT/Y2R +/ZqZbLXxhM5/CE75GOQegp1UnOa767ZGPKZaqgX16sV9HAymoHv/lHmEhfZWgwI/Sst UC4Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1700722700; x=1701327500; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=VzxcE+HP292pep10tUlqLGBeuvx8vBA9uQw9Wbs1mQo=; 
b=b+WSHz1duj6+5I+0K2cv+F9TXKvtzKccpd64izO926wBhgmW1iDAZglPgK0fg/pG3G N5ADsfTqoZ2pFMTq/3CUHyZFHJtkx0pdKFwGyXw++n4SdRp6NBY+kQi50bIUVmgkDS7a zfR1tqozkeSMSpLhqEXM0khNwErZHqa905C61k1ezKg/6+gt98XfWwLw67rV+5DOelEe k05bpxrTMdfNnGxOsbebIHzR5LODwxKKeMbcL0oZExwygVZJnelrKVT335S6EOFNoiWf G1TPF/TmHy+LwK8WVeH9+lVWpbAmMqQ1zM9LxVTwFb6+t0NjrWl0TXXqwefAZozWfswW +gDQ== X-Gm-Message-State: AOJu0YzAoKtUJmIuPvBGKfqM4GKKpGOCgn/miAmD7/u1ViPOY3Cvtqjy t1ykDBJ4Gxnoj0A4gte5odstbQ== X-Google-Smtp-Source: AGHT+IHfN56opjKVRtmf8nkhgYvS+GzTzNtB590B3ZTYABAvfpqWqq/B6z3vd8AMP9SjHn+EpQsRvw== X-Received: by 2002:a05:6830:2646:b0:6d7:f8c1:e473 with SMTP id f6-20020a056830264600b006d7f8c1e473mr1973461otu.19.1700722700289; Wed, 22 Nov 2023 22:58:20 -0800 (PST) Received: from J9GPGXL7NT.bytedance.net ([139.177.225.230]) by smtp.gmail.com with ESMTPSA id w37-20020a634765000000b005bd2b3a03eesm615437pgk.6.2023.11.22.22.58.15 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Wed, 22 Nov 2023 22:58:19 -0800 (PST) From: Xu Lu To: paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu, ardb@kernel.org, anup@brainfault.org, atishp@atishpatra.org Cc: dengliang.1214@bytedance.com, xieyongji@bytedance.com, lihangjing@bytedance.com, songmuchun@bytedance.com, punit.agrawal@bytedance.com, linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org, Xu Lu Subject: [RFC PATCH V1 10/11] riscv: kvm: Adapt kvm to gap between hw page and sw page Date: Thu, 23 Nov 2023 14:57:07 +0800 Message-Id: <20231123065708.91345-11-luxu.kernel@bytedance.com> X-Mailer: git-send-email 2.39.3 (Apple Git-145) In-Reply-To: <20231123065708.91345-1-luxu.kernel@bytedance.com> References: <20231123065708.91345-1-luxu.kernel@bytedance.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Existing mmu code in kvm handles middle level page table entry and the last level page table entry in the same way, which is insufficient when base page becomes larger. For example, for 64K base page, per pte_t contains 16 page table entries while per pmd_t still contains one and thus needs to be handled in different ways. This commit refines kvm mmu code to handle middle level page table entries and last level page table entries distinctively. 
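The size gap driving this split can be made concrete with a short standalone sketch. The type layouts here are assumptions for illustration only (huge_ptep_get_lockless() earlier in the series reads ptep->ptes[0], suggesting pte_t wraps an array of hardware entries); the arithmetic, not the exact definitions, is the point.

/*
 * Standalone sketch (not part of the patch): with a 64K software page
 * on top of 4K hardware pages, one last-level pte_t carries 16 hardware
 * entries while a middle-level pmd_t still carries a single entry, so
 * the two can no longer be handled by the same code path.
 */
#include <stdio.h>

#define HW_PAGE_SHIFT		12	/* 4K hardware page */
#define PAGE_SHIFT		16	/* 64K software page */
#define HW_PTES_PER_PTE		(1UL << (PAGE_SHIFT - HW_PAGE_SHIFT))

typedef struct { unsigned long ptes[HW_PTES_PER_PTE]; } pte_t;	/* last level */
typedef struct { unsigned long pmd; } pmd_t;			/* middle levels */

int main(void)
{
	printf("hardware entries per software pte_t: %lu\n", HW_PTES_PER_PTE);
	printf("sizeof(pte_t) = %zu bytes, sizeof(pmd_t) = %zu bytes\n",
	       sizeof(pte_t), sizeof(pmd_t));
	return 0;
}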
Signed-off-by: Xu Lu --- arch/riscv/include/asm/pgtable.h | 7 ++ arch/riscv/kvm/mmu.c | 198 +++++++++++++++++++++---------- 2 files changed, 145 insertions(+), 60 deletions(-) diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgta= ble.h index 803dc5fb6314..9bed1512b3d2 100644 --- a/arch/riscv/include/asm/pgtable.h +++ b/arch/riscv/include/asm/pgtable.h @@ -220,6 +220,13 @@ static inline unsigned long make_satp(unsigned long pf= n, ((asid & SATP_ASID_MASK) << SATP_ASID_SHIFT) | satp_mode); } =20 +static inline unsigned long make_hgatp(unsigned long pfn, + unsigned long vmid, unsigned long hgatp_mode) +{ + return ((pfn_to_hwpfn(pfn) & HGATP_PPN) | + ((vmid << HGATP_VMID_SHIFT) & HGATP_VMID) | hgatp_mode); +} + static __always_inline int __pte_present(unsigned long pteval) { return (pteval & (_PAGE_PRESENT | _PAGE_PROT_NONE)); diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c index 068c74593871..f26d3e94fe17 100644 --- a/arch/riscv/kvm/mmu.c +++ b/arch/riscv/kvm/mmu.c @@ -36,22 +36,36 @@ static unsigned long gstage_pgd_levels __ro_after_init = =3D 2; gstage_pgd_xbits) #define gstage_gpa_size ((gpa_t)(1ULL << gstage_gpa_bits)) =20 +#define gstage_pmd_leaf(__pmdp) \ + (pmd_val(pmdp_get(__pmdp)) & (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC)) + #define gstage_pte_leaf(__ptep) \ - (pte_val(*(__ptep)) & (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC)) + (pte_val(ptep_get(__ptep)) & (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC)) =20 -static inline unsigned long gstage_pte_index(gpa_t addr, u32 level) +static inline unsigned long gstage_pmd_index(gpa_t addr, u32 level) { unsigned long mask; unsigned long shift =3D HGATP_PAGE_SHIFT + (gstage_index_bits * level); =20 + BUG_ON(level =3D=3D 0); if (level =3D=3D (gstage_pgd_levels - 1)) - mask =3D (PTRS_PER_PTE * (1UL << gstage_pgd_xbits)) - 1; + mask =3D (PTRS_PER_PMD * (1UL << gstage_pgd_xbits)) - 1; else - mask =3D PTRS_PER_PTE - 1; + mask =3D PTRS_PER_PMD - 1; =20 return (addr >> shift) & mask; } =20 +static inline unsigned long gstage_pte_index(gpa_t addr, u32 level) +{ + return (addr >> PAGE_SHIFT) & (PTRS_PER_PTE - 1); +} + +static inline unsigned long gstage_pmd_page_vaddr(pmd_t pmd) +{ + return (unsigned long)pfn_to_virt(__page_val_to_pfn(pmd_val(pmd))); +} + static inline unsigned long gstage_pte_page_vaddr(pte_t pte) { return (unsigned long)pfn_to_virt(__page_val_to_pfn(pte_val(pte))); @@ -60,9 +74,13 @@ static inline unsigned long gstage_pte_page_vaddr(pte_t = pte) static int gstage_page_size_to_level(unsigned long page_size, u32 *out_lev= el) { u32 i; - unsigned long psz =3D 1UL << 12; + unsigned long psz =3D 1UL << HW_PAGE_SHIFT; =20 - for (i =3D 0; i < gstage_pgd_levels; i++) { + if (page_size =3D=3D PAGE_SIZE) { + *out_level =3D 0; + return 0; + } + for (i =3D 1; i < gstage_pgd_levels; i++) { if (page_size =3D=3D (psz << (i * gstage_index_bits))) { *out_level =3D i; return 0; @@ -77,7 +95,11 @@ static int gstage_level_to_page_order(u32 level, unsigne= d long *out_pgorder) if (gstage_pgd_levels < level) return -EINVAL; =20 - *out_pgorder =3D 12 + (level * gstage_index_bits); + if (level =3D=3D 0) + *out_pgorder =3D PAGE_SHIFT; + else + *out_pgorder =3D HW_PAGE_SHIFT + (level * gstage_index_bits); + return 0; } =20 @@ -95,30 +117,40 @@ static int gstage_level_to_page_size(u32 level, unsign= ed long *out_pgsize) } =20 static bool gstage_get_leaf_entry(struct kvm *kvm, gpa_t addr, - pte_t **ptepp, u32 *ptep_level) + void **ptepp, u32 *ptep_level) { - pte_t *ptep; + pmd_t *pmdp =3D NULL; + pte_t *ptep =3D NULL; u32 current_level =3D 
gstage_pgd_levels - 1; =20 *ptep_level =3D current_level; - ptep =3D (pte_t *)kvm->arch.pgd; - ptep =3D &ptep[gstage_pte_index(addr, current_level)]; - while (ptep && pte_val(*ptep)) { - if (gstage_pte_leaf(ptep)) { + pmdp =3D (pmd_t *)kvm->arch.pgd; + pmdp =3D &pmdp[gstage_pmd_index(addr, current_level)]; + while (current_level && pmdp && pmd_val(pmdp_get(pmdp))) { + if (gstage_pmd_leaf(pmdp)) { *ptep_level =3D current_level; - *ptepp =3D ptep; + *ptepp =3D (void *)pmdp; return true; } =20 + current_level--; + *ptep_level =3D current_level; + pmdp =3D (pmd_t *)gstage_pmd_page_vaddr(pmdp_get(pmdp)); if (current_level) { - current_level--; - *ptep_level =3D current_level; - ptep =3D (pte_t *)gstage_pte_page_vaddr(*ptep); - ptep =3D &ptep[gstage_pte_index(addr, current_level)]; + pmdp =3D &pmdp[gstage_pmd_index(addr, current_level)]; } else { - ptep =3D NULL; + ptep =3D (pte_t *)pmdp; + ptep =3D &ptep[gstage_pte_index(addr, current_level)]; } } + if (ptep && pte_val(ptep_get(ptep))) { + if (gstage_pte_leaf(ptep)) { + *ptep_level =3D current_level; + *ptepp =3D (void *)ptep; + return true; + } + ptep =3D NULL; + } =20 return false; } @@ -136,40 +168,53 @@ static void gstage_remote_tlb_flush(struct kvm *kvm, = u32 level, gpa_t addr) =20 static int gstage_set_pte(struct kvm *kvm, u32 level, struct kvm_mmu_memory_cache *pcache, - gpa_t addr, const pte_t *new_pte) + gpa_t addr, const void *new_pte) { u32 current_level =3D gstage_pgd_levels - 1; - pte_t *next_ptep =3D (pte_t *)kvm->arch.pgd; - pte_t *ptep =3D &next_ptep[gstage_pte_index(addr, current_level)]; + pmd_t *next_pmdp =3D (pmd_t *)kvm->arch.pgd; + pmd_t *pmdp =3D &next_pmdp[gstage_pmd_index(addr, current_level)]; + pte_t *next_ptep =3D NULL; + pte_t *ptep =3D NULL; =20 if (current_level < level) return -EINVAL; =20 while (current_level !=3D level) { - if (gstage_pte_leaf(ptep)) + if (gstage_pmd_leaf(pmdp)) return -EEXIST; =20 - if (!pte_val(*ptep)) { + if (!pmd_val(pmdp_get(pmdp))) { if (!pcache) return -ENOMEM; - next_ptep =3D kvm_mmu_memory_cache_alloc(pcache); - if (!next_ptep) + next_pmdp =3D kvm_mmu_memory_cache_alloc(pcache); + if (!next_pmdp) return -ENOMEM; - *ptep =3D pfn_pte(PFN_DOWN(__pa(next_ptep)), - __pgprot(_PAGE_TABLE)); + set_pmd(pmdp, pfn_pmd(PFN_DOWN(__pa(next_pmdp)), + __pgprot(_PAGE_TABLE))); } else { - if (gstage_pte_leaf(ptep)) + if (gstage_pmd_leaf(pmdp)) return -EEXIST; - next_ptep =3D (pte_t *)gstage_pte_page_vaddr(*ptep); + next_pmdp =3D (pmd_t *)gstage_pmd_page_vaddr(pmdp_get(pmdp)); } =20 current_level--; - ptep =3D &next_ptep[gstage_pte_index(addr, current_level)]; + if (current_level) { + pmdp =3D &next_pmdp[gstage_pmd_index(addr, current_level)]; + } else { + next_ptep =3D (pte_t *)next_pmdp; + ptep =3D &next_ptep[gstage_pte_index(addr, current_level)]; + } } =20 - *ptep =3D *new_pte; - if (gstage_pte_leaf(ptep)) - gstage_remote_tlb_flush(kvm, current_level, addr); + if (current_level) { + set_pmd(pmdp, pmdp_get((pmd_t *)new_pte)); + if (gstage_pmd_leaf(pmdp)) + gstage_remote_tlb_flush(kvm, current_level, addr); + } else { + set_pte(ptep, ptep_get((pte_t *)new_pte)); + if (gstage_pte_leaf(ptep)) + gstage_remote_tlb_flush(kvm, current_level, addr); + } =20 return 0; } @@ -182,6 +227,7 @@ static int gstage_map_page(struct kvm *kvm, { int ret; u32 level =3D 0; + pmd_t new_pmd; pte_t new_pte; pgprot_t prot; =20 @@ -213,10 +259,15 @@ static int gstage_map_page(struct kvm *kvm, else prot =3D PAGE_WRITE; } - new_pte =3D pfn_pte(PFN_DOWN(hpa), prot); - new_pte =3D pte_mkdirty(new_pte); - - return 
gstage_set_pte(kvm, level, pcache, gpa, &new_pte); + if (level) { + new_pmd =3D pfn_pmd(PFN_DOWN(hpa), prot); + new_pmd =3D pmd_mkdirty(new_pmd); + return gstage_set_pte(kvm, level, pcache, gpa, &new_pmd); + } else { + new_pte =3D pfn_pte(PFN_DOWN(hpa), prot); + new_pte =3D pte_mkdirty(new_pte); + return gstage_set_pte(kvm, level, pcache, gpa, &new_pte); + } } =20 enum gstage_op { @@ -226,9 +277,12 @@ enum gstage_op { }; =20 static void gstage_op_pte(struct kvm *kvm, gpa_t addr, - pte_t *ptep, u32 ptep_level, enum gstage_op op) + void *__ptep, u32 ptep_level, enum gstage_op op) { int i, ret; + pmd_t *pmdp =3D (pmd_t *)__ptep; + pte_t *ptep =3D (pte_t *)__ptep; + pmd_t *next_pmdp; pte_t *next_ptep; u32 next_ptep_level; unsigned long next_page_size, page_size; @@ -239,11 +293,13 @@ static void gstage_op_pte(struct kvm *kvm, gpa_t addr, =20 BUG_ON(addr & (page_size - 1)); =20 - if (!pte_val(*ptep)) + if (ptep_level && !pmd_val(pmdp_get(pmdp))) + return; + if (!ptep_level && !pte_val(ptep_get(ptep))) return; =20 - if (ptep_level && !gstage_pte_leaf(ptep)) { - next_ptep =3D (pte_t *)gstage_pte_page_vaddr(*ptep); + if (ptep_level && !gstage_pmd_leaf(pmdp)) { + next_pmdp =3D (pmd_t *)gstage_pmd_page_vaddr(pmdp_get(pmdp)); next_ptep_level =3D ptep_level - 1; ret =3D gstage_level_to_page_size(next_ptep_level, &next_page_size); @@ -251,17 +307,33 @@ static void gstage_op_pte(struct kvm *kvm, gpa_t addr, return; =20 if (op =3D=3D GSTAGE_OP_CLEAR) - set_pte(ptep, __pte(0)); - for (i =3D 0; i < PTRS_PER_PTE; i++) - gstage_op_pte(kvm, addr + i * next_page_size, - &next_ptep[i], next_ptep_level, op); + set_pmd(pmdp, __pmd(0)); + if (next_ptep_level) { + for (i =3D 0; i < PTRS_PER_PMD; i++) + gstage_op_pte(kvm, addr + i * next_page_size, + &next_pmdp[i], next_ptep_level, op); + } else { + next_ptep =3D (pte_t *)next_pmdp; + for (i =3D 0; i < PTRS_PER_PTE; i++) + gstage_op_pte(kvm, addr + i * next_page_size, + &next_ptep[i], next_ptep_level, op); + } if (op =3D=3D GSTAGE_OP_CLEAR) - put_page(virt_to_page(next_ptep)); + put_page(virt_to_page(next_pmdp)); } else { - if (op =3D=3D GSTAGE_OP_CLEAR) - set_pte(ptep, __pte(0)); - else if (op =3D=3D GSTAGE_OP_WP) - set_pte(ptep, __pte(pte_val(*ptep) & ~_PAGE_WRITE)); + if (ptep_level) { + if (op =3D=3D GSTAGE_OP_CLEAR) + set_pmd(pmdp, __pmd(0)); + else if (op =3D=3D GSTAGE_OP_WP) + set_pmd(pmdp, + __pmd(pmd_val(pmdp_get(pmdp)) & ~_PAGE_WRITE)); + } else { + if (op =3D=3D GSTAGE_OP_CLEAR) + set_pte(ptep, __pte(0)); + else if (op =3D=3D GSTAGE_OP_WP) + set_pte(ptep, + __pte(pte_val(ptep_get(ptep)) & ~_PAGE_WRITE)); + } gstage_remote_tlb_flush(kvm, ptep_level, addr); } } @@ -270,7 +342,7 @@ static void gstage_unmap_range(struct kvm *kvm, gpa_t s= tart, gpa_t size, bool may_block) { int ret; - pte_t *ptep; + void *ptep; u32 ptep_level; bool found_leaf; unsigned long page_size; @@ -305,7 +377,7 @@ static void gstage_unmap_range(struct kvm *kvm, gpa_t s= tart, static void gstage_wp_range(struct kvm *kvm, gpa_t start, gpa_t end) { int ret; - pte_t *ptep; + void *ptep; u32 ptep_level; bool found_leaf; gpa_t addr =3D start; @@ -572,7 +644,7 @@ bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_r= ange *range) =20 bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) { - pte_t *ptep; + void *ptep; u32 ptep_level =3D 0; u64 size =3D (range->end - range->start) << PAGE_SHIFT; =20 @@ -585,12 +657,15 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_rang= e *range) &ptep, &ptep_level)) return false; =20 - return ptep_test_and_clear_young(NULL, 0, ptep); + if 
(ptep_level) + return pmdp_test_and_clear_young(NULL, 0, (pmd_t *)ptep); + else + return ptep_test_and_clear_young(NULL, 0, (pte_t *)ptep); } =20 bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) { - pte_t *ptep; + void *ptep; u32 ptep_level =3D 0; u64 size =3D (range->end - range->start) << PAGE_SHIFT; =20 @@ -603,7 +678,10 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_= range *range) &ptep, &ptep_level)) return false; =20 - return pte_young(*ptep); + if (ptep_level) + return pmd_young(pmdp_get((pmd_t *)ptep)); + else + return pte_young(ptep_get((pte_t *)ptep)); } =20 int kvm_riscv_gstage_map(struct kvm_vcpu *vcpu, @@ -746,11 +824,11 @@ void kvm_riscv_gstage_free_pgd(struct kvm *kvm) =20 void kvm_riscv_gstage_update_hgatp(struct kvm_vcpu *vcpu) { - unsigned long hgatp =3D gstage_mode; + unsigned long hgatp; struct kvm_arch *k =3D &vcpu->kvm->arch; =20 - hgatp |=3D (READ_ONCE(k->vmid.vmid) << HGATP_VMID_SHIFT) & HGATP_VMID; - hgatp |=3D (k->pgd_phys >> PAGE_SHIFT) & HGATP_PPN; + hgatp =3D make_hgatp(PFN_DOWN(k->pgd_phys), READ_ONCE(k->vmid.vmid), + gstage_mode); =20 csr_write(CSR_HGATP, hgatp); =20 --=20 2.20.1 From nobody Mon Dec 29 20:12:48 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 138A3C5AD4C for ; Thu, 23 Nov 2023 06:59:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1344839AbjKWG67 (ORCPT ); Thu, 23 Nov 2023 01:58:59 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59456 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1344835AbjKWG6i (ORCPT ); Thu, 23 Nov 2023 01:58:38 -0500 Received: from mail-ot1-x331.google.com (mail-ot1-x331.google.com [IPv6:2607:f8b0:4864:20::331]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 194CA10F3 for ; Wed, 22 Nov 2023 22:58:26 -0800 (PST) Received: by mail-ot1-x331.google.com with SMTP id 46e09a7af769-6cd1918afb2so354864a34.0 for ; Wed, 22 Nov 2023 22:58:26 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1700722706; x=1701327506; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=cXVVh5tHQq6H+NSQA4JeCtH/rNgHUAw+NLKBVun1FJI=; b=VGhRI+k+plxBcZ9NjAKo1Pa3AIXjp4IxipMYWF86Cr1uZn/dpu4vysm4ntKJmUrb4f r5QVTvbRzpcSQvr7YjHtGeuh7fqQtzESqpSZrkVM1NAEliMIvNRAQWMLlVzciDPc8Smo 2v9kjpRMQLicJtG3n1vtSoaZCsXN7WFJ9RTbzy8xLc4UYxZ+VqgPzbOSNRPB155/pnpH PotifhDVZIfwQg0l0S+fj8beoF+gnHtjMTxWNKRXU8Gdf1VQXCYnAX5XfeIjZeoMBuGz Db8Kc1KQF4/q19T3HJUFh7qiyC2906RKI32JEef5KN7nr6Rt7N7ENKul79nQ7UWYEA0I +bFQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1700722706; x=1701327506; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=cXVVh5tHQq6H+NSQA4JeCtH/rNgHUAw+NLKBVun1FJI=; b=NixS9RyIa7+uavdw8yTGMunHoZBvYRee722TUJFjP2VSNhu1jZ6a1xFydrqD+mdf6c lL40wbK0wL0ZlvogaF+PIrdZdtf1rdIA2LG5h6DGFx8KfAuzx7gi6QTCHftI/YLavL2q P1/NM1ABoSVhjJae/yCql32d/CYIQFUECtzGZlTXsB6PeovfVnGZrEacL4Zx7h8/yB3c sWFPQTOF8oUaKtgWFSccHiEo/0XShM8FmdDaX1P0Q0SM0aX3TAqDDV6UVFYvGXuaUAgf oVpo7bsGL4Iv5+TwAlu9CEZM4B6wJuEzC5P3CtXbuUWcGQP3evUyLcTk3XRmVksaN3Tc 9xhw== X-Gm-Message-State: 
AOJu0YybQizTxVtC+2C8GAgpc4el4IdiubDSlO9iQPcX4b24Iu8K5T8/ ExfMwJI0/vZpxCjeLAbr4SLsIg== X-Google-Smtp-Source: AGHT+IHjXbEiIjdrtzBQes9pWsmlDe44Anh7ZX4igq4dgVqzJwwbqJ+5ff/GxnQby0i2Dm9GfBQciw== X-Received: by 2002:a9d:620d:0:b0:6cd:8c3:5b40 with SMTP id g13-20020a9d620d000000b006cd08c35b40mr4649798otj.36.1700722705824; Wed, 22 Nov 2023 22:58:25 -0800 (PST) Received: from J9GPGXL7NT.bytedance.net ([139.177.225.230]) by smtp.gmail.com with ESMTPSA id w37-20020a634765000000b005bd2b3a03eesm615437pgk.6.2023.11.22.22.58.20 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Wed, 22 Nov 2023 22:58:25 -0800 (PST) From: Xu Lu To: paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu, ardb@kernel.org, anup@brainfault.org, atishp@atishpatra.org Cc: dengliang.1214@bytedance.com, xieyongji@bytedance.com, lihangjing@bytedance.com, songmuchun@bytedance.com, punit.agrawal@bytedance.com, linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org, Xu Lu Subject: [RFC PATCH V1 11/11] riscv: Introduce 64K page size Date: Thu, 23 Nov 2023 14:57:08 +0800 Message-Id: <20231123065708.91345-12-luxu.kernel@bytedance.com> X-Mailer: git-send-email 2.39.3 (Apple Git-145) In-Reply-To: <20231123065708.91345-1-luxu.kernel@bytedance.com> References: <20231123065708.91345-1-luxu.kernel@bytedance.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" This patch introduces new config to control whether enabling the 64K base page feature on RISC-V. Signed-off-by: Xu Lu --- arch/Kconfig | 1 + arch/riscv/Kconfig | 20 ++++++++++++++++++++ 2 files changed, 21 insertions(+) diff --git a/arch/Kconfig b/arch/Kconfig index f4b210ab0612..66f64450d409 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -1087,6 +1087,7 @@ config HAVE_ARCH_COMPAT_MMAP_BASES =20 config PAGE_SIZE_LESS_THAN_64KB def_bool y + depends on !RISCV_64K_PAGES depends on !ARM64_64K_PAGES depends on !PAGE_SIZE_64KB depends on !PARISC_PAGE_SIZE_64KB diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig index 105cbb3ca797..d561f9f7f9b4 100644 --- a/arch/riscv/Kconfig +++ b/arch/riscv/Kconfig @@ -227,6 +227,7 @@ config RISCV_HW_PAGE_SHIFT =20 config RISCV_PAGE_SHIFT int + default 16 if RISCV_64K_PAGES default 12 =20 config KASAN_SHADOW_OFFSET @@ -692,6 +693,25 @@ config RISCV_BOOT_SPINWAIT =20 If unsure what to do here, say N. =20 +choice + prompt "Page size" + default RISCV_4K_PAGES + help + Page size (translation granule) configuration. + +config RISCV_4K_PAGES + bool "4KB" + help + This feature enables 4KB pages support. + +config RISCV_64K_PAGES + bool "64KB" + depends on ARCH_HAS_STRICT_KERNEL_RWX && 64BIT + help + This feature enables 64KB pages support (4KB by default) + +endchoice + config ARCH_SUPPORTS_KEXEC def_bool MMU =20 --=20 2.20.1
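To make the effect of the new "Page size" choice concrete, here is a minimal standalone sketch of the resulting constants. Only CONFIG_RISCV_64K_PAGES and the 16/12 defaults of RISCV_PAGE_SHIFT come from the patch above; the rest of the plumbing is paraphrased as an assumption for this example.

/*
 * Standalone sketch (not part of the patch): selecting "64KB" in the
 * new Kconfig choice sets RISCV_PAGE_SHIFT to 16, so the software page
 * grows to 64K while the hardware translation granule stays 4K.
 */
#include <stdio.h>

#define CONFIG_RISCV_64K_PAGES	1	/* what selecting "64KB" would define */

#ifdef CONFIG_RISCV_64K_PAGES
#define RISCV_PAGE_SHIFT	16
#else
#define RISCV_PAGE_SHIFT	12
#endif

#define PAGE_SHIFT	RISCV_PAGE_SHIFT
#define PAGE_SIZE	(1UL << PAGE_SHIFT)
#define HW_PAGE_SHIFT	12		/* hardware granule stays 4K */

int main(void)
{
	printf("software page: %lu KiB, hardware page: %lu KiB\n",
	       PAGE_SIZE >> 10, (1UL << HW_PAGE_SHIFT) >> 10);
	return 0;
}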