This patchset intends to merge the contiguous ptes hugetlbfs implementation
of arm64 and riscv.
Both arm64 and riscv support the use of contiguous ptes to map pages that
are larger than the base page size; the feature is called contpte on arm64
and svnapot on riscv.
The riscv implementation differs from arm64's in that the LSBs of the
pfn of a svnapot pte are used to store the size of the mapping, allowing
for future sizes to be added (for now only 64KB is supported). That's an
issue for the core mm code which expects to find the *real* pfn a pte points
to. Patch 1 fixes that by always returning svnapot ptes with the real pfn
and restores the size of the mapping when it is written to a page table.
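As a rough illustration of the encoding described above, here is a minimal
user-space sketch in C. The bit positions (N bit at bit 63, PPN field in bits
53:10, 64KB encoded as pfn low bits 0b1000) follow the RISC-V Svnapot
extension; the helper names napot_encode(), napot_order() and
napot_real_pfn() are hypothetical and are not the functions touched by this
series.

```c
#include <assert.h>
#include <stdint.h>

/* Bit layout per the RISC-V privileged spec (Sv39/Sv48 with Svnapot). */
#define PTE_PFN_SHIFT   10                      /* PPN starts at bit 10 */
#define PTE_PFN_MASK    0x003ffffffffffc00ULL   /* PPN field, bits 53:10 */
#define PTE_NAPOT       (1ULL << 63)            /* N bit */
#define NAPOT_64K_ORDER 4                       /* 16 x 4KB pages = 64KB */

/*
 * Hypothetical helper: build the value written to the page table.
 * The low 'order' pfn bits are dropped and replaced by a size marker
 * (a single set bit at position order - 1).
 */
static uint64_t napot_encode(uint64_t real_pfn, unsigned int order,
                             uint64_t prot)
{
    uint64_t low_mask;
    uint64_t hw_pfn;

    assert(order >= 1);
    low_mask = (1ULL << order) - 1;
    hw_pfn = (real_pfn & ~low_mask) | (1ULL << (order - 1));
    return (hw_pfn << PTE_PFN_SHIFT) | PTE_NAPOT | prot;
}

/* Hypothetical helper: recover the mapping order from the size marker. */
static unsigned int napot_order(uint64_t pte)
{
    uint64_t pfn = (pte & PTE_PFN_MASK) >> PTE_PFN_SHIFT;
    unsigned int order = 1;

    assert(pte & PTE_NAPOT);
    assert(pfn != 0);
    while (!(pfn & 1)) {        /* position of the lowest set bit, plus 1 */
        pfn >>= 1;
        order++;
    }
    return order;
}

/*
 * Hypothetical helper: rebuild the pfn that core mm expects for the
 * pte at position 'index' inside the contiguous block.
 */
static uint64_t napot_real_pfn(uint64_t pte, unsigned int index)
{
    uint64_t low_mask = (1ULL << napot_order(pte)) - 1;
    uint64_t pfn = (pte & PTE_PFN_MASK) >> PTE_PFN_SHIFT;

    return (pfn & ~low_mask) | (index & low_mask);
}
```

For a 64KB mapping, napot_encode(0x12345, NAPOT_64K_ORDER, prot) stores pfn
0x12348 in the pte, and napot_real_pfn(pte, 5) gives back 0x12345 for the
sixth pte of the block, which is the value core mm would expect to see.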
The following patches merge the two implementations that currently exist
in arm64 and riscv, which are very similar. This paves the way for reusing
the recent contpte THP work by Ryan [1], to avoid reimplementing the same
in riscv.
This patchset was tested by running the libhugetlbfs testsuite with 64KB
and 2MB pages on both architectures (on a 4KB base page size arm64 kernel).
[1] https://lore.kernel.org/linux-arm-kernel/20240215103205.2607016-1-ryan.roberts@arm.com/
v4: https://lore.kernel.org/linux-riscv/20250127093530.19548-1-alexghiti@rivosinc.com/
v3: https://lore.kernel.org/all/20240802151430.99114-1-alexghiti@rivosinc.com/
v2: https://lore.kernel.org/linux-riscv/20240508113419.18620-1-alexghiti@rivosinc.com/
v1: https://lore.kernel.org/linux-riscv/20240301091455.246686-1-alexghiti@rivosinc.com/
Changes in v5:
- Fix "int i" unused variable in patch 2 (as reported by PW)
- Fix !svnapot build
- Fix arch_make_huge_pte() which returned a real napot pte
- Make __ptep_get(), ptep_get_and_clear() and __set_ptes() napot aware to
avoid leaking real napot pfns to core mm
- Fix arch_contpte_get_num_contig() that used to always try to get the
mapping size from the ptep, which does not work if the ptep comes from the
core mm
- Rebase on top of 6.14-rc7 + fix for
huge_ptep_get_and_clear()/huge_pte_clear()
https://lore.kernel.org/linux-riscv/20250317072551.572169-1-alexghiti@rivosinc.com/
Changes in v4:
- Rebase on top of 6.13
Changes in v3:
- Split set_ptes and ptep_get into internal and external API (Ryan)
- Rename ARCH_HAS_CONTPTE into ARCH_WANT_GENERAL_HUGETLB_CONTPTE so that
we split hugetlb functions from contpte functions (the riscv contpte
functions to support THP will come in another series) (Ryan)
- Rebase on top of 6.11-rc1
Changes in v2:
- Rebase on top of 6.9-rc3
Alexandre Ghiti (9):
riscv: Safely remove huge_pte_offset() when manipulating NAPOT ptes
riscv: Restore the pfn in a NAPOT pte when manipulated by core mm code
mm: Use common huge_ptep_get() function for riscv/arm64
mm: Use common set_huge_pte_at() function for riscv/arm64
mm: Use common huge_pte_clear() function for riscv/arm64
mm: Use common huge_ptep_get_and_clear() function for riscv/arm64
mm: Use common huge_ptep_set_access_flags() function for riscv/arm64
mm: Use common huge_ptep_set_wrprotect() function for riscv/arm64
mm: Use common huge_ptep_clear_flush() function for riscv/arm64
arch/arm64/Kconfig | 1 +
arch/arm64/include/asm/hugetlb.h | 22 +--
arch/arm64/include/asm/pgtable.h | 68 ++++++-
arch/arm64/mm/hugetlbpage.c | 294 +---------------------------
arch/riscv/Kconfig | 1 +
arch/riscv/include/asm/hugetlb.h | 36 +---
arch/riscv/include/asm/pgtable-64.h | 11 ++
arch/riscv/include/asm/pgtable.h | 222 ++++++++++++++++++---
arch/riscv/mm/hugetlbpage.c | 243 +----------------------
arch/riscv/mm/pgtable.c | 6 +-
include/linux/hugetlb_contpte.h | 39 ++++
mm/Kconfig | 3 +
mm/Makefile | 1 +
mm/hugetlb_contpte.c | 258 ++++++++++++++++++++++++
14 files changed, 583 insertions(+), 622 deletions(-)
create mode 100644 include/linux/hugetlb_contpte.h
create mode 100644 mm/hugetlb_contpte.c
--
2.39.2
Subject: Re: [PATCH v5 0/9] Merge arm64/riscv hugetlbfs contpte support
From: Alexandre Ghiti <alex@ghiti.fr>
Date: Mon, 7 Apr 2025 14:04:27 +0200

Can someone from arm64 review this? I think it's preferable to share the same
implementation between riscv and arm64.

The end goal is the support of mTHP using svnapot on riscv, which we want soon,
so if that patchset does not gain any traction, I'll just copy/paste the arm64
implementation into riscv.

Thanks,

Alex

On 21/03/2025 14:06, Alexandre Ghiti wrote:
> This patchset intends to merge the contiguous ptes hugetlbfs implementation
> of arm64 and riscv.
> [...]
Hi Alexandre,

On 07/04/2025 13:04, Alexandre Ghiti wrote:
> Can someone from arm64 review this? I think it's preferable to share the same
> implementation between riscv and arm64.

I've been thinking about this for a while and had some conversations internally.
This patchset has both pros and cons.

In the pros column, it increases code reuse in an area that has had quite a few
bugs popping up lately; so this would bring more eyes and hopefully higher
quality in the long run.

But in the cons column, we have seen HW errata in similar areas in the past and
I'm nervous that by hoisting this code to mm, we make it harder to work around
any future errata. Additionally I can imagine that this change could make it
harder to support future Arm architecture enhancements.

I appreciate the cons are not strong *technical* arguments but nevertheless they
are winning out in this case. My opinion is that we should keep the arm64
implementations of huge_pte_ (and contpte_ too - I know you have a separate
series for this) private to arm64.

Sorry about that.

> The end goal is the support of mTHP using svnapot on riscv, which we want soon,
> so if that patchset does not gain any traction, I'll just copy/paste the arm64
> implementation into riscv.

This copy/paste approach would be my preference.

Thanks,
Ryan
Hi Ryan,

On 29/04/2025 16:09, Ryan Roberts wrote:
> Hi Alexandre,
>
> On 07/04/2025 13:04, Alexandre Ghiti wrote:
>> Can someone from arm64 review this? I think it's preferable to share the same
>> implementation between riscv and arm64.
> I've been thinking about this for a while and had some conversations internally.
> This patchset has both pros and cons.
>
> In the pros column, it increases code reuse in an area that has had quite a few
> bugs popping up lately; so this would bring more eyes and hopefully higher
> quality in the long run.
>
> But in the cons column, we have seen HW errata in similar areas in the past and
> I'm nervous that by hoisting this code to mm, we make it harder to work around
> any future errata. Additionally I can imagine that this change could make it
> harder to support future Arm architecture enhancements.
>
> I appreciate the cons are not strong *technical* arguments but nevertheless they
> are winning out in this case. My opinion is that we should keep the arm64
> implementations of huge_pte_ (and contpte_ too - I know you have a separate
> series for this) private to arm64.
>
> Sorry about that.
>
>> The end goal is the support of mTHP using svnapot on riscv, which we want soon,
>> so if that patchset does not gain any traction, I'll just copy/paste the arm64
>> implementation into riscv.
> This copy/paste approach would be my preference.

I have to admit that I disagree with this approach: the riscv and arm64
implementations are *exactly* the same, so it sounds weird to duplicate code,
and the pros you mention outweigh the cons.

Unless I'm missing something about the errata? To me, that's easily fixed by
providing arch-specific overrides, no? Can you describe what sort of errata
would not fit then?

Thanks,

Alex
>>>
>>> This patchset was tested by running the libhugetlbfs testsuite with 64KB
>>> and 2MB pages on both architectures (on a 4KB base page size arm64 kernel).
>>>
>>> [1] https://lore.kernel.org/linux-arm-kernel/20240215103205.2607016-1-ryan.roberts@arm.com/
>>>
>>> v4: https://lore.kernel.org/linux-riscv/20250127093530.19548-1-alexghiti@rivosinc.com/
>>> v3: https://lore.kernel.org/all/20240802151430.99114-1-alexghiti@rivosinc.com/
>>> v2: https://lore.kernel.org/linux-riscv/20240508113419.18620-1-alexghiti@rivosinc.com/
>>> v1: https://lore.kernel.org/linux-riscv/20240301091455.246686-1-alexghiti@rivosinc.com/
>>>
>>> Changes in v5:
>>>   - Fix "int i" unused variable in patch 2 (as reported by PW)
>>>   - Fix !svnapot build
>>>   - Fix arch_make_huge_pte() which returned a real napot pte
>>>   - Make __ptep_get(), ptep_get_and_clear() and __set_ptes() napot aware to
>>>     avoid leaking real napot pfns to core mm
>>>   - Fix arch_contpte_get_num_contig() that used to always try to get the
>>>     mapping size from the ptep, which does not work if the ptep comes the
>>>     core mm
>>>   - Rebase on top of 6.14-rc7 + fix for
>>>     huge_ptep_get_and_clear()/huge_pte_clear()
>>>     https://lore.kernel.org/linux-riscv/20250317072551.572169-1-alexghiti@rivosinc.com/
>>>
>>> Changes in v4:
>>>   - Rebase on top of 6.13
>>>
>>> Changes in v3:
>>>   - Split set_ptes and ptep_get into internal and external API (Ryan)
>>>   - Rename ARCH_HAS_CONTPTE into ARCH_WANT_GENERAL_HUGETLB_CONTPTE so that
>>>     we split hugetlb functions from contpte functions (actually riscv contpte
>>>     functions to support THP will come into another series) (Ryan)
>>>   - Rebase on top of 6.11-rc1
>>>
>>> Changes in v2:
>>>   - Rebase on top of 6.9-rc3
>>>
>>> Alexandre Ghiti (9):
>>>   riscv: Safely remove huge_pte_offset() when manipulating NAPOT ptes
>>>   riscv: Restore the pfn in a NAPOT pte when manipulated by core mm code
>>>   mm: Use common huge_ptep_get() function for riscv/arm64
>>>   mm: Use common set_huge_pte_at() function for riscv/arm64
>>>   mm: Use common huge_pte_clear() function for riscv/arm64
>>>   mm: Use common huge_ptep_get_and_clear() function for riscv/arm64
>>>   mm: Use common huge_ptep_set_access_flags() function for riscv/arm64
>>>   mm: Use common huge_ptep_set_wrprotect() function for riscv/arm64
>>>   mm: Use common huge_ptep_clear_flush() function for riscv/arm64
>>>
>>>  arch/arm64/Kconfig                  |   1 +
>>>  arch/arm64/include/asm/hugetlb.h    |  22 +--
>>>  arch/arm64/include/asm/pgtable.h    |  68 ++++++-
>>>  arch/arm64/mm/hugetlbpage.c         | 294 +---------------------------
>>>  arch/riscv/Kconfig                  |   1 +
>>>  arch/riscv/include/asm/hugetlb.h    |  36 +---
>>>  arch/riscv/include/asm/pgtable-64.h |  11 ++
>>>  arch/riscv/include/asm/pgtable.h    | 222 ++++++++++++++++++---
>>>  arch/riscv/mm/hugetlbpage.c         | 243 +----------------------
>>>  arch/riscv/mm/pgtable.c             |   6 +-
>>>  include/linux/hugetlb_contpte.h     |  39 ++++
>>>  mm/Kconfig                          |   3 +
>>>  mm/Makefile                         |   1 +
>>>  mm/hugetlb_contpte.c                | 258 ++++++++++++++++++++++++
>>>  14 files changed, 583 insertions(+), 622 deletions(-)
>>>  create mode 100644 include/linux/hugetlb_contpte.h
>>>  create mode 100644 mm/hugetlb_contpte.c
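The svnapot pfn issue described in the quoted cover letter can be illustrated with a small userspace sketch. This is not the code from patch 1: the `toy_` names are hypothetical, and the bit layout (PPN field starting at bit 10, N bit at bit 63, order 4 for a 64KB mapping over 4KB pages) is an assumption based on the RISC-V Svnapot encoding. It only shows why the pfn stored in a napot pte is not the real pfn, and how the size marker can be stripped for core mm and put back when the pte is written.

```c
#include <stdint.h>

/* Assumed layout: PPN field in bits 53:10, Svnapot "N" bit at bit 63. */
#define TOY_PFN_SHIFT	10
#define TOY_PFN_MASK	(((UINT64_C(1) << 44) - 1) << TOY_PFN_SHIFT)
#define TOY_PAGE_NAPOT	(UINT64_C(1) << 63)

/* Fold the mapping size into the low pfn bits (order 4 = 16 pages = 64KB). */
static inline uint64_t toy_pte_mknapot(uint64_t pte, unsigned int order)
{
	uint64_t low = ((UINT64_C(1) << order) - 1) << TOY_PFN_SHIFT;
	uint64_t marker = UINT64_C(1) << (order - 1 + TOY_PFN_SHIFT);

	return (pte & ~low) | marker | TOY_PAGE_NAPOT;
}

/* Recover the order from the lowest set pfn bit; 0 if not a napot pte. */
static inline unsigned int toy_napot_order(uint64_t pte)
{
	uint64_t pfn = (pte & TOY_PFN_MASK) >> TOY_PFN_SHIFT;
	unsigned int order = 1;

	if (!(pte & TOY_PAGE_NAPOT) || !pfn)
		return 0;
	while (!(pfn & 1)) {
		pfn >>= 1;
		order++;
	}
	return order;
}

/* What core mm would want to see: same pte, size marker removed from the pfn. */
static inline uint64_t toy_pte_real_pfn(uint64_t pte)
{
	unsigned int order = toy_napot_order(pte);
	uint64_t low;

	if (!order)
		return pte;
	low = ((UINT64_C(1) << order) - 1) << TOY_PFN_SHIFT;
	return pte & ~low;
}
```

Note that once the marker is stripped, the order can no longer be read back from the pte itself, so the writer has to know it from elsewhere. That matches the v5 changelog item about arch_contpte_get_num_contig() not being able to rely on the ptep when it comes from core mm.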
Hi folks,

On Mon, May 05, 2025 at 06:08:50PM +0200, Alexandre Ghiti wrote:
> On 29/04/2025 16:09, Ryan Roberts wrote:
> > On 07/04/2025 13:04, Alexandre Ghiti wrote:
> > > Can someone from arm64 review this? I think it's preferable to share the same
> > > implementation between riscv and arm64.
> > I've been thinking about this for a while and had some conversations internally.
> > This patchset has both pros and cons.
> >
> > In the pros column, it increases code reuse in an area that has had quite of few
> > bugs popping up lately; so this would bring more eyes and hopefully higher
> > quality in the long run.
> >
> > But in the cons column, we have seen HW errata in similar areas in the past and
> > I'm nervous that by hoisting this code to mm, we make it harder to workaround
> > any future errata. Additionally I can imagine that this change could make it
> > harder to support future Arm architecture enhancements.
> >
> > I appreciate the cons are not strong *technical* arguments but nevertheless they
> > are winning out in this case; My opinion is that we should keep the arm64
> > implementations of huge_pte_ (and contpte_ too - I know you have a separate
> > series for this) private to arm64.
> >
> > Sorry about that.
> >
> > > The end goal is the support of mTHP using svnapot on riscv, which we want soon,
> > > so if that patchset does not gain any traction, I'll just copy/paste the arm64
> > > implementation into riscv.
> > This copy/paste approach would be my preference.
>
>
> I have to admit that I disagree with this approach, the riscv and arm64
> implementations are *exactly* the same so it sounds weird to duplicate code,
> the pros you mention outweigh the cons.
>
> Unless I'm missing something about the erratas? To me, that's easily fixed
> by providing arch specific overrides no? Can you describe what sort of
> erratas would not fit then?

If we start with the common implementation you have here, nothing
prevents us from forking the code in future if the architectures diverge
so I'd be inclined to merge this series and see how we get on. However,
one thing I *do* think we need to ensure is that the relevant folks from
both arm64 (i.e. Ryan) and riscv (i.e. Alexandre) are cc'd on changes to
the common code. Otherwise, it's going to be a step backwards in terms
of maintainability.

Could we add something to MAINTAINERS so that the new file picks you both
up as reviewers?

Will
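For readers unfamiliar with the "arch specific overrides" being discussed, here is a minimal userspace sketch of the usual pattern: shared code calls a hook, and an architecture can supply its own version by defining a macro of the same name before the shared header is included. All `toy_` names are hypothetical and are not taken from the series; the point is only that an erratum workaround could in principle live in the override while the rest stays common.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t toy_pte_t;

/*
 * Default hook. An architecture needing different behaviour (for example
 * an erratum workaround) would define toy_arch_clear_range, both the macro
 * and the function, in its own header before this point.
 */
#ifndef toy_arch_clear_range
static inline void toy_arch_clear_range(toy_pte_t *ptep, size_t ncontig,
					unsigned int *flushes)
{
	for (size_t i = 0; i < ncontig; i++)
		ptep[i] = 0;
	(*flushes)++;	/* one flush covering the whole contiguous range */
}
#define toy_arch_clear_range toy_arch_clear_range
#endif

/* Shared code: return the first entry, then clear the range via the hook. */
static inline toy_pte_t toy_huge_ptep_clear_flush(toy_pte_t *ptep,
						  size_t ncontig,
						  unsigned int *flushes)
{
	toy_pte_t orig = ptep[0];

	toy_arch_clear_range(ptep, ncontig, flushes);
	return orig;
}
```

The trade-off raised in the thread is visible even at this scale: every behaviour that may differ per architecture needs its own hook, which is where the worry about accumulating overrides comes from.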
Hi Will,

On Thu, May 8, 2025 at 2:30 PM Will Deacon <will@kernel.org> wrote:
>
> Hi folks,
>
> On Mon, May 05, 2025 at 06:08:50PM +0200, Alexandre Ghiti wrote:
> > On 29/04/2025 16:09, Ryan Roberts wrote:
> > > On 07/04/2025 13:04, Alexandre Ghiti wrote:
> > > > Can someone from arm64 review this? I think it's preferable to share the same
> > > > implementation between riscv and arm64.
> > > I've been thinking about this for a while and had some conversations internally.
> > > This patchset has both pros and cons.
> > >
> > > In the pros column, it increases code reuse in an area that has had quite of few
> > > bugs popping up lately; so this would bring more eyes and hopefully higher
> > > quality in the long run.
> > >
> > > But in the cons column, we have seen HW errata in similar areas in the past and
> > > I'm nervous that by hoisting this code to mm, we make it harder to workaround
> > > any future errata. Additionally I can imagine that this change could make it
> > > harder to support future Arm architecture enhancements.
> > >
> > > I appreciate the cons are not strong *technical* arguments but nevertheless they
> > > are winning out in this case; My opinion is that we should keep the arm64
> > > implementations of huge_pte_ (and contpte_ too - I know you have a separate
> > > series for this) private to arm64.
> > >
> > > Sorry about that.
> > >
> > > > The end goal is the support of mTHP using svnapot on riscv, which we want soon,
> > > > so if that patchset does not gain any traction, I'll just copy/paste the arm64
> > > > implementation into riscv.
> > > This copy/paste approach would be my preference.
> >
> >
> > I have to admit that I disagree with this approach, the riscv and arm64
> > implementations are *exactly* the same so it sounds weird to duplicate code,
> > the pros you mention outweigh the cons.
> >
> > Unless I'm missing something about the erratas? To me, that's easily fixed
> > by providing arch specific overrides no? Can you describe what sort of
> > erratas would not fit then?
>
> If we start with the common implementation you have here, nothing
> prevents us from forking the code in future if the architectures diverge
> so I'd be inclined to merge this series and see how we get on. However,
> one thing I *do* think we need to ensure is that the relevant folks from
> both arm64 (i.e. Ryan) and riscv (i.e. Alexandre) are cc'd on changes to
> the common code. Otherwise, it's going to be a step backwards in terms
> of maintainability.
>
> Could we add something to MAINTAINERS so that the new file picks you both
> up as reviewers?

I'm adding Lorenzo as he is cleaning the mm MAINTAINERS entries.

@Lorenzo: should we add a new section "CONTPTE" for this? FYI, hugetlb
is the first patchset, I have another patchset to merge THP contpte
support [1] as well so the "HUGETLB" section does not seem to be a
good fit.

[1] https://lore.kernel.org/linux-riscv/20240508191931.46060-1-alexghiti@rivosinc.com/

Thanks,

Alex

>
> Will
On 09/05/2025 12:09, Alexandre Ghiti wrote:
> Hi Will,
>
> On Thu, May 8, 2025 at 2:30 PM Will Deacon <will@kernel.org> wrote:
>>
>> Hi folks,
>>
>> On Mon, May 05, 2025 at 06:08:50PM +0200, Alexandre Ghiti wrote:
>>> On 29/04/2025 16:09, Ryan Roberts wrote:
>>>> On 07/04/2025 13:04, Alexandre Ghiti wrote:
>>>>> Can someone from arm64 review this? I think it's preferable to share the same
>>>>> implementation between riscv and arm64.
>>>> I've been thinking about this for a while and had some conversations internally.
>>>> This patchset has both pros and cons.
>>>>
>>>> In the pros column, it increases code reuse in an area that has had quite of few
>>>> bugs popping up lately; so this would bring more eyes and hopefully higher
>>>> quality in the long run.
>>>>
>>>> But in the cons column, we have seen HW errata in similar areas in the past and
>>>> I'm nervous that by hoisting this code to mm, we make it harder to workaround
>>>> any future errata. Additionally I can imagine that this change could make it
>>>> harder to support future Arm architecture enhancements.
>>>>
>>>> I appreciate the cons are not strong *technical* arguments but nevertheless they
>>>> are winning out in this case; My opinion is that we should keep the arm64
>>>> implementations of huge_pte_ (and contpte_ too - I know you have a separate
>>>> series for this) private to arm64.
>>>>
>>>> Sorry about that.
>>>>
>>>>> The end goal is the support of mTHP using svnapot on riscv, which we want soon,
>>>>> so if that patchset does not gain any traction, I'll just copy/paste the arm64
>>>>> implementation into riscv.
>>>> This copy/paste approach would be my preference.
>>>
>>>
>>> I have to admit that I disagree with this approach, the riscv and arm64
>>> implementations are *exactly* the same so it sounds weird to duplicate code,
>>> the pros you mention outweigh the cons.
>>>
>>> Unless I'm missing something about the erratas? To me, that's easily fixed
>>> by providing arch specific overrides no? Can you describe what sort of
>>> erratas would not fit then?

One concrete feature is the use of Arm's FEAT_BBM level 2 to avoid having to do
break-before-make and TLB maintenance when doing a fold or unfold operation.
There is a series in flight to add this support at [1]. I can see this type of
approach being extended to the hugetlb helpers in future.

I also have another series in flight at [2] that tidies up the hugetlb
implementation and does some optimizations. But the optimizations depend on
arm64-specific TLB maintenance APIs.

[1] https://lore.kernel.org/linux-arm-kernel/20250428153514.55772-2-miko.lenczewski@arm.com/
[2] https://lore.kernel.org/linux-arm-kernel/20250422081822.1836315-1-ryan.roberts@arm.com/

As for errata, that's obviously much more fuzzy; there have been a bunch
relating to the MMU in the recent past, and I wouldn't be shocked if more
turned up.

For future architecture enhancements, I'm aware of one potential feature being
discussed for which this change would likely make it harder to implement.

>>
>> If we start with the common implementation you have here, nothing
>> prevents us from forking the code in future if the architectures diverge
>> so I'd be inclined to merge this series and see how we get on.

OK if that's your preference, I'm ok with it. I don't have a strong opinion,
just a sense that we will end up with loads of arch-specific overrides. As you
say, let's see.

Alexandre, I guess this series is quite old now and will need to incorporate the
hugetlb fixes I did last cycle? And ideally I'd like [2] to land then for that
to also be incorporated into your next version. (I'm still hopeful we can get
[2] into v6.16 and have been waiting patiently for Will to pick it up ;) ).

I guess we can worry about [1] later as that is only affected by your other
series.

How does that sound?

>> However,
>> one thing I *do* think we need to ensure is that the relevant folks from
>> both arm64 (i.e. Ryan) and riscv (i.e. Alexandre) are cc'd on changes to
>> the common code. Otherwise, it's going to be a step backwards in terms
>> of maintainability.
>>
>> Could we add something to MAINTAINERS so that the new file picks you both
>> up as reviewers?

That's fine with me. Lorenzo added me for some parts of MM this cycle anyway.

Thanks,
Ryan

>
> I'm adding Lorenzo as he is cleaning the mm MAINTAINERS entries.
>
> @Lorenzo: should we add a new section "CONTPTE" for this? FYI, hugetlb
> is the first patchset, I have another patchset to merge THP contpte
> support [1] as well so the "HUGETLB" section does not seem to be a
> good fit.
>
> [1] https://lore.kernel.org/linux-riscv/20240508191931.46060-1-alexghiti@rivosinc.com/
>
> Thanks,
>
> Alex
>
>>
>> Will
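As a rough illustration of the break-before-make point Ryan raises, the toy model below contrasts the two ways of setting a "contiguous" hint on a run of entries. It is a sketch under loud assumptions: the `toy_` names, the counters and the bit position are made up for illustration, it does not model real arm64 TLB behaviour or FEAT_BBM semantics, and it is not code from either series referenced as [1] or [2].

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PTE_VALID	(UINT64_C(1) << 0)
#define TOY_PTE_CONT	(UINT64_C(1) << 52)	/* illustrative hint bit */
#define TOY_MAX_CONT	16

struct toy_stats {
	unsigned int invalid_writes;	/* entries made transiently invalid */
	unsigned int tlb_flushes;	/* TLB invalidations issued */
};

/* "Fold": set the contiguous hint on n (<= TOY_MAX_CONT) entries. */
static inline void toy_fold(uint64_t *ptep, size_t n, bool skip_bbm,
			    struct toy_stats *st)
{
	uint64_t saved[TOY_MAX_CONT];
	size_t i;

	if (n > TOY_MAX_CONT)
		return;

	if (skip_bbm) {
		/* Assumed hardware support: update in place, no break, no flush. */
		for (i = 0; i < n; i++)
			ptep[i] |= TOY_PTE_CONT;
		return;
	}

	/* Break: invalidate every entry, remembering the old values. */
	for (i = 0; i < n; i++) {
		saved[i] = ptep[i];
		ptep[i] = 0;
		st->invalid_writes++;
	}
	st->tlb_flushes++;	/* one flush covering the whole range */

	/* Make: write the new entries with the hint set. */
	for (i = 0; i < n; i++)
		ptep[i] = saved[i] | TOY_PTE_CONT;
}
```

In this model both paths end with identical entries; the difference is only in the transient invalid window and the flush, which is the kind of architecture-specific behaviour that a shared implementation would have to expose through a hook.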
-cc my gmail, I no longer check kernel mail here at all, everything is via my
work mail (lorenzo.stoakes@oracle.com :)

So apologies for missing this.

On Fri, May 09, 2025 at 02:02:03PM +0100, Ryan Roberts wrote:
> On 09/05/2025 12:09, Alexandre Ghiti wrote:
> > Hi Will,
> >
> > On Thu, May 8, 2025 at 2:30 PM Will Deacon <will@kernel.org> wrote:
> >>
> >> Hi folks,
> >>
> >> On Mon, May 05, 2025 at 06:08:50PM +0200, Alexandre Ghiti wrote:
> >>> On 29/04/2025 16:09, Ryan Roberts wrote:
> >>>> On 07/04/2025 13:04, Alexandre Ghiti wrote:
> >>>>> Can someone from arm64 review this? I think it's preferable to share the same
> >>>>> implementation between riscv and arm64.
> >>>> I've been thinking about this for a while and had some conversations internally.
> >>>> This patchset has both pros and cons.
> >>>>
> >>>> In the pros column, it increases code reuse in an area that has had quite of few
> >>>> bugs popping up lately; so this would bring more eyes and hopefully higher
> >>>> quality in the long run.
> >>>>
> >>>> But in the cons column, we have seen HW errata in similar areas in the past and
> >>>> I'm nervous that by hoisting this code to mm, we make it harder to workaround
> >>>> any future errata. Additionally I can imagine that this change could make it
> >>>> harder to support future Arm architecture enhancements.
> >>>>
> >>>> I appreciate the cons are not strong *technical* arguments but nevertheless they
> >>>> are winning out in this case; My opinion is that we should keep the arm64
> >>>> implementations of huge_pte_ (and contpte_ too - I know you have a separate
> >>>> series for this) private to arm64.
> >>>>
> >>>> Sorry about that.
> >>>>
> >>>>> The end goal is the support of mTHP using svnapot on riscv, which we want soon,
> >>>>> so if that patchset does not gain any traction, I'll just copy/paste the arm64
> >>>>> implementation into riscv.
> >>>> This copy/paste approach would be my preference.
> >>>
> >>>
> >>> I have to admit that I disagree with this approach, the riscv and arm64
> >>> implementations are *exactly* the same so it sounds weird to duplicate code,
> >>> the pros you mention outweigh the cons.
> >>>
> >>> Unless I'm missing something about the erratas? To me, that's easily fixed
> >>> by providing arch specific overrides no? Can you describe what sort of
> >>> erratas would not fit then?
>
> One concrete feature is the use of Arm's FEAT_BBM level 2 to avoid having to do
> break-before-make and TLB maintenance when doing a fold or unfold operation.
> There is a series in flight to add this support at [1]. I can see this type of
> approach being extended to the hugetlb helpers in future.
>
> I also have another series in flight at [2] that tidies up the hugetlb
> implementation and does some optimizations. But the optimizations depend on
> arm64-specific TLB maintenance APIs.
>
> [1]
> https://lore.kernel.org/linux-arm-kernel/20250428153514.55772-2-miko.lenczewski@arm.com/
>
> [2]
> https://lore.kernel.org/linux-arm-kernel/20250422081822.1836315-1-ryan.roberts@arm.com/
>
> As for errata, that's obviously much more fuzzy; there have been a bunch
> relating to the MMU in the recent past, and I wouldn't be shocked if more turned up.
>
> For future architecture enhancements, I'm aware of one potential feature being
> discussed for which this change would likely make it harder to implement.
>
> >>
> >> If we start with the common implementation you have here, nothing
> >> prevents us from forking the code in future if the architectures diverge
> >> so I'd be inclined to merge this series and see how we get on.
>
> OK if that's your preference, I'm ok with it. I don't have a strong opinion, just
> a sense that we will end up with loads of arch-specific overrides. As you say,
> let's see.
>
> Alexandre, I guess this series is quite old now and will need to incorporate the
> hugetlb fixes I did last cycle? And ideally I'd like [2] to land then for that
> to also be incorporated into your next version. (I'm still hopeful we can get
> [2] into v6.16 and have been waiting patiently for Will to pick it up ;) ).
>
> I guess we can worry about [1] later as that is only affected by your other series.
>
> How does that sound?
>
> >> However,
> >> one thing I *do* think we need to ensure is that the relevant folks from
> >> both arm64 (i.e. Ryan) and riscv (i.e. Alexandre) are cc'd on changes to
> >> the common code. Otherwise, it's going to be a step backwards in terms
> >> of maintainability.
> >>
> >> Could we add something to MAINTAINERS so that the new file picks you both
> >> up as reviewers?
>
> That's fine with me. Lorenzo added me for some parts of MM this cycle anyway.
>
> Thanks,
> Ryan

Indeed :) happy to have you there Ryan!

>
> >
> > I'm adding Lorenzo as he is cleaning the mm MAINTAINERS entries.
> >
> > @Lorenzo: should we add a new section "CONTPTE" for this? FYI, hugetlb
> > is the first patchset, I have another patchset to merge THP contpte
> > support [1] as well so the "HUGETLB" section does not seem to be a
> > good fit.

Hm, this does seem to be very arm64-specific right?

But having said that, literally can see risc v entries :)

We are in a strange sort of scenario where there's some cross-over here.

I don't strictly object to it though, this stuff is important and we should get
the mm files absolutely under an appropriate MAINTAINER entry.

So right now it seems the files would consist of:

include/linux/hugetlb_contpte.h
mm/hugetlb_contpte.c

Is this correct?

Is this series intended to be taken by Andrew or through an arch tree?

And who would you sensibly propose for M's and R's?

If we are definitely adding things that sit outside hugetlb or anything
arch-specific, and is in fact generic mm code, then yes this should be a
section.

Does contpte stand for 'Contiguous PTE'?

Then the entry could perhaps be:

MEMORY MANAGEMENT - CONTPTE (CONTIGUOUS PTE SUPPORT)

I'd say this entry should probably be added as a patch in this series.

If you give me a list of R's and M's and confirm those files I can very quickly
copy/pasta from an existing entry and then you could respin (and cc my work mail
for the series :P) and include that as an additional patch?

Happy to ACK that in that case.

> >
> > [1] https://lore.kernel.org/linux-riscv/20240508191931.46060-1-alexghiti@rivosinc.com/
> >
> > Thanks,
> >
> > Alex
> >
> >>
> >> Will
>

Cheers, Lorenzo
Hi Lorenzo,

On 5/21/25 16:57, Lorenzo Stoakes wrote:
> -cc my gmail, I no longer check kernel mail here at all, everything is via my
> work mail (lorenzo.stoakes@oracle.com :)
>
> So apologies for missing this.
>
> On Fri, May 09, 2025 at 02:02:03PM +0100, Ryan Roberts wrote:
>> On 09/05/2025 12:09, Alexandre Ghiti wrote:
>>> Hi Will,
>>>
>>> On Thu, May 8, 2025 at 2:30 PM Will Deacon <will@kernel.org> wrote:
>>>> Hi folks,
>>>>
>>>> On Mon, May 05, 2025 at 06:08:50PM +0200, Alexandre Ghiti wrote:
>>>>> On 29/04/2025 16:09, Ryan Roberts wrote:
>>>>>> On 07/04/2025 13:04, Alexandre Ghiti wrote:
>>>>>>> Can someone from arm64 review this? I think it's preferable to share the same
>>>>>>> implementation between riscv and arm64.
>>>>>> I've been thinking about this for a while and had some conversations internally.
>>>>>> This patchset has both pros and cons.
>>>>>>
>>>>>> In the pros column, it increases code reuse in an area that has had quite of few
>>>>>> bugs popping up lately; so this would bring more eyes and hopefully higher
>>>>>> quality in the long run.
>>>>>>
>>>>>> But in the cons column, we have seen HW errata in similar areas in the past and
>>>>>> I'm nervous that by hoisting this code to mm, we make it harder to workaround
>>>>>> any future errata. Additionally I can imagine that this change could make it
>>>>>> harder to support future Arm architecture enhancements.
>>>>>>
>>>>>> I appreciate the cons are not strong *technical* arguments but nevertheless they
>>>>>> are winning out in this case; My opinion is that we should keep the arm64
>>>>>> implementations of huge_pte_ (and contpte_ too - I know you have a separate
>>>>>> series for this) private to arm64.
>>>>>>
>>>>>> Sorry about that.
>>>>>>
>>>>>>> The end goal is the support of mTHP using svnapot on riscv, which we want soon,
>>>>>>> so if that patchset does not gain any traction, I'll just copy/paste the arm64
>>>>>>> implementation into riscv.
>>>>>> This copy/paste approach would be my preference.
>>>>>
>>>>> I have to admit that I disagree with this approach, the riscv and arm64
>>>>> implementations are *exactly* the same so it sounds weird to duplicate code,
>>>>> the pros you mention outweigh the cons.
>>>>>
>>>>> Unless I'm missing something about the erratas? To me, that's easily fixed
>>>>> by providing arch specific overrides no? Can you describe what sort of
>>>>> erratas would not fit then?
>> One concrete feature is the use of Arm's FEAT_BBM level 2 to avoid having to do
>> break-before-make and TLB maintenance when doing a fold or unfold operation.
>> There is a series in flight to add this support at [1]. I can see this type of
>> approach being extended to the hugetlb helpers in future.
>>
>> I also have another series in flight at [2] that tidies up the hugetlb
>> implementation and does some optimizations. But the optimizations depend on
>> arm64-specific TLB maintenance APIs.
>>
>> [1]
>> https://lore.kernel.org/linux-arm-kernel/20250428153514.55772-2-miko.lenczewski@arm.com/
>>
>> [2]
>> https://lore.kernel.org/linux-arm-kernel/20250422081822.1836315-1-ryan.roberts@arm.com/
>>
>> As for errata, that's obviously much more fuzzy; there have been a bunch
>> relating to the MMU in the recent past, and I wouldn't be shocked if more turned up.
>>
>> For future architecture enhancements, I'm aware of one potential feature being
>> discussed for which this change would likely make it harder to implement.
>>
>>>> If we start with the common implementation you have here, nothing
>>>> prevents us from forking the code in future if the architectures diverge
>>>> so I'd be inclined to merge this series and see how we get on.
>> OK if that's your preference, I'm ok with it. I don't have a strong opinion, just
>> a sense that we will end up with loads of arch-specific overrides. As you say,
>> let's see.
>>
>> Alexandre, I guess this series is quite old now and will need to incorporate the
>> hugetlb fixes I did last cycle? And ideally I'd like [2] to land then for that
>> to also be incorporated into your next version. (I'm still hopeful we can get
>> [2] into v6.16 and have been waiting patiently for Will to pick it up ;) ).
>>
>> I guess we can worry about [1] later as that is only affected by your other series.
>>
>> How does that sound?
>>
>>>> However,
>>>> one thing I *do* think we need to ensure is that the relevant folks from
>>>> both arm64 (i.e. Ryan) and riscv (i.e. Alexandre) are cc'd on changes to
>>>> the common code. Otherwise, it's going to be a step backwards in terms
>>>> of maintainability.
>>>>
>>>> Could we add something to MAINTAINERS so that the new file picks you both
>>>> up as reviewers?
>> That's fine with me. Lorenzo added me for some parts of MM this cycle anyway.
>>
>> Thanks,
>> Ryan
> Indeed :) happy to have you there Ryan!
>
>>> I'm adding Lorenzo as he is cleaning the mm MAINTAINERS entries.
>>>
>>> @Lorenzo: should we add a new section "CONTPTE" for this? FYI, hugetlb
>>> is the first patchset, I have another patchset to merge THP contpte
>>> support [1] as well so the "HUGETLB" section does not seem to be a
>>> good fit.
> Hm, this does seem to be very arm64-specific right?
>
> But having said that, literally can see risc v entries :)
>
> We are in a strange sort of scenario where there's some cross-over here.
>
> I don't strictly object to it though, this stuff is important and we should get
> the mm files absolutely under an appropriate MAINTAINER entry.
>
> So right now it seems the files would consist of:
>
> include/linux/hugetlb_contpte.h
> mm/hugetlb_contpte.c
>
> Is this correct?

For now, it is, yes. When this first series gets merged, I would come up
with another series that will introduce other files for riscv to support thp
contpte based on the arm64 implementation.

>
> Is this series intended to be taken by Andrew or through an arch tree?

I can pick it up in the riscv tree once I have Acked-by from arm64
maintainers.

>
> And who would you sensibly propose for M's and R's?

Ryan is definitely an M, I would be happy to help as M too but if needed, an R
is enough for me.

>
> If we are definitely adding things that sit outside hugetlb or anything
> arch-specific, and is in fact generic mm code, then yes this should be a
> section.
>
> Does contpte stand for 'Contiguous PTE'?

Yes, that's the name arm64 gave to this feature (more understandable than
svnapot for the riscv feature).

>
> Then the entry could perhaps be:
>
> MEMORY MANAGEMENT - CONTPTE (CONTIGUOUS PTE SUPPORT)
>
> I'd say this entry should probably be added as a patch in this series.
>
> If you give me a list of R's and M's and confirm those files I can very quickly
> copy/pasta from an existing entry and then you could respin (and cc my work mail
> for the series :P) and include that as an additional patch?

You can do that or I can do it on my own based on your previous patches, as
you prefer.

>
> Happy to ACK that in that case.

Thanks for jumping in!

Alex

>
>
>>> [1] https://lore.kernel.org/linux-riscv/20240508191931.46060-1-alexghiti@rivosinc.com/
>>>
>>> Thanks,
>>>
>>> Alex
>>>
>>>> Will
> Cheers, Lorenzo
>
Andrew - does taking this proposed MAINTAINERS change through the riscv tree
work for you?

This series introduces the files being added there, so it seems sensible to add
the MAINTAINERS change to this series. And I believe this series is intended to
be taken through the riscv tree so seems sensible to do it there?

Proposed entry is 'MEMORY MANAGEMENT - CONTPTE (CONTIGUOUS PTE SUPPORT)', which
is explicitly relevant for arm64, riscv.

Thanks!

On Tue, May 27, 2025 at 11:25:57AM +0200, Alexandre Ghiti wrote:
> Hi Lorenzo,
>
> On 5/21/25 16:57, Lorenzo Stoakes wrote:

[snip]

> > So right now it seems the files would consist of:
> >
> > include/linux/hugetlb_contpte.h
> > mm/hugetlb_contpte.c
> >
> > Is this correct?
>
>
> For now, it is, yes. When this first series gets merged, I would come up
> with another series that will introduce other files for riscv to support thp
> contpte based on the arm64 implementation.

Cool!

>
>
> >
> > Is this series intended to be taken by Andrew or through an arch tree?
>
>
> I can pick it up in the riscv tree once I have Acked-by from arm64
> maintainers.

Have pinged Andrew above on this, you'd need an acked-by from mm people also of
course.

But I guess what makes sense is to take this as a patch in the next respin of
this series that actually introduces this stuff.

So if Andrew took it, he'd have to take the whole series I would say.

>
>
> >
> > And who would you sensibly propose for M's and R's?
>
>
> Ryan is definitely an M, I would be happy to help as M too but if needed, an R
> is enough for me.

Ryan understands this area better than I do, so I would say it's up to him as to
whether he thinks this makes sense.

>
>
> >
> > If we are definitely adding things that sit outside hugetlb or anything
> > arch-specific, and is in fact generic mm code, then yes this should be a
> > section.
> >
> > Does contpte stand for 'Contiguous PTE'?
>
>
> Yes, that's the name arm64 gave to this feature (more understandable than
> svnapot for the riscv feature).

Cheers! svnapot, guys... what? :P

>
>
> >
> > Then the entry could perhaps be:
> >
> > MEMORY MANAGEMENT - CONTPTE (CONTIGUOUS PTE SUPPORT)
> >
> > I'd say this entry should probably be added as a patch in this series.
> >
> > If you give me a list of R's and M's and confirm those files I can very quickly
> > copy/pasta from an existing entry and then you could respin (and cc my work mail
> > for the series :P) and include that as an additional patch?
>
>
> You can do that or I can do it on my own based on your previous patches, as
> you prefer.

I absolutely prefer you to do the work haha! ;)

Please cc me on the next respin with this change in and I can take a look.

>
>
> >
> > Happy to ACK that in that case.
>
>
> Thanks for jumping in!

No problem!

>
> Alex
>
>
> >
> >
> > > > [1] https://lore.kernel.org/linux-riscv/20240508191931.46060-1-alexghiti@rivosinc.com/
> > > >
> > > > Thanks,
> > > >
> > > > Alex
> > > >
> > > > > Will
> > Cheers, Lorenzo
>

Cheers, Lorenzo
On 27/05/2025 10:37, Lorenzo Stoakes wrote:

[...]

>>> And who would you sensibly propose for M's and R's?
>>
>> Ryan is definitely a M, I would be happy to help as M too but if needed, a R
>> is enough for me.
>
> Ryan understands this area better than I do, so I would say it's up to him as to
> whether he thinks this makes sense.

I'd certainly like to be an R. I'd prefer not to sign up for M right now though,
unless there is nobody else willing to take it on.
On 21/03/2025 at 14:06, Alexandre Ghiti wrote:
> This patchset intends to merge the contiguous ptes hugetlbfs implementation
> of arm64 and riscv.
Can we also add powerpc to the dance?
powerpc also uses contiguous PTEs, although there is not (yet) a special
name for it:
- b250c8c08c79 powerpc/8xx: Manage 512k huge pages as standard pages
- e47168f3d1b1 powerpc/8xx: Support 16k hugepages with 4k pages
powerpc also uses contiguous PMDs/PUDs for larger hugepages:
- 57fb15c32f4f ("powerpc/64s: use contiguous PMD/PUD instead of HUGEPD")
- 7c44202e3609 ("powerpc/e500: use contiguous PMD instead of hugepd")
- 0549e7666373 ("powerpc/8xx: rework support for 8M pages using
contiguous PTE entries")
Christophe
Hi Christophe,
On 21/03/2025 18:24, Christophe Leroy wrote:
>
>
> On 21/03/2025 at 14:06, Alexandre Ghiti wrote:
>> This patchset intends to merge the contiguous ptes hugetlbfs
>> implementation
>> of arm64 and riscv.
>
> Can we also add powerpc to the dance?
>
> powerpc also uses contiguous PTEs, although there is not (yet) a
> special name for it:
> - b250c8c08c79 powerpc/8xx: Manage 512k huge pages as standard pages
> - e47168f3d1b1 powerpc/8xx: Support 16k hugepages with 4k pages
>
> powerpc also uses contiguous PMDs/PUDs for larger hugepages:
> - 57fb15c32f4f ("powerpc/64s: use contiguous PMD/PUD instead of HUGEPD")
> - 7c44202e3609 ("powerpc/e500: use contiguous PMD instead of hugepd")
> - 0549e7666373 ("powerpc/8xx: rework support for 8M pages using
> contiguous PTE entries")
So I have been looking at the powerpc hugetlb implementation and I have
to admit that I'm struggling to find similarities with how arm64 and
riscv deal with contiguous pte mappings.
I think the 2 main characteristics of contpte (arm64) and svnapot
(riscv) are the break-before-make requirement and the HW A/D update on
only a single pte. Those make the handling of hugetlb pages very similar
between arm64 and riscv.
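
To make those two characteristics concrete, here is a minimal userspace
sketch. It is not the code from this series: the pte layout, the toy_* names
and the flush stub are all hypothetical and only model the idea. Reading a
contiguous mapping has to gather the accessed/dirty bits from every entry
(the HW may have updated only one of them), and changing it has to clear all
entries and flush before writing the new ones.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy pte layout, assumed for illustration only (not arm64 or riscv). */
#define TOY_PTE_VALID      (1ULL << 0)
#define TOY_PTE_YOUNG      (1ULL << 1)  /* "accessed" bit */
#define TOY_PTE_DIRTY      (1ULL << 2)
#define TOY_PTE_PFN_SHIFT  12

typedef uint64_t toy_pte_t;

/* Stand-in for a real TLB range flush; it only counts invocations. */
static unsigned int toy_flush_count;

static void toy_flush_tlb_range(void)
{
	toy_flush_count++;
}

/*
 * Read a contiguous mapping: start from the first entry and OR in the
 * accessed/dirty bits found in any of the other entries.
 */
static toy_pte_t toy_contpte_get(const toy_pte_t *ptep, size_t ncontig)
{
	toy_pte_t pte = ptep[0];

	for (size_t i = 1; i < ncontig; i++)
		pte |= ptep[i] & (TOY_PTE_YOUNG | TOY_PTE_DIRTY);

	return pte;
}

/*
 * Break-before-make: invalidate every entry, flush, and only then
 * install the new entries, each pointing at the next pfn.
 * Returns the old pte with the gathered accessed/dirty bits.
 */
static toy_pte_t toy_contpte_update(toy_pte_t *ptep, size_t ncontig,
				    toy_pte_t newpte)
{
	toy_pte_t old = toy_contpte_get(ptep, ncontig);

	for (size_t i = 0; i < ncontig; i++)
		ptep[i] = 0;				/* break */

	toy_flush_tlb_range();

	for (size_t i = 0; i < ncontig; i++)		/* make */
		ptep[i] = newpte + ((toy_pte_t)i << TOY_PTE_PFN_SHIFT);

	return old;
}
```

A caller that wants to keep the dirty state across a permission change would
merge the bits of the returned old pte into the new one before writing it.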
But I may have missed something, the powerpc hugetlb implementation is
quite "scattered" because of the radix/hash page table and 32/64 bit.
Thanks,
Alex