FEAT_D128 is a new Arm architecture feature that adds support for the
VMSAv9-128 translation system. FEAT_D128 is an optional feature from Armv9.3
onwards. With this feature, arm64 platforms can have two different
translation systems, VMSAv8-64 and VMSAv9-128, which can be selectively
enabled.

FEAT_D128 adds 128-bit page table entries, supporting larger physical and
virtual address ranges while also making room for more MMU management
feature bits for both HW and SW.

This series has been split into two parts: generic MM changes followed by
arm64 platform changes, which finally enable D128 with a new config option,
ARM64_D128.

READ_ONCE() accesses on page table entries get routed via level-specific
pxdp_get() helpers, which platforms can then override when required. On
arm64, these accessors help ensure that 128-bit page table entries are read
atomically.

All ARM64_VA_BITS and ARM64_PA_BITS combinations for all page sizes are now
supported on both the D64 and D128 translation regimes, although the new
56-bit VA space is not yet supported. Similarly, the FEAT_D128 skip level is
not currently supported.

Basic page table geometry has changed with D128, as there are now fewer
entries per level. Please refer to the following table for leaf entry sizes:

                     D64               D128
 -------------------------------------------------
 | PAGE_SIZE |  PMD   |  PUD   |  PMD   |  PUD   |
 ------------|-----------------|-----------------|
 |    4K     |   2M   |   1G   |   1M   |  256M  |
 |    16K    |  32M   |  64G   |  16M   |  16G   |
 |    64K    |  512M  |   4T   |  256M  |   1T   |
 -------------------------------------------------
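
The leaf sizes in the table follow from the descriptor size: a table occupies
one page, so it holds PAGE_SIZE / descriptor-size entries, and each level
multiplies the mapped span by that count. Below is a minimal Python sketch of
that arithmetic, assuming 8-byte descriptors for D64 and 16-byte descriptors
for D128. The function names are illustrative only and are not part of the
series.

```python
# Sketch: derive PMD/PUD leaf sizes from page size and descriptor size.
# Assumption: every table level occupies exactly one page.

KIB = 1024


def entries_per_table(page_size: int, desc_bytes: int) -> int:
    """Number of descriptors that fit in a one-page table."""
    return page_size // desc_bytes


def leaf_sizes(page_size: int, desc_bytes: int) -> tuple[int, int]:
    """Return (PMD leaf size, PUD leaf size) in bytes."""
    n = entries_per_table(page_size, desc_bytes)
    pmd = page_size * n  # one PMD entry covers a full PTE table
    pud = pmd * n        # one PUD entry covers a full PMD table
    return pmd, pud


def human(size: int) -> str:
    """Format a power-of-two byte count with K/M/G/T suffixes."""
    for suffix in ("", "K", "M", "G", "T"):
        if size < 1024:
            return f"{size}{suffix}"
        size //= 1024
    return f"{size}P"


if __name__ == "__main__":
    for page in (4 * KIB, 16 * KIB, 64 * KIB):
        d64 = leaf_sizes(page, 8)    # D64: 64-bit descriptors
        d128 = leaf_sizes(page, 16)  # D128: 128-bit descriptors
        print(human(page), *map(human, d64), *map(human, d128))
```

Halving the number of entries per table halves the PMD leaf size and
quarters the PUD leaf size, which matches the D64 and D128 columns above.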

From an arm64 kernel features perspective, KVM, KASAN and
UNMAP_KERNEL_AT_EL0 are also currently not supported.

Open Questions:

- Do we need to support UNMAP_KERNEL_AT_EL0 with D128?
- Do we need to emulate traditional D64 sizes at the PUD and PMD levels with
  D128?

This series applies on upstream kernel v7.0-rc1.

There are no apparent problems while running MM kselftests with and without
CONFIG_ARM64_D128. The series has also been built on other platforms such as
x86, powerpc, riscv, arm and s390.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Linu Cherian <linu.cherian@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org

Anshuman Khandual (15):
  mm: Abstract printing of pxd_val()
  mm: Add read-write accessors for vm_page_prot
  mm: Replace READ_ONCE() in pud_trans_unstable()
  perf/events: Replace READ_ONCE() with standard pgtable accessors
  arm64/mm: Convert READ_ONCE() as pmdp_get() while accessing PMD
  arm64/mm: Convert READ_ONCE() as pudp_get() while accessing PUD
  arm64/mm: Convert READ_ONCE() as p4dp_get() while accessing P4D
  arm64/mm: Convert READ_ONCE() as pgdp_get() while accessing PGD
  arm64/mm: Route all pgtable reads via ptdesc_get()
  arm64/mm: Route all pgtable writes via ptdesc_set()
  arm64/mm: Route all pgtable atomics to central helpers
  arm64/mm: Abstract printing of pxd_val()
  arm64/mm: Override read-write accessors for vm_page_prot
  arm64/mm: Enable fixmap with 5 level page table
  arm64/mm: Add initial support for FEAT_D128 page tables

Linu Cherian (1):
  arm64/mm: Add macros __tlb_asid_level and __tlb_range

arch/arm64/Kconfig | 39 ++++-
arch/arm64/Makefile | 4 +
arch/arm64/include/asm/assembler.h | 4 +-
arch/arm64/include/asm/el2_setup.h | 9 ++
arch/arm64/include/asm/pgtable-hwdef.h | 137 ++++++++++++++++++
arch/arm64/include/asm/pgtable-prot.h | 18 ++-
arch/arm64/include/asm/pgtable-types.h | 12 ++
arch/arm64/include/asm/pgtable.h | 193 ++++++++++++++++++-------
arch/arm64/include/asm/smp.h | 1 +
arch/arm64/include/asm/tlbflush.h | 112 ++++++++++++--
arch/arm64/kernel/head.S | 12 ++
arch/arm64/mm/fault.c | 20 +--
arch/arm64/mm/fixmap.c | 24 ++-
arch/arm64/mm/hugetlbpage.c | 10 +-
arch/arm64/mm/kasan_init.c | 14 +-
arch/arm64/mm/mmu.c | 113 +++++++++++----
arch/arm64/mm/pageattr.c | 8 +-
arch/arm64/mm/proc.S | 25 +++-
arch/arm64/mm/trans_pgd.c | 14 +-
include/linux/pgtable.h | 21 ++-
kernel/events/core.c | 6 +-
mm/debug_vm_pgtable.c | 4 +-
mm/huge_memory.c | 4 +-
mm/memory.c | 31 ++--
mm/migrate.c | 2 +-
mm/mmap.c | 2 +-
26 files changed, 674 insertions(+), 165 deletions(-)
--
2.43.0

On 2/24/26 06:11, Anshuman Khandual wrote:

[...]

> Basic page table geometry has been changed with D128 as there are now fewer
> entries per level. Please refer to the following table for leaf entry sizes
>
>                      D64               D128
>  -------------------------------------------------
>  | PAGE_SIZE |  PMD   |  PUD   |  PMD   |  PUD   |
>  ------------|-----------------|-----------------|
>  |    4K     |   2M   |   1G   |   1M   |  256M  |
>  |    16K    |  32M   |  64G   |  16M   |  16G   |
>  |    64K    |  512M  |   4T   |  256M  |   1T   |
>  -------------------------------------------------
>

Interesting. That means user space will have it even harder to optimize
for THP sizes.

What's the effect on cont-pte? Do they still span the same number of
entries and there is effectively no change?

> From arm64 kernel features perspective KVM, KASAN and UNMAP_KERNEL_AT_EL0
> are currently not supported as well.
>
> Open Questions:
>
> - Do we need to support UNMAP_KERNEL_AT_EL0 with D128
> - Do we need to emulate traditional D64 sizes at PUD, PMD level with D128

It would certainly make user space interaction easier. But then, user
space already has to consider various PMD sizes (and is better off
querying /sys/kernel/mm/transparent_hugepage/hpage_pmd_size instead of
hardcoding it). s390x, for example, also has 1M PMD size.

I guess with "emulating" you mean something simple like always
allocating order-1 page tables that effectively have the same number of
page table entries?

That would be an option, but I recall that the pte_map_* infrastructure
currently expects that leaf page tables only ever span a single page.

So it wouldn't really give us a lot of easy benefit I guess.

--
Cheers,

David
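
As a sketch of the advice above, user space can read the PMD-sized THP
granule at run time rather than hardcoding 2M. The sysfs path is the one
named in the message. The helper names and the fallback behaviour (returning
None when the file cannot be read, for example when THP is not built in) are
assumptions of this example.

```python
from pathlib import Path
from typing import Optional

# Path taken from the discussion above.
HPAGE_PMD_SIZE = Path("/sys/kernel/mm/transparent_hugepage/hpage_pmd_size")


def read_hpage_pmd_size(path: Path = HPAGE_PMD_SIZE) -> Optional[int]:
    """Return the PMD-level THP size in bytes, or None if it is unavailable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def align_up(length: int, granule: int) -> int:
    """Round length up to a multiple of granule (a power of two)."""
    return (length + granule - 1) & ~(granule - 1)


if __name__ == "__main__":
    size = read_hpage_pmd_size()
    if size is None:
        print("THP PMD size is not exposed on this system")
    else:
        rounded = align_up(3 << 20, size)
        print(f"PMD THP size: {size} bytes; a 3M request rounds up to {rounded}")
```

Code written this way keeps working whether the PMD size is 2M, 1M, or
something else, which is the point being made about not hardcoding it.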

On 07/04/26 8:14 PM, David Hildenbrand (Arm) wrote:
> On 2/24/26 06:11, Anshuman Khandual wrote:

[...]

> Interesting. That means user space will have it even harder to optimize
> for THP sizes.
>
> What's the effect on cont-pte? Do they still span the same number of
> entries and there is effectively no change?

The numbers are the same for 4K base page size but will need
some changes for 16K and 64K base page sizes. Something that
got missed in this series, will fix it.

>> Open Questions:
>>
>> - Do we need to support UNMAP_KERNEL_AT_EL0 with D128
>> - Do we need to emulate traditional D64 sizes at PUD, PMD level with D128
>
> It would certainly make user space interaction easier. But then, user
> space already has to consider various PMD sizes (and is better off
> querying /sys/kernel/mm/transparent_hugepage/hpage_pmd_size instead of
> hardcoding it). s390x, for example, also has 1M PMD size.
>
> I guess with "emulating" you mean something simple like always
> allocating order-1 page tables that effectively have the same number of
> page table entries?

Yeah - thought something similar.

> That would be an option, but I recall that the pte_map_* infrastructure
> currently expects that leaf page tables only ever span a single page.
>
> So it wouldn't really give us a lot of easy benefit I guess.

Right. So we probably need to figure out all the other benefits this
might add besides just the user-space-facing interactions you
mentioned earlier.
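
The "order-1 page tables" idea can be checked with simple arithmetic: a
two-page table of 16-byte descriptors holds as many entries as a one-page
table of 8-byte descriptors, so the PMD leaf size would match D64 again. The
sketch below shows only that arithmetic, with hypothetical function names. It
says nothing about whether the hardware table walk or the kernel's page table
code could accommodate such a layout.

```python
# Sketch: entry counts for tables spanning 2**order pages.
# Assumption: 8-byte descriptors for D64, 16-byte descriptors for D128.


def table_entries(page_size: int, desc_bytes: int, order: int = 0) -> int:
    """Entries in a table that spans 2**order pages."""
    return (page_size << order) // desc_bytes


def pmd_leaf_size(page_size: int, desc_bytes: int, order: int = 0) -> int:
    """Bytes mapped by one PMD entry when PTE tables span 2**order pages."""
    return page_size * table_entries(page_size, desc_bytes, order)


if __name__ == "__main__":
    for page in (4096, 16384, 65536):
        d64 = pmd_leaf_size(page, 8)
        d128 = pmd_leaf_size(page, 16)
        d128_order1 = pmd_leaf_size(page, 16, order=1)
        # d128_order1 equals d64 for every page size.
        print(page, d64, d128, d128_order1, d128_order1 == d64)
```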

On 4/8/26 12:53, Anshuman Khandual wrote:
> On 07/04/26 8:14 PM, David Hildenbrand (Arm) wrote:

[...]

>> What's the effect on cont-pte? Do they still span the same number of
>> entries and there is effectively no change?
>
> The numbers are the same for 4K base page size but will need
> some changes for 16K and 64K base page sizes. Something that
> got missed in this series, will fix it.

Oh, and it would be great to also clearly spell out the effect on
hugetlb as well. I assume the available hugetlb sizes will change as well.

--
Cheers,

David

On 08/04/26 5:43 PM, David Hildenbrand (Arm) wrote:
> On 4/8/26 12:53, Anshuman Khandual wrote:

[...]

>> The numbers are the same for 4K base page size but will need
>> some changes for 16K and 64K base page sizes. Something that
>> got missed in this series, will fix it.
>
> Oh, and it would be great to also clearly spell out the effect on
> hugetlb as well. I assume the available hugetlb sizes will change as well.

Sure, will update the required information in the commit message as well
as in arch/arm64/mm/hugetlbpage.c, where the HugeTLB size support matrix
is listed.

On 08/04/2026 11:53, Anshuman Khandual wrote:
> On 07/04/26 8:14 PM, David Hildenbrand (Arm) wrote:

[...]

>> What's the effect on cont-pte? Do they still span the same number of
>> entries and there is effectively no change?
>
> The numbers are the same for 4K base page size but will need
> some changes for 16K and 64K base page sizes. Something that
> got missed in this series, will fix it.

Really - I thought the contiguous sizes were the same for D128 as they are
for D64? What's the difference? Perhaps it's different for level 2, but for
level 3, I'm pretty sure it remains:

PAGE_SIZE    CONT_SIZE    NR_PTES    CONT_ORDER
4K           64K          16         4
16K          2M           128        7
64K          2M           32         5

Thanks,
Ryan
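
The level 3 figures quoted above are related by CONT_SIZE = PAGE_SIZE *
NR_PTES and CONT_ORDER = log2(NR_PTES). A small Python sketch of that
relationship follows, using the NR_PTES values from the message. The
dictionary and function names are illustrative, and the sketch makes no
claim about which values D128 actually uses.

```python
# Sketch: contiguous-PTE geometry derived from the number of PTEs in a block.
# NR_PTES values are copied from the level 3 table in the message above.

KIB = 1024

CONT_PTES = {4 * KIB: 16, 16 * KIB: 128, 64 * KIB: 32}


def cont_geometry(page_size: int, nr_ptes: int) -> tuple[int, int]:
    """Return (CONT_SIZE in bytes, CONT_ORDER) for a contiguous block."""
    if nr_ptes <= 0 or nr_ptes & (nr_ptes - 1):
        raise ValueError("nr_ptes must be a power of two")
    cont_size = page_size * nr_ptes
    cont_order = nr_ptes.bit_length() - 1  # log2 of a power of two
    return cont_size, cont_order


if __name__ == "__main__":
    for page, nr in CONT_PTES.items():
        size, order = cont_geometry(page, nr)
        print(f"{page // KIB}K: CONT_SIZE={size // KIB}K "
              f"NR_PTES={nr} CONT_ORDER={order}")
```

Because none of these quantities depends on the descriptor width, the
arithmetic alone gives no reason for the level 3 sizes to differ between
D64 and D128.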