[RFC V1 00/16] arm64/mm: Enable 128 bit page table entries

Anshuman Khandual posted 16 patches 1 month, 3 weeks ago
[RFC V1 00/16] arm64/mm: Enable 128 bit page table entries
Posted by Anshuman Khandual 1 month, 3 weeks ago
FEAT_D128 is a new Arm architecture feature that adds support for the
VMSAv9-128 translation system. FEAT_D128 is optional from Armv9.3 onwards.
With this feature, arm64 platforms can have two different translation
systems, VMSAv8-64 and VMSAv9-128, which can be selectively enabled.

FEAT_D128 adds 128 bit page table entries, thus supporting larger physical
and virtual address ranges while also making room for more MMU management
feature bits, for both HW and SW.

This series has been split into two parts. Generic MM changes followed by
arm64 platform changes, finally enabling D128 with a new config ARM64_D128.

READ_ONCE() accesses on page table entries are routed via level-specific
pxdp_get() helpers, which platforms can then override when required. On
arm64, these accessors help ensure that 128 bit page table entries are read
atomically.
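
To illustrate the routing described above, here is a minimal user-space
sketch of the override pattern: a generic pmdp_get() fallback guarded by
#ifndef, and an architecture-supplied replacement that takes its place.
The names arch_pmdp_get(), arch_reads and pmd_is_none() are hypothetical and
exist only for this illustration, READ_ONCE() is reduced to a plain volatile
load, and a 64 bit entry stands in for the 128 bit one. The actual arm64
helpers in this series may look quite different.

```c
#include <stdint.h>

/* Stand-in entry type; a real D128 entry would be 128 bits wide. */
typedef struct { uint64_t pmd; } pmd_t;

/* Simplified stand-in for the kernel's READ_ONCE(): a volatile load. */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/*
 * "Architecture" side (hypothetical): supply a custom accessor and
 * announce it by defining the pmdp_get name before the generic code.
 */
static int arch_reads;	/* counts reads, only to make the routing visible */

static inline pmd_t arch_pmdp_get(pmd_t *pmdp)
{
	arch_reads++;
	return (pmd_t){ READ_ONCE(pmdp->pmd) };
}
#define pmdp_get arch_pmdp_get

/* "Generic" side: the fallback is used only when nothing overrode it. */
#ifndef pmdp_get
static inline pmd_t pmdp_get(pmd_t *pmdp)
{
	return READ_ONCE(*pmdp);
}
#endif

/* A caller never dereferences the entry directly, it uses the accessor. */
static int pmd_is_none(pmd_t *pmdp)
{
	return pmdp_get(pmdp).pmd == 0;
}
```

Because callers only ever use pmdp_get(), swapping in a wider or more
careful read on one platform does not require touching them.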

All ARM64_VA_BITS and ARM64_PA_BITS combinations for all page sizes are now
supported in both D64 and D128 translation regimes, although the new 56 bit
VA space is not yet supported. Similarly, the FEAT_D128 skip level is not
currently supported.

Basic page table geometry has been changed with D128 as there are now fewer
entries per level. Please refer to the following table for leaf entry sizes:

                    D64              D128
------------------------------------------------
| PAGE_SIZE |   PMD  |  PUD  |   PMD  |   PUD  |
-----------------------------|-----------------|
|     4K    |    2M  |  1G   |    1M  |  256M  |
|    16K    |   32M  | 64G   |   16M  |   16G  |
|    64K    |  512M  |  4T   |  256M  |    1T  |
------------------------------------------------
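
The table values follow from simple arithmetic: each table occupies one page,
so a level resolves log2(PAGE_SIZE / descriptor size) address bits, and a
leaf entry N levels above the PTE level covers PAGE_SIZE shifted left by N
times that many bits. A small sketch of the calculation, where leaf_size()
and its parameters are hypothetical names and not helpers from this series:

```c
#include <stdint.h>

/*
 * Bytes mapped by one leaf entry that sits `levels_above_pte` levels
 * above the PTE level (1 = PMD, 2 = PUD), assuming every table is
 * exactly one page and descriptors are 8 bytes (D64) or 16 bytes (D128).
 */
static uint64_t leaf_size(unsigned int page_shift,
			  unsigned int desc_bytes,
			  unsigned int levels_above_pte)
{
	unsigned int desc_shift = (desc_bytes == 16) ? 4 : 3;
	unsigned int bits_per_level = page_shift - desc_shift;

	return 1ULL << (page_shift + levels_above_pte * bits_per_level);
}
```

For example, leaf_size(12, 8, 1) gives 2M for a 4K D64 PMD, while
leaf_size(12, 16, 1) gives 1M for the 4K D128 case.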

From the arm64 kernel features perspective, KVM, KASAN and
UNMAP_KERNEL_AT_EL0 are also not currently supported.

Open Questions:

- Do we need to support UNMAP_KERNEL_AT_EL0 with D128?
- Do we need to emulate traditional D64 sizes at PUD, PMD level with D128?

This series applies on upstream kernel v7.0-rc1.

There are no apparent problems while running MM kselftests with and without
CONFIG_ARM64_D128. In addition, the series has been built on other platforms
such as x86, powerpc, riscv, arm and s390.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Linu Cherian <linu.cherian@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org

Anshuman Khandual (15):
  mm: Abstract printing of pxd_val()
  mm: Add read-write accessors for vm_page_prot
  mm: Replace READ_ONCE() in pud_trans_unstable()
  perf/events: Replace READ_ONCE() with standard pgtable accessors
  arm64/mm: Convert READ_ONCE() as pmdp_get() while accessing PMD
  arm64/mm: Convert READ_ONCE() as pudp_get() while accessing PUD
  arm64/mm: Convert READ_ONCE() as p4dp_get() while accessing P4D
  arm64/mm: Convert READ_ONCE() as pgdp_get() while accessing PGD
  arm64/mm: Route all pgtable reads via ptdesc_get()
  arm64/mm: Route all pgtable writes via ptdesc_set()
  arm64/mm: Route all pgtable atomics to central helpers
  arm64/mm: Abstract printing of pxd_val()
  arm64/mm: Override read-write accessors for vm_page_prot
  arm64/mm: Enable fixmap with 5 level page table
  arm64/mm: Add initial support for FEAT_D128 page tables

Linu Cherian (1):
  arm64/mm: Add macros __tlb_asid_level and __tlb_range

 arch/arm64/Kconfig                     |  39 ++++-
 arch/arm64/Makefile                    |   4 +
 arch/arm64/include/asm/assembler.h     |   4 +-
 arch/arm64/include/asm/el2_setup.h     |   9 ++
 arch/arm64/include/asm/pgtable-hwdef.h | 137 ++++++++++++++++++
 arch/arm64/include/asm/pgtable-prot.h  |  18 ++-
 arch/arm64/include/asm/pgtable-types.h |  12 ++
 arch/arm64/include/asm/pgtable.h       | 193 ++++++++++++++++++-------
 arch/arm64/include/asm/smp.h           |   1 +
 arch/arm64/include/asm/tlbflush.h      | 112 ++++++++++++--
 arch/arm64/kernel/head.S               |  12 ++
 arch/arm64/mm/fault.c                  |  20 +--
 arch/arm64/mm/fixmap.c                 |  24 ++-
 arch/arm64/mm/hugetlbpage.c            |  10 +-
 arch/arm64/mm/kasan_init.c             |  14 +-
 arch/arm64/mm/mmu.c                    | 113 +++++++++++----
 arch/arm64/mm/pageattr.c               |   8 +-
 arch/arm64/mm/proc.S                   |  25 +++-
 arch/arm64/mm/trans_pgd.c              |  14 +-
 include/linux/pgtable.h                |  21 ++-
 kernel/events/core.c                   |   6 +-
 mm/debug_vm_pgtable.c                  |   4 +-
 mm/huge_memory.c                       |   4 +-
 mm/memory.c                            |  31 ++--
 mm/migrate.c                           |   2 +-
 mm/mmap.c                              |   2 +-
 26 files changed, 674 insertions(+), 165 deletions(-)

-- 
2.43.0
Re: [RFC V1 00/16] arm64/mm: Enable 128 bit page table entries
Posted by David Hildenbrand (Arm) 1 week, 2 days ago
On 2/24/26 06:11, Anshuman Khandual wrote:
> FEAT_D128 is a new arm architecture feature adding support for VMSAv9-128
> translation system. FEAT_D128 is an optional feature from ARMV9.3 onwards.
> So with this feature arm64 platforms could have two different translation
> systems, VMSAv8-64 and VMSAv9-128 could selectively be enabled.
> 
> FEAT_D128 adds 128 bit page table entries, thus supporting larger physical
> and virtual address range while also expanding available room for more MMU
> management feature bits both for HW and SW. 
> 
> This series has been split into two parts. Generic MM changes followed by
> arm64 platform changes, finally enabling D128 with a new config ARM64_D128.
> 
> READ_ONCE() on page table entries get routed via level specific pxdp_get()
> helpers which platforms could then override when required. These accessors
> on arm64 platform help in ensuring page table accesses are performed in an
> atomic manner while reading 128 bit page table entries.
> 
> All ARM64_VA_BITS and ARM64_PA_BITS combinations for all page sizes are now
> supported both on D64 and D128 translation regimes. Although new 56 bits VA
> space is not yet supported. Similarly FEAT_D128 skip level is not supported
> currently.
> 
> Basic page table geometry has been changed with D128 as there are now fewer
> entries per level. Please refer to the following table for leaf entry sizes
> 
>                     D64              D128
> ------------------------------------------------
> | PAGE_SIZE |   PMD  |  PUD  |   PMD  |   PUD  |
> -----------------------------|-----------------|
> |     4K    |    2M  |  1G   |    1M  |  256M  |
> |    16K    |   32M  | 64G   |   16M  |   16G  |
> |    64K    |  512M  |  4T   |  256M  |    1T  |
> ------------------------------------------------
> 

Interesting. That means user space will have it even harder to optimize
for THP sizes.

What's the effect on cont-pte? Do they still span the same number of
entries and there is effectively no change?

> From arm64 kernel features perspective KVM, KASAN and UNMAP_KERNEL_AT_EL0
> are currently not supported as well.
> 
> Open Questions:
> 
> - Do we need to support UNMAP_KERNEL_AT_EL0 with D128
> - Do we need to emulate traditional D64 sizes at PUD, PMD level with D128

It would certainly make user space interaction easier. But then, user
space already has to consider various PMD sizes (and is better off
querying /sys/kernel/mm/transparent_hugepage/hpage_pmd_size instead of
hardcoding it). s390x, for example, also has 1M PMD size.
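
As a rough sketch of what querying instead of hardcoding could look like in
user space: the sysfs file is the one named above, while read_thp_pmd_size()
and its "return 0 on failure" convention are assumptions made for this
example only.

```c
#include <stdio.h>

/*
 * Read the PMD-sized THP size in bytes from the given sysfs file.
 * Returns 0 if the file is missing or cannot be parsed, so callers
 * can fall back to not relying on PMD-sized THP at all.
 */
static unsigned long read_thp_pmd_size(const char *path)
{
	unsigned long size = 0;
	FILE *f = fopen(path, "r");

	if (!f)
		return 0;
	if (fscanf(f, "%lu", &size) != 1)
		size = 0;
	fclose(f);
	return size;
}
```

A caller would pass "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size" and
align its mappings to the returned value rather than to a fixed 2M.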

I guess with "emulating" you mean something simple like always
allocating order-1 page tables that effectively have the same number of
page table entries?

That would be an option, but I recall that the pte_map_* infrastructure
currently expects that leaf page tables only ever span a single page.

So it wouldn't really give us a lot of easy benefit I guess.

-- 
Cheers,

David
Re: [RFC V1 00/16] arm64/mm: Enable 128 bit page table entries
Posted by Anshuman Khandual 1 week, 1 day ago
On 07/04/26 8:14 PM, David Hildenbrand (Arm) wrote:
> On 2/24/26 06:11, Anshuman Khandual wrote:
>> FEAT_D128 is a new arm architecture feature adding support for VMSAv9-128
>> translation system. FEAT_D128 is an optional feature from ARMV9.3 onwards.
>> So with this feature arm64 platforms could have two different translation
>> systems, VMSAv8-64 and VMSAv9-128 could selectively be enabled.
>>
>> FEAT_D128 adds 128 bit page table entries, thus supporting larger physical
>> and virtual address range while also expanding available room for more MMU
>> management feature bits both for HW and SW. 
>>
>> This series has been split into two parts. Generic MM changes followed by
>> arm64 platform changes, finally enabling D128 with a new config ARM64_D128.
>>
>> READ_ONCE() on page table entries get routed via level specific pxdp_get()
>> helpers which platforms could then override when required. These accessors
>> on arm64 platform help in ensuring page table accesses are performed in an
>> atomic manner while reading 128 bit page table entries.
>>
>> All ARM64_VA_BITS and ARM64_PA_BITS combinations for all page sizes are now
>> supported both on D64 and D128 translation regimes. Although new 56 bits VA
>> space is not yet supported. Similarly FEAT_D128 skip level is not supported
>> currently.
>>
>> Basic page table geometry has been changed with D128 as there are now fewer
>> entries per level. Please refer to the following table for leaf entry sizes
>>
>>                     D64              D128
>> ------------------------------------------------
>> | PAGE_SIZE |   PMD  |  PUD  |   PMD  |   PUD  |
>> -----------------------------|-----------------|
>> |     4K    |    2M  |  1G   |    1M  |  256M  |
>> |    16K    |   32M  | 64G   |   16M  |   16G  |
>> |    64K    |  512M  |  4T   |  256M  |    1T  |
>> ------------------------------------------------
>>
> 
> Interesting. That means user space will have it even harder to optimize
> for THP sizes.
> 
> What's the effect on cont-pte? Do they still span the same number of
> entries and there is effectively no change?

The numbers are the same for 4K base page size but will need
some changes for 16K and 64K base page sizes. Something that
got missed in this series, will fix it.

> 
>> From arm64 kernel features perspective KVM, KASAN and UNMAP_KERNEL_AT_EL0
>> are currently not supported as well.
>>
>> Open Questions:
>>
>> - Do we need to support UNMAP_KERNEL_AT_EL0 with D128
>> - Do we need to emulate traditional D64 sizes at PUD, PMD level with D128
> 
> It would certainly make user space interaction easier. But then, user
> space already has to consider various PMD sizes (and is better off
> querying /sys/kernel/mm/transparent_hugepage/hpage_pmd_size instead of
> hardcoding it). s390x, for example, also has 1M PMD size.
>
> I guess with "emulating" you mean something simple like always
> allocating order-1 page tables that effectively have the same number of
> page table entries?

Yeah - thought something similar.

> 
> That would be an option, but I recall that the pte_map_* infrastructure
> currently expects that leaf page tables only ever span a single page.
>
> So it wouldn't really give us a lot of easy benefit I guess.

Right. So we probably need to figure out what other benefits this
might add besides just the user space facing interactions you
mentioned earlier.
Re: [RFC V1 00/16] arm64/mm: Enable 128 bit page table entries
Posted by David Hildenbrand (Arm) 1 week, 1 day ago
On 4/8/26 12:53, Anshuman Khandual wrote:
> On 07/04/26 8:14 PM, David Hildenbrand (Arm) wrote:
>> On 2/24/26 06:11, Anshuman Khandual wrote:
>>> FEAT_D128 is a new arm architecture feature adding support for VMSAv9-128
>>> translation system. FEAT_D128 is an optional feature from ARMV9.3 onwards.
>>> So with this feature arm64 platforms could have two different translation
>>> systems, VMSAv8-64 and VMSAv9-128 could selectively be enabled.
>>>
>>> FEAT_D128 adds 128 bit page table entries, thus supporting larger physical
>>> and virtual address range while also expanding available room for more MMU
>>> management feature bits both for HW and SW. 
>>>
>>> This series has been split into two parts. Generic MM changes followed by
>>> arm64 platform changes, finally enabling D128 with a new config ARM64_D128.
>>>
>>> READ_ONCE() on page table entries get routed via level specific pxdp_get()
>>> helpers which platforms could then override when required. These accessors
>>> on arm64 platform help in ensuring page table accesses are performed in an
>>> atomic manner while reading 128 bit page table entries.
>>>
>>> All ARM64_VA_BITS and ARM64_PA_BITS combinations for all page sizes are now
>>> supported both on D64 and D128 translation regimes. Although new 56 bits VA
>>> space is not yet supported. Similarly FEAT_D128 skip level is not supported
>>> currently.
>>>
>>> Basic page table geometry has been changed with D128 as there are now fewer
>>> entries per level. Please refer to the following table for leaf entry sizes
>>>
>>>                     D64              D128
>>> ------------------------------------------------
>>> | PAGE_SIZE |   PMD  |  PUD  |   PMD  |   PUD  |
>>> -----------------------------|-----------------|
>>> |     4K    |    2M  |  1G   |    1M  |  256M  |
>>> |    16K    |   32M  | 64G   |   16M  |   16G  |
>>> |    64K    |  512M  |  4T   |  256M  |    1T  |
>>> ------------------------------------------------
>>>
>>
>> Interesting. That means user space will have it even harder to optimize
>> for THP sizes.
>>
>> What's the effect on cont-pte? Do they still span the same number of
>> entries and there is effectively no change?
> 
> The numbers are the same for 4K base page size but will need
> some changes for 16K and 64K base page sizes. Something that
> got missed in this series, will fix it.

Oh, and it would be great to also clearly spell out the effect on
hugetlb as well. I assume the available hugetlb sizes will change as well.

-- 
Cheers,

David
Re: [RFC V1 00/16] arm64/mm: Enable 128 bit page table entries
Posted by Anshuman Khandual 1 week ago

On 08/04/26 5:43 PM, David Hildenbrand (Arm) wrote:
> On 4/8/26 12:53, Anshuman Khandual wrote:
>> On 07/04/26 8:14 PM, David Hildenbrand (Arm) wrote:
>>> On 2/24/26 06:11, Anshuman Khandual wrote:
>>>> FEAT_D128 is a new arm architecture feature adding support for VMSAv9-128
>>>> translation system. FEAT_D128 is an optional feature from ARMV9.3 onwards.
>>>> So with this feature arm64 platforms could have two different translation
>>>> systems, VMSAv8-64 and VMSAv9-128 could selectively be enabled.
>>>>
>>>> FEAT_D128 adds 128 bit page table entries, thus supporting larger physical
>>>> and virtual address range while also expanding available room for more MMU
>>>> management feature bits both for HW and SW. 
>>>>
>>>> This series has been split into two parts. Generic MM changes followed by
>>>> arm64 platform changes, finally enabling D128 with a new config ARM64_D128.
>>>>
>>>> READ_ONCE() on page table entries get routed via level specific pxdp_get()
>>>> helpers which platforms could then override when required. These accessors
>>>> on arm64 platform help in ensuring page table accesses are performed in an
>>>> atomic manner while reading 128 bit page table entries.
>>>>
>>>> All ARM64_VA_BITS and ARM64_PA_BITS combinations for all page sizes are now
>>>> supported both on D64 and D128 translation regimes. Although new 56 bits VA
>>>> space is not yet supported. Similarly FEAT_D128 skip level is not supported
>>>> currently.
>>>>
>>>> Basic page table geometry has been changed with D128 as there are now fewer
>>>> entries per level. Please refer to the following table for leaf entry sizes
>>>>
>>>>                     D64              D128
>>>> ------------------------------------------------
>>>> | PAGE_SIZE |   PMD  |  PUD  |   PMD  |   PUD  |
>>>> -----------------------------|-----------------|
>>>> |     4K    |    2M  |  1G   |    1M  |  256M  |
>>>> |    16K    |   32M  | 64G   |   16M  |   16G  |
>>>> |    64K    |  512M  |  4T   |  256M  |    1T  |
>>>> ------------------------------------------------
>>>>
>>>
>>> Interesting. That means user space will have it even harder to optimize
>>> for THP sizes.
>>>
>>> What's the effect on cont-pte? Do they still span the same number of
>>> entries and there is effectively no change?
>>
>> The numbers are the same for 4K base page size but will need
>> some changes for 16K and 64K base page sizes. Something that
>> got missed in this series, will fix it.
> 
> Oh, and it would be great to also clearly spell out the effect on
> hugetlb as well. I assume the available hugetlb sizes will change as well.

Sure, will update the required information in the commit message as well as
in arch/arm64/mm/hugetlbpage.c, where the HugeTLB size support matrix is
listed.
Re: [RFC V1 00/16] arm64/mm: Enable 128 bit page table entries
Posted by Ryan Roberts 1 week, 1 day ago
On 08/04/2026 11:53, Anshuman Khandual wrote:
> On 07/04/26 8:14 PM, David Hildenbrand (Arm) wrote:
>> On 2/24/26 06:11, Anshuman Khandual wrote:
>>> FEAT_D128 is a new arm architecture feature adding support for VMSAv9-128
>>> translation system. FEAT_D128 is an optional feature from ARMV9.3 onwards.
>>> So with this feature arm64 platforms could have two different translation
>>> systems, VMSAv8-64 and VMSAv9-128 could selectively be enabled.
>>>
>>> FEAT_D128 adds 128 bit page table entries, thus supporting larger physical
>>> and virtual address range while also expanding available room for more MMU
>>> management feature bits both for HW and SW. 
>>>
>>> This series has been split into two parts. Generic MM changes followed by
>>> arm64 platform changes, finally enabling D128 with a new config ARM64_D128.
>>>
>>> READ_ONCE() on page table entries get routed via level specific pxdp_get()
>>> helpers which platforms could then override when required. These accessors
>>> on arm64 platform help in ensuring page table accesses are performed in an
>>> atomic manner while reading 128 bit page table entries.
>>>
>>> All ARM64_VA_BITS and ARM64_PA_BITS combinations for all page sizes are now
>>> supported both on D64 and D128 translation regimes. Although new 56 bits VA
>>> space is not yet supported. Similarly FEAT_D128 skip level is not supported
>>> currently.
>>>
>>> Basic page table geometry has been changed with D128 as there are now fewer
>>> entries per level. Please refer to the following table for leaf entry sizes
>>>
>>>                     D64              D128
>>> ------------------------------------------------
>>> | PAGE_SIZE |   PMD  |  PUD  |   PMD  |   PUD  |
>>> -----------------------------|-----------------|
>>> |     4K    |    2M  |  1G   |    1M  |  256M  |
>>> |    16K    |   32M  | 64G   |   16M  |   16G  |
>>> |    64K    |  512M  |  4T   |  256M  |    1T  |
>>> ------------------------------------------------
>>>
>>
>> Interesting. That means user space will have it even harder to optimize
>> for THP sizes.
>>
>> What's the effect on cont-pte? Do they still span the same number of
>> entries and there is effectively no change?
> 
> The numbers are the same for 4K base page size but will need
> some changes for 16K and 64K base page sizes. Something that
> got missed in this series, will fix it.

Really - I thought the contiguous sizes were the same for D128 as they are for
D64? What's the difference? Perhaps it's different for level 2, but for level 3,
I'm pretty sure it remains:

PAGE_SIZE	CONT_SIZE	NR_PTES		CONT_ORDER
4K		64K		16		4
16K		2M		128		7
64K		2M		32		5
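
The columns in this table are related by CONT_SIZE = NR_PTES * PAGE_SIZE and
CONT_ORDER = log2(NR_PTES). A small sketch of that arithmetic, with
hypothetical helper names that are not taken from the kernel:

```c
/* Bytes covered by a contiguous run of `nr_ptes` level 3 entries. */
static unsigned long cont_pte_size(unsigned int page_shift,
				   unsigned int nr_ptes)
{
	return (unsigned long)nr_ptes << page_shift;
}

/* Page order of that run; nr_ptes is assumed to be a power of two. */
static unsigned int cont_pte_order(unsigned int nr_ptes)
{
	unsigned int order = 0;

	while (nr_ptes >>= 1)
		order++;
	return order;
}
```

Neither helper depends on the descriptor size, which matches the expectation
that the level 3 numbers do not change between D64 and D128.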

Thanks,
Ryan

> 
>>
>>> From arm64 kernel features perspective KVM, KASAN and UNMAP_KERNEL_AT_EL0
>>> are currently not supported as well.
>>>
>>> Open Questions:
>>>
>>> - Do we need to support UNMAP_KERNEL_AT_EL0 with D128
>>> - Do we need to emulate traditional D64 sizes at PUD, PMD level with D128
>>
>> It would certainly make user space interaction easier. But then, user
>> space already has to consider various PMD sizes (and is better off
>> querying /sys/kernel/mm/transparent_hugepage/hpage_pmd_size instead of
>> hardcoding it). s390x, for example, also has 1M PMD size.
>>
>> I guess with "emulating" you mean something simple like always
>> allocating order-1 page tables that effectively have the same number of
>> page table entries?
> 
> Yeah - thought something similar.
> 
>>
>> That would be an option, but I recall that the pte_map_* infrastructure
>> currently expects that leaf page tables only ever span a single page.
>>
>> So it wouldn't really give us a lot of easy benefit I guess.
> 
> Right. So probably need to figure all other benefits this might
> add besides just the user space facing interactions as you have
> mentioned earlier.