[PATCH v7 00/15] arm64: Unmap linear alias of kernel data/bss

Ard Biesheuvel posted 15 patches 1 week, 2 days ago
Posted by Ard Biesheuvel 1 week, 2 days ago
From: Ard Biesheuvel <ardb@kernel.org>

One of the reasons the lack of randomization of the linear map on arm64
is considered problematic is the fact that bootloaders adhering to the
original arm64 boot protocol (i.e., a substantial fraction of all
Android phones) may place the kernel at the base of DRAM, and therefore
at the base of the non-randomized linear map. This puts a writable alias
of the kernel's data and bss regions at a predictable location, removing
the need for an attacker to guess where KASLR mapped the kernel.

Let's unmap this linear, writable alias entirely, so that knowing the
location of the linear alias does not give write access to the kernel's
data and bss regions.
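
To make the predictability concrete, here is a minimal user-space sketch of how the linear alias of a kernel symbol could be computed. Everything prefixed `EX_`/`ex_` is hypothetical and not part of the series: the linear map base and DRAM base are illustrative values for one possible configuration, and the translation only loosely mirrors the kernel's phys-to-virt arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative constants only (assumptions, not taken from the series):
 * a configuration whose linear map starts at 0xffff000000000000, and a
 * platform whose DRAM starts at physical address 0x80000000.
 */
#define EX_PAGE_OFFSET UINT64_C(0xffff000000000000)
#define EX_DRAM_BASE   UINT64_C(0x0000000080000000)

/* Linear-map VA of a physical address when the linear map is not randomized. */
static inline uint64_t ex_phys_to_linear(uint64_t pa)
{
	assert(pa >= EX_DRAM_BASE);
	return EX_PAGE_OFFSET + (pa - EX_DRAM_BASE);
}

/*
 * Linear alias of a kernel symbol, given the physical address at which the
 * bootloader placed the image and the symbol's offset from the image start.
 * The KASLR-randomized virtual base of the kernel mapping never appears in
 * this computation, which is the point made above.
 */
static inline uint64_t ex_linear_alias(uint64_t image_load_pa,
				       uint64_t sym_offset)
{
	return ex_phys_to_linear(image_load_pa + sym_offset);
}
```

With the image placed at the base of DRAM, `ex_linear_alias(EX_DRAM_BASE, off)` depends only on the symbol's offset within the image, so an attacker who knows the kernel build does not need to defeat KASLR to find the writable alias.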

Changes since v6:
- Improve commit logs and comments
- Add acks from Kevin
- Reorder patches so remapping data/bss R/O occurs after moving the zero
  page into .rodata
- Drop zero page cache flush from SuperH rather than casting away the
  constness
- Map kfence pool with NO_EXEC_MAPPINGS

Note that Sashiko had some comments on patch 15/15 [1], but none of those
seem accurate. (I have tested both suspend/resume and hibernate under
QEMU, and both work as expected.)

Changes since v5:
- Reorder series in ascending order of impact, so that the first few can
  be merged earlier if desired. This also makes the patch that remaps
  the data/bss linear alias as tagged redundant, which is therefore
  dropped.
- Add patch #3 to address an existing issue spotted by Sashiko
- Fix thinko in contiguous region check (#5), where the whole region
  needs to be considered and not only the first entry (dropped Rb as
  well) - this addresses the kfence issue Sashiko reported on v5 [0]
- Update commit log on #6 to clarify that changing permission bits on
  PTE_CONT entries is safe as long as PTE_CONT itself does not change
- Likewise, drop hunk that adds the PTE_CONT bit to the 'permitted' mask
  in pgattr_change_is_safe(), as changing it is not safe. (#8)
- Move kasan's additional page table to pgdir BSS as well
- Use (NOLOAD) on the .pgdir.bss section so it does not get emitted into
  vmlinux
- Add powerpc and SuperH patches to deal with empty_zero_page[] being
  made const
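
The two PTE_CONT rules in the list above (permission bits may change on a contiguous entry as long as PTE_CONT itself does not, and a contiguous-range decision has to look at every entry in the range) can be sketched as a simplified user-space model. This is not the code from the series: the `ex_`/`EX_` names are hypothetical, the bit positions are only meant to resemble arm64 descriptors, and the exact range rule in patch #5 is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative descriptor bits (assumed layout, for demonstration only). */
#define EX_PTE_VALID  (UINT64_C(1) << 0)
#define EX_PTE_RDONLY (UINT64_C(1) << 7)
#define EX_PTE_WRITE  (UINT64_C(1) << 51)
#define EX_PTE_CONT   (UINT64_C(1) << 52)
#define EX_PTE_PXN    (UINT64_C(1) << 53)

/* Bits allowed to differ on a live entry. EX_PTE_CONT is deliberately absent. */
#define EX_PERMITTED_MASK (EX_PTE_RDONLY | EX_PTE_WRITE | EX_PTE_PXN)

/* Model of a "may this live descriptor be rewritten in place?" check. */
static inline bool ex_attr_change_is_safe(uint64_t old, uint64_t new_val)
{
	/* Creating or tearing down a mapping is treated as always safe. */
	if (!(old & EX_PTE_VALID) || !(new_val & EX_PTE_VALID))
		return true;

	/* Otherwise only the permitted bits may change. */
	return ((old ^ new_val) & ~EX_PERMITTED_MASK) == 0;
}

/*
 * Assumed range rule: a run of n entries may be (re)mapped with the
 * contiguous hint only if no entry in the whole run is already a valid
 * non-contiguous descriptor. Checking only ptes[0] would miss such an
 * entry later in the run.
 */
static inline bool ex_range_may_use_cont(const uint64_t *ptes, int n)
{
	assert(n > 0);
	for (int i = 0; i < n; i++) {
		if ((ptes[i] & EX_PTE_VALID) && !(ptes[i] & EX_PTE_CONT))
			return false;
	}
	return true;
}
```

In this model, flipping a contiguous entry from writable to read-only passes the check, while clearing `EX_PTE_CONT` on a live entry does not.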

Changes since v4:
- Update the correct [early] mapping in patch #1
- Make empty_zero_page[] const instead of __ro_after_init
- Drop patches that remap the fixmap page tables r/o for now
- Don't force page mappings for the data/bss linear alias, as it is no
  longer needed for set_memory_valid()
- Add acks

Changes since v3:
- Drop bogus patch adding hierarchical PXN to the fixmap mapping, which
  breaks the KPTI trampoline (thanks to Sashiko)
- Add generic patch to move the empty_zero_page to __ro_after_init, as
  it now lives in generic code.
- Add patches to remap the linear aliases of the fixmap page tables
  read-only too - these live at an a priori known offset in the linear
  map if physical KASLR was omitted, and control a priori known
  addresses in the virtual kernel space.
- Rebase onto v7.1-rc1

Changes since v2:
- Keep bm_pte[] in the region that is remapped r/o or unmapped, as it is
  only manipulated via its kernel alias
- Drop check that prohibits any manipulation of descriptors with the
  CONT bit set
- Add Ryan's ack to a couple of patches
- Rebase onto v7.0-rc4

Changes since v1:
- Put zero page patch at the start of the series
- Tweak __map_memblock() API to respect existing table and contiguous
  mappings, so that the logic to map the kernel alias can be simplified
- Stop abusing the MEMBLOCK_NOMAP flag to initially omit the kernel
  linear alias from the linear map
- Some additional cleanup patches
- Use proper API [set_memory_valid()] to (un)map the linear alias of
  data/bss.

Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Liz Prucka <lizprucka@google.com>
Cc: Seth Jenkins <sethjenkins@google.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jann Horn <jannh@google.com>
Cc: linux-mm@kvack.org
Cc: linux-hardening@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-sh@vger.kernel.org

[0] https://sashiko.dev/#/patchset/20260519151616.2557018-15-ardb%2Bgit%40google.com
[1] https://sashiko.dev/#/patchset/20260526175846.2694125-17-ardb%2Bgit%40google.com

Ard Biesheuvel (15):
  arm64: mm: Remove bogus stop condition from map_mem() loop
  arm64: mm: Drop redundant pgd_t* argument from map_mem()
  arm64: mm: Check for pud_/pmd_set_huge() failures on kernel mappings
  arm64: mm: Preserve existing table mappings when mapping DRAM
  arm64: mm: Preserve non-contiguous descriptors when mapping DRAM
  arm64: mm: Permit contiguous descriptors to be manipulated
  arm64: kfence: Avoid NOMAP tricks when mapping the early pool
  arm64: mm: Permit contiguous attribute for preliminary mappings
  arm64: Move fixmap and kasan page tables to end of kernel image
  arm64: mm: Don't abuse memblock NOMAP to check for overlaps
  powerpc/code-patching: Avoid r/w mapping of the zero page
  sh: Drop cache flush of the zero page at boot
  mm: Make empty_zero_page[] const
  arm64: mm: Map the kernel data/bss read-only in the linear map
  arm64: mm: Unmap kernel data/bss entirely from the linear map

 arch/arm64/include/asm/mmu.h     |   2 +
 arch/arm64/include/asm/pgtable.h |   4 +
 arch/arm64/kernel/vmlinux.lds.S  |   8 +-
 arch/arm64/mm/fixmap.c           |   6 +-
 arch/arm64/mm/kasan_init.c       |   2 +-
 arch/arm64/mm/mmu.c              | 164 ++++++++++++--------
 arch/powerpc/lib/code-patching.c |  52 +------
 arch/sh/mm/init.c                |   3 -
 include/linux/pgtable.h          |   2 +-
 mm/mm_init.c                     |   2 +-
 10 files changed, 121 insertions(+), 124 deletions(-)


base-commit: 254f49634ee16a731174d2ae34bc50bd5f45e731
-- 
2.54.0.823.g6e5bcc1fc9-goog
Re: [PATCH v7 00/15] arm64: Unmap linear alias of kernel data/bss
Posted by Will Deacon 5 days, 13 hours ago
On Fri, 29 May 2026 17:01:51 +0200, Ard Biesheuvel wrote:
> One of the reasons the lack of randomization of the linear map on arm64
> is considered problematic is the fact that bootloaders adhering to the
> original arm64 boot protocol (i.e., a substantial fraction of all
> Android phones) may place the kernel at the base of DRAM, and therefore
> at the base of the non-randomized linear map. This puts a writable alias
> of the kernel's data and bss regions at a predictable location, removing
> the need for an attacker to guess where KASLR mapped the kernel.
> 
> [...]

It would've been nice to hear from the ppc folks on patch 11, but I've
picked it up on the assumption that they'll love the negative diff stat.
Worst case, we can drop/revert stuff if they have late objections.

Applied to arm64 (for-next/mm), thanks!

[01/15] arm64: mm: Remove bogus stop condition from map_mem() loop
        https://git.kernel.org/arm64/c/36ca7f4be809
[02/15] arm64: mm: Drop redundant pgd_t* argument from map_mem()
        https://git.kernel.org/arm64/c/2e527667a3b9
[03/15] arm64: mm: Check for pud_/pmd_set_huge() failures on kernel mappings
        https://git.kernel.org/arm64/c/8dd640d9233d
[04/15] arm64: mm: Preserve existing table mappings when mapping DRAM
        https://git.kernel.org/arm64/c/a64293e993f6
[05/15] arm64: mm: Preserve non-contiguous descriptors when mapping DRAM
        https://git.kernel.org/arm64/c/ecda73ae92ca
[06/15] arm64: mm: Permit contiguous descriptors to be manipulated
        https://git.kernel.org/arm64/c/05c5c31e9d8d
[07/15] arm64: kfence: Avoid NOMAP tricks when mapping the early pool
        https://git.kernel.org/arm64/c/dfd73e574d38
[08/15] arm64: mm: Permit contiguous attribute for preliminary mappings
        https://git.kernel.org/arm64/c/28becb2c1d74
[09/15] arm64: Move fixmap and kasan page tables to end of kernel image
        https://git.kernel.org/arm64/c/382a03e12eba
[10/15] arm64: mm: Don't abuse memblock NOMAP to check for overlaps
        https://git.kernel.org/arm64/c/d672a4b72c95
[11/15] powerpc/code-patching: Avoid r/w mapping of the zero page
        https://git.kernel.org/arm64/c/c0693153fb17
[12/15] sh: Drop cache flush of the zero page at boot
        https://git.kernel.org/arm64/c/99bad3e992e2
[13/15] mm: Make empty_zero_page[] const
        https://git.kernel.org/arm64/c/0aae825f1ed7
[14/15] arm64: mm: Map the kernel data/bss read-only in the linear map
        https://git.kernel.org/arm64/c/f2ba877402e5
[15/15] arm64: mm: Unmap kernel data/bss entirely from the linear map
        https://git.kernel.org/arm64/c/63e0b6a5b693

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev
Re: [PATCH v7 00/15] arm64: Unmap linear alias of kernel data/bss
Posted by Ard Biesheuvel 5 days ago
(cc Marc)

On Tue, 2 Jun 2026, at 22:34, Will Deacon wrote:
> On Fri, 29 May 2026 17:01:51 +0200, Ard Biesheuvel wrote:
>> One of the reasons the lack of randomization of the linear map on arm64
>> is considered problematic is the fact that bootloaders adhering to the
>> original arm64 boot protocol (i.e., a substantial fraction of all
>> Android phones) may place the kernel at the base of DRAM, and therefore
>> at the base of the non-randomized linear map. This puts a writable alias
>> of the kernel's data and bss regions at a predictable location, removing
>> the need for an attacker to guess where KASLR mapped the kernel.
>> 
>> [...]
>
> It would've been nice to hear from the ppc folks on patch 11, but I've
> picked it up on the assumption that they'll love the negative diff stat.
> Worst case, we can drop/revert stuff if they have late objections.
>

Thanks.

There is a de facto ack from Michael Ellerman in the Link:, which is why
I included it.

Note that Sashiko found an issue with KVM+MTE, where a read-only mapping
of the zero page in the linear map may result in issues:

"""
Does moving the zero page to .rodata (or unmapping/read-only mapping its
linear alias) expose a guest-to-host denial of service with KVM and MTE?
When an MTE-enabled KVM guest reads an unmapped memory address, KVM handles
the stage-2 fault by mapping the host's shared zero page. KVM will then
call sanitise_mte_tags() in arch/arm64/kvm/mmu.c.
Since the PG_mte_tagged flag is never set on the zero page, KVM's
try_page_mte_tagging() succeeds, and it calls mte_clear_page_tags().
This executes the STGM instruction using the zero page's linear map alias.
If this alias is read-only or unmapped, won't the STGM instruction trigger
a synchronous permission fault or translation fault in EL1, causing a host
kernel panic?
"""

Marc seems to think it is legit, so I came up with the following (I'll send
it out separately with another pair of tweaks):

-------8<------------
From: Ard Biesheuvel <ardb@kernel.org>
Subject: [PATCH] arm64: mte: Disregard the zero page explicitly for
 manipulating tags

The zero page is conceptually immutable, and will be moved into .rodata
to prevent inadvertent corruption.

Prepare the MTE code for this, by ensuring that the zero page is never
taken into account for tag manipulation, given that those actions will
no longer be permitted on the read-only alias of .rodata in the linear
map.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h
index 7f7b97e09996..093b34944aee 100644
--- a/arch/arm64/include/asm/mte.h
+++ b/arch/arm64/include/asm/mte.h
@@ -80,6 +80,11 @@ static inline bool page_mte_tagged(struct page *page)
  */
 static inline bool try_page_mte_tagging(struct page *page)
 {
+       extern struct page *__zero_page;
+
+       if (page == __zero_page)
+               return false;
+
        VM_WARN_ON_ONCE(folio_test_hugetlb(page_folio(page)));
 
        if (!test_and_set_bit(PG_mte_lock, &page->flags.f))
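
The effect of the guard in the hunk above can be illustrated with a small user-space model. This is a sketch under stated assumptions, not kernel code: `struct ex_page` and all `ex_` functions are hypothetical stand-ins, the lock handling is reduced to a single flag, and the fault on a read-only alias is modelled by an `assert()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page plus the state of its linear alias. */
struct ex_page {
	bool mte_locked;     /* models PG_mte_lock */
	bool mte_tagged;     /* models PG_mte_tagged */
	bool readonly_alias; /* linear alias is read-only or unmapped */
	int  tag_writes;     /* number of tag stores performed */
};

/* The shared zero page: its linear alias is no longer writable. */
static struct ex_page ex_zero_page = { .readonly_alias = true };

/* Returns true if the caller should go on to initialise the page's tags. */
static bool ex_try_page_mte_tagging(struct ex_page *page)
{
	/* The guard: never hand out the zero page for tag manipulation. */
	if (page == &ex_zero_page)
		return false;

	if (!page->mte_locked) {
		page->mte_locked = true;
		return true;
	}
	return false;
}

/* Models the tag-clearing store through the linear alias. */
static void ex_clear_page_tags(struct ex_page *page)
{
	/* A store to a read-only or unmapped alias would fault here. */
	assert(!page->readonly_alias);
	page->tag_writes++;
}

/* Models the caller that sanitises tags before exposing a page to a guest. */
static void ex_sanitise_mte_tags(struct ex_page *page)
{
	if (ex_try_page_mte_tagging(page)) {
		ex_clear_page_tags(page);
		page->mte_tagged = true;
	}
}
```

In this model, an ordinary page gets its tags cleared exactly once, while calling `ex_sanitise_mte_tags(&ex_zero_page)` returns without touching the read-only alias. Without the early return, the same call would reach the `assert()` that stands in for the fault.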
Re: [PATCH v7 00/15] arm64: Unmap linear alias of kernel data/bss
Posted by Will Deacon 4 days, 22 hours ago
On Wed, Jun 03, 2026 at 10:57:49AM +0200, Ard Biesheuvel wrote:
> (cc Marc)
> 
> On Tue, 2 Jun 2026, at 22:34, Will Deacon wrote:
> > On Fri, 29 May 2026 17:01:51 +0200, Ard Biesheuvel wrote:
> >> One of the reasons the lack of randomization of the linear map on arm64
> >> is considered problematic is the fact that bootloaders adhering to the
> >> original arm64 boot protocol (i.e., a substantial fraction of all
> >> Android phones) may place the kernel at the base of DRAM, and therefore
> >> at the base of the non-randomized linear map. This puts a writable alias
> >> of the kernel's data and bss regions at a predictable location, removing
> >> the need for an attacker to guess where KASLR mapped the kernel.
> >> 
> >> [...]
> >
> > It would've been nice to hear from the ppc folks on patch 11, but I've
> > picked it up on the assumption that they'll love the negative diff stat.
> > Worst case, we can drop/revert stuff if they have late objections.
> >
> 
> Thanks.
> 
> There is a de facto ack from Michael Ellerman in the Link:, which is why
> I included it.
> 
> Note that Sashiko found an issue with KVM+MTE, where a read-only mapping
> of the zero page in the linear map may result in issues:
> 
> """
> Does moving the zero page to .rodata (or unmapping/read-only mapping its
> linear alias) expose a guest-to-host denial of service with KVM and MTE?
> When an MTE-enabled KVM guest reads an unmapped memory address, KVM handles
> the stage-2 fault by mapping the host's shared zero page. KVM will then
> call sanitise_mte_tags() in arch/arm64/kvm/mmu.c.
> Since the PG_mte_tagged flag is never set on the zero page, KVM's
> try_page_mte_tagging() succeeds, and it calls mte_clear_page_tags().
> This executes the STGM instruction using the zero page's linear map alias.
> If this alias is read-only or unmapped, won't the STGM instruction trigger
> a synchronous permission fault or translation fault in EL1, causing a host
> kernel panic?
> """
> 
> Marc seems to think it is legit, so I came up with the following (I'll send
> it out separately with another pair of tweaks):

Thanks, it also looks like we're getting some early WARN_ON()s firing in
CI from split_kernel_leaf_mapping() after applying your changes:

https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/trusted-artifacts/2571596185/test_aarch64/14662134813/artifacts/jobwatch/logs/recipes/21399931/tasks/219104268/results/1007729692/logs/journalctl.log

Will
Re: [PATCH v7 00/15] arm64: Unmap linear alias of kernel data/bss
Posted by Ard Biesheuvel 4 days, 22 hours ago

On Wed, 3 Jun 2026, at 13:22, Will Deacon wrote:
> On Wed, Jun 03, 2026 at 10:57:49AM +0200, Ard Biesheuvel wrote:
>> (cc Marc)
>> 
>> On Tue, 2 Jun 2026, at 22:34, Will Deacon wrote:
>> > On Fri, 29 May 2026 17:01:51 +0200, Ard Biesheuvel wrote:
>> >> One of the reasons the lack of randomization of the linear map on arm64
>> >> is considered problematic is the fact that bootloaders adhering to the
>> >> original arm64 boot protocol (i.e., a substantial fraction of all
>> >> Android phones) may place the kernel at the base of DRAM, and therefore
>> >> at the base of the non-randomized linear map. This puts a writable alias
>> >> of the kernel's data and bss regions at a predictable location, removing
>> >> the need for an attacker to guess where KASLR mapped the kernel.
>> >> 
>> >> [...]
>> >
>> > It would've been nice to hear from the ppc folks on patch 11, but I've
>> > picked it up on the assumption that they'll love the negative diff stat.
>> > Worst case, we can drop/revert stuff if they have late objections.
>> >
>> 
>> Thanks.
>> 
>> There is a de facto ack from Michael Ellerman in the Link:, which is why
>> I included it.
>> 
>> Note that Sashiko found an issue with KVM+MTE, where a read-only mapping
>> of the zero page in the linear map may result in issues:
>> 
>> """
>> Does moving the zero page to .rodata (or unmapping/read-only mapping its
>> linear alias) expose a guest-to-host denial of service with KVM and MTE?
>> When an MTE-enabled KVM guest reads an unmapped memory address, KVM handles
>> the stage-2 fault by mapping the host's shared zero page. KVM will then
>> call sanitise_mte_tags() in arch/arm64/kvm/mmu.c.
>> Since the PG_mte_tagged flag is never set on the zero page, KVM's
>> try_page_mte_tagging() succeeds, and it calls mte_clear_page_tags().
>> This executes the STGM instruction using the zero page's linear map alias.
>> If this alias is read-only or unmapped, won't the STGM instruction trigger
>> a synchronous permission fault or translation fault in EL1, causing a host
>> kernel panic?
>> """
>> 
>> Marc seems to think it is legit, so I came up with the following (I'll send
>> it out separately with another pair of tweaks):
>
> Thanks, it also looks like we're getting some early WARN_ON()s firing in
> CI from split_kernel_leaf_mapping() after applying your changes:
>
> https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/trusted-artifacts/2571596185/test_aarch64/14662134813/artifacts/jobwatch/logs/recipes/21399931/tasks/219104268/results/1007729692/logs/journalctl.log
>

OK, I'll investigate.