From: Ard Biesheuvel <ardb@kernel.org>

One of the reasons the lack of randomization of the linear map on arm64
is considered problematic is the fact that bootloaders adhering to the
original arm64 boot protocol (i.e., a substantial fraction of all
Android phones) may place the kernel at the base of DRAM, and therefore
at the base of the non-randomized linear map. This puts a writable alias
of the kernel's data and bss regions at a predictable location, removing
the need for an attacker to guess where KASLR mapped the kernel.

Let's unmap this linear, writable alias entirely, so that knowing the
location of the linear alias does not give write access to the kernel's
data and bss regions.

Changes since v6:
- Improve commits logs and comments
- Add acks from Kevin
- Reorder patches so remapping data/bss R/O occurs after moving the zero
  page into .rodata
- Drop zero page cache flush from SuperH rather than casting away the
  constness
- Map kfence pool with NO_EXEC_MAPPINGS

Note that Sashiko had some comments on patch 15/15 [1] but none of those
seem accurate. (I have tested both suspend/resume and hibernate under
QEMU and both work as expected)

Changes since v5:
- Reorder series in ascending order of impact, so that the first few can
  be merged earlier if desired. This also makes the patch that remaps
  the data/bss linear alias as tagged redundant, which is therefore
  dropped.
- Add patch #3 to address an existing issue spotted by Sashiko
- Fix thinko in contiguous region check (#5), where the whole region
  needs to be considered and not only the first entry (dropped Rb as
  well) - this addresses the kfence issue Sashiko reported on v5 [0]
- Update commit log on #6 to clarify that changing permission bits on
  PTE_CONT entries is safe as long as PTE_CONT itself does not change
- Likewise, drop hunk that adds the PTE_CONT bit to the 'permitted' mask
  in pgattr_change_is_safe(), as changing it is not safe. (#8)
- Move kasan's additional page table to pgdir BSS as well
- Use (NOLOAD) on the .pgdir.bss section so it does not get emitted into
  vmlinux
- Add powerpc and SuperH patches to deal with empty_zero_page[] being
  made const

Changes since v4:
- Update the correct [early] mapping in patch #1
- Make empty_zero_page[] const instead of __ro_after_init
- Drop patches that remap the fixmap page tables r/o for now
- Don't force page mappings for the data/bss linear alias, as it is no
  longer needed for set_memory_valid()
- Add acks

Changes since v3:
- Drop bogus patch adding hierarchical PXN to the fixmap mapping, which
  breaks the KPTI trampoline (thanks to Sashiko)
- Add generic patch to move the empty_zero_page to __ro_after_init, as
  it now lives in generic code.
- Add patches to remap the linear aliases of the fixmap page tables
  read-only too - these live at an a priori known offset in the linear
  map if physical KASLR was omitted, and control a priori known
  addresses in the virtual kernel space.
- Rebase onto v7.1-rc1

Changes since v2:
- Keep bm_pte[] in the region that is remapped r/o or unmapped, as it is
  only manipulated via its kernel alias
- Drop check that prohibits any manipulation of descriptors with the
  CONT bit set
- Add Ryan's ack to a couple of patches
- Rebase onto v7.0-rc4

Changes since v1:
- Put zero page patch at the start of the series
- Tweak __map_memblock() API to respect existing table and contiguous
  mappings, so that the logic to map the kernel alias can be simplified
- Stop abusing the MEMBLOCK_NOMAP flag to initially omit the kernel
  linear alias from the linear map
- Some additional cleanup patches
- Use proper API [set_memory_valid()] to (un)map the linear alias of
  data/bss.

Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Liz Prucka <lizprucka@google.com>
Cc: Seth Jenkins <sethjenkins@google.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jann Horn <jannh@google.com>
Cc: linux-mm@kvack.org
Cc: linux-hardening@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-sh@vger.kernel.org

[0] https://sashiko.dev/#/patchset/20260519151616.2557018-15-ardb%2Bgit%40google.com
[1] https://sashiko.dev/#/patchset/20260526175846.2694125-17-ardb%2Bgit%40google.com

Ard Biesheuvel (15):
  arm64: mm: Remove bogus stop condition from map_mem() loop
  arm64: mm: Drop redundant pgd_t* argument from map_mem()
  arm64: mm: Check for pud_/pmd_set_huge() failures on kernel mappings
  arm64: mm: Preserve existing table mappings when mapping DRAM
  arm64: mm: Preserve non-contiguous descriptors when mapping DRAM
  arm64: mm: Permit contiguous descriptors to be manipulated
  arm64: kfence: Avoid NOMAP tricks when mapping the early pool
  arm64: mm: Permit contiguous attribute for preliminary mappings
  arm64: Move fixmap and kasan page tables to end of kernel image
  arm64: mm: Don't abuse memblock NOMAP to check for overlaps
  powerpc/code-patching: Avoid r/w mapping of the zero page
  sh: Drop cache flush of the zero page at boot
  mm: Make empty_zero_page[] const
  arm64: mm: Map the kernel data/bss read-only in the linear map
  arm64: mm: Unmap kernel data/bss entirely from the linear map

 arch/arm64/include/asm/mmu.h     |   2 +
 arch/arm64/include/asm/pgtable.h |   4 +
 arch/arm64/kernel/vmlinux.lds.S  |   8 +-
 arch/arm64/mm/fixmap.c           |   6 +-
 arch/arm64/mm/kasan_init.c       |   2 +-
 arch/arm64/mm/mmu.c              | 164 ++++++++++++--------
 arch/powerpc/lib/code-patching.c |  52 +------
 arch/sh/mm/init.c                |   3 -
 include/linux/pgtable.h          |   2 +-
 mm/mm_init.c                     |   2 +-
 10 files changed, 121 insertions(+), 124 deletions(-)

base-commit: 254f49634ee16a731174d2ae34bc50bd5f45e731
--
2.54.0.823.g6e5bcc1fc9-goog

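The predictability argument in the cover letter can be illustrated with a small userspace sketch. It is only a simplified model, not kernel code: the `ex_` names are hypothetical, and the two constants are assumed values (a 48-bit VA layout and a DRAM base of 0x80000000) chosen for illustration. The point it shows is that, with a non-randomized linear map, the linear alias of a kernel symbol depends only on where the bootloader placed the image in physical memory, so the KASLR virtual offset never enters the calculation.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed, illustrative values; the real ones depend on kernel config and platform. */
#define EX_PAGE_OFFSET 0xffff000000000000ULL /* assumed start of the linear map */
#define EX_DRAM_BASE   0x0000000080000000ULL /* assumed physical base of DRAM */

/*
 * Simplified model of a non-randomized linear map:
 * virtual address = physical address - DRAM base + start of linear map.
 */
static uint64_t ex_linear_alias(uint64_t phys)
{
	return phys - EX_DRAM_BASE + EX_PAGE_OFFSET;
}

/*
 * Linear-map address of a kernel symbol, given the physical load address of
 * the image and the symbol's offset within the image. Note that no KASLR
 * offset appears among the inputs.
 */
static uint64_t ex_symbol_linear_alias(uint64_t image_phys_base,
				       uint64_t sym_offset)
{
	return ex_linear_alias(image_phys_base + sym_offset);
}
```

If the image is loaded at the base of DRAM, `ex_symbol_linear_alias(EX_DRAM_BASE, off)` reduces to `EX_PAGE_OFFSET + off`, which an attacker can compute from the kernel binary alone.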
On Fri, 29 May 2026 17:01:51 +0200, Ard Biesheuvel wrote:
> One of the reasons the lack of randomization of the linear map on arm64
> is considered problematic is the fact that bootloaders adhering to the
> original arm64 boot protocol (i.e., a substantial fraction of all
> Android phones) may place the kernel at the base of DRAM, and therefore
> at the base of the non-randomized linear map. This puts a writable alias
> of the kernel's data and bss regions at a predictable location, removing
> the need for an attacker to guess where KASLR mapped the kernel.
>
> [...]
It would've been nice to hear from the ppc folks on patch 11, but I've
picked it up on the assumption that they'll love the negative diff stat.
Worst case, we can drop/revert stuff if they have late objections.
Applied to arm64 (for-next/mm), thanks!
[01/15] arm64: mm: Remove bogus stop condition from map_mem() loop
https://git.kernel.org/arm64/c/36ca7f4be809
[02/15] arm64: mm: Drop redundant pgd_t* argument from map_mem()
https://git.kernel.org/arm64/c/2e527667a3b9
[03/15] arm64: mm: Check for pud_/pmd_set_huge() failures on kernel mappings
https://git.kernel.org/arm64/c/8dd640d9233d
[04/15] arm64: mm: Preserve existing table mappings when mapping DRAM
https://git.kernel.org/arm64/c/a64293e993f6
[05/15] arm64: mm: Preserve non-contiguous descriptors when mapping DRAM
https://git.kernel.org/arm64/c/ecda73ae92ca
[06/15] arm64: mm: Permit contiguous descriptors to be manipulated
https://git.kernel.org/arm64/c/05c5c31e9d8d
[07/15] arm64: kfence: Avoid NOMAP tricks when mapping the early pool
https://git.kernel.org/arm64/c/dfd73e574d38
[08/15] arm64: mm: Permit contiguous attribute for preliminary mappings
https://git.kernel.org/arm64/c/28becb2c1d74
[09/15] arm64: Move fixmap and kasan page tables to end of kernel image
https://git.kernel.org/arm64/c/382a03e12eba
[10/15] arm64: mm: Don't abuse memblock NOMAP to check for overlaps
https://git.kernel.org/arm64/c/d672a4b72c95
[11/15] powerpc/code-patching: Avoid r/w mapping of the zero page
https://git.kernel.org/arm64/c/c0693153fb17
[12/15] sh: Drop cache flush of the zero page at boot
https://git.kernel.org/arm64/c/99bad3e992e2
[13/15] mm: Make empty_zero_page[] const
https://git.kernel.org/arm64/c/0aae825f1ed7
[14/15] arm64: mm: Map the kernel data/bss read-only in the linear map
https://git.kernel.org/arm64/c/f2ba877402e5
[15/15] arm64: mm: Unmap kernel data/bss entirely from the linear map
https://git.kernel.org/arm64/c/63e0b6a5b693
Cheers,
--
Will
https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev
(cc Marc)
On Tue, 2 Jun 2026, at 22:34, Will Deacon wrote:
> On Fri, 29 May 2026 17:01:51 +0200, Ard Biesheuvel wrote:
>> One of the reasons the lack of randomization of the linear map on arm64
>> is considered problematic is the fact that bootloaders adhering to the
>> original arm64 boot protocol (i.e., a substantial fraction of all
>> Android phones) may place the kernel at the base of DRAM, and therefore
>> at the base of the non-randomized linear map. This puts a writable alias
>> of the kernel's data and bss regions at a predictable location, removing
>> the need for an attacker to guess where KASLR mapped the kernel.
>>
>> [...]
>
> It would've been nice to hear from the ppc folks on patch 11, but I've
> picked it up on the assumption that they'll love the negative diff stat.
> Worst case, we can drop/revert stuff if they have late objections.
>
Thanks.
There is a de facto ack from Michael Ellerman in the Link:, which is why
I included it.
Note that Sashiko found an issue with KVM+MTE, where a read-only mapping
of the zero page in the linear map may result in issues:
"""
Does moving the zero page to .rodata (or unmapping/read-only mapping its
linear alias) expose a guest-to-host denial of service with KVM and MTE?
When an MTE-enabled KVM guest reads an unmapped memory address, KVM handles
the stage-2 fault by mapping the host's shared zero page. KVM will then
call sanitise_mte_tags() in arch/arm64/kvm/mmu.c.
Since the PG_mte_tagged flag is never set on the zero page, KVM's
try_page_mte_tagging() succeeds, and it calls mte_clear_page_tags().
This executes the STGM instruction using the zero page's linear map alias.
If this alias is read-only or unmapped, won't the STGM instruction trigger
a synchronous permission fault or translation fault in EL1, causing a host
kernel panic?
"""
Marc seems to think it is legit, so I came up with the following (I'll send
it out separately with another pair of tweaks):
-------8<------------
From: Ard Biesheuvel <ardb@kernel.org>
Subject: [PATCH] arm64: mte: Disregard the zero page explicitly for
manipulating tags
The zero page is conceptually immutable, and will be moved into .rodata
to prevent inadvertent corruption.
Prepare the MTE code for this, by ensuring that the zero page is never
taken into account for tag manipulation, given that those actions will
no longer be permitted on the read-only alias of .rodata in the linear
map.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h
index 7f7b97e09996..093b34944aee 100644
--- a/arch/arm64/include/asm/mte.h
+++ b/arch/arm64/include/asm/mte.h
@@ -80,6 +80,11 @@ static inline bool page_mte_tagged(struct page *page)
  */
 static inline bool try_page_mte_tagging(struct page *page)
 {
+	extern struct page *__zero_page;
+
+	if (page == __zero_page)
+		return false;
+
 	VM_WARN_ON_ONCE(folio_test_hugetlb(page_folio(page)));
 	if (!test_and_set_bit(PG_mte_lock, &page->flags.f))
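The effect of the early return in the patch above can be sketched with a userspace model. Everything here is hypothetical and simplified: `struct ex_page` stands in for `struct page`, `ex_tag_writes` counts what would be calls to mte_clear_page_tags(), and the caller is assumed to mark the page as tagged after clearing its tags. The part of the real function that waits for a concurrent tagger is omitted. The sketch only shows that once the zero page is rejected up front, the caller never reaches the tag store that would fault on a read-only or unmapped alias.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page; models only the two MTE flag bits. */
struct ex_page {
	atomic_bool mte_lock; /* models PG_mte_lock */
	bool mte_tagged;      /* models PG_mte_tagged */
};

static struct ex_page ex_zero_page; /* models the shared zero page */
static int ex_tag_writes;           /* counts simulated tag stores */

/* Returns true only if the caller won the right to initialise the tags. */
static bool ex_try_page_mte_tagging(struct ex_page *page)
{
	/* The zero page is never eligible for tag manipulation. */
	if (page == &ex_zero_page)
		return false;

	/* Models !test_and_set_bit(PG_mte_lock, ...): true for the first caller. */
	return !atomic_exchange(&page->mte_lock, true);
}

/* Simplified model of a caller such as sanitise_mte_tags(). */
static void ex_sanitise_tags(struct ex_page *page)
{
	if (ex_try_page_mte_tagging(page)) {
		ex_tag_writes++;         /* stands in for mte_clear_page_tags() */
		page->mte_tagged = true; /* assumed follow-up step */
	}
}
```

Calling `ex_sanitise_tags(&ex_zero_page)` leaves `ex_tag_writes` unchanged, whereas an ordinary page has its tags written exactly once, on the first call.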
On Wed, Jun 03, 2026 at 10:57:49AM +0200, Ard Biesheuvel wrote:
> (cc Marc)
>
> On Tue, 2 Jun 2026, at 22:34, Will Deacon wrote:
> > On Fri, 29 May 2026 17:01:51 +0200, Ard Biesheuvel wrote:
> >> One of the reasons the lack of randomization of the linear map on arm64
> >> is considered problematic is the fact that bootloaders adhering to the
> >> original arm64 boot protocol (i.e., a substantial fraction of all
> >> Android phones) may place the kernel at the base of DRAM, and therefore
> >> at the base of the non-randomized linear map. This puts a writable alias
> >> of the kernel's data and bss regions at a predictable location, removing
> >> the need for an attacker to guess where KASLR mapped the kernel.
> >>
> >> [...]
> >
> > It would've been nice to hear from the ppc folks on patch 11, but I've
> > picked it up on the assumption that they'll love the negative diff stat.
> > Worst case, we can drop/revert stuff if they have late objections.
> >
>
> Thanks.
>
> There is a de facto ack from Michael Ellerman in the Link:, which is why
> I included it.
>
> Note that Sashiko found an issue with KVM+MTE, where a read-only mapping
> of the zero page in the linear map may result in issues:
>
> """
> Does moving the zero page to .rodata (or unmapping/read-only mapping its
> linear alias) expose a guest-to-host denial of service with KVM and MTE?
> When an MTE-enabled KVM guest reads an unmapped memory address, KVM handles
> the stage-2 fault by mapping the host's shared zero page. KVM will then
> call sanitise_mte_tags() in arch/arm64/kvm/mmu.c.
> Since the PG_mte_tagged flag is never set on the zero page, KVM's
> try_page_mte_tagging() succeeds, and it calls mte_clear_page_tags().
> This executes the STGM instruction using the zero page's linear map alias.
> If this alias is read-only or unmapped, won't the STGM instruction trigger
> a synchronous permission fault or translation fault in EL1, causing a host
> kernel panic?
> """
>
> Marc seems to think it is legit, so I came up with the following (I'll send
> it out separately with another pair of tweaks):

Thanks, it also looks like we're getting some early WARN_ON()s firing in
CI from split_kernel_leaf_mapping() after applying your changes:

https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/trusted-artifacts/2571596185/test_aarch64/14662134813/artifacts/jobwatch/logs/recipes/21399931/tasks/219104268/results/1007729692/logs/journalctl.log

Will
On Wed, 3 Jun 2026, at 13:22, Will Deacon wrote:
> On Wed, Jun 03, 2026 at 10:57:49AM +0200, Ard Biesheuvel wrote:
>> (cc Marc)
>>
>> On Tue, 2 Jun 2026, at 22:34, Will Deacon wrote:
>> > On Fri, 29 May 2026 17:01:51 +0200, Ard Biesheuvel wrote:
>> >> One of the reasons the lack of randomization of the linear map on arm64
>> >> is considered problematic is the fact that bootloaders adhering to the
>> >> original arm64 boot protocol (i.e., a substantial fraction of all
>> >> Android phones) may place the kernel at the base of DRAM, and therefore
>> >> at the base of the non-randomized linear map. This puts a writable alias
>> >> of the kernel's data and bss regions at a predictable location, removing
>> >> the need for an attacker to guess where KASLR mapped the kernel.
>> >>
>> >> [...]
>> >
>> > It would've been nice to hear from the ppc folks on patch 11, but I've
>> > picked it up on the assumption that they'll love the negative diff stat.
>> > Worst case, we can drop/revert stuff if they have late objections.
>> >
>>
>> Thanks.
>>
>> There is a de facto ack from Michael Ellerman in the Link:, which is why
>> I included it.
>>
>> Note that Sashiko found an issue with KVM+MTE, where a read-only mapping
>> of the zero page in the linear map may result in issues:
>>
>> """
>> Does moving the zero page to .rodata (or unmapping/read-only mapping its
>> linear alias) expose a guest-to-host denial of service with KVM and MTE?
>> When an MTE-enabled KVM guest reads an unmapped memory address, KVM handles
>> the stage-2 fault by mapping the host's shared zero page. KVM will then
>> call sanitise_mte_tags() in arch/arm64/kvm/mmu.c.
>> Since the PG_mte_tagged flag is never set on the zero page, KVM's
>> try_page_mte_tagging() succeeds, and it calls mte_clear_page_tags().
>> This executes the STGM instruction using the zero page's linear map alias.
>> If this alias is read-only or unmapped, won't the STGM instruction trigger
>> a synchronous permission fault or translation fault in EL1, causing a host
>> kernel panic?
>> """
>>
>> Marc seems to think it is legit, so I came up with the following (I'll send
>> it out separately with another pair of tweaks):
>
> Thanks, it also looks like we're getting some early WARN_ON()s firing in
> CI from split_kernel_leaf_mapping() after applying your changes:
>
> https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/trusted-artifacts/2571596185/test_aarch64/14662134813/artifacts/jobwatch/logs/recipes/21399931/tasks/219104268/results/1007729692/logs/journalctl.log
>

OK I'll investigate