From nobody Fri Apr 3 20:53:03 2026
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
	by smtp.subspace.kernel.org (Postfix) with ESMTP id A8F471B4223;
	Mon, 23 Mar 2026 13:03:31 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none
	smtp.client-ip=217.140.110.172
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1774271013; cv=none;
	b=agFADmg6BqZNLN7rL5HgFm9OPLqmFQ3/0+MywNQ08UnIhQyrdjnOXeGbIVZban9kTkGhWmgTlsu3y/X4vw3M64nhQ6EivoOXnX6OLKDsvXGw8XWMspxk8w8xTDuu1HGQ0u9qrb+JZ+W+lEd2bAVf/wgCT+KfRka+sSlhi7STijc=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1774271013; c=relaxed/simple;
	bh=i+0PDLQo9WcEt2mlaqbT2dwD07UT1Dej2cEKg/tyZo0=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:MIME-Version;
	b=QFcVUh8CehkJInMitseewDIE9WFaURix7N4ApEYrizYZa8WLevQAKentb0C6pmzLTSe5F86dBNYpG88z0F8C9vlXVEFLgxUgLevlbpv7COa/S1M62Wpz+b3BGKzCg3MSlsroXTB4YWmPkCA2lCHaYHOft6mrSYK1Z5Psm8Ma1gQ=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass
	(p=none dis=none) header.from=arm.com; spf=pass smtp.mailfrom=arm.com;
	arc=none smtp.client-ip=217.140.110.172
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none
	dis=none) header.from=arm.com
Authentication-Results: smtp.subspace.kernel.org; spf=pass
	smtp.mailfrom=arm.com
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
	by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 3710B1516;
	Mon, 23 Mar 2026 06:03:25 -0700 (PDT)
Received: from e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.27])
	by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 770CD3F885;
	Mon, 23 Mar 2026 06:03:29 -0700 (PDT)
From: Ryan Roberts
To: Catalin Marinas, Will Deacon, "David Hildenbrand (Arm)", Dev Jain,
	Yang Shi, Suzuki K Poulose, Jinjiang Tu, Kevin Brodsky
Cc: Ryan Roberts, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, stable@vger.kernel.org
Subject: [PATCH v1 1/3] arm64: mm:
Fix rodata=full block mapping support for realm guests Date: Mon, 23 Mar 2026 13:03:13 +0000 Message-ID: <20260323130317.1737522-2-ryan.roberts@arm.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20260323130317.1737522-1-ryan.roberts@arm.com> References: <20260323130317.1737522-1-ryan.roberts@arm.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Commit a166563e7ec37 ("arm64: mm: support large block mapping when rodata=3Dfull") enabled the linear map to be mapped by block/cont while still allowing granular permission changes on BBML2_NOABORT systems by lazily splitting the live mappings. This mechanism was intended to be usable by realm guests since they need to dynamically share dma buffers with the host by "decrypting" them - which for Arm CCA, means marking them as shared in the page tables. However, it turns out that the mechanism was failing for realm guests because realms need to share their dma buffers (via __set_memory_enc_dec()) much earlier during boot than split_kernel_leaf_mapping() was able to handle. The report linked below showed that GIC's ITS was one such user. But during the investigation I found other callsites that could not meet the split_kernel_leaf_mapping() constraints. The problem is that we block map the linear map based on the boot CPU supporting BBML2_NOABORT, then check that all the other CPUs support it too when finalizing the caps. If they don't, then we stop_machine() and split to ptes. For safety, split_kernel_leaf_mapping() previously wouldn't permit splitting until after the caps were finalized. That ensured that if any secondary cpus were running that didn't support BBML2_NOABORT, we wouldn't risk breaking them. I've fix this problem by reducing the black-out window where we refuse to split; there are now 2 windows. 
The first is from T0 until the page allocator is initialized; splitting
allocates memory from the page allocator, so the allocator must be up.
The second covers the period from starting to online the secondary CPUs
until the system caps are finalized (this is a very small window).

All of the problematic callers invoke __set_memory_enc_dec() before the
secondary CPUs come online, so this solves the problem. However, one of
these callers, swiotlb_update_mem_attributes(), was trying to split
before the page allocator was initialized. So I have moved this call
from arch_mm_preinit() to mem_init(), which solves the ordering issue.

I've added warnings, and we now return an error if any attempt is made
to split in the black-out windows.

Note there are other issues which prevent booting all the way to user
space; these will be fixed in subsequent patches.

Reported-by: Jinjiang Tu
Closes: https://lore.kernel.org/all/0b2a4ae5-fc51-4d77-b177-b2e9db74f11d@huawei.com/
Fixes: a166563e7ec37 ("arm64: mm: support large block mapping when rodata=full")
Cc: stable@vger.kernel.org
Signed-off-by: Ryan Roberts
Reviewed-by: Kevin Brodsky
---
 arch/arm64/mm/init.c |  9 ++++++++-
 arch/arm64/mm/mmu.c  | 35 +++++++++++++++++++++++++++--------
 2 files changed, 35 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 96711b8578fd0..b9b248d24fd10 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -350,7 +350,6 @@ void __init arch_mm_preinit(void)
 	}
 
 	swiotlb_init(swiotlb, flags);
-	swiotlb_update_mem_attributes();
 
 	/*
 	 * Check boundaries twice: Some fundamental inconsistencies can be
@@ -377,6 +376,14 @@ void __init arch_mm_preinit(void)
 	}
 }
 
+bool page_alloc_available __ro_after_init;
+
+void __init mem_init(void)
+{
+	page_alloc_available = true;
+	swiotlb_update_mem_attributes();
+}
+
 void free_initmem(void)
 {
 	void *lm_init_begin = lm_alias(__init_begin);
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index a6a00accf4f93..5b6a8d53e64b7 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -773,14 +773,33 @@ int split_kernel_leaf_mapping(unsigned long start, unsigned long end)
 {
 	int ret;
 
-	/*
-	 * !BBML2_NOABORT systems should not be trying to change permissions on
-	 * anything that is not pte-mapped in the first place. Just return early
-	 * and let the permission change code raise a warning if not already
-	 * pte-mapped.
-	 */
-	if (!system_supports_bbml2_noabort())
-		return 0;
+	if (!system_supports_bbml2_noabort()) {
+		/*
+		 * !BBML2_NOABORT systems should not be trying to change
+		 * permissions on anything that is not pte-mapped in the first
+		 * place. Just return early and let the permission change code
+		 * raise a warning if not already pte-mapped.
+		 */
+		if (system_capabilities_finalized() ||
+		    !cpu_supports_bbml2_noabort())
+			return 0;
+
+		/*
+		 * Boot-time: split_kernel_leaf_mapping_locked() allocates from
+		 * the page allocator. Can't split until it's available.
+		 */
+		extern bool page_alloc_available;
+		if (WARN_ON(!page_alloc_available))
+			return -EBUSY;
+
+		/*
+		 * Boot-time: Started secondary cpus but don't know if they
+		 * support BBML2_NOABORT yet. Can't allow splitting in this
+		 * window in case they don't.
+		 */
+		if (WARN_ON(num_online_cpus() > 1))
+			return -EBUSY;
+	}
 
 	/*
 	 * If the region is within a pte-mapped area, there is no need to try to
-- 
2.43.0

From nobody Fri Apr 3 20:53:03 2026
From: Ryan Roberts
To: Catalin Marinas, Will Deacon, "David Hildenbrand (Arm)", Dev Jain,
	Yang Shi, Suzuki K Poulose, Jinjiang Tu,
 Kevin Brodsky
Cc: Ryan Roberts, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, stable@vger.kernel.org
Subject: [PATCH v1 2/3] arm64: mm: Handle invalid large leaf mappings correctly
Date: Mon, 23 Mar 2026 13:03:14 +0000
Message-ID: <20260323130317.1737522-3-ryan.roberts@arm.com>
In-Reply-To: <20260323130317.1737522-1-ryan.roberts@arm.com>
References: <20260323130317.1737522-1-ryan.roberts@arm.com>

It has been possible for a long time to mark ptes in the linear map as
invalid. This is done for secretmem, kfence, realm DMA memory un/share,
and others, by simply clearing the PTE_VALID bit. But until commit
a166563e7ec37 ("arm64: mm: support large block mapping when
rodata=full"), large leaf mappings were never made invalid in this way.
It turns out various parts of the code base are not equipped to handle
invalid large leaf mappings (in the way they are currently encoded) and
I've observed a kernel panic while booting a realm guest on a
BBML2_NOABORT system as a result:

[   15.432706] software IO TLB: Memory encryption is active and system is using DMA bounce buffers
[   15.476896] Unable to handle kernel paging request at virtual address ffff000019600000
[   15.513762] Mem abort info:
[   15.527245]   ESR = 0x0000000096000046
[   15.548553]   EC = 0x25: DABT (current EL), IL = 32 bits
[   15.572146]   SET = 0, FnV = 0
[   15.592141]   EA = 0, S1PTW = 0
[   15.612694]   FSC = 0x06: level 2 translation fault
[   15.640644] Data abort info:
[   15.661983]   ISV = 0, ISS = 0x00000046, ISS2 = 0x00000000
[   15.694875]   CM = 0, WnR = 1, TnD = 0, TagAccess = 0
[   15.723740]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[   15.755776] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000081f3f000
[   15.800410] [ffff000019600000] pgd=0000000000000000, p4d=180000009ffff403, pud=180000009fffe403, pmd=00e8000199600704
[   15.855046] Internal error: Oops: 0000000096000046 [#1] SMP
[   15.886394] Modules linked in:
[   15.900029] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 7.0.0-rc4-dirty #4 PREEMPT
[   15.935258] Hardware name: linux,dummy-virt (DT)
[   15.955612] pstate: 21400005 (nzCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[   15.986009] pc : __pi_memcpy_generic+0x128/0x22c
[   16.006163] lr : swiotlb_bounce+0xf4/0x158
[   16.024145] sp : ffff80008000b8f0
[   16.038896] x29: ffff80008000b8f0 x28: 0000000000000000 x27: 0000000000000000
[   16.069953] x26: ffffb3976d261ba8 x25: 0000000000000000 x24: ffff000019600000
[   16.100876] x23: 0000000000000001 x22: ffff0000043430d0 x21: 0000000000007ff0
[   16.131946] x20: 0000000084570010 x19: 0000000000000000 x18: ffff00001ffe3fcc
[   16.163073] x17: 0000000000000000 x16: 00000000003fffff x15: 646e612065766974
[   16.194131] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[   16.225059] x11: 0000000000000000 x10: 0000000000000010 x9 : 0000000000000018
[   16.256113] x8 : 0000000000000018 x7 : 0000000000000000 x6 : 0000000000000000
[   16.287203] x5 : ffff000019607ff0 x4 : ffff000004578000 x3 : ffff000019600000
[   16.318145] x2 : 0000000000007ff0 x1 : ffff000004570010 x0 : ffff000019600000
[   16.349071] Call trace:
[   16.360143]  __pi_memcpy_generic+0x128/0x22c (P)
[   16.380310]  swiotlb_tbl_map_single+0x154/0x2b4
[   16.400282]  swiotlb_map+0x5c/0x228
[   16.415984]  dma_map_phys+0x244/0x2b8
[   16.432199]  dma_map_page_attrs+0x44/0x58
[   16.449782]  virtqueue_map_page_attrs+0x38/0x44
[   16.469596]  virtqueue_map_single_attrs+0xc0/0x130
[   16.490509]  virtnet_rq_alloc.isra.0+0xa4/0x1fc
[   16.510355]  try_fill_recv+0x2a4/0x584
[   16.526989]  virtnet_open+0xd4/0x238
[   16.542775]  __dev_open+0x110/0x24c
[   16.558280]  __dev_change_flags+0x194/0x20c
[   16.576879]  netif_change_flags+0x24/0x6c
[   16.594489]  dev_change_flags+0x48/0x7c
[   16.611462]  ip_auto_config+0x258/0x1114
[   16.628727]  do_one_initcall+0x80/0x1c8
[   16.645590]  kernel_init_freeable+0x208/0x2f0
[   16.664917]  kernel_init+0x24/0x1e0
[   16.680295]  ret_from_fork+0x10/0x20
[   16.696369] Code: 927cec03 cb0e0021 8b0e0042 a9411c26 (a900340c)
[   16.723106] ---[ end trace 0000000000000000 ]---
[   16.752866] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[   16.792556] Kernel Offset: 0x3396ea200000 from 0xffff800080000000
[   16.818966] PHYS_OFFSET: 0xfff1000080000000
[   16.837237] CPU features: 0x0000000,00060005,13e38581,957e772f
[   16.862904] Memory Limit: none
[   16.876526] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---

This panic occurs because the swiotlb memory was previously shared to
the host (__set_memory_enc_dec()), which involves transitioning the
(large) leaf mappings to invalid, sharing to the host, then marking the
mappings valid again.
But pageattr_p[mu]d_entry() would only update the entry if it was a
section mapping, since otherwise it concluded it must be a table entry
and shouldn't be modified. And p[mu]d_sect() only returns true if the
entry is valid. So the result was that the large leaf entry was made
invalid in the first pass, then ignored in the second pass. It remains
invalid until the above code tries to access it and blows up.

The simple fix would be to update pageattr_pmd_entry() to use
!pmd_table() instead of pmd_sect(). That would solve this problem. But
the ptdump code also suffers from a similar issue: it checks pmd_leaf()
and doesn't call into the arch-specific note_page() machinery if it
returns false. As a result, ptdump wasn't even able to show the invalid
large leaf mappings; they looked valid, which made this super fun to
debug. The ptdump code is core-mm and pmd_table() is arm64-specific, so
we can't use the same trick to solve that.

But we already support the concept of "present-invalid" for user space
entries. And even better, pmd_leaf() will return true for a leaf mapping
that is marked present-invalid. So let's just use that encoding for
present-invalid kernel mappings too. Then we can use pmd_leaf() where we
previously used pmd_sect() and everything is magically fixed.

Additionally, from inspection, kernel_page_present() was broken in a
similar way, so I'm also updating that to use pmd_leaf(). I haven't
spotted any other issues of this shape, but plan to do a follow-up patch
to remove pmd_sect() and pud_sect() in favour of the more sophisticated
pmd_leaf()/pud_leaf(), which are core-mm APIs and will simplify the
arm64 code a bit.
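The ordering constraint that the patch documents in set_pageattr_masks() (clear first, then set, because some set/clear bits alias each other) can be demonstrated with a standalone toy model. The bit positions here are illustrative rather than the authoritative arm64 layout:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model: for an invalid entry the hardware ignores most attribute
 * bits, which is what allows a software "present-invalid" marker to
 * reuse a bit that valid entries use as nG. Positions are illustrative.
 */
#define PTE_VALID		(UINT64_C(1) << 0)
#define PTE_NG			(UINT64_C(1) << 11)
#define PTE_PRESENT_INVALID	PTE_NG	/* deliberately the same bit */

/* Mirrors the shape of set_pageattr_masks(): always clear, then set. */
static uint64_t apply_masks(uint64_t val, uint64_t set, uint64_t clear)
{
	val &= ~clear;
	val |= set;
	return val;
}
```

With clear-then-set, a bit named in both masks ends up set; the reverse order would clear it last and silently lose it across an invalid-to-valid transition.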
Fixes: a166563e7ec37 ("arm64: mm: support large block mapping when rodata=full")
Cc: stable@vger.kernel.org
Signed-off-by: Ryan Roberts
---
 arch/arm64/mm/pageattr.c | 50 ++++++++++++++++++++++------------------
 1 file changed, 28 insertions(+), 22 deletions(-)

diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
index 358d1dc9a576f..87dfe4c82fa92 100644
--- a/arch/arm64/mm/pageattr.c
+++ b/arch/arm64/mm/pageattr.c
@@ -25,6 +25,11 @@ static ptdesc_t set_pageattr_masks(ptdesc_t val, struct mm_walk *walk)
 {
 	struct page_change_data *masks = walk->private;
 
+	/*
+	 * Some users clear and set bits which alias each other (e.g. PTE_NG
+	 * and PTE_PRESENT_INVALID). It is therefore important that we always
+	 * clear first then set.
+	 */
 	val &= ~(pgprot_val(masks->clear_mask));
 	val |= (pgprot_val(masks->set_mask));
 
@@ -36,7 +41,7 @@ static int pageattr_pud_entry(pud_t *pud, unsigned long addr,
 {
 	pud_t val = pudp_get(pud);
 
-	if (pud_sect(val)) {
+	if (pud_leaf(val)) {
 		if (WARN_ON_ONCE((next - addr) != PUD_SIZE))
 			return -EINVAL;
 		val = __pud(set_pageattr_masks(pud_val(val), walk));
@@ -52,7 +57,7 @@ static int pageattr_pmd_entry(pmd_t *pmd, unsigned long addr,
 {
 	pmd_t val = pmdp_get(pmd);
 
-	if (pmd_sect(val)) {
+	if (pmd_leaf(val)) {
 		if (WARN_ON_ONCE((next - addr) != PMD_SIZE))
 			return -EINVAL;
 		val = __pmd(set_pageattr_masks(pmd_val(val), walk));
@@ -132,11 +137,12 @@ static int __change_memory_common(unsigned long start, unsigned long size,
 	ret = update_range_prot(start, size, set_mask, clear_mask);
 
 	/*
-	 * If the memory is being made valid without changing any other bits
-	 * then a TLBI isn't required as a non-valid entry cannot be cached in
-	 * the TLB.
+	 * If the memory is being switched from present-invalid to valid without
+	 * changing any other bits then a TLBI isn't required as a non-valid
+	 * entry cannot be cached in the TLB.
 	 */
-	if (pgprot_val(set_mask) != PTE_VALID || pgprot_val(clear_mask))
+	if (pgprot_val(set_mask) != (PTE_MAYBE_NG | PTE_VALID) ||
+	    pgprot_val(clear_mask) != PTE_PRESENT_INVALID)
 		flush_tlb_kernel_range(start, start + size);
 	return ret;
 }
@@ -237,18 +243,18 @@ int set_memory_valid(unsigned long addr, int numpages, int enable)
 {
 	if (enable)
 		return __change_memory_common(addr, PAGE_SIZE * numpages,
-					      __pgprot(PTE_VALID),
-					      __pgprot(0));
+					      __pgprot(PTE_MAYBE_NG | PTE_VALID),
+					      __pgprot(PTE_PRESENT_INVALID));
 	else
 		return __change_memory_common(addr, PAGE_SIZE * numpages,
-					      __pgprot(0),
-					      __pgprot(PTE_VALID));
+					      __pgprot(PTE_PRESENT_INVALID),
+					      __pgprot(PTE_MAYBE_NG | PTE_VALID));
 }
 
 int set_direct_map_invalid_noflush(struct page *page)
 {
-	pgprot_t clear_mask = __pgprot(PTE_VALID);
-	pgprot_t set_mask = __pgprot(0);
+	pgprot_t clear_mask = __pgprot(PTE_MAYBE_NG | PTE_VALID);
+	pgprot_t set_mask = __pgprot(PTE_PRESENT_INVALID);
 
 	if (!can_set_direct_map())
 		return 0;
@@ -259,8 +265,8 @@ int set_direct_map_invalid_noflush(struct page *page)
 
 int set_direct_map_default_noflush(struct page *page)
 {
-	pgprot_t set_mask = __pgprot(PTE_VALID | PTE_WRITE);
-	pgprot_t clear_mask = __pgprot(PTE_RDONLY);
+	pgprot_t set_mask = __pgprot(PTE_MAYBE_NG | PTE_VALID | PTE_WRITE);
+	pgprot_t clear_mask = __pgprot(PTE_PRESENT_INVALID | PTE_RDONLY);
 
 	if (!can_set_direct_map())
 		return 0;
@@ -296,8 +302,8 @@ static int __set_memory_enc_dec(unsigned long addr,
 	 * entries or Synchronous External Aborts caused by RIPAS_EMPTY
 	 */
 	ret = __change_memory_common(addr, PAGE_SIZE * numpages,
-				     __pgprot(set_prot),
-				     __pgprot(clear_prot | PTE_VALID));
+				     __pgprot(set_prot | PTE_PRESENT_INVALID),
+				     __pgprot(clear_prot | PTE_MAYBE_NG | PTE_VALID));
 
 	if (ret)
 		return ret;
@@ -311,8 +317,8 @@ static int __set_memory_enc_dec(unsigned long addr,
 		return ret;
 
 	return __change_memory_common(addr, PAGE_SIZE * numpages,
-				      __pgprot(PTE_VALID),
-				      __pgprot(0));
+				      __pgprot(PTE_MAYBE_NG | PTE_VALID),
+				      __pgprot(PTE_PRESENT_INVALID));
 }
 
 static int realm_set_memory_encrypted(unsigned long addr, int numpages)
@@ -404,15 +410,15 @@ bool kernel_page_present(struct page *page)
 	pud = READ_ONCE(*pudp);
 	if (pud_none(pud))
 		return false;
-	if (pud_sect(pud))
-		return true;
+	if (pud_leaf(pud))
+		return pud_valid(pud);
 
 	pmdp = pmd_offset(pudp, addr);
 	pmd = READ_ONCE(*pmdp);
 	if (pmd_none(pmd))
 		return false;
-	if (pmd_sect(pmd))
-		return true;
+	if (pmd_leaf(pmd))
+		return pmd_valid(pmd);
 
 	ptep = pte_offset_kernel(pmdp, addr);
 	return pte_valid(__ptep_get(ptep));
-- 
2.43.0

From nobody Fri Apr 3 20:53:03 2026
From: Ryan Roberts
To: Catalin Marinas, Will Deacon, "David Hildenbrand (Arm)", Dev Jain,
	Yang Shi, Suzuki K Poulose, Jinjiang Tu, Kevin Brodsky
Cc: Ryan Roberts, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org
Subject: [PATCH v1 3/3] arm64: mm: Remove pmd_sect() and pud_sect()
Date: Mon, 23 Mar 2026 13:03:15 +0000
Message-ID: <20260323130317.1737522-4-ryan.roberts@arm.com>
In-Reply-To: <20260323130317.1737522-1-ryan.roberts@arm.com>
References: <20260323130317.1737522-1-ryan.roberts@arm.com>

The semantics of pXd_leaf() are very similar to pXd_sect(). The only
difference is that pXd_sect() considers an entry a section only if
PTE_VALID is set, whereas pXd_leaf() permits both "valid" and
"present-invalid" types. Using pXd_sect() has caused issues now that
large leaf entries can be present-invalid since commit a166563e7ec37
("arm64: mm: support large block mapping when rodata=full"), so let's
just remove the API and standardize on pXd_leaf().
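The difference between the two predicates can be seen in a standalone model. The bits[1:0] type encoding (0b11 = table, 0b01 = section/block, bit 0 = valid) follows the arm64 convention; the software present-invalid bit position below is illustrative, not the kernel's definition:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Descriptor-type model; present-invalid bit position is illustrative. */
#define PMD_TYPE_MASK		UINT64_C(3)
#define PMD_TYPE_TABLE		UINT64_C(3)
#define PMD_TYPE_SECT		UINT64_C(1)
#define PMD_VALID		UINT64_C(1)
#define PMD_PRESENT_INVALID	(UINT64_C(1) << 11)	/* assumed sw bit */

static bool pmd_table(uint64_t pmd)
{
	return (pmd & PMD_TYPE_MASK) == PMD_TYPE_TABLE;
}

/* The removed predicate: the valid bit is part of the type match. */
static bool pmd_sect(uint64_t pmd)
{
	return (pmd & PMD_TYPE_MASK) == PMD_TYPE_SECT;
}

static bool pmd_present(uint64_t pmd)
{
	return (pmd & PMD_VALID) || (pmd & PMD_PRESENT_INVALID);
}

/* The surviving predicate: covers valid AND present-invalid leaves. */
static bool pmd_leaf(uint64_t pmd)
{
	return pmd_present(pmd) && !pmd_table(pmd);
}
```

Clearing the valid bit of a block entry makes pmd_sect() stop matching, while pmd_leaf() still reports a leaf, which is exactly the property the series relies on.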
Signed-off-by: Ryan Roberts
---
 arch/arm64/include/asm/pgtable.h |  5 -----
 arch/arm64/mm/mmu.c              | 18 +++++++++---------
 2 files changed, 9 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index b3e58735c49bd..8aced28ba8f6e 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -779,8 +779,6 @@ extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
 
 #define pmd_table(pmd)	((pmd_val(pmd) & PMD_TYPE_MASK) == \
			 PMD_TYPE_TABLE)
-#define pmd_sect(pmd)	((pmd_val(pmd) & PMD_TYPE_MASK) == \
-			 PMD_TYPE_SECT)
 #define pmd_leaf(pmd)	(pmd_present(pmd) && !pmd_table(pmd))
 #define pmd_bad(pmd)	(!pmd_table(pmd))
 
@@ -799,11 +797,8 @@ static inline int pmd_trans_huge(pmd_t pmd)
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
 #if defined(CONFIG_ARM64_64K_PAGES) || CONFIG_PGTABLE_LEVELS < 3
-static inline bool pud_sect(pud_t pud) { return false; }
 static inline bool pud_table(pud_t pud) { return true; }
 #else
-#define pud_sect(pud)	((pud_val(pud) & PUD_TYPE_MASK) == \
-			 PUD_TYPE_SECT)
 #define pud_table(pud)	((pud_val(pud) & PUD_TYPE_MASK) == \
			 PUD_TYPE_TABLE)
 #endif
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 5b6a8d53e64b7..7c9928c939445 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -204,7 +204,7 @@ static int alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
 	pmd_t pmd = READ_ONCE(*pmdp);
 	pte_t *ptep;
 
-	BUG_ON(pmd_sect(pmd));
+	BUG_ON(pmd_leaf(pmd));
 	if (pmd_none(pmd)) {
 		pmdval_t pmdval = PMD_TYPE_TABLE | PMD_TABLE_UXN | PMD_TABLE_AF;
 		phys_addr_t pte_phys;
@@ -303,7 +303,7 @@ static int alloc_init_cont_pmd(pud_t *pudp, unsigned long addr,
 	/*
 	 * Check for initial section mappings in the pgd/pud.
 	 */
-	BUG_ON(pud_sect(pud));
+	BUG_ON(pud_leaf(pud));
 	if (pud_none(pud)) {
 		pudval_t pudval = PUD_TYPE_TABLE | PUD_TABLE_UXN | PUD_TABLE_AF;
 		phys_addr_t pmd_phys;
@@ -1499,7 +1499,7 @@ static void unmap_hotplug_pmd_range(pud_t *pudp, unsigned long addr,
 			continue;
 
 		WARN_ON(!pmd_present(pmd));
-		if (pmd_sect(pmd)) {
+		if (pmd_leaf(pmd)) {
 			pmd_clear(pmdp);
 
 			/*
@@ -1532,7 +1532,7 @@ static void unmap_hotplug_pud_range(p4d_t *p4dp, unsigned long addr,
 			continue;
 
 		WARN_ON(!pud_present(pud));
-		if (pud_sect(pud)) {
+		if (pud_leaf(pud)) {
 			pud_clear(pudp);
 
 			/*
@@ -1646,7 +1646,7 @@ static void free_empty_pmd_table(pud_t *pudp, unsigned long addr,
 		if (pmd_none(pmd))
 			continue;
 
-		WARN_ON(!pmd_present(pmd) || !pmd_table(pmd) || pmd_sect(pmd));
+		WARN_ON(!pmd_present(pmd) || !pmd_table(pmd));
 		free_empty_pte_table(pmdp, addr, next, floor, ceiling);
 	} while (addr = next, addr < end);
 
@@ -1686,7 +1686,7 @@ static void free_empty_pud_table(p4d_t *p4dp, unsigned long addr,
 		if (pud_none(pud))
 			continue;
 
-		WARN_ON(!pud_present(pud) || !pud_table(pud) || pud_sect(pud));
+		WARN_ON(!pud_present(pud) || !pud_table(pud));
 		free_empty_pmd_table(pudp, addr, next, floor, ceiling);
 	} while (addr = next, addr < end);
 
@@ -1782,7 +1782,7 @@ int __meminit vmemmap_check_pmd(pmd_t *pmdp, int node,
 {
 	vmemmap_verify((pte_t *)pmdp, node, addr, next);
 
-	return pmd_sect(READ_ONCE(*pmdp));
+	return pmd_leaf(READ_ONCE(*pmdp));
 }
 
 int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
@@ -1846,7 +1846,7 @@ void p4d_clear_huge(p4d_t *p4dp)
 
 int pud_clear_huge(pud_t *pudp)
 {
-	if (!pud_sect(READ_ONCE(*pudp)))
+	if (!pud_leaf(READ_ONCE(*pudp)))
 		return 0;
 	pud_clear(pudp);
 	return 1;
@@ -1854,7 +1854,7 @@ int pud_clear_huge(pud_t *pudp)
 
 int pmd_clear_huge(pmd_t *pmdp)
 {
-	if (!pmd_sect(READ_ONCE(*pmdp)))
+	if (!pmd_leaf(READ_ONCE(*pmdp)))
 		return 0;
 	pmd_clear(pmdp);
 	return 1;
-- 
2.43.0