From nobody Sat Feb 7 07:10:18 2026
From: Anshuman Khandual
To: linux-arm-kernel@lists.infradead.org
Cc: Anshuman Khandual, Catalin Marinas, Will Deacon, Ryan Roberts, Yang Shi, Christoph Lameter, linux-kernel@vger.kernel.org, stable@vger.kernel.org
Subject: [PATCH 1/2] arm64/mm: Enable batched TLB flush in unmap_hotplug_range()
Date: Mon, 2 Feb 2026 04:26:16 +0000
Message-Id: <20260202042617.504183-2-anshuman.khandual@arm.com>
In-Reply-To: <20260202042617.504183-1-anshuman.khandual@arm.com>
References: <20260202042617.504183-1-anshuman.khandual@arm.com>

During a memory hot remove operation both linear and vmemmap mappings for
the memory range being removed get unmapped via unmap_hotplug_range(), but
mapped pages get freed only for the vmemmap mapping. This is a sequential
operation where each table entry gets cleared, followed by a leaf-specific
TLB flush, and then a memory free operation where applicable. This approach
was simple and uniform for both vmemmap and linear mappings. But the linear
mapping might contain CONT marked block memory, where it becomes necessary
to first clear all entries in the range before a TLB flush, as per the
architecture requirement. Hence batch all TLB flushes during the table tear
down walk and finally do a single flush in unmap_hotplug_range(). Besides,
this helps improve performance via TLBI range operations along with reduced
synchronization instructions. The time spent executing
unmap_hotplug_range() improved by 97%, measured over a 2GB memory hot
removal in a KVM guest. This scheme is not applicable during vmemmap
mapping tear down, where memory needs to be freed and hence a TLB flush is
required after clearing each page table entry.
Cc: Catalin Marinas
Cc: Will Deacon
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Closes: https://lore.kernel.org/all/aWZYXhrT6D2M-7-N@willie-the-truck/
Fixes: bbd6ec605c0f ("arm64/mm: Enable memory hot remove")
Cc: stable@vger.kernel.org
Signed-off-by: Ryan Roberts
Signed-off-by: Anshuman Khandual
Reviewed-by: Ryan Roberts
---
 arch/arm64/mm/mmu.c | 81 +++++++++++++++++++++++++++++++++++++--------
 1 file changed, 67 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 8e1d80a7033e..8ec8a287aaa1 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1458,10 +1458,32 @@ static void unmap_hotplug_pte_range(pmd_t *pmdp, unsigned long addr,
 
 		WARN_ON(!pte_present(pte));
 		__pte_clear(&init_mm, addr, ptep);
-		flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
-		if (free_mapped)
+		if (free_mapped) {
+			/*
+			 * If page is part of an existing contiguous
+			 * memory block, individual TLB invalidation
+			 * here would not be appropriate. Instead it
+			 * will require clearing all entries for the
+			 * memory block and subsequently a TLB flush
+			 * for the entire range.
+			 */
+			WARN_ON(pte_cont(pte));
+
+			/*
+			 * TLB flush is essential for freeing memory.
+			 */
+			flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
 			free_hotplug_page_range(pte_page(pte), PAGE_SIZE, altmap);
+		}
+
+		/*
+		 * TLB flush is batched in unmap_hotplug_range()
+		 * for the entire range, when memory need not be
+		 * freed. Besides linear mapping might have CONT
+		 * blocks where TLB flush needs to be done after
+		 * clearing all relevant entries.
+		 */
 	} while (addr += PAGE_SIZE, addr < end);
 }
 
@@ -1482,15 +1504,32 @@ static void unmap_hotplug_pmd_range(pud_t *pudp, unsigned long addr,
 		WARN_ON(!pmd_present(pmd));
 		if (pmd_sect(pmd)) {
 			pmd_clear(pmdp);
+			if (free_mapped) {
+				/*
+				 * If page is part of an existing contiguous
+				 * memory block, individual TLB invalidation
+				 * here would not be appropriate. Instead it
+				 * will require clearing all entries for the
+				 * memory block and subsequently a TLB flush
+				 * for the entire range.
+				 */
+				WARN_ON(pmd_cont(pmd));
+
+				/*
+				 * TLB flush is essential for freeing memory.
+				 */
+				flush_tlb_kernel_range(addr, addr + PMD_SIZE);
+				free_hotplug_page_range(pmd_page(pmd),
+							PMD_SIZE, altmap);
+			}
 
 			/*
-			 * One TLBI should be sufficient here as the PMD_SIZE
-			 * range is mapped with a single block entry.
+			 * TLB flush is batched in unmap_hotplug_range()
+			 * for the entire range, when memory need not be
+			 * freed. Besides linear mapping might have CONT
+			 * blocks where TLB flush needs to be done after
+			 * clearing all relevant entries.
 			 */
-			flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
-			if (free_mapped)
-				free_hotplug_page_range(pmd_page(pmd),
-							PMD_SIZE, altmap);
 			continue;
 		}
 		WARN_ON(!pmd_table(pmd));
@@ -1515,15 +1554,20 @@ static void unmap_hotplug_pud_range(p4d_t *p4dp, unsigned long addr,
 		WARN_ON(!pud_present(pud));
 		if (pud_sect(pud)) {
 			pud_clear(pudp);
+			if (free_mapped) {
+				/*
+				 * TLB flush is essential for freeing memory.
+				 */
+				flush_tlb_kernel_range(addr, addr + PUD_SIZE);
+				free_hotplug_page_range(pud_page(pud),
+							PUD_SIZE, altmap);
+			}
 
 			/*
-			 * One TLBI should be sufficient here as the PUD_SIZE
-			 * range is mapped with a single block entry.
+			 * TLB flush is batched in unmap_hotplug_range()
+			 * for the entire range, when memory need not be
+			 * freed.
 			 */
-			flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
-			if (free_mapped)
-				free_hotplug_page_range(pud_page(pud),
-							PUD_SIZE, altmap);
 			continue;
 		}
 		WARN_ON(!pud_table(pud));
@@ -1553,6 +1597,7 @@ static void unmap_hotplug_p4d_range(pgd_t *pgdp, unsigned long addr,
 static void unmap_hotplug_range(unsigned long addr, unsigned long end,
 				bool free_mapped, struct vmem_altmap *altmap)
 {
+	unsigned long start = addr;
 	unsigned long next;
 	pgd_t *pgdp, pgd;
 
@@ -1574,6 +1619,14 @@ static void unmap_hotplug_range(unsigned long addr, unsigned long end,
 		WARN_ON(!pgd_present(pgd));
 		unmap_hotplug_p4d_range(pgdp, addr, next, free_mapped, altmap);
 	} while (addr = next, addr < end);
+
+	/*
+	 * Batched TLB flush only for linear mapping which
+	 * might contain CONT blocks, and does not require
+	 * freeing up memory as well.
+	 */
+	if (!free_mapped)
+		flush_tlb_kernel_range(start, end);
 }
 
 static void free_empty_pte_table(pmd_t *pmdp, unsigned long addr,
-- 
2.30.2

From nobody Sat Feb 7 07:10:18 2026
From: Anshuman Khandual
To: linux-arm-kernel@lists.infradead.org
Cc: Anshuman Khandual, Catalin Marinas, Will Deacon, Ryan Roberts, Yang Shi, Christoph Lameter, linux-kernel@vger.kernel.org, stable@vger.kernel.org
Subject: [PATCH 2/2] arm64/mm: Reject memory removal that splits a kernel leaf mapping
Date: Mon, 2 Feb 2026 04:26:17 +0000
Message-Id: <20260202042617.504183-3-anshuman.khandual@arm.com>
In-Reply-To: <20260202042617.504183-1-anshuman.khandual@arm.com>
References: <20260202042617.504183-1-anshuman.khandual@arm.com>

Linear and vmemmap mappings that get torn down during a memory hot remove
operation might contain leaf level entries at any page table level. If the
requested memory range's linear or vmemmap mapping falls within such leaf
entries, new mappings need to be created for the remaining memory mapped by
the earlier leaf entry, following standard break-before-make (BBM) rules.
Currently the memory hot remove operation does not perform such
restructuring, so removing memory ranges that would split a kernel leaf
level mapping needs to be rejected.
Cc: Catalin Marinas
Cc: Will Deacon
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Closes: https://lore.kernel.org/all/aWZYXhrT6D2M-7-N@willie-the-truck/
Fixes: bbd6ec605c0f ("arm64/mm: Enable memory hot remove")
Cc: stable@vger.kernel.org
Suggested-by: Ryan Roberts
Signed-off-by: Anshuman Khandual
---
 arch/arm64/mm/mmu.c | 126 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 126 insertions(+)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 8ec8a287aaa1..9d59e10fb3de 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -2063,6 +2063,129 @@ void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
 	__remove_pgd_mapping(swapper_pg_dir, __phys_to_virt(start), size);
 }
 
+
+static bool split_kernel_leaf_boundary(unsigned long addr)
+{
+	pgd_t *pgdp, pgd;
+	p4d_t *p4dp, p4d;
+	pud_t *pudp, pud;
+	pmd_t *pmdp, pmd;
+	pte_t *ptep, pte;
+
+	/*
+	 * PGD: If addr is PGD aligned then addr already
+	 * describes a leaf boundary.
+	 */
+	if (ALIGN_DOWN(addr, PGDIR_SIZE) == addr)
+		return false;
+
+	pgdp = pgd_offset_k(addr);
+	pgd = pgdp_get(pgdp);
+	if (!pgd_present(pgd))
+		return false;
+
+	/*
+	 * P4D: If addr is P4D aligned then addr already
+	 * describes a leaf boundary.
+	 */
+	if (ALIGN_DOWN(addr, P4D_SIZE) == addr)
+		return false;
+
+	p4dp = p4d_offset(pgdp, addr);
+	p4d = p4dp_get(p4dp);
+	if (!p4d_present(p4d))
+		return false;
+
+	/*
+	 * PUD: If addr is PUD aligned then addr already
+	 * describes a leaf boundary.
+	 */
+	if (ALIGN_DOWN(addr, PUD_SIZE) == addr)
+		return false;
+
+	pudp = pud_offset(p4dp, addr);
+	pud = pudp_get(pudp);
+	if (!pud_present(pud))
+		return false;
+
+	if (pud_leaf(pud))
+		return true;
+
+	/*
+	 * CONT_PMD: If addr is CONT_PMD aligned then
+	 * addr already describes a leaf boundary.
+	 */
+	if (ALIGN_DOWN(addr, CONT_PMD_SIZE) == addr)
+		return false;
+
+	pmdp = pmd_offset(pudp, addr);
+	pmd = pmdp_get(pmdp);
+	if (!pmd_present(pmd))
+		return false;
+
+	if (pmd_leaf(pmd) && pmd_cont(pmd))
+		return true;
+
+	/*
+	 * PMD: If addr is PMD aligned then addr already
+	 * describes a leaf boundary.
+	 */
+	if (ALIGN_DOWN(addr, PMD_SIZE) == addr)
+		return false;
+
+	if (pmd_leaf(pmd))
+		return true;
+
+	/*
+	 * CONT_PTE: If addr is CONT_PTE aligned then addr
+	 * already describes a leaf boundary.
+	 */
+	if (ALIGN_DOWN(addr, CONT_PTE_SIZE) == addr)
+		return false;
+
+	ptep = pte_offset_kernel(pmdp, addr);
+	pte = __ptep_get(ptep);
+	if (!pte_present(pte))
+		return false;
+
+	if (pte_valid(pte) && pte_cont(pte))
+		return true;
+
+	if (ALIGN_DOWN(addr, PAGE_SIZE) == addr)
+		return false;
+	return true;
+}
+
+static bool can_unmap_without_split(unsigned long pfn, unsigned long nr_pages)
+{
+	unsigned long linear_start, linear_end, phys_start, phys_end;
+	unsigned long vmemmap_size, vmemmap_start, vmemmap_end;
+
+	/* Assert linear map edges do not split a leaf entry */
+	phys_start = PFN_PHYS(pfn);
+	phys_end = phys_start + nr_pages * PAGE_SIZE;
+	linear_start = __phys_to_virt(phys_start);
+	linear_end = __phys_to_virt(phys_end);
+	if (split_kernel_leaf_boundary(linear_start) ||
+	    split_kernel_leaf_boundary(linear_end)) {
+		pr_warn("[%lx %lx] splits a leaf entry in linear map\n",
+			phys_start, phys_end);
+		return false;
+	}
+
+	/* Assert vmemmap edges do not split a leaf entry */
+	vmemmap_size = nr_pages * sizeof(struct page);
+	vmemmap_start = (unsigned long) pfn_to_page(pfn);
+	vmemmap_end = vmemmap_start + vmemmap_size;
+	if (split_kernel_leaf_boundary(vmemmap_start) ||
+	    split_kernel_leaf_boundary(vmemmap_end)) {
+		pr_warn("[%lx %lx] splits a leaf entry in vmemmap\n",
+			phys_start, phys_end);
+		return false;
+	}
+	return true;
+}
+
 /*
  * This memory hotplug notifier helps prevent boot memory from being
  * inadvertently removed as it blocks pfn range offlining process in
@@ -2083,6 +2206,9 @@ static int prevent_bootmem_remove_notifier(struct notifier_block *nb,
 	if ((action != MEM_GOING_OFFLINE) && (action != MEM_OFFLINE))
 		return NOTIFY_OK;
 
+	if (!can_unmap_without_split(pfn, arg->nr_pages))
+		return NOTIFY_BAD;
+
 	for (; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
 		unsigned long start = PFN_PHYS(pfn);
 		unsigned long end = start + (1UL << PA_SECTION_SHIFT);
-- 
2.30.2