From nobody Mon Jun 15 12:20:21 2026
From: Muchun Song <songmuchun@bytedance.com>
To: Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>,
	Muchun Song <songmuchun@bytedance.com>,
	Lorenzo Stoakes <ljs@kernel.org>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Vlastimil Babka <vbabka@kernel.org>,
	Mike Rapoport <rppt@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>,
	linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: [PATCH v2] mm/sparse: Remove sparse buffer pre-allocation mechanism
Date: Fri, 10 Apr 2026 17:24:19 +0800
Message-Id: <20260410092419.2446420-1-songmuchun@bytedance.com>

Commit 9bdac9142407 ("sparsemem: Put mem map for one node together.")
introduced a mechanism to pre-allocate a large memory block to hold all
memmaps for a NUMA node upfront.
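
To make the removed mechanism concrete, here is a small userspace model of what sparse_buffer_alloc() does. It is a sketch, not kernel code: struct node_buf, round_up_to() and node_buf_alloc() are hypothetical names, and addresses are plain integers rather than real memory. Each request is rounded up to a multiple of its own size inside the pre-allocated range. When the range is exhausted the function returns 0, and the caller is expected to fall back to a regular allocation, as vmemmap_alloc_block_buf() and __populate_section_memmap() do before this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the node-wide buffer: addresses are plain
 * integers standing in for physical memory; nothing is dereferenced. */
struct node_buf {
	uintptr_t cur;	/* next free address, like sparsemap_buf */
	uintptr_t end;	/* end of the buffer, like sparsemap_buf_end */
};

/* Round @x up to a multiple of @a, like the kernel's roundup(). */
static uintptr_t round_up_to(uintptr_t x, uintptr_t a)
{
	assert(a > 0);
	return (x + a - 1) / a * a;
}

/*
 * Carve @size bytes, aligned to @size, out of the buffer. Returns 0 if
 * there is no buffer or not enough room left; a caller would then fall
 * back to an ordinary per-section allocation.
 */
static uintptr_t node_buf_alloc(struct node_buf *b, uintptr_t size)
{
	uintptr_t ptr;

	if (b->cur == 0)
		return 0;
	ptr = round_up_to(b->cur, size);
	if (ptr + size > b->end)
		return 0;
	/* The kernel version also frees the alignment gap [cur, ptr) here. */
	b->cur = ptr + size;
	return ptr;
}
```

For example, with cur = 0x1800 and end = 0x4000, two 0x1000-byte requests return 0x2000 and 0x3000 and a third returns 0. The skipped gap [0x1800, 0x2000) is the part the kernel version hands back with memblock_free().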

However, the original commit message did not clearly state the actual
benefits or the necessity of explicitly pre-allocating a single chunk
for all memmap areas of a given node.

One concern about removing this pre-allocation is that the subsequent
per-section memmap allocations could become scattered across physical
memory and turn too many memory blocks/sections into an "un-offlinable"
state. However, tests show that even without the explicit node-wide
pre-allocation, memblock still places these allocations back-to-back.
When tracing the vmemmap_set_pmd() allocations, the physical chunks
returned by memblock were strictly adjacent to each other, forming a
single contiguous physical range (allocated top-down). Because they are
naturally packed together, they occupy at most as many memory blocks as
the explicit pre-allocation did.
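
The adjacency argument can be illustrated with a small userspace model. It is a sketch under stated assumptions, not memblock's implementation: it assumes an allocator that always returns the highest free range directly below the previous allocation, aligned to the request size, plus a 2 MiB memmap per section and a 128 MiB memory block (x86-64-like values chosen only for illustration). The helpers topdown_alloc_n(), single_alloc() and blocks_spanned() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sizes, for illustration only (x86-64-like values). */
#define MEMMAP_PER_SECTION	(2ULL << 20)	/* memmap for one 128 MiB section */
#define MEMORY_BLOCK_SIZE	(128ULL << 20)	/* assumed memory block size */

static uint64_t round_down_to(uint64_t x, uint64_t a)
{
	return x / a * a;
}

/* Number of memory blocks overlapped by the range [lo, hi). */
static uint64_t blocks_spanned(uint64_t lo, uint64_t hi)
{
	assert(hi > lo);
	return (hi - 1) / MEMORY_BLOCK_SIZE - lo / MEMORY_BLOCK_SIZE + 1;
}

/* Model a top-down allocator handing out @n size-aligned chunks of
 * @size one at a time below @top; returns the lowest address used. */
static uint64_t topdown_alloc_n(uint64_t top, uint64_t size, unsigned int n)
{
	uint64_t cur = top;

	for (unsigned int i = 0; i < n; i++) {
		assert(cur >= size);
		cur = round_down_to(cur - size, size);
	}
	return cur;
}

/* Model one node-wide allocation of @n * @size bytes below @top. */
static uint64_t single_alloc(uint64_t top, uint64_t size, unsigned int n)
{
	return round_down_to(top - (uint64_t)n * size, size);
}
```

Under these assumptions, 64 per-section requests starting at 4 GiB end at the same address as one 128 MiB node-wide request and span a single memory block; starting 64 MiB below 4 GiB, both layouts straddle the same two blocks.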

Another concern is the boot-time cost of calling memmap_alloc() many
times instead of making one large node-wide allocation. Tests on a 256GB
VM showed that memmap allocation time increased from 199,555 ns to
741,292 ns. Although that is about 3.7x slower, scaling linearly to a
1TB machine puts the whole memmap allocation at only a few milliseconds,
so the boot-time difference is negligible.
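
The arithmetic behind this estimate, written out as a small C sketch. Only the two timings and the 256GB size come from the measurements above; estimate_ns() is a hypothetical helper, and its linear-scaling assumption is an approximation, not a measurement.

```c
#include <assert.h>

/* Measurements quoted above for a 256GB VM, in nanoseconds. */
#define MEASURED_MEM_GB		256.0
#define NODE_BUFFER_NS		199555.0	/* with the node-wide buffer */
#define PER_SECTION_NS		741292.0	/* per-section allocations */

/* Slowdown of per-section allocation relative to the node-wide buffer. */
static double slowdown(void)
{
	return PER_SECTION_NS / NODE_BUFFER_NS;
}

/*
 * Estimated per-section allocation time for @mem_gb of memory, ASSUMING
 * the cost grows linearly with the amount of memory (i.e. with the
 * number of sections).
 */
static double estimate_ns(double mem_gb)
{
	assert(mem_gb > 0);
	return PER_SECTION_NS * mem_gb / MEASURED_MEM_GB;
}
```

slowdown() evaluates to about 3.71, and estimate_ns(1024.0) gives 2,965,168 ns, i.e. roughly 3 ms for 1TB.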

Since no negative impact on memory offlining and no noticeable boot
performance regression were found, remove the explicit node-wide memmap
pre-allocation mechanism to reduce the maintenance burden.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
---
Changes in v2:
 - Addressed David Hildenbrand's and Mike Rapoport's concerns from the
   v1 discussion by incorporating the detailed memblock contiguous
   allocation analysis and the boot performance measurements directly
   into the commit message.
---
 include/linux/mm.h  |  1 -
 mm/sparse-vmemmap.c |  7 +-----
 mm/sparse.c         | 58 +--------------------------------------------
 3 files changed, 2 insertions(+), 64 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0b776907152e..1d676fef4303 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4855,7 +4855,6 @@ static inline void print_vma_addr(char *prefix, unsigned long rip)
 }
 #endif
 
-void *sparse_buffer_alloc(unsigned long size);
 unsigned long section_map_size(void);
 struct page * __populate_section_memmap(unsigned long pfn,
 		unsigned long nr_pages, int nid, struct vmem_altmap *altmap,
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 6eadb9d116e4..aca1b00e86dd 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -87,15 +87,10 @@ static void * __meminit altmap_alloc_block_buf(unsigned long size,
 void * __meminit vmemmap_alloc_block_buf(unsigned long size, int node,
 					 struct vmem_altmap *altmap)
 {
-	void *ptr;
-
 	if (altmap)
 		return altmap_alloc_block_buf(size, altmap);
 
-	ptr = sparse_buffer_alloc(size);
-	if (!ptr)
-		ptr = vmemmap_alloc_block(size, node);
-	return ptr;
+	return vmemmap_alloc_block(size, node);
 }
 
 static unsigned long __meminit vmem_altmap_next_pfn(struct vmem_altmap *altmap)
diff --git a/mm/sparse.c b/mm/sparse.c
index effdac6b0ab1..672e2ad396a8 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -241,12 +241,9 @@ struct page __init *__populate_section_memmap(unsigned long pfn,
 		struct dev_pagemap *pgmap)
 {
 	unsigned long size = section_map_size();
-	struct page *map = sparse_buffer_alloc(size);
+	struct page *map;
 	phys_addr_t addr = __pa(MAX_DMA_ADDRESS);
 
-	if (map)
-		return map;
-
 	map = memmap_alloc(size, size, addr, nid, false);
 	if (!map)
 		panic("%s: Failed to allocate %lu bytes align=0x%lx nid=%d from=%pa\n",
@@ -256,55 +253,6 @@ struct page __init *__populate_section_memmap(unsigned long pfn,
 }
 #endif /* !CONFIG_SPARSEMEM_VMEMMAP */
 
-static void *sparsemap_buf __meminitdata;
-static void *sparsemap_buf_end __meminitdata;
-
-static inline void __meminit sparse_buffer_free(unsigned long size)
-{
-	WARN_ON(!sparsemap_buf || size == 0);
-	memblock_free(sparsemap_buf, size);
-}
-
-static void __init sparse_buffer_init(unsigned long size, int nid)
-{
-	phys_addr_t addr = __pa(MAX_DMA_ADDRESS);
-	WARN_ON(sparsemap_buf);	/* forgot to call sparse_buffer_fini()? */
-	/*
-	 * Pre-allocated buffer is mainly used by __populate_section_memmap
-	 * and we want it to be properly aligned to the section size - this is
-	 * especially the case for VMEMMAP which maps memmap to PMDs
-	 */
-	sparsemap_buf = memmap_alloc(size, section_map_size(), addr, nid, true);
-	sparsemap_buf_end = sparsemap_buf + size;
-}
-
-static void __init sparse_buffer_fini(void)
-{
-	unsigned long size = sparsemap_buf_end - sparsemap_buf;
-
-	if (sparsemap_buf && size > 0)
-		sparse_buffer_free(size);
-	sparsemap_buf = NULL;
-}
-
-void * __meminit sparse_buffer_alloc(unsigned long size)
-{
-	void *ptr = NULL;
-
-	if (sparsemap_buf) {
-		ptr = (void *) roundup((unsigned long)sparsemap_buf, size);
-		if (ptr + size > sparsemap_buf_end)
-			ptr = NULL;
-		else {
-			/* Free redundant aligned space */
-			if ((unsigned long)(ptr - sparsemap_buf) > 0)
-				sparse_buffer_free((unsigned long)(ptr - sparsemap_buf));
-			sparsemap_buf = ptr + size;
-		}
-	}
-	return ptr;
-}
-
 void __weak __meminit vmemmap_populate_print_last(void)
 {
 }
@@ -362,8 +310,6 @@ static void __init sparse_init_nid(int nid, unsigned long pnum_begin,
 		goto failed;
 	}
 
-	sparse_buffer_init(map_count * section_map_size(), nid);
-
 	sparse_vmemmap_init_nid_early(nid);
 
 	for_each_present_section_nr(pnum_begin, pnum) {
@@ -381,7 +327,6 @@ static void __init sparse_init_nid(int nid, unsigned long pnum_begin,
 				       __func__, nid);
 				pnum_begin = pnum;
 				sparse_usage_fini();
-				sparse_buffer_fini();
 				goto failed;
 			}
 			memmap_boot_pages_add(DIV_ROUND_UP(PAGES_PER_SECTION * sizeof(struct page),
@@ -390,7 +335,6 @@ static void __init sparse_init_nid(int nid, unsigned long pnum_begin,
 		}
 	}
 	sparse_usage_fini();
-	sparse_buffer_fini();
 	return;
 failed:
 	/*
-- 
2.20.1