kho_reserve_scratch() iterates over all online NUMA nodes to allocate
per-node scratch memory. On systems with memoryless NUMA nodes (nodes
that have CPUs but no memory), memblock_alloc_range_nid() fails because
there is no memory available on that node. This causes KHO initialization
to fail and kho_enable to be set to false.
Some ARM64 systems have NUMA topologies where certain nodes contain only
CPUs without any associated memory. These configurations are valid and
should not prevent KHO from functioning.
Fix this by counting only nodes that have memory (N_MEMORY state) and
skipping memoryless nodes in the per-node scratch allocation loop.
Signed-off-by: Evangelos Petrongonas <epetron@amazon.de>
---
v2:
- Removed kho_mem_nodes_count in favour of nodes_weight(node_states[N_MEMORY])
- Use for_each_node_state(nid, N_MEMORY) to loop over nodes that are both
online and have memory.
TIL: Nodes in N_MEMORY are a subset of those that are online. Thanks Mike :)
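For illustration only (not part of the patch): a minimal userspace sketch of
the difference between sizing and iterating by online nodes versus by nodes
with memory. The bitmasks, MAX_NODES, and the helper names mask_weight(),
scratch_count() and failed_allocs() are hypothetical stand-ins, not kernel
APIs; the "+ 2" simply mirrors the two extra entries in the kho_scratch_cnt
computation in the diff below.

```c
#define MAX_NODES 8

/* Hypothetical topology: nodes 0-3 are online, only nodes 0 and 2 have memory. */
static const unsigned int online_mask = 0x0f;
static const unsigned int memory_mask = 0x05;

/* Count the set bits in a node mask (userspace stand-in for nodes_weight()). */
static int mask_weight(unsigned int mask)
{
	int n = 0;

	for (; mask; mask &= mask - 1)
		n++;
	return n;
}

/* Scratch entries needed: one per node in the mask plus two extra entries. */
static int scratch_count(unsigned int mask)
{
	return mask_weight(mask) + 2;
}

/*
 * Walk every node in iter_mask and count the per-node allocations that
 * would fail because the node has no memory to allocate from.
 */
static int failed_allocs(unsigned int iter_mask)
{
	int nid, failed = 0;

	for (nid = 0; nid < MAX_NODES; nid++) {
		if (!(iter_mask & (1u << nid)))
			continue;	/* node is not part of this iteration */
		if (!(memory_mask & (1u << nid)))
			failed++;	/* memoryless node: nothing to allocate */
	}
	return failed;
}
```

With this made-up topology, iterating over online_mask hits two memoryless
nodes, while iterating over memory_mask hits none and needs two fewer
scratch entries.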
kernel/liveupdate/kexec_handover.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c
index 9dc51fab604f..979ebaf015bf 100644
--- a/kernel/liveupdate/kexec_handover.c
+++ b/kernel/liveupdate/kexec_handover.c
@@ -643,7 +643,7 @@ static void __init kho_reserve_scratch(void)
scratch_size_update();
/* FIXME: deal with node hot-plug/remove */
- kho_scratch_cnt = num_online_nodes() + 2;
+ kho_scratch_cnt = nodes_weight(node_states[N_MEMORY]) + 2;
size = kho_scratch_cnt * sizeof(*kho_scratch);
kho_scratch = memblock_alloc(size, PAGE_SIZE);
if (!kho_scratch)
@@ -673,7 +673,11 @@ static void __init kho_reserve_scratch(void)
kho_scratch[i].size = size;
i++;
- for_each_online_node(nid) {
+ /*
+ * Loop over nodes that have both memory and are online. Skip
+ * memoryless nodes, as we can not allocate scratch areas there.
+ */
+ for_each_node_state(nid, N_MEMORY) {
size = scratch_size_node(nid);
addr = memblock_alloc_range_nid(size, CMA_MIN_ALIGNMENT_BYTES,
0, MEMBLOCK_ALLOC_ACCESSIBLE,
--
2.43.0
Amazon Web Services Development Center Germany GmbH
Tamara-Danz-Str. 13
10243 Berlin
Geschaeftsfuehrung: Christof Hellmis, Andreas Stieger
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597
On Tue, 20 Jan 2026 17:59:11 +0000 Evangelos Petrongonas <epetron@amazon.de> wrote:
> kho_reserve_scratch() iterates over all online NUMA nodes to allocate
> per-node scratch memory. On systems with memoryless NUMA nodes (nodes
> that have CPUs but no memory), memblock_alloc_range_nid() fails because
> there is no memory available on that node. This causes KHO initialization
> to fail and kho_enable to be set to false.
>
> Some ARM64 systems have NUMA topologies where certain nodes contain only
> CPUs without any associated memory. These configurations are valid and
> should not prevent KHO from functioning.
>
> Fix this by only counting nodes that have memory (N_MEMORY state) and
> skip memoryless nodes in the per-node scratch allocation loop.
>
So kho is unusable on such machines.
Should we backport this? I'm thinking
Fixes: 3dc92c311498 ("kexec: add Kexec HandOver (KHO) generation helpers").
On Thu, Jan 22, 2026 at 03:21:12PM -0800, Andrew Morton wrote:
> On Tue, 20 Jan 2026 17:59:11 +0000 Evangelos Petrongonas <epetron@amazon.de> wrote:
>
> > kho_reserve_scratch() iterates over all online NUMA nodes to allocate
> > per-node scratch memory. On systems with memoryless NUMA nodes (nodes
> > that have CPUs but no memory), memblock_alloc_range_nid() fails because
> > there is no memory available on that node. This causes KHO initialization
> > to fail and kho_enable to be set to false.
> >
> > Some ARM64 systems have NUMA topologies where certain nodes contain only
> > CPUs without any associated memory. These configurations are valid and
> > should not prevent KHO from functioning.
> >
> > Fix this by only counting nodes that have memory (N_MEMORY state) and
> > skip memoryless nodes in the per-node scratch allocation loop.
> >
>
> So kho is unusable on such machines.
>
> Should we backport this? I'm thinking
>
> Fixes: 3dc92c311498 ("kexec: add Kexec HandOver (KHO) generation helpers").
It's only for v6.18, but sure, why not.
--
Sincerely yours,
Mike.
On Tue, Jan 20, 2026 at 12:59 PM Evangelos Petrongonas
<epetron@amazon.de> wrote:
>
> kho_reserve_scratch() iterates over all online NUMA nodes to allocate
> per-node scratch memory. On systems with memoryless NUMA nodes (nodes
> that have CPUs but no memory), memblock_alloc_range_nid() fails because
> there is no memory available on that node. This causes KHO initialization
> to fail and kho_enable to be set to false.
>
> Some ARM64 systems have NUMA topologies where certain nodes contain only
> CPUs without any associated memory. These configurations are valid and
> should not prevent KHO from functioning.
>
> Fix this by only counting nodes that have memory (N_MEMORY state) and
> skip memoryless nodes in the per-node scratch allocation loop.
>
> Signed-off-by: Evangelos Petrongonas <epetron@amazon.de>
> ---
> v2:
> - Removed kho_mem_nodes_count in favour of nodes_weight(node_states[N_MEMORY])
> - Use for_each_node_state(nid, N_MEMORY) to loop over nodes that are both
> online and have memory.
> TIL: Nodes in N_MEMORY are a subset of those that are online. Thanks Mike :)
>
> kernel/liveupdate/kexec_handover.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c
> index 9dc51fab604f..979ebaf015bf 100644
> --- a/kernel/liveupdate/kexec_handover.c
> +++ b/kernel/liveupdate/kexec_handover.c
> @@ -643,7 +643,7 @@ static void __init kho_reserve_scratch(void)
> scratch_size_update();
>
> /* FIXME: deal with node hot-plug/remove */
> - kho_scratch_cnt = num_online_nodes() + 2;
> + kho_scratch_cnt = nodes_weight(node_states[N_MEMORY]) + 2;
> size = kho_scratch_cnt * sizeof(*kho_scratch);
> kho_scratch = memblock_alloc(size, PAGE_SIZE);
> if (!kho_scratch)
> @@ -673,7 +673,11 @@ static void __init kho_reserve_scratch(void)
> kho_scratch[i].size = size;
> i++;
>
> - for_each_online_node(nid) {
> + /*
> + * Loop over nodes that have both memory and are online. Skip
> + * memoryless nodes, as we can not allocate scratch areas there.
> + */
> + for_each_node_state(nid, N_MEMORY) {
> size = scratch_size_node(nid);
> addr = memblock_alloc_range_nid(size, CMA_MIN_ALIGNMENT_BYTES,
> 0, MEMBLOCK_ALLOC_ACCESSIBLE,
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
On Tue, Jan 20, 2026 at 05:59:11PM +0000, Evangelos Petrongonas wrote:
> kho_reserve_scratch() iterates over all online NUMA nodes to allocate
> per-node scratch memory. On systems with memoryless NUMA nodes (nodes
> that have CPUs but no memory), memblock_alloc_range_nid() fails because
> there is no memory available on that node. This causes KHO initialization
> to fail and kho_enable to be set to false.
>
> Some ARM64 systems have NUMA topologies where certain nodes contain only
> CPUs without any associated memory. These configurations are valid and
> should not prevent KHO from functioning.
>
> Fix this by only counting nodes that have memory (N_MEMORY state) and
> skip memoryless nodes in the per-node scratch allocation loop.
>
> Signed-off-by: Evangelos Petrongonas <epetron@amazon.de>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> ---
> v2:
> - Removed kho_mem_nodes_count in favour of nodes_weight(node_states[N_MEMORY])
> - Use for_each_node_state(nid, N_MEMORY) to loop over nodes that are both
> online and have memory.
> TIL: Nodes in N_MEMORY are a subset of those that are online. Thanks Mike :)
>
> kernel/liveupdate/kexec_handover.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c
> index 9dc51fab604f..979ebaf015bf 100644
> --- a/kernel/liveupdate/kexec_handover.c
> +++ b/kernel/liveupdate/kexec_handover.c
> @@ -643,7 +643,7 @@ static void __init kho_reserve_scratch(void)
> scratch_size_update();
>
> /* FIXME: deal with node hot-plug/remove */
> - kho_scratch_cnt = num_online_nodes() + 2;
> + kho_scratch_cnt = nodes_weight(node_states[N_MEMORY]) + 2;
> size = kho_scratch_cnt * sizeof(*kho_scratch);
> kho_scratch = memblock_alloc(size, PAGE_SIZE);
> if (!kho_scratch)
> @@ -673,7 +673,11 @@ static void __init kho_reserve_scratch(void)
> kho_scratch[i].size = size;
> i++;
>
> - for_each_online_node(nid) {
> + /*
> + * Loop over nodes that have both memory and are online. Skip
> + * memoryless nodes, as we can not allocate scratch areas there.
> + */
> + for_each_node_state(nid, N_MEMORY) {
> size = scratch_size_node(nid);
> addr = memblock_alloc_range_nid(size, CMA_MIN_ALIGNMENT_BYTES,
> 0, MEMBLOCK_ALLOC_ACCESSIBLE,
> --
> 2.43.0
--
Sincerely yours,
Mike.
On Tue, Jan 20 2026, Evangelos Petrongonas wrote:
> kho_reserve_scratch() iterates over all online NUMA nodes to allocate
> per-node scratch memory. On systems with memoryless NUMA nodes (nodes
> that have CPUs but no memory), memblock_alloc_range_nid() fails because
> there is no memory available on that node. This causes KHO initialization
> to fail and kho_enable to be set to false.
>
> Some ARM64 systems have NUMA topologies where certain nodes contain only
> CPUs without any associated memory. These configurations are valid and
> should not prevent KHO from functioning.
>
> Fix this by only counting nodes that have memory (N_MEMORY state) and
> skip memoryless nodes in the per-node scratch allocation loop.
>
> Signed-off-by: Evangelos Petrongonas <epetron@amazon.de>
Reviewed-by: Pratyush Yadav <pratyush@kernel.org>
[...]
--
Regards,
Pratyush Yadav