The scratch memory for kexec handover is used to bootstrap the
kexec'ed kernel. It is only needed when CONFIG_KEXEC_HANDOVER
is enabled and only if it is a KHO boot. Add checks to prevent
marking a KHO scratch region unless needed.
Fixes: a2daf83e10378 ("x86/e820: temporarily enable KHO scratch for memory below 1M")
Reported-by: Vlad Poenaru <thevlad@meta.com>
Signed-off-by: Usama Arif <usamaarif642@gmail.com>
---
mm/memblock.c | 74 ++++++++++++++++++++++++++++++---------------------
1 file changed, 44 insertions(+), 30 deletions(-)
diff --git a/mm/memblock.c b/mm/memblock.c
index 8b13d5c28922a..8a2cebcfe0a18 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1114,36 +1114,6 @@ int __init_memblock memblock_reserved_mark_noinit(phys_addr_t base, phys_addr_t
MEMBLOCK_RSRV_NOINIT);
}
-/**
- * memblock_mark_kho_scratch - Mark a memory region as MEMBLOCK_KHO_SCRATCH.
- * @base: the base phys addr of the region
- * @size: the size of the region
- *
- * Only memory regions marked with %MEMBLOCK_KHO_SCRATCH will be considered
- * for allocations during early boot with kexec handover.
- *
- * Return: 0 on success, -errno on failure.
- */
-__init int memblock_mark_kho_scratch(phys_addr_t base, phys_addr_t size)
-{
- return memblock_setclr_flag(&memblock.memory, base, size, 1,
- MEMBLOCK_KHO_SCRATCH);
-}
-
-/**
- * memblock_clear_kho_scratch - Clear MEMBLOCK_KHO_SCRATCH flag for a
- * specified region.
- * @base: the base phys addr of the region
- * @size: the size of the region
- *
- * Return: 0 on success, -errno on failure.
- */
-__init int memblock_clear_kho_scratch(phys_addr_t base, phys_addr_t size)
-{
- return memblock_setclr_flag(&memblock.memory, base, size, 0,
- MEMBLOCK_KHO_SCRATCH);
-}
-
static bool should_skip_region(struct memblock_type *type,
struct memblock_region *m,
int nid, int flags)
@@ -2617,12 +2587,56 @@ static bool __init reserve_mem_kho_revive(const char *name, phys_addr_t size,
return true;
}
+
+/**
+ * memblock_mark_kho_scratch - Mark a memory region as MEMBLOCK_KHO_SCRATCH.
+ * @base: the base phys addr of the region
+ * @size: the size of the region
+ *
+ * Only memory regions marked with %MEMBLOCK_KHO_SCRATCH will be considered
+ * for allocations during early boot with kexec handover.
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+__init int memblock_mark_kho_scratch(phys_addr_t base, phys_addr_t size)
+{
+ if (is_kho_boot())
+ return memblock_setclr_flag(&memblock.memory, base, size, 1,
+ MEMBLOCK_KHO_SCRATCH);
+ return 0;
+}
+
+/**
+ * memblock_clear_kho_scratch - Clear MEMBLOCK_KHO_SCRATCH flag for a
+ * specified region.
+ * @base: the base phys addr of the region
+ * @size: the size of the region
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+__init int memblock_clear_kho_scratch(phys_addr_t base, phys_addr_t size)
+{
+ if (is_kho_boot())
+ return memblock_setclr_flag(&memblock.memory, base, size, 0,
+ MEMBLOCK_KHO_SCRATCH);
+ return 0;
+}
#else
static bool __init reserve_mem_kho_revive(const char *name, phys_addr_t size,
phys_addr_t align)
{
return false;
}
+
+__init int memblock_mark_kho_scratch(phys_addr_t base, phys_addr_t size)
+{
+ return 0;
+}
+
+__init int memblock_clear_kho_scratch(phys_addr_t base, phys_addr_t size)
+{
+ return 0;
+}
#endif /* CONFIG_KEXEC_HANDOVER */
/*
--
2.47.3
On Thu, Nov 27, 2025 at 08:33:20PM +0000, Usama Arif wrote:
> The scratch memory for kexec handover is used to bootstrap the
> kexec'ed kernel. It is only needed when CONFIG_KEXEC_HANDOVER
> is enabled and only if it is a KHO boot. Add checks to prevent
> marking a KHO scratch region unless needed.
Please add a paragraph along the lines of Pratyush's note from
https://lore.kernel.org/all/86bjknyxgu.fsf@kernel.org/:
  Yeah, I don't think it will have much of a difference in practice, but I
  do think it is a good correctness fix. Marking the lower 1M as scratch
  is a hack to get around the limitations with KHO, and we should not be
  doing that when KHO isn't involved.
> Fixes: a2daf83e10378 ("x86/e820: temporarily enable KHO scratch for memory below 1M")
> Reported-by: Vlad Poenaru <thevlad@meta.com>
> Signed-off-by: Usama Arif <usamaarif642@gmail.com>
> ---
> mm/memblock.c | 74 ++++++++++++++++++++++++++++++---------------------
> 1 file changed, 44 insertions(+), 30 deletions(-)
>
> diff --git a/mm/memblock.c b/mm/memblock.c
> index 8b13d5c28922a..8a2cebcfe0a18 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -1114,36 +1114,6 @@ int __init_memblock memblock_reserved_mark_noinit(phys_addr_t base, phys_addr_t
> MEMBLOCK_RSRV_NOINIT);
> }
>
> -/**
> - * memblock_mark_kho_scratch - Mark a memory region as MEMBLOCK_KHO_SCRATCH.
> - * @base: the base phys addr of the region
> - * @size: the size of the region
> - *
> - * Only memory regions marked with %MEMBLOCK_KHO_SCRATCH will be considered
> - * for allocations during early boot with kexec handover.
> - *
> - * Return: 0 on success, -errno on failure.
> - */
> -__init int memblock_mark_kho_scratch(phys_addr_t base, phys_addr_t size)
> -{
> - return memblock_setclr_flag(&memblock.memory, base, size, 1,
> - MEMBLOCK_KHO_SCRATCH);
> -}
> -
> -/**
> - * memblock_clear_kho_scratch - Clear MEMBLOCK_KHO_SCRATCH flag for a
> - * specified region.
> - * @base: the base phys addr of the region
> - * @size: the size of the region
> - *
> - * Return: 0 on success, -errno on failure.
> - */
> -__init int memblock_clear_kho_scratch(phys_addr_t base, phys_addr_t size)
> -{
> - return memblock_setclr_flag(&memblock.memory, base, size, 0,
> - MEMBLOCK_KHO_SCRATCH);
> -}
No need to move these functions under #ifdef CONFIG_KEXEC_HANDOVER. We
already have inline stubs for CONFIG_KEXEC_HANDOVER=n in
include/linux/memblock.h.
Just add 'if (is_kho_boot())' here and in memblock_mark_kho_scratch().
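For readers following along, the suggestion above can be modelled in plain userspace C. This is only a sketch, not the kernel code: kho_boot, region_flags, the simplified memblock_setclr_flag() signature (the real one also takes a struct memblock_type pointer) and the flag value are hypothetical stand-ins so that the guard can be exercised outside the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;

/* Placeholder value, assumed for this sketch only. */
#define MEMBLOCK_KHO_SCRATCH 0x40u

/* Hypothetical stand-ins for kernel state. */
static bool kho_boot;              /* what is_kho_boot() reports */
static unsigned int region_flags;  /* flags of a single modelled region */

static bool is_kho_boot(void)
{
	return kho_boot;
}

/* Simplified model: sets or clears a flag on the one modelled region. */
static int memblock_setclr_flag(phys_addr_t base, phys_addr_t size,
				int set, unsigned int flag)
{
	(void)base;
	(void)size;
	if (set)
		region_flags |= flag;
	else
		region_flags &= ~flag;
	return 0;
}

/* The functions stay where they are; only the guard is new. */
int memblock_mark_kho_scratch(phys_addr_t base, phys_addr_t size)
{
	if (is_kho_boot())
		return memblock_setclr_flag(base, size, 1,
					    MEMBLOCK_KHO_SCRATCH);
	return 0;
}

int memblock_clear_kho_scratch(phys_addr_t base, phys_addr_t size)
{
	if (is_kho_boot())
		return memblock_setclr_flag(base, size, 0,
					    MEMBLOCK_KHO_SCRATCH);
	return 0;
}
```

In this model a non-KHO boot leaves the region flags untouched and both calls still return 0, which is the behaviour the patch is after.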
> static bool should_skip_region(struct memblock_type *type,
> struct memblock_region *m,
> int nid, int flags)
--
Sincerely yours,
Mike.
On Thu, Nov 27 2025, Usama Arif wrote:
> The scratch memory for kexec handover is used to bootstrap the
> kexec'ed kernel. It is only needed when CONFIG_KEXEC_HANDOVER
> is enabled and only if it is a KHO boot. Add checks to prevent
> marking a KHO scratch region unless needed.
>
> Fixes: a2daf83e10378 ("x86/e820: temporarily enable KHO scratch for memory below 1M")
> Reported-by: Vlad Poenaru <thevlad@meta.com>
> Signed-off-by: Usama Arif <usamaarif642@gmail.com>
> ---
> mm/memblock.c | 74 ++++++++++++++++++++++++++++++---------------------
> 1 file changed, 44 insertions(+), 30 deletions(-)
>
> diff --git a/mm/memblock.c b/mm/memblock.c
> index 8b13d5c28922a..8a2cebcfe0a18 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -1114,36 +1114,6 @@ int __init_memblock memblock_reserved_mark_noinit(phys_addr_t base, phys_addr_t
> MEMBLOCK_RSRV_NOINIT);
> }
>
> -/**
> - * memblock_mark_kho_scratch - Mark a memory region as MEMBLOCK_KHO_SCRATCH.
> - * @base: the base phys addr of the region
> - * @size: the size of the region
> - *
> - * Only memory regions marked with %MEMBLOCK_KHO_SCRATCH will be considered
> - * for allocations during early boot with kexec handover.
> - *
> - * Return: 0 on success, -errno on failure.
> - */
> -__init int memblock_mark_kho_scratch(phys_addr_t base, phys_addr_t size)
> -{
> - return memblock_setclr_flag(&memblock.memory, base, size, 1,
> - MEMBLOCK_KHO_SCRATCH);
> -}
> -
> -/**
> - * memblock_clear_kho_scratch - Clear MEMBLOCK_KHO_SCRATCH flag for a
> - * specified region.
> - * @base: the base phys addr of the region
> - * @size: the size of the region
> - *
> - * Return: 0 on success, -errno on failure.
> - */
> -__init int memblock_clear_kho_scratch(phys_addr_t base, phys_addr_t size)
> -{
> - return memblock_setclr_flag(&memblock.memory, base, size, 0,
> - MEMBLOCK_KHO_SCRATCH);
> -}
> -
> static bool should_skip_region(struct memblock_type *type,
> struct memblock_region *m,
> int nid, int flags)
> @@ -2617,12 +2587,56 @@ static bool __init reserve_mem_kho_revive(const char *name, phys_addr_t size,
>
> return true;
> }
> +
> +/**
> + * memblock_mark_kho_scratch - Mark a memory region as MEMBLOCK_KHO_SCRATCH.
> + * @base: the base phys addr of the region
> + * @size: the size of the region
> + *
> + * Only memory regions marked with %MEMBLOCK_KHO_SCRATCH will be considered
> + * for allocations during early boot with kexec handover.
> + *
> + * Return: 0 on success, -errno on failure.
> + */
> +__init int memblock_mark_kho_scratch(phys_addr_t base, phys_addr_t size)
> +{
> + if (is_kho_boot())
> + return memblock_setclr_flag(&memblock.memory, base, size, 1,
> + MEMBLOCK_KHO_SCRATCH);
> + return 0;
> +}
> +
> +/**
> + * memblock_clear_kho_scratch - Clear MEMBLOCK_KHO_SCRATCH flag for a
> + * specified region.
> + * @base: the base phys addr of the region
> + * @size: the size of the region
> + *
> + * Return: 0 on success, -errno on failure.
> + */
> +__init int memblock_clear_kho_scratch(phys_addr_t base, phys_addr_t size)
> +{
> + if (is_kho_boot())
> + return memblock_setclr_flag(&memblock.memory, base, size, 0,
> + MEMBLOCK_KHO_SCRATCH);
> + return 0;
> +}
> #else
> static bool __init reserve_mem_kho_revive(const char *name, phys_addr_t size,
> phys_addr_t align)
> {
> return false;
> }
> +
> +__init int memblock_mark_kho_scratch(phys_addr_t base, phys_addr_t size)
> +{
> + return 0;
> +}
> +
> +__init int memblock_clear_kho_scratch(phys_addr_t base, phys_addr_t size)
> +{
> + return 0;
> +}
Nit: I don't think we need the alternate version here. When
CONFIG_KEXEC_HANDOVER is disabled, is_kho_boot() is:
	static inline bool is_kho_boot(void)
	{
		return false;
	}
So the above functions work for both cases.
I would prefer to not have two variants, but I don't think it is a
blocker. Up to you.
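A small userspace sketch of that point, with CONFIG_KEXEC_HANDOVER deliberately left undefined. The names set_scratch_flag(), flag_calls and mark_kho_scratch() are hypothetical and only model the shape of the real functions; the stub is the one quoted above.

```c
#include <stdbool.h>

/* CONFIG_KEXEC_HANDOVER is intentionally not defined in this sketch. */
#ifdef CONFIG_KEXEC_HANDOVER
bool is_kho_boot(void);	/* real implementation would live elsewhere */
#else
static inline bool is_kho_boot(void)
{
	return false;
}
#endif

/* Hypothetical counter: how often the flag helper was reached. */
static int flag_calls;

/* Hypothetical stand-in for memblock_setclr_flag(). */
static int set_scratch_flag(int set)
{
	(void)set;
	flag_calls++;
	return 0;
}

/* One definition, no #else variant: the stub makes this a no-op. */
int mark_kho_scratch(void)
{
	if (is_kho_boot())
		return set_scratch_flag(1);
	return 0;
}
```

Because the stub returns a constant false, the compiler can drop the guarded call entirely, so a single definition should cost nothing in the disabled configuration.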
Reviewed-by: Pratyush Yadav <pratyush@kernel.org>
> #endif /* CONFIG_KEXEC_HANDOVER */
>
> /*
--
Regards,
Pratyush Yadav
On Thu, 27 Nov 2025 20:33:20 +0000 Usama Arif <usamaarif642@gmail.com> wrote:
> The scratch memory for kexec handover is used to bootstrap the
> kexec'ed kernel. It is only needed when CONFIG_KEXEC_HANDOVER
> is enabled and only if it is a KHO boot. Add checks to prevent
> marking a KHO scratch region unless needed.
What effect does this change have? Lessened memory consumption,
presumably. Of what magnitude and for what time period?
On 27/11/2025 20:55, Andrew Morton wrote:
> On Thu, 27 Nov 2025 20:33:20 +0000 Usama Arif <usamaarif642@gmail.com> wrote:
>
>> The scratch memory for kexec handover is used to bootstrap the
>> kexec'ed kernel. It is only needed when CONFIG_KEXEC_HANDOVER
>> is enabled and only if it is a KHO boot. Add checks to prevent
>> marking a KHO scratch region unless needed.
>
> What effect does this change have? Lessened memory consumption,
> presumably. Of what magnitude and for what time period?
For some context, this came out of https://lore.kernel.org/all/ba690e06-c2a1-4d2e-9428-9ca2ea9f2b86@gmail.com/
(I should probably have added that to the commit message.)
We are seeing several warnings a day in the Meta fleet from a warning introduced
in that patch. We don't have CONFIG_KEXEC_HANDOVER enabled in the fleet. The IMA memory
seems to coincide with the 1st MB, but as Mike pointed out they are different arrays,
so this scratch memory is likely not a cause of the warnings. But it is not useful (and
was a bit confusing) to see KHO scratch memory being marked even when KHO is disabled.
The impact is as you said, but it's only marked for a very short period of time.
I think a better reason for this patch is simply to not mark the memory at all when KHO
is disabled (or not in use), for clarity.
On Thu, Nov 27 2025, Usama Arif wrote:
> On 27/11/2025 20:55, Andrew Morton wrote:
>> On Thu, 27 Nov 2025 20:33:20 +0000 Usama Arif <usamaarif642@gmail.com> wrote:
>>
>>> The scratch memory for kexec handover is used to bootstrap the
>>> kexec'ed kernel. It is only needed when CONFIG_KEXEC_HANDOVER
>>> is enabled and only if it is a KHO boot. Add checks to prevent
>>> marking a KHO scratch region unless needed.
>>
>> What effect does this change have? Lessened memory consumption,
>> presumably. Of what magnitude and for what time period?
>
> For some context, this came out of https://lore.kernel.org/all/ba690e06-c2a1-4d2e-9428-9ca2ea9f2b86@gmail.com/
> (I should probably have added that to the commit message.)
> We are seeing several warnings a day in the Meta fleet from a warning introduced
> in that patch. We don't have CONFIG_KEXEC_HANDOVER enabled in the fleet. The IMA memory
> seems to coincide with the 1st MB, but as Mike pointed out they are different arrays,
> so this scratch memory is likely not a cause of the warnings. But it is not useful (and
> was a bit confusing) to see KHO scratch memory being marked even when KHO is disabled.
Yeah, it is not yet clear if this is really the root cause for your issue.
>
> The impact is as you said, but it's only marked for a very short period of time.
> I think a better reason for this patch is simply to not mark the memory at all when KHO
> is disabled (or not in use), for clarity.
Yeah, I don't think it will have much of a difference in practice, but I
do think it is a good correctness fix. Marking the lower 1M as scratch
is a hack to get around the limitations with KHO, and we should not be
doing that when KHO isn't involved.
--
Regards,
Pratyush Yadav