[PATCH v3 2/2] efi: Support booting with kexec handover (KHO)

Evangelos Petrongonas posted 2 patches 1 month, 1 week ago
Posted by Evangelos Petrongonas 1 month, 1 week ago
When KHO (Kexec HandOver) is enabled, it sets up scratch memory regions
early during device tree scanning. After kexec, the new kernel
exclusively uses this region for memory allocations during boot up to
the initialization of the page allocator.

However, when booting with EFI, EFI's reserve_regions() uses
memblock_remove(0, PHYS_ADDR_MAX) to clear all memory regions before
rebuilding them from EFI data. This destroys KHO scratch regions and
their flags, thus causing a kernel panic, as there are no scratch
memory regions.

Instead of wholesale removal, iterate through memory regions and only
remove non-KHO ones. This preserves KHO scratch regions, which are
known good memory, while still allowing EFI to rebuild its memory map.

Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Evangelos Petrongonas <epetron@amazon.de>
---
Changes in v3:
	- Improve the code comments, by stating that the scratch regions are
	good known memory

Changes in v2:
	- Replace the for loop with for_each_mem_region
	- Fix comment indentation
	- Amend commit message to specify that scratch regions
	are known good regions

 drivers/firmware/efi/efi-init.c | 29 +++++++++++++++++++++++++----
 1 file changed, 25 insertions(+), 4 deletions(-)

diff --git a/drivers/firmware/efi/efi-init.c b/drivers/firmware/efi/efi-init.c
index a00e07b853f2..a65c2d5b9e7b 100644
--- a/drivers/firmware/efi/efi-init.c
+++ b/drivers/firmware/efi/efi-init.c
@@ -12,6 +12,7 @@
 #include <linux/efi.h>
 #include <linux/fwnode.h>
 #include <linux/init.h>
+#include <linux/kexec_handover.h>
 #include <linux/memblock.h>
 #include <linux/mm_types.h>
 #include <linux/of.h>
@@ -164,12 +165,32 @@ static __init void reserve_regions(void)
 		pr_info("Processing EFI memory map:\n");
 
 	/*
-	 * Discard memblocks discovered so far: if there are any at this
-	 * point, they originate from memory nodes in the DT, and UEFI
-	 * uses its own memory map instead.
+	 * Discard memblocks discovered so far except for KHO scratch
+	 * regions. Most memblocks at this point originate from memory nodes
+	 * in the DT and UEFI uses its own memory map instead. However, if
+	 * KHO is enabled, scratch regions, which are known good memory,
+	 * must be preserved.
 	 */
 	memblock_dump_all();
-	memblock_remove(0, PHYS_ADDR_MAX);
+
+	if (is_kho_boot()) {
+		struct memblock_region *r;
+
+		/* Remove all non-KHO regions */
+		for_each_mem_region(r) {
+			if (!memblock_is_kho_scratch(r)) {
+				memblock_remove(r->base, r->size);
+				r--;
+			}
+		}
+	} else {
+		/*
+		 * KHO is disabled. Discard memblocks discovered so far:
+		 * if there are any at this point, they originate from memory
+		 * nodes in the DT, and UEFI uses its own memory map instead.
+		 */
+		memblock_remove(0, PHYS_ADDR_MAX);
+	}
 
 	for_each_efi_memory_desc(md) {
 		paddr = md->phys_addr;
-- 
2.47.3




Amazon Web Services Development Center Germany GmbH
Tamara-Danz-Str. 13
10243 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597
Re: [PATCH v3 2/2] efi: Support booting with kexec handover (KHO)
Posted by Pratyush Yadav 3 weeks, 3 days ago
On Thu, Aug 21 2025, Evangelos Petrongonas wrote:

> @@ -164,12 +165,32 @@ static __init void reserve_regions(void)
>  		pr_info("Processing EFI memory map:\n");
>  
>  	/*
> -	 * Discard memblocks discovered so far: if there are any at this
> -	 * point, they originate from memory nodes in the DT, and UEFI
> -	 * uses its own memory map instead.
> +	 * Discard memblocks discovered so far except for KHO scratch
> +	 * regions. Most memblocks at this point originate from memory nodes
> +	 * in the DT and UEFI uses its own memory map instead. However, if
> +	 * KHO is enabled, scratch regions, which are known good memory,
> +	 * must be preserved.
>  	 */
>  	memblock_dump_all();
> -	memblock_remove(0, PHYS_ADDR_MAX);
> +
> +	if (is_kho_boot()) {
> +		struct memblock_region *r;
> +
> +		/* Remove all non-KHO regions */
> +		for_each_mem_region(r) {
> +			if (!memblock_is_kho_scratch(r)) {
> +				memblock_remove(r->base, r->size);
> +				r--;

Hmm, this caught me off-guard. I had to do a double take to realize that
memblock_remove() would decrease memblock.memory.cnt and move the whole
regions array back. A comment would have been nice here.

But then, I wouldn't want you to do a full resend of the series for this
minor nitpick. So perhaps whoever is taking this patch can add one when
applying? Either way is fine though...
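
To make the nitpick concrete, here is a minimal userspace sketch of the array-shifting behaviour described above. It is only a toy model under stated assumptions: `toy_region`, `toy_table`, `toy_remove()`, `toy_remove_non_scratch()` and `toy_demo_table()` are hypothetical names, not memblock API, and the index-based loop merely stands in for the pointer `r--` in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel's memblock types; not the real API. */
struct toy_region {
	unsigned long base;
	unsigned long size;
	bool scratch;	/* models the KHO scratch flag */
};

struct toy_table {
	struct toy_region regions[8];
	size_t cnt;
};

/* Remove entry idx by shifting all later entries down one slot. */
void toy_remove(struct toy_table *t, size_t idx)
{
	assert(idx < t->cnt);
	memmove(&t->regions[idx], &t->regions[idx + 1],
		(t->cnt - idx - 1) * sizeof(t->regions[0]));
	t->cnt--;
}

/* Drop every non-scratch region; returns how many were removed. */
size_t toy_remove_non_scratch(struct toy_table *t, bool step_back)
{
	size_t removed = 0;

	for (size_t i = 0; i < t->cnt; i++) {
		if (!t->regions[i].scratch) {
			toy_remove(t, i);
			removed++;
			/*
			 * The next entry now lives in slot i, so step back
			 * before the loop increments. Unsigned wrap-around
			 * at i == 0 is well defined and undone by i++.
			 */
			if (step_back)
				i--;
		}
	}
	return removed;
}

/* Made-up layout: two adjacent non-scratch regions, one scratch, one more. */
struct toy_table toy_demo_table(void)
{
	struct toy_table t = {
		.regions = {
			{ 0x1000, 0x1000, false },
			{ 0x2000, 0x1000, false },
			{ 0x8000, 0x1000, true },
			{ 0x9000, 0x1000, false },
		},
		.cnt = 4,
	};
	return t;
}
```

Calling `toy_remove_non_scratch()` on the demo table with `step_back` set leaves only the scratch entry; without it, the second of the two adjacent non-scratch entries is skipped and survives.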

Acked-by: Pratyush Yadav <pratyush@kernel.org>

> +			}
> +		}
> +	} else {
> +		/*
> +		 * KHO is disabled. Discard memblocks discovered so far:
> +		 * if there are any at this point, they originate from memory
> +		 * nodes in the DT, and UEFI uses its own memory map instead.
> +		 */
> +		memblock_remove(0, PHYS_ADDR_MAX);
> +	}
>  
>  	for_each_efi_memory_desc(md) {
>  		paddr = md->phys_addr;

-- 
Regards,
Pratyush Yadav
Re: [PATCH v3 2/2] efi: Support booting with kexec handover (KHO)
Posted by Ard Biesheuvel 1 month, 1 week ago
(cc Ilias)

Note to akpm: please drop this series for now.

On Fri, 22 Aug 2025 at 04:00, Evangelos Petrongonas <epetron@amazon.de> wrote:
>
> When KHO (Kexec HandOver) is enabled, it sets up scratch memory regions
> early during device tree scanning. After kexec, the new kernel
> exclusively uses this region for memory allocations during boot up to
> the initialization of the page allocator.
>
> However, when booting with EFI, EFI's reserve_regions() uses
> memblock_remove(0, PHYS_ADDR_MAX) to clear all memory regions before
> rebuilding them from EFI data. This destroys KHO scratch regions and
> their flags, thus causing a kernel panic, as there are no scratch
> memory regions.
>

I'd rather drop the memblock_remove() entirely if possible. Could we
get some insight into whether memblocks are generally already
populated at this point during the boot?
Re: [PATCH v3 2/2] efi: Support booting with kexec handover (KHO)
Posted by Ard Biesheuvel 4 weeks, 1 day ago
On Sat, 23 Aug 2025 at 23:47, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> I'd rather drop the memblock_remove() entirely if possible. Could we
> get some insight into whether memblocks are generally already
> populated at this point during the boot?
>
>

Ping?
Re: Re: [PATCH v3 2/2] efi: Support booting with kexec handover (KHO)
Posted by Evangelos Petrongonas 4 weeks, 1 day ago
On Thu, 4 Sep 2025 09:19:21 +0200, Ard Biesheuvel <ardb@kernel.org> wrote:
> On Sat, 23 Aug 2025 at 23:47, Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > I'd rather drop the memblock_remove() entirely if possible. Could we
> > get some insight into whether memblocks are generally already
> > populated at this point during the boot?
> >
> >
> 
> Ping?

Hey Ard, I was AFK travelling. I am back now and will get to it.
PS: Keen to meet you later today at the KVM Forum.

Kind Regards,
Evangelos




Re: Re: [PATCH v3 2/2] efi: Support booting with kexec handover (KHO)
Posted by Ard Biesheuvel 4 weeks, 1 day ago
On Thu, 4 Sept 2025 at 11:36, Evangelos Petrongonas <epetron@amazon.de> wrote:
>
> Hey Ard, I was AFK travelling. I am back now and will get to it.
> PS: Keen to meet you later today at the KVM Forum.
>

Yes, let's catch up!
Re: Re: Re: [PATCH v3 2/2] efi: Support booting with kexec handover (KHO)
Posted by Evangelos Petrongonas 4 weeks, 1 day ago
On Thu, 4 Sep 2025 11:39:02 +0200, Ard Biesheuvel <ardb@kernel.org> wrote:
> On Thu, 4 Sept 2025 at 11:36, Evangelos Petrongonas <epetron@amazon.de> wrote:
> >
> > On Thu, 4 Sep 2025 09:19:21 +0200, Ard Biesheuvel <ardb@kernel.org> wrote:
> > > On Sat, 23 Aug 2025 at 23:47, Ard Biesheuvel <ardb@kernel.org> wrote:
> > > >
> > > > I'd rather drop the memblock_remove() entirely if possible. Could we
> > > > get some insight into whether memblocks are generally already
> > > > populated at this point during the boot?

I did some testing on QEMU with memblock and EFI debug enabled
(`memblock=debug efi=debug`) and no KHO.
`memblock_dump_all()` in `reserve_regions()` outputs:
```
[    0.000000] MEMBLOCK configuration:
[    0.000000]  memory size = 0x0000000200000000 reserved size = 0x000000000db5383e
[    0.000000]  memory.cnt  = 0x7
[    0.000000]  memory[0x0]	[0x0000000040000000-0x000000023c76ffff], 0x00000001fc770000 bytes on node 0 flags: 0x0
...
[    0.000000]  reserved.cnt  = 0xf
[    0.000000]  reserved[0x0]	[0x00000000fe000000-0x00000000ffffffff], 0x0000000002000000 bytes flags: 0x20
```

Moreover, checking the code, the boot flow (at least on arm64)
populates memblocks from DT memory nodes via
`early_init_dt_add_memory_arch()` before `efi_init()` is called:

`setup_arch()` -> `setup_machine_fdt()` -> `early_init_dt_scan()` ->
`early_init_dt_scan_memory()` -> `early_init_dt_add_memory_arch()` ->
`memblock_add()`

As a result, it seems that memblocks ARE populated when
`reserve_regions()` is called. So it looks like we still need the
`memblock_remove()` (?)
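
As a rough illustration of that ordering, here is a toy userspace model. It is a sketch only: `toy_map`, `toy_add()`, `toy_dt_scan()` and `toy_reserve_regions()` are hypothetical names, the address ranges are made up, and unlike real memblock this model does not merge overlapping ranges. It only shows that whatever the DT scan registered is still present when the EFI step runs unless it is discarded first.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX 8

/* Hypothetical stand-in for a memblock-like table; not the kernel API. */
struct toy_map {
	unsigned long long base[TOY_MAX];
	unsigned long long size[TOY_MAX];
	size_t cnt;
};

void toy_add(struct toy_map *m, unsigned long long base,
	     unsigned long long size)
{
	assert(m->cnt < TOY_MAX);
	m->base[m->cnt] = base;
	m->size[m->cnt] = size;
	m->cnt++;
}

/* Models the DT scan, which runs first and registers a DT memory node. */
void toy_dt_scan(struct toy_map *m)
{
	toy_add(m, 0x40000000ULL, 0x200000000ULL);	/* made-up DT range */
}

/*
 * Models the EFI step: optionally discard what is already registered,
 * then add one range from a made-up firmware memory map.
 * Returns how many entries were present on entry.
 */
size_t toy_reserve_regions(struct toy_map *m, int discard_first)
{
	size_t seen = m->cnt;

	if (discard_first)
		m->cnt = 0;
	toy_add(m, 0x40000000ULL, 0x1fc770000ULL);	/* made-up EFI range */
	return seen;
}
```

In this model, running `toy_dt_scan()` followed by `toy_reserve_regions()` with `discard_first` set leaves only the firmware-derived range, while skipping the discard leaves the stale DT range in place next to it.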





Re: Re: Re: [PATCH v3 2/2] efi: Support booting with kexec handover (KHO)
Posted by Ard Biesheuvel 3 weeks, 4 days ago
On Thu, 4 Sept 2025 at 14:59, Evangelos Petrongonas <epetron@amazon.de> wrote:
>
> On Thu, 4 Sep 2025 11:39:02 +0200, Ard Biesheuvel <ardb@kernel.org> wrote:
> > On Thu, 4 Sept 2025 at 11:36, Evangelos Petrongonas <epetron@amazon.de> wrote:
> > >
> > > On Thu, 4 Sep 2025 09:19:21 +0200, Ard Biesheuvel <ardb@kernel.org> wrote:
> > > > On Sat, 23 Aug 2025 at 23:47, Ard Biesheuvel <ardb@kernel.org> wrote:
> > > > >
> > > > > I'd rather drop the memblock_remove() entirely if possible. Could we
> > > > > get some insight into whether memblocks are generally already
> > > > > populated at this point during the boot?
>
> As a result, it seems that memblocks ARE populated when
> `reserve_regions()` is called. So it looks like we still need the
> `memblock_remove()` (?)
>

Indeed.

For the series,

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>