[PATCH v4] kho: kexec-metadata: track previous kernel chain

Breno Leitao posted 1 patch 2 weeks, 2 days ago
include/linux/kho/abi/kexec_handover.h | 29 +++++++++++++++
kernel/liveupdate/kexec_handover.c     | 65 ++++++++++++++++++++++++++++++++++
2 files changed, 94 insertions(+)
Posted by Breno Leitao 2 weeks, 2 days ago
Use Kexec Handover (KHO) to pass the previous kernel's version string
and the number of kexec reboots since the last cold boot to the next
kernel, and print it at boot time.

Example output:
    [    0.000000] KHO: exec from: 6.19.0-rc4-next-20260107 (count 1)

Motivation
==========

Bugs that only reproduce when kexecing from specific kernel versions
are difficult to diagnose. These issues occur when a buggy kernel
kexecs into a new kernel, with the bug manifesting only in the second
kernel.

Recent examples include the following commits:

 * eb2266312507 ("x86/boot: Fix page table access in 5-level to 4-level paging transition")
 * 77d48d39e991 ("efistub/tpm: Use ACPI reclaim memory for event log to avoid corruption")
 * 64b45dd46e15 ("x86/efi: skip memattr table on kexec boot")

As kexec-based reboots become more common, these version-dependent bugs
are appearing more frequently. At scale, correlating crashes to the
previous kernel version is challenging, especially when issues only
occur in specific transition scenarios.

Implementation
==============

The kexec metadata is stored as a plain C struct (struct kho_kexec_metadata)
rather than FDT format, for simplicity and direct field access. It is
registered via kho_add_subtree() as a separate subtree, keeping it
independent from the core KHO ABI. This design choice:

 - Keeps the core KHO ABI minimal and stable
 - Allows the metadata format to evolve independently
 - Avoids requiring version bumps for all KHO consumers (LUO, etc.)
   when the metadata format changes

The struct kho_kexec_metadata contains two fields:
 - previous_release: The kernel version that initiated the kexec
 - kexec_count: Number of kexec boots since last cold boot

On cold boot, kexec_count starts at 0 and increments with each kexec.
The count helps identify issues that only manifest after multiple
consecutive kexec reboots.

Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: SeongJae Park <sj@kernel.org>
---
Changes in v4:
- Squashed everything into a single commit
- Moved from FDT to C structs (Pratyush)
- Use subtrees instead of FDT directly (Pratyush)
- Renamed a bunch of variables and functions.
- Link to v3: https://patch.msgid.link/20260108-kho-v3-0-b1d6b7a89342@debian.org

Changes in v3:
- Remove the extra CONFIG for this feature.
- Reworded some identifiers, properties and printks.
- Better documented the questions raised during v2.
- Link to v2: https://patch.msgid.link/20260102-kho-v2-0-1747b1a3a1d6@debian.org

Changes in v2 (since v1, RFC):
- Track the number of kexecs since cold boot (Pasha)
- Change the printk() order compared to KHO
- Rewording of the commit summary
- Link to RFC: https://patch.msgid.link/20251230-kho-v1-1-4d795a24da9e@debian.org
---
 include/linux/kho/abi/kexec_handover.h | 29 +++++++++++++++
 kernel/liveupdate/kexec_handover.c     | 65 ++++++++++++++++++++++++++++++++++
 2 files changed, 94 insertions(+)

diff --git a/include/linux/kho/abi/kexec_handover.h b/include/linux/kho/abi/kexec_handover.h
index 285eda8a36e45..e18022a4e664d 100644
--- a/include/linux/kho/abi/kexec_handover.h
+++ b/include/linux/kho/abi/kexec_handover.h
@@ -11,6 +11,7 @@
 #define _LINUX_KHO_ABI_KEXEC_HANDOVER_H
 
 #include <linux/types.h>
+#include <linux/utsname.h>
 
 /**
  * DOC: Kexec Handover ABI
@@ -84,6 +85,34 @@
 /* The FDT property for sub-FDTs. */
 #define KHO_FDT_SUB_TREE_PROP_NAME "fdt"
 
+/**
+ * DOC: Kexec Metadata ABI
+ *
+ * The "kexec-metadata" subtree stores optional metadata about the kexec chain.
+ * It is registered via kho_add_subtree(), keeping it independent from the core
+ * KHO ABI. This allows the metadata format to evolve without affecting other
+ * KHO consumers.
+ *
+ * The metadata is stored as a plain C struct rather than FDT format for
+ * simplicity and direct field access.
+ */
+
+/**
+ * struct kho_kexec_metadata - Kexec metadata passed between kernels
+ * @previous_release: Kernel version string that initiated the kexec
+ * @kexec_count: Number of kexec boots since last cold boot
+ *
+ * This structure is preserved across kexec and allows the new kernel to
+ * identify which kernel it was booted from and how many kexec reboots
+ * have occurred.
+ */
+struct kho_kexec_metadata {
+	char previous_release[__NEW_UTS_LEN + 1];
+	u32 kexec_count;
+} __packed;
+
+#define KHO_METADATA_NODE_NAME "kexec-metadata"
+
 /**
  * DOC: Kexec Handover ABI for vmalloc Preservation
  *
diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c
index 3cf2dc6840c92..3130444e183b3 100644
--- a/kernel/liveupdate/kexec_handover.c
+++ b/kernel/liveupdate/kexec_handover.c
@@ -15,6 +15,7 @@
 #include <linux/count_zeros.h>
 #include <linux/kexec.h>
 #include <linux/kexec_handover.h>
+#include <linux/utsname.h>
 #include <linux/kho/abi/kexec_handover.h>
 #include <linux/libfdt.h>
 #include <linux/list.h>
@@ -1246,6 +1247,8 @@ struct kho_in {
 	phys_addr_t fdt_phys;
 	phys_addr_t scratch_phys;
 	phys_addr_t mem_map_phys;
+	char previous_release[__NEW_UTS_LEN + 1];
+	u32 kexec_count;
 	struct kho_debugfs dbg;
 };
 
@@ -1331,6 +1334,59 @@ static __init int kho_out_fdt_setup(void)
 	return err;
 }
 
+static void __init kho_process_kexec_metadata(void)
+{
+	struct kho_kexec_metadata *metadata;
+	phys_addr_t metadata_phys;
+	int err;
+
+	err = kho_retrieve_subtree(KHO_METADATA_NODE_NAME, &metadata_phys);
+	if (err)
+		/* This is fine, previous kernel didn't export metadata */
+		return;
+
+	metadata = phys_to_virt(metadata_phys);
+
+	/*
+	 * Copy data to the kernel structure that will persist during
+	 * kernel lifetime.
+	 */
+	kho_in.kexec_count = metadata->kexec_count;
+	strscpy(kho_in.previous_release, metadata->previous_release,
+		sizeof(kho_in.previous_release));
+
+	pr_info("exec from: %s (count %u)\n", kho_in.previous_release,
+					      kho_in.kexec_count);
+}
+
+/*
+ * Create kexec metadata to pass kernel version and boot count to the
+ * next kernel. This keeps the core KHO ABI minimal and allows the
+ * metadata format to evolve independently.
+ */
+static __init int kho_populate_kexec_metadata(void)
+{
+	struct kho_kexec_metadata *metadata;
+	int err;
+
+	metadata = kho_alloc_preserve(sizeof(*metadata));
+	if (IS_ERR(metadata))
+		return PTR_ERR(metadata);
+
+	strscpy(metadata->previous_release, init_uts_ns.name.release,
+		sizeof(metadata->previous_release));
+	/* kho_in.kexec_count is set to 0 on cold boot */
+	metadata->kexec_count = kho_in.kexec_count + 1;
+
+	err = kho_add_subtree(KHO_METADATA_NODE_NAME, metadata);
+	if (err) {
+		kho_unpreserve_free(metadata);
+		return err;
+	}
+
+	return 0;
+}
+
 static __init int kho_init(void)
 {
 	const void *fdt = kho_get_fdt();
@@ -1357,6 +1413,15 @@ static __init int kho_init(void)
 	if (err)
 		goto err_free_fdt;
 
+	if (fdt)
+		kho_process_kexec_metadata();
+
+	/* Populate kexec metadata for the possible next kexec */
+	err = kho_populate_kexec_metadata();
+	if (err)
+		pr_warn("failed to initialize kexec-metadata subtree: %d\n",
+			err);
+
 	if (fdt) {
 		kho_in_debugfs_init(&kho_in.dbg, fdt);
 		return 0;

---
base-commit: 5eec2b2e1f37acff8b926d2494eadaeef59be279
change-id: 20251230-kho-7707e8a2ef1e

Best regards,
--  
Breno Leitao <leitao@debian.org>
Re: [PATCH v4] kho: kexec-metadata: track previous kernel chain
Posted by Pratyush Yadav 1 week, 4 days ago
Hi Breno,

On Wed, Jan 21 2026, Breno Leitao wrote:

> Use Kexec Handover (KHO) to pass the previous kernel's version string
> and the number of kexec reboots since the last cold boot to the next
> kernel, and print it at boot time.
>
> Example output:
>     [    0.000000] KHO: exec from: 6.19.0-rc4-next-20260107 (count 1)
>
> Motivation
> ==========
>
> Bugs that only reproduce when kexecing from specific kernel versions
> are difficult to diagnose. These issues occur when a buggy kernel
> kexecs into a new kernel, with the bug manifesting only in the second
> kernel.
>
> Recent examples include the following commits:
>
>  * eb2266312507 ("x86/boot: Fix page table access in 5-level to 4-level paging transition")
>  * 77d48d39e991 ("efistub/tpm: Use ACPI reclaim memory for event log to avoid corruption")
>  * 64b45dd46e15 ("x86/efi: skip memattr table on kexec boot")
>
> As kexec-based reboots become more common, these version-dependent bugs
> are appearing more frequently. At scale, correlating crashes to the
> previous kernel version is challenging, especially when issues only
> occur in specific transition scenarios.
>
> Implementation
> ==============
>
> The kexec metadata is stored as a plain C struct (struct kho_kexec_metadata)
> rather than FDT format, for simplicity and direct field access. It is
> registered via kho_add_subtree() as a separate subtree, keeping it
> independent from the core KHO ABI. This design choice:
>
>  - Keeps the core KHO ABI minimal and stable
>  - Allows the metadata format to evolve independently
>  - Avoids requiring version bumps for all KHO consumers (LUO, etc.)
>    when the metadata format changes
>
> The struct kho_kexec_metadata contains two fields:
>  - previous_release: The kernel version that initiated the kexec
>  - kexec_count: Number of kexec boots since last cold boot
>
> On cold boot, kexec_count starts at 0 and increments with each kexec.
> The count helps identify issues that only manifest after multiple
> consecutive kexec reboots.
>
> Signed-off-by: Breno Leitao <leitao@debian.org>
> Acked-by: SeongJae Park <sj@kernel.org>
[...]
> diff --git a/include/linux/kho/abi/kexec_handover.h b/include/linux/kho/abi/kexec_handover.h
> index 285eda8a36e45..e18022a4e664d 100644
> --- a/include/linux/kho/abi/kexec_handover.h
> +++ b/include/linux/kho/abi/kexec_handover.h
> @@ -11,6 +11,7 @@
>  #define _LINUX_KHO_ABI_KEXEC_HANDOVER_H
>  
>  #include <linux/types.h>
> +#include <linux/utsname.h>
>  
>  /**
>   * DOC: Kexec Handover ABI
> @@ -84,6 +85,34 @@
>  /* The FDT property for sub-FDTs. */
>  #define KHO_FDT_SUB_TREE_PROP_NAME "fdt"
>  
> +/**
> + * DOC: Kexec Metadata ABI
> + *
> + * The "kexec-metadata" subtree stores optional metadata about the kexec chain.
> + * It is registered via kho_add_subtree(), keeping it independent from the core
> + * KHO ABI. This allows the metadata format to evolve without affecting other
> + * KHO consumers.
> + *
> + * The metadata is stored as a plain C struct rather than FDT format for
> + * simplicity and direct field access.
> + */
> +
> +/**
> + * struct kho_kexec_metadata - Kexec metadata passed between kernels
> + * @previous_release: Kernel version string that initiated the kexec

Other than Mike's comments, I only have a small nitpick. Please add a
comment here noting that using __NEW_UTS_LEN in the ABI is safe since it
is part of the UAPI.

LGTM otherwise.

> + * @kexec_count: Number of kexec boots since last cold boot
> + *
> + * This structure is preserved across kexec and allows the new kernel to
> + * identify which kernel it was booted from and how many kexec reboots
> + * have occurred.
> + */
> +struct kho_kexec_metadata {
> +	char previous_release[__NEW_UTS_LEN + 1];
> +	u32 kexec_count;
> +} __packed;
> +
[...]

-- 
Regards,
Pratyush Yadav
Re: [PATCH v4] kho: kexec-metadata: track previous kernel chain
Posted by Breno Leitao 1 week, 4 days ago
On Wed, Jan 21, 2026 at 06:50:38AM -0800, Breno Leitao wrote:
> +static __init int kho_populate_kexec_metadata(void)
> +{
> +	struct kho_kexec_metadata *metadata;
> +	int err;
> +
> +	metadata = kho_alloc_preserve(sizeof(*metadata));
> +	if (IS_ERR(metadata))
> +		return PTR_ERR(metadata);
> +
> +	strscpy(metadata->previous_release, init_uts_ns.name.release,
> +		sizeof(metadata->previous_release));
> +	/* kho_in.kexec_count is set to 0 on cold boot */
> +	metadata->kexec_count = kho_in.kexec_count + 1;
> +
> +	err = kho_add_subtree(KHO_METADATA_NODE_NAME, metadata);

There is a hidden bug in here when CONFIG_KEXEC_HANDOVER_DEBUGFS is set.

kho_add_subtree() expects an FDT as the second argument, and we are
passing a pure C struct. That works fine, except for debugfs, which
does:

 1. kho_add_subtree() calls kho_debugfs_fdt_add()
 2. kho_debugfs_fdt_add() calls __kho_debugfs_fdt_add()
 3. __kho_debugfs_fdt_add() executes fdt_totalsize(fdt)

The fdt_totalsize() macro reads bytes 4-7 of the input as a big-endian u32, and
this will hit struct kho_kexec_metadata, given I am passing a C struct instead
of an FDT.

  struct kho_kexec_metadata {
      char previous_release[__NEW_UTS_LEN + 1];  // 65 bytes
      u32 kexec_count;
  } __packed;

Bytes 4-7 would be characters from previous_release (e.g., ".0-r" from
"6.19.0-rc4..."). Interpreted as big-endian u32, this gives a garbage size
value.
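
To make the misread concrete, here is a minimal userspace sketch. It
assumes the FDT header layout (magic at offset 0, totalsize as a
big-endian u32 at offset 4); the demo_* names and NEW_UTS_LEN are
stand-ins invented for illustration and are not part of the patch, and
the packed attribute is the GCC/Clang spelling:

```c
#include <stdint.h>
#include <string.h>

#define NEW_UTS_LEN 64	/* assumption: same value as __NEW_UTS_LEN in the UAPI */

/* Userspace stand-in for struct kho_kexec_metadata (hypothetical name). */
struct demo_kexec_metadata {
	char previous_release[NEW_UTS_LEN + 1];
	uint32_t kexec_count;
} __attribute__((packed));

/* Zero the struct, then copy in the release string and the count. */
static void demo_fill(struct demo_kexec_metadata *md, const char *release,
		      uint32_t count)
{
	memset(md, 0, sizeof(*md));
	/* The final byte stays zero from the memset, so the string is terminated. */
	strncpy(md->previous_release, release, NEW_UTS_LEN);
	md->kexec_count = count;
}

/*
 * Read a big-endian u32 at byte offset 4, which is where an FDT header
 * keeps its totalsize field. On a non-FDT blob this returns whatever
 * bytes happen to live there.
 */
static uint32_t demo_totalsize_at_offset_4(const void *blob)
{
	const unsigned char *p = (const unsigned char *)blob + 4;

	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}
```

With the release string "6.19.0-rc4-next-20260107", the struct is 69
bytes long, yet demo_totalsize_at_offset_4() reports 774909298 (roughly
739 MiB), because offsets 4-7 hold the characters '.', '0', '-', 'r'.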

The alternatives I see here are:

 1) Go back to FDT instead of a plain C struct, similar to the previous
    version [1]
 2) Create some helpers to treat C struct fields specially just for this
    feature, and we can do it later if we have more users.
 3) Move this kexec_metadata to work on top of LUO (similarly to memfd), but
    that would be an unnecessary dependency just to have this kexec_metadata.

That said, for the next version, I am going back to FDT.

Link: https://lore.kernel.org/all/20260108-kho-v3-0-b1d6b7a89342@debian.org/ [1]

--breno
Re: [PATCH v4] kho: kexec-metadata: track previous kernel chain
Posted by Pratyush Yadav 1 week, 4 days ago
Hi Breno,

On Mon, Jan 26 2026, Breno Leitao wrote:

> On Wed, Jan 21, 2026 at 06:50:38AM -0800, Breno Leitao wrote:
>> +static __init int kho_populate_kexec_metadata(void)
>> +{
>> +	struct kho_kexec_metadata *metadata;
>> +	int err;
>> +
>> +	metadata = kho_alloc_preserve(sizeof(*metadata));
>> +	if (IS_ERR(metadata))
>> +		return PTR_ERR(metadata);
>> +
>> +	strscpy(metadata->previous_release, init_uts_ns.name.release,
>> +		sizeof(metadata->previous_release));
>> +	/* kho_in.kexec_count is set to 0 on cold boot */
>> +	metadata->kexec_count = kho_in.kexec_count + 1;
>> +
>> +	err = kho_add_subtree(KHO_METADATA_NODE_NAME, metadata);
>
> There is a hidden bug in here when CONFIG_KEXEC_HANDOVER_DEBUGFS is set.

Good catch!

>
> kho_add_subtree() expects a fdt as the second argument, and we are
> passing a pure C struct. That works fine, except for debugfs, which
> does:
>
>  1. kho_add_subtree() calls kho_debugfs_fdt_add()
>  2. kho_debugfs_fdt_add() calls __kho_debugfs_fdt_add()
>  3. __kho_debugfs_fdt_add() executes fdt_totalsize(fdt)
>
> The fdt_totalsize() macro reads bytes 4-7 of the input as a big-endian u32, and
> this will hit struct kho_kexec_metadata, given I am passing a C struct instead
> of a FDT.
>
>   struct kho_kexec_metadata {
>       char previous_release[__NEW_UTS_LEN + 1];  // 65 bytes
>       u32 kexec_count;
>   } __packed;
>
> Bytes 4-7 would be characters from previous_release (e.g., ".0-r" from
> "6.19.0-rc4..."). Interpreted as big-endian u32, this gives a garbage size
> value.
>
> The alternatives I see here are:
>
>  1) Come back to FDT instead of plain C struct, similarly to the previous
>     version [1]
>  2) Created some helpers to treat C struct fields specially just for this
>     feature, and we can do it later if we have more users.
>  3) Move this kexec_metadata to work on top of LUO (similarly to memfd), but
>     that would be an unnecessary dependency just to have this kexec_metadata.
>
> That said, for the next version, I am coming back to to FDT.

Please, no. Don't go back to it just for the sake of this bug.

I think KHO's assumption that the subtree will always point to an FDT is
broken, and we should fix that. I think KHO should expose the blob of
serialized data and let userspace figure out what the format is and how
to decode it.

To do that, we would need to update kho_add_subtree() to take a size
parameter from callers, and pass that down to debugfs code. I count 3
callers of kho_add_subtree() - memblock, LUO, and test_kho. I think all
3 should be fairly easy to update, but I am happy to help out if you
need.
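
A rough userspace sketch of that idea, purely to illustrate the shape of
a size-aware interface. Everything named demo_* is hypothetical; this is
not the kernel's kho_add_subtree() and it makes no claim about how the
real debugfs code would be changed:

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_SUBTREES 8	/* arbitrary limit for the sketch */

/* One registered blob: the format is opaque, only name and size are known. */
struct demo_subtree {
	const char *name;
	const void *blob;
	size_t size;
};

static struct demo_subtree demo_subtrees[DEMO_MAX_SUBTREES];
static size_t demo_nr_subtrees;

/* Hypothetical size-aware registration: the caller states the blob size. */
static int demo_add_subtree(const char *name, const void *blob, size_t size)
{
	if (!name || !blob || !size)
		return -EINVAL;
	if (demo_nr_subtrees == DEMO_MAX_SUBTREES)
		return -ENOSPC;

	demo_subtrees[demo_nr_subtrees].name = name;
	demo_subtrees[demo_nr_subtrees].blob = blob;
	demo_subtrees[demo_nr_subtrees].size = size;
	demo_nr_subtrees++;
	return 0;
}

/*
 * What a debugfs-like consumer would need: the stored size, looked up by
 * name, without ever parsing the blob's contents. Returns 0 if not found.
 */
static size_t demo_subtree_size(const char *name)
{
	for (size_t i = 0; i < demo_nr_subtrees; i++)
		if (!strcmp(demo_subtrees[i].name, name))
			return demo_subtrees[i].size;
	return 0;
}
```

An FDT producer would pass fdt_totalsize(fdt) as the size, while a plain
C struct producer would pass sizeof(*metadata), so the consumer never
has to guess the format.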

>
> Link: https://lore.kernel.org/all/20260108-kho-v3-0-b1d6b7a89342@debian.org/ [1]
>
> --breno

-- 
Regards,
Pratyush Yadav
Re: [PATCH v4] kho: kexec-metadata: track previous kernel chain
Posted by Breno Leitao 1 week, 4 days ago
On Mon, Jan 26, 2026 at 02:28:30PM +0100, Pratyush Yadav wrote:
> > On Wed, Jan 21, 2026 at 06:50:38AM -0800, Breno Leitao wrote:
> >> +static __init int kho_populate_kexec_metadata(void)
> >> +{
> >> +	struct kho_kexec_metadata *metadata;
> >> +	int err;
> >> +
> >> +	metadata = kho_alloc_preserve(sizeof(*metadata));
> >> +	if (IS_ERR(metadata))
> >> +		return PTR_ERR(metadata);
> >> +
> >> +	strscpy(metadata->previous_release, init_uts_ns.name.release,
> >> +		sizeof(metadata->previous_release));
> >> +	/* kho_in.kexec_count is set to 0 on cold boot */
> >> +	metadata->kexec_count = kho_in.kexec_count + 1;
> >> +
> >> +	err = kho_add_subtree(KHO_METADATA_NODE_NAME, metadata);
> >
> > There is a hidden bug in here when CONFIG_KEXEC_HANDOVER_DEBUGFS is set.
> 
> Good catch!
> 
> >
> > kho_add_subtree() expects a fdt as the second argument, and we are
> > passing a pure C struct. That works fine, except for debugfs, which
> > does:
> >
> >  1. kho_add_subtree() calls kho_debugfs_fdt_add()
> >  2. kho_debugfs_fdt_add() calls __kho_debugfs_fdt_add()
> >  3. __kho_debugfs_fdt_add() executes fdt_totalsize(fdt)
> >
> > The fdt_totalsize() macro reads bytes 4-7 of the input as a big-endian u32, and
> > this will hit struct kho_kexec_metadata, given I am passing a C struct instead
> > of a FDT.
> >
> >   struct kho_kexec_metadata {
> >       char previous_release[__NEW_UTS_LEN + 1];  // 65 bytes
> >       u32 kexec_count;
> >   } __packed;
> >
> > Bytes 4-7 would be characters from previous_release (e.g., ".0-r" from
> > "6.19.0-rc4..."). Interpreted as big-endian u32, this gives a garbage size
> > value.
> >
> > The alternatives I see here are:
> >
> >  1) Come back to FDT instead of plain C struct, similarly to the previous
> >     version [1]
> >  2) Created some helpers to treat C struct fields specially just for this
> >     feature, and we can do it later if we have more users.
> >  3) Move this kexec_metadata to work on top of LUO (similarly to memfd), but
> >     that would be an unnecessary dependency just to have this kexec_metadata.
> >
> > That said, for the next version, I am coming back to to FDT.
> 
> Please, no. Don't go back to it just for the sake of this bug.
> 
> I think KHO's assumption that the subtree will always point to an FDT is
> broken, and we should fix that. I think KHO should expose the blob of
> serialized data and let userspace figure out what the format is and how
> to decode it.
> 
> To do that, we would need to update kho_add_subtree() to take a size
> parameter from callers, and pass that down to debugfs code. I count 3
> callers of kho_add_subtree() - memblock, LUO, and test_kho. I think all
> 3 should be fairly easy to update, but I am happy to help out if you
> need.

Sure, let me hack and see what I can get here.

Thanks for the direction,
--breno
Re: [PATCH v4] kho: kexec-metadata: track previous kernel chain
Posted by Pratyush Yadav 1 week, 4 days ago
On Mon, Jan 26 2026, Pratyush Yadav wrote:

> Hi Breno,
>
> On Mon, Jan 26 2026, Breno Leitao wrote:
>
>> On Wed, Jan 21, 2026 at 06:50:38AM -0800, Breno Leitao wrote:
>>> +static __init int kho_populate_kexec_metadata(void)
>>> +{
>>> +	struct kho_kexec_metadata *metadata;
>>> +	int err;
>>> +
>>> +	metadata = kho_alloc_preserve(sizeof(*metadata));
>>> +	if (IS_ERR(metadata))
>>> +		return PTR_ERR(metadata);
>>> +
>>> +	strscpy(metadata->previous_release, init_uts_ns.name.release,
>>> +		sizeof(metadata->previous_release));
>>> +	/* kho_in.kexec_count is set to 0 on cold boot */
>>> +	metadata->kexec_count = kho_in.kexec_count + 1;
>>> +
>>> +	err = kho_add_subtree(KHO_METADATA_NODE_NAME, metadata);
>>
>> There is a hidden bug in here when CONFIG_KEXEC_HANDOVER_DEBUGFS is set.
>
> Good catch!
>
>>
>> kho_add_subtree() expects a fdt as the second argument, and we are
>> passing a pure C struct. That works fine, except for debugfs, which
>> does:
>>
>>  1. kho_add_subtree() calls kho_debugfs_fdt_add()
>>  2. kho_debugfs_fdt_add() calls __kho_debugfs_fdt_add()
>>  3. __kho_debugfs_fdt_add() executes fdt_totalsize(fdt)
>>
>> The fdt_totalsize() macro reads bytes 4-7 of the input as a big-endian u32, and
>> this will hit struct kho_kexec_metadata, given I am passing a C struct instead
>> of a FDT.
>>
>>   struct kho_kexec_metadata {
>>       char previous_release[__NEW_UTS_LEN + 1];  // 65 bytes
>>       u32 kexec_count;
>>   } __packed;
>>
>> Bytes 4-7 would be characters from previous_release (e.g., ".0-r" from
>> "6.19.0-rc4..."). Interpreted as big-endian u32, this gives a garbage size
>> value.
>>
>> The alternatives I see here are:
>>
>>  1) Come back to FDT instead of plain C struct, similarly to the previous
>>     version [1]
>>  2) Created some helpers to treat C struct fields specially just for this
>>     feature, and we can do it later if we have more users.
>>  3) Move this kexec_metadata to work on top of LUO (similarly to memfd), but
>>     that would be an unnecessary dependency just to have this kexec_metadata.
>>
>> That said, for the next version, I am coming back to to FDT.
>
> Please, no. Don't go back to it just for the sake of this bug.
>
> I think KHO's assumption that the subtree will always point to an FDT is
> broken, and we should fix that. I think KHO should expose the blob of
> serialized data and let userspace figure out what the format is and how
> to decode it.
>
> To do that, we would need to update kho_add_subtree() to take a size
> parameter from callers, and pass that down to debugfs code. I count 3
> callers of kho_add_subtree() - memblock, LUO, and test_kho. I think all
> 3 should be fairly easy to update, but I am happy to help out if you
> need.

As an example, I'd do something along the lines of:

diff --git a/mm/memblock.c b/mm/memblock.c
index 905d06b16348..c06d6b9e390b 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -2512,7 +2512,7 @@ static int __init prepare_kho_fdt(void)
 	if (err)
 		goto err_unpreserve_fdt;
 
-	err = kho_add_subtree(MEMBLOCK_KHO_FDT, fdt);
+	err = kho_add_subtree(MEMBLOCK_KHO_FDT, fdt, fdt_totalsize(fdt));
 	if (err)
 		goto err_unpreserve_fdt;

-- 
Regards,
Pratyush Yadav
Re: [PATCH v4] kho: kexec-metadata: track previous kernel chain
Posted by Mike Rapoport 2 weeks, 1 day ago
Hi Breno,

On Wed, Jan 21, 2026 at 06:50:38AM -0800, Breno Leitao wrote:
> Use Kexec Handover (KHO) to pass the previous kernel's version string
> and the number of kexec reboots since the last cold boot to the next
> kernel, and print it at boot time.
> 
> Example output:
>     [    0.000000] KHO: exec from: 6.19.0-rc4-next-20260107 (count 1)
> 
> Motivation
> ==========
> 
> Bugs that only reproduce when kexecing from specific kernel versions
> are difficult to diagnose. These issues occur when a buggy kernel
> kexecs into a new kernel, with the bug manifesting only in the second
> kernel.
> 
> Recent examples include the following commits:
> 
>  * eb2266312507 ("x86/boot: Fix page table access in 5-level to 4-level paging transition")
>  * 77d48d39e991 ("efistub/tpm: Use ACPI reclaim memory for event log to avoid corruption")
>  * 64b45dd46e15 ("x86/efi: skip memattr table on kexec boot")
> 
> As kexec-based reboots become more common, these version-dependent bugs
> are appearing more frequently. At scale, correlating crashes to the
> previous kernel version is challenging, especially when issues only
> occur in specific transition scenarios.
> 
> Implementation
> ==============
> 
> The kexec metadata is stored as a plain C struct (struct kho_kexec_metadata)
> rather than FDT format, for simplicity and direct field access. It is
> registered via kho_add_subtree() as a separate subtree, keeping it
> independent from the core KHO ABI. This design choice:
> 
>  - Keeps the core KHO ABI minimal and stable
>  - Allows the metadata format to evolve independently
>  - Avoids requiring version bumps for all KHO consumers (LUO, etc.)
>    when the metadata format changes
> 
> The struct kho_kexec_metadata contains two fields:
>  - previous_release: The kernel version that initiated the kexec
>  - kexec_count: Number of kexec boots since last cold boot
> 
> On cold boot, kexec_count starts at 0 and increments with each kexec.
> The count helps identify issues that only manifest after multiple
> consecutive kexec reboots.
> 
> Signed-off-by: Breno Leitao <leitao@debian.org>
> Acked-by: SeongJae Park <sj@kernel.org>
> ---
> Changes in v4:
> - Squashed everything in a single commit
> - Moved from FDT to C structs (Pratyush)
> - Usage of subtress intead of FDT directly (Pratyush)
> - Renamed a bunch of variables and functions.
> - Link to v3: https://patch.msgid.link/20260108-kho-v3-0-b1d6b7a89342@debian.org
> 
> Changes in v3:
> - Remove the extra CONFIG for this feature.
> - Reworded some identifiers, properties and printks.
> - Better documented the questions raised during v2.
> - Link to v2: https://patch.msgid.link/20260102-kho-v2-0-1747b1a3a1d6@debian.org
> 
> Changes from v2 to v1 (RFC)
> - Track the number of kexecs since cold boot (Pasha)
> - Change the printk() order compared to KHO
> - Rewording of the commit summary
> - Link to RFC: https://patch.msgid.link/20251230-kho-v1-1-4d795a24da9e@debian.org
> ---
>  include/linux/kho/abi/kexec_handover.h | 29 +++++++++++++++
>  kernel/liveupdate/kexec_handover.c     | 65 ++++++++++++++++++++++++++++++++++
>  2 files changed, 94 insertions(+)
> 
> diff --git a/include/linux/kho/abi/kexec_handover.h b/include/linux/kho/abi/kexec_handover.h
> index 285eda8a36e45..e18022a4e664d 100644
> --- a/include/linux/kho/abi/kexec_handover.h
> +++ b/include/linux/kho/abi/kexec_handover.h
> @@ -11,6 +11,7 @@
>  #define _LINUX_KHO_ABI_KEXEC_HANDOVER_H
>  
>  #include <linux/types.h>
> +#include <linux/utsname.h>
>  
>  /**
>   * DOC: Kexec Handover ABI
> @@ -84,6 +85,34 @@
>  /* The FDT property for sub-FDTs. */
>  #define KHO_FDT_SUB_TREE_PROP_NAME "fdt"
>  
> +/**
> + * DOC: Kexec Metadata ABI
> + *

It would be nice to link it from Documentation/ as well ;-)

> + * The "kexec-metadata" subtree stores optional metadata about the kexec chain.
> + * It is registered via kho_add_subtree(), keeping it independent from the core
> + * KHO ABI. This allows the metadata format to evolve without affecting other
> + * KHO consumers.
> + *
> + * The metadata is stored as a plain C struct rather than FDT format for
> + * simplicity and direct field access.
> + */
> +
> +/**
> + * struct kho_kexec_metadata - Kexec metadata passed between kernels
> + * @previous_release: Kernel version string that initiated the kexec
> + * @kexec_count: Number of kexec boots since last cold boot
> + *
> + * This structure is preserved across kexec and allows the new kernel to
> + * identify which kernel it was booted from and how many kexec reboots
> + * have occurred.
> + */
> +struct kho_kexec_metadata {
> +	char previous_release[__NEW_UTS_LEN + 1];
> +	u32 kexec_count;
> +} __packed;
> +
> +#define KHO_METADATA_NODE_NAME "kexec-metadata"
> +
>  /**
>   * DOC: Kexec Handover ABI for vmalloc Preservation
>   *
> diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c

...

>  static __init int kho_init(void)
>  {
>  	const void *fdt = kho_get_fdt();
> @@ -1357,6 +1413,15 @@ static __init int kho_init(void)
>  	if (err)
>  		goto err_free_fdt;
>  
> +	if (fdt)
> +		kho_process_kexec_metadata();

Can't we move it into the existing if (fdt) below?
 
> +
> +	/* Populate kexec metadata for the possible next kexec */
> +	err = kho_populate_kexec_metadata();
> +	if (err)
> +		pr_warn("failed to initialize kexec-metadata subtree: %d\n",
> +			err);

Please follow the if (err) goto err_ pattern.

kho_populate_kexec_metadata() failure essentially means that we failed to
allocate memory. This shouldn't happen that early in boot, but if it did,
then something is utterly wrong.

> +
>  	if (fdt) {
>  		kho_in_debugfs_init(&kho_in.dbg, fdt);
>  		return 0;

-- 
Sincerely yours,
Mike.
Re: [PATCH v4] kho: kexec-metadata: track previous kernel chain
Posted by Breno Leitao 2 weeks, 1 day ago
Hello Mike,

On Thu, Jan 22, 2026 at 12:57:50PM +0200, Mike Rapoport wrote:
> > +/**
> > + * DOC: Kexec Metadata ABI
> > + *
> 
> It would be nice to link it from Documentation/ as well ;-)

Ack! I am planning something like:

	commit 90e098ca0d611b44594f08e50ba1cff3c932dd2b
	Author: Breno Leitao <leitao@debian.org>
	Date:   Thu Jan 22 03:47:23 2026 -0800

	kho: document kexec-metadata tracking feature
	
	Add documentation for the kexec-metadata feature that tracks the
	previous kernel version and kexec boot count across kexec reboots.
	This helps diagnose bugs that only reproduce when kexecing from
	specific kernel versions.
	
	Suggested-by: Mike Rapoport <rppt@kernel.org>
	Signed-off-by: Breno Leitao <leitao@debian.org>

	diff --git a/Documentation/admin-guide/mm/kho.rst b/Documentation/admin-guide/mm/kho.rst
	index 6dc18ed4b8861..1faf2c3ba4620 100644
	--- a/Documentation/admin-guide/mm/kho.rst
	+++ b/Documentation/admin-guide/mm/kho.rst
	@@ -113,3 +113,42 @@ stabilized.
	``/sys/kernel/debug/kho/in/sub_fdts/``
	Similar to ``kho/out/sub_fdts/``, but contains sub FDT blobs
	of KHO producers passed from the old kernel.
	+
	+Kexec Metadata
	+==============
	+
	+KHO automatically tracks metadata about the kexec chain, passing information
	+about the previous kernel to the next kernel. This feature helps diagnose
	+bugs that only reproduce when kexecing from specific kernel versions.
	+
	+On each KHO kexec, the kernel logs the previous kernel's version and the
	+number of kexec reboots since the last cold boot::
	+
	+    [    0.000000] KHO: exec from: 6.19.0-rc4-next-20260107 (count 1)
	+
	+The metadata includes:
	+
	+``previous_release``
	+    The kernel version string (from ``uname -r``) of the kernel that
	+    initiated the kexec.
	+
	+``kexec_count``
	+    The number of kexec boots since the last cold boot. On cold boot,
	+    this counter starts at 0 and increments with each kexec. This helps
	+    identify issues that only manifest after multiple consecutive kexec
	+    reboots.
	+
	+Use Cases
	+---------
	+
	+This metadata is particularly useful for debugging kexec transition bugs,
	+where a buggy kernel kexecs into a new kernel and the bug manifests only
	+in the second kernel. Examples of such bugs include:
	+
	+- Memory corruption from the previous kernel affecting the new kernel
	+- Incorrect hardware state left by the previous kernel
	+- Firmware/ACPI state issues that only appear in kexec scenarios
	+
	+At scale, correlating crashes to the previous kernel version enables
	+faster root cause analysis when issues only occur in specific kernel
	+transition scenarios.


> > diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c
> 
> ...
> 
> >  static __init int kho_init(void)
> >  {
> >  	const void *fdt = kho_get_fdt();
> > @@ -1357,6 +1413,15 @@ static __init int kho_init(void)
> >  	if (err)
> >  		goto err_free_fdt;
> >  
> > +	if (fdt)
> > +		kho_process_kexec_metadata();
> 
> Can't we move it into the existing if (fdt) below?

Unfortunately, that won't work due to a data dependency between the two
functions.

kho_process_kexec_metadata() reads from the FDT subtree and populates kho_in,
basically:

	kho_in.kexec_count = metadata->kexec_count;

While kho_populate_kexec_metadata() increments metadata->kexec_count:

	/* kho_in.kexec_count is set to 0 on cold boot */
	metadata->kexec_count = kho_in.kexec_count + 1;

If kho_process_kexec_metadata() is moved after kho_populate_kexec_metadata(),
the count would always increment from 0 to 1, ignoring whatever was passed in
the FDT.
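
To make the ordering issue concrete, here is a small userspace model of the
two steps. It is only a sketch: the struct and function names below are
hypothetical stand-ins, not the kernel code, and only the counter is modelled.

```c
#include <stdio.h>

/* Hypothetical stand-ins for the kernel structures; only the counter is modelled. */
struct model_metadata {
	unsigned int kexec_count;
};

struct model_kho_in {
	unsigned int kexec_count;	/* stays 0 unless metadata was handed over */
};

/* Models the "process" step: copy the incoming count into kho_in. */
static void model_process(struct model_kho_in *in,
			  const struct model_metadata *incoming)
{
	in->kexec_count = incoming->kexec_count;
}

/* Models the "populate" step: the outgoing count is kho_in's count plus one. */
static void model_populate(const struct model_kho_in *in,
			   struct model_metadata *outgoing)
{
	outgoing->kexec_count = in->kexec_count + 1;
}

/* Returns the count that would be handed to the next kernel. */
static unsigned int next_count(unsigned int incoming_count, int process_first)
{
	struct model_metadata incoming = { .kexec_count = incoming_count };
	struct model_metadata outgoing = { 0 };
	struct model_kho_in in = { 0 };

	if (process_first) {
		model_process(&in, &incoming);
		model_populate(&in, &outgoing);
	} else {
		/* Wrong order: populate still sees the cold-boot value of 0. */
		model_populate(&in, &outgoing);
		model_process(&in, &incoming);
	}

	return outgoing.kexec_count;
}
```

With an incoming count of 3, next_count(3, 1) gives 4, while next_count(3, 0)
gives 1, which is the "always 0 to 1" behaviour described above.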

Restructuring to call kho_in_debugfs_init() earlier also doesn't work:


	if (fdt) {
		kho_in_debugfs_init(&kho_in.dbg, fdt);
		kho_process_kexec_metadata();
		return 0;
	}

	/* Populate kexec metadata for the possible next kexec */
	err = kho_populate_kexec_metadata();
	if (err)
		pr_warn("failed to initialize kexec-metadata subtree: %d\n",
			err);

This would return early without populating the kexec metadata for the next
kexec, breaking the chain on KHO boots.

Please let me know if I am missing any other option.

> > +
> > +	/* Populate kexec metadata for the possible next kexec */
> > +	err = kho_populate_kexec_metadata();
> > +	if (err)
> > +		pr_warn("failed to initialize kexec-metadata subtree: %d\n",
> > +			err);
> 
> Please follow if (err) goto err_ pattern.
> 
> kho_populate_kexec_metadata() failure essentially means that we failed to
> allocate memory. This shouldn't happen that early in boot, but if it did,
> then something is utterly wrong.

Ack!

Thanks for the review,
--breno
Re: [PATCH v4] kho: kexec-metadata: track previous kernel chain
Posted by Mike Rapoport 1 week, 5 days ago
On Thu, Jan 22, 2026 at 04:04:55AM -0800, Breno Leitao wrote:
> Hello Mike,
> 
> On Thu, Jan 22, 2026 at 12:57:50PM +0200, Mike Rapoport wrote:
> > > +/**
> > > + * DOC: Kexec Metadata ABI
> > > + *
> > 
> > It would be nice to link it from Documentation/ as well ;-)
> 
> Ack! I am planning something like:
> 
> 	commit 90e098ca0d611b44594f08e50ba1cff3c932dd2b
> 	Author: Breno Leitao <leitao@debian.org>
> 	Date:   Thu Jan 22 03:47:23 2026 -0800
> 
> 	kho: document kexec-metadata tracking feature
> 	
> 	Add documentation for the kexec-metadata feature that tracks the
> 	previous kernel version and kexec boot count across kexec reboots.
> 	This helps diagnose bugs that only reproduce when kexecing from
> 	specific kernel versions.
> 	
> 	Suggested-by: Mike Rapoport <rppt@kernel.org>
> 	Signed-off-by: Breno Leitao <leitao@debian.org>
> 
> 	diff --git a/Documentation/admin-guide/mm/kho.rst b/Documentation/admin-guide/mm/kho.rst
> 	index 6dc18ed4b8861..1faf2c3ba4620 100644
> 	--- a/Documentation/admin-guide/mm/kho.rst
> 	+++ b/Documentation/admin-guide/mm/kho.rst
> 	@@ -113,3 +113,42 @@ stabilized.
> 	``/sys/kernel/debug/kho/in/sub_fdts/``
> 	Similar to ``kho/out/sub_fdts/``, but contains sub FDT blobs
> 	of KHO producers passed from the old kernel.
> 	+
> 	+Kexec Metadata
> 	+==============

I'd move this section before "debugfs Interfaces", other than that LGTM.

> 	+
> 	+KHO automatically tracks metadata about the kexec chain, passing information
> 	+about the previous kernel to the next kernel. This feature helps diagnose
> 	+bugs that only reproduce when kexecing from specific kernel versions.

...

> > >  static __init int kho_init(void)
> > >  {
> > >  	const void *fdt = kho_get_fdt();
> > > @@ -1357,6 +1413,15 @@ static __init int kho_init(void)
> > >  	if (err)
> > >  		goto err_free_fdt;
> > >  
> > > +	if (fdt)
> > > +		kho_process_kexec_metadata();
> > 
> > Can't we move it into the existing if (fdt) below?
> 
> Unfortunately, that won't work due to a data dependency between the two
> functions.
> 
> kho_process_kexec_metadata() reads from the FDT subtree and populates kho_in:
> 
> Basically:
> 
> 	kho_in.kexec_count = metadata->kexec_count;
> 
> While kho_populate_kexec_metadata() increments metadata->kexec_count:
> 
>           /* kho_in.kexec_count is set to 0 on cold boot */
>           metadata->kexec_count = kho_in.kexec_count + 1;
> 
> If kho_process_kexec_metadata() is moved after kho_populate_kexec_metadata(),
> the count would always increment from 0 to 1, ignoring whatever was passed in
> the FDT.
> 
> Restructuring to call kho_in_debugfs_init() earlier also doesn't work:
> 
> 
> 	if (fdt) {
> 		kho_in_debugfs_init(&kho_in.dbg, fdt);
> 		kho_process_kexec_metadata();
> 		return 0;
> 	}
> 
> 	/* Populate kexec metadata for the possible next kexec */
> 	err = kho_populate_kexec_metadata();
> 	if (err)
>                   pr_warn("failed to initialize kexec-metadata subtree: %d\n",
>                           err);
> 
> This would return early without populating the kexec metadata for the next
> kexec, breaking the chain on KHO boots.

How about we rename kho_process_kexec_metadata() to
kho_retrieve_kexec_metadata() and add a new kho_process_kexec_metadata() that
will first call _retrieve and then _populate? Something like

static int __init kho_process_kexec_metadata(const void *fdt)
{
	int err;

	if (fdt)
		kho_retrieve_kexec_metadata();

	/* Populate kexec metadata for the possible next kexec */
	err = kho_populate_kexec_metadata();
	if (err)
		pr_warn("failed to initialize kexec-metadata subtree: %d\n",
			err);

	return err;
}
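
For illustration, the effect of a retrieve-then-populate helper across a
chain of boots can be modelled in userspace roughly as follows. This is a
sketch under assumptions: the names, the 64-byte release buffer and the
explicit in/out parameters are all hypothetical, and a NULL incoming pointer
stands in for "no FDT", i.e. a cold boot.

```c
#include <stddef.h>
#include <stdio.h>

#define MODEL_RELEASE_LEN 64	/* assumed size, for illustration only */

/* Hypothetical model of the handed-over metadata. */
struct model_metadata {
	char previous_release[MODEL_RELEASE_LEN];
	unsigned int kexec_count;
};

/* Hypothetical model of the per-boot incoming state (kho_in). */
struct model_boot_state {
	unsigned int kexec_count;
};

static void model_retrieve(struct model_boot_state *st,
			   const struct model_metadata *incoming)
{
	st->kexec_count = incoming->kexec_count;
}

static void model_populate(const struct model_boot_state *st,
			   const char *release,
			   struct model_metadata *outgoing)
{
	snprintf(outgoing->previous_release,
		 sizeof(outgoing->previous_release), "%s", release);
	outgoing->kexec_count = st->kexec_count + 1;
}

/*
 * Models one boot: retrieve first (only if something was handed over),
 * then populate for the possible next kexec. Returns the count this
 * boot saw, which is 0 on a cold boot.
 */
static unsigned int model_metadata_init(const struct model_metadata *incoming,
					const char *release,
					struct model_metadata *outgoing)
{
	struct model_boot_state st = { 0 };

	if (incoming)
		model_retrieve(&st, incoming);

	model_populate(&st, release, outgoing);

	return st.kexec_count;
}
```

Feeding each boot's outgoing struct into the next call as the incoming one
keeps the chain intact: the cold boot sees 0 and hands over 1, the first
kexec'd kernel sees 1 and hands over 2, and so on.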

> --breno

-- 
Sincerely yours,
Mike.
Re: [PATCH v4] kho: kexec-metadata: track previous kernel chain
Posted by Breno Leitao 1 week, 4 days ago
On Sun, Jan 25, 2026 at 01:32:27PM +0200, Mike Rapoport wrote:
> On Thu, Jan 22, 2026 at 04:04:55AM -0800, Breno Leitao wrote:
> > Ack! I am planning something like:
> > 
> > 	commit 90e098ca0d611b44594f08e50ba1cff3c932dd2b
> > 	Author: Breno Leitao <leitao@debian.org>
> > 	Date:   Thu Jan 22 03:47:23 2026 -0800
> > 
> > 	kho: document kexec-metadata tracking feature

...

> I'd move this section before "debugfs Interfaces", other than that LGTM.

Ack!

> > > Can't we move it into the existing if (fdt) below?
> > 
> > Unfortunately, that won't work due to a data dependency between the two
> > functions.
> > 
> > kho_process_kexec_metadata() reads from the FDT subtree and populates kho_in:
> > 
> > Basically:
> > 
> > 	kho_in.kexec_count = metadata->kexec_count;
> > 
> > While kho_populate_kexec_metadata() increments metadata->kexec_count:
> > 
> >           /* kho_in.kexec_count is set to 0 on cold boot */
> >           metadata->kexec_count = kho_in.kexec_count + 1;
> > 
> > If kho_process_kexec_metadata() is moved after kho_populate_kexec_metadata(),
> > the count would always increment from 0 to 1, ignoring whatever was passed in
> > the FDT.
> > 
> > Restructuring to call kho_in_debugfs_init() earlier also doesn't work:
> > 
> > 
> > 	if (fdt) {
> > 		kho_in_debugfs_init(&kho_in.dbg, fdt);
> > 		kho_process_kexec_metadata();
> > 		return 0;
> > 	}
> > 
> > 	/* Populate kexec metadata for the possible next kexec */
> > 	err = kho_populate_kexec_metadata();
> > 	if (err)
> >                   pr_warn("failed to initialize kexec-metadata subtree: %d\n",
> >                           err);
> > 
> > This would return early without populating the kexec metadata for the next
> > kexec, breaking the chain on KHO boots.
> 
> How about we rename kho_process_kexec_metadata() to
> kho_retrieve_kexec_metadata() and add a new kho_process_kexec_metadata() that
> will first call _retrieve and then _populate? Something like
> 
> static int __init kho_process_kexec_metadata(const void *fdt)
> {
> 	int err;
> 
> 	if (fdt)
> 		kho_retrieve_kexec_metadata();
> 
> 	/* Populate kexec metadata for the possible next kexec */
> 	err = kho_populate_kexec_metadata();
> 	if (err)
> 		pr_warn("failed to initialize kexec-metadata subtree: %d\n",
> 			err);
> 
> 	return err;
> }

That would work. I will probably rename it to kho_kexec_metadata_init()
or kho_kexec_metadata_setup() to follow the naming pattern already used there.

Thanks for the review, I will send an updated version soon,
--breno