vmcore_info: expose hardware error recovery statistics via sysfs

[PATCH] vmcore_info: expose hardware error recovery statistics via sysfs

Posted by Breno Leitao 1 week, 2 days ago

Add a sysfs file at /sys/kernel/vmcore_stats and expose hardware error
recovery statistics that are already tracked by the kernel. This allows
userspace monitoring tools to track recovered hardware errors without
requiring kernel crashes.

This is useful to track recoverable hardware errors in a time series,
even if the host doesn't crash.

Create a generic vmcore_stats sysfs, and add a section for
hwerr_recovery that shows the counts per subsystem and timestamps:

  - cpu: CPU-related errors (MCE, ARM processor errors)
  - memory: Memory-related errors
  - pci: PCI/PCIe AER non-fatal errors
  - cxl: CXL errors
  - other: Other hardware errors

Example output:
  hwerr_recovery:
    cpu: 0 (0)
    memory: 2 (1738148257)
    pci: 1 (1738147000)
    cxl: 0 (0)
    other: 0 (0)

The value in parentheses is the timestamp (seconds since epoch) of the
last error of that type, or 0 if no errors have occurred.

These statistics provide visibility into the health of the system's
hardware and can be used by system administrators to proactively detect
failing components before they cause system crashes.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
To: akpm@linux-foundation.org
Cc: kexec@lists.infradead.org
To: bhe@redhat.com
Cc: linux-kernel@vger.kernel.org
Cc: dyoung@redhat.com
Cc: tony.luck@intel.com
Cc: xueshuai@linux.alibaba.com
Cc: vgoyal@redhat.com
Cc: zhiquan1.li@intel.com
Cc: olja@meta.com
---
 .../ABI/testing/sysfs-kernel-vmcore_stats          | 23 ++++++++++++++++
 kernel/vmcore_info.c                               | 31 ++++++++++++++++++++++
 2 files changed, 54 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-kernel-vmcore_stats b/Documentation/ABI/testing/sysfs-kernel-vmcore_stats
new file mode 100644
index 0000000000000..b42f18d24c00b
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-kernel-vmcore_stats
@@ -0,0 +1,23 @@
+What:		/sys/kernel/vmcore_stats
+Date:		January 2026
+KernelVersion:	6.20
+Contact:	Breno Leitao <leitao@debian.org>
+Description:
+		Shows statistics related to vmcore functionality. Currently
+		includes hardware error recovery statistics.
+
+		Format:
+		  Recovered hardware errors:
+		    metric: count (timestamp)
+
+		Statistics about recoverable hardware errors that the kernel
+		has handled since boot. Each metric shows the count and
+		timestamp (seconds since epoch) of the last error in
+		parentheses (0 if no errors have occurred).
+
+		Metrics:
+		    - cpu: CPU-related errors (MCE, ARM processor errors)
+		    - memory: Memory-related errors
+		    - pci: PCI/PCIe AER non-fatal errors
+		    - cxl: CXL (Compute Express Link) errors
+		    - other: Other hardware errors
diff --git a/kernel/vmcore_info.c b/kernel/vmcore_info.c
index fe9bf8db1922e..5974b4be08cbc 100644
--- a/kernel/vmcore_info.c
+++ b/kernel/vmcore_info.c
@@ -6,6 +6,8 @@
 
 #include <linux/buildid.h>
 #include <linux/init.h>
+#include <linux/kobject.h>
+#include <linux/sysfs.h>
 #include <linux/utsname.h>
 #include <linux/vmalloc.h>
 #include <linux/sizes.h>
@@ -135,6 +137,31 @@ void hwerr_log_error_type(enum hwerr_error_type src)
 }
 EXPORT_SYMBOL_GPL(hwerr_log_error_type);
 
+/* sysfs interface for hardware error recovery statistics */
+static ssize_t vmcore_stats_show(struct kobject *kobj,
+				 struct kobj_attribute *attr, char *buf)
+{
+	return sysfs_emit(buf,
+			  "Recovered hardware errors:\n"
+			  "  cpu: %d (%lld)\n"
+			  "  memory: %d (%lld)\n"
+			  "  pci: %d (%lld)\n"
+			  "  cxl: %d (%lld)\n"
+			  "  other: %d (%lld)\n",
+			  atomic_read(&hwerr_data[HWERR_RECOV_CPU].count),
+			  (long long)READ_ONCE(hwerr_data[HWERR_RECOV_CPU].timestamp),
+			  atomic_read(&hwerr_data[HWERR_RECOV_MEMORY].count),
+			  (long long)READ_ONCE(hwerr_data[HWERR_RECOV_MEMORY].timestamp),
+			  atomic_read(&hwerr_data[HWERR_RECOV_PCI].count),
+			  (long long)READ_ONCE(hwerr_data[HWERR_RECOV_PCI].timestamp),
+			  atomic_read(&hwerr_data[HWERR_RECOV_CXL].count),
+			  (long long)READ_ONCE(hwerr_data[HWERR_RECOV_CXL].timestamp),
+			  atomic_read(&hwerr_data[HWERR_RECOV_OTHERS].count),
+			  (long long)READ_ONCE(hwerr_data[HWERR_RECOV_OTHERS].timestamp));
+}
+
+static struct kobj_attribute vmcore_stats_attr = __ATTR_RO(vmcore_stats);
+
 static int __init crash_save_vmcoreinfo_init(void)
 {
 	vmcoreinfo_data = (unsigned char *)get_zeroed_page(GFP_KERNEL);
@@ -244,6 +271,10 @@ static int __init crash_save_vmcoreinfo_init(void)
 	arch_crash_save_vmcoreinfo();
 	update_vmcoreinfo_note();
 
+	/* Create /sys/kernel/vmcore_stats */
+	if (sysfs_create_file(kernel_kobj, &vmcore_stats_attr.attr))
+		pr_warn("Failed to create vmcore_stats sysfs file\n");
+
 	return 0;
 }
 

---
base-commit: 8dfce8991b95d8625d0a1d2896e42f93b9d7f68d
change-id: 20260129-vmcoreinfo_sysfs-ff4687979cd5

Best regards,
--  
Breno Leitao <leitao@debian.org>
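
A minimal userspace sketch of how a monitoring tool might consume the
multi-line format that vmcore_stats_show() emits in the patch above. The
struct hwerr_stat type and the parse_hwerr_line() helper are hypothetical
names introduced only for illustration; they are not part of the patch.

```c
#include <stdio.h>

/* Hypothetical container for one parsed metric line. */
struct hwerr_stat {
	char name[16];
	int count;
	long long last_seen;	/* seconds since epoch, 0 if never */
};

/*
 * Parse one "  name: count (timestamp)" line.
 * Returns 1 on success and 0 for lines that do not match, such as the
 * "Recovered hardware errors:" header. On failure *st may be partially
 * written and should be ignored.
 */
int parse_hwerr_line(const char *line, struct hwerr_stat *st)
{
	return sscanf(line, " %15[^:]: %d (%lld)",
		      st->name, &st->count, &st->last_seen) == 3;
}
```

A caller would read the file line by line (for example with fgets()),
skip the lines for which the helper returns 0, and export each count as a
time-series sample.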

Re: [PATCH] vmcore_info: expose hardware error recovery statistics via sysfs

Posted by Baoquan He 1 week, 2 days ago

On 01/29/26 at 05:34am, Breno Leitao wrote:
> Add a sysfs file at /sys/kernel/vmcore_stats and expose hardware error
> recovery statistics that are already tracked by the kernel. This allows
> userspace monitoring tools to track recovered hardware errors without
> requiring kernel crashes.

I don't understand. If it works w/o requiring kernel crashes, why do you
call it vmcore_stats? It's a normal showing of hardware error recovery
statistics tracked by the kernel, can we name it /sys/kernel/hwerr_stats?
It obviously has nothing to do with vmcore, does it?


Re: [PATCH] vmcore_info: expose hardware error recovery statistics via sysfs

Posted by Breno Leitao 1 week, 1 day ago

On Fri, Jan 30, 2026 at 09:59:58AM +0800, Baoquan He wrote:
> On 01/29/26 at 05:34am, Breno Leitao wrote:
> > Add a sysfs file at /sys/kernel/vmcore_stats and expose hardware error
> > recovery statistics that are already tracked by the kernel. This allows
> > userspace monitoring tools to track recovered hardware errors without
> > requiring kernel crashes.
>
> I don't understand. If it works w/o requiring kernel crashes, why do you
> call it vmcore_stats? It's a normal showing of hardware error recovery
> statistics tracked by the kernel, can we name it /sys/kernel/hwerr_stats?

Agreed, /sys/kernel/hwerr_stats is a much better name. Thank you for the
suggestion!

> It obviously has nothing to do with vmcore, does it?

You're correct. The only connection is that this functionality currently
resides in kernel/vmcore_info.c. I initially thought it would make sense
to create a generic sysfs entry for vmcore, but I can see now that this
caused more confusion than clarification.

I'll update the patch accordingly.

Thanks,
--breno

Re: [PATCH] vmcore_info: expose hardware error recovery statistics via sysfs

Posted by Andrew Morton 1 week, 2 days ago

On Thu, 29 Jan 2026 05:34:10 -0800 Breno Leitao <leitao@debian.org> wrote:

> Add a sysfs file at /sys/kernel/vmcore_stats and expose hardware error
> recovery statistics that are already tracked by the kernel. This allows
> userspace monitoring tools to track recovered hardware errors without
> requiring kernel crashes.
> 
> This is useful to track recoverable hardware errors in a time series,
> even if the host doesn't crash.
> 
> Create a generic vmcore_stats sysfs, and add a section for
> hwerr_recovery that shows the counts per subsystem and timestamps:
> 
>   - cpu: CPU-related errors (MCE, ARM processor errors)
>   - memory: Memory-related errors
>   - pci: PCI/PCIe AER non-fatal errors
>   - cxl: CXL errors
>   - other: Other hardware errors
> 
> Example output:
>   hwerr_recovery:
>     cpu: 0 (0)
>     memory: 2 (1738148257)
>     pci: 1 (1738147000)
>     cxl: 0 (0)
>     other: 0 (0)

sysfs rules (which are widely ignored) say "one value per file".

As a compromise the above could be squished into a single line.  Harder
for humans to read, but it sounds like that isn't the expected use case.

Or screw sysfs rules ;)

> The value in parentheses is the timestamp (seconds since epoch) of the
> last error of that type, or 0 if no errors have occurred.
> 
> These statistics provide visibility into the health of the system's
> hardware and can be used by system administrators to proactively detect
> failing components before they cause system crashes.
> 
> ...
>
> +/* sysfs interface for hardware error recovery statistics */
> +static ssize_t vmcore_stats_show(struct kobject *kobj,
> +				 struct kobj_attribute *attr, char *buf)
> +{
> +	return sysfs_emit(buf,
> +			  "Recovered hardware errors:\n"
> +			  "  cpu: %d (%lld)\n"
> +			  "  memory: %d (%lld)\n"
> +			  "  pci: %d (%lld)\n"
> +			  "  cxl: %d (%lld)\n"
> +			  "  other: %d (%lld)\n",
> +			  atomic_read(&hwerr_data[HWERR_RECOV_CPU].count),
> +			  (long long)READ_ONCE(hwerr_data[HWERR_RECOV_CPU].timestamp),
> +			  atomic_read(&hwerr_data[HWERR_RECOV_MEMORY].count),
> +			  (long long)READ_ONCE(hwerr_data[HWERR_RECOV_MEMORY].timestamp),

vsprintf has `%ptT' for time64_t.  Is it usable here?

> +			  atomic_read(&hwerr_data[HWERR_RECOV_PCI].count),
> +			  (long long)READ_ONCE(hwerr_data[HWERR_RECOV_PCI].timestamp),
> +			  atomic_read(&hwerr_data[HWERR_RECOV_CXL].count),
> +			  (long long)READ_ONCE(hwerr_data[HWERR_RECOV_CXL].timestamp),
> +			  atomic_read(&hwerr_data[HWERR_RECOV_OTHERS].count),
> +			  (long long)READ_ONCE(hwerr_data[HWERR_RECOV_OTHERS].timestamp));
> +}
> +
>
> ...
>
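
For reference, a userspace sketch of roughly what the `%ptT' suggestion
would change in the output. As far as I understand
Documentation/core-api/printk-formats.rst, `%ptT' takes a pointer to a
time64_t and by default prints YYYY-mm-ddTHH:MM:SS, so the kernel side
would presumably need to copy each timestamp into a local variable and
pass its address. The format_epoch_utc() helper below is a hypothetical
userspace stand-in for that rendering, not kernel code.

```c
#include <stddef.h>
#include <time.h>

/*
 * Render an epoch timestamp as YYYY-mm-ddTHH:MM:SS in UTC.
 * Returns the number of characters written, or 0 on failure.
 */
size_t format_epoch_utc(long long epoch, char *buf, size_t len)
{
	time_t t = (time_t)epoch;
	struct tm *tm = gmtime(&t);	/* not thread-safe; fine for a sketch */

	if (!tm)
		return 0;
	return strftime(buf, len, "%Y-%m-%dT%H:%M:%S", tm);
}
```

One consequence worth noting: with this rendering, the "no errors yet"
value of 0 would show up as 1970-01-01T00:00:00 rather than as an obvious
sentinel, and parsers that expect a plain integer would need to change.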

Re: [PATCH] vmcore_info: expose hardware error recovery statistics via sysfs

Posted by Breno Leitao 1 week, 1 day ago

Hello Andrew,

On Thu, Jan 29, 2026 at 02:28:01PM -0800, Andrew Morton wrote:
> On Thu, 29 Jan 2026 05:34:10 -0800 Breno Leitao <leitao@debian.org> wrote:
> 
> > Example output:
> >   hwerr_recovery:
> >     cpu: 0 (0)
> >     memory: 2 (1738148257)
> >     pci: 1 (1738147000)
> >     cxl: 0 (0)
> >     other: 0 (0)
>
> sysfs rules (which are widely ignored) say "one value per file".
>
> As a compromise the above could be squished into a single line.  Harder
> for humans to read, but it sounds like that isn't the expected use case.

I'm fine with consolidating into a single line, though it would mean
removing the timestamp of the last recovery event. Since the primary
use case is tracking event counts in a time series, this is acceptable.

My proposal:

  # cat /sys/kernel/hwerr_stats
  cpu:0 memory:2 pci:1 cxl:0 other:0

I suggest keeping the field names rather than dropping them entirely.
While it would make the output more compact, having explicit names
makes the format more maintainable when adding new recovery types in
the future.

Without names, it would look like:

  # cat /sys/kernel/hwerr_stats
  0 2 1 0 0

What would you say?

Thanks for reviewing it,
--breno
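
To illustrate the maintainability argument for keeping the field names,
here is a minimal userspace sketch that looks up one counter by name in
the proposed "cpu:0 memory:2 pci:1 cxl:0 other:0" line. The
hwerr_lookup() helper is a hypothetical name used only for this example.
Because the lookup is by name, it keeps working if new recovery types are
added or the order changes, which a purely positional "0 2 1 0 0" format
would not guarantee.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Find "key:<number>" among space-separated tokens in line.
 * Returns 0 and stores the number in *value on success, -1 otherwise.
 */
int hwerr_lookup(const char *line, const char *key, long *value)
{
	size_t klen = strlen(key);
	const char *p = line;

	while (*p) {
		/* Skip separators between tokens. */
		while (*p == ' ' || *p == '\n')
			p++;

		/* Match only at the start of a token, followed by ':'. */
		if (strncmp(p, key, klen) == 0 && p[klen] == ':') {
			char *end;
			long v = strtol(p + klen + 1, &end, 10);

			if (end == p + klen + 1)
				return -1;	/* no digits after ':' */
			*value = v;
			return 0;
		}

		/* Advance to the end of the current token. */
		while (*p && *p != ' ' && *p != '\n')
			p++;
	}
	return -1;
}
```

A collector would read the single line from the sysfs file once per
sampling interval and call the helper for each metric it knows about,
ignoring any names it does not recognize.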