[PATCH v5 00/10] Implement MPIPL for PowerNV

Aditya Gupta posted 10 patches 1 month ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20260310124619.3909045-1-adityag@linux.ibm.com
Maintainers: Nicholas Piggin <npiggin@gmail.com>, Chinmay Rath <rathc@linux.ibm.com>, Glenn Miles <milesg@linux.ibm.com>, Aditya Gupta <adityag@linux.ibm.com>, Hari Bathini <hbathini@linux.ibm.com>, Sourabh Jain <sourabhjain@linux.ibm.com>
[PATCH v5 00/10] Implement MPIPL for PowerNV
Posted by Aditya Gupta 1 month ago
Overview
=========

This series implements MPIPL (Memory Preserving IPL, aka fadump) on the
PowerNV machine in QEMU.

Fadump is an alternative dump mechanism to kdump, in which the firmware
does a memory preserving boot and the second/crash kernel is booted fresh,
like a normal system reset, instead of the crashed kernel loading the
second/crash kernel as in kdump.

MPIPL on PowerNV is similar to fadump on pSeries. The idea is the same,
preserving memory, but on PowerNV we are assisted by the SBE (Self Boot
Engine) and Hostboot, while on pSeries we are assisted by PHyp (Power
Hypervisor).

To implement this in baremetal/powernv QEMU, we need to export an
"ibm,opal/dump" node in the device tree, to tell the kernel that we
support MPIPL.

Once the kernel sees the support, and "fadump=on" is passed on the command
line, the kernel registers the memory regions to preserve with Skiboot.

The kernel sends this data using OPAL calls, after which skiboot/opal saves
the memory region details to the MDST and MDDT tables (S-source,
D-destination).

Then, in the event of a kernel crash, the kernel initiates MPIPL with another
OPAL call (opal_cec_reboot2), and this request goes to Skiboot.
Skiboot then triggers the "S0 Interrupt" to the SBE (Self Boot Engine),
along with OPAL's relocated base address.

The SBE then stops all core clocks, and only runs the particular ISteps
needed for a memory preserving boot.

Then hostboot comes up and, with the help of the relocated base address,
accesses the MDST and MDDT tables (S-source and D-destination) and preserves
the memory regions according to the data in these tables.
After preserving them, it writes the preserved memory region details to the
MDRT table (R-Result), so that the kernel knows where and whether each
memory region was preserved.
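
As a rough illustration of the preserve step described above, here is a
minimal Python sketch. It is not the code from this series: the `Region`
class, its field names, and `preserve_regions` are hypothetical, guest
memory is modelled as a plain bytearray, and the real table layout in
pnv_mpipl.h is likely to differ. It pairs each source entry with a
destination entry, copies the bytes, and records a result row, stopping at
the first region that cannot be copied.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Region:
    """One table entry; field names are illustrative, not the real layout."""
    addr: int       # start address of the region
    size: int       # length in bytes
    region_id: int  # identifier pairing a source with its destination


def preserve_regions(memory: bytearray, mdst: List[Region],
                     mddt: List[Region]) -> List[dict]:
    """Copy each source region to its paired destination; return result rows."""
    mdrt = []
    for src, dst in zip(mdst, mddt):
        length = min(src.size, dst.size)
        # Stop at the first region that falls outside memory, mirroring the
        # "stop copying chunks once copying a chunk fails" changelog item.
        if src.addr + length > len(memory) or dst.addr + length > len(memory):
            break
        memory[dst.addr:dst.addr + length] = memory[src.addr:src.addr + length]
        mdrt.append({"src": src.addr, "dst": dst.addr,
                     "size": length, "region_id": src.region_id})
    return mdrt
```

Only regions that were actually copied get a result row, so a reader of the
result list can tell which regions are usable.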

Both the SBE's and hostboot's responsibilities are implemented in the SBE
code in QEMU.

Then, in the second kernel/crash kernel boot, OPAL passes the "mpipl-boot"
property so that the kernel knows a dump is active, which the kernel then
exports as /proc/vmcore.
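
For illustration, the check for an active dump can be approximated from
userspace with the small Python sketch below. The helper name
`dump_is_active` is hypothetical, and it assumes the flattened device tree
is visible under /proc/device-tree with the "mpipl-boot" property sitting
in the "ibm,opal/dump" node mentioned above; the kernel itself reads the
device tree directly rather than going through this path.

```python
import os


def dump_is_active(dt_root: str = "/proc/device-tree") -> bool:
    """Return True if the "mpipl-boot" property exists in the dump node."""
    # Assumed location: <dt_root>/ibm,opal/dump/mpipl-boot
    prop = os.path.join(dt_root, "ibm,opal", "dump", "mpipl-boot")
    return os.path.exists(prop)
```

The root directory is a parameter so that the function can be exercised
against a temporary directory instead of a live system.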

Testing
====================

1. Git tree for testing: https://gitlab.com/adi-g15-ibm/qemu/tree/fadump-powernv-v5

2. Gitlab pipeline: https://gitlab.com/adi-g15-ibm/qemu/-/pipelines/2375470651

3. Analysing generated vmcore:

	# ls -lh /proc/vmcore
	-r--------    1 root     root        4.5G Mar 10 12:30 /proc/vmcore

	# file /proc/vmcore
	/proc/vmcore: ELF 64-bit LSB core file, 64-bit PowerPC or cisco 7500, OpenPOWER ELF V2 ABI, version 1 (SYSV), SVR4-style

	# crash vmlinux-38fec10eb60d-network vmcore-powernv-10mar26
	...
	      KERNEL: vmlinux-38fec10eb60d-network
	    DUMPFILE: vmcore-powernv-10mar26
	        CPUS: 2
	        DATE: Thu Jan  1 05:30:00 IST 1970
	      UPTIME: 00:00:50
	LOAD AVERAGE: 0.57, 0.19, 0.07
	       TASKS: 83
	    NODENAME: buildroot
	     RELEASE: 6.14.0
	     VERSION: #1 SMP Thu Apr  3 08:06:13 CDT 2025
	     MACHINE: ppc64le  (1000 Mhz)
	      MEMORY: 6 GB
	       PANIC: "Kernel panic - not syncing: sysrq triggered crash"
	         PID: 238
	     COMMAND: "sh"
	        TASK: c00000000a0f3200  [THREAD_INFO: c00000000a0f3200]
	         CPU: 0
	       STATE: TASK_RUNNING (PANIC)

	crash> # ps and kmem -i works

Changelog
====================

v4 -> v5:
* #4/10: set chunk_id=0 before copying
* #7/10: remove unnecessary bool check, ie. 'if (b1) b2=b1 else b2=!b1' => 'b2=b1'

v3 -> v4:
* #2/10: s/recieves/receives
* #7/10: remove empty line at EOF

v2 -> v3:
* rebase to upstream, changes in patches below
* #2/10: no code change. add comment that skiboot triggers S0
* #3/10: stash command: handle invalid skiboot_base sent by guest
* #4/10: s/src_len/data_len/
* #4/10: use TARGET_FMT_lx/PRIx64 instead of %lx to prevent build errors
* #4/10: stop copying chunks once copying a chunk fails
* #5/10: use address_space_{read,write} instead of cpu_physical_memory_{read,write}
* #5/10: add more SPRs to be saved, same set of SPRs as spapr FADump, except CR and FPSCR
* #7/10: only export "mpipl-boot" property if preserving cpu states and writing MDRT was successful, otherwise continue with normal reboot
* #7/10: use address_space_{read,write} instead of cpu_physical_memory_{read,write}
* #8/10: reword commit description to mention fw-load-area, no code change
* #10/10: add entry in MAINTAINERS file

Aditya Gupta (10):
  ppc/pnv: Move SBE host doorbell function to top of file
  ppc/mpipl: Implement S0 SBE interrupt
  ppc/pnv: Handle stash command in PowerNV SBE
  pnv/mpipl: Preserve memory regions as per MDST/MDDT tables
  pnv/mpipl: Preserve CPU registers after crash
  pnv/mpipl: Set thread entry size to be allocated by firmware
  pnv/mpipl: Write the preserved CPU and MDRT state
  pnv/mpipl: Enable MPIPL support
  tests/functional: Add test for MPIPL in PowerNV
  MAINTAINERS: Add entry for MPIPL (PowerNV)

 MAINTAINERS                           |   8 +
 hw/ppc/meson.build                    |   1 +
 hw/ppc/pnv.c                          |  98 ++++++
 hw/ppc/pnv_mpipl.c                    | 482 ++++++++++++++++++++++++++
 hw/ppc/pnv_sbe.c                      |  84 ++++-
 include/hw/ppc/pnv.h                  |   7 +
 include/hw/ppc/pnv_mpipl.h            | 168 +++++++++
 tests/functional/ppc64/test_fadump.py |  35 +-
 8 files changed, 852 insertions(+), 31 deletions(-)
 create mode 100644 hw/ppc/pnv_mpipl.c
 create mode 100644 include/hw/ppc/pnv_mpipl.h

--
2.53.0
Re: [PATCH v5 00/10] Implement MPIPL for PowerNV
Posted by Sourabh Jain 1 month ago
Thanks for adding fadump support on PowerNV platform.

The whole patch series looks good to me.
Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>


On 10/03/26 18:16, Aditya Gupta wrote:
> <...snip...>
Re: [PATCH v5 00/10] Implement MPIPL for PowerNV
Posted by Aditya Gupta 1 month ago
On 10/03/26 20:20, Sourabh Jain wrote:

> Thanks for adding fadump support on PowerNV platform.
>
> The whole patch series looks good to me.
> Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>

Thank you for the detailed reviews, Sourabh!


- Aditya G
Re: [PATCH v5 00/10] Implement MPIPL for PowerNV
Posted by Shivang Upadhyay 1 month ago
On Tue, Mar 10, 2026 at 06:16:07PM +0530, Aditya Gupta wrote:
> <...snip...>

The buildroot fadump test and `make check-functional-ppc64` both pass on v5.

	Welcome to Buildroot
	buildroot login: root
	# dmesg | grep fadump
	[    0.000000][    T0] opal fadump: Kernel metadata addr: 653902a8
	[    0.000000][    T0] fadump: Reserved 768MB of memory at 0x00000035390000 (System RAM: 5120MB)
	[    0.000000][    T0] fadump: Initialized [0x36000000, 752MB] cma area from [0x35390000, 768MB] bytes of memory reserved for firmware-assisted dump
	[    0.000000][    T0] Kernel command line: console=hvc0 rootwait root=/dev/nvme0n1 fadump=on
	[    0.473711][    T1] opal fadump: Registration is successful!
	# echo c > /proc/sysrq-trigger

	<snip />

	Welcome to Buildroot
	buildroot login: root
	# ls -alh /proc/vmcore
	-r--------    1 root     root        4.3G Mar 10 14:07 /proc/vmcore
	#


	> make check-functional-ppc64
	<snip />
	16/16 func-thorough+func-ppc64-thorough+thorough - qemu:func-ppc64-fadump                OK               65.42s   2 subtests passed

	Ok:                9
	Fail:              0
	Skipped:           7


Tested-by: Shivang Upadhyay <shivangu@linux.ibm.com>


~Shivang.
Re: [PATCH v5 00/10] Implement MPIPL for PowerNV
Posted by Aditya Gupta 1 month ago
On 10/03/26 19:42, Shivang Upadhyay wrote:
> On Tue, Mar 10, 2026 at 06:16:07PM +0530, Aditya Gupta wrote:
>> <...snip...>
> buildroot fadump test and `make check-functional-ppc64` are passing on V5.
>
> 	Welcome to Buildroot
> 	buildroot login: root
> 	# dmesg | grep fadump
> 	[    0.000000][    T0] opal fadump: Kernel metadata addr: 653902a8
> 	[    0.000000][    T0] fadump: Reserved 768MB of memory at 0x00000035390000 (System RAM: 5120MB)
> 	[    0.000000][    T0] fadump: Initialized [0x36000000, 752MB] cma area from [0x35390000, 768MB] bytes of memory reserved for firmware-assisted dump
> 	[    0.000000][    T0] Kernel command line: console=hvc0 rootwait root=/dev/nvme0n1 fadump=on
> 	[    0.473711][    T1] opal fadump: Registration is successful!
> 	# echo c > /proc/sysrq-trigger
>
> 	<snip />
>
> 	Welcome to Buildroot
> 	buildroot login: root
> 	# ls -alh /proc/vmcore
> 	-r--------    1 root     root        4.3G Mar 10 14:07 /proc/vmcore
> 	#
>
>
> 	> make check-functional-ppc64
> 	<snip />
> 	16/16 func-thorough+func-ppc64-thorough+thorough - qemu:func-ppc64-fadump                OK               65.42s   2 subtests passed
>
> 	Ok:                9
> 	Fail:              0
> 	Skipped:           7
>
>
> Tested-by: Shivang Upadhyay <shivangu@linux.ibm.com>

Thank you for testing this series, Shivang!


- Aditya G

>
> ~Shivang.