[PATCH v6 00/10] Implement MPIPL for PowerNV

Aditya Gupta posted 10 patches 2 weeks ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20260424083837.214947-1-adityag@linux.ibm.com
Maintainers: Nicholas Piggin <npiggin@gmail.com>, Chinmay Rath <rathc@linux.ibm.com>, Glenn Miles <milesg@linux.ibm.com>, Aditya Gupta <adityag@linux.ibm.com>, Hari Bathini <hbathini@linux.ibm.com>, Sourabh <sourabhjain@linux.ibm.com>
[PATCH v6 00/10] Implement MPIPL for PowerNV
Posted by Aditya Gupta 2 weeks ago
Overview
=========

This series implements MPIPL (Memory Preserving IPL, aka fadump) on the
PowerNV machine in QEMU.

Fadump is an alternative dump mechanism to kdump in which the firmware
does a memory-preserving boot and the second (crash) kernel is booted
fresh, like a normal system reset. With kdump, by contrast, the crashed
kernel itself loads the second (crash) kernel.

MPIPL on PowerNV is similar to fadump on pSeries. The idea is the same,
memory preservation, but on PowerNV we are assisted by the SBE (Self Boot
Engine) and Hostboot, while on pSeries we are assisted by PHyp (Power
Hypervisor).

To implement this in baremetal/powernv QEMU, we need to export an
"ibm,opal/dump" node in the device tree to tell the kernel that we support
MPIPL.

Once the kernel sees the support, and "fadump=on" is passed on the command
line, the kernel registers the memory regions to preserve with Skiboot.
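
As a small illustration of the "fadump=on" condition, here is a minimal
sketch that checks a kernel command line string for that exact token. The
helper name fadump_requested is hypothetical, and the sketch deliberately
ignores any other fadump= values the kernel may accept.

```python
def fadump_requested(cmdline):
    """Return True if the command line contains the exact token fadump=on."""
    # Kernel command line parameters are whitespace-separated tokens.
    return "fadump=on" in cmdline.split()
```

On a running guest this could be called as
fadump_requested(open("/proc/cmdline").read()).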

The kernel sends this data using OPAL calls, after which skiboot/opal saves
the memory region details to the MDST and MDDT tables (S-source,
D-destination).

Then, in the event of a kernel crash, the kernel initiates MPIPL with
another OPAL call (opal_cec_reboot2); this request goes to Skiboot.
Skiboot then triggers the "S0 Interrupt" to the SBE (Self Boot Engine),
along with OPAL's relocated base address.

The SBE then stops all core clocks and runs only the particular ISteps
required for a memory-preserving boot.

Then hostboot comes up and, with the help of the relocated base address,
accesses the MDST and MDDT tables (S-source and D-destination) and preserves
the memory regions according to the data in these tables.
After preserving them, it writes the preserved memory region details to the
MDRT table (R-result), so that the kernel knows where and whether each
memory region was preserved.
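
The table-driven preservation described above can be sketched roughly as
follows. This is only an illustration, not the code in hw/ppc/pnv_mpipl.c:
the Region layout, the field names, the pairing of source and destination
entries by position, and the preserve_regions helper are all assumptions
made for the sketch. The real tables are packed binary structures in guest
memory.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """One table entry; a hypothetical simplification of an MDST/MDDT row."""
    region_id: int
    addr: int
    size: int


def preserve_regions(memory, mdst, mddt):
    """Copy each source region to its destination; return MDRT-like rows.

    Entries are paired by position (an assumption of this sketch). Copying
    stops at the first entry that cannot be copied, so later entries get no
    result row.
    """
    mdrt = []
    for src, dst in zip(mdst, mddt):
        size = src.size
        fits = (dst.size >= size
                and src.addr + size <= len(memory)
                and dst.addr + size <= len(memory))
        if not fits:
            break  # stop once copying a chunk fails
        # The right-hand slice is a copy, so overlapping ranges are safe.
        memory[dst.addr:dst.addr + size] = memory[src.addr:src.addr + size]
        mdrt.append({"region_id": src.region_id, "src": src.addr,
                     "dst": dst.addr, "size": size})
    return mdrt
```

A region missing from the returned list corresponds to the "whether" part
above: the consumer can tell that the region was not preserved.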

Both the SBE's and hostboot's responsibilities are implemented in the SBE
code in QEMU.

Then, in the second (crash) kernel boot, OPAL passes the "mpipl-boot"
property so that the kernel knows a dump is active, which the kernel then
exports as /proc/vmcore.
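
For illustration, the presence of that property could be checked from
userspace in the second kernel along these lines. This is a sketch under
assumptions: that the device tree is exposed at /proc/device-tree and that
"mpipl-boot" sits directly under the "ibm,opal/dump" node. The helper name
dump_is_active is hypothetical.

```python
from pathlib import Path


def dump_is_active(dt_root="/proc/device-tree"):
    """Return True if an "mpipl-boot" property exists under ibm,opal/dump.

    dt_root is a parameter so the check can be pointed at any directory
    tree; the default location is an assumption of this sketch.
    """
    return (Path(dt_root) / "ibm,opal" / "dump" / "mpipl-boot").exists()
```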

Testing
====================

1. Git tree for testing: https://gitlab.com/adi-g15-ibm/qemu/tree/fadump-powernv-v6

2. Gitlab pipeline: https://gitlab.com/adi-g15-ibm/qemu/-/pipelines/2476290852

3. QEMU ppc64 functional tests:
    Ok:                13  
    Fail:              0   
    Skipped:           3   

4. Analysing generated vmcore:

    # file /proc/vmcore
    /proc/vmcore: ELF 64-bit LSB core file, 64-bit PowerPC or cisco 7500, OpenPOWER ELF V2 ABI, version 1 (SYSV), SVR4-style

    # crash vmlinux-38fec10eb60d-network vmcore-powernv-24apr26
    ...
          KERNEL: vmlinux-38fec10eb60d-network
        DUMPFILE: vmcore-powernv-24apr26
            CPUS: 2
            DATE: Thu Jan  1 05:30:00 IST 1970
          UPTIME: 00:01:38
    LOAD AVERAGE: 0.13, 0.08, 0.03
           TASKS: 84
        NODENAME: buildroot
         RELEASE: 6.14.0
         VERSION: #1 SMP Thu Apr  3 08:06:13 CDT 2025
         MACHINE: ppc64le  (1000 Mhz)
          MEMORY: 6 GB
           PANIC: "Kernel panic - not syncing: sysrq triggered crash"
             PID: 239
         COMMAND: "sh"
            TASK: c00000000933cb00  [THREAD_INFO: c00000000933cb00]
             CPU: 0
           STATE: TASK_RUNNING (PANIC)
    
    crash> bt
    PID: 239      TASK: c00000000933cb00  CPU: 0    COMMAND: "sh"
     R0:  c000000000059604    R1:  c0000000093df9c0    R2:  c0000000020eaa00
     R3:  c0000000bffffff8    R4:  c0000000071e0dc2    R5:  0000000000000002
     R6:  5245424d554e0a32    R7:  4d5f584944415228    R8:  454b0a313d29554d
     R9:  0000000000000000    R10: 4152430a303d5445    R11: 313d454d49544853
     R12: 3837383030373737    R13: c00000017ffff480    R14: 00000001172c0460
     R15: 0000000000000000    R16: 0000000000000000    R17: 00007fffe17bac78
     R18: 000000012cd00700    R19: 0000000000000000    R20: 000000012cd006f8
     R21: fcffffffffffffff    R22: c00000000933cb00    R23: 0000000000000000
     R24: a8aaaaaaaaaaaaaa    R25: c000000003bb7eb8    R26: c000000003e5eab8
     R27: c0000000c0000000    R28: 0000000000000000    R29: 0000000000000002
     R30: c000000003e5eab8    R31: c000000003e16b30
     NIP: c000000000059720    MSR: 9000000000001033    OR3: 0000000000000000
     CTR: 0000000000000000    LR:  c00000000002f69c    XER: 0000000020040006
     CCR: 0000000028002202    MQ:  0000000000000003    DAR: 0000000000000000
     DSISR: 0000000000000000     Syscall Result: 0000000000000000
     [NIP  : crash_fadump+560]
     [LR   : ppc_panic_fadump_handler+84]
     #0 [c0000000093df9c0] crash_fadump at c00000000005966c
     #1 [c0000000093dfa20] ppc_panic_fadump_handler at c00000000002f69c
     #2 [c0000000093dfa40] notifier_call_chain at c0000000001ab390
     #3 [c0000000093dfaa0] atomic_notifier_call_chain at c0000000001ab4a4
     #4 [c0000000093dfac0] panic at c000000000163598
     #5 [c0000000093dfb60] sysrq_handle_crash at c000000000beafd4
     #6 [c0000000093dfbc0] __handle_sysrq at c000000000beb9b4
     #7 [c0000000093dfc60] write_sysrq_trigger at c000000000bec34c
     #8 [c0000000093dfce0] proc_reg_write at c00000000071027c
     #9 [c0000000093dfd10] vfs_write at c000000000620a28
    #10 [c0000000093dfdc0] ksys_write at c000000000621010
    #11 [c0000000093dfe10] system_call_exception at c0000000000324b8
    #12 [c0000000093dfe50] system_call_vectored_common at c00000000000bff0
    crash> kmem -i
                     PAGES        TOTAL      PERCENTAGE
        TOTAL MEM    92396       5.6 GB         ----
             FREE    89710       5.5 GB   97% of TOTAL MEM
             USED     2686     167.9 MB    2% of TOTAL MEM
          BUFFERS        0            0    0% of TOTAL MEM
           CACHED     1500      93.8 MB    1% of TOTAL MEM
             SLAB      674      42.1 MB    0% of TOTAL MEM
    
       TOTAL HUGE        0            0         ----
        HUGE FREE        0            0    0% of TOTAL HUGE
    
       TOTAL SWAP        0            0         ----
        SWAP USED        0            0    0% of TOTAL SWAP
        SWAP FREE        0            0    0% of TOTAL SWAP
    
     COMMIT LIMIT    46198       2.8 GB         ----
        COMMITTED        0            0    0% of TOTAL LIMIT
    crash> # ps and kmem -i work

Changelog
====================

v5 -> v6:
* rebased to upstream commit bb230769b4d01de714bed686161ad39a8f4f3fd1
* #2,3,4: remove osdep.h include from .h file
* #3: use RAM_ADDR_FMT format for ram size
* #4,5: replace '__packed' with 'QEMU_PACKED'
* #8: add pnv_dt_create in pnv_reset, to create dt after mpipl
* #10: add Sourabh as reviewer

v4 -> v5:
* #4/10: set chunk_id=0 before copying
* #7/10: remove unnecessary bool check, i.e. 'if (b1) b2=b1 else b2=!b1' => 'b2=b1'

v3 -> v4:
* #2/10: s/recieves/receives
* #7/10: remove empty line at EOF

v2 -> v3:
* rebase to upstream, changes in patches below
* #2/10: no code change. add comment that skiboot triggers S0
* #3/10: stash command: handle invalid skiboot_base sent by guest
* #4/10: s/src_len/data_len/
* #4/10: use TARGET_FMT_lx/PRIx64 instead of %lx to prevent build errors
* #4/10: stop copying chunks once copying a chunk fails
* #5/10: use address_space_{read,write} instead of cpu_physical_memory_{read,write}
* #5/10: add more SPRs to be saved, same set of SPRs as spapr FADump, except CR and FPSCR
* #7/10: only export "mpipl-boot" property if preserving cpu states and writing MDRT was successful, otherwise continue with normal reboot
* #7/10: use address_space_{read,write} instead of cpu_physical_memory_{read,write}
* #8/10: reword commit description to mention fw-load-area, no code change
* #10/10: add entry in MAINTAINERS file

Aditya Gupta (10):
  ppc/pnv: Move SBE host doorbell function to top of file
  ppc/mpipl: Implement S0 SBE interrupt
  ppc/pnv: Handle stash command in PowerNV SBE
  pnv/mpipl: Preserve memory regions as per MDST/MDDT tables
  pnv/mpipl: Preserve CPU registers after crash
  pnv/mpipl: Set thread entry size to be allocated by firmware
  pnv/mpipl: Write the preserved CPU and MDRT state
  pnv/mpipl: Enable MPIPL support
  tests/functional: Add test for MPIPL in PowerNV
  MAINTAINERS: Add entry for MPIPL (PowerNV)

 MAINTAINERS                           |   9 +
 hw/ppc/meson.build                    |   1 +
 hw/ppc/pnv.c                          | 108 +++++-
 hw/ppc/pnv_mpipl.c                    | 482 ++++++++++++++++++++++++++
 hw/ppc/pnv_sbe.c                      |  85 ++++-
 include/hw/ppc/pnv.h                  |   6 +
 include/hw/ppc/pnv_mpipl.h            | 168 +++++++++
 tests/functional/ppc64/test_fadump.py |  35 +-
 8 files changed, 862 insertions(+), 32 deletions(-)
 create mode 100644 hw/ppc/pnv_mpipl.c
 create mode 100644 include/hw/ppc/pnv_mpipl.h

-- 
2.53.0
Re: [PATCH v6 00/10] Implement MPIPL for PowerNV
Posted by Shivang Upadhyay 1 week, 3 days ago
On Fri, 2026-04-24 at 14:08 +0530, Aditya Gupta wrote:
> [...]

Hi Aditya,

I've tested this patch series:
- fadump vmcore collection
- make check-functional
    Ok:                9
    Fail:              0
    Skipped:           7

Tested-by: Shivang Upadhyay <shivangu@linux.ibm.com>

Regards
~Shivang.