[PATCH] hw/ppc/e500: fix broken snapshot replay

Maksim Kostin posted 1 patch 9 months, 2 weeks ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20230809100733.32189-1-maksim.kostin@ispras.ru
Posted by Maksim Kostin 9 months, 2 weeks ago
ppce500_reset_device_tree is registered as a system reset handler, and
since c4b075318eb1 it rerandomizes rng-seed via
qemu_guest_getrandom_nofail. Loading a snapshot also runs the reset
handlers, so during replay the function tries to read an EVENT_RANDOM
that is not in the log, and we get an error:

  qemu-system-ppc: Missing random event in the replay log

To fix this, use qemu_register_reset_nosnapshotload instead of
qemu_register_reset.
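
For readers unfamiliar with the two registration calls, here is a toy
model of the difference. It is only a sketch and not QEMU's code: every
toy_* name is hypothetical and merely echoes the QEMU names, and the
assumption is that a handler registered through the nosnapshotload
variant is skipped when the reset is caused by a snapshot load.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only. The toy_* names are hypothetical and merely echo QEMU's. */
typedef enum {
    TOY_CAUSE_GUEST_RESET,
    TOY_CAUSE_SNAPSHOT_LOAD,
} ToyResetCause;

typedef void ToyResetHandler(void *opaque);

typedef struct {
    ToyResetHandler *func;
    void *opaque;
    bool skip_on_snapshot_load;
} ToyResetEntry;

#define TOY_MAX_HANDLERS 8
static ToyResetEntry toy_handlers[TOY_MAX_HANDLERS];
static size_t toy_handler_count;

/* Append one handler to the fixed-size registry. */
static void toy_register(ToyResetHandler *func, void *opaque, bool skip)
{
    assert(func != NULL && toy_handler_count < TOY_MAX_HANDLERS);
    toy_handlers[toy_handler_count].func = func;
    toy_handlers[toy_handler_count].opaque = opaque;
    toy_handlers[toy_handler_count].skip_on_snapshot_load = skip;
    toy_handler_count++;
}

/* Handler runs on every reset, including the one done for a snapshot load. */
void toy_register_reset(ToyResetHandler *func, void *opaque)
{
    toy_register(func, opaque, false);
}

/* Handler runs on every reset except the one done for a snapshot load. */
void toy_register_reset_nosnapshotload(ToyResetHandler *func, void *opaque)
{
    toy_register(func, opaque, true);
}

/* Call the registered handlers in order, honouring the skip flag. */
void toy_devices_reset(ToyResetCause cause)
{
    for (size_t i = 0; i < toy_handler_count; i++) {
        const ToyResetEntry *e = &toy_handlers[i];
        if (cause == TOY_CAUSE_SNAPSHOT_LOAD && e->skip_on_snapshot_load) {
            continue;
        }
        e->func(e->opaque);
    }
}

/* Stand-in for a handler that would consume one random event per call. */
void toy_consume_random_event(void *opaque)
{
    ++*(int *)opaque;
}
```

In this model a handler registered with
toy_register_reset_nosnapshotload is not called by
toy_devices_reset(TOY_CAUSE_SNAPSHOT_LOAD), so it consumes no random
event on that path, while an ordinary guest reset still calls it.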

Reported-by: Vitaly Cheptsov <cheptsov@ispras.ru>
Fixes: c4b075318eb1 ("hw/ppc: pass random seed to fdt")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1634
Signed-off-by: Maksim Kostin <maksim.kostin@ispras.ru>
---
 hw/ppc/e500.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/ppc/e500.c b/hw/ppc/e500.c
index 67793a86f1..d5b6820d1d 100644
--- a/hw/ppc/e500.c
+++ b/hw/ppc/e500.c
@@ -712,7 +712,7 @@ static int ppce500_prep_device_tree(PPCE500MachineState *machine,
     p->kernel_base = kernel_base;
     p->kernel_size = kernel_size;
 
-    qemu_register_reset(ppce500_reset_device_tree, p);
+    qemu_register_reset_nosnapshotload(ppce500_reset_device_tree, p);
     p->notifier.notify = ppce500_init_notify;
     qemu_add_machine_init_done_notifier(&p->notifier);
 
-- 
2.34.1
Re: [PATCH] hw/ppc/e500: fix broken snapshot replay
Posted by Nicholas Piggin 9 months, 2 weeks ago
On Wed Aug 9, 2023 at 8:07 PM AEST, Maksim Kostin wrote:
> ppce500_reset_device_tree is registered for system reset, but after
> c4b075318eb1 this function rerandomizes rng-seed via
> qemu_guest_getrandom_nofail. And when loading a snapshot, it tries to read
> EVENT_RANDOM that doesn't exist, so we have an error:
>
>   qemu-system-ppc: Missing random event in the replay log
>
> To fix this, use qemu_register_reset_nosnapshotload instead of
> qemu_register_reset.

This is the same issue that the spapr machine hit, so that looks good.

But is it a problem that the device tree can change after the
machine reset? In that case your snapshot could resume with a
different device tree and replay would diverge.

It looks like software could just overwrite the device tree value
in memory. That seems to be why it's rebuilt at reset time. But
maybe you could just copy the machine->fdt again.

There is also qemu_fdt_randomize_seeds, which some archs use and
which we might be able to use for this, if it helps.
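
One possible reading of the "copy it again" idea, as a rough sketch
only: keep a host-side copy of the blob, write it back to guest memory
on every reset, and refresh the seed only when the reset is not a
snapshot load. Everything here is hypothetical (the ToyBoard type, the
toy_* functions, the sizes and offsets); it is a toy model with a
deterministic stand-in for the RNG, not e500 code, and it does not call
qemu_fdt_randomize_seeds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: a 64-byte blob with a 32-byte seed at offset 16. */
#define TOY_FDT_SIZE    64
#define TOY_SEED_OFFSET 16
#define TOY_SEED_SIZE   32

typedef struct {
    uint8_t host_fdt[TOY_FDT_SIZE];   /* host-side copy of the blob */
    uint8_t guest_ram[TOY_FDT_SIZE];  /* where the guest sees the blob */
    unsigned random_events;           /* random values consumed so far */
} ToyBoard;

void toy_board_init(ToyBoard *b)
{
    assert(b != NULL);
    memset(b, 0, sizeof(*b));
}

/* Deterministic stand-in for drawing a fresh seed; counts as one event. */
static void toy_reseed(ToyBoard *b)
{
    b->random_events++;
    for (size_t i = 0; i < TOY_SEED_SIZE; i++) {
        b->host_fdt[TOY_SEED_OFFSET + i] =
            (uint8_t)(b->random_events * 31u + i);
    }
}

/*
 * Reset path: the blob is always copied back into guest memory, so a
 * guest that overwrote it sees the host copy again. The seed is only
 * refreshed when the reset is not caused by a snapshot load.
 */
void toy_board_reset(ToyBoard *b, bool snapshot_load)
{
    assert(b != NULL);
    if (!snapshot_load) {
        toy_reseed(b);
    }
    memcpy(b->guest_ram, b->host_fdt, TOY_FDT_SIZE);
}
```

With this split, a snapshot-load reset restores the blob without
consuming a random event. Whether that is enough to keep replay from
diverging on the real machine is exactly the open question above.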

But this is better than nothing and is a minimal fix, so it is
probably good to go upstream before more complicated changes.

Thanks,
Nick

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>

>
> Reported-by: Vitaly Cheptsov <cheptsov@ispras.ru>
> Fixes: c4b075318eb1 ("hw/ppc: pass random seed to fdt ")
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1634
> Signed-off-by: Maksim Kostin <maksim.kostin@ispras.ru>
> ---
>  hw/ppc/e500.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/hw/ppc/e500.c b/hw/ppc/e500.c
> index 67793a86f1..d5b6820d1d 100644
> --- a/hw/ppc/e500.c
> +++ b/hw/ppc/e500.c
> @@ -712,7 +712,7 @@ static int ppce500_prep_device_tree(PPCE500MachineState *machine,
>      p->kernel_base = kernel_base;
>      p->kernel_size = kernel_size;
>  
> -    qemu_register_reset(ppce500_reset_device_tree, p);
> +    qemu_register_reset_nosnapshotload(ppce500_reset_device_tree, p);
>      p->notifier.notify = ppce500_init_notify;
>      qemu_add_machine_init_done_notifier(&p->notifier);
>