[PATCH] migration/vmstate: replace NULL pointer assert with proper error

Trieu Huynh posted 1 patch 1 month, 1 week ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20260421155219.14506-1-viking4@gmail.com
Maintainers: Peter Xu <peterx@redhat.com>, Fabiano Rosas <farosas@suse.de>
migration/vmstate.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
[PATCH] migration/vmstate: replace NULL pointer assert with proper error
Posted by Trieu Huynh 1 month, 1 week ago
From: Trieu Huynh <vikingtc4@gmail.com>

vmstate_load_state() contains an assert added in commit
("migration/vmstate: split up vmstate_base_addr" [1])
to catch NULL pointer dereferences early:

  assert(first_elem || !n_elems || !size);

However, the assert fires as an unrecoverable crash in a valid user
scenario, for instance: loading a snapshot saved with -icount auto
into a QEMU started without -icount. The "timer/icount/warp_timer"
subsection is present in the stream, but icount_warp_timer is NULL
because icount was never initialized. QEMU aborts with:

  qemu-system-x86_64: ../migration/vmstate.c:185:
  vmstate_load_state: Assertion `first_elem || !n_elems || !size'
  failed.

Replace the assert with a proper error_setg() and -EINVAL return so
QEMU reports a clear diagnostic instead of crashing.

Also remove the error_prepend() in vmstate_subsection_load() that
wrapped the error at each subsection level, producing a redundant and
noisy error chain. The original error message is now propagated as-is.

[1] https://lore.kernel.org/qemu-devel/20170228124056.5074-4-dgilbert@redhat.com/
Resolves: https://gitlab.com/qemu-project/qemu/-/work_items/2736

Signed-off-by: Trieu Huynh <vikingtc4@gmail.com>
---
Steps to reproduce:
* 1. Start src with icount config:
./qemu-system-x86_64 \
  -icount auto \
  -<other_options>

* 2. Save snapshot
(qemu) savevm snap1

* 3. Start dest without icount config:
./qemu-system-x86_64 \
  -<other_options>

* 4. Load snapshot
* As-is:
~ # QEMU 10.2.94 monitor - type 'help' for more information
(qemu) loadvm snap1
qemu-system-x86_64: ../migration/vmstate.c:185: vmstate_load_state:
Assertion `first_elem || !n_elems || !size' failed.
Aborted (core dumped)

* To-be:
** Before removing error_prepend()
~ # QEMU 10.2.94 monitor - type 'help' for more information
(qemu) loadvm snap1
Error: error while loading state for instance 0x0 of device 'timer':
Loading VM subsection 'timer/icount' in 'timer' failed: -22:
Loading VM subsection 'timer/icount/warp_timer' in 'timer/icount' failed: -22:
VM load of device 'timer/icount/warp_timer': field 'icount_warp_timer'
is a NULL pointer but snapshot contains data for it.

** After removing error_prepend()
~ # QEMU 10.2.94 monitor - type 'help' for more information
(qemu) loadvm snap1
Error: error while loading state for instance 0x0 of device 'timer':
VM load of device 'timer/icount/warp_timer': field 'icount_warp_timer'
is a NULL pointer but snapshot contains data for it.

 migration/vmstate.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/migration/vmstate.c b/migration/vmstate.c
index 4d28364f7b..32ecc8ed35 100644
--- a/migration/vmstate.c
+++ b/migration/vmstate.c
@@ -182,7 +182,13 @@ int vmstate_load_state(QEMUFile *f, const VMStateDescription *vmsd,
             vmstate_handle_alloc(first_elem, field, opaque);
             if (field->flags & VMS_POINTER) {
                 first_elem = *(void **)first_elem;
-                assert(first_elem || !n_elems || !size);
+                if (!first_elem && n_elems && size) {
+                    error_setg(errp,
+                               "VM load of device '%s': field '%s' is a NULL "
+                               "pointer but snapshot contains data for it.",
+                               vmsd->name, field->name);
+                    return -EINVAL;
+                }
             }
             for (i = 0; i < n_elems; i++) {
                 void *curr_elem = first_elem + size * i;
@@ -651,9 +657,6 @@ static int vmstate_subsection_load(QEMUFile *f, const VMStateDescription *vmsd,
         ret = vmstate_load_state(f, sub_vmsd, opaque, version_id, errp);
         if (ret) {
             trace_vmstate_subsection_load_bad(vmsd->name, idstr, "(child)");
-            error_prepend(errp,
-                          "Loading VM subsection '%s' in '%s' failed: %d: ",
-                          idstr, vmsd->name, ret);
             return ret;
         }
     }
-- 
2.43.0
Re: [PATCH] migration/vmstate: replace NULL pointer assert with proper error
Posted by Peter Xu 1 month, 1 week ago
On Tue, Apr 21, 2026 at 10:52:19PM +0700, Trieu Huynh wrote:
> From: Trieu Huynh <vikingtc4@gmail.com>
> 
> vmstate_load_state() contains an assert added in commit
> ("migration/vmstate: split up vmstate_base_addr" [1])
> to catch NULL pointer dereferences early:
> 
>   assert(first_elem || !n_elems || !size);
> 
> However, the assert fires as an unrecoverable crash in a valid user
> scenario, for instance: loading a snapshot saved with -icount auto
> into a QEMU started without -icount. The "timer/icount/warp_timer"
> subsection is present in the stream, but icount_warp_timer is NULL
> because icount was never initialized. QEMU aborts with:
> 
>   qemu-system-x86_64: ../migration/vmstate.c:185:
>   vmstate_load_state: Assertion `first_elem || !n_elems || !size'
>   failed.
> 
> Replace the assert with a proper error_setg() and -EINVAL return so
> QEMU reports a clear diagnostic instead of crashing.
> 
> Also remove the error_prepend() in vmstate_subsection_load() that
> wrapped the error at each subsection level, producing a redundant and
> noisy error chain. The original error message is now propagated as-is.
> 
> [1] https://lore.kernel.org/qemu-devel/20170228124056.5074-4-dgilbert@redhat.com/
> Resolves: https://gitlab.com/qemu-project/qemu/-/work_items/2736
> 
> Signed-off-by: Trieu Huynh <vikingtc4@gmail.com>
> ---
> Steps to reproduce:
> * 1. Start src with icount config:
> ./qemu-system-x86_64 \
>   -icount auto \
>   -<other_options>
> 
> * 2. Save snapshot
> (qemu) savevm snap1
> 
> * 3. Start dest without icount config:
> ./qemu-system-x86_64 \
>   -<other_options>
> 
> * 4. Load snapshot
> * As-is:
> ~ # QEMU 10.2.94 monitor - type 'help' for more information
> (qemu) loadvm snap1
> qemu-system-x86_64: ../migration/vmstate.c:185: vmstate_load_state:
> Assertion `first_elem || !n_elems || !size' failed.
> Aborted (core dumped)
> 
> * To-be:
> ** Before removing error_prepend()
> ~ # QEMU 10.2.94 monitor - type 'help' for more information
> (qemu) loadvm snap1
> Error: error while loading state for instance 0x0 of device 'timer':
> Loading VM subsection 'timer/icount' in 'timer' failed: -22:
> Loading VM subsection 'timer/icount/warp_timer' in 'timer/icount' failed: -22:
> VM load of device 'timer/icount/warp_timer': field 'icount_warp_timer'
> is a NULL pointer but snapshot contains data for it.
> 
> ** After removing error_prepend()
> ~ # QEMU 10.2.94 monitor - type 'help' for more information
> (qemu) loadvm snap1
> Error: error while loading state for instance 0x0 of device 'timer':
> VM load of device 'timer/icount/warp_timer': field 'icount_warp_timer'
> is a NULL pointer but snapshot contains data for it.

Thanks again for help looking into QEMU opened issues, Trieu.

I had a feeling the user was requesting for successful migrations with
modified -icount parameters, rather than a prettier failure.  In that case
this patch won't be enough.

I've added a comment in the issue to check with reporter.  If we need that
feature, we may want to check with TCG developers on whether it'll work
out.

PS: we shouldn't mention "snapshot" in the error message, because it can
also be a live migration that triggers this path.

Thanks,

-- 
Peter Xu