migration: Fix writing mapped_ram + ignore_shared snapshots

[PATCH] migration: Fix writing mapped_ram + ignore_shared snapshots

Posted by Pawel Zmarzly 2 months, 1 week ago

Currently if you set these flags and have any shared memory object, saving
a snapshot will fail with:

    Failed to write bitmap to file: Unable to write to file: Bad address

We need to skip writing RAMBlocks that are backed by shared objects.

Also, we should mark these RAMBlocks as skipped, so the snapshot format stays
readable to tools that later don't know QEMU's command line (for example
scripts/analyze-migration.py). I used bitmap_offset=0 pages_offset=0 for this.

This minor change to snapshot format should be safe, as offset=0 should not
have ever been possible.

Signed-off-by: Pawel Zmarzly <pzmarzly0@gmail.com>
---
This requires my previous patch "migration: fix parsing snapshots with
x-ignore-shared flag". To make things easier, you can see the stack at
https://gitlab.com/pzmarzly/qemu/-/commits/pzmarzly?ref_type=heads
---
 migration/ram.c | 51 ++++++++++++++++++++++++++++++++-----------------
 1 file changed, 33 insertions(+), 18 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 7d024b88b5..8063522a14 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3042,28 +3042,36 @@ static void mapped_ram_setup_ramblock(QEMUFile *file, RAMBlock *block)
     header = g_new0(MappedRamHeader, 1);
     header_size = sizeof(MappedRamHeader);
 
-    num_pages = block->used_length >> TARGET_PAGE_BITS;
-    bitmap_size = BITS_TO_LONGS(num_pages) * sizeof(unsigned long);
-
-    /*
-     * Save the file offsets of where the bitmap and the pages should
-     * go as they are written at the end of migration and during the
-     * iterative phase, respectively.
-     */
-    block->bitmap_offset = qemu_get_offset(file) + header_size;
-    block->pages_offset = ROUND_UP(block->bitmap_offset +
-                                   bitmap_size,
-                                   MAPPED_RAM_FILE_OFFSET_ALIGNMENT);
-
     header->version = cpu_to_be32(MAPPED_RAM_HDR_VERSION);
     header->page_size = cpu_to_be64(TARGET_PAGE_SIZE);
-    header->bitmap_offset = cpu_to_be64(block->bitmap_offset);
-    header->pages_offset = cpu_to_be64(block->pages_offset);
+
+    if (migrate_ram_is_ignored(block)) {
+        header->bitmap_offset = 0;
+        header->pages_offset = 0;
+    } else {
+        num_pages = block->used_length >> TARGET_PAGE_BITS;
+        bitmap_size = BITS_TO_LONGS(num_pages) * sizeof(unsigned long);
+
+        /*
+         * Save the file offsets of where the bitmap and the pages should
+         * go as they are written at the end of migration and during the
+         * iterative phase, respectively.
+         */
+        block->bitmap_offset = qemu_get_offset(file) + header_size;
+        block->pages_offset = ROUND_UP(block->bitmap_offset +
+                                       bitmap_size,
+                                       MAPPED_RAM_FILE_OFFSET_ALIGNMENT);
+
+        header->bitmap_offset = cpu_to_be64(block->bitmap_offset);
+        header->pages_offset = cpu_to_be64(block->pages_offset);
+    }
 
     qemu_put_buffer(file, (uint8_t *) header, header_size);
 
-    /* prepare offset for next ramblock */
-    qemu_set_offset(file, block->pages_offset + block->used_length, SEEK_SET);
+    if (!migrate_ram_is_ignored(block)) {
+        /* leave space for block data */
+        qemu_set_offset(file, block->pages_offset + block->used_length, SEEK_SET);
+    }
 }
 
 static bool mapped_ram_read_header(QEMUFile *file, MappedRamHeader *header,
@@ -3146,7 +3154,6 @@ static int ram_save_setup(QEMUFile *f, void *opaque, Error **errp)
             if (migrate_ignore_shared()) {
                 qemu_put_be64(f, block->mr->addr);
             }
-
             if (migrate_mapped_ram()) {
                 mapped_ram_setup_ramblock(f, block);
             }
@@ -3217,6 +3224,10 @@ static void ram_save_file_bmap(QEMUFile *f)
     RAMBlock *block;
 
     RAMBLOCK_FOREACH_MIGRATABLE(block) {
+        if (migrate_ram_is_ignored(block)) {
+            continue;
+        }
+
         long num_pages = block->used_length >> TARGET_PAGE_BITS;
         long bitmap_size = BITS_TO_LONGS(num_pages) * sizeof(unsigned long);
 
@@ -4162,6 +4173,10 @@ static void parse_ramblock_mapped_ram(QEMUFile *f, RAMBlock *block,
         return;
     }
 
+    if (migrate_ignore_shared() && header.bitmap_offset == 0 && header.pages_offset == 0) {
+        return;
+    }
+
     block->pages_offset = header.pages_offset;
 
     /*
-- 
2.52.0

Re: [PATCH] migration: Fix writing mapped_ram + ignore_shared snapshots

Posted by Peter Xu 2 months, 1 week ago

On Wed, Nov 26, 2025 at 03:47:34PM +0000, Pawel Zmarzly wrote:
> Currently if you set these flags and have any shared memory object, saving
> a snapshot will fail with:
> 
>     Failed to write bitmap to file: Unable to write to file: Bad address

This one is slightly more involved..  I suggest we put the stack dump
otherwise it's not clear what this error is about.

I reproduced it, so you can take this if you see fit:

#2  0x000055ca83dbb903 in qio_channel_file_pwritev (ioc=0x7f84e0010130, iov=0x7f841649e880, niov=1, offset=160, errp=0x7f841649e8d8) at ../io/channel-file.c:209
#3  0x000055ca83dc5044 in qio_channel_pwritev (ioc=0x7f84e0010130, iov=0x7f841649e880, niov=1, offset=160, errp=0x7f841649e8d8) at ../io/channel.c:467
#4  0x000055ca83dc50b2 in qio_channel_pwrite (ioc=0x7f84e0010130, buf=0x0, buflen=131072, offset=160, errp=0x7f841649e8d8) at ../io/channel.c:478
#5  0x000055ca83db8e29 in qemu_put_buffer_at (f=0x55caab660c40, buf=0x0, buflen=131072, pos=160) at ../migration/qemu-file.c:536
#6  0x000055ca83aa651c in ram_save_file_bmap (f=0x55caab660c40) at ../migration/ram.c:3223
#7  0x000055ca83aa6bcc in ram_save_complete (f=0x55caab660c40, opaque=0x55ca856b7020 <ram_state>) at ../migration/ram.c:3426
#8  0x000055ca83aaf2f2 in qemu_savevm_complete (se=0x55caab112420, f=0x55caab660c40) at ../migration/savevm.c:1521
#9  0x000055ca83aaf707 in qemu_savevm_state_complete_precopy_iterable (f=0x55caab660c40, in_postcopy=false) at ../migration/savevm.c:1627
#10 0x000055ca83aafa3a in qemu_savevm_state_complete_precopy (f=0x55caab660c40, iterable_only=false) at ../migration/savevm.c:1719
#11 0x000055ca83a8cca3 in migration_completion_precopy (s=0x55caab111bf0) at ../migration/migration.c:3027
#12 0x000055ca83a8cd6b in migration_completion (s=0x55caab111bf0) at ../migration/migration.c:3064
#13 0x000055ca83a8dcd6 in migration_iteration_run (s=0x55caab111bf0) at ../migration/migration.c:3542
#14 0x000055ca83a8e410 in migration_thread (opaque=0x55caab111bf0) at ../migration/migration.c:3834

IMHO the important bit is stack #4/#5 where buf==0x0.

So the issue is that ram_list_init_bitmaps() will not initialize file
bitmap for shared ramblocks if x-ignore-share=on.

This is another hint that the two features being used together never
worked.  The commit log should also mention that then it's fine we break
the file format in this specific case.

> 
> We need to skip writing RAMBlocks that are backed by shared objects.

Correct.

Fabiano had a proposal in the other email that we could skip calling
mapped_ram_setup_ramblock() completely for migrate_ram_is_ignored()
ramblocks.

But now I doubt if that's the best we can have.

The tricky bit here is x-ignore-share=on has one special use case, where
the src QEMU saves the snapshot with ramblocks share=on, while load the
snapshot with the exact same ramblocks but instead specify share=off (see
b17fbbe55cb).

Then to make such work, we may need to always still dump the header exactly
as what you did in this patch, otherwise migrate_ram_is_ignored() may
return differently on src / dst on the same ramblock in this case.

Above will also better be mentioned in the commit message.. to justify why
we still need the mapped-ram headers at all for shared ramblocks, rather
than skipping them.

> 
> Also, we should mark these RAMBlocks as skipped, so the snapshot format stays
> readable to tools that later don't know QEMU's command line (for example
> scripts/analyze-migration.py). I used bitmap_offset=0 pages_offset=0 for this.

Right, this is another reason, thanks for pointing this out.

Likely the root reason here is the same as I replied to Fabiano elsewhere,
that we treat ram_bytes_total_with_ignored() as total size of all the
ramblocks, hence we need to include shared ones here in headers.

> 
> This minor change to snapshot format should be safe, as offset=0 should not
> have ever been possible.

Maybe we should update mapped-ram.rst for this.  Please do so in one patch
if possible, we can also update it separately if we want.

> 
> Signed-off-by: Pawel Zmarzly <pzmarzly0@gmail.com>
> ---
> This requires my previous patch "migration: fix parsing snapshots with
> x-ignore-shared flag".

Now I start to question whether I should have that other fix of yours to be
for this release or next.

If this use case is completely broken, we shouldn't need to rush -rc
window, now I plan to merge all these fixes later when 11.0 dev window
opens.  Let me know if you, or Fabiano, has any comments.

> To make things easier, you can see the stack at
> https://gitlab.com/pzmarzly/qemu/-/commits/pzmarzly?ref_type=heads
> ---
>  migration/ram.c | 51 ++++++++++++++++++++++++++++++++-----------------
>  1 file changed, 33 insertions(+), 18 deletions(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index 7d024b88b5..8063522a14 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -3042,28 +3042,36 @@ static void mapped_ram_setup_ramblock(QEMUFile *file, RAMBlock *block)
>      header = g_new0(MappedRamHeader, 1);
>      header_size = sizeof(MappedRamHeader);
>  
> -    num_pages = block->used_length >> TARGET_PAGE_BITS;
> -    bitmap_size = BITS_TO_LONGS(num_pages) * sizeof(unsigned long);
> -
> -    /*
> -     * Save the file offsets of where the bitmap and the pages should
> -     * go as they are written at the end of migration and during the
> -     * iterative phase, respectively.
> -     */
> -    block->bitmap_offset = qemu_get_offset(file) + header_size;
> -    block->pages_offset = ROUND_UP(block->bitmap_offset +
> -                                   bitmap_size,
> -                                   MAPPED_RAM_FILE_OFFSET_ALIGNMENT);
> -
>      header->version = cpu_to_be32(MAPPED_RAM_HDR_VERSION);
>      header->page_size = cpu_to_be64(TARGET_PAGE_SIZE);
> -    header->bitmap_offset = cpu_to_be64(block->bitmap_offset);
> -    header->pages_offset = cpu_to_be64(block->pages_offset);
> +
> +    if (migrate_ram_is_ignored(block)) {

nitpick: we can some comment here on what does "setting both to 0" imply.

> +        header->bitmap_offset = 0;
> +        header->pages_offset = 0;
> +    } else {
> +        num_pages = block->used_length >> TARGET_PAGE_BITS;
> +        bitmap_size = BITS_TO_LONGS(num_pages) * sizeof(unsigned long);
> +
> +        /*
> +         * Save the file offsets of where the bitmap and the pages should
> +         * go as they are written at the end of migration and during the
> +         * iterative phase, respectively.
> +         */
> +        block->bitmap_offset = qemu_get_offset(file) + header_size;
> +        block->pages_offset = ROUND_UP(block->bitmap_offset +
> +                                       bitmap_size,
> +                                       MAPPED_RAM_FILE_OFFSET_ALIGNMENT);
> +
> +        header->bitmap_offset = cpu_to_be64(block->bitmap_offset);
> +        header->pages_offset = cpu_to_be64(block->pages_offset);
> +    }
>  
>      qemu_put_buffer(file, (uint8_t *) header, header_size);
>  
> -    /* prepare offset for next ramblock */
> -    qemu_set_offset(file, block->pages_offset + block->used_length, SEEK_SET);
> +    if (!migrate_ram_is_ignored(block)) {
> +        /* leave space for block data */

Nit, the comment is touched up but "block data" might be misleading.  IMHO
we could spell it out, here essentially it's both the bitmap and the page
contents.

IMHO it can be something like:

           /*
            * Reserve space for both the ramblock bitmap and page contents.
            * Only needed if the ramblock will be migrated.
            */

> +        qemu_set_offset(file, block->pages_offset + block->used_length, SEEK_SET);
> +    }
>  }
>  
>  static bool mapped_ram_read_header(QEMUFile *file, MappedRamHeader *header,
> @@ -3146,7 +3154,6 @@ static int ram_save_setup(QEMUFile *f, void *opaque, Error **errp)
>              if (migrate_ignore_shared()) {
>                  qemu_put_be64(f, block->mr->addr);
>              }
> -
>              if (migrate_mapped_ram()) {
>                  mapped_ram_setup_ramblock(f, block);
>              }
> @@ -3217,6 +3224,10 @@ static void ram_save_file_bmap(QEMUFile *f)
>      RAMBlock *block;
>  
>      RAMBLOCK_FOREACH_MIGRATABLE(block) {
> +        if (migrate_ram_is_ignored(block)) {
> +            continue;
> +        }
> +
>          long num_pages = block->used_length >> TARGET_PAGE_BITS;
>          long bitmap_size = BITS_TO_LONGS(num_pages) * sizeof(unsigned long);
>  
> @@ -4162,6 +4173,10 @@ static void parse_ramblock_mapped_ram(QEMUFile *f, RAMBlock *block,
>          return;
>      }
>  
> +    if (migrate_ignore_shared() && header.bitmap_offset == 0 && header.pages_offset == 0) {

Over long line which will fail checkpatch.pl.

Optionally, we may introduce a tiny inline helper to check both offsets and
x-ignore-shared, then the helper can be e.g. ramblock_mapped_ram_ignored()
to help readability.

> +        return;
> +    }
> +
>      block->pages_offset = header.pages_offset;
>  
>      /*
> -- 
> 2.52.0
> 

Thanks,

-- 
Peter Xu

Re: [PATCH] migration: Fix writing mapped_ram + ignore_shared snapshots

Posted by Paweł Zmarzły 2 months, 1 week ago

On Wed, 26 Nov 2025 at 21:42, Peter Xu <peterx@redhat.com> wrote:
> Now I start to question whether I should have that other fix of yours to be
> for this release or next.
>
> If this use case is completely broken, we shouldn't need to rush -rc
> window, now I plan to merge all these fixes later when 11.0 dev window
> opens.  Let me know if you, or Fabiano, has any comments.

It is broken if you set ignore-shared and actually have any shared
block, but what could work today is if you just toggle the
ignore-shared flag on without setting up any shared blocks. In that
case, writing will work fine, but reading will crash. That's how I
stumbled upon this rabbithole in the first place: I forgot to unset
the flag and was surprised by parsing error. Whether it is worth
fixing now - I don't know, setting ignore-shared when there are no
shared blocks doesn't really make sense, so most likely nobody does it
on purpose.

In either case, I need to stop working on this for now, I thought this
will be a tiny side project that'll help me get my first patches in
(and get used to collaborating over email), but now it's growing in
complexity and I have other things that I need to prioritize.
Hopefully I'll come back to this within 11.x window. Thanks for all
the help so far, Peter, Fabiano!

Re: [PATCH] migration: Fix writing mapped_ram + ignore_shared snapshots

Posted by Peter Xu 2 months, 1 week ago

On Thu, Nov 27, 2025 at 11:35:06AM +0000, Paweł Zmarzły wrote:
> On Wed, 26 Nov 2025 at 21:42, Peter Xu <peterx@redhat.com> wrote:
> > Now I start to question whether I should have that other fix of yours to be
> > for this release or next.
> >
> > If this use case is completely broken, we shouldn't need to rush -rc
> > window, now I plan to merge all these fixes later when 11.0 dev window
> > opens.  Let me know if you, or Fabiano, has any comments.
> 
> It is broken if you set ignore-shared and actually have any shared
> block, but what could work today is if you just toggle the
> ignore-shared flag on without setting up any shared blocks. In that
> case, writing will work fine, but reading will crash. That's how I

Yep, I suppose either side of reliable failure means it's completely
broken. :(

That's IMHO an important evaluation because we could modify the image
layout without worrying breaking others only if it's completely broken..

> stumbled upon this rabbithole in the first place: I forgot to unset
> the flag and was surprised by parsing error. Whether it is worth
> fixing now - I don't know, setting ignore-shared when there are no
> shared blocks doesn't really make sense, so most likely nobody does it
> on purpose.
> 
> In either case, I need to stop working on this for now, I thought this
> will be a tiny side project that'll help me get my first patches in
> (and get used to collaborating over email), but now it's growing in
> complexity and I have other things that I need to prioritize.
> Hopefully I'll come back to this within 11.x window. Thanks for all
> the help so far, Peter, Fabiano!

Don't worry, thanks for all the contributions even so far!

Your patch actually looks pretty good already and mergeable, I just
nitpicked things here and there as I want to double check on things I
stated, and make it slow to get thoroughly discussed.

Personally, I think it's ok we queue this one already into -next together
with the other one, then we clean things on top.

Fabiano, sounds good to you?  PS: take your time reading, as long as you
agree we put it in -next only, then there's no rush. :)

Thanks,

-- 
Peter Xu

Re: [PATCH] migration: Fix writing mapped_ram + ignore_shared snapshots

Posted by Fabiano Rosas 2 months, 1 week ago

Peter Xu <peterx@redhat.com> writes:

> On Thu, Nov 27, 2025 at 11:35:06AM +0000, Paweł Zmarzły wrote:
>> On Wed, 26 Nov 2025 at 21:42, Peter Xu <peterx@redhat.com> wrote:
>> > Now I start to question whether I should have that other fix of yours to be
>> > for this release or next.
>> >
>> > If this use case is completely broken, we shouldn't need to rush -rc
>> > window, now I plan to merge all these fixes later when 11.0 dev window
>> > opens.  Let me know if you, or Fabiano, has any comments.
>> 
>> It is broken if you set ignore-shared and actually have any shared
>> block, but what could work today is if you just toggle the
>> ignore-shared flag on without setting up any shared blocks. In that
>> case, writing will work fine, but reading will crash. That's how I
>
> Yep, I suppose either side of reliable failure means it's completely
> broken. :(
>
> That's IMHO an important evaluation because we could modify the image
> layout without worrying breaking others only if it's completely broken..
>
>> stumbled upon this rabbithole in the first place: I forgot to unset
>> the flag and was surprised by parsing error. Whether it is worth
>> fixing now - I don't know, setting ignore-shared when there are no
>> shared blocks doesn't really make sense, so most likely nobody does it
>> on purpose.
>> 
>> In either case, I need to stop working on this for now, I thought this
>> will be a tiny side project that'll help me get my first patches in
>> (and get used to collaborating over email), but now it's growing in
>> complexity and I have other things that I need to prioritize.
>> Hopefully I'll come back to this within 11.x window. Thanks for all
>> the help so far, Peter, Fabiano!
>
> Don't worry, thanks for all the contributions even so far!
>
> Your patch actually looks pretty good already and mergeable, I just
> nitpicked things here and there as I want to double check on things I
> stated, and make it slow to get thoroughly discussed.
>
> Personally, I think it's ok we queue this one already into -next together
> with the other one, then we clean things on top.
>
> Fabiano, sounds good to you?  PS: take your time reading, as long as you
> agree we put it in -next only, then there's no rush. :)

Yep, that's fine.

>
> Thanks,