[v2] migration: File based migration with multifd and fixed-ram

[PATCH v2 16/29] migration/ram: Add support for 'fixed-ram' migration restore

Posted by Fabiano Rosas 2 years, 3 months ago

From: Nikolay Borisov <nborisov@suse.com>

Add the necessary code to parse the format changes for the 'fixed-ram'
capability.

One of the more notable changes in behavior is that in the 'fixed-ram'
case ram pages are restored in one go rather than constantly looping
through the migration stream.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
(farosas) reused more of the common code by making the fixed-ram
function take only one ramblock and calling it from inside
parse_ramblock.
---
 migration/ram.c | 93 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 93 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index 152a03604f..cea6971ab2 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3032,6 +3032,32 @@ static void fixed_ram_insert_header(QEMUFile *file, RAMBlock *block)
     qemu_put_buffer(file, (uint8_t *) header, header_size);
 }
 
+static int fixed_ram_read_header(QEMUFile *file, struct FixedRamHeader *header)
+{
+    size_t ret, header_size = sizeof(struct FixedRamHeader);
+
+    ret = qemu_get_buffer(file, (uint8_t *)header, header_size);
+    if (ret != header_size) {
+        return -1;
+    }
+
+    /* migration stream is big-endian */
+    be32_to_cpus(&header->version);
+
+    if (header->version > FIXED_RAM_HDR_VERSION) {
+        error_report("Migration fixed-ram capability version mismatch (expected %d, got %d)",
+                     FIXED_RAM_HDR_VERSION, header->version);
+        return -1;
+    }
+
+    be64_to_cpus(&header->page_size);
+    be64_to_cpus(&header->bitmap_offset);
+    be64_to_cpus(&header->pages_offset);
+
+
+    return 0;
+}
+
 /*
  * Each of ram_save_setup, ram_save_iterate and ram_save_complete has
  * long-running RCU critical section.  When rcu-reclaims in the code
@@ -3932,6 +3958,68 @@ void colo_flush_ram_cache(void)
     trace_colo_flush_ram_cache_end();
 }
 
+static void read_ramblock_fixed_ram(QEMUFile *f, RAMBlock *block,
+                                    long num_pages, unsigned long *bitmap)
+{
+    unsigned long set_bit_idx, clear_bit_idx;
+    unsigned long len;
+    ram_addr_t offset;
+    void *host;
+    size_t read, completed, read_len;
+
+    for (set_bit_idx = find_first_bit(bitmap, num_pages);
+         set_bit_idx < num_pages;
+         set_bit_idx = find_next_bit(bitmap, num_pages, clear_bit_idx + 1)) {
+
+        clear_bit_idx = find_next_zero_bit(bitmap, num_pages, set_bit_idx + 1);
+
+        len = TARGET_PAGE_SIZE * (clear_bit_idx - set_bit_idx);
+        offset = set_bit_idx << TARGET_PAGE_BITS;
+
+        for (read = 0, completed = 0; completed < len; offset += read) {
+            host = host_from_ram_block_offset(block, offset);
+            read_len = MIN(len, TARGET_PAGE_SIZE);
+
+            read = qemu_get_buffer_at(f, host, read_len,
+                                      block->pages_offset + offset);
+            completed += read;
+        }
+    }
+}
+
+static int parse_ramblock_fixed_ram(QEMUFile *f, RAMBlock *block, ram_addr_t length)
+{
+    g_autofree unsigned long *bitmap = NULL;
+    struct FixedRamHeader header;
+    size_t bitmap_size;
+    long num_pages;
+    int ret = 0;
+
+    ret = fixed_ram_read_header(f, &header);
+    if (ret < 0) {
+        error_report("Error reading fixed-ram header");
+        return -EINVAL;
+    }
+
+    block->pages_offset = header.pages_offset;
+    num_pages = length / header.page_size;
+    bitmap_size = BITS_TO_LONGS(num_pages) * sizeof(unsigned long);
+
+    bitmap = g_malloc0(bitmap_size);
+    if (qemu_get_buffer_at(f, (uint8_t *)bitmap, bitmap_size,
+                           header.bitmap_offset) != bitmap_size) {
+        error_report("Error parsing dirty bitmap");
+        return -EINVAL;
+    }
+
+    read_ramblock_fixed_ram(f, block, num_pages, bitmap);
+
+    /* Skip pages array */
+    qemu_set_offset(f, block->pages_offset + length, SEEK_SET);
+
+    return ret;
+}
+
 static int parse_ramblock(QEMUFile *f, RAMBlock *block, ram_addr_t length)
 {
     int ret = 0;
@@ -3940,6 +4028,10 @@ static int parse_ramblock(QEMUFile *f, RAMBlock *block, ram_addr_t length)
 
     assert(block);
 
+    if (migrate_fixed_ram()) {
+        return parse_ramblock_fixed_ram(f, block, length);
+    }
+
     if (!qemu_ram_is_migratable(block)) {
         error_report("block %s should not be migrated !", block->idstr);
         return -EINVAL;
@@ -4142,6 +4234,7 @@ static int ram_load_precopy(QEMUFile *f)
                 migrate_multifd_flush_after_each_section()) {
                 multifd_recv_sync_main();
             }
+
             break;
         case RAM_SAVE_FLAG_HOOK:
             ret = rdma_registration_handle(f);
-- 
2.35.3
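
For readers following the restore path, the run-walking idea behind
read_ramblock_fixed_ram() can be sketched in isolation. This is a minimal,
standalone illustration and not the code from the patch: EXAMPLE_PAGE_SIZE,
struct page_run, test_page_bit() and collect_page_runs() are hypothetical
names, and a plain bit scan stands in for QEMU's bitmap helpers. Each run of
consecutive set bits becomes one (offset, length) extent that could be read
from the file in a single request.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for TARGET_PAGE_SIZE. */
#define EXAMPLE_PAGE_SIZE 4096u
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* One contiguous extent of pages present in the file (hypothetical type). */
struct page_run {
    uint64_t offset; /* byte offset of the first page, relative to the block */
    uint64_t len;    /* length of the run in bytes */
};

/* Returns 1 if page idx is marked present in the bitmap, 0 otherwise. */
static int test_page_bit(const unsigned long *bitmap, size_t idx)
{
    return (bitmap[idx / BITS_PER_WORD] >> (idx % BITS_PER_WORD)) & 1UL;
}

/*
 * Walk the bitmap and record each run of consecutive set bits.
 * Stops after max_runs entries; returns the number of runs stored.
 */
static size_t collect_page_runs(const unsigned long *bitmap, size_t num_pages,
                                struct page_run *runs, size_t max_runs)
{
    size_t count = 0;
    size_t i = 0;

    while (i < num_pages && count < max_runs) {
        size_t start;

        /* Skip pages that were never written to the file. */
        if (!test_page_bit(bitmap, i)) {
            i++;
            continue;
        }

        /* Extend the run until the first clear bit or the end of the block. */
        start = i;
        while (i < num_pages && test_page_bit(bitmap, i)) {
            i++;
        }

        runs[count].offset = (uint64_t)start * EXAMPLE_PAGE_SIZE;
        runs[count].len = (uint64_t)(i - start) * EXAMPLE_PAGE_SIZE;
        count++;
    }

    return count;
}
```

A caller would then add the block's pages_offset to each run's offset to get
the position in the file, which is the role block->pages_offset plays in the
patch.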

Re: [PATCH v2 16/29] migration/ram: Add support for 'fixed-ram' migration restore

Posted by Daniel P. Berrangé 2 years, 3 months ago

On Mon, Oct 23, 2023 at 05:35:55PM -0300, Fabiano Rosas wrote:
> From: Nikolay Borisov <nborisov@suse.com>
> 
> Add the necessary code to parse the format changes for the 'fixed-ram'
> capability.
> 
> One of the more notable changes in behavior is that in the 'fixed-ram'
> case ram pages are restored in one go rather than constantly looping
> through the migration stream.
> 
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
> (farosas) reused more of the common code by making the fixed-ram
> function take only one ramblock and calling it from inside
> parse_ramblock.
> ---
>  migration/ram.c | 93 +++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 93 insertions(+)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index 152a03604f..cea6971ab2 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -3032,6 +3032,32 @@ static void fixed_ram_insert_header(QEMUFile *file, RAMBlock *block)
>      qemu_put_buffer(file, (uint8_t *) header, header_size);
>  }
>  
> +static int fixed_ram_read_header(QEMUFile *file, struct FixedRamHeader *header)
> +{
> +    size_t ret, header_size = sizeof(struct FixedRamHeader);
> +
> +    ret = qemu_get_buffer(file, (uint8_t *)header, header_size);
> +    if (ret != header_size) {
> +        return -1;
> +    }
> +
> +    /* migration stream is big-endian */
> +    be32_to_cpus(&header->version);
> +
> +    if (header->version > FIXED_RAM_HDR_VERSION) {
> +        error_report("Migration fixed-ram capability version mismatch (expected %d, got %d)",
> +                     FIXED_RAM_HDR_VERSION, header->version);
> +        return -1;
> +    }
> +
> +    be64_to_cpus(&header->page_size);
> +    be64_to_cpus(&header->bitmap_offset);
> +    be64_to_cpus(&header->pages_offset);
> +
> +
> +    return 0;
> +}
> +
>  /*
>   * Each of ram_save_setup, ram_save_iterate and ram_save_complete has
>   * long-running RCU critical section.  When rcu-reclaims in the code
> @@ -3932,6 +3958,68 @@ void colo_flush_ram_cache(void)
>      trace_colo_flush_ram_cache_end();
>  }
>  
> +static void read_ramblock_fixed_ram(QEMUFile *f, RAMBlock *block,
> +                                    long num_pages, unsigned long *bitmap)
> +{
> +    unsigned long set_bit_idx, clear_bit_idx;
> +    unsigned long len;
> +    ram_addr_t offset;
> +    void *host;
> +    size_t read, completed, read_len;
> +
> +    for (set_bit_idx = find_first_bit(bitmap, num_pages);
> +         set_bit_idx < num_pages;
> +         set_bit_idx = find_next_bit(bitmap, num_pages, clear_bit_idx + 1)) {
> +
> +        clear_bit_idx = find_next_zero_bit(bitmap, num_pages, set_bit_idx + 1);
> +
> +        len = TARGET_PAGE_SIZE * (clear_bit_idx - set_bit_idx);
> +        offset = set_bit_idx << TARGET_PAGE_BITS;
> +
> +        for (read = 0, completed = 0; completed < len; offset += read) {
> +            host = host_from_ram_block_offset(block, offset);
> +            read_len = MIN(len, TARGET_PAGE_SIZE);
> +
> +            read = qemu_get_buffer_at(f, host, read_len,
> +                                      block->pages_offset + offset);
> +            completed += read;
> +        }
> +    }
> +}
> +
> +static int parse_ramblock_fixed_ram(QEMUFile *f, RAMBlock *block, ram_addr_t length)
> +{
> +    g_autofree unsigned long *bitmap = NULL;
> +    struct FixedRamHeader header;
> +    size_t bitmap_size;
> +    long num_pages;
> +    int ret = 0;
> +
> +    ret = fixed_ram_read_header(f, &header);
> +    if (ret < 0) {
> +        error_report("Error reading fixed-ram header");
> +        return -EINVAL;
> +    }
> +
> +    block->pages_offset = header.pages_offset;

Do you think it is worth sanity checking that 'pages_offset' is aligned
in some way?

It is nice that we have the flexibility to change the alignment in future
if we find 1 MB is not optimal, so I wouldn't want to force a 1 MB alignment
check here. Perhaps we could at least sanity check for alignment to
TARGET_PAGE_SIZE, to detect a gross data corruption problem?

> +    num_pages = length / header.page_size;
> +    bitmap_size = BITS_TO_LONGS(num_pages) * sizeof(unsigned long);
> +
> +    bitmap = g_malloc0(bitmap_size);
> +    if (qemu_get_buffer_at(f, (uint8_t *)bitmap, bitmap_size,
> +                           header.bitmap_offset) != bitmap_size) {
> +        error_report("Error parsing dirty bitmap");

s/parsing/reading/ since we're not actually parsing any semantic
info here.

> +        return -EINVAL;
> +    }
> +
> +    read_ramblock_fixed_ram(f, block, num_pages, bitmap);
> +
> +    /* Skip pages array */
> +    qemu_set_offset(f, block->pages_offset + length, SEEK_SET);
> +
> +    return ret;
> +}
> +
>  static int parse_ramblock(QEMUFile *f, RAMBlock *block, ram_addr_t length)
>  {
>      int ret = 0;
> @@ -3940,6 +4028,10 @@ static int parse_ramblock(QEMUFile *f, RAMBlock *block, ram_addr_t length)
>  
>      assert(block);
>  
> +    if (migrate_fixed_ram()) {
> +        return parse_ramblock_fixed_ram(f, block, length);
> +    }
> +
>      if (!qemu_ram_is_migratable(block)) {
>          error_report("block %s should not be migrated !", block->idstr);
>          return -EINVAL;
> @@ -4142,6 +4234,7 @@ static int ram_load_precopy(QEMUFile *f)
>                  migrate_multifd_flush_after_each_section()) {
>                  multifd_recv_sync_main();
>              }
> +
>              break;

Spurious whitespace


>          case RAM_SAVE_FLAG_HOOK:
>              ret = rdma_registration_handle(f);
> -- 
> 2.35.3
> 

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [PATCH v2 16/29] migration/ram: Add support for 'fixed-ram' migration restore

Posted by Fabiano Rosas 2 years, 3 months ago

Daniel P. Berrangé <berrange@redhat.com> writes:

> On Mon, Oct 23, 2023 at 05:35:55PM -0300, Fabiano Rosas wrote:
>> From: Nikolay Borisov <nborisov@suse.com>
>> 
>> Add the necessary code to parse the format changes for the 'fixed-ram'
>> capability.
>> 
>> One of the more notable changes in behavior is that in the 'fixed-ram'
>> case ram pages are restored in one go rather than constantly looping
>> through the migration stream.
>> 
>> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>> ---
>> (farosas) reused more of the common code by making the fixed-ram
>> function take only one ramblock and calling it from inside
>> parse_ramblock.
>> ---
>>  migration/ram.c | 93 +++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 93 insertions(+)
>> 
>> diff --git a/migration/ram.c b/migration/ram.c
>> index 152a03604f..cea6971ab2 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -3032,6 +3032,32 @@ static void fixed_ram_insert_header(QEMUFile *file, RAMBlock *block)
>>      qemu_put_buffer(file, (uint8_t *) header, header_size);
>>  }
>>  
>> +static int fixed_ram_read_header(QEMUFile *file, struct FixedRamHeader *header)
>> +{
>> +    size_t ret, header_size = sizeof(struct FixedRamHeader);
>> +
>> +    ret = qemu_get_buffer(file, (uint8_t *)header, header_size);
>> +    if (ret != header_size) {
>> +        return -1;
>> +    }
>> +
>> +    /* migration stream is big-endian */
>> +    be32_to_cpus(&header->version);
>> +
>> +    if (header->version > FIXED_RAM_HDR_VERSION) {
>> +        error_report("Migration fixed-ram capability version mismatch (expected %d, got %d)",
>> +                     FIXED_RAM_HDR_VERSION, header->version);
>> +        return -1;
>> +    }
>> +
>> +    be64_to_cpus(&header->page_size);
>> +    be64_to_cpus(&header->bitmap_offset);
>> +    be64_to_cpus(&header->pages_offset);
>> +
>> +
>> +    return 0;
>> +}
>> +
>>  /*
>>   * Each of ram_save_setup, ram_save_iterate and ram_save_complete has
>>   * long-running RCU critical section.  When rcu-reclaims in the code
>> @@ -3932,6 +3958,68 @@ void colo_flush_ram_cache(void)
>>      trace_colo_flush_ram_cache_end();
>>  }
>>  
>> +static void read_ramblock_fixed_ram(QEMUFile *f, RAMBlock *block,
>> +                                    long num_pages, unsigned long *bitmap)
>> +{
>> +    unsigned long set_bit_idx, clear_bit_idx;
>> +    unsigned long len;
>> +    ram_addr_t offset;
>> +    void *host;
>> +    size_t read, completed, read_len;
>> +
>> +    for (set_bit_idx = find_first_bit(bitmap, num_pages);
>> +         set_bit_idx < num_pages;
>> +         set_bit_idx = find_next_bit(bitmap, num_pages, clear_bit_idx + 1)) {
>> +
>> +        clear_bit_idx = find_next_zero_bit(bitmap, num_pages, set_bit_idx + 1);
>> +
>> +        len = TARGET_PAGE_SIZE * (clear_bit_idx - set_bit_idx);
>> +        offset = set_bit_idx << TARGET_PAGE_BITS;
>> +
>> +        for (read = 0, completed = 0; completed < len; offset += read) {
>> +            host = host_from_ram_block_offset(block, offset);
>> +            read_len = MIN(len, TARGET_PAGE_SIZE);
>> +
>> +            read = qemu_get_buffer_at(f, host, read_len,
>> +                                      block->pages_offset + offset);
>> +            completed += read;
>> +        }
>> +    }
>> +}
>> +
>> +static int parse_ramblock_fixed_ram(QEMUFile *f, RAMBlock *block, ram_addr_t length)
>> +{
>> +    g_autofree unsigned long *bitmap = NULL;
>> +    struct FixedRamHeader header;
>> +    size_t bitmap_size;
>> +    long num_pages;
>> +    int ret = 0;
>> +
>> +    ret = fixed_ram_read_header(f, &header);
>> +    if (ret < 0) {
>> +        error_report("Error reading fixed-ram header");
>> +        return -EINVAL;
>> +    }
>> +
>> +    block->pages_offset = header.pages_offset;
>
> Do you think it is worth sanity checking that 'pages_offset' is aligned
> in some way?
>
> It is nice that we have the flexibility to change the alignment in future
> if we find 1 MB is not optimal, so I wouldn't want to force a 1 MB alignment
> check here. Perhaps we could at least sanity check for alignment to
> TARGET_PAGE_SIZE, to detect a gross data corruption problem?
>

I don't see why not. I'll add it.
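
A minimal sketch of what such a check might look like, assuming a
power-of-two page size. This is only an illustration of the idea discussed
above, not the code that ended up in the series: EXAMPLE_TARGET_PAGE_SIZE,
is_aligned_pow2() and check_pages_offset() are hypothetical names, and the
4096 value stands in for the real TARGET_PAGE_SIZE.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in; in QEMU the value would come from TARGET_PAGE_SIZE. */
#define EXAMPLE_TARGET_PAGE_SIZE 4096u

/* True when value is a multiple of align; align must be a power of two. */
static bool is_aligned_pow2(uint64_t value, uint64_t align)
{
    return (value & (align - 1)) == 0;
}

/*
 * Reject a header whose pages_offset is not page aligned. This does not
 * enforce the 1 MB alignment used by the writer, it only catches grossly
 * corrupted values. Returns 0 when the offset looks sane, -1 otherwise.
 */
static int check_pages_offset(uint64_t pages_offset)
{
    if (!is_aligned_pow2(pages_offset, EXAMPLE_TARGET_PAGE_SIZE)) {
        return -1;
    }
    return 0;
}
```

Checking only against the page size keeps the reader compatible with files
written using a different (larger) alignment in the future.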

Re: [PATCH v2 16/29] migration/ram: Add support for 'fixed-ram' migration restore

Posted by Peter Xu 2 years, 3 months ago

On Wed, Oct 25, 2023 at 11:07:33AM -0300, Fabiano Rosas wrote:
> >> +static int parse_ramblock_fixed_ram(QEMUFile *f, RAMBlock *block, ram_addr_t length)
> >> +{
> >> +    g_autofree unsigned long *bitmap = NULL;
> >> +    struct FixedRamHeader header;
> >> +    size_t bitmap_size;
> >> +    long num_pages;
> >> +    int ret = 0;
> >> +
> >> +    ret = fixed_ram_read_header(f, &header);
> >> +    if (ret < 0) {
> >> +        error_report("Error reading fixed-ram header");
> >> +        return -EINVAL;
> >> +    }
> >> +
> >> +    block->pages_offset = header.pages_offset;
> >
> > Do you think it is worth sanity checking that 'pages_offset' is aligned
> > in some way?
> >
> > It is nice that we have the flexibility to change the alignment in future
> > if we find 1 MB is not optimal, so I wouldn't want to force a 1 MB alignment
> > check here. Perhaps we could at least sanity check for alignment to
> > TARGET_PAGE_SIZE, to detect a gross data corruption problem?
> >
> 
> I don't see why not. I'll add it.

Is there any explanation of why that 1 MB offset is used, and how the
number was chosen?  Thanks,

-- 
Peter Xu

Re: [PATCH v2 16/29] migration/ram: Add support for 'fixed-ram' migration restore

Posted by Daniel P. Berrangé 2 years, 3 months ago

On Tue, Oct 31, 2023 at 03:03:50PM -0400, Peter Xu wrote:
> On Wed, Oct 25, 2023 at 11:07:33AM -0300, Fabiano Rosas wrote:
> > >> +static int parse_ramblock_fixed_ram(QEMUFile *f, RAMBlock *block, ram_addr_t length)
> > >> +{
> > >> +    g_autofree unsigned long *bitmap = NULL;
> > >> +    struct FixedRamHeader header;
> > >> +    size_t bitmap_size;
> > >> +    long num_pages;
> > >> +    int ret = 0;
> > >> +
> > >> +    ret = fixed_ram_read_header(f, &header);
> > >> +    if (ret < 0) {
> > >> +        error_report("Error reading fixed-ram header");
> > >> +        return -EINVAL;
> > >> +    }
> > >> +
> > >> +    block->pages_offset = header.pages_offset;
> > >
> > > Do you think it is worth sanity checking that 'pages_offset' is aligned
> > > in some way?
> > >
> > > It is nice that we have the flexibility to change the alignment in future
> > > if we find 1 MB is not optimal, so I wouldn't want to force a 1 MB alignment
> > > check here. Perhaps we could at least sanity check for alignment to
> > > TARGET_PAGE_SIZE, to detect a gross data corruption problem?
> > >
> > 
> > I don't see why not. I'll add it.
> 
> Is there any explanation of why that 1 MB offset is used, and how the
> number was chosen?  Thanks,

The fixed-ram format is anticipating the use of O_DIRECT.

With O_DIRECT, both the buffers in memory and the file handle offset
have alignment requirements. The buffer alignments are usually page
sized, and QEMU RAM blocks will trivially satisfy those.

The file handle offset alignment varies per filesystem. While you can
query the alignment for the FS holding the file with statx(), that is
not appropriate to do. If a user saves/restores QEMU state to a file, we
must assume there is a chance the user will copy the saved state to a
different filesystem.

IOW, we want alignment to satisfy the likely worst case.

Picking 1 MB is a nice round number that is large enough that it is
almost certainly going to satisfy any filesystem alignment. In fact
it is likely massive overkill. Nonetheless, 1 MB is also still tiny
in the context of guest RAM sizes, so no one is going to notice the
padding holes in the file from this.

IOW, the 1 MB choice is an arbitrary, but somewhat informed choice.
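
To make the padding cost concrete, here is a small sketch of rounding a file
offset up to a 1 MB boundary. It is an illustration only, under the
assumption of a power-of-two alignment: EXAMPLE_FILE_ALIGN, round_up_pow2()
and padding_for() are hypothetical names, not identifiers from the series.

```c
#include <stdint.h>

/* Hypothetical name for the 1 MB alignment discussed above. */
#define EXAMPLE_FILE_ALIGN (1024u * 1024u)

/* Round offset up to the next multiple of align (align: power of two). */
static uint64_t round_up_pow2(uint64_t offset, uint64_t align)
{
    return (offset + align - 1) & ~(align - 1);
}

/* Bytes of padding (a hole in the file) needed to reach the next boundary. */
static uint64_t padding_for(uint64_t offset, uint64_t align)
{
    return round_up_pow2(offset, align) - offset;
}
```

The padding is always less than 1 MB per RAM block, which is why it is
negligible next to multi-gigabyte guest RAM.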

With regards,
Daniel