migration/rdma: change default rdma chunk size to 64MiB

[PATCH] migration/rdma: change default rdma chunk size to 64MiB

Posted by Samuel Zhang 2 weeks, 2 days ago

The 1MiB default dates back to the original RDMA implementation in
2013 (commit 2da776db48), and is too conservative for modern hardware.

64MiB captures most of the throughput gain (~10x over 1MiB) while
keeping transferred data low.  Larger chunks cause more data to be
retransferred per dirty page, so the largest chunk size is not
necessarily optimal (see 1024MiB row).  The x-rdma-chunk-size
parameter remains available for user tuning.

Test config: BlueField-3 ConnectX-7, 8GB VM RAM, pin-all off,
  `stress-ng --vm 4 --vm-bytes 1G --vm-method rand-set`

chunk_size  total(ms)  down(ms)  Throughput(Mbps)  transferred
1m            45,156    1,166          1,252.50     6.46 GiB
32m           15,034    1,864          3,401.26     5.57 GiB
64m            4,492    1,554         13,637.46     5.75 GiB
128m           3,940    1,662         16,860.59     6.06 GiB
1024m          3,665    2,238         24,676.59     8.04 GiB
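
The "~10x" figure can be rechecked directly from the table above. The snippet below is only an illustrative calculation over the posted numbers; the struct and function names are invented for this sketch and are not part of QEMU.

```c
#include <assert.h>
#include <stddef.h>

/* One row of the benchmark table above (values copied verbatim). */
struct chunk_result {
    unsigned chunk_mib;       /* x-rdma-chunk-size in MiB */
    double throughput_mbps;   /* reported throughput */
    double transferred_gib;   /* total data transferred */
};

static const struct chunk_result results[] = {
    {    1,  1252.50, 6.46 },
    {   32,  3401.26, 5.57 },
    {   64, 13637.46, 5.75 },
    {  128, 16860.59, 6.06 },
    { 1024, 24676.59, 8.04 },
};

/* Hypothetical helper: look up a row by chunk size, NULL if absent. */
const struct chunk_result *find_result(unsigned chunk_mib)
{
    for (size_t i = 0; i < sizeof(results) / sizeof(results[0]); i++) {
        if (results[i].chunk_mib == chunk_mib) {
            return &results[i];
        }
    }
    return NULL;
}

/* Throughput of a chunk size relative to the 1MiB baseline row. */
double throughput_gain(unsigned chunk_mib)
{
    const struct chunk_result *base = find_result(1);
    const struct chunk_result *r = find_result(chunk_mib);

    assert(base != NULL && base->throughput_mbps > 0.0);
    return r ? r->throughput_mbps / base->throughput_mbps : 0.0;
}

/* Data transferred at one chunk size relative to another. */
double transferred_ratio(unsigned chunk_mib, unsigned ref_mib)
{
    const struct chunk_result *r = find_result(chunk_mib);
    const struct chunk_result *ref = find_result(ref_mib);

    return (r && ref) ? r->transferred_gib / ref->transferred_gib : 0.0;
}
```

With these numbers, 64MiB gives roughly 10.9x the 1MiB throughput and 1024MiB roughly 19.7x, while 1024MiB transfers about 40% more data than 64MiB.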

Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com>
---
 migration/options.c | 2 +-
 qapi/migration.json | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/migration/options.c b/migration/options.c
index 5cbfd29099..ea2137372c 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -91,7 +91,7 @@ const PropertyInfo qdev_prop_StrOrNull;
 
 #define DEFAULT_MIGRATE_VCPU_DIRTY_LIMIT_PERIOD     1000    /* milliseconds */
 #define DEFAULT_MIGRATE_VCPU_DIRTY_LIMIT            1       /* MB/s */
-#define DEFAULT_MIGRATE_X_RDMA_CHUNK_SIZE           MiB
+#define DEFAULT_MIGRATE_X_RDMA_CHUNK_SIZE           (64 * MiB)
 
 const Property migration_properties[] = {
     DEFINE_PROP_BOOL("store-global-state", MigrationState,
diff --git a/qapi/migration.json b/qapi/migration.json
index 0db115ec5e..b3815f0594 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1009,7 +1009,7 @@
 #     the remainder its arguments.  (Since 10.2)
 #
 # @x-rdma-chunk-size: RDMA memory registration chunk size in bytes.
-#     Default is 1MiB.  Must be a power of 2 in the range
+#     Default is 64MiB.  Must be a power of 2 in the range
 #     [1MiB, 1024MiB].  Only applies when migrating via RDMA.
 #     Must be set to the same value on both source and destination
 #     before migration starts.  (Since 11.1)
-- 
2.43.7
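
The constraint stated in the documentation hunk above (a power of 2 in the range [1MiB, 1024MiB]) can be expressed as a small standalone check. This is a sketch of the documented rule only, not the validation code QEMU actually uses; the function names and the local `MiB` macro are stand-ins chosen for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for this sketch; QEMU provides its own MiB macro. */
#define MiB (1024ULL * 1024ULL)

/* Bounds taken from the documented range [1MiB, 1024MiB]. */
#define CHUNK_SIZE_MIN (1 * MiB)
#define CHUNK_SIZE_MAX (1024 * MiB)

/* Returns true if size is a power of 2 within the documented range. */
bool rdma_chunk_size_is_valid(uint64_t size)
{
    bool power_of_two = size != 0 && (size & (size - 1)) == 0;

    return power_of_two && size >= CHUNK_SIZE_MIN && size <= CHUNK_SIZE_MAX;
}

/* Usage example exercising the documented bounds. */
void rdma_chunk_size_examples(void)
{
    assert(rdma_chunk_size_is_valid(64 * MiB));     /* proposed default */
    assert(!rdma_chunk_size_is_valid(48 * MiB));    /* not a power of 2 */
    assert(!rdma_chunk_size_is_valid(2048 * MiB));  /* above the range */
}
```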

Re: [PATCH] migration/rdma: change default rdma chunk size to 64MiB

Posted by Zhijian Li (Fujitsu) 1 week, 5 days ago

Samuel,


Thanks for the patch.

On 14/05/2026 11:18, Samuel Zhang wrote:
> The 1MiB default dates back to the original RDMA implementation in
> 2013 (commit 2da776db48), and is too conservative for modern hardware.
> 
> 64MiB captures most of the throughput gain (~10x over 1MiB) while
> keeping transferred data low.  Larger chunks cause more data to be
> retransferred per dirty page, so the largest chunk size is not
> necessarily optimal (see 1024MiB row).  The x-rdma-chunk-size
> parameter remains available for user tuning.
> 
> Test config: BlueField-3 ConnectX-7, 8GB VM RAM, pin-all off,
>    `stress-ng --vm 4 --vm-bytes 1G --vm-method rand-set`
> 
> chunk_size  total(ms)  down(ms)  Throughput(Mbps)  transferred
> 1m            45,156    1,166          1,252.50     6.46 GiB
> 32m           15,034    1,864          3,401.26     5.57 GiB
> 64m            4,492    1,554         13,637.46     5.75 GiB
> 128m           3,940    1,662         16,860.59     6.06 GiB
> 1024m          3,665    2,238         24,676.59     8.04 GiB
> 
> Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com>
> ---
>   migration/options.c | 2 +-
>   qapi/migration.json | 2 +-
>   2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/options.c b/migration/options.c
> index 5cbfd29099..ea2137372c 100644
> --- a/migration/options.c
> +++ b/migration/options.c
> @@ -91,7 +91,7 @@ const PropertyInfo qdev_prop_StrOrNull;
>   
>   #define DEFAULT_MIGRATE_VCPU_DIRTY_LIMIT_PERIOD     1000    /* milliseconds */
>   #define DEFAULT_MIGRATE_VCPU_DIRTY_LIMIT            1       /* MB/s */
> -#define DEFAULT_MIGRATE_X_RDMA_CHUNK_SIZE           MiB
> +#define DEFAULT_MIGRATE_X_RDMA_CHUNK_SIZE           (64 * MiB)


I have a concern about backward compatibility.

AFAIK, changing the default chunk size could break RDMA migration between
hosts running different QEMU versions. If this happens, isn't the error
message too unclear for a user to understand that the failure is due to
a mismatch in 'x-rdma-chunk-size'?

[1] https://lore.kernel.org/qemu-devel/6f1df732-5c2c-4f1b-a59a-aa6af5566505@fujitsu.com/

Thanks
Zhijian
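
To make the concern concrete: if neither side sets x-rdma-chunk-size, an unpatched QEMU would use 1MiB and a patched one 64MiB. The sketch below shows the kind of explicit diagnostic that would name the parameter in such a failure. It is purely hypothetical: the function name, and the assumption that one side can learn the peer's chunk size, are not taken from the QEMU source.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical check, not QEMU code. Returns 0 when both sides agree;
 * otherwise writes a message naming the parameter into buf and
 * returns -1.
 */
int check_chunk_size_match(uint64_t local, uint64_t remote,
                           char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);

    if (local == remote) {
        return 0;
    }
    snprintf(buf, buflen,
             "RDMA chunk size mismatch: local %" PRIu64 " bytes, "
             "peer %" PRIu64 " bytes; set 'x-rdma-chunk-size' to the "
             "same value on both source and destination",
             local, remote);
    return -1;
}
```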


>   
>   const Property migration_properties[] = {
>       DEFINE_PROP_BOOL("store-global-state", MigrationState,
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 0db115ec5e..b3815f0594 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -1009,7 +1009,7 @@
>   #     the remainder its arguments.  (Since 10.2)
>   #
>   # @x-rdma-chunk-size: RDMA memory registration chunk size in bytes.
> -#     Default is 1MiB.  Must be a power of 2 in the range
> +#     Default is 64MiB.  Must be a power of 2 in the range
>   #     [1MiB, 1024MiB].  Only applies when migrating via RDMA.
>   #     Must be set to the same value on both source and destination
>   #     before migration starts.  (Since 11.1)

Re: [PATCH] migration/rdma: change default rdma chunk size to 64MiB

Posted by Daniel P. Berrangé 1 week, 5 days ago

On Mon, May 18, 2026 at 02:17:58AM +0000, Zhijian Li (Fujitsu) wrote:
> Samuel,
> 
> 
> Thanks for the patch.
> 
> On 14/05/2026 11:18, Samuel Zhang wrote:
> > The 1MiB default dates back to the original RDMA implementation in
> > 2013 (commit 2da776db48), and is too conservative for modern hardware.
> > 
> > 64MiB captures most of the throughput gain (~10x over 1MiB) while
> > keeping transferred data low.  Larger chunks cause more data to be
> > retransferred per dirty page, so the largest chunk size is not
> > necessarily optimal (see 1024MiB row).  The x-rdma-chunk-size
> > parameter remains available for user tuning.
> > 
> > Test config: BlueField-3 ConnectX-7, 8GB VM RAM, pin-all off,
> >    `stress-ng --vm 4 --vm-bytes 1G --vm-method rand-set`
> > 
> > chunk_size  total(ms)  down(ms)  Throughput(Mbps)  transferred
> > 1m            45,156    1,166          1,252.50     6.46 GiB
> > 32m           15,034    1,864          3,401.26     5.57 GiB
> > 64m            4,492    1,554         13,637.46     5.75 GiB
> > 128m           3,940    1,662         16,860.59     6.06 GiB
> > 1024m          3,665    2,238         24,676.59     8.04 GiB
> > 
> > Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com>
> > ---
> >   migration/options.c | 2 +-
> >   qapi/migration.json | 2 +-
> >   2 files changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/migration/options.c b/migration/options.c
> > index 5cbfd29099..ea2137372c 100644
> > --- a/migration/options.c
> > +++ b/migration/options.c
> > @@ -91,7 +91,7 @@ const PropertyInfo qdev_prop_StrOrNull;
> >   
> >   #define DEFAULT_MIGRATE_VCPU_DIRTY_LIMIT_PERIOD     1000    /* milliseconds */
> >   #define DEFAULT_MIGRATE_VCPU_DIRTY_LIMIT            1       /* MB/s */
> > -#define DEFAULT_MIGRATE_X_RDMA_CHUNK_SIZE           MiB
> > +#define DEFAULT_MIGRATE_X_RDMA_CHUNK_SIZE           (64 * MiB)
> 
> 
> I have a concern about backward compatibility.
> 
> AFAIK, changing the default chunk size could break RDMA migration between hosts running different QEMU versions.
> If this happens, the error message is not clear enough for a user to understand that the failure
> is due to a mismatch in 'x-rdma-chunk-size'?

Oh that's rather unfortunate. Even though x-rdma-chunk-size is marked
experimental/unstable, we can't change its default value if it breaks
migration compatibility out of the box :-(  Libvirt (or equivalent)
would need to negotiate the chunk size to a larger value, which would
mean we need to declare x-rdma-chunk-size stable by removing the x-
prefix.

With regards,
Daniel
-- 
|: https://berrange.com       ~~        https://hachyderm.io/@berrange :|
|: https://libvirt.org          ~~          https://entangle-photo.org :|
|: https://pixelfed.art/berrange   ~~    https://fstop138.berrange.com :|

Re: [PATCH] migration/rdma: change default rdma chunk size to 64MiB

Posted by Markus Armbruster 1 week, 5 days ago

"Zhijian Li (Fujitsu)" <lizhijian@fujitsu.com> writes:

> Samuel,
>
>
> Thanks for the patch.
>
> On 14/05/2026 11:18, Samuel Zhang wrote:
>> The 1MiB default dates back to the original RDMA implementation in
>> 2013 (commit 2da776db48), and is too conservative for modern hardware.
>> 
>> 64MiB captures most of the throughput gain (~10x over 1MiB) while
>> keeping transferred data low.  Larger chunks cause more data to be
>> retransferred per dirty page, so the largest chunk size is not
>> necessarily optimal (see 1024MiB row).  The x-rdma-chunk-size
>> parameter remains available for user tuning.
>> 
>> Test config: BlueField-3 ConnectX-7, 8GB VM RAM, pin-all off,
>>    `stress-ng --vm 4 --vm-bytes 1G --vm-method rand-set`
>> 
>> chunk_size  total(ms)  down(ms)  Throughput(Mbps)  transferred
>> 1m            45,156    1,166          1,252.50     6.46 GiB
>> 32m           15,034    1,864          3,401.26     5.57 GiB
>> 64m            4,492    1,554         13,637.46     5.75 GiB
>> 128m           3,940    1,662         16,860.59     6.06 GiB
>> 1024m          3,665    2,238         24,676.59     8.04 GiB
>> 
>> Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com>
>> ---
>>  migration/options.c | 2 +-
>>  qapi/migration.json | 2 +-
>>  2 files changed, 2 insertions(+), 2 deletions(-)
>> 
>> diff --git a/migration/options.c b/migration/options.c
>> index 5cbfd29099..ea2137372c 100644
>> --- a/migration/options.c
>> +++ b/migration/options.c
>> @@ -91,7 +91,7 @@ const PropertyInfo qdev_prop_StrOrNull;
>>   
>>  #define DEFAULT_MIGRATE_VCPU_DIRTY_LIMIT_PERIOD     1000    /* milliseconds */
>>  #define DEFAULT_MIGRATE_VCPU_DIRTY_LIMIT            1       /* MB/s */
>> -#define DEFAULT_MIGRATE_X_RDMA_CHUNK_SIZE           MiB
>> +#define DEFAULT_MIGRATE_X_RDMA_CHUNK_SIZE           (64 * MiB)
>
>
> I have a concern about backward compatibility.

@x-rdma-chunk-size is marked unstable, thus doesn't come with
compatibility promises.

Does marking it unstable still make sense?

> AFAIK, changing the default chunk size could break RDMA migration between hosts running different QEMU versions.
> If this happens, the error message is not clear enough for a user to understand that the failure
> is due to a mismatch in 'x-rdma-chunk-size'?
>
> [1] https://lore.kernel.org/qemu-devel/6f1df732-5c2c-4f1b-a59a-aa6af5566505@fujitsu.com/
>
> Thanks
> Zhijian