[v2] migration/vfio: Fix a few issues on API misuse or statistic reports

[PATCH v2 00/16] migration/vfio: Fix a few issues on API misuse or statistic reports

Peter Xu posted 16 patches 1 month, 1 week ago

Patches applied successfully.
git fetch https://github.com/patchew-project/qemu tags/patchew/20260421202110.306051-1-peterx@redhat.com

Maintainers: Pierrick Bouvier <pierrick.bouvier@oss.qualcomm.com>, Peter Xu <peterx@redhat.com>, Fabiano Rosas <farosas@suse.de>, Alex Williamson <alex@shazbot.org>, "Cédric Le Goater" <clg@redhat.com>, Halil Pasic <pasic@linux.ibm.com>, Christian Borntraeger <borntraeger@linux.ibm.com>, Jason Herne <jjherne@linux.ibm.com>, Eric Farman <farman@linux.ibm.com>, Matthew Rosato <mjrosato@linux.ibm.com>, Richard Henderson <richard.henderson@linaro.org>, Ilya Leoshkevich <iii@linux.ibm.com>, David Hildenbrand <david@kernel.org>, Cornelia Huck <cohuck@redhat.com>, Eric Blake <eblake@redhat.com>, Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>, John Snow <jsnow@redhat.com>, Markus Armbruster <armbru@redhat.com>, Kevin Wolf <kwolf@redhat.com>, Hanna Reitz <hreitz@redhat.com>

Posted by Peter Xu 1 month, 1 week ago

CI:  https://gitlab.com/peterx/qemu/-/pipelines/2469074018
rfc: https://lore.kernel.org/r/20260319231302.123135-1-peterx@redhat.com
v1:  https://lore.kernel.org/r/20260408165559.157108-1-peterx@redhat.com

v2:
- Added tags
- Patch 4
  - Fix and rework doc for @save_query_pending [Juraj]
  - Trace "exact" in trace_vfio_state_pending() [Avihai]
  - Avoid mentioning "pre-copy" in vfio.rst doc for query [Avihai]
- Patch 12
  - English errors [Fabiano]
- Patch 13
  - Remove " (bytes)" in HMP line [Fabiano]
- Added patch "qemu-iotests: Add query-migrate test for dirty-bitmap"
  - This covers a bug that I found when testing v1
- Added patch "vfio/migration: Add tracepoints for precopy/stopcopy query
  ioctls" to be able to dump the raw results from the two VFIO ioctls
- Replace patch "migration: Make qemu_savevm_query_pending() available
  anytime" with patch "migration: Remember total dirty bytes in mig_stats"
  - I fell back to the "cache the total dirty bytes" idea on this one to
    avoid the complication of save_query_pending() being invoked from
    anywhere.

Overview
========

VFIO migration was merged quite a while ago, but we still see things that
are off here and there.  This series tries to address some of them, though
only based on my limited understanding.

Two major issues I wanted to resolve:

(1) VFIO reports state_pending_{exact|estimate}() differently

It reports stopcopy sizes only in exact() (which therefore includes both
precopy and stopcopy data), while estimate() reports only precopy data.
This violates the API.  It was done that way to trigger a proper sync on the
VFIO ioctls, but that was only a workaround.  This series should fix it by
introducing a stopcopy size reporting facility for vmstate handlers.
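
As a rough illustration of what a merged query could look like, here is a
minimal C sketch.  Everything in it (PendingInfo, FakeDevice,
fake_device_query_pending) is a hypothetical stand-in, not the actual QEMU
or VFIO code.  It only assumes that a single handler reports precopy and
stopcopy bytes separately, and that "exact" merely decides whether the
cached values are refreshed first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical accumulator for one pending-data query (names illustrative). */
typedef struct {
    uint64_t precopy_bytes;   /* data that can still be sent while running */
    uint64_t stopcopy_bytes;  /* data that can only be sent after stopping */
} PendingInfo;

/* Hypothetical device model standing in for a VFIO-like device. */
typedef struct {
    uint64_t hw_precopy_bytes;      /* what the device would report now */
    uint64_t hw_stopcopy_bytes;
    uint64_t cached_precopy_bytes;  /* values from the last refresh */
    uint64_t cached_stopcopy_bytes;
} FakeDevice;

/*
 * One handler for both modes.  "exact" only controls whether the cache is
 * refreshed (standing in for the slow ioctl path); the same two fields are
 * reported either way, so exact and estimate never disagree on meaning.
 */
static void fake_device_query_pending(FakeDevice *dev, bool exact,
                                      PendingInfo *out)
{
    assert(dev != NULL && out != NULL);

    if (exact) {
        dev->cached_precopy_bytes = dev->hw_precopy_bytes;
        dev->cached_stopcopy_bytes = dev->hw_stopcopy_bytes;
    }
    out->precopy_bytes += dev->cached_precopy_bytes;
    out->stopcopy_bytes += dev->cached_stopcopy_bytes;
}
```

With this shape, a caller can sum precopy_bytes for the convergence check
and still see stopcopy_bytes for downtime purposes, in either mode.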

(2) expected_downtime / remaining doesn't take VFIO devices into account

When querying migration, QEMU reports a field called "expected-downtime".
The documentation phrased this almost purely from a RAM perspective, but
ideally it should be an estimate of the blackout window (in milliseconds) if
we switch over at any moment, based on known information.

This did not yet take VFIO into account, especially VFIO devices that may
hold a large amount of device state (like GPUs).
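
To make the idea concrete, a minimal sketch of such an estimate follows.
The function name, its parameters, and the "0 means no estimate" convention
are assumptions for illustration only, not the code in this series: it
simply divides everything that must still move during the blackout (RAM
remaining plus device stopcopy data) by an assumed switchover bandwidth.

```c
#include <stdint.h>

/*
 * Hypothetical helper: estimate the blackout window in milliseconds.
 * Returns 0 when no bandwidth estimate is available, and saturates at
 * UINT64_MAX instead of overflowing.
 */
static uint64_t expected_downtime_ms(uint64_t ram_remaining,
                                     uint64_t device_stopcopy,
                                     double bw_bytes_per_sec)
{
    double ms;

    if (bw_bytes_per_sec <= 0.0) {
        return 0;
    }

    /* Sum in double so that adding two large counters cannot wrap. */
    ms = ((double)ram_remaining + (double)device_stopcopy) * 1000.0 /
         bw_bytes_per_sec;

    /* The literal below rounds to 2^64, the first value that does not fit. */
    if (ms >= 18446744073709551615.0) {
        return UINT64_MAX;
    }
    return (uint64_t)ms;
}
```

For example, 100 MB of RAM alone at 1 GB/s gives 100 ms, while adding 900 MB
of device state at the same bandwidth gives 1000 ms, which is the kind of
gap a RAM-only estimate hides.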

For problem (2), the use case is a mgmt app migrating a VFIO GPU device,
which always needs to adjust the downtime for the migration to converge,
because with such a device involved a normal downtime like 300ms will
usually not suffice.

The issue is that mgmt doesn't have a good way to know exactly how well
precopy is going for the whole system and the GPU device.

The hope is that a fixed expected_downtime will give the mgmt app a
reasonable hint about what downtime to set for a migration to converge.

Meanwhile, with a system-wide "remaining" field introduced, mgmt can query
this result at the beginning of each iteration to tell whether a stall is
happening, IOW, whether it's likely that this migration will not converge
at all.  When a stall is detected, mgmt can start to consider the
expected_downtime value reported above to converge this migration.  See
more on testing below.
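
A mgmt-side stall check along those lines could be sketched as below.  This
is only one possible heuristic under stated assumptions: the helper name,
the sample array, and the "window" parameter are all hypothetical, and the
samples are assumed to be the "remaining" values collected once per
iteration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical heuristic: the migration looks stalled when none of the
 * last `window` samples of "remaining" is lower than the lowest value
 * seen before that window.
 */
static bool migration_looks_stalled(const uint64_t *remaining, size_t n,
                                    size_t window)
{
    uint64_t best_before;
    size_t i;

    /* Need at least one older sample to compare the window against. */
    if (window == 0 || n < window + 1) {
        return false;
    }

    /* Lowest "remaining" observed before the recent window. */
    best_before = remaining[0];
    for (i = 1; i < n - window; i++) {
        if (remaining[i] < best_before) {
            best_before = remaining[i];
        }
    }

    /* Any improvement inside the window means progress is still made. */
    for (i = n - window; i < n; i++) {
        if (remaining[i] < best_before) {
            return false;
        }
    }
    return true;
}
```

Once such a check fires, the app would look at expected_downtime to decide
whether raising the downtime limit is acceptable.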

Tests
=====

Thanks to Cédric for helping test v2.  One thing to mention is that we did
encounter one case where the reported dirty size overflowed uint64_t (in
both expected_downtime and the system-wide remaining data).

Quoting test results from Cédric, migrating a RHEL9 VM with a vGPU
(NVIDIA L4-2B) and an MLX5 VF, from a RHEL9 host (vGPU mdev) to a RHEL10
host (vGPU VF), with the vGPU under load (glxgears):

(qemu) info migrate
Status:                 active
Time (ms):              total=21140, setup=86, exp_down=152455434886355 <---- !?!
Remaining:              16 EiB                                          <---- !?!
RAM info:
  Throughput (Mbps):    967.98
  Sizes:                pagesize=4 KiB, total=4 GiB
  Transfers:            transferred=2.29 GiB, remain=4.7 MiB
    Channels:           precopy=1.91 GiB, multifd=0 B, postcopy=0 B, vfio=387 MiB
    Page Types:         normal=499427, zero=559708
  Page Rates (pps):     transfer=0, dirty=1892
  Others:               dirty_syncs=3

It fixed itself after a few more rounds of iterations, so it didn't
ultimately affect the migration.  Further attempts didn't reproduce it
after I added the tracepoint patch.  It would be good to hear from anyone
who knows whether this was a known driver issue.
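
As a sanity check on the two odd numbers in the output above, the sketch
below shows that they are at least consistent with each other: a remaining
size of about 2^64 bytes (displayed as 16 EiB) divided by the reported
967.98 Mbps lands very close to the reported exp_down.  The helper name is
hypothetical, and it assumes 1 Mbps = 10^6 bits per second and that the
displayed throughput is close to the bandwidth used for the estimate.  It
says nothing about where the bad size came from.

```c
#include <assert.h>

/*
 * Hypothetical helper: milliseconds needed to move `bytes` at `mbps`,
 * assuming 1 Mbps = 1,000,000 bits per second.
 */
static double downtime_ms_from_mbps(double bytes, double mbps)
{
    double bytes_per_ms;

    assert(mbps > 0.0);
    bytes_per_ms = mbps * 1e6 / 8.0 / 1000.0;
    return bytes / bytes_per_ms;
}

/* Relative comparison without libm; both values are expected positive. */
static int within_relative(double value, double reference, double tol)
{
    return value > reference * (1.0 - tol) &&
           value < reference * (1.0 + tol);
}
```

Calling downtime_ms_from_mbps(18446744073709551615.0, 967.98) gives roughly
1.52e14 ms, in line with the exp_down=152455434886355 shown above.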

For detailed testing steps, please refer to v1's cover letter.

Peter Xu (16):
  qemu-iotests: Add query-migrate test for dirty-bitmap
  migration: Fix low possibility downtime violation
  migration/qapi: Rename MigrationStats to MigrationRAMStats
  vfio/migration: Cache stop size in VFIOMigration
  migration/treewide: Merge @state_pending_{exact|estimate} APIs
  migration: Use the new save_query_pending() API directly
  migration: Introduce stopcopy_bytes in save_query_pending()
  vfio/migration: Fix incorrect reporting for VFIO pending data
  migration: Move iteration counter out of RAM
  migration: Introduce a helper to return switchover bw estimate
  migration: Calculate expected downtime on demand
  migration: Fix calculation of expected_downtime to take VFIO info
  migration: Remember total dirty bytes in mig_stats
  migration/qapi: Introduce system-wise "remaining" reports
  migration/qapi: Update unit for avail-switchover-bandwidth
  vfio/migration: Add tracepoints for precopy/stopcopy query ioctls

 docs/about/removed-features.rst               |   2 +-
 docs/devel/migration/main.rst                 |   9 +-
 docs/devel/migration/vfio.rst                 |   9 +-
 qapi/migration.json                           |  32 ++--
 hw/vfio/vfio-migration-internal.h             |   8 +
 include/migration/register.h                  |  59 +++---
 migration/migration-stats.h                   |  20 +-
 migration/migration.h                         |   2 +-
 migration/savevm.h                            |   7 +-
 hw/s390x/s390-stattrib.c                      |   9 +-
 hw/vfio/migration.c                           | 123 +++++++-----
 migration/block-dirty-bitmap.c                |  10 +-
 migration/migration-hmp-cmds.c                |   5 +
 migration/migration.c                         | 177 +++++++++++++-----
 migration/ram.c                               |  40 +---
 migration/savevm.c                            |  42 ++---
 hw/vfio/trace-events                          |   5 +-
 migration/trace-events                        |   3 +-
 .../tests/migrate-bitmaps-postcopy-test       |   6 +
 19 files changed, 322 insertions(+), 246 deletions(-)

-- 
2.53.0

Re: [PATCH v2 00/16] migration/vfio: Fix a few issues on API misuse or statistic reports

Posted by Peter Xu 1 month ago

On Tue, Apr 21, 2026 at 04:20:54PM -0400, Peter Xu wrote:
> Peter Xu (16):
>   qemu-iotests: Add query-migrate test for dirty-bitmap
>   migration: Fix low possibility downtime violation
>   migration/qapi: Rename MigrationStats to MigrationRAMStats
>   vfio/migration: Cache stop size in VFIOMigration
>   migration/treewide: Merge @state_pending_{exact|estimate} APIs
>   migration: Use the new save_query_pending() API directly
>   migration: Introduce stopcopy_bytes in save_query_pending()
>   vfio/migration: Fix incorrect reporting for VFIO pending data
>   migration: Move iteration counter out of RAM
>   migration: Introduce a helper to return switchover bw estimate
>   migration: Calculate expected downtime on demand
>   migration: Fix calculation of expected_downtime to take VFIO info
>   migration: Remember total dirty bytes in mig_stats
>   migration/qapi: Introduce system-wise "remaining" reports
>   migration/qapi: Update unit for avail-switchover-bandwidth
>   vfio/migration: Add tracepoints for precopy/stopcopy query ioctls

I queued patches 2-16, with slight amendments to some patches per reviewers.

-- 
Peter Xu