[PATCH RFC 00/12] migration/vfio: Fix a few issues on API misuse or statistic reports

Peter Xu posted 12 patches 2 days, 13 hours ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20260319231302.123135-1-peterx@redhat.com
Maintainers: Pierrick Bouvier <pierrick.bouvier@linaro.org>, Peter Xu <peterx@redhat.com>, Fabiano Rosas <farosas@suse.de>, Alex Williamson <alex@shazbot.org>, "Cédric Le Goater" <clg@redhat.com>, Halil Pasic <pasic@linux.ibm.com>, Christian Borntraeger <borntraeger@linux.ibm.com>, Jason Herne <jjherne@linux.ibm.com>, Richard Henderson <richard.henderson@linaro.org>, Ilya Leoshkevich <iii@linux.ibm.com>, David Hildenbrand <david@kernel.org>, Eric Farman <farman@linux.ibm.com>, Matthew Rosato <mjrosato@linux.ibm.com>, Cornelia Huck <cohuck@redhat.com>, Eric Blake <eblake@redhat.com>, Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>, John Snow <jsnow@redhat.com>, Markus Armbruster <armbru@redhat.com>
docs/about/removed-features.rst   |   2 +-
docs/devel/migration/main.rst     |   5 +-
docs/devel/migration/vfio.rst     |   9 +-
qapi/migration.json               |   8 +-
hw/vfio/vfio-migration-internal.h |   1 +
include/migration/register.h      |  64 ++++++------
migration/migration-stats.h       |  15 +--
migration/migration.h             |   2 +-
migration/savevm.h                |   7 +-
hw/s390x/s390-stattrib.c          |   8 +-
hw/vfio/migration.c               | 114 ++++++++++++---------
migration/block-dirty-bitmap.c    |   9 +-
migration/migration.c             | 159 +++++++++++++++++++++---------
migration/ram.c                   |  39 ++------
migration/savevm.c                |  37 ++-----
hw/vfio/trace-events              |   3 +-
migration/trace-events            |   1 +
17 files changed, 254 insertions(+), 229 deletions(-)
Posted by Peter Xu 2 days, 13 hours ago
CI: https://gitlab.com/peterx/qemu/-/pipelines/2396755777

VFIO migration was merged quite a while ago, but we still see things off
here and there.  This series tries to address some of them, though only
based on my limited understanding.

This is an RFC series as I don't have VFIO devices to test with.  So one
can also see this as a raw proposal that raises the issues first, with a
solution that hasn't been well tested.  However, I tested the non-VFIO side
and so far I'm not aware of any issue.

Two major issues I wanted to resolve here:

(1) VFIO reports state_pending_{exact|estimate}() differently

It reports stop-only sizes only in exact() (so exact() covers both precopy
and stopcopy data), while in estimate() it reports only precopy data.  This
violates the API.  My guess is that it was done this way only to trigger a
proper sync on the VFIO ioctls.  This series should fix it by introducing a
stopcopy size reporting facility for vmstate handlers.
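
To illustrate the intended contract, here is a minimal sketch and not the
code in this series.  PendingReport, DeviceSizes and query_pending() are
hypothetical names made up for this example.  The only point is that the
estimate and exact paths report the same categories of data and differ only
in how fresh the numbers are.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical report type: what one handler says is still pending. */
typedef struct PendingReport {
    uint64_t precopy_bytes;   /* can be sent while the guest runs */
    uint64_t stopcopy_bytes;  /* can only be sent once the guest stops */
} PendingReport;

/* Hypothetical per-device sizes, standing in for values read from a device. */
typedef struct DeviceSizes {
    uint64_t cached_precopy;  /* last known precopy size (cheap) */
    uint64_t fresh_precopy;   /* precopy size after a full sync (costly) */
    uint64_t stopcopy;        /* stop-copy-only device state */
} DeviceSizes;

/*
 * Both modes fill in the same two categories.  "exact" only changes how
 * fresh the precopy number is, never which categories are reported.
 */
static PendingReport query_pending(const DeviceSizes *d, bool exact)
{
    PendingReport r;

    assert(d != NULL);
    r.precopy_bytes = exact ? d->fresh_precopy : d->cached_precopy;
    r.stopcopy_bytes = d->stopcopy;
    return r;
}
```

With this shape a caller can compare estimate and exact results directly,
because neither of them hides the stopcopy part.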

(2) expected_downtime doesn't take VFIO devices into account

When querying migration, QEMU reports a field called "expected-downtime".
The documentation phrases this almost entirely from a RAM perspective, but
ideally it should be an estimate of the blackout window (in milliseconds)
if we switch over at any time, based on known information.

This doesn't yet take VFIO into account, especially VFIO devices that may
carry a large amount of device state (like GPUs).

For problem (2), the use case is that a mgmt app migrating a VFIO GPU
device always needs to adjust the downtime limit for migration to converge,
because with such a device a normal downtime like 300ms will usually not
suffice.
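
As a rough sketch of why a RAM-only number misleads here, the arithmetic
below assumes the blackout window is simply the remaining bytes divided by
an estimated switchover bandwidth.  expected_downtime_ms() and its
parameters are hypothetical; the series may compute this differently.

```c
#include <stdint.h>

/*
 * Hypothetical helper: estimated blackout window in milliseconds if
 * switchover happened now.  bw_bytes_per_sec is an assumed estimate of the
 * bandwidth available during switchover.
 */
static uint64_t expected_downtime_ms(uint64_t ram_dirty_bytes,
                                     uint64_t dev_stopcopy_bytes,
                                     uint64_t bw_bytes_per_sec)
{
    uint64_t total = ram_dirty_bytes + dev_stopcopy_bytes;

    if (bw_bytes_per_sec == 0) {
        return UINT64_MAX;   /* no bandwidth estimate yet: unknown */
    }
    /* Divide in floating point so the multiply by 1000 cannot overflow. */
    return (uint64_t)((double)total * 1000.0 / (double)bw_bytes_per_sec);
}
```

For example, with 100 MB of dirty RAM and an assumed 1 GB/s link the
RAM-only estimate is 100 ms, which fits a 300ms limit.  Adding 2 GB of
device state that can only be sent at stop time raises it to 2100 ms.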

The issue with that is the mgmt app doesn't have a good way to know exactly
how well precopy is going for the whole system and the GPU device.

The hope is that fixing expected_downtime at least provides one way for the
mgmt app to monitor this field in query-migrate at the start of each
iteration (by enabling events) to guess the progress, and that it also
provides a relatively reasonable hint for downtime, at least in the case of
GPUs.  I tested nothing for this part, so it's only a guess for now, but the
wish is that it'll be easier to use than before.

Without a GPU, mgmt can also monitor this field (which so far is the only
global field that one can query) to know how the iterations are making
progress.  The mgmt should expect to see expected_downtime shrink over
iterations, and when it stops shrinking for a few iterations it may be wise
to do something about it.
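
One possible way a mgmt app could detect "stopped shrinking" is sketched
below.  This is only an illustration: downtime_stalled() and the window
policy are assumptions, and the samples are expected_downtime values that
the app has collected once per iteration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical check: return true when none of the last `window` samples
 * is lower than the sample taken just before them, i.e. expected_downtime
 * made no progress over the last `window` iterations.
 */
static bool downtime_stalled(const uint64_t *samples, size_t n, size_t window)
{
    uint64_t baseline, best;
    size_t i;

    if (window == 0 || n <= window) {
        return false;   /* not enough history to judge */
    }
    baseline = samples[n - window - 1];
    best = samples[n - window];
    for (i = n - window + 1; i < n; i++) {
        if (samples[i] < best) {
            best = samples[i];
        }
    }
    return best >= baseline;
}
```

What to do when this returns true (raise the downtime limit, throttle the
guest, or give up) is a policy decision left to the mgmt app.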

Tests and reviews are very welcome.

Thanks,

Peter Xu (12):
  migration: Fix low possibility downtime violation
  migration/qapi: Rename MigrationStats to MigrationRAMStats
  vfio/migration: Throttle vfio_save_block() on data size to read
  vfio/migration: Cache stop size in VFIOMigration
  migration/treewide: Merge @state_pending_{exact|estimate} APIs
  migration: Use the new save_query_pending() API directly
  migration: Introduce stopcopy_bytes in save_query_pending()
  vfio/migration: Fix incorrect reporting for VFIO pending data
  migration: Make iteration counter out of RAM
  migration: Introduce a helper to return switchover bw estimate
  migration: Calculate expected downtime on demand
  migration: Fix calculation of expected_downtime to take VFIO info

-- 
2.50.1