[PATCH] migration: Fix possible division by zero on calc expected downtime

Peter Xu posted 1 patch 2 weeks, 5 days ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20260511141049.1293449-1-peterx@redhat.com
Maintainers: Peter Xu <peterx@redhat.com>, Fabiano Rosas <farosas@suse.de>
There is a newer version of this series
migration/migration.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
[PATCH] migration: Fix possible division by zero on calc expected downtime
Posted by Peter Xu 2 weeks, 5 days ago
Commit dd4fe8844b changed the reporting of expected downtime behavior, so
that the value will be calculated on-demand.  One side effect on the change
is QEMU will allow the calculation to happen anytime even if there's no
transfer happening for a short while.

PeterM reported an ubsan report from clang when running migration-test with
aarch64 binary on x86_64 hosts.  I can also reproduce if I run the test
concurrently so some of the src QEMU may not get chance to push any data,
causing mbps to be 0:

../migration/migration.c:1051:12: runtime error: -nan is outside the range of representable values of type 'long'

Fix it by checking mbps non-zero before the devision.

Link: https://lore.kernel.org/r/CAFEAcA-MYH6C39xO0OLx4-M5pKurJpurwRsMqZe9q=W-NShAbw@mail.gmail.com
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Fixes: dd4fe8844b ("migration: Calculate expected downtime on demand")
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index b6f78eb3ac..b06e577328 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1044,12 +1044,18 @@ static bool migrate_show_downtime(MigrationState *s)
 /* Return expected downtime (unit: milliseconds) */
 int64_t migration_downtime_calc_expected(MigrationState *s)
 {
+    double bw;
+
     if (mig_stats.dirty_sync_count <= 1) {
         return migrate_downtime_limit();
     }
 
-    return mig_stats.dirty_bytes_last_sync /
-        migration_get_switchover_bw(s) * 1000;
+    bw = migration_get_switchover_bw(s) * 1000;
+    if (!bw) {
+        return INT64_MAX;
+    }
+
+    return mig_stats.dirty_bytes_last_sync / bw;
 }
 
 static void populate_time_info(MigrationInfo *info, MigrationState *s)
-- 
2.53.0
Re: [PATCH] migration: Fix possible division by zero on calc expected downtime
Posted by Peter Maydell 2 weeks, 5 days ago
On Mon, 11 May 2026 at 15:10, Peter Xu <peterx@redhat.com> wrote:
>
> Commit dd4fe8844b changed the reporting of expected downtime behavior, so
> that the value will be calculated on-demand.  One side effect on the change
> is QEMU will allow the calculation to happen anytime even if there's no
> transfer happening for a short while.
>
> PeterM reported an ubsan report from clang when running migration-test with
> aarch64 binary on x86_64 hosts.  I can also reproduce if I run the test
> concurrently so some of the src QEMU may not get chance to push any data,
> causing mbps to be 0:
>
> ../migration/migration.c:1051:12: runtime error: -nan is outside the range of representable values of type 'long'
>
> Fix it by checking mbps non-zero before the devision.
>
> Link: https://lore.kernel.org/r/CAFEAcA-MYH6C39xO0OLx4-M5pKurJpurwRsMqZe9q=W-NShAbw@mail.gmail.com
> Reported-by: Peter Maydell <peter.maydell@linaro.org>
> Fixes: dd4fe8844b ("migration: Calculate expected downtime on demand")
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/migration.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index b6f78eb3ac..b06e577328 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1044,12 +1044,18 @@ static bool migrate_show_downtime(MigrationState *s)
>  /* Return expected downtime (unit: milliseconds) */
>  int64_t migration_downtime_calc_expected(MigrationState *s)
>  {
> +    double bw;
> +
>      if (mig_stats.dirty_sync_count <= 1) {
>          return migrate_downtime_limit();
>      }
>
> -    return mig_stats.dirty_bytes_last_sync /
> -        migration_get_switchover_bw(s) * 1000;
> +    bw = migration_get_switchover_bw(s) * 1000;
> +    if (!bw) {
> +        return INT64_MAX;
> +    }
> +
> +    return mig_stats.dirty_bytes_last_sync / bw;

I think this is still UB if bw is a value that's nonzero
but still small enough that mig_stats.dirty_bytes_last_sync / bw
is greater than INT64_MAX. It might be more robust to do
something like:

    double expected = mig_stats.dirty_bytes_last_sync /
        migration_get_switchover_bw(s) * 1000;

    if (expected >= (double)INT64_MAX) {
        return INT64_MAX;
    }
    return (int64_t)expected;

(which will also handle the infinity case, because
in IEEE fp infinity is larger than any finite value).

-- PMM
Re: [PATCH] migration: Fix possible division by zero on calc expected downtime
Posted by Peter Xu 2 weeks, 5 days ago
On Mon, May 11, 2026 at 03:26:06PM +0100, Peter Maydell wrote:
> On Mon, 11 May 2026 at 15:10, Peter Xu <peterx@redhat.com> wrote:
> >
> > Commit dd4fe8844b changed the reporting of expected downtime behavior, so
> > that the value will be calculated on-demand.  One side effect on the change
> > is QEMU will allow the calculation to happen anytime even if there's no
> > transfer happening for a short while.
> >
> > PeterM reported an ubsan report from clang when running migration-test with
> > aarch64 binary on x86_64 hosts.  I can also reproduce if I run the test
> > concurrently so some of the src QEMU may not get chance to push any data,
> > causing mbps to be 0:
> >
> > ../migration/migration.c:1051:12: runtime error: -nan is outside the range of representable values of type 'long'
> >
> > Fix it by checking mbps non-zero before the devision.
> >
> > Link: https://lore.kernel.org/r/CAFEAcA-MYH6C39xO0OLx4-M5pKurJpurwRsMqZe9q=W-NShAbw@mail.gmail.com
> > Reported-by: Peter Maydell <peter.maydell@linaro.org>
> > Fixes: dd4fe8844b ("migration: Calculate expected downtime on demand")
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  migration/migration.c | 10 ++++++++--
> >  1 file changed, 8 insertions(+), 2 deletions(-)
> >
> > diff --git a/migration/migration.c b/migration/migration.c
> > index b6f78eb3ac..b06e577328 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -1044,12 +1044,18 @@ static bool migrate_show_downtime(MigrationState *s)
> >  /* Return expected downtime (unit: milliseconds) */
> >  int64_t migration_downtime_calc_expected(MigrationState *s)
> >  {
> > +    double bw;
> > +
> >      if (mig_stats.dirty_sync_count <= 1) {
> >          return migrate_downtime_limit();
> >      }
> >
> > -    return mig_stats.dirty_bytes_last_sync /
> > -        migration_get_switchover_bw(s) * 1000;
> > +    bw = migration_get_switchover_bw(s) * 1000;
> > +    if (!bw) {
> > +        return INT64_MAX;
> > +    }
> > +
> > +    return mig_stats.dirty_bytes_last_sync / bw;
> 
> I think this is still UB if bw is a value that's nonzero
> but still small enough that mig_stats.dirty_bytes_last_sync / bw
> is greater than INT64_MAX. It might be more robust to do
> something like:
> 
>     double expected = mig_stats.dirty_bytes_last_sync /
>         migration_get_switchover_bw(s) * 1000;
> 
>     if (expected >= (double)INT64_MAX) {
>         return INT64_MAX;
>     }
>     return (int64_t)expected;
> 
> (which will also handle the infinity case, because
> in IEEE fp infinity is larger than any finite value).

I almost treated mbps as integer, because when it's non-zero it almost
should be at least ~10byte/s (our unit is (uint)transferred / 100ms).. so I
doubt if the issue can happen. But indeed it's theoretically flawed.

I'll repost, thanks for the careful review, Peter.

-- 
Peter Xu