[PATCH] tests/migration: Add some slack to auto converge

Dr. David Alan Gilbert (git) posted 1 patch 5 years, 9 months ago
Test docker-quick@centos7 passed
Test FreeBSD passed
Test docker-mingw@fedora passed
Test checkpatch passed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20200210195731.177595-1-dgilbert@redhat.com
Maintainers: "Dr. David Alan Gilbert" <dgilbert@redhat.com>, Juan Quintela <quintela@redhat.com>, Thomas Huth <thuth@redhat.com>, Laurent Vivier <lvivier@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>
tests/qtest/migration-test.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
[PATCH] tests/migration: Add some slack to auto converge
Posted by Dr. David Alan Gilbert (git) 5 years, 9 months ago
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

There's an assert in autoconverge that checks that we quit the
iteration when we go below the expected threshold.  Philippe
saw a case where this assert fired with the measured value
slightly over the threshold. (about 3k out of a few million).

I can think of two reasons:
  a) Rounding errors
  b) That after we make the decision to quit iteration we do one
    more sync and that sees a few more dirty pages.

So add 1% slack to the assertion, that should cover a and
most cases of b, probably all we'll see for the test.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 tests/qtest/migration-test.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index cf27ebbc9d..a78ac0c7da 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -1237,7 +1237,8 @@ static void test_migrate_auto_converge(void)
     g_assert_cmpint(percentage, <=, max_pct);
 
     remaining = read_ram_property_int(from, "remaining");
-    g_assert_cmpint(remaining, <, expected_threshold);
+    g_assert_cmpint(remaining, <,
+                    (expected_threshold + expected_threshold / 100));
 
     migrate_continue(from, "pre-switchover");
 
-- 
2.24.1


Re: [PATCH] tests/migration: Add some slack to auto converge
Posted by Juan Quintela 5 years, 9 months ago
"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> There's an assert in autoconverge that checks that we quit the
> iteration when we go below the expected threshold.  Philippe
> saw a case where this assert fired with the measured value
> slightly over the threshold. (about 3k out of a few million).
>
> I can think of two reasons:
>   a) Rounding errors
>   b) That after we make the decision to quit iteration we do one
>     more sync and that sees a few more dirty pages.
>
> So add 1% slack to the assertion, that should cover a and
> most cases of b, probably all we'll see for the test.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

It shouldn't matter really.  And if we are seeing that problem.


Re: [PATCH] tests/migration: Add some slack to auto converge
Posted by Peter Xu 5 years, 9 months ago
On Mon, Feb 10, 2020 at 07:57:31PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> There's an assert in autoconverge that checks that we quit the
> iteration when we go below the expected threshold.  Philippe
> saw a case where this assert fired with the measured value
> slightly over the threshold. (about 3k out of a few million).
> 
> I can think of two reasons:
>   a) Rounding errors

Yes, I feel like we're having approximate values here and there in
migration code. One simple example is that we calculate bandwidth
using sent bytes and timestamps, however our sent bytes should in most
cases increase with target page size only (4096 on x86), and then to
divide an ~100ms value which changes too from time to time, then
everything comes out of these inaccurate numbers could be inaccurate..

>   b) That after we make the decision to quit iteration we do one
>     more sync and that sees a few more dirty pages.

True..  I also didn't see how to guarantee the extra 1% will cover
this case either...

> 
> So add 1% slack to the assertion, that should cover a and
> most cases of b, probably all we'll see for the test.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

... But if what we see are always 3K or around then maybe we only need
to fixup that special value.  Assuming this is for sure going to
unbreak tests:

Reviewed-by: Peter Xu <peterx@redhat.com>

(PS: I would probably remove this check as a whole :)

> ---
>  tests/qtest/migration-test.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> index cf27ebbc9d..a78ac0c7da 100644
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -1237,7 +1237,8 @@ static void test_migrate_auto_converge(void)
>      g_assert_cmpint(percentage, <=, max_pct);
>  
>      remaining = read_ram_property_int(from, "remaining");
> -    g_assert_cmpint(remaining, <, expected_threshold);
> +    g_assert_cmpint(remaining, <,
> +                    (expected_threshold + expected_threshold / 100));
>  
>      migrate_continue(from, "pre-switchover");
>  
> -- 
> 2.24.1
> 

-- 
Peter Xu