[PATCH 0/5] tests: improve reliability of migration test

Daniel P. Berrangé posted 5 patches 3 years, 7 months ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20220628105434.295905-1-berrange@redhat.com
Maintainers: Thomas Huth <thuth@redhat.com>, Laurent Vivier <lvivier@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>, Juan Quintela <quintela@redhat.com>, "Dr. David Alan Gilbert" <dgilbert@redhat.com>
tests/qtest/migration-helpers.c | 14 ++++++
tests/qtest/migration-test.c    | 80 ++++++++++-----------------------
2 files changed, 38 insertions(+), 56 deletions(-)
[PATCH 0/5] tests: improve reliability of migration test
Posted by Daniel P. Berrangé 3 years, 7 months ago
Since the TLS tests were added, a few people have reported seeing
hangs in some of the TLS test cases for migration. Debugging
has revealed that in all cases the test was waiting for a STOP
event that never arrived.
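
To illustrate the kind of guard that turns such a hang into a test
failure, here is a minimal sketch of a bounded wait. It is not the code
from this series: wait_for_condition(), condition_fn and
countdown_done() are hypothetical names, and the 1 ms poll interval is
an arbitrary assumption.

```c
#define _POSIX_C_SOURCE 200809L /* for nanosleep() */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback: returns true once the awaited condition holds. */
typedef bool (*condition_fn)(void *opaque);

/*
 * Poll 'check' until it returns true or 'timeout_sec' seconds elapse.
 * Returns true on success and false on timeout, so the caller can fail
 * the test instead of blocking forever.
 */
static bool wait_for_condition(condition_fn check, void *opaque,
                               int timeout_sec)
{
    assert(check != NULL);
    time_t deadline = time(NULL) + timeout_sec;

    for (;;) {
        if (check(opaque)) {
            return true;
        }
        if (time(NULL) >= deadline) {
            return false;
        }
        /* Assumed poll interval of 1 ms between checks. */
        struct timespec ts = { .tv_sec = 0, .tv_nsec = 1000000 };
        nanosleep(&ts, NULL);
    }
}

/* Example condition: counts down and becomes true when it reaches zero. */
static bool countdown_done(void *opaque)
{
    int *remaining = opaque;

    if (*remaining > 0) {
        (*remaining)--;
    }
    return *remaining == 0;
}
```

A caller would pass the real status check as the callback and treat a
false return as a test failure, e.g. with a 120 second limit as in the
first patch of this series.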

The problem is that TLS performance is highly dependent on the
crypto implementation. Some people have been running tests on machines
which are highly efficient at running the guest dirtying workload
but relatively slow at TLS. This has prevented convergence from
being reliably achieved within the configured max downtime.
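
As a rough model of the condition involved: precopy can only complete
once the data still dirty fits into what can be transferred during the
permitted downtime. The sketch below is a simplification under that
assumption, not QEMU's actual code; can_converge() and its parameter
names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified convergence check: can the remaining dirty data be sent
 * within the downtime limit at the measured bandwidth?
 * Note: the multiplication is not guarded against overflow; fine for
 * illustrative values.
 */
static bool can_converge(uint64_t remaining_bytes,
                         uint64_t bandwidth_bytes_per_sec,
                         uint64_t downtime_limit_ms)
{
    assert(bandwidth_bytes_per_sec > 0);

    /* Bytes that can be transferred while the guest is paused. */
    uint64_t budget = bandwidth_bytes_per_sec * downtime_limit_ms / 1000;

    return remaining_bytes <= budget;
}
```

With 50 MB still dirty and an effective 10 MB/s, a 1 second limit gives
a 10 MB budget and the check fails; a 30 second limit gives 300 MB and
it passes. Slow TLS lowers the effective bandwidth, which shrinks the
budget while the guest keeps dirtying memory at full speed.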

Since this test design has been long-standing, I suspect the
lack of convergence is a likely cause of previous hangs we've
seen in various scenarios that resulted in us disabling the test
on s390 TCG, ppc TCG and ppc KVM-PR.

Thus I have suggested we drop these skip conditions, though I would
note that I've not had the ability to actually test the effect that
this has.

Daniel P. Berrangé (5):
  tests: wait max 120 seconds for migration test status changes
  tests: wait for migration completion before looking for STOP event
  tests: increase migration test converge downtime to 30 seconds
  tests: use consistent bandwidth/downtime limits in migration tests
  tests: stop skipping migration test on s390x/ppc64

 tests/qtest/migration-helpers.c | 14 ++++++
 tests/qtest/migration-test.c    | 80 ++++++++++-----------------------
 2 files changed, 38 insertions(+), 56 deletions(-)

-- 
2.36.1


Re: [PATCH 0/5] tests: improve reliability of migration test
Posted by Thomas Huth 3 years, 7 months ago
On 28/06/2022 12.54, Daniel P. Berrangé wrote:
> Since the TLS tests were added, a few people have reported seeing
> hangs in some of the TLS test cases for migration. Debugging
> has revealed that in all cases the test was waiting for a STOP
> event that never arrived.
> 
> The problem is that TLS performance is highly dependent on the
> crypto implementation. Some people have been running tests on machines
> which are highly efficient at running the guest dirtying workload
> but relatively slow at TLS. This has prevented convergence from
> being reliably achieved within the configured max downtime.
> 
> Since this test design has been long-standing, I suspect the
> lack of convergence is a likely cause of previous hangs we've
> seen in various scenarios that resulted in us disabling the test
> on s390 TCG, ppc TCG and ppc KVM-PR.
> 
> Thus I have suggested we drop these skip conditions, though I would
> note that I've not had the ability to actually test the effect that
> this has.
> 
> Daniel P. Berrangé (5):
>    tests: wait max 120 seconds for migration test status changes
>    tests: wait for migration completion before looking for STOP event
>    tests: increase migration test converge downtime to 30 seconds
>    tests: use consistent bandwidth/downtime limits in migration tests
>    tests: stop skipping migration test on s390x/ppc64
> 
>   tests/qtest/migration-helpers.c | 14 ++++++
>   tests/qtest/migration-test.c    | 80 ++++++++++-----------------------
>   2 files changed, 38 insertions(+), 56 deletions(-)

FYI, this series fixes the hang that I saw with the
precopy/unix/tls/x509/override-host test on my RHEL8 s390x host.

Tested-by: Thomas Huth <thuth@redhat.com>