[PATCH 0/2] tests/migration: Fix migration-test slowdown

Juan Quintela posted 2 patches 1 year ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20230412142001.16501-1-quintela@redhat.com
Maintainers: Juan Quintela <quintela@redhat.com>, "Dr. David Alan Gilbert" <dgilbert@redhat.com>, Thomas Huth <thuth@redhat.com>, Laurent Vivier <lvivier@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>
tests/qtest/migration-test.c | 26 ++++++++++++++++++++++----
1 file changed, 22 insertions(+), 4 deletions(-)
[PATCH 0/2] tests/migration: Fix migration-test slowdown
Posted by Juan Quintela 1 year ago
Since commit:

commit 1bfc8dde505f1e6a92697c52aa9b09e81b54c78f
Author: Dr. David Alan Gilbert <dgilbert@redhat.com>
Date:   Mon Mar 6 15:26:12 2023 +0000

    tests/migration: Tweek auto converge limits check

    Thomas found an autoconverge test failure where the
    migration completed before the autoconverge had kicked in.
    [...]

migration-test has become very slow.
On my laptop, before that commit migration-test takes 2min10seconds
After that commit, it takes around 11minutes

We can't revert it because it fixes a real problem when the host
machine is overloaded.  See the comment on test_migrate_auto_converge().

So I did two things here:

- Once that we have setup up precopy, we can move to full speed, no
  need to continue at 3MB/s.  That slowed everything a lot.

- Only run auto_converge tests when we are asking for slow tests.

  Only that test on my hardware requires more than 1min to run.  We
  need to run it at 3MB/s, but we are asking it to do 15 iterations
  throgh 150MB of RAM.  We can have a test that is (reasonably) fast,
  or one that also works when machine is very loaded.

To test that things still works over load, I used my desktop (ancient
core i9900), and run migration-test in a loop in 20 terminals (load
was 40) and didn't see a single failure in more than 1 hour run.

Please, review.

PD.  Yes, I am still looking at the dreaded multifd_cancel test, but
even on this setup, I am unable to get a failure.  I have never seen a
failure on it when I am running it, but I am only running x86 kvm
linux.  Moving to arm or tcg and see how well that goes.

Juan Quintela (2):
  tests/migration: Make precopy fast
  tests/migration: Only run auto_converge in slow mode

 tests/qtest/migration-test.c | 26 ++++++++++++++++++++++----
 1 file changed, 22 insertions(+), 4 deletions(-)

-- 
2.39.2
Re: [PATCH 0/2] tests/migration: Fix migration-test slowdown
Posted by Thomas Huth 1 year ago
On 12/04/2023 16.19, Juan Quintela wrote:
> Since commit:
> 
> commit 1bfc8dde505f1e6a92697c52aa9b09e81b54c78f
> Author: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Date:   Mon Mar 6 15:26:12 2023 +0000
> 
>      tests/migration: Tweek auto converge limits check
> 
>      Thomas found an autoconverge test failure where the
>      migration completed before the autoconverge had kicked in.
>      [...]
> 
> migration-test has become very slow.
> On my laptop, before that commit migration-test takes 2min10seconds
> After that commit, it takes around 11minutes
> 
> We can't revert it because it fixes a real problem when the host
> machine is overloaded.  See the comment on test_migrate_auto_converge().

Thanks, your patches decrease the time to run the migration-test from 16 
minutes down to 5 minutes on my system, that's a great improvement, indeed!

Tested-by: Thomas Huth <thuth@redhat.com>

(though 5 minutes are still quite a lot for qtests ... maybe some other 
parts could be moved to only run with g_test_slow() ?)
Re: [PATCH 0/2] tests/migration: Fix migration-test slowdown
Posted by Juan Quintela 1 year ago
Thomas Huth <thuth@redhat.com> wrote:
> On 12/04/2023 16.19, Juan Quintela wrote:
>> Since commit:
>> commit 1bfc8dde505f1e6a92697c52aa9b09e81b54c78f
>> Author: Dr. David Alan Gilbert <dgilbert@redhat.com>
>> Date:   Mon Mar 6 15:26:12 2023 +0000
>>      tests/migration: Tweek auto converge limits check
>>      Thomas found an autoconverge test failure where the
>>      migration completed before the autoconverge had kicked in.
>>      [...]
>> migration-test has become very slow.
>> On my laptop, before that commit migration-test takes 2min10seconds
>> After that commit, it takes around 11minutes
>> We can't revert it because it fixes a real problem when the host
>> machine is overloaded.  See the comment on test_migrate_auto_converge().
>
> Thanks, your patches decrease the time to run the migration-test from
> 16 minutes down to 5 minutes on my system, that's a great improvement,
> indeed!
>
> Tested-by: Thomas Huth <thuth@redhat.com>
>
> (though 5 minutes are still quite a lot for qtests ... maybe some
> other parts could be moved to only run with g_test_slow() ?)

And once that we are on this topic.  Is there a way to launch several
tests on the same binary on parallel?
i.e. every migration thread uses a maximum of 2 cores, so in a server I
can run several at the same time (I know that migration-test.c tests
need to be modified so they don't interfere, but I have that changes on
my tree), but I don't know of a way to launch them.

Thanks, Juan.

PD, and I don't know why launching a qemu is so slow, the minimal time
that I am able to get for launching the two qemus is around 0.5 seconds.

Later, Juan.
Re: [PATCH 0/2] tests/migration: Fix migration-test slowdown
Posted by Juan Quintela 1 year ago
Thomas Huth <thuth@redhat.com> wrote:
> On 12/04/2023 16.19, Juan Quintela wrote:
>> Since commit:
>> commit 1bfc8dde505f1e6a92697c52aa9b09e81b54c78f
>> Author: Dr. David Alan Gilbert <dgilbert@redhat.com>
>> Date:   Mon Mar 6 15:26:12 2023 +0000
>>      tests/migration: Tweek auto converge limits check
>>      Thomas found an autoconverge test failure where the
>>      migration completed before the autoconverge had kicked in.
>>      [...]
>> migration-test has become very slow.
>> On my laptop, before that commit migration-test takes 2min10seconds
>> After that commit, it takes around 11minutes
>> We can't revert it because it fixes a real problem when the host
>> machine is overloaded.  See the comment on test_migrate_auto_converge().
>
> Thanks, your patches decrease the time to run the migration-test from
> 16 minutes down to 5 minutes on my system, that's a great improvement,
> indeed!
>
> Tested-by: Thomas Huth <thuth@redhat.com>

Thanks

> (though 5 minutes are still quite a lot for qtests ... maybe some
> other parts could be moved to only run with g_test_slow() ?)

Hi

Could you gime the output of:

time for i in $(./tests/qtest/migration-test -l | grep "^/"); do echo $i; time ./tests/qtest/migration-test -p $i; done

To see what tests are taking so long on your system?

On my system (i9900K processor, i.e. not the latest) and auto_converge
moved to slow the total of the tests take a bit more than 1 minute.

qemu-system-x86_64 on x86_64 host:
real	0m54.295s
user	0m47.283s
sys	0m16.969s

qemu-system-aarch64 on x86_64 host:

real	0m42.466s
user	0m42.247s
sys	0m13.747s

s390x and ppc64 refuse to run non-natively.

Later, Juan.
Re: [PATCH 0/2] tests/migration: Fix migration-test slowdown
Posted by Thomas Huth 1 year ago
On 18/04/2023 13.42, Juan Quintela wrote:
> Thomas Huth <thuth@redhat.com> wrote:
>> On 12/04/2023 16.19, Juan Quintela wrote:
>>> Since commit:
>>> commit 1bfc8dde505f1e6a92697c52aa9b09e81b54c78f
>>> Author: Dr. David Alan Gilbert <dgilbert@redhat.com>
>>> Date:   Mon Mar 6 15:26:12 2023 +0000
>>>       tests/migration: Tweek auto converge limits check
>>>       Thomas found an autoconverge test failure where the
>>>       migration completed before the autoconverge had kicked in.
>>>       [...]
>>> migration-test has become very slow.
>>> On my laptop, before that commit migration-test takes 2min10seconds
>>> After that commit, it takes around 11minutes
>>> We can't revert it because it fixes a real problem when the host
>>> machine is overloaded.  See the comment on test_migrate_auto_converge().
>>
>> Thanks, your patches decrease the time to run the migration-test from
>> 16 minutes down to 5 minutes on my system, that's a great improvement,
>> indeed!
>>
>> Tested-by: Thomas Huth <thuth@redhat.com>
> 
> Thanks
> 
>> (though 5 minutes are still quite a lot for qtests ... maybe some
>> other parts could be moved to only run with g_test_slow() ?)
> 
> Hi
> 
> Could you gime the output of:
> 
> time for i in $(./tests/qtest/migration-test -l | grep "^/"); do echo $i; time ./tests/qtest/migration-test -p $i; done
> 
> To see what tests are taking so long on your system?
> 
> On my system (i9900K processor, i.e. not the latest) and auto_converge
> moved to slow the total of the tests take a bit more than 1 minute.

This is with both of your patches applied:

/x86_64/migration/bad_dest
/x86_64/migration/bad_dest: OK

real	0m0,342s
user	0m0,123s
sys	0m0,088s
/x86_64/migration/fd_proto
/x86_64/migration/fd_proto: OK

real	0m1,135s
user	0m0,730s
sys	0m0,514s
/x86_64/migration/validate_uuid
/x86_64/migration/validate_uuid: OK

real	0m0,519s
user	0m0,379s
sys	0m0,230s
/x86_64/migration/validate_uuid_error
/x86_64/migration/validate_uuid_error: OK

real	0m0,275s
user	0m0,145s
sys	0m0,114s
/x86_64/migration/validate_uuid_src_not_set
/x86_64/migration/validate_uuid_src_not_set: OK

real	0m0,514s
user	0m0,377s
sys	0m0,225s
/x86_64/migration/validate_uuid_dst_not_set
/x86_64/migration/validate_uuid_dst_not_set: OK

real	0m0,519s
user	0m0,392s
sys	0m0,220s
/x86_64/migration/dirty_ring
/x86_64/migration/dirty_ring: OK

real	0m1,079s
user	0m0,613s
sys	0m0,532s
/x86_64/migration/vcpu_dirty_limit
/x86_64/migration/vcpu_dirty_limit: OK

real	0m6,308s
user	0m4,025s
sys	0m1,224s
/x86_64/migration/postcopy/plain
/x86_64/migration/postcopy/plain: OK

real	0m35,446s
user	0m47,208s
sys	0m11,828s
/x86_64/migration/postcopy/recovery/plain
/x86_64/migration/postcopy/recovery/plain: OK

real	0m34,707s
user	0m46,357s
sys	0m11,366s
/x86_64/migration/postcopy/recovery/tls/psk
/x86_64/migration/postcopy/recovery/tls/psk: OK

real	0m33,052s
user	0m46,539s
sys	0m11,537s
/x86_64/migration/postcopy/preempt/plain
/x86_64/migration/postcopy/preempt/plain: OK

real	0m35,107s
user	0m46,556s
sys	0m11,755s
/x86_64/migration/postcopy/preempt/recovery/plain
/x86_64/migration/postcopy/preempt/recovery/plain: OK

real	0m35,329s
user	0m46,951s
sys	0m11,529s
/x86_64/migration/postcopy/preempt/recovery/tls/psk
/x86_64/migration/postcopy/preempt/recovery/tls/psk: OK

real	0m36,237s
user	0m51,450s
sys	0m12,419s
/x86_64/migration/postcopy/preempt/tls/psk
/x86_64/migration/postcopy/preempt/tls/psk: OK

real	0m35,033s
user	0m49,244s
sys	0m12,123s
/x86_64/migration/postcopy/tls/psk
/x86_64/migration/postcopy/tls/psk: OK

real	0m36,097s
user	0m50,873s
sys	0m12,569s
/x86_64/migration/precopy/unix/plain
/x86_64/migration/precopy/unix/plain: OK

real	0m1,034s
user	0m0,654s
sys	0m0,463s
/x86_64/migration/precopy/unix/xbzrle
/x86_64/migration/precopy/unix/xbzrle: OK

real	0m1,119s
user	0m0,740s
sys	0m0,499s
/x86_64/migration/precopy/unix/tls/psk
/x86_64/migration/precopy/unix/tls/psk: OK

real	0m3,555s
user	0m5,448s
sys	0m0,655s
/x86_64/migration/precopy/unix/tls/x509/default-host
/x86_64/migration/precopy/unix/tls/x509/default-host: OK

real	0m1,022s
user	0m1,664s
sys	0m0,112s
/x86_64/migration/precopy/unix/tls/x509/override-host
/x86_64/migration/precopy/unix/tls/x509/override-host: OK

real	0m1,841s
user	0m1,921s
sys	0m0,739s
/x86_64/migration/precopy/tcp/plain
/x86_64/migration/precopy/tcp/plain: OK

real	0m1,241s
user	0m0,859s
sys	0m0,584s
/x86_64/migration/precopy/tcp/tls/psk/match
/x86_64/migration/precopy/tcp/tls/psk/match: OK

real	0m2,114s
user	0m2,628s
sys	0m0,613s
/x86_64/migration/precopy/tcp/tls/psk/mismatch
/x86_64/migration/precopy/tcp/tls/psk/mismatch: OK

real	0m0,575s
user	0m0,554s
sys	0m0,116s
/x86_64/migration/precopy/tcp/tls/x509/default-host
/x86_64/migration/precopy/tcp/tls/x509/default-host: OK

real	0m1,538s
user	0m1,460s
sys	0m0,608s
/x86_64/migration/precopy/tcp/tls/x509/override-host
/x86_64/migration/precopy/tcp/tls/x509/override-host: OK

real	0m1,825s
user	0m1,915s
sys	0m0,703s
/x86_64/migration/precopy/tcp/tls/x509/mismatch-host
/x86_64/migration/precopy/tcp/tls/x509/mismatch-host: OK

real	0m0,961s
user	0m1,430s
sys	0m0,111s
/x86_64/migration/precopy/tcp/tls/x509/friendly-client
/x86_64/migration/precopy/tcp/tls/x509/friendly-client: OK

real	0m1,806s
user	0m1,897s
sys	0m0,679s
/x86_64/migration/precopy/tcp/tls/x509/hostile-client
/x86_64/migration/precopy/tcp/tls/x509/hostile-client: OK

real	0m0,645s
user	0m0,614s
sys	0m0,136s
/x86_64/migration/precopy/tcp/tls/x509/allow-anon-client
/x86_64/migration/precopy/tcp/tls/x509/allow-anon-client: OK

real	0m2,204s
user	0m2,695s
sys	0m0,667s
/x86_64/migration/precopy/tcp/tls/x509/reject-anon-client
/x86_64/migration/precopy/tcp/tls/x509/reject-anon-client: OK

real	0m1,530s
user	0m2,360s
sys	0m0,156s
/x86_64/migration/multifd/tcp/plain/none
/x86_64/migration/multifd/tcp/plain/none: OK

real	0m1,055s
user	0m0,647s
sys	0m0,592s
/x86_64/migration/multifd/tcp/plain/zlib
/x86_64/migration/multifd/tcp/plain/zlib: OK

real	0m1,144s
user	0m1,763s
sys	0m0,437s
/x86_64/migration/multifd/tcp/plain/zstd
/x86_64/migration/multifd/tcp/plain/zstd: OK

real	0m1,073s
user	0m0,999s
sys	0m0,537s
/x86_64/migration/multifd/tcp/tls/psk/match
/x86_64/migration/multifd/tcp/tls/psk/match: OK

real	0m1,453s
user	0m2,475s
sys	0m0,704s
/x86_64/migration/multifd/tcp/tls/psk/mismatch
/x86_64/migration/multifd/tcp/tls/psk/mismatch: OK

real	0m0,905s
user	0m1,256s
sys	0m0,106s
/x86_64/migration/multifd/tcp/tls/x509/default-host
/x86_64/migration/multifd/tcp/tls/x509/default-host: OK

real	0m3,761s
user	0m5,874s
sys	0m0,985s
/x86_64/migration/multifd/tcp/tls/x509/override-host
/x86_64/migration/multifd/tcp/tls/x509/override-host: OK

real	0m3,238s
user	0m4,794s
sys	0m0,998s
/x86_64/migration/multifd/tcp/tls/x509/mismatch-host
/x86_64/migration/multifd/tcp/tls/x509/mismatch-host: OK

real	0m0,851s
user	0m1,007s
sys	0m0,120s
/x86_64/migration/multifd/tcp/tls/x509/allow-anon-client
/x86_64/migration/multifd/tcp/tls/x509/allow-anon-client: OK

real	0m2,607s
user	0m3,530s
sys	0m1,013s
/x86_64/migration/multifd/tcp/tls/x509/reject-anon-client
/x86_64/migration/multifd/tcp/tls/x509/reject-anon-client: OK

real	0m1,915s
user	0m3,223s
sys	0m0,180s

real	5m32,733s
user	7m24,380s
sys	1m50,801s

  Thomas
Re: [PATCH 0/2] tests/migration: Fix migration-test slowdown
Posted by Juan Quintela 1 year ago
Thomas Huth <thuth@redhat.com> wrote:
> On 18/04/2023 13.42, Juan Quintela wrote:
>> Thomas Huth <thuth@redhat.com> wrote:
>>> On 12/04/2023 16.19, Juan Quintela wrote:
>>>> Since commit:
>>>> commit 1bfc8dde505f1e6a92697c52aa9b09e81b54c78f
>>>> Author: Dr. David Alan Gilbert <dgilbert@redhat.com>
>>>> Date:   Mon Mar 6 15:26:12 2023 +0000
>>>>       tests/migration: Tweek auto converge limits check
>>>>       Thomas found an autoconverge test failure where the
>>>>       migration completed before the autoconverge had kicked in.
>>>>       [...]
>>>> migration-test has become very slow.
>>>> On my laptop, before that commit migration-test takes 2min10seconds
>>>> After that commit, it takes around 11minutes
>>>> We can't revert it because it fixes a real problem when the host
>>>> machine is overloaded.  See the comment on test_migrate_auto_converge().
>>>
>>> Thanks, your patches decrease the time to run the migration-test from
>>> 16 minutes down to 5 minutes on my system, that's a great improvement,
>>> indeed!
>>>
>>> Tested-by: Thomas Huth <thuth@redhat.com>
>> Thanks
>> 
>>> (though 5 minutes are still quite a lot for qtests ... maybe some
>>> other parts could be moved to only run with g_test_slow() ?)
>> Hi
>> Could you gime the output of:
>> time for i in $(./tests/qtest/migration-test -l | grep "^/"); do
>> echo $i; time ./tests/qtest/migration-test -p $i; done
>> To see what tests are taking so long on your system?
>> On my system (i9900K processor, i.e. not the latest) and
>> auto_converge
>> moved to slow the total of the tests take a bit more than 1 minute.
>
> This is with both of your patches applied:


> /x86_64/migration/postcopy/plain
> /x86_64/migration/postcopy/plain: OK
>
> real	0m35,446s
> user	0m47,208s
> sys	0m11,828s

This is quite slower than on mine, basically almost all the code that
does migration.

$ time ./tests/qtest/migration-test -p /x86_64/migration/postcopy/plain
# random seed: R02S42809b71f513e8524bd24df5facd5768
# Start of x86_64 tests
# Start of migration tests
# Start of postcopy tests
# starting QEMU: exec ./qemu-system-x86_64 -qtest unix:/tmp/qtest-246853.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-246853.qmp,id=char0 -mon chardev=char0,mode=control -display none -accel kvm -accel tcg -name source,debug-threads=on -m 150M -serial file:/tmp/migration-test-1MGL31/src_serial -drive file=/tmp/migration-test-1MGL31/bootsect,format=raw    -accel qtest
# starting QEMU: exec ./qemu-system-x86_64 -qtest unix:/tmp/qtest-246853.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-246853.qmp,id=char0 -mon chardev=char0,mode=control -display none -accel kvm -accel tcg -name target,debug-threads=on -m 150M -serial file:/tmp/migration-test-1MGL31/dest_serial -incoming unix:/tmp/migration-test-1MGL31/migsocket -drive file=/tmp/migration-test-1MGL31/bootsect,format=raw    -accel qtest
ok 1 /x86_64/migration/postcopy/plain
# End of postcopy tests
# End of migration tests
# End of x86_64 tests
1..1

real	0m1.104s
user	0m0.697s
sys	0m0.414s





> /x86_64/migration/postcopy/recovery/plain
> /x86_64/migration/postcopy/recovery/plain: OK
>
> real	0m34,707s
> user	0m46,357s
> sys	0m11,366s
> /x86_64/migration/postcopy/recovery/tls/psk
> /x86_64/migration/postcopy/recovery/tls/psk: OK
>
> real	0m33,052s
> user	0m46,539s
> sys	0m11,537s
> /x86_64/migration/postcopy/preempt/plain
> /x86_64/migration/postcopy/preempt/plain: OK
>
> real	0m35,107s
> user	0m46,556s
> sys	0m11,755s
> /x86_64/migration/postcopy/preempt/recovery/plain
> /x86_64/migration/postcopy/preempt/recovery/plain: OK
>
> real	0m35,329s
> user	0m46,951s
> sys	0m11,529s
> /x86_64/migration/postcopy/preempt/recovery/tls/psk
> /x86_64/migration/postcopy/preempt/recovery/tls/psk: OK
>
> real	0m36,237s
> user	0m51,450s
> sys	0m12,419s
> /x86_64/migration/postcopy/preempt/tls/psk
> /x86_64/migration/postcopy/preempt/tls/psk: OK
>
> real	0m35,033s
> user	0m49,244s
> sys	0m12,123s
> /x86_64/migration/postcopy/tls/psk
> /x86_64/migration/postcopy/tls/psk: OK
>
> real	0m36,097s
> user	0m50,873s
> sys	0m12,569s


> real	5m32,733s
> user	7m24,380s
> sys	1m50,801s

Ouch.

Can I ask:
- what is your machine?  It is specially slow?
  Otherwise I want to know why it is happening.

- as what is going slow to you is postcopy, can you told me what is this
  setting?

# we want postcopy to work for normal users
vm.unprivileged_userfaultfd = 1

And if it is not set, just change it and retest.

Thanks, Juan.
Re: [PATCH 0/2] tests/migration: Fix migration-test slowdown
Posted by Daniel P. Berrangé 1 year ago
On Tue, Apr 18, 2023 at 03:19:33PM +0200, Juan Quintela wrote:
> Thomas Huth <thuth@redhat.com> wrote:
> > On 18/04/2023 13.42, Juan Quintela wrote:
> >> Thomas Huth <thuth@redhat.com> wrote:
> >>> On 12/04/2023 16.19, Juan Quintela wrote:
> >>>> Since commit:
> >>>> commit 1bfc8dde505f1e6a92697c52aa9b09e81b54c78f
> >>>> Author: Dr. David Alan Gilbert <dgilbert@redhat.com>
> >>>> Date:   Mon Mar 6 15:26:12 2023 +0000
> >>>>       tests/migration: Tweek auto converge limits check
> >>>>       Thomas found an autoconverge test failure where the
> >>>>       migration completed before the autoconverge had kicked in.
> >>>>       [...]
> >>>> migration-test has become very slow.
> >>>> On my laptop, before that commit migration-test takes 2min10seconds
> >>>> After that commit, it takes around 11minutes
> >>>> We can't revert it because it fixes a real problem when the host
> >>>> machine is overloaded.  See the comment on test_migrate_auto_converge().
> >>>
> >>> Thanks, your patches decrease the time to run the migration-test from
> >>> 16 minutes down to 5 minutes on my system, that's a great improvement,
> >>> indeed!
> >>>
> >>> Tested-by: Thomas Huth <thuth@redhat.com>
> >> Thanks
> >> 
> >>> (though 5 minutes are still quite a lot for qtests ... maybe some
> >>> other parts could be moved to only run with g_test_slow() ?)
> >> Hi
> >> Could you gime the output of:
> >> time for i in $(./tests/qtest/migration-test -l | grep "^/"); do
> >> echo $i; time ./tests/qtest/migration-test -p $i; done
> >> To see what tests are taking so long on your system?
> >> On my system (i9900K processor, i.e. not the latest) and
> >> auto_converge
> >> moved to slow the total of the tests take a bit more than 1 minute.
> >
> > This is with both of your patches applied:
> 
> 
> > /x86_64/migration/postcopy/plain
> > /x86_64/migration/postcopy/plain: OK
> >
> > real	0m35,446s
> > user	0m47,208s
> > sys	0m11,828s
> 
> This is quite slower than on mine, basically almost all the code that
> does migration.

This is expected AFAIK.  The migrate_postcopy_prepare method
waits for 1 complete pre-copy pass to run at 3mbps, before
switching to pre-copy mode.

> 
> $ time ./tests/qtest/migration-test -p /x86_64/migration/postcopy/plain
> # random seed: R02S42809b71f513e8524bd24df5facd5768
> # Start of x86_64 tests
> # Start of migration tests
> # Start of postcopy tests
> # starting QEMU: exec ./qemu-system-x86_64 -qtest unix:/tmp/qtest-246853.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-246853.qmp,id=char0 -mon chardev=char0,mode=control -display none -accel kvm -accel tcg -name source,debug-threads=on -m 150M -serial file:/tmp/migration-test-1MGL31/src_serial -drive file=/tmp/migration-test-1MGL31/bootsect,format=raw    -accel qtest
> # starting QEMU: exec ./qemu-system-x86_64 -qtest unix:/tmp/qtest-246853.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-246853.qmp,id=char0 -mon chardev=char0,mode=control -display none -accel kvm -accel tcg -name target,debug-threads=on -m 150M -serial file:/tmp/migration-test-1MGL31/dest_serial -incoming unix:/tmp/migration-test-1MGL31/migsocket -drive file=/tmp/migration-test-1MGL31/bootsect,format=raw    -accel qtest
> ok 1 /x86_64/migration/postcopy/plain
> # End of postcopy tests
> # End of migration tests
> # End of x86_64 tests
> 1..1
> 
> real	0m1.104s
> user	0m0.697s
> sys	0m0.414s

That is surprisingly fast - it is like it is not doing the pre-copy
pass at all.


> > real	5m32,733s
> > user	7m24,380s
> > sys	1m50,801s
> 
> Ouch.
> 
> Can I ask:
> - what is your machine?  It is specially slow?
>   Otherwise I want to know why it is happening.

This matches what I see in my laptop - any test which runs a full
pre-copy pass gets 30 seconds time added for this phase


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
Re: [PATCH 0/2] tests/migration: Fix migration-test slowdown
Posted by Thomas Huth 1 year ago
On 18/04/2023 15.19, Juan Quintela wrote:
> Thomas Huth <thuth@redhat.com> wrote:
>> On 18/04/2023 13.42, Juan Quintela wrote:
>>> Thomas Huth <thuth@redhat.com> wrote:
>>>> On 12/04/2023 16.19, Juan Quintela wrote:
>>>>> Since commit:
>>>>> commit 1bfc8dde505f1e6a92697c52aa9b09e81b54c78f
>>>>> Author: Dr. David Alan Gilbert <dgilbert@redhat.com>
>>>>> Date:   Mon Mar 6 15:26:12 2023 +0000
>>>>>        tests/migration: Tweek auto converge limits check
>>>>>        Thomas found an autoconverge test failure where the
>>>>>        migration completed before the autoconverge had kicked in.
>>>>>        [...]
>>>>> migration-test has become very slow.
>>>>> On my laptop, before that commit migration-test takes 2min10seconds
>>>>> After that commit, it takes around 11minutes
>>>>> We can't revert it because it fixes a real problem when the host
>>>>> machine is overloaded.  See the comment on test_migrate_auto_converge().
>>>>
>>>> Thanks, your patches decrease the time to run the migration-test from
>>>> 16 minutes down to 5 minutes on my system, that's a great improvement,
>>>> indeed!
>>>>
>>>> Tested-by: Thomas Huth <thuth@redhat.com>
>>> Thanks
>>>
>>>> (though 5 minutes are still quite a lot for qtests ... maybe some
>>>> other parts could be moved to only run with g_test_slow() ?)
>>> Hi
>>> Could you gime the output of:
>>> time for i in $(./tests/qtest/migration-test -l | grep "^/"); do
>>> echo $i; time ./tests/qtest/migration-test -p $i; done
>>> To see what tests are taking so long on your system?
>>> On my system (i9900K processor, i.e. not the latest) and
>>> auto_converge
>>> moved to slow the total of the tests take a bit more than 1 minute.
>>
>> This is with both of your patches applied:
...
>> real	5m32,733s
>> user	7m24,380s
>> sys	1m50,801s
> 
> Ouch.
> 
> Can I ask:
> - what is your machine?  It is specially slow?

It's a 4 year old T480s ThinkPad laptop.

>    Otherwise I want to know why it is happening.
> 
> - as what is going slow to you is postcopy, can you told me what is this
>    setting?
> 
> # we want postcopy to work for normal users
> vm.unprivileged_userfaultfd = 1

$ sysctl vm.unprivileged_userfaultfd
sysctl: cannot stat /proc/sys/vm/unprivileged_userfaultfd: No such file or 
directory

> And if it is not set, just change it and retest.

Seems like it is not available on RHEL 8 yet :-(

Shall we maybe disable the postcopy tests if unprivileged_userfaultfd is not 
available?

  Thomas