The following changes since commit 05d50ba2d4668d43a835c5a502efdec9b92646e6:

  Merge tag 'migration-20230427-pull-request' of https://gitlab.com/juan.quintela/qemu into staging (2023-04-28 08:35:06 +0100)

are available in the Git repository at:

  https://gitlab.com/juan.quintela/qemu.git tags/migration-20230428-pull-request

for you to fetch changes up to 05ecac612ec6a4bdb358e68554b4406ac2c8e25a:

  migration: Initialize and cleanup decompression in migration.c (2023-04-28 20:54:53 +0200)

----------------------------------------------------------------
Migration Pull request (20230429 vintage)

Hi

In this series:
- compression code cleanup (lukas)
  nice, nice, nice.
- drop useless parameters from migration_tls* (juan)
- first part of remove QEMUFileHooks series (juan)

Please apply.

----------------------------------------------------------------

Juan Quintela (8):
  multifd: We already account for this packet on the multifd thread
  migration: Move ram_stats to its own file migration-stats.[ch]
  migration: Rename ram_counters to mig_stats
  migration: Rename RAMStats to MigrationAtomicStats
  migration/rdma: Split the zero page case from acct_update_position
  migration/rdma: Unfold last user of acct_update_position()
  migration: Drop unused parameter for migration_tls_get_creds()
  migration: Drop unused parameter for migration_tls_client_create()

Lukas Straub (13):
  qtest/migration-test.c: Add tests with compress enabled
  qtest/migration-test.c: Add postcopy tests with compress enabled
  ram.c: Let the compress threads return a CompressResult enum
  ram.c: Dont change param->block in the compress thread
  ram.c: Reset result after sending queued data
  ram.c: Do not call save_page_header() from compress threads
  ram.c: Call update_compress_thread_counts from
    compress_send_queued_data
  ram.c: Remove last ram.c dependency from the core compress code
  ram.c: Move core compression code into its own file
  ram.c: Move core decompression code into its own file
  ram compress: Assert that the file buffer matches the result
  ram-compress.c: Make target independent
  migration: Initialize and cleanup decompression in migration.c

 migration/meson.build        |   7 +-
 migration/migration-stats.c  |  17 ++
 migration/migration-stats.h  |  41 +++
 migration/migration.c        |  42 ++-
 migration/multifd.c          |  12 +-
 migration/postcopy-ram.c     |   2 +-
 migration/qemu-file.c        |  11 +
 migration/qemu-file.h        |   1 +
 migration/ram-compress.c     | 485 ++++++++++++++++++++++++++++++
 migration/ram-compress.h     |  70 +++++
 migration/ram.c              | 562 ++++------------------------------
 migration/ram.h              |  24 --
 migration/rdma.c             |   9 +-
 migration/savevm.c           |   3 +-
 migration/tls.c              |  15 +-
 migration/tls.h              |   3 +-
 tests/qtest/migration-test.c | 126 ++++++++
 17 files changed, 870 insertions(+), 560 deletions(-)
 create mode 100644 migration/migration-stats.c
 create mode 100644 migration/migration-stats.h
 create mode 100644 migration/ram-compress.c
 create mode 100644 migration/ram-compress.h

-- 
2.40.0
On 4/28/23 20:11, Juan Quintela wrote:
> The following changes since commit 05d50ba2d4668d43a835c5a502efdec9b92646e6:
>
>   Merge tag 'migration-20230427-pull-request' of https://gitlab.com/juan.quintela/qemu into staging (2023-04-28 08:35:06 +0100)
>
> are available in the Git repository at:
>
>   https://gitlab.com/juan.quintela/qemu.git tags/migration-20230428-pull-request
>
> for you to fetch changes up to 05ecac612ec6a4bdb358e68554b4406ac2c8e25a:
>
>   migration: Initialize and cleanup decompression in migration.c (2023-04-28 20:54:53 +0200)
>
> ----------------------------------------------------------------
> Migration Pull request (20230429 vintage)
>
> Hi
>
> In this series:
> - compression code cleanup (lukas)
>   nice, nice, nice.
> - drop useless parameters from migration_tls* (juan)
> - first part of remove QEMUFileHooks series (juan)
>
> Please apply.
>
> ----------------------------------------------------------------
>
> Juan Quintela (8):
>   multifd: We already account for this packet on the multifd thread
>   migration: Move ram_stats to its own file migration-stats.[ch]
>   migration: Rename ram_counters to mig_stats
>   migration: Rename RAMStats to MigrationAtomicStats
>   migration/rdma: Split the zero page case from acct_update_position
>   migration/rdma: Unfold last user of acct_update_position()
>   migration: Drop unused parameter for migration_tls_get_creds()
>   migration: Drop unused parameter for migration_tls_client_create()
>
> Lukas Straub (13):
>   qtest/migration-test.c: Add tests with compress enabled
>   qtest/migration-test.c: Add postcopy tests with compress enabled
>   ram.c: Let the compress threads return a CompressResult enum
>   ram.c: Dont change param->block in the compress thread
>   ram.c: Reset result after sending queued data
>   ram.c: Do not call save_page_header() from compress threads
>   ram.c: Call update_compress_thread_counts from
>     compress_send_queued_data
>   ram.c: Remove last ram.c dependency from the core compress code
>   ram.c: Move core compression code into its own file
>   ram.c: Move core decompression code into its own file
>   ram compress: Assert that the file buffer matches the result
>   ram-compress.c: Make target independent
>   migration: Initialize and cleanup decompression in migration.c

There are a bunch of migration failures in CI:

https://gitlab.com/qemu-project/qemu/-/jobs/4201998343#L375
https://gitlab.com/qemu-project/qemu/-/jobs/4201998342#L428
https://gitlab.com/qemu-project/qemu/-/jobs/4201998340#L459
https://gitlab.com/qemu-project/qemu/-/jobs/4201998336#L4883


r~
Richard Henderson <richard.henderson@linaro.org> wrote:
> On 4/28/23 20:11, Juan Quintela wrote:
>> The following changes since commit 05d50ba2d4668d43a835c5a502efdec9b92646e6:
>>
>>   Merge tag 'migration-20230427-pull-request' of
>>   https://gitlab.com/juan.quintela/qemu into staging (2023-04-28
>>   08:35:06 +0100)
>>
>> are available in the Git repository at:
>>
>>   https://gitlab.com/juan.quintela/qemu.git
>>   tags/migration-20230428-pull-request
>>
>> for you to fetch changes up to
>> 05ecac612ec6a4bdb358e68554b4406ac2c8e25a:
>>
>>   migration: Initialize and cleanup decompression in migration.c
>>   (2023-04-28 20:54:53 +0200)
>>
>> ----------------------------------------------------------------
>> Migration Pull request (20230429 vintage)
>>
>> Hi
>>
>> In this series:
>> - compression code cleanup (lukas)
>>   nice, nice, nice.
>> - drop useless parameters from migration_tls* (juan)
>> - first part of remove QEMUFileHooks series (juan)
>>
>> Please apply.
>>
>> ----------------------------------------------------------------
>>
>> Juan Quintela (8):
>>   multifd: We already account for this packet on the multifd thread
>>   migration: Move ram_stats to its own file migration-stats.[ch]
>>   migration: Rename ram_counters to mig_stats
>>   migration: Rename RAMStats to MigrationAtomicStats
>>   migration/rdma: Split the zero page case from acct_update_position
>>   migration/rdma: Unfold last user of acct_update_position()
>>   migration: Drop unused parameter for migration_tls_get_creds()
>>   migration: Drop unused parameter for migration_tls_client_create()
>>
>> Lukas Straub (13):
>>   qtest/migration-test.c: Add tests with compress enabled
>>   qtest/migration-test.c: Add postcopy tests with compress enabled
>>   ram.c: Let the compress threads return a CompressResult enum
>>   ram.c: Dont change param->block in the compress thread
>>   ram.c: Reset result after sending queued data
>>   ram.c: Do not call save_page_header() from compress threads
>>   ram.c: Call update_compress_thread_counts from
>>     compress_send_queued_data
>>   ram.c: Remove last ram.c dependency from the core compress code
>>   ram.c: Move core compression code into its own file
>>   ram.c: Move core decompression code into its own file
>>   ram compress: Assert that the file buffer matches the result
>>   ram-compress.c: Make target independent
>>   migration: Initialize and cleanup decompression in migration.c
>
> There are a bunch of migration failures in CI:
>
> https://gitlab.com/qemu-project/qemu/-/jobs/4201998343#L375

cfi-x86_64

> https://gitlab.com/qemu-project/qemu/-/jobs/4201998342#L428

opensuse aarch64?

> https://gitlab.com/qemu-project/qemu/-/jobs/4201998340#L459

debian i386

> https://gitlab.com/qemu-project/qemu/-/jobs/4201998336#L4883

x86_64 and aarch64 on a s390x host?

Dunno really what is going on here.

It works here: fedora 37 x86_64 host, and both:
  qemu-system-x86_64 (native kvm)
  qemu-system-aarch64 (emulated)

My patches are only code movement and cleanups, so Lukas, any clue?

Lukas, I am going to drop the compress code for now and pass the other
patches. In the meanwhile, I am going to try to set up some machine
where I can run the upstream tests and see if I can reproduce there.
BTW, I would be happy if you double-checked that I did the rebase
correctly; they didn't apply cleanly, but as said, the tests have been
running for two or three days in all my daily testing, so I thought
that I did things correctly.

Richard, while we are here: one of the problems that we are having is
that the test is exiting with an abort, so we have no clue what is
happening. Is there a way to get a backtrace, or at least the number

Later, Juan.
On Tue, 02 May 2023 12:39:12 +0200
Juan Quintela <quintela@redhat.com> wrote:

> [...]
>
> My patches are only code movement and cleanups, so Lukas, any clue?
>
> Lukas, I am going to drop the compress code for now and pass the other
> patches. In the meanwhile, I am going to try to set up some machine
> where I can run the upstream tests and see if I can reproduce there.
> BTW, I would be happy if you double-checked that I did the rebase
> correctly; they didn't apply cleanly, but as said, the tests have been
> running for two or three days in all my daily testing, so I thought
> that I did things correctly.

Hi,
I rebased the series here and got exactly the same files as in this
pull request. And I can't reproduce these failures either.

Maybe you can run the CI just on the newly added compress tests and see
if it already blows up without the refactoring?

Anyway, this series is not so important anymore...

> Richard, while we are here: one of the problems that we are having is
> that the test is exiting with an abort, so we have no clue what is
> happening. Is there a way to get a backtrace, or at least the number
>
> Later, Juan.

--
Lukas Straub <lukasstraub2@web.de> wrote:
> On Tue, 02 May 2023 12:39:12 +0200
> Juan Quintela <quintela@redhat.com> wrote:
>
>> [...]
>>
>> My patches are only code movement and cleanups, so Lukas, any clue?
>>
>> Lukas, I am going to drop the compress code for now and pass the other
>> patches. In the meanwhile, I am going to try to set up some machine
>> where I can run the upstream tests and see if I can reproduce there.
>> BTW, I would be happy if you double-checked that I did the rebase
>> correctly; they didn't apply cleanly, but as said, the tests have been
>> running for two or three days in all my daily testing, so I thought
>> that I did things correctly.

Hi

> Hi,
> I rebased the series here and got exactly the same files as in this
> pull request. And I can't reproduce these failures either.

Nice

> Maybe you can run the CI just on the newly added compress tests and see
> if it already blows up without the refactoring?

It does, I don't have to check O:-)

The initial reason that I did the compression code on top of multifd was
that I was not able to get the old compression code to run "reliably"
in my testing.

> Anyway, this series is not so important anymore...

What about:
- I add the series as they are, because the code is better than what we
  had before (and being in a different file makes it easier to
  deprecate, not compile, ...)
- I just disable the tests until we find something that works.

Richard, Lukas?

Later, Juan.

>> Richard, while we are here: one of the problems that we are having is
>> that the test is exiting with an abort, so we have no clue what is
>> happening. Is there a way to get a backtrace, or at least the number
>>
>> Later, Juan.
>>
On Mon, 08 May 2023 10:12:35 +0200
Juan Quintela <quintela@redhat.com> wrote:

> Lukas Straub <lukasstraub2@web.de> wrote:
> > On Tue, 02 May 2023 12:39:12 +0200
> > Juan Quintela <quintela@redhat.com> wrote:
> >
> >> [...]
> >>
> >> My patches are only code movement and cleanups, so Lukas, any clue?
> >>
> >> Lukas, I am going to drop the compress code for now and pass the other
> >> patches. In the meanwhile, I am going to try to set up some machine
> >> where I can run the upstream tests and see if I can reproduce there.
> >> BTW, I would be happy if you double-checked that I did the rebase
> >> correctly; they didn't apply cleanly, but as said, the tests have been
> >> running for two or three days in all my daily testing, so I thought
> >> that I did things correctly.
>
> Hi
>
> > Hi,
> > I rebased the series here and got exactly the same files as in this
> > pull request. And I can't reproduce these failures either.
>
> Nice
>
> > Maybe you can run the CI just on the newly added compress tests and see
> > if it already blows up without the refactoring?
>
> It does, I don't have to check O:-)
>
> The initial reason that I did the compression code on top of multifd was
> that I was not able to get the old compression code to run "reliably"
> in my testing.
>
> > Anyway, this series is not so important anymore...
>
> What about:
> - I add the series as they are, because the code is better than what we
>   had before (and being in a different file makes it easier to
>   deprecate, not compile, ...)
> - I just disable the tests until we find something that works.
>
> Richard, Lukas?

That is fine with me.

> Later, Juan.
>
> >> Richard, while we are here: one of the problems that we are having is
> >> that the test is exiting with an abort, so we have no clue what is
> >> happening. Is there a way to get a backtrace, or at least the number
> >>
> >> Later, Juan.
> >>
>
On Tue, 2 May 2023 at 11:39, Juan Quintela <quintela@redhat.com> wrote:
> Richard, while we are here: one of the problems that we are having is
> that the test is exiting with an abort, so we have no clue what is
> happening. Is there a way to get a backtrace, or at least the number

This has been consistently an issue with the migration tests. As the
owner of the tests, if they are not providing you with the level of
detail that you need to diagnose failures, I think that is something
that is in your court to address: the CI system is always going to only
be able to provide you with what your tests are outputting to the logs.

For the specific case of backtraces from assertion failures, I think
Dan was looking at whether we could put something together for that.
It won't help with segfaults and the like, though.

You should be able to at least get the number of the subtest out of
the logs (either directly in the logs of the job, or else from the
more detailed log file that gets stored as a job artefact in most
cases).

thanks
-- PMM
Peter Maydell <peter.maydell@linaro.org> wrote:
> On Tue, 2 May 2023 at 11:39, Juan Quintela <quintela@redhat.com> wrote:
>> Richard, while we are here: one of the problems that we are having is
>> that the test is exiting with an abort, so we have no clue what is
>> happening. Is there a way to get a backtrace, or at least the number
>
> This has been consistently an issue with the migration tests.
> As the owner of the tests, if they are not providing you with
> the level of detail that you need to diagnose failures, I
> think that is something that is in your court to address:
> the CI system is always going to only be able to provide
> you with what your tests are outputting to the logs.

Right now I would be happy just to see what test it is failing at.

Either I am doing something wrong, or from the links that I see in
Richard's email I am not able to reach anywhere where I can see the
full logs.

> For the specific case of backtraces from assertion failures,
> I think Dan was looking at whether we could put something
> together for that. It won't help with segfaults and the like, though.

I am waiting for that O:-)

> You should be able to at least get the number of the subtest out of
> the logs (either directly in the logs of the job, or else
> from the more detailed log file that gets stored as a
> job artefact in most cases).

Also note that the test is stopping in an abort, with no diagnostic
message that I can see. But I don't see where the abort comes from:

$ grep abort tests/qtest/migration-*
tests/qtest/migration-test.c: visit_type_SocketAddressList(iv, NULL, &addrs, &error_abort);
tests/qtest/migration-test.c: * In non-multifd case when client aborts due to mismatched
tests/qtest/migration-test.c: * In multifd case when client aborts due to mismatched
tests/qtest/migration-test.c: * to load migration state, and thus just aborts the migration
$

Later, Juan.
On Wed, 3 May 2023 at 10:17, Juan Quintela <quintela@redhat.com> wrote:
>
> Peter Maydell <peter.maydell@linaro.org> wrote:
> > On Tue, 2 May 2023 at 11:39, Juan Quintela <quintela@redhat.com> wrote:
> >> Richard, while we are here: one of the problems that we are having is
> >> that the test is exiting with an abort, so we have no clue what is
> >> happening. Is there a way to get a backtrace, or at least the number
> >
> > This has been consistently an issue with the migration tests.
> > As the owner of the tests, if they are not providing you with
> > the level of detail that you need to diagnose failures, I
> > think that is something that is in your court to address:
> > the CI system is always going to only be able to provide
> > you with what your tests are outputting to the logs.
>
> Right now I would be happy just to see what test it is failing at.
>
> Either I am doing something wrong, or from the links that I see in
> Richard's email I am not able to reach anywhere where I can see the
> full logs.
>
> > For the specific case of backtraces from assertion failures,
> > I think Dan was looking at whether we could put something
> > together for that. It won't help with segfaults and the like, though.
>
> I am waiting for that O:-)
>
> > You should be able to at least get the number of the subtest out of
> > the logs (either directly in the logs of the job, or else
> > from the more detailed log file that gets stored as a
> > job artefact in most cases).
>
> Also note that the test is stopping in an abort, with no diagnostic
> message that I can see. But I don't see where the abort comes from:

So, as an example I took the check-system-opensuse log:
https://gitlab.com/qemu-project/qemu/-/jobs/4201998342

Use your browser's "search in web page" to look for "SIGABRT":
it'll show you the two errors (as well as the summary at
the bottom of the page which just says the tests aborted).

Here's one:

 5/351 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test  ERROR  246.12s  killed by signal 6 SIGABRT
>>> QTEST_QEMU_BINARY=./qemu-system-x86_64 QTEST_QEMU_IMG=./qemu-img MALLOC_PERTURB_=48 QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon G_TEST_DBUS_DAEMON=/builds/qemu-project/qemu/tests/dbus-vmstate-daemon.sh /builds/qemu-project/qemu/build/tests/qtest/migration-test --tap -k
――――――――――――――――――――――――――――――――――――― ✀ ―――――――――――――――――――――――――――――――――――――
stderr:
Could not access KVM kernel module: No such file or directory
Could not access KVM kernel module: No such file or directory
Could not access KVM kernel module: No such file or directory
Could not access KVM kernel module: No such file or directory
Could not access KVM kernel module: No such file or directory
Could not access KVM kernel module: No such file or directory
Could not access KVM kernel module: No such file or directory
Could not access KVM kernel module: No such file or directory
Could not access KVM kernel module: No such file or directory
Could not access KVM kernel module: No such file or directory
**
ERROR:../tests/qtest/migration-helpers.c:205:wait_for_migration_status: assertion failed: (g_test_timer_elapsed() < MIGRATION_STATUS_WAIT_TIMEOUT)
(test program exited with status code -6)
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
▶ 6/351 ERROR:../tests/qtest/migration-helpers.c:205:wait_for_migration_status: assertion failed: (g_test_timer_elapsed() < MIGRATION_STATUS_WAIT_TIMEOUT) ERROR
 6/351 qemu:qtest+qtest-aarch64 / qtest-aarch64/migration-test  ERROR  221.18s  killed by signal 6 SIGABRT

Looks like it failed on a timeout in the test code.

I think there ought to be artefacts from the job which have a
copy of the full log, but I can't find them: not sure if this
is just because the gitlab UI is terrible, or if they really
didn't get generated.

thanks
-- PMM
Peter Maydell <peter.maydell@linaro.org> wrote:
> On Wed, 3 May 2023 at 10:17, Juan Quintela <quintela@redhat.com> wrote:
>>
>> Peter Maydell <peter.maydell@linaro.org> wrote:
>> > On Tue, 2 May 2023 at 11:39, Juan Quintela <quintela@redhat.com> wrote:
>> >> Richard, while we are here: one of the problems that we are having is
>> >> that the test is exiting with an abort, so we have no clue what is
>> >> happening. Is there a way to get a backtrace, or at least the number
>> >
>> > This has been consistently an issue with the migration tests.
>> > As the owner of the tests, if they are not providing you with
>> > the level of detail that you need to diagnose failures, I
>> > think that is something that is in your court to address:
>> > the CI system is always going to only be able to provide
>> > you with what your tests are outputting to the logs.
>>
>> Right now I would be happy just to see what test it is failing at.
>>
>> Either I am doing something wrong, or from the links that I see in
>> Richard's email I am not able to reach anywhere where I can see the
>> full logs.
>>
>> > For the specific case of backtraces from assertion failures,
>> > I think Dan was looking at whether we could put something
>> > together for that. It won't help with segfaults and the like, though.
>>
>> I am waiting for that O:-)
>>
>> > You should be able to at least get the number of the subtest out of
>> > the logs (either directly in the logs of the job, or else
>> > from the more detailed log file that gets stored as a
>> > job artefact in most cases).
>>
>> Also note that the test is stopping in an abort, with no diagnostic
>> message that I can see. But I don't see where the abort comes from:
>
> So, as an example I took the check-system-opensuse log:
> https://gitlab.com/qemu-project/qemu/-/jobs/4201998342
>
> Use your browser's "search in web page" to look for "SIGABRT":
> it'll show you the two errors (as well as the summary at
> the bottom of the page which just says the tests aborted).
>
> Here's one:
>
> 5/351 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test ERROR
> 246.12s killed by signal 6 SIGABRT
>>>> QTEST_QEMU_BINARY=./qemu-system-x86_64 QTEST_QEMU_IMG=./qemu-img
>>> MALLOC_PERTURB_=48
>>> QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon
>>> G_TEST_DBUS_DAEMON=/builds/qemu-project/qemu/tests/dbus-vmstate-daemon.sh
>>> /builds/qemu-project/qemu/build/tests/qtest/migration-test --tap -k
> ――――――――――――――――――――――――――――――――――――― ✀ ―――――――――――――――――――――――――――――――――――――
> stderr:
> Could not access KVM kernel module: No such file or directory
> Could not access KVM kernel module: No such file or directory
> Could not access KVM kernel module: No such file or directory
> Could not access KVM kernel module: No such file or directory
> Could not access KVM kernel module: No such file or directory
> Could not access KVM kernel module: No such file or directory
> Could not access KVM kernel module: No such file or directory
> Could not access KVM kernel module: No such file or directory
> Could not access KVM kernel module: No such file or directory
> Could not access KVM kernel module: No such file or directory
> **
> ERROR:../tests/qtest/migration-helpers.c:205:wait_for_migration_status:
> assertion failed: (g_test_timer_elapsed() <
> MIGRATION_STATUS_WAIT_TIMEOUT)
> (test program exited with status code -6)
> ――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
> ▶ 6/351 ERROR:../tests/qtest/migration-helpers.c:205:wait_for_migration_status:
> assertion failed: (g_test_timer_elapsed() <
> MIGRATION_STATUS_WAIT_TIMEOUT) ERROR
> 6/351 qemu:qtest+qtest-aarch64 / qtest-aarch64/migration-test ERROR
> 221.18s killed by signal 6 SIGABRT
>
> Looks like it failed on a timeout in the test code.

Thanks.

> I think there ought to be artefacts from the job which have a
> copy of the full log, but I can't find them: not sure if this
> is just because the gitlab UI is terrible, or if they really
> didn't get generated.

So now we are between a rock and a hard place.

We have slowed down the bandwidth for the migration test because on
non-loaded machines, migration was too fast to need more than one pass.

And we slowed it so much that now we hit the timer that was set at 120
seconds.

So .....

It is going to be interesting.

BTW, what processor speed do those aarch64 machines have? Or are they
so loaded that they are effectively thrashing? 2 minutes for a pass
looks a bit too much.

I will give it a try to get this test fixed, changing what we do when
we detect that we don't move to the completion stage.

Thanks for the explanation on where to find the data.

The other issue is that what I really want is to know which test
failed. I can't see a way to get that info. According to Daniel's
answer, we don't upload those files for tests that fail.

Later, Juan.
On Wed, 3 May 2023 at 14:29, Juan Quintela <quintela@redhat.com> wrote:
> So now we are between a rock and a hard place.
>
> We have slowed down the bandwidth for the migration test because on
> non-loaded machines, migration was too fast to need more than one pass.
>
> And we slowed it so much that now we hit the timer that was set at 120
> seconds.
>
> So .....
>
> It is going to be interesting.
>
> BTW, what processor speed do those aarch64 machines have? Or are they
> so loaded that they are effectively thrashing?

The 4 failures listed in this thread are all jobs running on Gitlab's
standard x86-64 shared runners, not on the private aarch64 runner.

thanks
-- PMM
On Wed, May 03, 2023 at 11:17:33AM +0200, Juan Quintela wrote:
> Peter Maydell <peter.maydell@linaro.org> wrote:
> > On Tue, 2 May 2023 at 11:39, Juan Quintela <quintela@redhat.com> wrote:
> >> Richard, while we are here: one of the problems that we are having is
> >> that the test is exiting with an abort, so we have no clue what is
> >> happening. Is there a way to get a backtrace, or at least the number
> >
> > This has been consistently an issue with the migration tests.
> > As the owner of the tests, if they are not providing you with
> > the level of detail that you need to diagnose failures, I
> > think that is something that is in your court to address:
> > the CI system is always going to only be able to provide
> > you with what your tests are outputting to the logs.
>
> Right now I would be happy just to see what test it is failing at.
>
> Either I am doing something wrong, or from the links that I see in
> Richard's email I am not able to reach anywhere where I can see the
> full logs.
>
> > For the specific case of backtraces from assertion failures,
> > I think Dan was looking at whether we could put something
> > together for that. It won't help with segfaults and the like, though.
>
> I am waiting for that O:-)

I did have a play, but didn't get anything satisfactory working. I
could only capture a trace from a single thread, and the hard problems
that really want traces usually involve multiple threads. There's
really no good substitute for an OS-level crash collector like
systemd-coredump or abrtd :-(

This is really worth an RFE to GitLab as a possible enhancement to
their shared runners.

> > You should be able to at least get the number of the subtest out of
> > the logs (either directly in the logs of the job, or else
> > from the more detailed log file that gets stored as a
> > job artefact in most cases).
>
> Also note that the test is stopping in an abort, with no diagnostic
> message that I can see. But I don't see where the abort comes from:

Any g_assert failure will result in an abort, so there are many
possibilities. We should be uploading the testlog.txt from meson test
to show which one asserts, but I've just discovered we're lacking
'when: always' on our 'artifacts' declarations. So the artifacts are
only uploaded when the job succeeds, which is the least useful
scenario for our test logs!

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
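[The fix Dan describes would look roughly like this in a .gitlab-ci.yml job — a sketch only, since the actual job names and artifact paths in QEMU's CI config are assumptions here. `artifacts:when: always` is standard GitLab CI syntax for uploading artifacts on failure as well as success:]

```yaml
some-test-job:
  script:
    - meson test --print-errorlogs
  artifacts:
    when: always          # upload on failure too, not only on success
    expire_in: 7 days
    paths:
      - build/meson-logs/testlog.txt
```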
On Sat, 29 Apr 2023 19:45:07 +0100
Richard Henderson <richard.henderson@linaro.org> wrote:

> On 4/28/23 20:11, Juan Quintela wrote:
> > The following changes since commit 05d50ba2d4668d43a835c5a502efdec9b92646e6:
> >
> >   Merge tag 'migration-20230427-pull-request' of https://gitlab.com/juan.quintela/qemu into staging (2023-04-28 08:35:06 +0100)
> >
> > are available in the Git repository at:
> >
> >   https://gitlab.com/juan.quintela/qemu.git tags/migration-20230428-pull-request
> >
> > for you to fetch changes up to 05ecac612ec6a4bdb358e68554b4406ac2c8e25a:
> >
> >   migration: Initialize and cleanup decompression in migration.c (2023-04-28 20:54:53 +0200)
> >
> > ----------------------------------------------------------------
> > Migration Pull request (20230429 vintage)
> >
> > Hi
> >
> > In this series:
> > - compression code cleanup (lukas)
> >   nice, nice, nice.
> > - drop useless parameters from migration_tls* (juan)
> > - first part of remove QEMUFileHooks series (juan)
> >
> > Please apply.
> >
> > ----------------------------------------------------------------
> >
> > Juan Quintela (8):
> >   multifd: We already account for this packet on the multifd thread
> >   migration: Move ram_stats to its own file migration-stats.[ch]
> >   migration: Rename ram_counters to mig_stats
> >   migration: Rename RAMStats to MigrationAtomicStats
> >   migration/rdma: Split the zero page case from acct_update_position
> >   migration/rdma: Unfold last user of acct_update_position()
> >   migration: Drop unused parameter for migration_tls_get_creds()
> >   migration: Drop unused parameter for migration_tls_client_create()
> >
> > Lukas Straub (13):
> >   qtest/migration-test.c: Add tests with compress enabled
> >   qtest/migration-test.c: Add postcopy tests with compress enabled
> >   ram.c: Let the compress threads return a CompressResult enum
> >   ram.c: Dont change param->block in the compress thread
> >   ram.c: Reset result after sending queued data
> >   ram.c: Do not call save_page_header() from compress threads
> >   ram.c: Call update_compress_thread_counts from
> >     compress_send_queued_data
> >   ram.c: Remove last ram.c dependency from the core compress code
> >   ram.c: Move core compression code into its own file
> >   ram.c: Move core decompression code into its own file
> >   ram compress: Assert that the file buffer matches the result
> >   ram-compress.c: Make target independent
> >   migration: Initialize and cleanup decompression in migration.c
>
> There are a bunch of migration failures in CI:
>
> https://gitlab.com/qemu-project/qemu/-/jobs/4201998343#L375
> https://gitlab.com/qemu-project/qemu/-/jobs/4201998342#L428
> https://gitlab.com/qemu-project/qemu/-/jobs/4201998340#L459
> https://gitlab.com/qemu-project/qemu/-/jobs/4201998336#L4883
>
>
> r~

Hmm, it doesn't always fail. Any way to get the testlog from the failed
jobs?

--
On 4/29/23 21:14, Lukas Straub wrote:
> On Sat, 29 Apr 2023 19:45:07 +0100
> Richard Henderson <richard.henderson@linaro.org> wrote:
>
>> On 4/28/23 20:11, Juan Quintela wrote:
>>> The following changes since commit 05d50ba2d4668d43a835c5a502efdec9b92646e6:
>>>
>>>   Merge tag 'migration-20230427-pull-request' of https://gitlab.com/juan.quintela/qemu into staging (2023-04-28 08:35:06 +0100)
>>>
>>> are available in the Git repository at:
>>>
>>>   https://gitlab.com/juan.quintela/qemu.git tags/migration-20230428-pull-request
>>>
>>> for you to fetch changes up to 05ecac612ec6a4bdb358e68554b4406ac2c8e25a:
>>>
>>>   migration: Initialize and cleanup decompression in migration.c (2023-04-28 20:54:53 +0200)
>>>
>>> ----------------------------------------------------------------
>>> Migration Pull request (20230429 vintage)
>>>
>>> Hi
>>>
>>> In this series:
>>> - compression code cleanup (lukas)
>>>   nice, nice, nice.
>>> - drop useless parameters from migration_tls* (juan)
>>> - first part of remove QEMUFileHooks series (juan)
>>>
>>> Please apply.
>>>
>>> ----------------------------------------------------------------
>>>
>>> Juan Quintela (8):
>>>   multifd: We already account for this packet on the multifd thread
>>>   migration: Move ram_stats to its own file migration-stats.[ch]
>>>   migration: Rename ram_counters to mig_stats
>>>   migration: Rename RAMStats to MigrationAtomicStats
>>>   migration/rdma: Split the zero page case from acct_update_position
>>>   migration/rdma: Unfold last user of acct_update_position()
>>>   migration: Drop unused parameter for migration_tls_get_creds()
>>>   migration: Drop unused parameter for migration_tls_client_create()
>>>
>>> Lukas Straub (13):
>>>   qtest/migration-test.c: Add tests with compress enabled
>>>   qtest/migration-test.c: Add postcopy tests with compress enabled
>>>   ram.c: Let the compress threads return a CompressResult enum
>>>   ram.c: Dont change param->block in the compress thread
>>>   ram.c: Reset result after sending queued data
>>>   ram.c: Do not call save_page_header() from compress threads
>>>   ram.c: Call update_compress_thread_counts from
>>>     compress_send_queued_data
>>>   ram.c: Remove last ram.c dependency from the core compress code
>>>   ram.c: Move core compression code into its own file
>>>   ram.c: Move core decompression code into its own file
>>>   ram compress: Assert that the file buffer matches the result
>>>   ram-compress.c: Make target independent
>>>   migration: Initialize and cleanup decompression in migration.c
>>
>> There are a bunch of migration failures in CI:
>>
>> https://gitlab.com/qemu-project/qemu/-/jobs/4201998343#L375
>> https://gitlab.com/qemu-project/qemu/-/jobs/4201998342#L428
>> https://gitlab.com/qemu-project/qemu/-/jobs/4201998340#L459
>> https://gitlab.com/qemu-project/qemu/-/jobs/4201998336#L4883
>>
>>
>> r~
>
> Hmm, it doesn't always fail. Any way to get the testlog from the failed
> jobs?

What you can get from the links above is all I have. But they're
consistent, and new.

r~