Previous commit removed the creation of the fifo. Without it, I get
random failure during tests with high load, please consider
reintroduce it.
My guess is that there is a race between the two socats when we leave
them to create the channel, better return to the previous behavior.
I can't reproduce the problem when I run ./test-io-channel-command
test alone, I need to do the make check. And any (unrelated) change
can make it dissapear.
commit 76f5148c21b4543e62a6ad605ac4b44133421401
Author: Marc-André Lureau <marcandre.lureau@redhat.com>
Date: Thu Oct 6 15:36:57 2022 +0400
tests/unit: make test-io-channel-command work on win32
This has been tested under msys2 & windows 11. I haven't tried to make
it work with other environments yet, but that should be enough to
validate the channel-command implementation anyway.
Here are the changes:
- drop tests/ from fifo/pipe path, to avoid directory issues
- use g_find_program() to lookup the socat executable (otherwise we
would need to change ChanneCommand to use G_SPAWN_SEARCH_PATH, and deal
with missing socat differently)
- skip the "echo" test when socat is missing as well
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20221006113657.2656108-7-marcandre.lureau@redhat.com>
Failure:
[178/178] 🌓 qemu:unit / test-io-channel-command
[178/178] 🌔 qemu:unit / test-io-channel-command
[178/178] 🌕 qemu:unit / test-io-channel-command
[178/178] 🌖 qemu:unit / test-io-channel-command
[178/178] 🌗 qemu:unit / test-io-channel-command
[178/178] 🌘 qemu:unit / test-io-channel-command
[178/178] 🌑 qemu:unit / test-io-channel-command
[178/178] 🌒 qemu:unit / test-io-channel-command
[178/178] 🌓 qemu:unit / test-io-channel-command
^CWARNING: Received SIGTERM, exiting
178/178 qemu:unit / test-io-channel-command INTERRUPT 1127.75s killed by signal 15 SIGTERM
>>> MALLOC_PERTURB_=149 G_TEST_BUILDDIR=/scratch/qemu/multifd/x64/tests/unit G_TEST_SRCDIR=/mnt/code/qemu/multifd/tests/unit /scratch/qemu/multifd/x64/tests/unit/test-io-channel-command --tap -k
――――――――――――――――――――――――――――――――――――― ✀ ―――――――――――――――――――――――――――――――――――――
stderr:
2022/10/25 12:32:48 socat[463140] E mkfifo(test-io-channel-command.fifo, 438): File exists
TAP parsing error: Too few tests run (expected 4, got 0)
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
Summary of Failures:
178/178 qemu:unit / test-io-channel-command INTERRUPT 1127.75s killed by signal 15 SIGTERM
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
tests/unit/test-io-channel-command.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/tests/unit/test-io-channel-command.c b/tests/unit/test-io-channel-command.c
index 7eee939c07..7e75f960f4 100644
--- a/tests/unit/test-io-channel-command.c
+++ b/tests/unit/test-io-channel-command.c
@@ -48,6 +48,9 @@ static void test_io_channel_command_fifo(bool async)
}
unlink(TEST_FIFO);
+ if (mkfifo(TEST_FIFO, 0600) < 0) {
+ abort();
+ }
src = QIO_CHANNEL(qio_channel_command_new_spawn(srcargv,
O_WRONLY,
&error_abort));
--
2.37.3
Juan Quintela <quintela@redhat.com> writes: > Previous commit removed the creation of the fifo. Without it, I get > random failure during tests with high load, please consider > reintroduce it. > > My guess is that there is a race between the two socats when we leave > them to create the channel, better return to the previous behavior. > > I can't reproduce the problem when I run ./test-io-channel-command > test alone, I need to do the make check. And any (unrelated) change > can make it dissapear. I was chasing a similar problem with this test although I don't see it timeout while running (I don't think our unit tests time out). I'm provisionally queuing this to testing/next unless anyone objects. > > commit 76f5148c21b4543e62a6ad605ac4b44133421401 > Author: Marc-André Lureau <marcandre.lureau@redhat.com> > Date: Thu Oct 6 15:36:57 2022 +0400 > > tests/unit: make test-io-channel-command work on win32 > > This has been tested under msys2 & windows 11. I haven't tried to make > it work with other environments yet, but that should be enough to > validate the channel-command implementation anyway. > > Here are the changes: > - drop tests/ from fifo/pipe path, to avoid directory issues > - use g_find_program() to lookup the socat executable (otherwise we > would need to change ChanneCommand to use G_SPAWN_SEARCH_PATH, and deal > with missing socat differently) > - skip the "echo" test when socat is missing as well > > Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> > Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> > Message-Id: <20221006113657.2656108-7-marcandre.lureau@redhat.com> > > Failure: > > [178/178] 🌓 qemu:unit / test-io-channel-command > [178/178] 🌔 qemu:unit / test-io-channel-command > [178/178] 🌕 qemu:unit / test-io-channel-command > [178/178] 🌖 qemu:unit / test-io-channel-command > [178/178] 🌗 qemu:unit / test-io-channel-command > [178/178] 🌘 qemu:unit / test-io-channel-command > [178/178] 🌑 qemu:unit / test-io-channel-command > [178/178] 🌒 qemu:unit / test-io-channel-command > [178/178] 🌓 qemu:unit / test-io-channel-command > ^CWARNING: Received SIGTERM, exiting > 178/178 qemu:unit / test-io-channel-command INTERRUPT 1127.75s killed by signal 15 SIGTERM >>>> MALLOC_PERTURB_=149 G_TEST_BUILDDIR=/scratch/qemu/multifd/x64/tests/unit G_TEST_SRCDIR=/mnt/code/qemu/multifd/tests/unit /scratch/qemu/multifd/x64/tests/unit/test-io-channel-command --tap -k > ――――――――――――――――――――――――――――――――――――― ✀ ――――――――――――――――――――――――――――――――――――― > stderr: > 2022/10/25 12:32:48 socat[463140] E mkfifo(test-io-channel-command.fifo, 438): File exists > > TAP parsing error: Too few tests run (expected 4, got 0) > ―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― > > Summary of Failures: > > 178/178 qemu:unit / test-io-channel-command INTERRUPT 1127.75s killed by signal 15 SIGTERM > > Signed-off-by: Juan Quintela <quintela@redhat.com> > --- > tests/unit/test-io-channel-command.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/tests/unit/test-io-channel-command.c b/tests/unit/test-io-channel-command.c > index 7eee939c07..7e75f960f4 100644 > --- a/tests/unit/test-io-channel-command.c > +++ b/tests/unit/test-io-channel-command.c > @@ -48,6 +48,9 @@ static void test_io_channel_command_fifo(bool async) > } > > unlink(TEST_FIFO); > + if (mkfifo(TEST_FIFO, 0600) < 0) { > + abort(); > + } > src = QIO_CHANNEL(qio_channel_command_new_spawn(srcargv, > O_WRONLY, > &error_abort)); -- Alex Bennée
CC'ing Marc-André as original author of the change On Tue, Oct 25, 2022 at 01:57:23PM +0100, Alex Bennée wrote: > > Juan Quintela <quintela@redhat.com> writes: > > > Previous commit removed the creation of the fifo. Without it, I get > > random failure during tests with high load, please consider > > reintroduce it. > > > > My guess is that there is a race between the two socats when we leave > > them to create the channel, better return to the previous behavior. > > > > I can't reproduce the problem when I run ./test-io-channel-command > > test alone, I need to do the make check. And any (unrelated) change > > can make it dissapear. > > I was chasing a similar problem with this test although I don't see it > timeout while running (I don't think our unit tests time out). I'm > provisionally queuing this to testing/next unless anyone objects. It won't build on Win32 since that platform lacks mkfifo. The test normally works since socat will call mknod to create the fifo. I think the problem is that we have a race condition where the client socat runs before the server socat, and so won't see the fifo. This will be where high load triggers problems. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
Daniel P. Berrangé <berrange@redhat.com> writes: > CC'ing Marc-André as original author of the change > > On Tue, Oct 25, 2022 at 01:57:23PM +0100, Alex Bennée wrote: >> >> Juan Quintela <quintela@redhat.com> writes: >> >> > Previous commit removed the creation of the fifo. Without it, I get >> > random failure during tests with high load, please consider >> > reintroduce it. >> > >> > My guess is that there is a race between the two socats when we leave >> > them to create the channel, better return to the previous behavior. >> > >> > I can't reproduce the problem when I run ./test-io-channel-command >> > test alone, I need to do the make check. And any (unrelated) change >> > can make it dissapear. >> >> I was chasing a similar problem with this test although I don't see it >> timeout while running (I don't think our unit tests time out). I'm >> provisionally queuing this to testing/next unless anyone objects. > > It won't build on Win32 since that platform lacks mkfifo. > > The test normally works since socat will call mknod to create > the fifo. > > I think the problem is that we have a race condition where the > client socat runs before the server socat, and so won't see the > fifo. This will be where high load triggers problems. Ok I shall drop the patch from testing/next - we need a better solution. -- Alex Bennée
On 26/10/2022 18.18, Alex Bennée wrote: > > Daniel P. Berrangé <berrange@redhat.com> writes: > >> CC'ing Marc-André as original author of the change >> >> On Tue, Oct 25, 2022 at 01:57:23PM +0100, Alex Bennée wrote: >>> >>> Juan Quintela <quintela@redhat.com> writes: >>> >>>> Previous commit removed the creation of the fifo. Without it, I get >>>> random failure during tests with high load, please consider >>>> reintroduce it. >>>> >>>> My guess is that there is a race between the two socats when we leave >>>> them to create the channel, better return to the previous behavior. >>>> >>>> I can't reproduce the problem when I run ./test-io-channel-command >>>> test alone, I need to do the make check. And any (unrelated) change >>>> can make it dissapear. >>> >>> I was chasing a similar problem with this test although I don't see it >>> timeout while running (I don't think our unit tests time out). I'm >>> provisionally queuing this to testing/next unless anyone objects. >> >> It won't build on Win32 since that platform lacks mkfifo. >> >> The test normally works since socat will call mknod to create >> the fifo. >> >> I think the problem is that we have a race condition where the >> client socat runs before the server socat, and so won't see the >> fifo. This will be where high load triggers problems. > > Ok I shall drop the patch from testing/next - we need a better solution. Could we maybe at least revert the patch that introduced the problem? ... the failing test is annoying ... Thomas
Thomas Huth <thuth@redhat.com> writes: > On 26/10/2022 18.18, Alex Bennée wrote: >> Daniel P. Berrangé <berrange@redhat.com> writes: >> >>> CC'ing Marc-André as original author of the change >>> >>> On Tue, Oct 25, 2022 at 01:57:23PM +0100, Alex Bennée wrote: >>>> >>>> Juan Quintela <quintela@redhat.com> writes: >>>> >>>>> Previous commit removed the creation of the fifo. Without it, I get >>>>> random failure during tests with high load, please consider >>>>> reintroduce it. >>>>> >>>>> My guess is that there is a race between the two socats when we leave >>>>> them to create the channel, better return to the previous behavior. >>>>> >>>>> I can't reproduce the problem when I run ./test-io-channel-command >>>>> test alone, I need to do the make check. And any (unrelated) change >>>>> can make it dissapear. >>>> >>>> I was chasing a similar problem with this test although I don't see it >>>> timeout while running (I don't think our unit tests time out). I'm >>>> provisionally queuing this to testing/next unless anyone objects. >>> >>> It won't build on Win32 since that platform lacks mkfifo. >>> >>> The test normally works since socat will call mknod to create >>> the fifo. >>> >>> I think the problem is that we have a race condition where the >>> client socat runs before the server socat, and so won't see the >>> fifo. This will be where high load triggers problems. >> Ok I shall drop the patch from testing/next - we need a better >> solution. > > Could we maybe at least revert the patch that introduced the problem? > ... the failing test is annoying ... I'm trying to fix it now but I haven't been able to get one of the socat's to not race with the other: --8<---------------cut here---------------start------------->8--- modified tests/unit/test-io-channel-command.c @@ -19,6 +19,7 @@ */ #include "qemu/osdep.h" +#include <glib/gstdio.h> #include "io/channel-command.h" #include "io-channel-helpers.h" #include "qapi/error.h" @@ -33,25 +34,26 @@ static char *socat = NULL; static void test_io_channel_command_fifo(bool async) { + g_autofree gchar *tmpdir = g_dir_make_tmp("qemu-test-io-channel.XXXXXX", NULL); + g_autofree gchar *fifo = g_strdup_printf("%s/%s", tmpdir, TEST_FIFO); + g_autoptr(GString) srcargs = g_string_new(socat); + g_autoptr(GString) dstargs = g_string_new(socat); + g_auto(GStrv) srcargv; + g_auto(GStrv) dstargv; QIOChannel *src, *dst; QIOChannelTest *test; - const char *srcargv[] = { - socat, "-", SOCAT_SRC, NULL, - }; - const char *dstargv[] = { - socat, SOCAT_DST, "-", NULL, - }; - if (!socat) { - g_test_skip("socat is not found in PATH"); - return; - } + g_string_append_printf(srcargs, " - PIPE:%s,wronly,creat=1,excl=1", fifo); + g_string_append_printf(dstargs, " PIPE:%s,rdonly,creat=0 -", fifo); + + srcargv = g_strsplit(srcargs->str, " ", -1); + dstargv = g_strsplit(dstargs->str, " ", -1); - unlink(TEST_FIFO); - src = QIO_CHANNEL(qio_channel_command_new_spawn(srcargv, + src = QIO_CHANNEL(qio_channel_command_new_spawn((const char**) srcargv, O_WRONLY, &error_abort)); - dst = QIO_CHANNEL(qio_channel_command_new_spawn(dstargv, + + dst = QIO_CHANNEL(qio_channel_command_new_spawn((const char**) dstargv, O_RDONLY, &error_abort)); @@ -62,17 +64,27 @@ static void test_io_channel_command_fifo(bool async) object_unref(OBJECT(src)); object_unref(OBJECT(dst)); - unlink(TEST_FIFO); + g_rmdir(tmpdir); } static void test_io_channel_command_fifo_async(void) { + if (!socat) { + g_test_skip("socat is not found in PATH"); + return; + } + test_io_channel_command_fifo(true); } static void test_io_channel_command_fifo_sync(void) { + if (!socat) { + g_test_skip("socat is not found in PATH"); + return; + } + test_io_channel_command_fifo(false); } --8<---------------cut here---------------end--------------->8--- > > Thomas -- Alex Bennée
On Thu, Oct 27, 2022 at 02:59:04PM +0100, Alex Bennée wrote: > > Thomas Huth <thuth@redhat.com> writes: > > > On 26/10/2022 18.18, Alex Bennée wrote: > >> Daniel P. Berrangé <berrange@redhat.com> writes: > >> > >>> CC'ing Marc-André as original author of the change > >>> > >>> On Tue, Oct 25, 2022 at 01:57:23PM +0100, Alex Bennée wrote: > >>>> > >>>> Juan Quintela <quintela@redhat.com> writes: > >>>> > >>>>> Previous commit removed the creation of the fifo. Without it, I get > >>>>> random failure during tests with high load, please consider > >>>>> reintroduce it. > >>>>> > >>>>> My guess is that there is a race between the two socats when we leave > >>>>> them to create the channel, better return to the previous behavior. > >>>>> > >>>>> I can't reproduce the problem when I run ./test-io-channel-command > >>>>> test alone, I need to do the make check. And any (unrelated) change > >>>>> can make it dissapear. > >>>> > >>>> I was chasing a similar problem with this test although I don't see it > >>>> timeout while running (I don't think our unit tests time out). I'm > >>>> provisionally queuing this to testing/next unless anyone objects. > >>> > >>> It won't build on Win32 since that platform lacks mkfifo. > >>> > >>> The test normally works since socat will call mknod to create > >>> the fifo. > >>> > >>> I think the problem is that we have a race condition where the > >>> client socat runs before the server socat, and so won't see the > >>> fifo. This will be where high load triggers problems. > >> Ok I shall drop the patch from testing/next - we need a better > >> solution. > > > > Could we maybe at least revert the patch that introduced the problem? > > ... the failing test is annoying ... > > I'm trying to fix it now but I haven't been able to get one of the > socat's to not race with the other: My other thought was to run a socat first, with 'unlink-close=0' and make it exit immediately, simply to get the fifo created. I'm not sure how to make ti exit immediately though, with a guarantee that its had a chance to create the fifo. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
© 2016 - 2024 Red Hat, Inc.