Previous commit removed the creation of the fifo. Without it, I get
random failure during tests with high load, please consider
reintroduce it.
My guess is that there is a race between the two socats when we leave
them to create the channel, better return to the previous behavior.
I can't reproduce the problem when I run ./test-io-channel-command
test alone, I need to do the make check. And any (unrelated) change
can make it dissapear.
commit 76f5148c21b4543e62a6ad605ac4b44133421401
Author: Marc-André Lureau <marcandre.lureau@redhat.com>
Date: Thu Oct 6 15:36:57 2022 +0400
tests/unit: make test-io-channel-command work on win32
This has been tested under msys2 & windows 11. I haven't tried to make
it work with other environments yet, but that should be enough to
validate the channel-command implementation anyway.
Here are the changes:
- drop tests/ from fifo/pipe path, to avoid directory issues
- use g_find_program() to lookup the socat executable (otherwise we
would need to change ChanneCommand to use G_SPAWN_SEARCH_PATH, and deal
with missing socat differently)
- skip the "echo" test when socat is missing as well
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20221006113657.2656108-7-marcandre.lureau@redhat.com>
Failure:
[178/178] 🌓 qemu:unit / test-io-channel-command
[178/178] 🌔 qemu:unit / test-io-channel-command
[178/178] 🌕 qemu:unit / test-io-channel-command
[178/178] 🌖 qemu:unit / test-io-channel-command
[178/178] 🌗 qemu:unit / test-io-channel-command
[178/178] 🌘 qemu:unit / test-io-channel-command
[178/178] 🌑 qemu:unit / test-io-channel-command
[178/178] 🌒 qemu:unit / test-io-channel-command
[178/178] 🌓 qemu:unit / test-io-channel-command
^CWARNING: Received SIGTERM, exiting
178/178 qemu:unit / test-io-channel-command INTERRUPT 1127.75s killed by signal 15 SIGTERM
>>> MALLOC_PERTURB_=149 G_TEST_BUILDDIR=/scratch/qemu/multifd/x64/tests/unit G_TEST_SRCDIR=/mnt/code/qemu/multifd/tests/unit /scratch/qemu/multifd/x64/tests/unit/test-io-channel-command --tap -k
――――――――――――――――――――――――――――――――――――― ✀ ―――――――――――――――――――――――――――――――――――――
stderr:
2022/10/25 12:32:48 socat[463140] E mkfifo(test-io-channel-command.fifo, 438): File exists
TAP parsing error: Too few tests run (expected 4, got 0)
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
Summary of Failures:
178/178 qemu:unit / test-io-channel-command INTERRUPT 1127.75s killed by signal 15 SIGTERM
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
tests/unit/test-io-channel-command.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/tests/unit/test-io-channel-command.c b/tests/unit/test-io-channel-command.c
index 7eee939c07..7e75f960f4 100644
--- a/tests/unit/test-io-channel-command.c
+++ b/tests/unit/test-io-channel-command.c
@@ -48,6 +48,9 @@ static void test_io_channel_command_fifo(bool async)
}
unlink(TEST_FIFO);
+ if (mkfifo(TEST_FIFO, 0600) < 0) {
+ abort();
+ }
src = QIO_CHANNEL(qio_channel_command_new_spawn(srcargv,
O_WRONLY,
&error_abort));
--
2.37.3
Juan Quintela <quintela@redhat.com> writes:
> Previous commit removed the creation of the fifo. Without it, I get
> random failure during tests with high load, please consider
> reintroduce it.
>
> My guess is that there is a race between the two socats when we leave
> them to create the channel, better return to the previous behavior.
>
> I can't reproduce the problem when I run ./test-io-channel-command
> test alone, I need to do the make check. And any (unrelated) change
> can make it dissapear.
I was chasing a similar problem with this test although I don't see it
timeout while running (I don't think our unit tests time out). I'm
provisionally queuing this to testing/next unless anyone objects.
>
> commit 76f5148c21b4543e62a6ad605ac4b44133421401
> Author: Marc-André Lureau <marcandre.lureau@redhat.com>
> Date: Thu Oct 6 15:36:57 2022 +0400
>
> tests/unit: make test-io-channel-command work on win32
>
> This has been tested under msys2 & windows 11. I haven't tried to make
> it work with other environments yet, but that should be enough to
> validate the channel-command implementation anyway.
>
> Here are the changes:
> - drop tests/ from fifo/pipe path, to avoid directory issues
> - use g_find_program() to lookup the socat executable (otherwise we
> would need to change ChanneCommand to use G_SPAWN_SEARCH_PATH, and deal
> with missing socat differently)
> - skip the "echo" test when socat is missing as well
>
> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
> Message-Id: <20221006113657.2656108-7-marcandre.lureau@redhat.com>
>
> Failure:
>
> [178/178] 🌓 qemu:unit / test-io-channel-command
> [178/178] 🌔 qemu:unit / test-io-channel-command
> [178/178] 🌕 qemu:unit / test-io-channel-command
> [178/178] 🌖 qemu:unit / test-io-channel-command
> [178/178] 🌗 qemu:unit / test-io-channel-command
> [178/178] 🌘 qemu:unit / test-io-channel-command
> [178/178] 🌑 qemu:unit / test-io-channel-command
> [178/178] 🌒 qemu:unit / test-io-channel-command
> [178/178] 🌓 qemu:unit / test-io-channel-command
> ^CWARNING: Received SIGTERM, exiting
> 178/178 qemu:unit / test-io-channel-command INTERRUPT 1127.75s killed by signal 15 SIGTERM
>>>> MALLOC_PERTURB_=149 G_TEST_BUILDDIR=/scratch/qemu/multifd/x64/tests/unit G_TEST_SRCDIR=/mnt/code/qemu/multifd/tests/unit /scratch/qemu/multifd/x64/tests/unit/test-io-channel-command --tap -k
> ――――――――――――――――――――――――――――――――――――― ✀ ―――――――――――――――――――――――――――――――――――――
> stderr:
> 2022/10/25 12:32:48 socat[463140] E mkfifo(test-io-channel-command.fifo, 438): File exists
>
> TAP parsing error: Too few tests run (expected 4, got 0)
> ――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
>
> Summary of Failures:
>
> 178/178 qemu:unit / test-io-channel-command INTERRUPT 1127.75s killed by signal 15 SIGTERM
>
> Signed-off-by: Juan Quintela <quintela@redhat.com>
> ---
> tests/unit/test-io-channel-command.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/tests/unit/test-io-channel-command.c b/tests/unit/test-io-channel-command.c
> index 7eee939c07..7e75f960f4 100644
> --- a/tests/unit/test-io-channel-command.c
> +++ b/tests/unit/test-io-channel-command.c
> @@ -48,6 +48,9 @@ static void test_io_channel_command_fifo(bool async)
> }
>
> unlink(TEST_FIFO);
> + if (mkfifo(TEST_FIFO, 0600) < 0) {
> + abort();
> + }
> src = QIO_CHANNEL(qio_channel_command_new_spawn(srcargv,
> O_WRONLY,
> &error_abort));
--
Alex Bennée
CC'ing Marc-André as original author of the change On Tue, Oct 25, 2022 at 01:57:23PM +0100, Alex Bennée wrote: > > Juan Quintela <quintela@redhat.com> writes: > > > Previous commit removed the creation of the fifo. Without it, I get > > random failure during tests with high load, please consider > > reintroduce it. > > > > My guess is that there is a race between the two socats when we leave > > them to create the channel, better return to the previous behavior. > > > > I can't reproduce the problem when I run ./test-io-channel-command > > test alone, I need to do the make check. And any (unrelated) change > > can make it dissapear. > > I was chasing a similar problem with this test although I don't see it > timeout while running (I don't think our unit tests time out). I'm > provisionally queuing this to testing/next unless anyone objects. It won't build on Win32 since that platform lacks mkfifo. The test normally works since socat will call mknod to create the fifo. I think the problem is that we have a race condition where the client socat runs before the server socat, and so won't see the fifo. This will be where high load triggers problems. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
Daniel P. Berrangé <berrange@redhat.com> writes: > CC'ing Marc-André as original author of the change > > On Tue, Oct 25, 2022 at 01:57:23PM +0100, Alex Bennée wrote: >> >> Juan Quintela <quintela@redhat.com> writes: >> >> > Previous commit removed the creation of the fifo. Without it, I get >> > random failure during tests with high load, please consider >> > reintroduce it. >> > >> > My guess is that there is a race between the two socats when we leave >> > them to create the channel, better return to the previous behavior. >> > >> > I can't reproduce the problem when I run ./test-io-channel-command >> > test alone, I need to do the make check. And any (unrelated) change >> > can make it dissapear. >> >> I was chasing a similar problem with this test although I don't see it >> timeout while running (I don't think our unit tests time out). I'm >> provisionally queuing this to testing/next unless anyone objects. > > It won't build on Win32 since that platform lacks mkfifo. > > The test normally works since socat will call mknod to create > the fifo. > > I think the problem is that we have a race condition where the > client socat runs before the server socat, and so won't see the > fifo. This will be where high load triggers problems. Ok I shall drop the patch from testing/next - we need a better solution. -- Alex Bennée
On 26/10/2022 18.18, Alex Bennée wrote: > > Daniel P. Berrangé <berrange@redhat.com> writes: > >> CC'ing Marc-André as original author of the change >> >> On Tue, Oct 25, 2022 at 01:57:23PM +0100, Alex Bennée wrote: >>> >>> Juan Quintela <quintela@redhat.com> writes: >>> >>>> Previous commit removed the creation of the fifo. Without it, I get >>>> random failure during tests with high load, please consider >>>> reintroduce it. >>>> >>>> My guess is that there is a race between the two socats when we leave >>>> them to create the channel, better return to the previous behavior. >>>> >>>> I can't reproduce the problem when I run ./test-io-channel-command >>>> test alone, I need to do the make check. And any (unrelated) change >>>> can make it dissapear. >>> >>> I was chasing a similar problem with this test although I don't see it >>> timeout while running (I don't think our unit tests time out). I'm >>> provisionally queuing this to testing/next unless anyone objects. >> >> It won't build on Win32 since that platform lacks mkfifo. >> >> The test normally works since socat will call mknod to create >> the fifo. >> >> I think the problem is that we have a race condition where the >> client socat runs before the server socat, and so won't see the >> fifo. This will be where high load triggers problems. > > Ok I shall drop the patch from testing/next - we need a better solution. Could we maybe at least revert the patch that introduced the problem? ... the failing test is annoying ... Thomas
Thomas Huth <thuth@redhat.com> writes:
> On 26/10/2022 18.18, Alex Bennée wrote:
>> Daniel P. Berrangé <berrange@redhat.com> writes:
>>
>>> CC'ing Marc-André as original author of the change
>>>
>>> On Tue, Oct 25, 2022 at 01:57:23PM +0100, Alex Bennée wrote:
>>>>
>>>> Juan Quintela <quintela@redhat.com> writes:
>>>>
>>>>> Previous commit removed the creation of the fifo. Without it, I get
>>>>> random failure during tests with high load, please consider
>>>>> reintroduce it.
>>>>>
>>>>> My guess is that there is a race between the two socats when we leave
>>>>> them to create the channel, better return to the previous behavior.
>>>>>
>>>>> I can't reproduce the problem when I run ./test-io-channel-command
>>>>> test alone, I need to do the make check. And any (unrelated) change
>>>>> can make it dissapear.
>>>>
>>>> I was chasing a similar problem with this test although I don't see it
>>>> timeout while running (I don't think our unit tests time out). I'm
>>>> provisionally queuing this to testing/next unless anyone objects.
>>>
>>> It won't build on Win32 since that platform lacks mkfifo.
>>>
>>> The test normally works since socat will call mknod to create
>>> the fifo.
>>>
>>> I think the problem is that we have a race condition where the
>>> client socat runs before the server socat, and so won't see the
>>> fifo. This will be where high load triggers problems.
>> Ok I shall drop the patch from testing/next - we need a better
>> solution.
>
> Could we maybe at least revert the patch that introduced the problem?
> ... the failing test is annoying ...
I'm trying to fix it now but I haven't been able to get one of the
socat's to not race with the other:
--8<---------------cut here---------------start------------->8---
modified tests/unit/test-io-channel-command.c
@@ -19,6 +19,7 @@
*/
#include "qemu/osdep.h"
+#include <glib/gstdio.h>
#include "io/channel-command.h"
#include "io-channel-helpers.h"
#include "qapi/error.h"
@@ -33,25 +34,26 @@ static char *socat = NULL;
static void test_io_channel_command_fifo(bool async)
{
+ g_autofree gchar *tmpdir = g_dir_make_tmp("qemu-test-io-channel.XXXXXX", NULL);
+ g_autofree gchar *fifo = g_strdup_printf("%s/%s", tmpdir, TEST_FIFO);
+ g_autoptr(GString) srcargs = g_string_new(socat);
+ g_autoptr(GString) dstargs = g_string_new(socat);
+ g_auto(GStrv) srcargv;
+ g_auto(GStrv) dstargv;
QIOChannel *src, *dst;
QIOChannelTest *test;
- const char *srcargv[] = {
- socat, "-", SOCAT_SRC, NULL,
- };
- const char *dstargv[] = {
- socat, SOCAT_DST, "-", NULL,
- };
- if (!socat) {
- g_test_skip("socat is not found in PATH");
- return;
- }
+ g_string_append_printf(srcargs, " - PIPE:%s,wronly,creat=1,excl=1", fifo);
+ g_string_append_printf(dstargs, " PIPE:%s,rdonly,creat=0 -", fifo);
+
+ srcargv = g_strsplit(srcargs->str, " ", -1);
+ dstargv = g_strsplit(dstargs->str, " ", -1);
- unlink(TEST_FIFO);
- src = QIO_CHANNEL(qio_channel_command_new_spawn(srcargv,
+ src = QIO_CHANNEL(qio_channel_command_new_spawn((const char**) srcargv,
O_WRONLY,
&error_abort));
- dst = QIO_CHANNEL(qio_channel_command_new_spawn(dstargv,
+
+ dst = QIO_CHANNEL(qio_channel_command_new_spawn((const char**) dstargv,
O_RDONLY,
&error_abort));
@@ -62,17 +64,27 @@ static void test_io_channel_command_fifo(bool async)
object_unref(OBJECT(src));
object_unref(OBJECT(dst));
- unlink(TEST_FIFO);
+ g_rmdir(tmpdir);
}
static void test_io_channel_command_fifo_async(void)
{
+ if (!socat) {
+ g_test_skip("socat is not found in PATH");
+ return;
+ }
+
test_io_channel_command_fifo(true);
}
static void test_io_channel_command_fifo_sync(void)
{
+ if (!socat) {
+ g_test_skip("socat is not found in PATH");
+ return;
+ }
+
test_io_channel_command_fifo(false);
}
--8<---------------cut here---------------end--------------->8---
>
> Thomas
--
Alex Bennée
On Thu, Oct 27, 2022 at 02:59:04PM +0100, Alex Bennée wrote: > > Thomas Huth <thuth@redhat.com> writes: > > > On 26/10/2022 18.18, Alex Bennée wrote: > >> Daniel P. Berrangé <berrange@redhat.com> writes: > >> > >>> CC'ing Marc-André as original author of the change > >>> > >>> On Tue, Oct 25, 2022 at 01:57:23PM +0100, Alex Bennée wrote: > >>>> > >>>> Juan Quintela <quintela@redhat.com> writes: > >>>> > >>>>> Previous commit removed the creation of the fifo. Without it, I get > >>>>> random failure during tests with high load, please consider > >>>>> reintroduce it. > >>>>> > >>>>> My guess is that there is a race between the two socats when we leave > >>>>> them to create the channel, better return to the previous behavior. > >>>>> > >>>>> I can't reproduce the problem when I run ./test-io-channel-command > >>>>> test alone, I need to do the make check. And any (unrelated) change > >>>>> can make it dissapear. > >>>> > >>>> I was chasing a similar problem with this test although I don't see it > >>>> timeout while running (I don't think our unit tests time out). I'm > >>>> provisionally queuing this to testing/next unless anyone objects. > >>> > >>> It won't build on Win32 since that platform lacks mkfifo. > >>> > >>> The test normally works since socat will call mknod to create > >>> the fifo. > >>> > >>> I think the problem is that we have a race condition where the > >>> client socat runs before the server socat, and so won't see the > >>> fifo. This will be where high load triggers problems. > >> Ok I shall drop the patch from testing/next - we need a better > >> solution. > > > > Could we maybe at least revert the patch that introduced the problem? > > ... the failing test is annoying ... > > I'm trying to fix it now but I haven't been able to get one of the > socat's to not race with the other: My other thought was to run a socat first, with 'unlink-close=0' and make it exit immediately, simply to get the fifo created. I'm not sure how to make ti exit immediately though, with a guarantee that its had a chance to create the fifo. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
© 2016 - 2026 Red Hat, Inc.