The migration_with_exec test is failing sporadically for all
architectures due to a race when the destination socat process takes
too long to start listening while the source process is already
issuing connect().

The race is inherent because the exec: migration spawns the
to-be-exec'ed command asynchronously and returns from the
migrate-incoming command. The localhost-only testcase is not
representative of the majority of migrations. In a real scenario
between two different hosts that race wouldn't happen.

Fix the testcase by configuring the source socat command to wait
indefinitely while trying to connect.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
tests/functional/migration.py | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tests/functional/migration.py b/tests/functional/migration.py
index 144f091ba8..3b7674af3b 100644
--- a/tests/functional/migration.py
+++ b/tests/functional/migration.py
@@ -85,5 +85,5 @@ def migration_with_exec(self):
         with Ports() as ports:
             free_port = self._get_free_port(ports)
             dst_uri = 'exec:socat TCP-LISTEN:%u -' % free_port
-            src_uri = 'exec:socat - TCP:localhost:%u' % free_port
+            src_uri = 'exec:socat - TCP:localhost:%u,forever' % free_port
             self.migrate(dst_uri, src_uri)
--
2.51.0
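
For readers who want to see the race outside of QEMU, here is a minimal
sketch in plain Python. It is not part of the patch and does not use socat:
the helper names (connect_with_retry, delayed_listener, demo) and the delay
values are made up for illustration. The listener binds a port but only
calls listen() after a delay, so an immediate connect() is refused; the
client retries until the listener is ready, which is roughly what the
"forever" option asks socat to do.

```python
import socket
import threading
import time


def connect_with_retry(port, interval=0.1, retries=None):
    """Connect to 127.0.0.1:port, retrying while the connection is refused.

    retries=None retries without limit (similar in spirit to socat's
    'forever'); an integer bounds the number of retries (similar in
    spirit to 'retry=N').
    """
    attempt = 0
    while True:
        try:
            return socket.create_connection(('127.0.0.1', port))
        except ConnectionRefusedError:
            if retries is not None and attempt >= retries:
                raise
            attempt += 1
            time.sleep(interval)


def delayed_listener(server, delay, received):
    """Start listening only after 'delay' seconds, then read one message."""
    time.sleep(delay)
    server.listen(1)
    conn, _ = server.accept()
    with conn:
        received.append(conn.recv(64))


def demo():
    # Bind first to reserve a free port, but do not listen yet: until
    # listen() is called, connect() attempts to this port are refused.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(('127.0.0.1', 0))
    port = server.getsockname()[1]
    received = []
    listener = threading.Thread(target=delayed_listener,
                                args=(server, 0.5, received))
    listener.start()
    # The first attempts hit a port nobody listens on and are retried.
    with connect_with_retry(port) as client:
        client.sendall(b'migration stream')
    listener.join()
    server.close()
    return received[0]
```

Calling demo() with retries=0 passed to connect_with_retry would instead
fail on the first refused attempt, which corresponds to the sporadic
failure described in the commit message.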
On 23/04/2026 01.00, Fabiano Rosas wrote:
> The migration_with_exec test is failing sporadically for all
> architectures due to a race when the destination socat process takes
> too long to start listening while the source process is already
> issuing connect().
>
> The race is inherent because the exec: migration spawns the
> to-be-exec'ed command asynchronously and returns from the
> migrate-incoming command. The localhost-only testcase is not
> representative of the majority of migrations. In a real scenario
> between two different hosts that race wouldn't happen.
>
> Fix the testcase by configuring the source socat command to wait
> indefinitely while trying to connect.
>
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
>  tests/functional/migration.py | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tests/functional/migration.py b/tests/functional/migration.py
> index 144f091ba8..3b7674af3b 100644
> --- a/tests/functional/migration.py
> +++ b/tests/functional/migration.py
> @@ -85,5 +85,5 @@ def migration_with_exec(self):
>          with Ports() as ports:
>              free_port = self._get_free_port(ports)
>              dst_uri = 'exec:socat TCP-LISTEN:%u -' % free_port
> -            src_uri = 'exec:socat - TCP:localhost:%u' % free_port
> +            src_uri = 'exec:socat - TCP:localhost:%u,forever' % free_port

Maybe use "retry=90" instead? OTOH, we have the high level timeout from the
meson runner, so we should not hang here forever by accident anyway.

Thus:
Reviewed-by: Thomas Huth <thuth@redhat.com>
On Thu, Apr 23, 2026 at 06:52:19AM +0200, Thomas Huth wrote:
> On 23/04/2026 01.00, Fabiano Rosas wrote:
> > The migration_with_exec test is failing sporadically for all
> > architectures due to a race when the destination socat process takes
> > too long to start listening while the source process is already
> > issuing connect().
> >
> > The race is inherent because the exec: migration spawns the
> > to-be-exec'ed command asynchronously and returns from the
> > migrate-incoming command. The localhost-only testcase is not
> > representative of the majority of migrations. In a real scenario
> > between two different hosts that race wouldn't happen.
> >
> > Fix the testcase by configuring the source socat command to wait
> > indefinitely while trying to connect.
> >
> > Signed-off-by: Fabiano Rosas <farosas@suse.de>
> > ---
> >  tests/functional/migration.py | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/tests/functional/migration.py b/tests/functional/migration.py
> > index 144f091ba8..3b7674af3b 100644
> > --- a/tests/functional/migration.py
> > +++ b/tests/functional/migration.py
> > @@ -85,5 +85,5 @@ def migration_with_exec(self):
> >          with Ports() as ports:
> >              free_port = self._get_free_port(ports)
> >              dst_uri = 'exec:socat TCP-LISTEN:%u -' % free_port
> > -            src_uri = 'exec:socat - TCP:localhost:%u' % free_port
> > +            src_uri = 'exec:socat - TCP:localhost:%u,forever' % free_port
>
> Maybe use "retry=90" instead? OTOH, we have the high level timeout from the
> meson runner, so we should not hang here forever by accident anyway.

Yes, IMHO it'll be good to stick with one timeout mechanism rather than
adding more magical timeouts.

>
> Thus:
> Reviewed-by: Thomas Huth <thuth@redhat.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

--
Peter Xu
On 23/4/26 17:02, Peter Xu wrote:
> On Thu, Apr 23, 2026 at 06:52:19AM +0200, Thomas Huth wrote:
>> On 23/04/2026 01.00, Fabiano Rosas wrote:
>>> The migration_with_exec test is failing sporadically for all
>>> architectures due to a race when the destination socat process takes
>>> too long to start listening while the source process is already
>>> issuing connect().
>>>
>>> The race is inherent because the exec: migration spawns the
>>> to-be-exec'ed command asynchronously and returns from the
>>> migrate-incoming command. The localhost-only testcase is not
>>> representative of the majority of migrations. In a real scenario
>>> between two different hosts that race wouldn't happen.
>>>
>>> Fix the testcase by configuring the source socat command to wait
>>> indefinitely while trying to connect.
>>>
>>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>>> ---
>>>  tests/functional/migration.py | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/tests/functional/migration.py b/tests/functional/migration.py
>>> index 144f091ba8..3b7674af3b 100644
>>> --- a/tests/functional/migration.py
>>> +++ b/tests/functional/migration.py
>>> @@ -85,5 +85,5 @@ def migration_with_exec(self):
>>>          with Ports() as ports:
>>>              free_port = self._get_free_port(ports)
>>>              dst_uri = 'exec:socat TCP-LISTEN:%u -' % free_port
>>> -            src_uri = 'exec:socat - TCP:localhost:%u' % free_port
>>> +            src_uri = 'exec:socat - TCP:localhost:%u,forever' % free_port
>>
>> Maybe use "retry=90" instead? OTOH, we have the high level timeout from the
>> meson runner, so we should not hang here forever by accident anyway.
>
> Yes, IMHO it'll be good to stick with one timeout mechanism rather than
> adding more magical timeouts.
>
>>
>> Thus:
>> Reviewed-by: Thomas Huth <thuth@redhat.com>
>
> Reviewed-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
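
To make the two options discussed in this thread concrete, the sketch below
builds the source URI both ways. The helper build_src_uri and the port
number are hypothetical and exist only for illustration; the test itself
formats the string inline. As far as I read the socat manual, retries are
spaced by the "interval" option, which defaults to one second, so
"retry=90" would give up after roughly 90 seconds, whereas "forever" leaves
the only time limit to the meson test runner.

```python
def build_src_uri(port, retry=None):
    """Return the exec: URI for the source side of the migration.

    retry=None selects socat's 'forever' option (what the patch uses);
    an integer selects the bounded 'retry=N' alternative from the review.
    """
    option = 'forever' if retry is None else 'retry=%u' % retry
    return 'exec:socat - TCP:localhost:%u,%s' % (port, option)


# Hypothetical port, standing in for the value of free_port in the test.
uri_as_merged = build_src_uri(1234)
uri_bounded = build_src_uri(1234, retry=90)
```

The reviewers settled on the "forever" form so that the test relies on a
single timeout mechanism.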