[v1] iotests: Let 233 run concurrently

[Qemu-devel] [PATCH 5/5] iotests: Let 233 run concurrently

Posted by Max Reitz 6 years, 9 months ago

common.nbd's nbd_server_set_tcp_port() tries to find a free port, and
then uses it for the whole test run.  However, this is racey because
even if the port was free at the beginning, there is no guarantee it
will continue to be available.  Therefore, 233 currently cannot reliably
be run concurrently with other NBD TCP tests.

This patch addresses the problem by dropping nbd_server_set_tcp_port(),
and instead finding a new port every time nbd_server_start_tcp_socket()
is invoked.  For this, we run qemu-nbd with --fork and on error evaluate
the output to see whether it contains "Address already in use".  If so,
we try the next port.

On success, we still want to continually redirect the output from
qemu-nbd to stderr.  To achieve both, we redirect qemu-nbd's stderr to a
FIFO that we then open in bash.  If the parent process exits with status
0 (which means that the server has started successfully), we launch a
background cat process that copies the FIFO to stderr.  On failure, we
read the whole content into a variable and then evaluate it.

While at it, use --fork in nbd_server_start_unix_socket(), too.  Doing
so allows us to drop nbd_server_wait_for_*_socket().

Note that the reason common.nbd did not use --fork before is that
qemu-nbd did not have --pid-file.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/233        |  1 -
 tests/qemu-iotests/common.nbd | 93 ++++++++++++++++-------------------
 2 files changed, 42 insertions(+), 52 deletions(-)

diff --git a/tests/qemu-iotests/233 b/tests/qemu-iotests/233
index b8b6c8cc4c..8682ea277c 100755
--- a/tests/qemu-iotests/233
+++ b/tests/qemu-iotests/233
@@ -50,7 +50,6 @@ _supported_proto file
 _supported_os Linux
 _require_command QEMU_NBD
 
-nbd_server_set_tcp_port
 tls_x509_init
 
 echo
diff --git a/tests/qemu-iotests/common.nbd b/tests/qemu-iotests/common.nbd
index 25fc9ffaa4..e003478a57 100644
--- a/tests/qemu-iotests/common.nbd
+++ b/tests/qemu-iotests/common.nbd
@@ -22,6 +22,11 @@
 nbd_unix_socket="${TEST_DIR}/qemu-nbd.sock"
 nbd_tcp_addr="127.0.0.1"
 nbd_pid_file="${TEST_DIR}/qemu-nbd.pid"
+nbd_stderr_fifo="${TEST_DIR}/qemu-nbd.fifo"
+
+# If bash version is >= 4.1, this will be overwritten by a dynamically
+# assigned file descriptor value.
+nbd_fifo_fd=10
 
 nbd_server_stop()
 {
@@ -34,76 +39,62 @@ nbd_server_stop()
         fi
     fi
     rm -f "$nbd_unix_socket"
-}
-
-nbd_server_wait_for_unix_socket()
-{
-    pid=$1
-
-    for ((i = 0; i < 300; i++))
-    do
-        if [ -r "$nbd_unix_socket" ]; then
-            return
-        fi
-        kill -s 0 $pid 2>/dev/null
-        if test $? != 0
-        then
-            echo "qemu-nbd unexpectedly quit"
-            exit 1
-        fi
-        sleep 0.1
-    done
-    echo "Failed in check of unix socket created by qemu-nbd"
-    exit 1
+    rm -f "$nbd_stderr_fifo"
 }
 
 nbd_server_start_unix_socket()
 {
     nbd_server_stop
-    $QEMU_NBD -v -t -k "$nbd_unix_socket" "$@" &
-    nbd_server_wait_for_unix_socket $!
+    $QEMU_NBD -v -t -k "$nbd_unix_socket" --fork "$@"
 }
 
-nbd_server_set_tcp_port()
+nbd_server_start_tcp_socket()
 {
-    (ss --help) >/dev/null 2>&1 || _notrun "ss utility not found, skipping test"
+    nbd_server_stop
 
+    mkfifo "$nbd_stderr_fifo"
     for ((port = 10809; port <= 10909; port++))
     do
-        if ! ss -tln | grep -sqE ":$port\b"; then
+        # Redirect stderr to FIFO, so we can later decide whether we
+        # want to read it or to redirect it to our stderr, depending
+        # on whether the command fails or not
+        $QEMU_NBD -v -t -b $nbd_tcp_addr -p $port --fork "$@" \
+            2> "$nbd_stderr_fifo" &
+
+        # Taken from common.qemu
+        if [[ "${BASH_VERSINFO[0]}" -ge "5" ||
+            ("${BASH_VERSINFO[0]}" -ge "4" && "${BASH_VERSINFO[1]}" -ge "1") ]]
+        then
+            exec {nbd_fifo_fd}<"$nbd_stderr_fifo"
+        else
+            let _nbd_fifo_fd++
+            eval "exec ${_nbd_fifo_fd}<'$nbd_stderr_fifo'"
+        fi
+        wait $!
+
+        if test $? == 0
+        then
+            # Success, redirect qemu-nbd's stderr to our stderr
             nbd_tcp_port=$port
+            (cat <&$nbd_fifo_fd >&2) &
+            eval "exec $nbd_fifo_fd>&-"
             return
         fi
-    done
 
-    echo "Cannot find free TCP port for nbd in range 10809-10909"
-    exit 1
-}
-
-nbd_server_wait_for_tcp_socket()
-{
-    pid=$1
+        # Failure, read the output
+        output=$(cat <&$nbd_fifo_fd)
+        eval "exec $nbd_fifo_fd>&-"
 
-    for ((i = 0; i < 300; i++))
-    do
-        if ss -tln | grep -sqE ":$nbd_tcp_port\b"; then
-            return
-        fi
-        kill -s 0 $pid 2>/dev/null
-        if test $? != 0
+        if ! echo "$output" | grep -q "Address already in use"
         then
-            echo "qemu-nbd unexpectedly quit"
+            # Unknown error, print it
+            echo "$output" >&2
+            rm -f "$nbd_stderr_fifo"
             exit 1
         fi
-        sleep 0.1
     done
-    echo "Failed in check of TCP socket created by qemu-nbd"
-    exit 1
-}
 
-nbd_server_start_tcp_socket()
-{
-    nbd_server_stop
-    $QEMU_NBD -v -t -b $nbd_tcp_addr -p $nbd_tcp_port "$@" &
-    nbd_server_wait_for_tcp_socket $!
+    echo "Cannot find free TCP port for nbd in range 10809-10909"
+    rm -f "$nbd_stderr_fifo"
+    exit 1
 }
-- 
2.20.1

Re: [Qemu-devel] [PATCH 5/5] iotests: Let 233 run concurrently

Posted by Eric Blake 6 years, 9 months ago

On 5/7/19 1:36 PM, Max Reitz wrote:
> common.nbd's nbd_server_set_tcp_port() tries to find a free port, and
> then uses it for the whole test run.  However, this is racey because

racy

> even if the port was free at the beginning, there is no guarantee it
> will continue to be available.  Therefore, 233 currently cannot reliably
> be run concurrently with other NBD TCP tests.
> 
> This patch addresses the problem by dropping nbd_server_set_tcp_port(),
> and instead finding a new port every time nbd_server_start_tcp_socket()
> is invoked.  For this, we run qemu-nbd with --fork and on error evaluate
> the output to see whether it contains "Address already in use".  If so,
> we try the next port.
> 
> On success, we still want to continually redirect the output from
> qemu-nbd to stderr.  To achieve both, we redirect qemu-nbd's stderr to a
> FIFO that we then open in bash.  If the parent process exits with status
> 0 (which means that the server has started successfully), we launch a
> background cat process that copies the FIFO to stderr.  On failure, we
> read the whole content into a variable and then evaluate it.
> 
> While at it, use --fork in nbd_server_start_unix_socket(), too.  Doing
> so allows us to drop nbd_server_wait_for_*_socket().
> 
> Note that the reason common.nbd did not use --fork before is that
> qemu-nbd did not have --pid-file.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  tests/qemu-iotests/233        |  1 -
>  tests/qemu-iotests/common.nbd | 93 ++++++++++++++++-------------------
>  2 files changed, 42 insertions(+), 52 deletions(-)
> 

> @@ -34,76 +39,62 @@ nbd_server_stop()
>          fi
>      fi
>      rm -f "$nbd_unix_socket"
> -}
> -
> -nbd_server_wait_for_unix_socket()
> -{
...
> -    echo "Failed in check of unix socket created by qemu-nbd"
> -    exit 1
> +    rm -f "$nbd_stderr_fifo"

You could use a single 'rm -f "$nbd_unix_socket" "$nbd_stderr_fifo"'.
That's cosmetic, though.

Are we sure that even on failure, our fifo will not fill up and cause
deadlock? If the failing qemu-nbd has so much output as to be non-atomic
so that it blocks waiting for a reader, but we don't read anything until
after qemu-nbd exits after forking the daemon, then we have deadlock.
But in the common case, I don't think qemu-nbd ever spits out that much
in errors, even when it fails to start whether due to a socket in use or
for other reasons.  And even if it does hang, it is our testsuite (and
our CI tools will probably notice it), rather than our main code.

Otherwise, it's a lot of shell code with quite a few bash-isms, but we
already require bash, and I didn't spot anything blatantly wrong.

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-devel] [PATCH 5/5] iotests: Let 233 run concurrently

Posted by Max Reitz 6 years, 9 months ago

On 07.05.19 22:38, Eric Blake wrote:
> On 5/7/19 1:36 PM, Max Reitz wrote:
>> common.nbd's nbd_server_set_tcp_port() tries to find a free port, and
>> then uses it for the whole test run.  However, this is racey because
> 
> racy
> 
>> even if the port was free at the beginning, there is no guarantee it
>> will continue to be available.  Therefore, 233 currently cannot reliably
>> be run concurrently with other NBD TCP tests.
>>
>> This patch addresses the problem by dropping nbd_server_set_tcp_port(),
>> and instead finding a new port every time nbd_server_start_tcp_socket()
>> is invoked.  For this, we run qemu-nbd with --fork and on error evaluate
>> the output to see whether it contains "Address already in use".  If so,
>> we try the next port.
>>
>> On success, we still want to continually redirect the output from
>> qemu-nbd to stderr.  To achieve both, we redirect qemu-nbd's stderr to a
>> FIFO that we then open in bash.  If the parent process exits with status
>> 0 (which means that the server has started successfully), we launch a
>> background cat process that copies the FIFO to stderr.  On failure, we
>> read the whole content into a variable and then evaluate it.
>>
>> While at it, use --fork in nbd_server_start_unix_socket(), too.  Doing
>> so allows us to drop nbd_server_wait_for_*_socket().
>>
>> Note that the reason common.nbd did not use --fork before is that
>> qemu-nbd did not have --pid-file.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>  tests/qemu-iotests/233        |  1 -
>>  tests/qemu-iotests/common.nbd | 93 ++++++++++++++++-------------------
>>  2 files changed, 42 insertions(+), 52 deletions(-)
>>
> 
>> @@ -34,76 +39,62 @@ nbd_server_stop()
>>          fi
>>      fi
>>      rm -f "$nbd_unix_socket"
>> -}
>> -
>> -nbd_server_wait_for_unix_socket()
>> -{
> ...
>> -    echo "Failed in check of unix socket created by qemu-nbd"
>> -    exit 1
>> +    rm -f "$nbd_stderr_fifo"
> 
> You could use a single 'rm -f "$nbd_unix_socket" "$nbd_stderr_fifo"'.
> That's cosmetic, though.
> 
> Are we sure that even on failure, our fifo will not fill up and cause
> deadlock? If the failing qemu-nbd has so much output as to be non-atomic
> so that it blocks waiting for a reader, but we don't read anything until
> after qemu-nbd exits after forking the daemon, then we have deadlock.

Hm, right.  I don’t think it will happen, but if it does, it won’t be
because of an “Address already in use”.  So if it did happen, the test
should fail anyway.

Of course, a hang is not the nicest way to fail a test, but I think as
long as we don’t think it will be a problem, it should be fine.

(The alternative I can think of would be to start a background cat that
copies data over to a log file, and then kill it after the qemu-nbd
parent process has exited.  On error, we read the log; on success, we
print it to stderr and then start the cat from nbd_stderr_fifo to stderr.)

> But in the common case, I don't think qemu-nbd ever spits out that much
> in errors, even when it fails to start whether due to a socket in use or
> for other reasons.  And even if it does hang, it is our testsuite (and
> our CI tools will probably notice it), rather than our main code.
> 
> Otherwise, it's a lot of shell code with quite a few bash-isms, but we
> already require bash, and I didn't spot anything blatantly wrong.
> 
> Reviewed-by: Eric Blake <eblake@redhat.com>

Thanks again!

I’ll prepare the v2.

Max

[Qemu-devel] [PATCH 1/5] qemu-nbd: Add --pid-file option
[Qemu-devel] [PATCH 2/5] iotests.py: Add qemu_nbd_early_pipe()
[Qemu-devel] [PATCH 3/5] qemu-nbd: Do not close stderr
[Qemu-devel] [PATCH 4/5] iotests: Use qemu-nbd's --pid-file
[Qemu-devel] [PATCH 5/5] iotests: Let 233 run concurrently