[RFC PATCH v1] io/channel-socket: abort socket reads after a force shutdown request

Daniil Tatianin posted 1 patch 5 months ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20250910085006.69790-1-d-tatianin@yandex-team.ru
Maintainers: "Daniel P. Berrangé" <berrange@redhat.com>
io/channel-socket.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
[RFC PATCH v1] io/channel-socket: abort socket reads after a force shutdown request
Posted by Daniil Tatianin 5 months ago
When a user chooses to force shutdown QEMU by pressing ^C or sending a
SIGINT otherwise, we want to shutdown as soon as possible, and entering
a blocking read which happens in the main thread seems like the opposite
of that.

This may seem like a rare case, but it is actually not when using
vhost-user devices, which usually have the control plane working via
UNIX sockets.

The way the code is currently written, all vhost-user devices are
serviced in the main thread and thus block each other, as well as other
things that happen in the QEMU's main thread, including QMP, and even
network devices that are not vhost-net.

In case the vhost-user backend servicing a device decides to hang for
whatever reason, any control plane request in QEMU will also hang the
main loop until the backend either dies or ends up replying.

Ideally the vhost-user handling code should be rewritten to work
asynchronously, or to support io-threads or similar, but that seems like
a giant undertaking and we would like to at least be able to make QEMU
shutdown no matter if a vhost-user backend is currently able to service
the control plane or not.

Luckily for us, SIGINT or similar causes the kernel to cancel (almost)
all blocking syscalls with EINTR, which we can utilize to check whether
a shutdown was requested while we were blocked in the syscall, which is
what this commit does. The check is performed even on the first attempt,
not only retries after EINTR. This is intentional to avoid race
conditions where QEMU may decide to perform a control plane request
before the shutdown event is checked for thus forcing the user to send
SIGINT at least 2 times.

Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
---
v0 -> v1:
- Fix code alignment
- Fix included header
---
 io/channel-socket.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/io/channel-socket.c b/io/channel-socket.c
index 3b7ca924ff..74238b511a 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -26,10 +26,20 @@
 #include "io/channel-watch.h"
 #include "trace.h"
 #include "qapi/clone-visitor.h"
+#include "system/runstate.h"
 #ifdef CONFIG_LINUX
 #include <linux/errqueue.h>
 #include <sys/socket.h>
 
+/*
+ * This function is not available when io links against qemu-img etc.,
+ * in this case just pretend it always returns false.
+ */
+__attribute__((weak)) bool qemu_force_shutdown_requested(void)
+{
+    return false;
+}
+
 #if (defined(MSG_ZEROCOPY) && defined(SO_ZEROCOPY))
 #define QEMU_MSG_ZEROCOPY
 #endif
@@ -541,6 +551,12 @@ static ssize_t qio_channel_socket_readv(QIOChannel *ioc,
     }
 
  retry:
+    if (qemu_force_shutdown_requested()) {
+        error_setg_errno(errp, ECANCELED,
+                         "Socket read aborted due to force shutdown");
+        return -1;
+    }
+
     ret = recvmsg(sioc->fd, &msg, sflags);
     if (ret < 0) {
         if (errno == EAGAIN) {
-- 
2.34.1
Re: [RFC PATCH v1] io/channel-socket: abort socket reads after a force shutdown request
Posted by Daniel P. Berrangé 1 month ago
On Wed, Sep 10, 2025 at 11:50:06AM +0300, Daniil Tatianin wrote:
> When a user chooses to force shutdown QEMU by pressing ^C or sending a
> SIGINT otherwise, we want to shutdown as soon as possible, and entering
> a blocking read which happens in the main thread seems like the opposite
> of that.
> 
> This may seem like a rare case, but it is actually not when using
> vhost-user devices, which usually have the control plane working via
> UNIX sockets.
> 
> The way the code is currently written, all vhost-user devices are
> serviced in the main thread and thus block each other, as well as other
> things that happen in the QEMU's main thread, including QMP, and even
> network devices that are not vhost-net.
> 
> In case the vhost-user backend servicing a device decides to hang for
> whatever reason, any control plane request in QEMU will also hang the
> main loop until the backend either dies or ends up replying.
> 
> Ideally the vhost-user handling code should be rewritten to work
> asynchronously, or to support io-threads or similar, but that seems like
> a giant undertaking and we would like to at least be able to make QEMU
> shutdown no matter if a vhost-user backend is currently able to service
> the control plane or not.

This sounds like rather a major impl flaw in the vhost-user code
IMHO. Anything that is running in the main thread needs to have
very fast execution times, without any blocking on I/O, to avoid
starving the main event loop. The event loop starvation is really
a much bigger problem than the delayed shutdown on ctrl-c.

I'm not familiar with the vhost-user code, but can you share an
example of the stack trace you see when it is blocked on an I/O
op and preventing shutdown ?

I'm very dis-inclined to special case the channel-socket.c code
to workaruond a bug in a single part of qemu, given that this
code is used universally across everything doing socket I/O.

Some of these parts of QEMU potentially want to be able to do
a clean shutdown of their I/O layer when QEMU exit, and not
restarting on EINTR might negatively impact them. IOW, this
proposed workaround may well simply move the brokness from
one place to another place and not leave us better overall.

> Luckily for us, SIGINT or similar causes the kernel to cancel (almost)
> all blocking syscalls with EINTR, which we can utilize to check whether
> a shutdown was requested while we were blocked in the syscall, which is
> what this commit does. The check is performed even on the first attempt,
> not only retries after EINTR. This is intentional to avoid race
> conditions where QEMU may decide to perform a control plane request
> before the shutdown event is checked for thus forcing the user to send
> SIGINT at least 2 times.
> 
> Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
> ---
> v0 -> v1:
> - Fix code alignment
> - Fix included header
> ---
>  io/channel-socket.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/io/channel-socket.c b/io/channel-socket.c
> index 3b7ca924ff..74238b511a 100644
> --- a/io/channel-socket.c
> +++ b/io/channel-socket.c
> @@ -26,10 +26,20 @@
>  #include "io/channel-watch.h"
>  #include "trace.h"
>  #include "qapi/clone-visitor.h"
> +#include "system/runstate.h"
>  #ifdef CONFIG_LINUX
>  #include <linux/errqueue.h>
>  #include <sys/socket.h>
>  
> +/*
> + * This function is not available when io links against qemu-img etc.,
> + * in this case just pretend it always returns false.
> + */
> +__attribute__((weak)) bool qemu_force_shutdown_requested(void)
> +{
> +    return false;
> +}
> +
>  #if (defined(MSG_ZEROCOPY) && defined(SO_ZEROCOPY))
>  #define QEMU_MSG_ZEROCOPY
>  #endif
> @@ -541,6 +551,12 @@ static ssize_t qio_channel_socket_readv(QIOChannel *ioc,
>      }
>  
>   retry:
> +    if (qemu_force_shutdown_requested()) {
> +        error_setg_errno(errp, ECANCELED,
> +                         "Socket read aborted due to force shutdown");
> +        return -1;
> +    }
> +
>      ret = recvmsg(sioc->fd, &msg, sflags);
>      if (ret < 0) {
>          if (errno == EAGAIN) {
> -- 
> 2.34.1
> 

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|