[PATCH 2/2] migration: fix workaround for gnutls thread safety

Daniel P. Berrangé posted 2 patches 3 months, 2 weeks ago
Maintainers: "Daniel P. Berrangé" <berrange@redhat.com>, Peter Xu <peterx@redhat.com>, Fabiano Rosas <farosas@suse.de>
[PATCH 2/2] migration: fix workaround for gnutls thread safety
Posted by Daniel P. Berrangé 3 months, 2 weeks ago
In previous commits

  eb3618e9 migration: activate TLS thread safety workaround
  edea8183 io: add support for activating TLS thread safety workaround
  24ad5e19 crypto: implement workaround for GNUTLS thread safety problems

an attempt was made to workaround broken gnutls thread safety when
TLS 1.3 rekeying is performed.

Those patches acquired locks before calling gnutls_record_{send|recv}
but temporarily dropped the locks in the push/pull functions, in the
mistaken belief that there was a race inside gnutls that did not cross
execution of the push/pull functions.

A non-deterministic reproducer mislead into thinking the workaround
was operating as expected, but this was wrong. Juraj demonstrated
that QEMU would still see errors from GNUTLS as well as crashes.

The issue is that a pointer to internal state is saved before the
the push/pull functions are called, and after they return this
saved pointer is potentially invalid. IOW, it is never safe to
temporarily drop the mutexes inside the push/pull functions. The
lock must be held throughout execution of gnutls_record_{send|recv}.

This would be possible with QEMU migration, except that the return
path thread sits in a blocking read waiting for data that very
rarely arrives from the destination QEMU. This blocks ability to
send any migration data in the other thread.

It is possible to workaround this issue, however, by proactively
calling poll() to check for available incoming data before trying
the qio_channel_read() call.

Reported-by: Juraj Marcin <jmarcin@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 crypto/tlssession.c   | 16 ----------------
 migration/qemu-file.c | 16 ++++++++++++++++
 2 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/crypto/tlssession.c b/crypto/tlssession.c
index 86d407a142..7e11317528 100644
--- a/crypto/tlssession.c
+++ b/crypto/tlssession.c
@@ -95,19 +95,11 @@ qcrypto_tls_session_push(void *opaque, const void *buf, size_t len)
         return -1;
     };
 
-    if (session->lockEnabled) {
-        qemu_mutex_unlock(&session->lock);
-    }
-
     error_free(session->werr);
     session->werr = NULL;
 
     ret = session->writeFunc(buf, len, session->opaque, &session->werr);
 
-    if (session->lockEnabled) {
-        qemu_mutex_lock(&session->lock);
-    }
-
     if (ret == QCRYPTO_TLS_SESSION_ERR_BLOCK) {
         errno = EAGAIN;
         return -1;
@@ -134,16 +126,8 @@ qcrypto_tls_session_pull(void *opaque, void *buf, size_t len)
     error_free(session->rerr);
     session->rerr = NULL;
 
-    if (session->lockEnabled) {
-        qemu_mutex_unlock(&session->lock);
-    }
-
     ret = session->readFunc(buf, len, session->opaque, &session->rerr);
 
-    if (session->lockEnabled) {
-        qemu_mutex_lock(&session->lock);
-    }
-
     if (ret == QCRYPTO_TLS_SESSION_ERR_BLOCK) {
         errno = EAGAIN;
         return -1;
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 8ee44c5ac9..cf6115e699 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -338,6 +338,22 @@ static ssize_t coroutine_mixed_fn qemu_fill_buffer(QEMUFile *f)
         return 0;
     }
 
+    /*
+     * This feature triggers acquisition of mutexes around every
+     * read and write. Thus we must not sit in a blocking read
+     * if this is set, but must instead poll proactively. This
+     * does not work with some channel types, however, so must
+     * only pre-poll when the featre is set.
+     */
+    if (qio_channel_has_feature(f->ioc,
+                                QIO_CHANNEL_FEATURE_CONCURRENT_IO)) {
+        if (qemu_in_coroutine()) {
+            qio_channel_yield(f->ioc, G_IO_IN);
+        } else {
+            qio_channel_wait(f->ioc, G_IO_IN);
+        }
+    }
+
     do {
         struct iovec iov = { f->buf + pending, IO_BUF_SIZE - pending };
         len = qio_channel_readv_full(f->ioc, &iov, 1, pfds, pnfd, 0,
-- 
2.50.1


Re: [PATCH 2/2] migration: fix workaround for gnutls thread safety
Posted by Fabiano Rosas 3 months, 1 week ago
Daniel P. Berrangé <berrange@redhat.com> writes:

> In previous commits
>
>   eb3618e9 migration: activate TLS thread safety workaround
>   edea8183 io: add support for activating TLS thread safety workaround
>   24ad5e19 crypto: implement workaround for GNUTLS thread safety problems
>
> an attempt was made to workaround broken gnutls thread safety when
> TLS 1.3 rekeying is performed.
>
> Those patches acquired locks before calling gnutls_record_{send|recv}
> but temporarily dropped the locks in the push/pull functions, in the
> mistaken belief that there was a race inside gnutls that did not cross
> execution of the push/pull functions.
>
> A non-deterministic reproducer mislead into thinking the workaround
> was operating as expected, but this was wrong. Juraj demonstrated
> that QEMU would still see errors from GNUTLS as well as crashes.
>
> The issue is that a pointer to internal state is saved before the
> the push/pull functions are called, and after they return this
> saved pointer is potentially invalid. IOW, it is never safe to
> temporarily drop the mutexes inside the push/pull functions. The
> lock must be held throughout execution of gnutls_record_{send|recv}.
>
> This would be possible with QEMU migration, except that the return
> path thread sits in a blocking read waiting for data that very
> rarely arrives from the destination QEMU. This blocks ability to
> send any migration data in the other thread.
>
> It is possible to workaround this issue, however, by proactively
> calling poll() to check for available incoming data before trying
> the qio_channel_read() call.
>
> Reported-by: Juraj Marcin <jmarcin@redhat.com>
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Re: [PATCH 2/2] migration: fix workaround for gnutls thread safety
Posted by Prasad Pandit 3 months, 1 week ago
On Sat, 2 Aug 2025 at 00:09, Daniel P. Berrangé <berrange@redhat.com> wrote:
>   eb3618e9 migration: activate TLS thread safety workaround
>   edea8183 io: add support for activating TLS thread safety workaround
>   24ad5e19 crypto: implement workaround for GNUTLS thread safety problems
>
> an attempt was made to workaround broken gnutls thread safety when
> TLS 1.3 rekeying is performed.
>
> Those patches acquired locks before calling gnutls_record_{send|recv}
> but temporarily dropped the locks in the push/pull functions, in the
> mistaken belief that there was a race inside gnutls that did not cross
> execution of the push/pull functions.
>
> A non-deterministic reproducer mislead into thinking the workaround
> was operating as expected, but this was wrong. Juraj demonstrated
> that QEMU would still see errors from GNUTLS as well as crashes.
>
> The issue is that a pointer to internal state is saved before the
> the push/pull functions are called, and after they return this
> saved pointer is potentially invalid. IOW, it is never safe to
> temporarily drop the mutexes inside the push/pull functions. The
> lock must be held throughout execution of gnutls_record_{send|recv}.
>
> This would be possible with QEMU migration, except that the return
> path thread sits in a blocking read waiting for data that very
> rarely arrives from the destination QEMU. This blocks ability to
> send any migration data in the other thread.
>
> It is possible to workaround this issue, however, by proactively
> calling poll() to check for available incoming data before trying
> the qio_channel_read() call.
>
> Reported-by: Juraj Marcin <jmarcin@redhat.com>
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>  crypto/tlssession.c   | 16 ----------------
>  migration/qemu-file.c | 16 ++++++++++++++++
>  2 files changed, 16 insertions(+), 16 deletions(-)
>
> diff --git a/crypto/tlssession.c b/crypto/tlssession.c
> index 86d407a142..7e11317528 100644
> --- a/crypto/tlssession.c
> +++ b/crypto/tlssession.c
> @@ -95,19 +95,11 @@ qcrypto_tls_session_push(void *opaque, const void *buf, size_t len)
>          return -1;
>      };
>
> -    if (session->lockEnabled) {
> -        qemu_mutex_unlock(&session->lock);
> -    }
> -
>      error_free(session->werr);
>      session->werr = NULL;
>
>      ret = session->writeFunc(buf, len, session->opaque, &session->werr);
>
> -    if (session->lockEnabled) {
> -        qemu_mutex_lock(&session->lock);
> -    }
> -
>      if (ret == QCRYPTO_TLS_SESSION_ERR_BLOCK) {
>          errno = EAGAIN;
>          return -1;
> @@ -134,16 +126,8 @@ qcrypto_tls_session_pull(void *opaque, void *buf, size_t len)
>      error_free(session->rerr);
>      session->rerr = NULL;
>
> -    if (session->lockEnabled) {
> -        qemu_mutex_unlock(&session->lock);
> -    }
> -
>      ret = session->readFunc(buf, len, session->opaque, &session->rerr);
>
> -    if (session->lockEnabled) {
> -        qemu_mutex_lock(&session->lock);
> -    }
> -
>      if (ret == QCRYPTO_TLS_SESSION_ERR_BLOCK) {
>          errno = EAGAIN;
>          return -1;
> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> index 8ee44c5ac9..cf6115e699 100644
> --- a/migration/qemu-file.c
> +++ b/migration/qemu-file.c
> @@ -338,6 +338,22 @@ static ssize_t coroutine_mixed_fn qemu_fill_buffer(QEMUFile *f)
>          return 0;
>      }
>
> +    /*
> +     * This feature triggers acquisition of mutexes around every
> +     * read and write. Thus we must not sit in a blocking read
> +     * if this is set, but must instead poll proactively. This
> +     * does not work with some channel types, however, so must
> +     * only pre-poll when the featre is set.
> +     */
> +    if (qio_channel_has_feature(f->ioc,
> +                                QIO_CHANNEL_FEATURE_CONCURRENT_IO)) {
> +        if (qemu_in_coroutine()) {
> +            qio_channel_yield(f->ioc, G_IO_IN);
> +        } else {
> +            qio_channel_wait(f->ioc, G_IO_IN);
> +        }
> +    }
> +
>      do {
>          struct iovec iov = { f->buf + pending, IO_BUF_SIZE - pending };
>          len = qio_channel_readv_full(f->ioc, &iov, 1, pfds, pnfd, 0,
> --

* Looks okay.
Reviewed-by: Prasad Pandit <pjp@fedoraproject.org>

Thank you.
---
  - Prasad