Compiler optimizations can cache TLS values across coroutine yield
points, resulting in stale values from the previous thread when a
coroutine is re-entered by a new thread.
Serge Guelton developed an __attribute__((noinline)) wrapper and tested
it with clang and gcc. I formatted his idea according to QEMU's coding
style and wrote documentation.
These macros must be used instead of __thread from now on to prevent
coroutine TLS bugs.
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1952483
Suggested-by: Serge Guelton <sguelton@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
MAINTAINERS | 1 +
include/qemu/tls.h | 142 +++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 143 insertions(+)
create mode 100644 include/qemu/tls.h
diff --git a/MAINTAINERS b/MAINTAINERS
index 894dc43105..cf225b7029 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2869,6 +2869,7 @@ M: Kevin Wolf <kwolf@redhat.com>
S: Maintained
F: util/*coroutine*
F: include/qemu/coroutine*
+F: include/qemu/tls.h
F: tests/unit/test-coroutine.c
Buffers
diff --git a/include/qemu/tls.h b/include/qemu/tls.h
new file mode 100644
index 0000000000..9a5ccc6c52
--- /dev/null
+++ b/include/qemu/tls.h
@@ -0,0 +1,142 @@
+/*
+ * QEMU Thread Local Storage
+ *
+ * Copyright Red Hat
+ *
+ * SPDX-License-Identifier: LGPL-2.1-or-later
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2.1 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ * It is forbidden to use __thread in QEMU because compiler optimizations may
+ * cause Thread Local Storage variables to be cached across coroutine re-entry.
+ * Coroutines can run in more than one thread through the course of their life,
+ * leading bugs when stale TLS values from the wrong thread are used as a
+ * result of compiler optimization.
+ *
+ * Although avoiding __thread is strictly only necessary when coroutines access
+ * the variable this is hard to audit or enforce in practice. Therefore
+ * __thread is forbidden everywhere. This prevents subtle bugs from creeping in
+ * if a variable that was previously not accessed from coroutines is now
+ * accessed from coroutines.
+ *
+ * An example is:
+ *
+ * ..code-block:: c
+ * :caption: A coroutine that may see the wrong TLS value
+ *
+ * static __thread AioContext *current_aio_context;
+ * ...
+ * static void coroutine_fn foo(void)
+ * {
+ * aio_notify(current_aio_context);
+ * qemu_coroutine_yield();
+ * aio_notify(current_aio_context); // <-- may be stale after yielding!
+ * }
+ *
+ * This header provides macros for safely defining variables in Thread Local
+ * Storage:
+ *
+ * ..code-block:: c
+ * :caption: A coroutine that safely uses TLS
+ *
+ * QEMU_DECLARE_STATIC_TLS(AioContext *, current_aio_context)
+ * ...
+ * static void coroutine_fn foo(void)
+ * {
+ * aio_notify(get_current_aio_context());
+ * qemu_coroutine_yield();
+ * aio_notify(get_current_aio_context()); // <-- safe
+ * }
+ */
+
+#ifndef QEMU_TLS_H
+#define QEMU_TLS_H
+
+/**
+ * QEMU_DECLARE_TLS:
+ * @type: the variable's C type
+ * @var: the variable name
+ *
+ * Declare an extern variable in Thread Local Storage from a header file:
+ *
+ * .. code-block:: c
+ * :caption: Declaring an extern variable in Thread Local Storage
+ *
+ * QEMU_DECLARE_TLS(int, my_count)
+ * ...
+ * int c = get_my_count();
+ * set_my_count(c + 1);
+ *
+ * Use this instead of:
+ *
+ * .. code-block:: c
+ * :caption: Declaring a TLS variable using __thread
+ *
+ * extern __thread int my_count;
+ * ...
+ * int c = my_count;
+ * my_count = c + 1;
+ */
+#define QEMU_DECLARE_TLS(type, var) \
+ __attribute__((noinline)) type get_##var(void); \
+ __attribute__((noinline)) void set_##var(type v); \
+
+/**
+ * QEMU_DEFINE_TLS:
+ * @type: the variable's C type
+ * @var: the variable name
+ *
+ * Define an variable in Thread Local Storage that was previously declared from
+ * a header file with QEMU_DECLARE_TLS():
+ *
+ * .. code-block:: c
+ * :caption: Defining a variable in Thread Local Storage
+ *
+ * QEMU_DEFINE_TLS(int, my_count)
+ *
+ * Use this instead of:
+ *
+ * .. code-block:: c
+ * :caption: Defining a TLS variable using __thread
+ *
+ * __thread int my_count;
+ */
+#define QEMU_DEFINE_TLS(type, var) \
+ __thread type qemu_tls_##var; \
+ type get_##var(void) { return qemu_tls_##var; } \
+ void set_##var(type v) { qemu_tls_##var = v; }
+
+/**
+ * QEMU_DEFINE_STATIC_TLS:
+ * @type: the variable's C type
+ * @var: the variable name
+ *
+ * Define a static variable in Thread Local Storage:
+ *
+ * .. code-block:: c
+ * :caption: Defining a static variable in Thread Local Storage
+ *
+ * QEMU_DEFINE_STATIC_TLS(int, my_count)
+ * ...
+ * int c = get_my_count();
+ * set_my_count(c + 1);
+ *
+ * Use this instead of:
+ *
+ * .. code-block:: c
+ * :caption: Defining a static TLS variable using __thread
+ *
+ * static __thread int my_count;
+ * ...
+ * int c = my_count;
+ * my_count = c + 1;
+ */
+#define QEMU_DEFINE_STATIC_TLS(type, var) \
+ static __thread type qemu_tls_##var; \
+ static __attribute__((noinline)) type get_##var(void); \
+ static type get_##var(void) { return qemu_tls_##var; } \
+ static __attribute__((noinline)) void set_##var(type v); \
+ static void set_##var(type v) { qemu_tls_##var = v; }
+
+#endif /* QEMU_TLS_H */
--
2.31.1
On Mon, Oct 25, 2021 at 03:07:15PM +0100, Stefan Hajnoczi wrote: > Compiler optimizations can cache TLS values across coroutine yield > points, resulting in stale values from the previous thread when a > coroutine is re-entered by a new thread. > > Serge Guelton developed an __attribute__((noinline)) wrapper and tested > it with clang and gcc. I formatted his idea according to QEMU's coding > style and wrote documentation. > > These macros must be used instead of __thread from now on to prevent > coroutine TLS bugs. Does this apply to all __thread usage in the QEMU process that can be used from coroutine context, or just certain __thread usage ? Mostly I'm wondering if this is going to have implications on external libraries we use. eg if block layer is using librbd.so APIs, is librbd.sp safe to use __thread directly in any way it desires ? Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On Mon, Oct 25, 2021 at 03:14:36PM +0100, Daniel P. Berrangé wrote: > On Mon, Oct 25, 2021 at 03:07:15PM +0100, Stefan Hajnoczi wrote: > > Compiler optimizations can cache TLS values across coroutine yield > > points, resulting in stale values from the previous thread when a > > coroutine is re-entered by a new thread. > > > > Serge Guelton developed an __attribute__((noinline)) wrapper and tested > > it with clang and gcc. I formatted his idea according to QEMU's coding > > style and wrote documentation. > > > > These macros must be used instead of __thread from now on to prevent > > coroutine TLS bugs. > > Does this apply to all __thread usage in the QEMU process that can > be used from coroutine context, or just certain __thread usage ? > > Mostly I'm wondering if this is going to have implications on external > libraries we use. eg if block layer is using librbd.so APIs, is librbd.sp > safe to use __thread directly in any way it desires ? There is a theoretical problem but I'm not sure if it will occur in practice: If a __thread variable is in an extern function then there's little risk of the value being cached. The function executes and returns back to QEMU, never yielding. The only case I can think of is when the library accesses a __thread variable, invokes a callback into QEMU, and that callback yields. If the coroutine is re-entered from another thread and returns back to the library, then we have a problem. This seems like a very rare case though. It gets trickier if the library has the __thread variable in a header file, because then the compiler may optimize it into a QEMU coroutine function and cache its value. Stefan
Actually, nevermind what I said about the callback scenario. I don't think that is a problem because the compiler cannot assume the __thread variable remains unchanged across the callback. Therefore it cannot safely cache the value. So I think only the header file scenario is a problem. Stefan
Am 26.10.2021 um 15:41 hat Stefan Hajnoczi geschrieben: > Actually, nevermind what I said about the callback scenario. I don't > think that is a problem because the compiler cannot assume the __thread > variable remains unchanged across the callback. Therefore it cannot > safely cache the value. And additionally, I think callbacks are never coroutine_fn (especially not if they come from an external library), so they must not yield anyway. > So I think only the header file scenario is a problem. The mere existence of a __thread variable in the header file isn't a problem either, but if QEMU accesses it, we would have to implement wrappers similar to what you're proposing for QEMU's thread local variables. Kevin
On Tue, Oct 26, 2021 at 04:10:16PM +0200, Kevin Wolf wrote: > Am 26.10.2021 um 15:41 hat Stefan Hajnoczi geschrieben: > > Actually, nevermind what I said about the callback scenario. I don't > > think that is a problem because the compiler cannot assume the __thread > > variable remains unchanged across the callback. Therefore it cannot > > safely cache the value. > > And additionally, I think callbacks are never coroutine_fn (especially > not if they come from an external library), so they must not yield > anyway. There's a tiny chance that the third-party library was called from coroutine context and the callback invokes a non-coroutine_fn that uses qemu_in_coroutine() to dynamically decide it's possible to yield. But it seems very unlikely. > > So I think only the header file scenario is a problem. > > The mere existence of a __thread variable in the header file isn't a > problem either, but if QEMU accesses it, we would have to implement > wrappers similar to what you're proposing for QEMU's thread local > variables. There could be static inline functions that access it in the header file. If QEMU calls those functions then the compiler can optimize that. Stefan
On 10/25/21 7:07 AM, Stefan Hajnoczi wrote:
> Compiler optimizations can cache TLS values across coroutine yield
> points, resulting in stale values from the previous thread when a
> coroutine is re-entered by a new thread.
...
> include/qemu/tls.h | 142 +++++++++++++++++++++++++++++++++++++++++++++
Better as qemu/coroutine-tls.h, since it is needed for no other purpose.
> +#define QEMU_DEFINE_TLS(type, var) \
> + __thread type qemu_tls_##var; \
> + type get_##var(void) { return qemu_tls_##var; } \
> + void set_##var(type v) { qemu_tls_##var = v; }
You might as well make the variable static, since it may only be referenced by these two
functions.
> +#define QEMU_DEFINE_STATIC_TLS(type, var) \
> + static __thread type qemu_tls_##var; \
> + static __attribute__((noinline)) type get_##var(void); \
> + static type get_##var(void) { return qemu_tls_##var; } \
> + static __attribute__((noinline)) void set_##var(type v); \
> + static void set_##var(type v) { qemu_tls_##var = v; }
You don't need separate function declarations; you can fold them together.
If would be nice to inline this when possible,
#if defined(__aarch64__)
#define QEMU_COROUTINE_TLS_ADDR(RET, VAR) \
asm volatile("mrs %0, tpidr_el0\n\t" \
"add %0, %0, #:tprel_hi12:"#VAR", lsl #12\n\t" \
"add %0, %0, #:tprel_lo12_nc:"#VAR \
: "=r"(RET))
#elif defined(__powerpc64__)
#define QEMU_COROUTINE_TLS_ADDR(RET, VAR) \
asm volatile("addis %0,13,"#VAR"@tprel@ha\n\t" \
"add %0,%0,"#VAR"@tprel@l" \
: "=r"(RET))
#elif defined(__riscv)
#define QEMU_COROUTINE_TLS_ADDR(RET, VAR) \
asm volatile("lui %0,%%tprel_hi("#VAR")\n\t" \
"add %0,%0,%%tprel_add("#VAR")\n\t" \
"addi %0,%0,%%tprel_lo("#VAR")" \
: "=r"(RET))
#elif defined(__x86_64__)
#define QEMU_COROUTINE_TLS_ADDR(RET, VAR) \
asm volatile("mov %%fs:"#VAR"@tpoff, %0" : "=r"(RET))
#endif
#ifdef QEMU_COROUTINE_TLS_ADDR
#define QEMU_COROUTINE_TLS_DECLARE(TYPE, VAR) \
extern __thread TYPE co_tls_##VAR; \
static inline TYPE get_##VAR(void) \
{ TYPE *p; QEMU_COROUTINE_TLS_ADDR(p, co_tls_##VAR); return *p; } \
static inline void set_##VAR(TYPE v) \
{ TYPE *p; QEMU_COROUTINE_TLS_ADDR(p, co_tls_##VAR); *p = v; }
#else
etc
#endif
r~
On Mon, Oct 25, 2021 at 10:19:04AM -0700, Richard Henderson wrote:
> On 10/25/21 7:07 AM, Stefan Hajnoczi wrote:
> > Compiler optimizations can cache TLS values across coroutine yield
> > points, resulting in stale values from the previous thread when a
> > coroutine is re-entered by a new thread.
> ...
> > include/qemu/tls.h | 142 +++++++++++++++++++++++++++++++++++++++++++++
>
> Better as qemu/coroutine-tls.h, since it is needed for no other purpose.
>
> > +#define QEMU_DEFINE_TLS(type, var) \
> > + __thread type qemu_tls_##var; \
> > + type get_##var(void) { return qemu_tls_##var; } \
> > + void set_##var(type v) { qemu_tls_##var = v; }
>
> You might as well make the variable static, since it may only be referenced
> by these two functions.
Oops, that's a bug. It should be declared extern. QEMU_DEFINE_TLS() is
meant for use in header files.
>
> > +#define QEMU_DEFINE_STATIC_TLS(type, var) \
> > + static __thread type qemu_tls_##var; \
> > + static __attribute__((noinline)) type get_##var(void); \
> > + static type get_##var(void) { return qemu_tls_##var; } \
> > + static __attribute__((noinline)) void set_##var(type v); \
> > + static void set_##var(type v) { qemu_tls_##var = v; }
>
> You don't need separate function declarations; you can fold them together.
>
> If would be nice to inline this when possible,
>
> #if defined(__aarch64__)
> #define QEMU_COROUTINE_TLS_ADDR(RET, VAR) \
> asm volatile("mrs %0, tpidr_el0\n\t" \
> "add %0, %0, #:tprel_hi12:"#VAR", lsl #12\n\t" \
> "add %0, %0, #:tprel_lo12_nc:"#VAR \
> : "=r"(RET))
> #elif defined(__powerpc64__)
> #define QEMU_COROUTINE_TLS_ADDR(RET, VAR) \
> asm volatile("addis %0,13,"#VAR"@tprel@ha\n\t" \
> "add %0,%0,"#VAR"@tprel@l" \
> : "=r"(RET))
> #elif defined(__riscv)
> #define QEMU_COROUTINE_TLS_ADDR(RET, VAR) \
> asm volatile("lui %0,%%tprel_hi("#VAR")\n\t" \
> "add %0,%0,%%tprel_add("#VAR")\n\t" \
> "addi %0,%0,%%tprel_lo("#VAR")" \
> : "=r"(RET))
> #elif defined(__x86_64__)
> #define QEMU_COROUTINE_TLS_ADDR(RET, VAR) \
> asm volatile("mov %%fs:"#VAR"@tpoff, %0" : "=r"(RET))
> #endif
>
> #ifdef QEMU_COROUTINE_TLS_ADDR
> #define QEMU_COROUTINE_TLS_DECLARE(TYPE, VAR) \
> extern __thread TYPE co_tls_##VAR; \
> static inline TYPE get_##VAR(void) \
> { TYPE *p; QEMU_COROUTINE_TLS_ADDR(p, co_tls_##VAR); return *p; } \
> static inline void set_##VAR(TYPE v) \
> { TYPE *p; QEMU_COROUTINE_TLS_ADDR(p, co_tls_##VAR); *p = v; }
> #else
> etc
> #endif
Nice. That makes me wonder if it's possible to write a portable version:
static inline TYPE get_##VAR(void) \
{ volatile TYPE *p = &co_tls_##VAR; return *p; }
But the trouble is we need &co_tls_##VAR to be "volatile" and I don't
think there is a way to express that?
Stefan
On 10/26/21 6:30 AM, Stefan Hajnoczi wrote:
> On Mon, Oct 25, 2021 at 10:19:04AM -0700, Richard Henderson wrote:
>> On 10/25/21 7:07 AM, Stefan Hajnoczi wrote:
>>> Compiler optimizations can cache TLS values across coroutine yield
>>> points, resulting in stale values from the previous thread when a
>>> coroutine is re-entered by a new thread.
>> ...
>>> include/qemu/tls.h | 142 +++++++++++++++++++++++++++++++++++++++++++++
>>
>> Better as qemu/coroutine-tls.h, since it is needed for no other purpose.
>>
>>> +#define QEMU_DEFINE_TLS(type, var) \
>>> + __thread type qemu_tls_##var; \
>>> + type get_##var(void) { return qemu_tls_##var; } \
>>> + void set_##var(type v) { qemu_tls_##var = v; }
>>
>> You might as well make the variable static, since it may only be referenced
>> by these two functions.
>
> Oops, that's a bug. It should be declared extern. QEMU_DEFINE_TLS() is
> meant for use in header files.
No, QEMU_DECLARE_TLS was for use in header files, and it of course does not globally
declare the tls variable at all. Only the definition here requires the tls variable.
> Nice. That makes me wonder if it's possible to write a portable version:
>
> static inline TYPE get_##VAR(void) \
> { volatile TYPE *p = &co_tls_##VAR; return *p; }
>
> But the trouble is we need &co_tls_##VAR to be "volatile" and I don't
> think there is a way to express that?
No, there's not.
Anyway, with those four hosts we get coverage of almost all users. I'll note that both
arm32 and s390x use the constant pool in computing these tls addresses, so they'll need to
keep using your functional version. So we'll still have testing of that path as well.
r~
On Tue, Oct 26, 2021 at 08:32:11AM -0700, Richard Henderson wrote:
> On 10/26/21 6:30 AM, Stefan Hajnoczi wrote:
> > On Mon, Oct 25, 2021 at 10:19:04AM -0700, Richard Henderson wrote:
> > > On 10/25/21 7:07 AM, Stefan Hajnoczi wrote:
> > > > Compiler optimizations can cache TLS values across coroutine yield
> > > > points, resulting in stale values from the previous thread when a
> > > > coroutine is re-entered by a new thread.
> > > ...
> > > > include/qemu/tls.h | 142 +++++++++++++++++++++++++++++++++++++++++++++
> > >
> > > Better as qemu/coroutine-tls.h, since it is needed for no other purpose.
> > >
> > > > +#define QEMU_DEFINE_TLS(type, var) \
> > > > + __thread type qemu_tls_##var; \
> > > > + type get_##var(void) { return qemu_tls_##var; } \
> > > > + void set_##var(type v) { qemu_tls_##var = v; }
> > >
> > > You might as well make the variable static, since it may only be referenced
> > > by these two functions.
> >
> > Oops, that's a bug. It should be declared extern. QEMU_DEFINE_TLS() is
> > meant for use in header files.
>
> No, QEMU_DECLARE_TLS was for use in header files, and it of course does not
> globally declare the tls variable at all. Only the definition here requires
> the tls variable.
You are right, thanks for pointing this out.
>
> > Nice. That makes me wonder if it's possible to write a portable version:
> >
> > static inline TYPE get_##VAR(void) \
> > { volatile TYPE *p = &co_tls_##VAR; return *p; }
> >
> > But the trouble is we need &co_tls_##VAR to be "volatile" and I don't
> > think there is a way to express that?
>
> No, there's not.
>
> Anyway, with those four hosts we get coverage of almost all users. I'll
> note that both arm32 and s390x use the constant pool in computing these tls
> addresses, so they'll need to keep using your functional version. So we'll
> still have testing of that path as well.
Okay, thanks!
Stefan
© 2016 - 2026 Red Hat, Inc.