[PULL 03/18] rcu: Unify force quiescent state

Paolo Bonzini posted 18 patches 2 weeks, 3 days ago
Maintainers: Magnus Kulke <magnus.kulke@linux.microsoft.com>, Wei Liu <wei.liu@kernel.org>, "Gonglei (Arei)" <arei.gonglei@huawei.com>, Zhenwei Pi <pizhenwei@bytedance.com>, "Michael S. Tsirkin" <mst@redhat.com>, Stefano Garzarella <sgarzare@redhat.com>, Laurent Vivier <lvivier@redhat.com>, Amit Shah <amit@kernel.org>, Stefan Berger <stefanb@linux.vnet.ibm.com>, "Marc-André Lureau" <marcandre.lureau@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>, "Alex Bennée" <alex.bennee@linaro.org>, "Philippe Mathieu-Daudé" <philmd@linaro.org>, Peter Maydell <peter.maydell@linaro.org>, Igor Mitsyanko <i.mitsyanko@gmail.com>, "Clément Chigot" <chigot@adacore.com>, Frederic Konrad <konrad.frederic@yahoo.fr>, Alberto Garcia <berto@igalia.com>, Thomas Huth <huth@tuxfamily.org>, Halil Pasic <pasic@linux.ibm.com>, Christian Borntraeger <borntraeger@linux.ibm.com>, Jason Herne <jjherne@linux.ibm.com>, Yoshinori Sato <yoshinori.sato@nifty.com>, Magnus Damm <magnus.damm@gmail.com>, Nicholas Piggin <npiggin@gmail.com>, Harsh Prateek Bora <harshpb@linux.ibm.com>, "Collin L. Walling" <walling@linux.ibm.com>, Stefano Stabellini <sstabellini@kernel.org>, Anthony PERARD <anthony@xenproject.org>, Paul Durrant <paul@xen.org>, "Edgar E. Iglesias" <edgar.iglesias@gmail.com>, Alistair Francis <alistair@alistair23.me>, "Daniel P. Berrangé" <berrange@redhat.com>, Eduardo Habkost <eduardo@habkost.net>, Richard Henderson <richard.henderson@linaro.org>, Helge Deller <deller@gmx.de>, Marcel Apfelbaum <marcel.apfelbaum@gmail.com>, Corey Minyard <minyard@acm.org>, Paul Burton <paulburton@kernel.org>, Aleksandar Rikalo <arikalo@gmail.com>, Aurelien Jarno <aurelien@aurel32.net>, Palmer Dabbelt <palmer@dabbelt.com>, Weiwei Li <liwei1518@gmail.com>, Daniel Henrique Barboza <dbarboza@ventanamicro.com>, Liu Zhiwei <zhiwei_liu@linux.alibaba.com>, Fam Zheng <fam@euphon.net>, Samuel Thibault <samuel.thibault@ens-lyon.org>, Michael Rolnik <mrolnik@gmail.com>, Antony Pavlov <antonynpavlov@gmail.com>, Joel Stanley <joel@jms.id.au>, Vijai Kumar K <vijai@behindbytes.com>, Samuel Tardieu <sam@rfc1149.net>, Gustavo Romero <gustavo.romero@linaro.org>, Raphael Norwitz <raphael@enfabrica.net>, Stefan Hajnoczi <stefanha@redhat.com>, Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>, Dmitry Osipenko <dmitry.osipenko@collabora.com>, Markus Armbruster <armbru@redhat.com>, Fabiano Rosas <farosas@suse.de>, "Dr. David Alan Gilbert" <dave@treblig.org>, Zhang Chen <zhangckid@gmail.com>, Li Zhijian <lizhijian@fujitsu.com>, Jason Wang <jasowang@redhat.com>, Manos Pitsidianakis <manos.pitsidianakis@linaro.org>, John Snow <jsnow@redhat.com>, Cleber Rosa <crosa@redhat.com>, Cameron Esfahani <dirty@apple.com>, Roman Bolshakov <rbolshakov@ddn.com>, Phil Dennis-Jordan <phil@philjordan.eu>, Marcelo Tosatti <mtosatti@redhat.com>, Reinoud Zandijk <reinoud@netbsd.org>, Sunil Muthuswamy <sunilmut@microsoft.com>, Max Filippov <jcmvbkbc@gmail.com>, Lukas Straub <lukasstraub2@web.de>
There is a newer version of this series
[PULL 03/18] rcu: Unify force quiescent state
Posted by Paolo Bonzini 2 weeks, 3 days ago
From: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>

Borrow the concept of force quiescent state from Linux to ensure readers
remain fast during normal operation and to avoid stalls.

Background
==========

The previous implementation had four steps to begin reclamation (see the
sketch after the list):

1. call_rcu_thread() would wait for the first callback.

2. call_rcu_thread() would periodically poll until a decent number of
   callbacks piled up or it timed out.

3. synchronize_rcu() would start a grace period (GP).

4. wait_for_readers() would wait for the GP to end. It would also
   trigger the force_rcu notifier to break busy loops in a read-side
   critical section if drain_call_rcu() had been called.
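
The batching in steps 1 and 2 can be modeled with the following
standalone sketch (simplified stand-in names and plain sleeps instead of
the real event wait; not the actual QEMU code):

#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

#define RCU_CALL_MIN_SIZE 30          /* "decent number" of callbacks */

static atomic_int rcu_call_count;     /* callbacks queued so far */

/*
 * Steps 1-2: poll in 10 ms steps until enough callbacks piled up or we
 * time out after 5 tries; only then does step 3 start a grace period.
 * The real code waits on an event instead of sleeping while the queue
 * is empty.
 */
static int wait_for_batch(void)
{
    int tries = 0;
    int n = atomic_load(&rcu_call_count);

    while (n == 0 || (n < RCU_CALL_MIN_SIZE && ++tries <= 5)) {
        usleep(10000);
        n = atomic_load(&rcu_call_count);
    }
    return n;
}

int main(void)
{
    atomic_store(&rcu_call_count, 3); /* a few callbacks, below the threshold */
    printf("GP starts only after batching %d callbacks\n", wait_for_batch());
    return 0;
}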

Problem
=======

The separation of waiting logic across these steps led to suboptimal
behavior:

The GP was delayed until call_rcu_thread() stopped polling.

force_rcu was not consistently triggered when call_rcu_thread() detected
a high number of pending callbacks or a timeout. This inconsistency
sometimes led to stalls, as reported in a virtio-gpu issue where memory
unmapping was blocked[1].

wait_for_readers() imposed unnecessary overhead in non-urgent cases by
unconditionally executing qatomic_set(&index->waiting, true) and
qemu_event_reset(&rcu_gp_event), which are necessary only for expedited
synchronization.

Solution
========

Move the polling from call_rcu_thread() to wait_for_readers() so that it
no longer delays the GP. Additionally, reorganize wait_for_readers() to
distinguish between two states:

Normal State: it relies exclusively on periodic polling to detect
the end of the GP and maintains the read-side fast path.

Force Quiescent State: whenever synchronization is expedited, it
triggers force_rcu and executes both qatomic_set(&index->waiting, true)
and qemu_event_reset(&rcu_gp_event). This avoids stalls while confining
the read-side overhead to this state.

This unified approach, inspired by Linux RCU, ensures consistent and
efficient grace period handling and has been confirmed to resolve the
virtio-gpu issue.
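
As a minimal illustration of the resulting wait_for_readers() decision
logic (stand-in names; plain sleeps replace the real event wait; not the
actual QEMU code):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

#define RCU_CALL_MIN_SIZE 30          /* callbacks that justify forcing */
#define MAX_SLEEPS        5           /* 10 ms polls before forcing */

static atomic_int  rcu_call_count;
static atomic_bool in_drain_call_rcu;
static atomic_bool readers_quiescent; /* stand-in for the registry walk */

static void wait_for_readers_model(void)
{
    int sleeps = 0;
    bool forced = false;

    while (!atomic_load(&readers_quiescent)) {
        if (!forced &&
            (atomic_load(&rcu_call_count) >= RCU_CALL_MIN_SIZE ||
             sleeps >= MAX_SLEEPS || atomic_load(&in_drain_call_rcu))) {
            /* Real code also notifies force_rcu and sets index->waiting. */
            forced = true;
            printf("entering force quiescent state\n");
        }

        if (forced) {
            usleep(1000);             /* real code: qemu_event_wait() */
        } else {
            usleep(10000);            /* normal state: cheap 10 ms poll */
            sleeps++;
        }
    }
    printf("grace period ended after %d normal-state poll(s)\n", sleeps);
}

int main(void)
{
    atomic_store(&rcu_call_count, 3);        /* too few to force early */
    atomic_store(&readers_quiescent, true);  /* pretend readers finished */
    wait_for_readers_model();
    return 0;
}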

[1] https://lore.kernel.org/qemu-devel/20251014111234.3190346-9-alex.bennee@linaro.org/

Signed-off-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
Link: https://lore.kernel.org/r/20251016-force-v1-1-919a82112498@rsg.ci.i.u-tokyo.ac.jp
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 util/rcu.c | 81 +++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 52 insertions(+), 29 deletions(-)

diff --git a/util/rcu.c b/util/rcu.c
index b703c86f15a..acac9446ea9 100644
--- a/util/rcu.c
+++ b/util/rcu.c
@@ -43,10 +43,14 @@
 #define RCU_GP_LOCKED           (1UL << 0)
 #define RCU_GP_CTR              (1UL << 1)
 
+
+#define RCU_CALL_MIN_SIZE        30
+
 unsigned long rcu_gp_ctr = RCU_GP_LOCKED;
 
 QemuEvent rcu_gp_event;
 static int in_drain_call_rcu;
+static int rcu_call_count;
 static QemuMutex rcu_registry_lock;
 static QemuMutex rcu_sync_lock;
 
@@ -76,15 +80,29 @@ static void wait_for_readers(void)
 {
     ThreadList qsreaders = QLIST_HEAD_INITIALIZER(qsreaders);
     struct rcu_reader_data *index, *tmp;
+    int sleeps = 0;
+    bool forced = false;
 
     for (;;) {
-        /* We want to be notified of changes made to rcu_gp_ongoing
-         * while we walk the list.
+        /*
+         * Force the grace period to end and wait for it if any of the
+         * following heuristical conditions are satisfied:
+         * - A decent number of callbacks piled up.
+         * - It timed out.
+         * - It is in a drain_call_rcu() call.
+         *
+         * Otherwise, periodically poll the grace period, hoping it ends
+         * promptly.
          */
-        qemu_event_reset(&rcu_gp_event);
+        if (!forced &&
+            (qatomic_read(&rcu_call_count) >= RCU_CALL_MIN_SIZE ||
+             sleeps >= 5 || qatomic_read(&in_drain_call_rcu))) {
+            forced = true;
 
-        QLIST_FOREACH(index, &registry, node) {
-            qatomic_set(&index->waiting, true);
+            QLIST_FOREACH(index, &registry, node) {
+                notifier_list_notify(&index->force_rcu, NULL);
+                qatomic_set(&index->waiting, true);
+            }
         }
 
         /* Here, order the stores to index->waiting before the loads of
@@ -106,8 +124,6 @@ static void wait_for_readers(void)
                  * get some extra futex wakeups.
                  */
                 qatomic_set(&index->waiting, false);
-            } else if (qatomic_read(&in_drain_call_rcu)) {
-                notifier_list_notify(&index->force_rcu, NULL);
             }
         }
 
@@ -115,7 +131,8 @@ static void wait_for_readers(void)
             break;
         }
 
-        /* Wait for one thread to report a quiescent state and try again.
+        /*
+         * Sleep for a while and try again.
          * Release rcu_registry_lock, so rcu_(un)register_thread() doesn't
          * wait too much time.
          *
@@ -133,7 +150,20 @@ static void wait_for_readers(void)
          * rcu_registry_lock is released.
          */
         qemu_mutex_unlock(&rcu_registry_lock);
-        qemu_event_wait(&rcu_gp_event);
+
+        if (forced) {
+            qemu_event_wait(&rcu_gp_event);
+
+            /*
+             * We want to be notified of changes made to rcu_gp_ongoing
+             * while we walk the list.
+             */
+            qemu_event_reset(&rcu_gp_event);
+        } else {
+            g_usleep(10000);
+            sleeps++;
+        }
+
         qemu_mutex_lock(&rcu_registry_lock);
     }
 
@@ -173,15 +203,11 @@ void synchronize_rcu(void)
     }
 }
 
-
-#define RCU_CALL_MIN_SIZE        30
-
 /* Multi-producer, single-consumer queue based on urcu/static/wfqueue.h
  * from liburcu.  Note that head is only used by the consumer.
  */
 static struct rcu_head dummy;
 static struct rcu_head *head = &dummy, **tail = &dummy.next;
-static int rcu_call_count;
 static QemuEvent rcu_call_ready_event;
 
 static void enqueue(struct rcu_head *node)
@@ -259,30 +285,27 @@ static void *call_rcu_thread(void *opaque)
     rcu_register_thread();
 
     for (;;) {
-        int tries = 0;
-        int n = qatomic_read(&rcu_call_count);
+        int n;
 
-        /* Heuristically wait for a decent number of callbacks to pile up.
+        /*
          * Fetch rcu_call_count now, we only must process elements that were
          * added before synchronize_rcu() starts.
          */
-        while (n == 0 || (n < RCU_CALL_MIN_SIZE && ++tries <= 5)) {
-            g_usleep(10000);
-            if (n == 0) {
-                qemu_event_reset(&rcu_call_ready_event);
-                n = qatomic_read(&rcu_call_count);
-                if (n == 0) {
-#if defined(CONFIG_MALLOC_TRIM)
-                    malloc_trim(4 * 1024 * 1024);
-#endif
-                    qemu_event_wait(&rcu_call_ready_event);
-                }
-            }
+        for (;;) {
+            qemu_event_reset(&rcu_call_ready_event);
             n = qatomic_read(&rcu_call_count);
+            if (n) {
+                break;
+            }
+
+#if defined(CONFIG_MALLOC_TRIM)
+            malloc_trim(4 * 1024 * 1024);
+#endif
+            qemu_event_wait(&rcu_call_ready_event);
         }
 
-        qatomic_sub(&rcu_call_count, n);
         synchronize_rcu();
+        qatomic_sub(&rcu_call_count, n);
         bql_lock();
         while (n > 0) {
             node = try_dequeue();
-- 
2.51.1
Regression with the "replay" test on target alpha (was: [PULL 03/18] rcu: Unify force quiescent state)
Posted by Thomas Huth 1 week, 4 days ago
On 28/10/2025 18.34, Paolo Bonzini wrote:
> From: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
> 
> Borrow the concept of force quiescent state from Linux to ensure readers
> remain fast during normal operation and to avoid stalls.

  Hi Akihiko,

looks like this commit has introduced a regression with the "replay" 
functional test on the alpha target.
When I run something like:

  pyvenv/bin/meson test --no-rebuild -t 1 --setup thorough \
   --num-processes 1 --repeat 10 func-alpha-replay

in the build folder, approx. half of the test runs are failing for me now.

I bisected the issue to this patch - when I rebuild qemu-system-alpha 
with the commit right before this change, the above test runs work 
fine, so I'm pretty sure that the problem has been introduced by this 
commit.

Could you please have a look?

  Thanks,
   Thomas


Re: Regression with the "replay" test on target alpha
Posted by Akihiko Odaki 1 week, 3 days ago
On 2025/11/03 22:59, Thomas Huth wrote:
> On 28/10/2025 18.34, Paolo Bonzini wrote:
>> From: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
>>
>> Borrow the concept of force quiescent state from Linux to ensure readers
>> remain fast during normal operation and to avoid stalls.
> 
>   Hi Akihiko,
> 
> looks like this commit has introduced a regression with the "replay" 
> functional test on the alpha target.
> When I run something like:
> 
>   pyvenv/bin/meson test --no-rebuild -t 1 --setup thorough \
>    --num-processes 1 --repeat 10 func-alpha-replay
> 
> in the build folder, approx. half of the test runs are failing for me now.
> 
> I bisected the issue to this patch here - when I rebuild qemu-system- 
> alpha with the commit right before this change here, the above test runs 
> work fine, so I'm pretty sure that the problem has been introduced by 
> this commit here.
> 
> Could you please have a look?

I cannot reproduce it with commit 55d98e3edeeb ("rcu: Unify force 
quiescent state").

Can you provide meson-logs/testlog-thorough.txt so that I can look into 
the failure you are facing? If you think you have something useful for 
debugging, please share it with me too.

Regards,
Akihiko Odaki

Re: Regression with the "replay" test on target alpha
Posted by Thomas Huth 1 week, 3 days ago
On 04/11/2025 02.45, Akihiko Odaki wrote:
> On 2025/11/03 22:59, Thomas Huth wrote:
>> On 28/10/2025 18.34, Paolo Bonzini wrote:
>>> From: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
>>>
>>> Borrow the concept of force quiescent state from Linux to ensure readers
>>> remain fast during normal operation and to avoid stalls.
>>
>>   Hi Akihiko,
>>
>> looks like this commit has introduced a regression with the "replay" 
>> functional test on the alpha target.
>> When I run something like:
>>
>>   pyvenv/bin/meson test --no-rebuild -t 1 --setup thorough \
>>    --num-processes 1 --repeat 10 func-alpha-replay
>>
>> in the build folder, approx. half of the test runs are failing for me now.
>>
>> I bisected the issue to this patch here - when I rebuild qemu-system- 
>> alpha with the commit right before this change here, the above test runs 
>> work fine, so I'm pretty sure that the problem has been introduced by this 
>> commit here.
>>
>> Could you please have a look?
> 
> I cannot reproduce it with commit 55d98e3edeeb ("rcu: Unify force quiescent 
> state").
> 
> Can you provide meson-logs/testlog-thorough.txt so that I can look into the 
> failure you are facing? If you think you have something useful for 
> debugging, please share it to me too.

There's not much in that testlog-thorough.txt that could be helpful here;
it's basically just the information that the test has been killed due to
the timeout:

==================================== 1/1 ===================================
test:         qemu:func-thorough+func-alpha-thorough+thorough / func-alpha-replay
start time:   07:25:26
duration:     90.01s
result:       killed by signal 15 SIGTERM
command:      RUST_BACKTRACE=1 QEMU_TEST_QEMU_IMG=/tmp/qemu-rcu/qemu-img QEMU_TEST_GDB=/usr/bin/gdb MALLOC_PERTURB_=255 MESON_TEST_ITERATION=1 PYTHONPATH=/home/thuth/devel/qemu/python:/home/thuth/devel/qemu/tests/functional G_TEST_SLOW=1 SPEED=thorough QEMU_TEST_QEMU_BINARY=/tmp/qemu-rcu/qemu-system-alpha ASAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1 LD_LIBRARY_PATH=/tmp/qemu-rcu/tests/tcg/plugins:/tmp/qemu-rcu/contrib/plugins UBSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 MSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 QEMU_BUILD_ROOT=/tmp/qemu-rcu /tmp/qemu-rcu/pyvenv/bin/python3 /home/thuth/devel/qemu/tests/functional/alpha/test_replay.py
=============================================================================

Summary of Failures:

1/1 qemu:func-thorough+func-alpha-thorough+thorough / func-alpha-replay TIMEOUT        90.01s   killed by signal 15 SIGTERM


There is also not that much helpful information in
tests/functional/alpha/test_replay.AlphaReplay.test_clipper, apart from
the console.log file. For a good run, the console log looks like this:

2025-11-04 08:16:46,148: PCI: 00:01:0 class 0101 id 1095:0646
2025-11-04 08:16:46,149: PCI:   region 0: 0000c000
2025-11-04 08:16:46,149: PCI:   region 1: 0000c008
2025-11-04 08:16:46,149: PCI:   region 2: 0000c010
2025-11-04 08:16:46,149: PCI:   region 3: 0000c018
2025-11-04 08:16:46,149: PCI:   region 4: 0000c020
2025-11-04 08:16:46,149: PCI: 00:07:0 class 0601 id 8086:0484
2025-11-04 08:16:48,149: [    0.000000] Initializing cgroup subsys cpu
2025-11-04 08:16:48,149: [    0.000000] Linux version 2.6.26-2-alpha-generic (Debian 2.6.26-29) (dannf@debian.org) (gcc version 4.1.3 20080704 (prerelease) (Debian 4.1.2-25)) #1 Sun Mar 4 21:08:03 UTC 2012
2025-11-04 08:16:48,150: [    0.000000] Booting GENERIC on Tsunami variation Clipper using machine vector Clipper from SRM
2025-11-04 08:16:48,150: [    0.000000] Major Options: MAGIC_SYSRQ
2025-11-04 08:16:48,150: [    0.000000] Command line: printk.time=0 console=ttyS0
2025-11-04 08:16:48,150: [    0.000000] memcluster 0, usage 1, start        0, end       15
2025-11-04 08:16:48,150: [    0.000000] memcluster 1, usage 0, start       15, end    16384
2025-11-04 08:16:48,150: [    0.000000] freeing pages 15:2048
2025-11-04 08:16:48,150: [    0.000000] freeing pages 2987:16384
2025-11-04 08:16:48,151: [    0.000000] reserving pages 2987:2988
2025-11-04 08:16:48,151: [    0.000000] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 16272
2025-11-04 08:16:48,151: [    0.000000] Kernel command line: printk.time=0 console=ttyS0
2025-11-04 08:16:57,358: PCI: 00:01:0 class 0101 id 1095:0646
2025-11-04 08:16:57,358: PCI:   region 0: 0000c000
2025-11-04 08:16:57,358: PCI:   region 1: 0000c008
2025-11-04 08:16:57,359: PCI:   region 2: 0000c010
2025-11-04 08:16:57,359: PCI:   region 3: 0000c018
2025-11-04 08:16:57,359: PCI:   region 4: 0000c020
2025-11-04 08:16:57,360: PCI: 00:07:0 class 0601 id 8086:0484
2025-11-04 08:17:08,468: [    0.000000] Initializing cgroup subsys cpu
2025-11-04 08:17:08,470: [    0.000000] Linux version 2.6.26-2-alpha-generic (Debian 2.6.26-29) (dannf@debian.org) (gcc version 4.1.3 20080704 (prerelease) (Debian 4.1.2-25)) #1 Sun Mar 4 21:08:03 UTC 2012
2025-11-04 08:17:08,471: [    0.000000] Booting GENERIC on Tsunami variation Clipper using machine vector Clipper from SRM
2025-11-04 08:17:08,471: [    0.000000] Major Options: MAGIC_SYSRQ
2025-11-04 08:17:08,472: [    0.000000] Command line: printk.time=0 console=ttyS0
2025-11-04 08:17:08,472: [    0.000000] memcluster 0, usage 1, start        0, end       15
2025-11-04 08:17:08,473: [    0.000000] memcluster 1, usage 0, start       15, end    16384
2025-11-04 08:17:08,473: [    0.000000] freeing pages 15:2048
2025-11-04 08:17:08,474: [    0.000000] freeing pages 2987:16384
2025-11-04 08:17:08,474: [    0.000000] reserving pages 2987:2988
2025-11-04 08:17:08,475: [    0.000000] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 16272
2025-11-04 08:17:08,476: [    0.000000] Kernel command line: printk.time=0 console=ttyS0

I.e. the replay worked as expected. When it fails, console.log only contains:

2025-11-04 08:25:26,601: PCI: 00:01:0 class 0101 id 1095:0646
2025-11-04 08:25:26,601: PCI:   region 0: 0000c000
2025-11-04 08:25:26,601: PCI:   region 1: 0000c008
2025-11-04 08:25:26,601: PCI:   region 2: 0000c010
2025-11-04 08:25:26,601: PCI:   region 3: 0000c018
2025-11-04 08:25:26,601: PCI:   region 4: 0000c020
2025-11-04 08:25:26,602: PCI: 00:07:0 class 0601 id 8086:0484
2025-11-04 08:25:28,601: [    0.000000] Initializing cgroup subsys cpu
2025-11-04 08:25:28,602: [    0.000000] Linux version 2.6.26-2-alpha-generic (Debian 2.6.26-29) (dannf@debian.org) (gcc version 4.1.3 20080704 (prerelease) (Debian 4.1.2-25)) #1 Sun Mar 4 21:08:03 UTC 2012
2025-11-04 08:25:28,602: [    0.000000] Booting GENERIC on Tsunami variation Clipper using machine vector Clipper from SRM
2025-11-04 08:25:28,602: [    0.000000] Major Options: MAGIC_SYSRQ
2025-11-04 08:25:28,602: [    0.000000] Command line: printk.time=0 console=ttyS0
2025-11-04 08:25:28,602: [    0.000000] memcluster 0, usage 1, start        0, end       15
2025-11-04 08:25:28,602: [    0.000000] memcluster 1, usage 0, start       15, end    16384
2025-11-04 08:25:28,602: [    0.000000] freeing pages 15:2048
2025-11-04 08:25:28,603: [    0.000000] freeing pages 2987:16384
2025-11-04 08:25:28,603: [    0.000000] reserving pages 2987:2988
2025-11-04 08:25:28,603: [    0.000000] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 16272
2025-11-04 08:25:28,603: [    0.000000] Kernel command line: printk.time=0 console=ttyS0

I.e. the replay did not work.

Could this RCU stuff somehow influence the replay mechanism in QEMU?

  Thomas


Re: Regression with the "replay" test on target alpha
Posted by Akihiko Odaki 1 week, 3 days ago
On 2025/11/04 16:41, Thomas Huth wrote:
> Could this RCU stuff somehow influence the replay mechanism in QEMU?

I don't know (yet).

Can you attach gdb and show a backtrace for each thread? It often 
reveals deadlock among threads.

Regards,
Akihiko Odaki

Re: Regression with the "replay" test on target alpha
Posted by Thomas Huth 1 week, 3 days ago
On 04/11/2025 09.08, Akihiko Odaki wrote:
> Can you attach gdb and show a backtrace for each thread? It often reveals 
> deadlock among threads.

Sure, it looks like this:

(gdb) thread apply all bt

Thread 4 (Thread 0x7f31bd1ff6c0 (LWP 89223) "qemu-system-alp"):
#0  0x00007f31c47150cd in syscall () at /lib64/libc.so.6
#1  0x00005593dd2b578d in qemu_futex_wait (f=0x5593ddad9e50 <rcu_call_ready_event>, val=4294967295) at /home/thuth/devel/qemu/include/qemu/futex.h:47
#2  0x00005593dd2b59a1 in qemu_event_wait (ev=0x5593ddad9e50 <rcu_call_ready_event>) at ../../home/thuth/devel/qemu/util/event.c:162
#3  0x00005593dd2c12e3 in call_rcu_thread (opaque=0x0) at ../../home/thuth/devel/qemu/util/rcu.c:304
#4  0x00005593dd2b3311 in qemu_thread_start (args=0x5594051010c0) at ../../home/thuth/devel/qemu/util/qemu-thread-posix.c:393
#5  0x00007f31c4693f54 in start_thread () at /lib64/libc.so.6
#6  0x00007f31c471732c in __clone3 () at /lib64/libc.so.6

Thread 3 (Thread 0x7f31bc8fd6c0 (LWP 89224) "qemu-system-alp"):
#0  0x00007f31c469c462 in __syscall_cancel_arch () at /lib64/libc.so.6
#1  0x00007f31c469075c in __internal_syscall_cancel () at /lib64/libc.so.6
#2  0x00007f31c46907a4 in __syscall_cancel () at /lib64/libc.so.6
#3  0x00007f31c470a7c6 in ppoll () at /lib64/libc.so.6
#4  0x00007f31c6916890 in g_main_context_iterate_unlocked.isra () at /lib64/libglib-2.0.so.0
#5  0x00007f31c6916a4f in g_main_loop_run () at /lib64/libglib-2.0.so.0
#6  0x00005593dd0d1ab0 in iothread_run (opaque=0x559405a567a0) at ../../home/thuth/devel/qemu/iothread.c:70
#7  0x00005593dd2b3311 in qemu_thread_start (args=0x559405a571a0) at ../../home/thuth/devel/qemu/util/qemu-thread-posix.c:393
#8  0x00007f31c4693f54 in start_thread () at /lib64/libc.so.6
#9  0x00007f31c471732c in __clone3 () at /lib64/libc.so.6

Thread 2 (Thread 0x7f3137fff6c0 (LWP 89225) "qemu-system-alp"):
#0  0x00007f31c469c462 in __syscall_cancel_arch () at /lib64/libc.so.6
#1  0x00007f31c469075c in __internal_syscall_cancel () at /lib64/libc.so.6
#2  0x00007f31c46907a4 in __syscall_cancel () at /lib64/libc.so.6
#3  0x00007f31c470b2be in write () at /lib64/libc.so.6
#4  0x00005593dd2af441 in event_notifier_set (e=0x559405a56a54) at ../../home/thuth/devel/qemu/util/event_notifier-posix.c:117
#5  0x00005593dd2cdcde in aio_notify (ctx=0x559405a56980) at ../../home/thuth/devel/qemu/util/async.c:506
#6  0x00005593dd2cdd2a in aio_timerlist_notify (opaque=0x559405a56980, type=QEMU_CLOCK_VIRTUAL) at ../../home/thuth/devel/qemu/util/async.c:524
#7  0x00005593dd2d4014 in timerlist_notify (timer_list=0x559405a56d50) at ../../home/thuth/devel/qemu/util/qemu-timer.c:293
#8  0x00005593dd2d3aaf in qemu_clock_notify (type=QEMU_CLOCK_VIRTUAL) at ../../home/thuth/devel/qemu/util/qemu-timer.c:147
#9  0x00005593dcf92c2c in icount_notify_aio_contexts () at ../../home/thuth/devel/qemu/accel/tcg/tcg-accel-ops-icount.c:72
#10 0x00005593dcf92c92 in icount_handle_deadline () at ../../home/thuth/devel/qemu/accel/tcg/tcg-accel-ops-icount.c:88
#11 0x00005593dcf93c45 in rr_cpu_thread_fn (arg=0x559405ce47d0) at ../../home/thuth/devel/qemu/accel/tcg/tcg-accel-ops-rr.c:251
#12 0x00005593dd2b3311 in qemu_thread_start (args=0x559405ce4720) at ../../home/thuth/devel/qemu/util/qemu-thread-posix.c:393
#13 0x00007f31c4693f54 in start_thread () at /lib64/libc.so.6
#14 0x00007f31c471732c in __clone3 () at /lib64/libc.so.6

Thread 1 (Thread 0x7f31bd82bb00 (LWP 89222) "qemu-system-alp"):
#0  0x00007f31c469c462 in __syscall_cancel_arch () at /lib64/libc.so.6
#1  0x00007f31c469075c in __internal_syscall_cancel () at /lib64/libc.so.6
#2  0x00007f31c46907a4 in __syscall_cancel () at /lib64/libc.so.6
#3  0x00007f31c470a7c6 in ppoll () at /lib64/libc.so.6
#4  0x00005593dd2d418e in qemu_poll_ns (fds=0x5594064aad30, nfds=5, timeout=993592000) at ../../home/thuth/devel/qemu/util/qemu-timer.c:342
#5  0x00005593dd2cf1bd in os_host_main_loop_wait (timeout=993592000) at ../../home/thuth/devel/qemu/util/main-loop.c:308
#6  0x00005593dd2cf30f in main_loop_wait (nonblocking=0) at ../../home/thuth/devel/qemu/util/main-loop.c:592
#7  0x00005593dce7edf6 in qemu_main_loop () at ../../home/thuth/devel/qemu/system/runstate.c:903
#8  0x00005593dd1e212e in qemu_default_main (opaque=0x0) at ../../home/thuth/devel/qemu/system/main.c:50
#9  0x00005593dd1e21e8 in main (argc=25, argv=0x7ffdad60bf18) at ../../home/thuth/devel/qemu/system/main.c:93


  HTH,
   Thomas


Re: Regression with the "replay" test on target alpha
Posted by Paolo Bonzini 1 week, 3 days ago
On 11/4/25 09:38, Thomas Huth wrote:
> Thread 4 (Thread 0x7f31bd1ff6c0 (LWP 89223) "qemu-system-alp"):
> #0  0x00007f31c47150cd in syscall () at /lib64/libc.so.6
> #1  0x00005593dd2b578d in qemu_futex_wait (f=0x5593ddad9e50 
> <rcu_call_ready_event>, val=4294967295) at /home/thuth/devel/qemu/ 
> include/qemu/futex.h:47
> #2  0x00005593dd2b59a1 in qemu_event_wait (ev=0x5593ddad9e50 
> <rcu_call_ready_event>) at ../../home/thuth/devel/qemu/util/event.c:162
> #3  0x00005593dd2c12e3 in call_rcu_thread (opaque=0x0) at ../../home/ 
> thuth/devel/qemu/util/rcu.c:304

The RCU thread is simply waiting.

> Thread 3 (Thread 0x7f31bc8fd6c0 (LWP 89224) "qemu-system-alp"):
> #0  0x00007f31c469c462 in __syscall_cancel_arch () at /lib64/libc.so.6
> #1  0x00007f31c469075c in __internal_syscall_cancel () at /lib64/libc.so.6
> #2  0x00007f31c46907a4 in __syscall_cancel () at /lib64/libc.so.6
> #3  0x00007f31c470a7c6 in ppoll () at /lib64/libc.so.6
> #4  0x00007f31c6916890 in g_main_context_iterate_unlocked.isra () at / 
> lib64/libglib-2.0.so.0
> #5  0x00007f31c6916a4f in g_main_loop_run () at /lib64/libglib-2.0.so.0
> #6  0x00005593dd0d1ab0 in iothread_run (opaque=0x559405a567a0) at ../../ 
> home/thuth/devel/qemu/iothread.c:70
> #7  0x00005593dd2b3311 in qemu_thread_start (args=0x559405a571a0) 
> at ../../home/thuth/devel/qemu/util/qemu-thread-posix.c:393
> #8  0x00007f31c4693f54 in start_thread () at /lib64/libc.so.6
> #9  0x00007f31c471732c in __clone3 () at /lib64/libc.so.6

This iothread is doing nothing.

> Thread 2 (Thread 0x7f3137fff6c0 (LWP 89225) "qemu-system-alp"):
> #0  0x00007f31c469c462 in __syscall_cancel_arch () at /lib64/libc.so.6
> #1  0x00007f31c469075c in __internal_syscall_cancel () at /lib64/libc.so.6
> #2  0x00007f31c46907a4 in __syscall_cancel () at /lib64/libc.so.6
> #3  0x00007f31c470b2be in write () at /lib64/libc.so.6
> #4  0x00005593dd2af441 in event_notifier_set (e=0x559405a56a54) 
> at ../../home/thuth/devel/qemu/util/event_notifier-posix.c:117
> #5  0x00005593dd2cdcde in aio_notify (ctx=0x559405a56980) at ../../home/ 
> thuth/devel/qemu/util/async.c:506

In this backtrace the CPU is waking up the main loop (thread 1), but the 
main loop is running so I don't think it's really a deadlock.  It's more 
likely that the replay is not matching the record, or there's a similar 
reason why the replay is not proceeding.

Paolo


Re: Regression with the "replay" test on target alpha
Posted by Akihiko Odaki 1 week, 2 days ago
On 2025/11/04 21:18, Paolo Bonzini wrote:
> In this backtrace the CPU is waking up the main loop (thread 1), but the 
> main loop is running so I don't think it's really a deadlock.  It's more 
> likely that the replay is not matching the record, or there's a similar 
> reason why the replay is not proceeding.

I agree. It is more likely that debugging the replay code instead of the 
RCU change will lead to the real cause.

Regards,
Akihiko Odaki

Re: Regression with the "replay" test on target alpha
Posted by Thomas Huth 1 week ago
On 05/11/2025 07.29, Akihiko Odaki wrote:
> I agree. It is more likely that debugging the replay code instead of the RCU 
> change will lead to the real cause.

This now also impacts the GitLab CI:

  https://gitlab.com/qemu-project/qemu/-/jobs/11984309870

How do we tackle this? It feels bad that we have such a subtle problem at 
the beginning of the freeze period. Shall we revert the RCU patch 'til the 
problem is understood?

  Thomas


Re: Regression with the "replay" test on target alpha
Posted by Paolo Bonzini 1 week ago
On 11/7/25 08:41, Thomas Huth wrote:
> On 05/11/2025 07.29, Akihiko Odaki wrote:
>> On 2025/11/04 21:18, Paolo Bonzini wrote:
>>> In this backtrace the CPU is waking up the main loop (thread 1), but 
>>> the main loop is running so I don't think it's really a deadlock.  
>>> It's more likely that the replay is not matching the record, or 
>>> there's a similar reason why the replay is not proceeding.
>>
>> I agree. It is more likely that debugging the replay code instead of 
>> the RCU change will lead to the real cause.
> 
> This now also impacts the Gitlab-CI :
> 
>   https://gitlab.com/qemu-project/qemu/-/jobs/11984309870
> 
> How do we tackle this? It feels bad that we have such a subtle problem 
> at the beginning of the freeze period. Shall we revert the RCU patch 
> 'til the problem is understood?

I'll look at it today; if I can't make heads or tails of it we'll revert 
it, yes.

Paolo