[Qemu-devel] [RFC PATCH] tests: replace rem = sleep(time) with g_timer

Alex Bennée posted 1 patch 5 years, 3 months ago
Test asan passed
Test checkpatch passed
Test docker-clang@ubuntu passed
Test docker-mingw@fedora passed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20190111143815.26107-1-alex.bennee@linaro.org
tests/atomic64-bench.c   | 11 ++++++++---
tests/atomic_add-bench.c | 11 ++++++++---
tests/qht-bench.c        | 11 ++++++++---
3 files changed, 24 insertions(+), 9 deletions(-)
[Qemu-devel] [RFC PATCH] tests: replace rem = sleep(time) with g_timer
Posted by Alex Bennée 5 years, 3 months ago
Relying on sleep to always return having slept isn't safe as a signal
may have occurred. If signals are constantly incoming the program will
never reach it's termination condition. This is believed to be the
mechanism causing time outs for qht-test in Travis.

Instead we use a g_timer to determine if the duration of the test has
passed and sleep for a second at a time. This may bias benchmark
results for short runs.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 tests/atomic64-bench.c   | 11 ++++++++---
 tests/atomic_add-bench.c | 11 ++++++++---
 tests/qht-bench.c        | 11 ++++++++---
 3 files changed, 24 insertions(+), 9 deletions(-)

diff --git a/tests/atomic64-bench.c b/tests/atomic64-bench.c
index 71692560ed..f087ca5bc2 100644
--- a/tests/atomic64-bench.c
+++ b/tests/atomic64-bench.c
@@ -74,18 +74,23 @@ static void *thread_func(void *arg)
 
 static void run_test(void)
 {
-    unsigned int remaining;
+    GTimer *timer;
     unsigned int i;
 
     while (atomic_read(&n_ready_threads) != n_threads) {
         cpu_relax();
     }
+
+    timer = g_timer_new();
+
     atomic_set(&test_start, true);
     do {
-        remaining = sleep(duration);
-    } while (remaining);
+        sleep(1);
+    } while (g_timer_elapsed(timer, NULL) < duration);
     atomic_set(&test_stop, true);
 
+    g_timer_destroy(timer);
+
     for (i = 0; i < n_threads; i++) {
         qemu_thread_join(&threads[i]);
     }
diff --git a/tests/atomic_add-bench.c b/tests/atomic_add-bench.c
index 2f6c72f63a..5fdf901a71 100644
--- a/tests/atomic_add-bench.c
+++ b/tests/atomic_add-bench.c
@@ -76,18 +76,23 @@ static void *thread_func(void *arg)
 
 static void run_test(void)
 {
-    unsigned int remaining;
+    GTimer *timer;
     unsigned int i;
 
     while (atomic_read(&n_ready_threads) != n_threads) {
         cpu_relax();
     }
+
+    timer = g_timer_new();
+
     atomic_set(&test_start, true);
     do {
-        remaining = sleep(duration);
-    } while (remaining);
+        sleep(1);
+    } while (g_timer_elapsed(timer, NULL) < duration);
     atomic_set(&test_stop, true);
 
+    g_timer_destroy(timer);
+
     for (i = 0; i < n_threads; i++) {
         qemu_thread_join(&threads[i]);
     }
diff --git a/tests/qht-bench.c b/tests/qht-bench.c
index ab4e708180..7473fcb60a 100644
--- a/tests/qht-bench.c
+++ b/tests/qht-bench.c
@@ -398,18 +398,23 @@ static void pr_stats(void)
 
 static void run_test(void)
 {
-    unsigned int remaining;
+    GTimer *timer;
     int i;
 
     while (atomic_read(&n_ready_threads) != n_rw_threads + n_rz_threads) {
         cpu_relax();
     }
+
+    timer = g_timer_new();
+
     atomic_set(&test_start, true);
     do {
-        remaining = sleep(duration);
-    } while (remaining);
+        sleep(1);
+    } while (g_timer_elapsed(timer, NULL) < duration);
     atomic_set(&test_stop, true);
 
+    g_timer_destroy(timer);
+
     for (i = 0; i < n_rw_threads; i++) {
         qemu_thread_join(&rw_threads[i]);
     }
-- 
2.17.1


Re: [Qemu-devel] [RFC PATCH] tests: replace rem = sleep(time) with g_timer
Posted by Paolo Bonzini 5 years, 3 months ago
On 11/01/19 15:38, Alex Bennée wrote:
> Relying on sleep to always return having slept isn't safe as a signal
> may have occurred. If signals are constantly incoming the program will
> never reach it's termination condition. This is believed to be the
> mechanism causing time outs for qht-test in Travis.
> 
> Instead we use a g_timer to determine if the duration of the test has
> passed and sleep for a second at a time. This may bias benchmark
> results for short runs.

Why not g_usleep?  It already does a while loop around nanosleep (which
returns the remaining time in the wait, like select but unlike sleep and
poll).

Thanks,

Paolo

Re: [Qemu-devel] [RFC PATCH] tests: replace rem = sleep(time) with g_timer
Posted by Alex Bennée 5 years, 3 months ago
Paolo Bonzini <pbonzini@redhat.com> writes:

> On 11/01/19 15:38, Alex Bennée wrote:
>> Relying on sleep to always return having slept isn't safe as a signal
>> may have occurred. If signals are constantly incoming the program will
>> never reach it's termination condition. This is believed to be the
>> mechanism causing time outs for qht-test in Travis.
>>
>> Instead we use a g_timer to determine if the duration of the test has
>> passed and sleep for a second at a time. This may bias benchmark
>> results for short runs.
>
> Why not g_usleep?  It already does a while loop around nanosleep (which
> returns the remaining time in the wait, like select but unlike sleep and
> poll).

Yeah I'm testing that now. However I have managed to trigger:

  ERROR:tests/test-qht-par.c:20:test_qht: assertion failed (rc == 0): (35584 == 0)

but I'm not sure if this is some other side-effect of the
test-qht-par/qht-bench invocation dance.

--
Alex Bennée

Re: [Qemu-devel] [RFC PATCH] tests: replace rem = sleep(time) with g_timer
Posted by Paolo Bonzini 5 years, 3 months ago
On 11/01/19 16:28, Alex Bennée wrote:
>> Why not g_usleep?  It already does a while loop around nanosleep (which
>> returns the remaining time in the wait, like select but unlike sleep and
>> poll).
> Yeah I'm testing that now. However I have managed to trigger:
> 
>   ERROR:tests/test-qht-par.c:20:test_qht: assertion failed (rc == 0): (35584 == 0)

I think that's a good old SIGSEGV (0x8B00).

Paolo

Re: [Qemu-devel] [RFC PATCH] tests: replace rem = sleep(time) with g_timer
Posted by Alex Bennée 5 years, 3 months ago
Paolo Bonzini <pbonzini@redhat.com> writes:

> On 11/01/19 16:28, Alex Bennée wrote:
>>> Why not g_usleep?  It already does a while loop around nanosleep (which
>>> returns the remaining time in the wait, like select but unlike sleep and
>>> poll).
>> Yeah I'm testing that now. However I have managed to trigger:
>>
>>   ERROR:tests/test-qht-par.c:20:test_qht: assertion failed (rc == 0): (35584 == 0)
>
> I think that's a good old SIGSEGV (0x8B00).

Hmmm, but I haven't been able to trigger it running it directly:

  retry.py -n 30 -c -- ./tests/qht-bench 1>/dev/null 2>&1 -R -S0.1 -D10000 -N1 -n 4 -u 20 -d 1

Could this be some sort of weird interaction caused by using system()?

--
Alex Bennée

Re: [Qemu-devel] [RFC PATCH] tests: replace rem = sleep(time) with g_timer
Posted by Alex Bennée 5 years, 3 months ago
Paolo Bonzini <pbonzini@redhat.com> writes:

> On 11/01/19 16:28, Alex Bennée wrote:
>>> Why not g_usleep?  It already does a while loop around nanosleep (which
>>> returns the remaining time in the wait, like select but unlike sleep and
>>> poll).
>> Yeah I'm testing that now. However I have managed to trigger:
>>
>>   ERROR:tests/test-qht-par.c:20:test_qht: assertion failed (rc == 0): (35584 == 0)
>
> I think that's a good old SIGSEGV (0x8B00).

According to the PC in the logs:

  Line 98 of "mcount.c" starts at address 0x7ffff6e15145 <__mcount_internal+69> and ends at 0x7ffff6e15148 <__mcount_internal+72>.

>
> Paolo


--
Alex Bennée

Re: [Qemu-devel] [RFC PATCH] tests: replace rem = sleep(time) with g_timer
Posted by Eduardo Habkost 5 years, 3 months ago
On Fri, Jan 11, 2019 at 04:06:54PM +0000, Alex Bennée wrote:
> 
> Paolo Bonzini <pbonzini@redhat.com> writes:
> 
> > On 11/01/19 16:28, Alex Bennée wrote:
> >>> Why not g_usleep?  It already does a while loop around nanosleep (which
> >>> returns the remaining time in the wait, like select but unlike sleep and
> >>> poll).
> >> Yeah I'm testing that now. However I have managed to trigger:
> >>
> >>   ERROR:tests/test-qht-par.c:20:test_qht: assertion failed (rc == 0): (35584 == 0)
> >
> > I think that's a good old SIGSEGV (0x8B00).
> 
> According to the PC in the logs:
> 
>   Line 98 of "mcount.c" starts at address 0x7ffff6e15145 <__mcount_internal+69> and ends at 0x7ffff6e15148 <__mcount_internal+72>.

Was this on Travis?  Which architecture?

-- 
Eduardo

Re: [Qemu-devel] [RFC PATCH] tests: replace rem = sleep(time) with g_timer
Posted by Greg Kurz 5 years, 3 months ago
On Fri, 11 Jan 2019 16:41:41 +0100
Paolo Bonzini <pbonzini@redhat.com> wrote:

> On 11/01/19 16:28, Alex Bennée wrote:
> >> Why not g_usleep?  It already does a while loop around nanosleep (which
> >> returns the remaining time in the wait, like select but unlike sleep and
> >> poll).  
> > Yeah I'm testing that now. However I have managed to trigger:
> > 
> >   ERROR:tests/test-qht-par.c:20:test_qht: assertion failed (rc == 0): (35584 == 0)  
> 
> I think that's a good old SIGSEGV (0x8B00).
> 

Hmmm... system() returns a "wait status" that can  be examined using the
macros described in waitpid(2), and we have:

/* If WIFEXITED(STATUS), the low-order 8 bits of the status.  */
#define	__WEXITSTATUS(status)	(((status) & 0xff00) >> 8)

So this rather looks like a 139 exit status to me... Not sure how
this can happen though.

> Paolo
> 


Re: [Qemu-devel] [RFC PATCH] tests: replace rem = sleep(time) with g_timer
Posted by Alex Bennée 5 years, 3 months ago
Greg Kurz <groug@kaod.org> writes:

> On Fri, 11 Jan 2019 16:41:41 +0100
> Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>> On 11/01/19 16:28, Alex Bennée wrote:
>> >> Why not g_usleep?  It already does a while loop around nanosleep (which
>> >> returns the remaining time in the wait, like select but unlike sleep and
>> >> poll).
>> > Yeah I'm testing that now. However I have managed to trigger:
>> >
>> >   ERROR:tests/test-qht-par.c:20:test_qht: assertion failed (rc == 0): (35584 == 0)
>>
>> I think that's a good old SIGSEGV (0x8B00).
>>
>
> Hmmm... system() returns a "wait status" that can  be examined using the
> macros described in waitpid(2), and we have:
>
> /* If WIFEXITED(STATUS), the low-order 8 bits of the status.  */
> #define	__WEXITSTATUS(status)	(((status) & 0xff00) >> 8)
>
> So this rather looks like a 139 exit status to me... Not sure how
> this can happen though.

Yeah the child segfaulted in mcount while closing down. I've started a
new thread with the details of the remaining failure modes:

  Subject: Remaining CI failures
  Date: Fri, 11 Jan 2019 19:10:07 +0000
  Message-ID: <87lg3rui28.fsf@linaro.org>


--
Alex Bennée