[Qemu-devel] [PULL 0/8] Monitor patches for 2018-10-30

[Qemu-devel] [PULL 0/8] Monitor patches for 2018-10-30
Posted by Markus Armbruster 7 years ago
The following changes since commit 3f3285491dd52014852a56135c90e428c8b507ea:

  Merge remote-tracking branch 'remotes/jnsnow/tags/bitmaps-pull-request' into staging (2018-10-30 14:09:25 +0000)

are available in the Git repository at:

  git://repo.or.cz/qemu/armbru.git tags/pull-monitor-2018-10-30

for you to fetch changes up to 1a8de278dd3c93a7aa40f88486509fe0e2de04da:

  vl: Avoid crash when -mon is underspecified (2018-10-30 20:13:52 +0100)

----------------------------------------------------------------
Monitor patches for 2018-10-30

* Fix crash on shutdown with --daemonize
* Change out-of-band execution to stop reading instead of dropping
  commands
* Enable out-of-band execution by default
* Avoid crash when -mon lacks chardev=...

----------------------------------------------------------------
Eric Blake (1):
      vl: Avoid crash when -mon is underspecified

Peter Xu (5):
      monitor: Suspend monitor instead dropping commands
      monitor: remove "x-oob", turn oob on by default
      Revert "tests: Add parameter to qtest_init_without_qmp_handshake"
      tests: add oob functional test for test-qmp-cmds
      tests: qmp-test: add queue full test

Wolfgang Bumiller (2):
      monitor: guard iothread access by mon->use_io_thread
      monitor: delay monitor iothread creation

 docs/interop/qmp-spec.txt |   5 ++-
 include/monitor/monitor.h |   3 +-
 monitor.c                 | 111 +++++++++++++++++++++-------------------------
 qapi/misc.json            |  40 -----------------
 tests/libqtest.c          |  10 ++---
 tests/libqtest.h          |   4 +-
 tests/qmp-test.c          |  32 +++++++++++--
 tests/test-qmp-cmds.c     |  16 +++++++
 vl.c                      |   9 ++--
 9 files changed, 110 insertions(+), 120 deletions(-)

-- 
2.17.2


Re: [Qemu-devel] [PULL 0/8] Monitor patches for 2018-10-30
Posted by Peter Maydell 7 years ago
On 30 October 2018 at 19:16, Markus Armbruster <armbru@redhat.com> wrote:
> The following changes since commit 3f3285491dd52014852a56135c90e428c8b507ea:
>
>   Merge remote-tracking branch 'remotes/jnsnow/tags/bitmaps-pull-request' into staging (2018-10-30 14:09:25 +0000)
>
> are available in the Git repository at:
>
>   git://repo.or.cz/qemu/armbru.git tags/pull-monitor-2018-10-30
>
> for you to fetch changes up to 1a8de278dd3c93a7aa40f88486509fe0e2de04da:
>
>   vl: Avoid crash when -mon is underspecified (2018-10-30 20:13:52 +0100)
>
> ----------------------------------------------------------------
> Monitor patches for 2018-10-30
>
> * Fix crash on shutdown with --daemonize
> * Change out-of-band execution to stop reading instead of dropping
>   commands
> * Enable out-of-band execution by default
> * Avoid crash when -mon lacks chardev=...
>
> ----------------------------------------------------------------

Hi; I get test failures on a FreeBSD host:

TEST: tests/qmp-test... (pid=18305)
  /sparc64/qmp/protocol:                                               OK
  /sparc64/qmp/oob:                                                    FAIL
GTester: last random seed: R02Se36038cab4c051a2cd47374ec9718e98
(pid=18337)
  /sparc64/qmp/preconfig:                                              OK
FAIL: tests/qmp-test

ERROR:tests/qmp-test.c:211:recv_cmd_id: assertion failed
(qdict_get_try_str(resp, "id") == id): ("oob-1" == "queue-blocks-1")
gmake: *** [/var/tmp/qemu-test.2n2tBU/tests/Makefile.include:805:
check-qtest-sparc] Error 1

and similarly with i386/qmp/oob and sparc/qmp/oob.

thanks
-- PMM

Re: [Qemu-devel] [PULL 0/8] Monitor patches for 2018-10-30
Posted by Markus Armbruster 6 years, 12 months ago
Peter Maydell <peter.maydell@linaro.org> writes:

> On 30 October 2018 at 19:16, Markus Armbruster <armbru@redhat.com> wrote:
>> The following changes since commit 3f3285491dd52014852a56135c90e428c8b507ea:
>>
>>   Merge remote-tracking branch 'remotes/jnsnow/tags/bitmaps-pull-request' into staging (2018-10-30 14:09:25 +0000)
>>
>> are available in the Git repository at:
>>
>>   git://repo.or.cz/qemu/armbru.git tags/pull-monitor-2018-10-30
>>
>> for you to fetch changes up to 1a8de278dd3c93a7aa40f88486509fe0e2de04da:
>>
>>   vl: Avoid crash when -mon is underspecified (2018-10-30 20:13:52 +0100)
>>
>> ----------------------------------------------------------------
>> Monitor patches for 2018-10-30
>>
>> * Fix crash on shutdown with --daemonize
>> * Change out-of-band execution to stop reading instead of dropping
>>   commands
>> * Enable out-of-band execution by default
>> * Avoid crash when -mon lacks chardev=...
>>
>> ----------------------------------------------------------------
>
> Hi; I get test failures on FreeBSD host:
>
> TEST: tests/qmp-test... (pid=18305)
>   /sparc64/qmp/protocol:                                               OK
>   /sparc64/qmp/oob:                                                    FAIL
> GTester: last random seed: R02Se36038cab4c051a2cd47374ec9718e98
> (pid=18337)
>   /sparc64/qmp/preconfig:                                              OK
> FAIL: tests/qmp-test
>
> ERROR:tests/qmp-test.c:211:recv_cmd_id: assertion failed
> (qdict_get_try_str(resp, "id") == id): ("oob-1" == "queue-blocks-1")
> gmake: *** [/var/tmp/qemu-test.2n2tBU/tests/Makefile.include:805:
> check-qtest-sparc] Error 1
>
> and similarly with i386/qmp/oob and sparc/qmp/oob.

I haven't been able to reproduce, yet.  I'm going to respin with fewer
patches, to get the other stuff in while I debug.

Re: [Qemu-devel] [PULL 0/8] Monitor patches for 2018-10-30
Posted by Peter Xu 6 years, 12 months ago
On Tue, Nov 06, 2018 at 04:56:51PM +0100, Markus Armbruster wrote:
> Peter Maydell <peter.maydell@linaro.org> writes:
> 
> > On 30 October 2018 at 19:16, Markus Armbruster <armbru@redhat.com> wrote:
> >> The following changes since commit 3f3285491dd52014852a56135c90e428c8b507ea:
> >>
> >>   Merge remote-tracking branch 'remotes/jnsnow/tags/bitmaps-pull-request' into staging (2018-10-30 14:09:25 +0000)
> >>
> >> are available in the Git repository at:
> >>
> >>   git://repo.or.cz/qemu/armbru.git tags/pull-monitor-2018-10-30
> >>
> >> for you to fetch changes up to 1a8de278dd3c93a7aa40f88486509fe0e2de04da:
> >>
> >>   vl: Avoid crash when -mon is underspecified (2018-10-30 20:13:52 +0100)
> >>
> >> ----------------------------------------------------------------
> >> Monitor patches for 2018-10-30
> >>
> >> * Fix crash on shutdown with --daemonize
> >> * Change out-of-band execution to stop reading instead of dropping
> >>   commands
> >> * Enable out-of-band execution by default
> >> * Avoid crash when -mon lacks chardev=...
> >>
> >> ----------------------------------------------------------------
> >
> > Hi; I get test failures on FreeBSD host:
> >
> > TEST: tests/qmp-test... (pid=18305)
> >   /sparc64/qmp/protocol:                                               OK
> >   /sparc64/qmp/oob:                                                    FAIL
> > GTester: last random seed: R02Se36038cab4c051a2cd47374ec9718e98
> > (pid=18337)
> >   /sparc64/qmp/preconfig:                                              OK
> > FAIL: tests/qmp-test
> >
> > ERROR:tests/qmp-test.c:211:recv_cmd_id: assertion failed
> > (qdict_get_try_str(resp, "id") == id): ("oob-1" == "queue-blocks-1")
> > gmake: *** [/var/tmp/qemu-test.2n2tBU/tests/Makefile.include:805:
> > check-qtest-sparc] Error 1
> >
> > and similarly with i386/qmp/oob and sparc/qmp/oob.
> 
> I haven't been able to reproduce, yet.  I'm going to respin with fewer
> patches, to get the other stuff in while I debug.

Strange, "make check -j8" failed on my hosts (I tried two) with either
Markus's pull tree or qemu master:

hw/core/ptimer.o: In function `timer_new_tl':
/home/xz/git/qemu/include/qemu/timer.h:536: undefined reference to `timer_init_tl'
collect2: error: ld returned 1 exit status
make: *** [/home/xz/git/qemu/rules.mak:124: tests/ptimer-test] Error 1

Is that only happening on my hosts?

And I can't reproduce the error either by running qmp-test standalone.

Regarding the reported test failure: when I saw it, I thought of the
patch "monitor: resume the monitor earlier if needed" in my series,
since that one seemed capable of introducing such nondeterminism, but I
noticed that Markus has already dropped that patch from the pull.

-- 
Peter Xu

Re: [Qemu-devel] [PULL 0/8] Monitor patches for 2018-10-30
Posted by Peter Maydell 6 years, 12 months ago
On 7 November 2018 at 02:56, Peter Xu <peterx@redhat.com> wrote:
> Strange, "make check -j8" failed on my hosts (I tried two) with either
> Markus's pull tree or qemu master:
>
> hw/core/ptimer.o: In function `timer_new_tl':
> /home/xz/git/qemu/include/qemu/timer.h:536: undefined reference to `timer_init_tl'
> collect2: error: ld returned 1 exit status
> make: *** [/home/xz/git/qemu/rules.mak:124: tests/ptimer-test] Error 1
>
> Is that only happening on my hosts?

Commit 89a603a0c80ae3 changed things so that there is no
timer_new_tl() or timer_init_tl() any more, so if you have
an object file that's referring to it then it's probably
stale. Try a make clean.

thanks
-- PMM

Re: [Qemu-devel] [PULL 0/8] Monitor patches for 2018-10-30
Posted by Eric Blake 6 years, 12 months ago
On 11/7/18 5:21 AM, Peter Maydell wrote:
> On 7 November 2018 at 02:56, Peter Xu <peterx@redhat.com> wrote:
>> Strange, "make check -j8" failed on my hosts (I tried two) with either
>> Markus's pull tree or qemu master:
>>
>> hw/core/ptimer.o: In function `timer_new_tl':
>> /home/xz/git/qemu/include/qemu/timer.h:536: undefined reference to `timer_init_tl'
>> collect2: error: ld returned 1 exit status
>> make: *** [/home/xz/git/qemu/rules.mak:124: tests/ptimer-test] Error 1
>>
>> Is that only happening on my hosts?
> 
> Commit 89a603a0c80ae3 changed things so that there is no
> timer_new_tl() or timer_init_tl() any more, so if you have
> an object file that's referring to it then it's probably
> stale. Try a make clean.

I ran into it as well:
https://lists.gnu.org/archive/html/qemu-devel/2018-11/msg00876.html

I wonder what makefile dependency we're missing that results in
stale .o files being picked up?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-devel] [PULL 0/8] Monitor patches for 2018-10-30
Posted by Peter Xu 6 years, 12 months ago
On Wed, Nov 07, 2018 at 11:21:32AM +0000, Peter Maydell wrote:
> On 7 November 2018 at 02:56, Peter Xu <peterx@redhat.com> wrote:
> > Strange, "make check -j8" failed on my hosts (I tried two) with either
> > Markus's pull tree or qemu master:
> >
> > hw/core/ptimer.o: In function `timer_new_tl':
> > /home/xz/git/qemu/include/qemu/timer.h:536: undefined reference to `timer_init_tl'
> > collect2: error: ld returned 1 exit status
> > make: *** [/home/xz/git/qemu/rules.mak:124: tests/ptimer-test] Error 1
> >
> > Is that only happening on my hosts?
> 
> Commit 89a603a0c80ae3 changed things so that there is no
> timer_new_tl() or timer_init_tl() any more, so if you have
> an object file that's referring to it then it's probably
> stale. Try a make clean.

Yeh it worked for me, thanks Peter.

Though after running a few more rounds of "configure --enable-debug &&
make check -j8" I still can't find anything wrong with Markus's tree.
I'll wait to see whether there's any news from Markus, and then consider
whether I should set up a FreeBSD host.

Regards,

-- 
Peter Xu

Re: [Qemu-devel] [PULL 0/8] Monitor patches for 2018-10-30
Posted by Peter Xu 6 years, 11 months ago
On Thu, Nov 08, 2018 at 10:44:06AM +0800, Peter Xu wrote:
> On Wed, Nov 07, 2018 at 11:21:32AM +0000, Peter Maydell wrote:
> > On 7 November 2018 at 02:56, Peter Xu <peterx@redhat.com> wrote:
> > > Strange, "make check -j8" failed on my hosts (I tried two) with either
> > > Markus's pull tree or qemu master:
> > >
> > > hw/core/ptimer.o: In function `timer_new_tl':
> > > /home/xz/git/qemu/include/qemu/timer.h:536: undefined reference to `timer_init_tl'
> > > collect2: error: ld returned 1 exit status
> > > make: *** [/home/xz/git/qemu/rules.mak:124: tests/ptimer-test] Error 1
> > >
> > > Is that only happening on my hosts?
> > 
> > Commit 89a603a0c80ae3 changed things so that there is no
> > timer_new_tl() or timer_init_tl() any more, so if you have
> > an object file that's referring to it then it's probably
> > stale. Try a make clean.
> 
> Yeh it worked for me, thanks Peter.
> 
> Though after running a few more rounds of "configure --enable-debug &&
> make check -j8" I still cannot see anything wrong with Markus's tree.
> I'll see whether there's any news from Markus and then I'll consider
> whether I should install a FreeBSD.

I reproduced the error with a FreeBSD guest, and this change (which
could possibly be squashed into "tests: qmp-test: add queue full test")
fixed it for me:

diff --git a/tests/qmp-test.c b/tests/qmp-test.c
index 6989acbca4..83f353db4f 100644
--- a/tests/qmp-test.c
+++ b/tests/qmp-test.c
@@ -281,8 +281,15 @@ static void test_qmp_oob(void)
      * will only be able to be handled after the queue is shrinked, so
      * it'll be processed only after one existing in-band command
      * finishes.
+     *
+     * NOTE: we need to feed the queue with one extra request to make
+     * sure it'll be stuck since when we have sent the Nth request
+     * it's possible that we have already popped and processing the
+     * 1st request so the Nth request (which could potentially be the
+     * [N-1]th element on the queue) might not trigger the
+     * monitor-full condition deterministically.
      */
-    for (i = 1; i <= QMP_REQ_QUEUE_LEN_MAX; i++) {
+    for (i = 1; i <= QMP_REQ_QUEUE_LEN_MAX + 1; i++) {
         id = g_strdup_printf("queue-blocks-%d", i);
         send_cmd_that_blocks(qts, id);
         g_free(id);
@@ -291,7 +298,7 @@ static void test_qmp_oob(void)
     unblock_blocked_cmd();
     recv_cmd_id(qts, "queue-blocks-1");
     recv_cmd_id(qts, "oob-1");
-    for (i = 2; i <= QMP_REQ_QUEUE_LEN_MAX; i++) {
+    for (i = 2; i <= QMP_REQ_QUEUE_LEN_MAX + 1; i++) {
         unblock_blocked_cmd();
         id = g_strdup_printf("queue-blocks-%d", i);
         recv_cmd_id(qts, id);

So the problem here is that the queue-blocks-N command might not actually
suspend the monitor every time: if the 1st request has already been popped,
the Nth request becomes only the (N-1)th element on the queue, so the
parser keeps reading, eats the oob command, and that command can "preempt"
the earlier ones.

Maybe FreeBSD schedules the threads in a pattern that makes this happen
only on FreeBSD and very consistently, but in any case this should be a
general fix for the test program.
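
To make the race concrete, here is one possible interleaving without the
extra request (a sketch of the schedule, not an actual trace; N is
QMP_REQ_QUEUE_LEN_MAX):

    client (qmp-test)            QEMU
    -----------------            ----
    send queue-blocks-1..N
                                 I/O thread queues queue-blocks-1
                                 main thread pops queue-blocks-1, blocks in it
                                 I/O thread queues queue-blocks-2..N
                                 (only N-1 entries: queue never fills,
                                  monitor never suspended)
    send oob-1
                                 I/O thread keeps reading, runs oob-1 at
                                 once, sends the oob-1 response
    unblock queue-blocks-1
                                 main thread finishes queue-blocks-1,
                                 sends its response
    expect queue-blocks-1        client sees "oob-1" first, assertion fails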

Regards,

-- 
Peter Xu

Re: [Qemu-devel] [PULL 0/8] Monitor patches for 2018-10-30
Posted by Peter Xu 6 years, 11 months ago
On Mon, Nov 19, 2018 at 02:17:27PM +0800, Peter Xu wrote:
> On Thu, Nov 08, 2018 at 10:44:06AM +0800, Peter Xu wrote:
> > On Wed, Nov 07, 2018 at 11:21:32AM +0000, Peter Maydell wrote:
> > > On 7 November 2018 at 02:56, Peter Xu <peterx@redhat.com> wrote:
> > > > Strange, "make check -j8" failed on my hosts (I tried two) with either
> > > > Markus's pull tree or qemu master:
> > > >
> > > > hw/core/ptimer.o: In function `timer_new_tl':
> > > > /home/xz/git/qemu/include/qemu/timer.h:536: undefined reference to `timer_init_tl'
> > > > collect2: error: ld returned 1 exit status
> > > > make: *** [/home/xz/git/qemu/rules.mak:124: tests/ptimer-test] Error 1
> > > >
> > > > Is that only happening on my hosts?
> > > 
> > > Commit 89a603a0c80ae3 changed things so that there is no
> > > timer_new_tl() or timer_init_tl() any more, so if you have
> > > an object file that's referring to it then it's probably
> > > stale. Try a make clean.
> > 
> > Yeh it worked for me, thanks Peter.
> > 
> > Though after running a few more rounds of "configure --enable-debug &&
> > make check -j8" I still cannot see anything wrong with Markus's tree.
> > I'll see whether there's any news from Markus and then I'll consider
> > whether I should install a FreeBSD.
> 
> I reproduced the error with a FreeBSD guest and this change (which
> possibly can be squashed into "tests: qmp-test: add queue full test")
> worked for me:
> 
> diff --git a/tests/qmp-test.c b/tests/qmp-test.c
> index 6989acbca4..83f353db4f 100644
> --- a/tests/qmp-test.c
> +++ b/tests/qmp-test.c
> @@ -281,8 +281,15 @@ static void test_qmp_oob(void)
>       * will only be able to be handled after the queue is shrinked, so
>       * it'll be processed only after one existing in-band command
>       * finishes.
> +     *
> +     * NOTE: we need to feed the queue with one extra request to make
> +     * sure it'll be stuck since when we have sent the Nth request
> +     * it's possible that we have already popped and processing the
> +     * 1st request so the Nth request (which could potentially be the
> +     * [N-1]th element on the queue) might not trigger the
> +     * monitor-full condition deterministically.
>       */
> -    for (i = 1; i <= QMP_REQ_QUEUE_LEN_MAX; i++) {
> +    for (i = 1; i <= QMP_REQ_QUEUE_LEN_MAX + 1; i++) {
>          id = g_strdup_printf("queue-blocks-%d", i);
>          send_cmd_that_blocks(qts, id);
>          g_free(id);
> @@ -291,7 +298,7 @@ static void test_qmp_oob(void)
>      unblock_blocked_cmd();
>      recv_cmd_id(qts, "queue-blocks-1");
>      recv_cmd_id(qts, "oob-1");
> -    for (i = 2; i <= QMP_REQ_QUEUE_LEN_MAX; i++) {
> +    for (i = 2; i <= QMP_REQ_QUEUE_LEN_MAX + 1; i++) {
>          unblock_blocked_cmd();
>          id = g_strdup_printf("queue-blocks-%d", i);
>          recv_cmd_id(qts, id);
> 
> So the problem here is that the queue-block-N command might not really
> suspend the monitor everytime if we already popped the 1st request,
> which will let the N-th request to be (N-1)th, then the parser will
> continue to eat the oob command and it could "preempt" the previous
> commands.
> 
> Maybe FreeBSD is scheduling the threads in some pattern so it happens
> only on FreeBSD and very constantly, but anyway it should be a general
> fix to the test program.

Markus, do you want me to repost a new version with this change?  Is
it still possible to have the oob-default series for 3.1?

-- 
Peter Xu

Re: [Qemu-devel] [PULL 0/8] Monitor patches for 2018-10-30
Posted by Markus Armbruster 6 years, 11 months ago
Peter Xu <peterx@redhat.com> writes:

> On Mon, Nov 19, 2018 at 02:17:27PM +0800, Peter Xu wrote:
>> I reproduced the error with a FreeBSD guest and this change (which
>> possibly can be squashed into "tests: qmp-test: add queue full test")
>> worked for me:
>> 
>> diff --git a/tests/qmp-test.c b/tests/qmp-test.c
>> index 6989acbca4..83f353db4f 100644
>> --- a/tests/qmp-test.c
>> +++ b/tests/qmp-test.c
>> @@ -281,8 +281,15 @@ static void test_qmp_oob(void)
>>       * will only be able to be handled after the queue is shrinked, so
>>       * it'll be processed only after one existing in-band command
>>       * finishes.
>> +     *
>> +     * NOTE: we need to feed the queue with one extra request to make
>> +     * sure it'll be stuck since when we have sent the Nth request
>> +     * it's possible that we have already popped and processing the
>> +     * 1st request so the Nth request (which could potentially be the
>> +     * [N-1]th element on the queue) might not trigger the
>> +     * monitor-full condition deterministically.
>>       */
>> -    for (i = 1; i <= QMP_REQ_QUEUE_LEN_MAX; i++) {
>> +    for (i = 1; i <= QMP_REQ_QUEUE_LEN_MAX + 1; i++) {
>>          id = g_strdup_printf("queue-blocks-%d", i);
>>          send_cmd_that_blocks(qts, id);
>>          g_free(id);
>> @@ -291,7 +298,7 @@ static void test_qmp_oob(void)
>>      unblock_blocked_cmd();
>>      recv_cmd_id(qts, "queue-blocks-1");
>>      recv_cmd_id(qts, "oob-1");
>> -    for (i = 2; i <= QMP_REQ_QUEUE_LEN_MAX; i++) {
>> +    for (i = 2; i <= QMP_REQ_QUEUE_LEN_MAX + 1; i++) {
>>          unblock_blocked_cmd();
>>          id = g_strdup_printf("queue-blocks-%d", i);
>>          recv_cmd_id(qts, id);
>> 
>> So the problem here is that the queue-block-N command might not really
>> suspend the monitor everytime if we already popped the 1st request,
>> which will let the N-th request to be (N-1)th, then the parser will
>> continue to eat the oob command and it could "preempt" the previous
>> commands.
>> 
>> Maybe FreeBSD is scheduling the threads in some pattern so it happens
>> only on FreeBSD and very constantly, but anyway it should be a general
>> fix to the test program.

I feel particularly dense right now, and need a more detailed analysis
to understand.  Let me try.

Here's what the test does before your fix (a rough code sketch follows the
list):

    Send QMP_REQ_QUEUE_LEN_MAX "blocking" in-band commands
    queue-blocks-1, queue-blocks-2, ...  "Blocking" means the command
    blocks until we explicitly unblock it.

    Send an out-of-band command oob-1.

    Unblock queue-blocks-1.

    Receive response to queue-blocks-1.

    Receive response to oob-1.

    Unblock queue-blocks-2, receive response
    ...
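
In the test's own terms, that is roughly the following (helper names are
taken from the diff above; the exact libqtest call used to send the oob
command is my assumption):

    /* Sketch of the client-side flow; not the literal test code. */
    for (i = 1; i <= QMP_REQ_QUEUE_LEN_MAX; i++) {
        id = g_strdup_printf("queue-blocks-%d", i);
        send_cmd_that_blocks(qts, id);    /* in-band, blocks until unblocked */
        g_free(id);
    }
    /* Assumed fire-and-forget send of the out-of-band command: */
    qtest_qmp_send(qts, "{ 'exec-oob': 'migrate-pause', 'id': 'oob-1' }");
    unblock_blocked_cmd();                /* let queue-blocks-1 finish */
    recv_cmd_id(qts, "queue-blocks-1");   /* expect the in-band response first */
    recv_cmd_id(qts, "oob-1");            /* then the out-of-band response */
    for (i = 2; i <= QMP_REQ_QUEUE_LEN_MAX; i++) {
        unblock_blocked_cmd();
        id = g_strdup_printf("queue-blocks-%d", i);
        recv_cmd_id(qts, id);
        g_free(id);
    }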

Here's what the test expects QEMU to do:

    In I/O thread:

        Receive and queue queue-blocks-1, ...

        Queue is full, stop receiving commands.

    In main thread:

        Dequeue queue-blocks-1.

        Execute queue-blocks-1, block for a while, finish and send
        response.

    In I/O thread:

        Queue is no longer full, resume receiving commands.

        Receive and execute oob-1, send response

    In main thread:

        Dequeue queue-blocks-2, execute, block for a while, finish and
        send response.
        ...

No good, because the two threads race.  On my Linux box the test passes.
Let's have a closer look at how it does that.  The appended patch adds
another tracepoint.  With tracepoints monitor_* and handle_qmp_command
enabled, I get:

    19271@1542641581.372890:handle_qmp_command mon 0x563e19be4e80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-1", "driver": "blkdebug", "config": "/tmp/qmp-test-EZroW3/fifo"}}

I/O thread receives and queues queue-blocks-1.

    19271@1542641581.372954:monitor_qmp_cmd_in_band queue-blocks-1

Main thread dequeues queue-blocks-1 and starts executing it.

We already lost: the queue won't actually fill up.  The test passes
anyway, but it doesn't actually test "queue full".  But why does it
pass?  Let's see.

    19271@1542641581.373211:monitor_qmp_respond queue-blocks-1

Main thread sends response to queue-blocks-1.  This means queue-blocks-1
has been unblocked already.  This is possible, because qmp-test sends
all the commands and then immediately unblocks queue-blocks-1, and the
main thread can win the race with the I/O thread.

    19271@1542641581.373826:handle_qmp_command mon 0x563e19be4e80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-2", "driver": "blkdebug", "config": "/tmp/qmp-test-EZroW3/fifo"}}
    19271@1542641581.373864:monitor_qmp_cmd_in_band queue-blocks-2
    19271@1542641581.374575:handle_qmp_command mon 0x563e19be4e80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-3", "driver": "blkdebug", "config": "/tmp/qmp-test-EZroW3/fifo"}}
    19271@1542641581.375344:handle_qmp_command mon 0x563e19be4e80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-4", "driver": "blkdebug", "config": "/tmp/qmp-test-EZroW3/fifo"}}
    19271@1542641581.376162:handle_qmp_command mon 0x563e19be4e80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-5", "driver": "blkdebug", "config": "/tmp/qmp-test-EZroW3/fifo"}}
    19271@1542641581.376996:handle_qmp_command mon 0x563e19be4e80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-6", "driver": "blkdebug", "config": "/tmp/qmp-test-EZroW3/fifo"}}
    19271@1542641581.377790:handle_qmp_command mon 0x563e19be4e80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-7", "driver": "blkdebug", "config": "/tmp/qmp-test-EZroW3/fifo"}}
    19271@1542641581.378562:handle_qmp_command mon 0x563e19be4e80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-8", "driver": "blkdebug", "config": "/tmp/qmp-test-EZroW3/fifo"}}

I/O thread receives and queues queue-blocks-2, ... queue-blocks-8.

    19271@1542641581.378796:handle_qmp_command mon 0x563e19be4e80 req: {"exec-oob": "migrate-pause"}
    19271@1542641581.378800:monitor_qmp_cmd_out_of_band oob-1
    19271@1542641581.378823:monitor_qmp_respond oob-1

I/O thread receives, executes and responds to oob-1.

    19271@1542641581.379197:monitor_qmp_respond queue-blocks-2
    19271@1542641581.379228:monitor_qmp_cmd_in_band queue-blocks-3
    19271@1542641581.379435:monitor_qmp_respond queue-blocks-3
    19271@1542641581.379477:monitor_qmp_cmd_in_band queue-blocks-4
    19271@1542641581.379597:monitor_qmp_respond queue-blocks-4
    19271@1542641581.379621:monitor_qmp_cmd_in_band queue-blocks-5
    19271@1542641581.379800:monitor_qmp_respond queue-blocks-5
    19271@1542641581.379824:monitor_qmp_cmd_in_band queue-blocks-6
    19271@1542641581.379953:monitor_qmp_respond queue-blocks-6
    19271@1542641581.379992:monitor_qmp_cmd_in_band queue-blocks-7
    19271@1542641581.380132:monitor_qmp_respond queue-blocks-7
    19271@1542641581.380170:monitor_qmp_cmd_in_band queue-blocks-8
    19271@1542641581.380283:monitor_qmp_respond queue-blocks-8

Main thread dequeues, executes and responds to queue-blocks-2,
... queue-blocks-8.

If I make the main thread lose the race by sticking a sleep(1) before
the unblock_blocked_cmd() that unblocks queue-blocks-1, I get:

    22762@1542641707.818838:handle_qmp_command mon 0x55f5b6a5de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-1", "driver": "blkdebug", "config": "/tmp/qmp-test-PqO2uz/fifo"}}

I/O thread receives and queues queue-blocks-1.

    22762@1542641707.818856:monitor_qmp_cmd_in_band queue-blocks-1

Main thread dequeues queue-blocks-1 and starts executing it.  We already
lost.

    22762@1542641707.819585:handle_qmp_command mon 0x55f5b6a5de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-2", "driver": "blkdebug", "config": "/tmp/qmp-test-PqO2uz/fifo"}}
    22762@1542641707.820331:handle_qmp_command mon 0x55f5b6a5de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-3", "driver": "blkdebug", "config": "/tmp/qmp-test-PqO2uz/fifo"}}
    22762@1542641707.821072:handle_qmp_command mon 0x55f5b6a5de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-4", "driver": "blkdebug", "config": "/tmp/qmp-test-PqO2uz/fifo"}}
    22762@1542641707.821821:handle_qmp_command mon 0x55f5b6a5de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-5", "driver": "blkdebug", "config": "/tmp/qmp-test-PqO2uz/fifo"}}
    22762@1542641707.822569:handle_qmp_command mon 0x55f5b6a5de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-6", "driver": "blkdebug", "config": "/tmp/qmp-test-PqO2uz/fifo"}}
    22762@1542641707.823314:handle_qmp_command mon 0x55f5b6a5de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-7", "driver": "blkdebug", "config": "/tmp/qmp-test-PqO2uz/fifo"}}
    22762@1542641707.824069:handle_qmp_command mon 0x55f5b6a5de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-8", "driver": "blkdebug", "config": "/tmp/qmp-test-PqO2uz/fifo"}}

I/O thread receives and queues queue-blocks-2, ... queue-blocks-8.

    22762@1542641707.824253:handle_qmp_command mon 0x55f5b6a5de80 req: {"exec-oob": "migrate-pause"}
    22762@1542641707.824257:monitor_qmp_cmd_out_of_band oob-1
    22762@1542641707.824282:monitor_qmp_respond oob-1

I/O thread receives, executes and responds to oob-1.

Test is going to fail: response to oob-1 sent before response to
queue-blocks-1.

Now let me try your fix.  Here's what I see:

    16585@1542650042.980953:handle_qmp_command mon 0x5597b8e0de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-1", "driver": "blkdebug", "config": "/tmp/qmp-test-8YRbOJ/fifo"}}
    16585@1542650042.981018:monitor_qmp_cmd_in_band queue-blocks-1
    16585@1542650042.981252:monitor_qmp_respond queue-blocks-1
    16585@1542650042.981838:handle_qmp_command mon 0x5597b8e0de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-2", "driver": "blkdebug", "config": "/tmp/qmp-test-8YRbOJ/fifo"}}
    16585@1542650042.981916:monitor_qmp_cmd_in_band queue-blocks-2
    16585@1542650042.982656:handle_qmp_command mon 0x5597b8e0de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-3", "driver": "blkdebug", "config": "/tmp/qmp-test-8YRbOJ/fifo"}}
    16585@1542650042.983406:handle_qmp_command mon 0x5597b8e0de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-4", "driver": "blkdebug", "config": "/tmp/qmp-test-8YRbOJ/fifo"}}
    16585@1542650042.984170:handle_qmp_command mon 0x5597b8e0de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-5", "driver": "blkdebug", "config": "/tmp/qmp-test-8YRbOJ/fifo"}}
    16585@1542650042.984975:handle_qmp_command mon 0x5597b8e0de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-6", "driver": "blkdebug", "config": "/tmp/qmp-test-8YRbOJ/fifo"}}
    16585@1542650042.985778:handle_qmp_command mon 0x5597b8e0de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-7", "driver": "blkdebug", "config": "/tmp/qmp-test-8YRbOJ/fifo"}}
    16585@1542650042.986565:handle_qmp_command mon 0x5597b8e0de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-8", "driver": "blkdebug", "config": "/tmp/qmp-test-8YRbOJ/fifo"}}
    16585@1542650042.987396:handle_qmp_command mon 0x5597b8e0de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-9", "driver": "blkdebug", "config": "/tmp/qmp-test-8YRbOJ/fifo"}}
    16585@1542650042.987626:handle_qmp_command mon 0x5597b8e0de80 req: {"exec-oob": "migrate-pause"}
    16585@1542650042.987630:monitor_qmp_cmd_out_of_band oob-1
    16585@1542650042.987676:monitor_qmp_respond oob-1
    16585@1542650042.988035:monitor_qmp_respond queue-blocks-2
    16585@1542650042.988066:monitor_qmp_cmd_in_band queue-blocks-3
    16585@1542650042.988329:monitor_qmp_respond queue-blocks-3
    16585@1542650042.988360:monitor_qmp_cmd_in_band queue-blocks-4
    16585@1542650042.988575:monitor_qmp_respond queue-blocks-4
    16585@1542650042.988616:monitor_qmp_cmd_in_band queue-blocks-5
    16585@1542650042.988803:monitor_qmp_respond queue-blocks-5
    16585@1542650042.988826:monitor_qmp_cmd_in_band queue-blocks-6
    16585@1542650042.988977:monitor_qmp_respond queue-blocks-6
    16585@1542650042.989001:monitor_qmp_cmd_in_band queue-blocks-7
    16585@1542650042.989223:monitor_qmp_respond queue-blocks-7
    16585@1542650042.989279:monitor_qmp_cmd_in_band queue-blocks-8
    16585@1542650042.989447:monitor_qmp_respond queue-blocks-8
    16585@1542650042.989469:monitor_qmp_cmd_in_band queue-blocks-9
    16585@1542650042.989582:monitor_qmp_respond queue-blocks-9

Tracepoint monitor_suspend still not hit.  This means the queue does not
fill up.

> Markus, do you want me to repost a new version with this change?  Is
> it still possible to have the oob-default series for 3.1?

We're trying to, but we need to get the test to work.

Two problems:

1. The test doesn't seem to succeed at testing "queue full".

2. The test might still be racy.

I suspect the code we test is solid, and it's "only" the test we screwed
up.


diff --git a/monitor.c b/monitor.c
index 431795d686..52112889ef 100644
--- a/monitor.c
+++ b/monitor.c
@@ -4064,6 +4064,7 @@ static int monitor_can_read(void *opaque)
  */
 static void monitor_qmp_respond(Monitor *mon, QDict *rsp, QObject *id)
 {
+    trace_monitor_qmp_respond(qobject_get_try_str(id) ?: "");
     if (rsp) {
         if (id) {
             qdict_put_obj(rsp, "id", qobject_ref(id));
diff --git a/trace-events b/trace-events
index 4fd2cb4b97..57c5f6cc94 100644
--- a/trace-events
+++ b/trace-events
@@ -50,6 +50,7 @@ handle_qmp_command(void *mon, const char *req) "mon %p req: %s"
 monitor_suspend(void *ptr, int cnt) "mon %p: %d"
 monitor_qmp_cmd_in_band(const char *id) "%s"
 monitor_qmp_cmd_out_of_band(const char *id) "%s"
+monitor_qmp_respond(const char *id) "%s"
 
 # dma-helpers.c
 dma_blk_io(void *dbs, void *bs, int64_t offset, bool to_dev) "dbs=%p bs=%p offset=%" PRId64 " to_dev=%d"

Re: [Qemu-devel] [PULL 0/8] Monitor patches for 2018-10-30
Posted by Peter Xu 6 years, 11 months ago
On Mon, Nov 19, 2018 at 07:08:13PM +0100, Markus Armbruster wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Mon, Nov 19, 2018 at 02:17:27PM +0800, Peter Xu wrote:
> >> I reproduced the error with a FreeBSD guest and this change (which
> >> possibly can be squashed into "tests: qmp-test: add queue full test")
> >> worked for me:
> >> 
> >> diff --git a/tests/qmp-test.c b/tests/qmp-test.c
> >> index 6989acbca4..83f353db4f 100644
> >> --- a/tests/qmp-test.c
> >> +++ b/tests/qmp-test.c
> >> @@ -281,8 +281,15 @@ static void test_qmp_oob(void)
> >>       * will only be able to be handled after the queue is shrinked, so
> >>       * it'll be processed only after one existing in-band command
> >>       * finishes.
> >> +     *
> >> +     * NOTE: we need to feed the queue with one extra request to make
> >> +     * sure it'll be stuck since when we have sent the Nth request
> >> +     * it's possible that we have already popped and processing the
> >> +     * 1st request so the Nth request (which could potentially be the
> >> +     * [N-1]th element on the queue) might not trigger the
> >> +     * monitor-full condition deterministically.
> >>       */
> >> -    for (i = 1; i <= QMP_REQ_QUEUE_LEN_MAX; i++) {
> >> +    for (i = 1; i <= QMP_REQ_QUEUE_LEN_MAX + 1; i++) {
> >>          id = g_strdup_printf("queue-blocks-%d", i);
> >>          send_cmd_that_blocks(qts, id);
> >>          g_free(id);
> >> @@ -291,7 +298,7 @@ static void test_qmp_oob(void)
> >>      unblock_blocked_cmd();
> >>      recv_cmd_id(qts, "queue-blocks-1");
> >>      recv_cmd_id(qts, "oob-1");
> >> -    for (i = 2; i <= QMP_REQ_QUEUE_LEN_MAX; i++) {
> >> +    for (i = 2; i <= QMP_REQ_QUEUE_LEN_MAX + 1; i++) {
> >>          unblock_blocked_cmd();
> >>          id = g_strdup_printf("queue-blocks-%d", i);
> >>          recv_cmd_id(qts, id);
> >> 
> >> So the problem here is that the queue-block-N command might not really
> >> suspend the monitor everytime if we already popped the 1st request,
> >> which will let the N-th request to be (N-1)th, then the parser will
> >> continue to eat the oob command and it could "preempt" the previous
> >> commands.
> >> 
> >> Maybe FreeBSD is scheduling the threads in some pattern so it happens
> >> only on FreeBSD and very constantly, but anyway it should be a general
> >> fix to the test program.
> 
> I feel particularly dense right now, and need a more detailed analysis
> to understand.  Let me try.
> 
> Here's what the test does before your fix
> 
>     Send QMP_REQ_QUEUE_LEN_MAX "blocking" in-band commands
>     queue-blocks-1, queue-blocks-2, ...  "Blocking" means the command
>     blocks until we explicitly unblock it.
> 
>     Send an out-of-band command oob-1.
> 
>     Unblock queue-blocks-1.
> 
>     Receive response to queue-blocks-1.
> 
>     Receive response to oob-1.
> 
>     Unblock queue-blocks-2, receive response
>     ...
> 
> Here's what test test expects QEMU to do:
> 
>     In I/O thread:
> 
>         Receive and queue queue-blocks-1, ...
> 
>         Queue is full, stop receiving commands.
> 
>     In main thread:
> 
>         Dequeue queue-blocks-1.
> 
>         Execute queue-blocks-1, block for a while, finish and send
>         response.
> 
>     In I/O thread:
> 
>         Queue is no longer full, resume receiving commands.
> 
>         Receive and execute oob-1, send response
> 
>     In main thread:
> 
>         Dequeue queue-blocks-2, execute, block for a while, finish and
>         send response.
>         ...
> 
> No good because the two threads race.  On my Linux box the test passes.
> Let's have a closer look how it does that.  The appended patch that adds
> another tracepoint.  With tracepoints monitor_* and handle_qmp_command
> enabled, I get:
> 
>     19271@1542641581.372890:handle_qmp_command mon 0x563e19be4e80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-1", "driver": "blkdebug", "config": "/tmp/qmp-test-EZroW3/fifo"}}
> 
> I/O thread receives and queues queue-blocks-1.
> 
>     19271@1542641581.372954:monitor_qmp_cmd_in_band queue-blocks-1
> 
> Main thread dequeues qemu-blocks-1 and starts executing it.
> 
> We already lost: the queue won't actually fill up.  The test passes
> anyway, but it doesn't actually test "queue full".  But why does it
> pass?  Let's see.
> 
>     19271@1542641581.373211:monitor_qmp_respond queue-blocks-1
> 
> Main thread sends response to queue-blocks-1.  This means queue-blocks-1
> has been unblocked already.  This is possible, because qmp-test sends
> all the commands and then immediately unblocks queue-blocks-1, and the
> main thread can win the race with the I/O thread.
> 
>     19271@1542641581.373826:handle_qmp_command mon 0x563e19be4e80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-2", "driver": "blkdebug", "config": "/tmp/qmp-test-EZroW3/fifo"}}
>     19271@1542641581.373864:monitor_qmp_cmd_in_band queue-blocks-2
>     19271@1542641581.374575:handle_qmp_command mon 0x563e19be4e80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-3", "driver": "blkdebug", "config": "/tmp/qmp-test-EZroW3/fifo"}}
>     19271@1542641581.375344:handle_qmp_command mon 0x563e19be4e80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-4", "driver": "blkdebug", "config": "/tmp/qmp-test-EZroW3/fifo"}}
>     19271@1542641581.376162:handle_qmp_command mon 0x563e19be4e80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-5", "driver": "blkdebug", "config": "/tmp/qmp-test-EZroW3/fifo"}}
>     19271@1542641581.376996:handle_qmp_command mon 0x563e19be4e80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-6", "driver": "blkdebug", "config": "/tmp/qmp-test-EZroW3/fifo"}}
>     19271@1542641581.377790:handle_qmp_command mon 0x563e19be4e80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-7", "driver": "blkdebug", "config": "/tmp/qmp-test-EZroW3/fifo"}}
>     19271@1542641581.378562:handle_qmp_command mon 0x563e19be4e80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-8", "driver": "blkdebug", "config": "/tmp/qmp-test-EZroW3/fifo"}}
> 
> I/O thread receives and queues queue-blocks-2, ... queue-blocks-8.
> 
>     19271@1542641581.378796:handle_qmp_command mon 0x563e19be4e80 req: {"exec-oob": "migrate-pause"}
>     19271@1542641581.378800:monitor_qmp_cmd_out_of_band oob-1
>     19271@1542641581.378823:monitor_qmp_respond oob-1
> 
> I/O thread receives, executes and responds to oob-1.
> 
>     19271@1542641581.379197:monitor_qmp_respond queue-blocks-2
>     19271@1542641581.379228:monitor_qmp_cmd_in_band queue-blocks-3
>     19271@1542641581.379435:monitor_qmp_respond queue-blocks-3
>     19271@1542641581.379477:monitor_qmp_cmd_in_band queue-blocks-4
>     19271@1542641581.379597:monitor_qmp_respond queue-blocks-4
>     19271@1542641581.379621:monitor_qmp_cmd_in_band queue-blocks-5
>     19271@1542641581.379800:monitor_qmp_respond queue-blocks-5
>     19271@1542641581.379824:monitor_qmp_cmd_in_band queue-blocks-6
>     19271@1542641581.379953:monitor_qmp_respond queue-blocks-6
>     19271@1542641581.379992:monitor_qmp_cmd_in_band queue-blocks-7
>     19271@1542641581.380132:monitor_qmp_respond queue-blocks-7
>     19271@1542641581.380170:monitor_qmp_cmd_in_band queue-blocks-8
>     19271@1542641581.380283:monitor_qmp_respond queue-blocks-8
> 
> Main thread dequeues, executes and responds to queue-blocks-2,
> ... queue-blocks-8.
> 
> If I make the main thread lose the race by sticking a sleep(1) before
> the unblock_blocked_cmd() that unblocks queue-blocks-1, I get:
> 
>     22762@1542641707.818838:handle_qmp_command mon 0x55f5b6a5de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-1", "driver": "blkdebug", "config": "/tmp/qmp-test-PqO2uz/fifo"}}
> 
> I/O thread receives and queues queue-blocks-1.
> 
>     22762@1542641707.818856:monitor_qmp_cmd_in_band queue-blocks-1
> 
> Main thread dequeues qemu-blocks-1 and starts executing it.  We already
> lost.
> 
>     22762@1542641707.819585:handle_qmp_command mon 0x55f5b6a5de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-2", "driver": "blkdebug", "config": "/tmp/qmp-test-PqO2uz/fifo"}}
>     22762@1542641707.820331:handle_qmp_command mon 0x55f5b6a5de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-3", "driver": "blkdebug", "config": "/tmp/qmp-test-PqO2uz/fifo"}}
>     22762@1542641707.821072:handle_qmp_command mon 0x55f5b6a5de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-4", "driver": "blkdebug", "config": "/tmp/qmp-test-PqO2uz/fifo"}}
>     22762@1542641707.821821:handle_qmp_command mon 0x55f5b6a5de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-5", "driver": "blkdebug", "config": "/tmp/qmp-test-PqO2uz/fifo"}}
>     22762@1542641707.822569:handle_qmp_command mon 0x55f5b6a5de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-6", "driver": "blkdebug", "config": "/tmp/qmp-test-PqO2uz/fifo"}}
>     22762@1542641707.823314:handle_qmp_command mon 0x55f5b6a5de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-7", "driver": "blkdebug", "config": "/tmp/qmp-test-PqO2uz/fifo"}}
>     22762@1542641707.824069:handle_qmp_command mon 0x55f5b6a5de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-8", "driver": "blkdebug", "config": "/tmp/qmp-test-PqO2uz/fifo"}}
> 
> I/O thread receives and queues queue-blocks-2, ... queue-blocks-8.
> 
>     22762@1542641707.824253:handle_qmp_command mon 0x55f5b6a5de80 req: {"exec-oob": "migrate-pause"}
>     22762@1542641707.824257:monitor_qmp_cmd_out_of_band oob-1
>     22762@1542641707.824282:monitor_qmp_respond oob-1
> 
> I/O thread receives, executes and responds to oob-1.
> 
> Test is going to fail: response to oob-1 sent before response to
> queue-blocks-1.
> 
> Now let me try your fix.  Here's what I see:
> 
>     16585@1542650042.980953:handle_qmp_command mon 0x5597b8e0de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-1", "driver": "blkdebug", "config": "/tmp/qmp-test-8YRbOJ/fifo"}}
>     16585@1542650042.981018:monitor_qmp_cmd_in_band queue-blocks-1
>     16585@1542650042.981252:monitor_qmp_respond queue-blocks-1
>     16585@1542650042.981838:handle_qmp_command mon 0x5597b8e0de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-2", "driver": "blkdebug", "config": "/tmp/qmp-test-8YRbOJ/fifo"}}
>     16585@1542650042.981916:monitor_qmp_cmd_in_band queue-blocks-2
>     16585@1542650042.982656:handle_qmp_command mon 0x5597b8e0de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-3", "driver": "blkdebug", "config": "/tmp/qmp-test-8YRbOJ/fifo"}}
>     16585@1542650042.983406:handle_qmp_command mon 0x5597b8e0de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-4", "driver": "blkdebug", "config": "/tmp/qmp-test-8YRbOJ/fifo"}}
>     16585@1542650042.984170:handle_qmp_command mon 0x5597b8e0de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-5", "driver": "blkdebug", "config": "/tmp/qmp-test-8YRbOJ/fifo"}}
>     16585@1542650042.984975:handle_qmp_command mon 0x5597b8e0de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-6", "driver": "blkdebug", "config": "/tmp/qmp-test-8YRbOJ/fifo"}}
>     16585@1542650042.985778:handle_qmp_command mon 0x5597b8e0de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-7", "driver": "blkdebug", "config": "/tmp/qmp-test-8YRbOJ/fifo"}}
>     16585@1542650042.986565:handle_qmp_command mon 0x5597b8e0de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-8", "driver": "blkdebug", "config": "/tmp/qmp-test-8YRbOJ/fifo"}}
>     16585@1542650042.987396:handle_qmp_command mon 0x5597b8e0de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-9", "driver": "blkdebug", "config": "/tmp/qmp-test-8YRbOJ/fifo"}}
>     16585@1542650042.987626:handle_qmp_command mon 0x5597b8e0de80 req: {"exec-oob": "migrate-pause"}
>     16585@1542650042.987630:monitor_qmp_cmd_out_of_band oob-1
>     16585@1542650042.987676:monitor_qmp_respond oob-1
>     16585@1542650042.988035:monitor_qmp_respond queue-blocks-2
>     16585@1542650042.988066:monitor_qmp_cmd_in_band queue-blocks-3
>     16585@1542650042.988329:monitor_qmp_respond queue-blocks-3
>     16585@1542650042.988360:monitor_qmp_cmd_in_band queue-blocks-4
>     16585@1542650042.988575:monitor_qmp_respond queue-blocks-4
>     16585@1542650042.988616:monitor_qmp_cmd_in_band queue-blocks-5
>     16585@1542650042.988803:monitor_qmp_respond queue-blocks-5
>     16585@1542650042.988826:monitor_qmp_cmd_in_band queue-blocks-6
>     16585@1542650042.988977:monitor_qmp_respond queue-blocks-6
>     16585@1542650042.989001:monitor_qmp_cmd_in_band queue-blocks-7
>     16585@1542650042.989223:monitor_qmp_respond queue-blocks-7
>     16585@1542650042.989279:monitor_qmp_cmd_in_band queue-blocks-8
>     16585@1542650042.989447:monitor_qmp_respond queue-blocks-8
>     16585@1542650042.989469:monitor_qmp_cmd_in_band queue-blocks-9
>     16585@1542650042.989582:monitor_qmp_respond queue-blocks-9
> 
> Tracepoint monitor_suspend still not hit.  This means the queue does not
> fill up.

You are right.  I think I figured out one thing but missed another point
here: "sent the command" does not mean "received by the QMP server".  The
client can never tell when the queue is full, yet that is exactly what we
need to know in order to write a unit test for it.

For example: in this test what we want to achieve is:

  for i in 1 .. N:
      send blocking commands Bi
  ... wait until queue is full ...  [1]
  send oob command O1
  unblock blocking command B1
  receive response B1
  receive response O1
  for i in 2 .. N:
      unblock blocking command Bi
      receive response Bi

All of this looks sane except the assumption at [1].  The code doesn't do
anything for [1], assuming it happens naturally.  It doesn't: when the
first for loop finishes, we have only queued the commands into the send
buffer, and we can't guarantee that they have been received and queued in
the QMP request list on the server side.  When the server's request queue
does fill up, all that happens is that the monitor is suspended, and that
event is completely invisible to the client, so IMHO we don't have a good
way to guarantee [1].  E.g., we could do a very long sleep(100) at [1] to
make it very likely that the queue fills up, but even that might still
break on a super slow machine.
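
For illustration only, that workaround would amount to something like this
at [1] (a sketch, not a real fix, since a slow enough machine can still
miss the window):

    for (i = 1; i <= QMP_REQ_QUEUE_LEN_MAX; i++) {
        id = g_strdup_printf("queue-blocks-%d", i);
        send_cmd_that_blocks(qts, id);
        g_free(id);
    }
    /* [1]: hope the server-side request queue has filled up by now;
     * the monitor suspend is invisible to us, so this is only a guess. */
    sleep(100);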

In other words, I don't see a good way to automate the queue-full test for
now (but I'm pretty sure queue full itself works, since if you really
insert a sleep you'll definitely see the monitor_suspend tracepoint).  We
might need something like an event so the client can tell that the queue
has filled up and only then send the next out-of-band command, but that
feels like going backwards, and I don't see much point in introducing a
new event only for testing purposes...

> 
> > Markus, do you want me to repost a new version with this change?  Is
> > it still possible to have the oob-default series for 3.1?
> 
> We're trying to, but we need to get the test to work.
> 
> Two problems:
> 
> 1. The test doesn't seem to succed at testing "queue full".
> 
> 2. The test might still be racy.
> 
> I suspect the code we test is solid, and it's "only" the test we screwed
> up.

I agree.  Do you have a better idea for testing queue full?  I'd be more
than glad to try any suggestions here; otherwise I'm thinking we could
simply drop the queue-full test (the last patch of the oob series) for now
and merge the rest.  We can introduce a new queue-full test after 3.1 if
we want, though again I really doubt we can do that without introducing
something new into the procedure.

Thoughts?

> 
> 
> diff --git a/monitor.c b/monitor.c
> index 431795d686..52112889ef 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -4064,6 +4064,7 @@ static int monitor_can_read(void *opaque)
>   */
>  static void monitor_qmp_respond(Monitor *mon, QDict *rsp, QObject *id)
>  {
> +    trace_monitor_qmp_respond(qobject_get_try_str(id) ?: "");
>      if (rsp) {
>          if (id) {
>              qdict_put_obj(rsp, "id", qobject_ref(id));
> diff --git a/trace-events b/trace-events
> index 4fd2cb4b97..57c5f6cc94 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -50,6 +50,7 @@ handle_qmp_command(void *mon, const char *req) "mon %p req: %s"
>  monitor_suspend(void *ptr, int cnt) "mon %p: %d"
>  monitor_qmp_cmd_in_band(const char *id) "%s"
>  monitor_qmp_cmd_out_of_band(const char *id) "%s"
> +monitor_qmp_respond(const char *id) "%s"
>  
>  # dma-helpers.c
>  dma_blk_io(void *dbs, void *bs, int64_t offset, bool to_dev) "dbs=%p bs=%p offset=%" PRId64 " to_dev=%d"

Regards,

-- 
Peter Xu

Re: [Qemu-devel] [PULL 0/8] Monitor patches for 2018-10-30
Posted by Markus Armbruster 6 years, 11 months ago
Peter Xu <peterx@redhat.com> writes:

> On Mon, Nov 19, 2018 at 07:08:13PM +0100, Markus Armbruster wrote:
>> Peter Xu <peterx@redhat.com> writes:
>> 
>> > On Mon, Nov 19, 2018 at 02:17:27PM +0800, Peter Xu wrote:
>> >> I reproduced the error with a FreeBSD guest and this change (which
>> >> possibly can be squashed into "tests: qmp-test: add queue full test")
>> >> worked for me:
>> >> 
>> >> diff --git a/tests/qmp-test.c b/tests/qmp-test.c
>> >> index 6989acbca4..83f353db4f 100644
>> >> --- a/tests/qmp-test.c
>> >> +++ b/tests/qmp-test.c
>> >> @@ -281,8 +281,15 @@ static void test_qmp_oob(void)
>> >>       * will only be able to be handled after the queue is shrinked, so
>> >>       * it'll be processed only after one existing in-band command
>> >>       * finishes.
>> >> +     *
>> >> +     * NOTE: we need to feed the queue with one extra request to make
>> >> +     * sure it'll be stuck since when we have sent the Nth request
>> >> +     * it's possible that we have already popped and processing the
>> >> +     * 1st request so the Nth request (which could potentially be the
>> >> +     * [N-1]th element on the queue) might not trigger the
>> >> +     * monitor-full condition deterministically.
>> >>       */
>> >> -    for (i = 1; i <= QMP_REQ_QUEUE_LEN_MAX; i++) {
>> >> +    for (i = 1; i <= QMP_REQ_QUEUE_LEN_MAX + 1; i++) {
>> >>          id = g_strdup_printf("queue-blocks-%d", i);
>> >>          send_cmd_that_blocks(qts, id);
>> >>          g_free(id);
>> >> @@ -291,7 +298,7 @@ static void test_qmp_oob(void)
>> >>      unblock_blocked_cmd();
>> >>      recv_cmd_id(qts, "queue-blocks-1");
>> >>      recv_cmd_id(qts, "oob-1");
>> >> -    for (i = 2; i <= QMP_REQ_QUEUE_LEN_MAX; i++) {
>> >> +    for (i = 2; i <= QMP_REQ_QUEUE_LEN_MAX + 1; i++) {
>> >>          unblock_blocked_cmd();
>> >>          id = g_strdup_printf("queue-blocks-%d", i);
>> >>          recv_cmd_id(qts, id);
>> >> 
>> >> So the problem here is that the queue-block-N command might not really
>> >> suspend the monitor everytime if we already popped the 1st request,
>> >> which will let the N-th request to be (N-1)th, then the parser will
>> >> continue to eat the oob command and it could "preempt" the previous
>> >> commands.
>> >> 
>> >> Maybe FreeBSD is scheduling the threads in some pattern so it happens
>> >> only on FreeBSD and very constantly, but anyway it should be a general
>> >> fix to the test program.
>> 
>> I feel particularly dense right now, and need a more detailed analysis
>> to understand.  Let me try.
>> 
>> Here's what the test does before your fix
>> 
>>     Send QMP_REQ_QUEUE_LEN_MAX "blocking" in-band commands
>>     queue-blocks-1, queue-blocks-2, ...  "Blocking" means the command
>>     blocks until we explicitly unblock it.
>> 
>>     Send an out-of-band command oob-1.
>> 
>>     Unblock queue-blocks-1.
>> 
>>     Receive response to queue-blocks-1.
>> 
>>     Receive response to oob-1.
>> 
>>     Unblock queue-blocks-2, receive response
>>     ...
>> 
>> Here's what test test expects QEMU to do:
>> 
>>     In I/O thread:
>> 
>>         Receive and queue queue-blocks-1, ...
>> 
>>         Queue is full, stop receiving commands.
>> 
>>     In main thread:
>> 
>>         Dequeue queue-blocks-1.
>> 
>>         Execute queue-blocks-1, block for a while, finish and send
>>         response.
>> 
>>     In I/O thread:
>> 
>>         Queue is no longer full, resume receiving commands.
>> 
>>         Receive and execute oob-1, send response
>> 
>>     In main thread:
>> 
>>         Dequeue queue-blocks-2, execute, block for a while, finish and
>>         send response.
>>         ...
>> 
>> No good because the two threads race.  On my Linux box the test passes.
>> Let's have a closer look how it does that.  The appended patch that adds
>> another tracepoint.  With tracepoints monitor_* and handle_qmp_command
>> enabled, I get:
>> 
>>     19271@1542641581.372890:handle_qmp_command mon 0x563e19be4e80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-1", "driver": "blkdebug", "config": "/tmp/qmp-test-EZroW3/fifo"}}
>> 
>> I/O thread receives and queues queue-blocks-1.
>> 
>>     19271@1542641581.372954:monitor_qmp_cmd_in_band queue-blocks-1
>> 
>> Main thread dequeues qemu-blocks-1 and starts executing it.
>> 
>> We already lost: the queue won't actually fill up.  The test passes
>> anyway, but it doesn't actually test "queue full".  But why does it
>> pass?  Let's see.
>> 
>>     19271@1542641581.373211:monitor_qmp_respond queue-blocks-1
>> 
>> Main thread sends response to queue-blocks-1.  This means queue-blocks-1
>> has been unblocked already.  This is possible, because qmp-test sends
>> all the commands and then immediately unblocks queue-blocks-1, and the
>> main thread can win the race with the I/O thread.
>> 
>>     19271@1542641581.373826:handle_qmp_command mon 0x563e19be4e80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-2", "driver": "blkdebug", "config": "/tmp/qmp-test-EZroW3/fifo"}}
>>     19271@1542641581.373864:monitor_qmp_cmd_in_band queue-blocks-2
>>     19271@1542641581.374575:handle_qmp_command mon 0x563e19be4e80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-3", "driver": "blkdebug", "config": "/tmp/qmp-test-EZroW3/fifo"}}
>>     19271@1542641581.375344:handle_qmp_command mon 0x563e19be4e80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-4", "driver": "blkdebug", "config": "/tmp/qmp-test-EZroW3/fifo"}}
>>     19271@1542641581.376162:handle_qmp_command mon 0x563e19be4e80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-5", "driver": "blkdebug", "config": "/tmp/qmp-test-EZroW3/fifo"}}
>>     19271@1542641581.376996:handle_qmp_command mon 0x563e19be4e80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-6", "driver": "blkdebug", "config": "/tmp/qmp-test-EZroW3/fifo"}}
>>     19271@1542641581.377790:handle_qmp_command mon 0x563e19be4e80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-7", "driver": "blkdebug", "config": "/tmp/qmp-test-EZroW3/fifo"}}
>>     19271@1542641581.378562:handle_qmp_command mon 0x563e19be4e80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-8", "driver": "blkdebug", "config": "/tmp/qmp-test-EZroW3/fifo"}}
>> 
>> I/O thread receives and queues queue-blocks-2, ... queue-blocks-8.
>> 
>>     19271@1542641581.378796:handle_qmp_command mon 0x563e19be4e80 req: {"exec-oob": "migrate-pause"}
>>     19271@1542641581.378800:monitor_qmp_cmd_out_of_band oob-1
>>     19271@1542641581.378823:monitor_qmp_respond oob-1
>> 
>> I/O thread receives, executes and responds to oob-1.
>> 
>>     19271@1542641581.379197:monitor_qmp_respond queue-blocks-2
>>     19271@1542641581.379228:monitor_qmp_cmd_in_band queue-blocks-3
>>     19271@1542641581.379435:monitor_qmp_respond queue-blocks-3
>>     19271@1542641581.379477:monitor_qmp_cmd_in_band queue-blocks-4
>>     19271@1542641581.379597:monitor_qmp_respond queue-blocks-4
>>     19271@1542641581.379621:monitor_qmp_cmd_in_band queue-blocks-5
>>     19271@1542641581.379800:monitor_qmp_respond queue-blocks-5
>>     19271@1542641581.379824:monitor_qmp_cmd_in_band queue-blocks-6
>>     19271@1542641581.379953:monitor_qmp_respond queue-blocks-6
>>     19271@1542641581.379992:monitor_qmp_cmd_in_band queue-blocks-7
>>     19271@1542641581.380132:monitor_qmp_respond queue-blocks-7
>>     19271@1542641581.380170:monitor_qmp_cmd_in_band queue-blocks-8
>>     19271@1542641581.380283:monitor_qmp_respond queue-blocks-8
>> 
>> Main thread dequeues, executes and responds to queue-blocks-2,
>> ... queue-blocks-8.
>> 
>> If I make the main thread lose the race by sticking a sleep(1) before
>> the unblock_blocked_cmd() that unblocks queue-blocks-1, I get:
>> 
>>     22762@1542641707.818838:handle_qmp_command mon 0x55f5b6a5de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-1", "driver": "blkdebug", "config": "/tmp/qmp-test-PqO2uz/fifo"}}
>> 
>> I/O thread receives and queues queue-blocks-1.
>> 
>>     22762@1542641707.818856:monitor_qmp_cmd_in_band queue-blocks-1
>> 
>> Main thread dequeues queue-blocks-1 and starts executing it.  We already
>> lost.
>> 
>>     22762@1542641707.819585:handle_qmp_command mon 0x55f5b6a5de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-2", "driver": "blkdebug", "config": "/tmp/qmp-test-PqO2uz/fifo"}}
>>     22762@1542641707.820331:handle_qmp_command mon 0x55f5b6a5de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-3", "driver": "blkdebug", "config": "/tmp/qmp-test-PqO2uz/fifo"}}
>>     22762@1542641707.821072:handle_qmp_command mon 0x55f5b6a5de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-4", "driver": "blkdebug", "config": "/tmp/qmp-test-PqO2uz/fifo"}}
>>     22762@1542641707.821821:handle_qmp_command mon 0x55f5b6a5de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-5", "driver": "blkdebug", "config": "/tmp/qmp-test-PqO2uz/fifo"}}
>>     22762@1542641707.822569:handle_qmp_command mon 0x55f5b6a5de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-6", "driver": "blkdebug", "config": "/tmp/qmp-test-PqO2uz/fifo"}}
>>     22762@1542641707.823314:handle_qmp_command mon 0x55f5b6a5de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-7", "driver": "blkdebug", "config": "/tmp/qmp-test-PqO2uz/fifo"}}
>>     22762@1542641707.824069:handle_qmp_command mon 0x55f5b6a5de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-8", "driver": "blkdebug", "config": "/tmp/qmp-test-PqO2uz/fifo"}}
>> 
>> I/O thread receives and queues queue-blocks-2, ... queue-blocks-8.
>> 
>>     22762@1542641707.824253:handle_qmp_command mon 0x55f5b6a5de80 req: {"exec-oob": "migrate-pause"}
>>     22762@1542641707.824257:monitor_qmp_cmd_out_of_band oob-1
>>     22762@1542641707.824282:monitor_qmp_respond oob-1
>> 
>> I/O thread receives, executes and responds to oob-1.
>> 
>> Test is going to fail: response to oob-1 sent before response to
>> queue-blocks-1.
>> 
>> Now let me try your fix.  Here's what I see:
>> 
>>     16585@1542650042.980953:handle_qmp_command mon 0x5597b8e0de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-1", "driver": "blkdebug", "config": "/tmp/qmp-test-8YRbOJ/fifo"}}
>>     16585@1542650042.981018:monitor_qmp_cmd_in_band queue-blocks-1
>>     16585@1542650042.981252:monitor_qmp_respond queue-blocks-1
>>     16585@1542650042.981838:handle_qmp_command mon 0x5597b8e0de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-2", "driver": "blkdebug", "config": "/tmp/qmp-test-8YRbOJ/fifo"}}
>>     16585@1542650042.981916:monitor_qmp_cmd_in_band queue-blocks-2
>>     16585@1542650042.982656:handle_qmp_command mon 0x5597b8e0de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-3", "driver": "blkdebug", "config": "/tmp/qmp-test-8YRbOJ/fifo"}}
>>     16585@1542650042.983406:handle_qmp_command mon 0x5597b8e0de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-4", "driver": "blkdebug", "config": "/tmp/qmp-test-8YRbOJ/fifo"}}
>>     16585@1542650042.984170:handle_qmp_command mon 0x5597b8e0de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-5", "driver": "blkdebug", "config": "/tmp/qmp-test-8YRbOJ/fifo"}}
>>     16585@1542650042.984975:handle_qmp_command mon 0x5597b8e0de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-6", "driver": "blkdebug", "config": "/tmp/qmp-test-8YRbOJ/fifo"}}
>>     16585@1542650042.985778:handle_qmp_command mon 0x5597b8e0de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-7", "driver": "blkdebug", "config": "/tmp/qmp-test-8YRbOJ/fifo"}}
>>     16585@1542650042.986565:handle_qmp_command mon 0x5597b8e0de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-8", "driver": "blkdebug", "config": "/tmp/qmp-test-8YRbOJ/fifo"}}
>>     16585@1542650042.987396:handle_qmp_command mon 0x5597b8e0de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-9", "driver": "blkdebug", "config": "/tmp/qmp-test-8YRbOJ/fifo"}}
>>     16585@1542650042.987626:handle_qmp_command mon 0x5597b8e0de80 req: {"exec-oob": "migrate-pause"}
>>     16585@1542650042.987630:monitor_qmp_cmd_out_of_band oob-1
>>     16585@1542650042.987676:monitor_qmp_respond oob-1
>>     16585@1542650042.988035:monitor_qmp_respond queue-blocks-2
>>     16585@1542650042.988066:monitor_qmp_cmd_in_band queue-blocks-3
>>     16585@1542650042.988329:monitor_qmp_respond queue-blocks-3
>>     16585@1542650042.988360:monitor_qmp_cmd_in_band queue-blocks-4
>>     16585@1542650042.988575:monitor_qmp_respond queue-blocks-4
>>     16585@1542650042.988616:monitor_qmp_cmd_in_band queue-blocks-5
>>     16585@1542650042.988803:monitor_qmp_respond queue-blocks-5
>>     16585@1542650042.988826:monitor_qmp_cmd_in_band queue-blocks-6
>>     16585@1542650042.988977:monitor_qmp_respond queue-blocks-6
>>     16585@1542650042.989001:monitor_qmp_cmd_in_band queue-blocks-7
>>     16585@1542650042.989223:monitor_qmp_respond queue-blocks-7
>>     16585@1542650042.989279:monitor_qmp_cmd_in_band queue-blocks-8
>>     16585@1542650042.989447:monitor_qmp_respond queue-blocks-8
>>     16585@1542650042.989469:monitor_qmp_cmd_in_band queue-blocks-9
>>     16585@1542650042.989582:monitor_qmp_respond queue-blocks-9
>> 
>> Tracepoint monitor_suspend still not hit.  This means the queue does not
>> fill up.
>
> You are right.  I think I figured out one thing but missed another
> point here: "sent the command" does not mean "received by the QMP
> server".  We can never tell from the client side when the queue is
> full, yet that is exactly what we need to know to write a unit test
> for it.
>
> For example: in this test what we want to achieve is:
>
>   for i in 1 .. N:
>       send blocking commands Bi
>   ... wait until queue is full ...  [1]
>   send oob command O1
>   unblock blocking command B1
>   receive response B1
>   receive response O1
>   for i in 2 .. N:
>       unblock blocking command Bi
>       receive response Bi
>
> All looks sane except the assumption at [1].  In the code we didn't do
> anything for [1], assuming it happens naturally.  It doesn't: when the
> first for loop finishes, we have only queued the commands into the
> send buffer, and we can't guarantee that they have been received and
> queued in the QMP request list on the server side.  When the server's
> QMP request queue does fill up, it merely suspends the monitor, and
> that event is completely invisible from the client's POV, so IMHO we
> don't have a good way to guarantee this.  E.g., we could do a very
> long sleep(100) at [1] to make it very likely that the queue has
> filled up, but that might still break on a super slow machine.
>
> In other words, I don't see a good way to automate the testing of
> queue full for now (though I'm pretty sure queue full itself works,
> since if you do insert a sleep you'll definitely see the
> monitor_suspend tracepoint).  We might need something like an event
> to signal that the queue is full, so the client can detect it and
> insert the next out-of-band command, but then we're going backward,
> and I don't see much point in introducing a new event only for
> testing purposes...

We want the test to drive QEMU into the "queue full" state with an
out-of-band command unreceived, then watch the queue drain and the
out-of-band command overtaking part of it.

Driving QEMU into this state is easy enough: we can use a blocking
in-band command to plug the queue's drain.
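
For reference, the "blocking" in-band command the test already uses is
the blockdev-add visible in the traces above (reformatted here for
readability); blkdebug's config file is a fifo, so the command stays
blocked until the test services the fifo, which is what
unblock_blocked_cmd() is for:

    {"execute": "blockdev-add",
     "arguments": {"image": {"driver": "null-co"},
                   "node-name": "queue-blocks-1",
                   "driver": "blkdebug",
                   "config": "/tmp/qmp-test-EZroW3/fifo"}}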

Once we have QEMU in this state, we can unblock that command and watch
the show.  But we really have to wait until QEMU reaches this state
before we unblock.

Our problem is the test can't easily observe when QEMU reaches this
state.

You mentioned sending a "queue full" event, and that adding such an
event just for testing is hard to justify.  I agree.  Could such an
event have a non-testing use?  All I can think of is letting the
management application detect "we mismanaged the queue and compromised
monitor availability for out-of-band commands".  Doesn't sound
particularly convincing to me, but we can talk with libvirt guys to see
whether they want it.
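
Purely for illustration, the schema side of such an event would be
trivial; a hypothetical sketch, since nothing like it exists in
qapi/misc.json:

    ##
    # @COMMAND_QUEUE_FULL:
    #
    # Emitted when the monitor's request queue fills up and the monitor
    # stops reading further input.
    ##
    { 'event': 'COMMAND_QUEUE_FULL' }

The schema isn't the hard part; deciding whether anyone needs the
event is.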

Can the test observe the monitor is suspended just from the behavior of
its monitor socket?  Not reliably: sending lots of input eventually
fails with EAGAIN when the monitor is suspended, but it can also fail
when the monitor reads the input too slowly.

Can the test observe the monitor_suspend trace point?  Awkward, because
the trace backends are configurable.  We could have the test monitor the
"simple" backend's output file, skipping the test when "simple" isn't
available.  Hmm.

Here comes the heavy hammer: extend the qtest protocol.  When the
monitor suspends / resumes, send a suitable asynchronous message to the
qtest socket.  Have libqtest use that to track monitor suspend state.
The test can then busy-wait for the state to flip to suspended.
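
On the test side this could boil down to something like the following
sketch (hypothetical: qtest_monitor_suspended() stands in for whatever
accessor libqtest would grow to expose the tracked state, and the
snippet assumes qmp-test.c's usual includes):

    /* Busy-wait until the qtest protocol reports the monitor suspended.
     * Relies on the hypothetical protocol extension described above. */
    static void wait_for_monitor_suspended(QTestState *qts)
    {
        while (!qtest_monitor_suspended(qts)) {
            g_usleep(10 * 1000);    /* poll every 10 ms */
        }
    }

The queue-full test would call this at step [1] above, between sending
the last blocking command and sending oob-1.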

Other ideas?

>> > Markus, do you want me to repost a new version with this change?  Is
>> > it still possible to have the oob-default series for 3.1?
>> 
>> We're trying to, but we need to get the test to work.
>> 
>> Two problems:
>> 
>> 1. The test doesn't seem to succeed at testing "queue full".
>> 
>> 2. The test might still be racy.
>> 
>> I suspect the code we test is solid, and it's "only" the test we screwed
>> up.
>
> I agree.  Do you have a better idea for testing queue full?  I'd be
> more than glad to try any suggestions here; otherwise I am thinking
> whether we can simply drop the queue full test (which is the last
> patch of the oob series) for now and merge the rest.  We can
> introduce a new queue full test after 3.1 if we want, though again I
> really doubt whether we can do that without introducing something new
> into the procedure.
>
> Thoughts?

I'm afraid dropping the test (for now?) is what we have to do.

Re: [Qemu-devel] [PULL 0/8] Monitor patches for 2018-10-30
Posted by Peter Xu 6 years, 11 months ago
On Tue, Nov 20, 2018 at 06:44:27PM +0100, Markus Armbruster wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Mon, Nov 19, 2018 at 07:08:13PM +0100, Markus Armbruster wrote:
> >> Peter Xu <peterx@redhat.com> writes:
> >> 
> >> > On Mon, Nov 19, 2018 at 02:17:27PM +0800, Peter Xu wrote:
> >> >> I reproduced the error with a FreeBSD guest and this change (which
> >> >> possibly can be squashed into "tests: qmp-test: add queue full test")
> >> >> worked for me:
> >> >> 
> >> >> diff --git a/tests/qmp-test.c b/tests/qmp-test.c
> >> >> index 6989acbca4..83f353db4f 100644
> >> >> --- a/tests/qmp-test.c
> >> >> +++ b/tests/qmp-test.c
> >> >> @@ -281,8 +281,15 @@ static void test_qmp_oob(void)
> >> >>       * will only be able to be handled after the queue is shrinked, so
> >> >>       * it'll be processed only after one existing in-band command
> >> >>       * finishes.
> >> >> +     *
> >> >> +     * NOTE: we need to feed the queue with one extra request to make
> >> >> +     * sure it'll be stuck since when we have sent the Nth request
> >> >> +     * it's possible that we have already popped and processing the
> >> >> +     * 1st request so the Nth request (which could potentially be the
> >> >> +     * [N-1]th element on the queue) might not trigger the
> >> >> +     * monitor-full condition deterministically.
> >> >>       */
> >> >> -    for (i = 1; i <= QMP_REQ_QUEUE_LEN_MAX; i++) {
> >> >> +    for (i = 1; i <= QMP_REQ_QUEUE_LEN_MAX + 1; i++) {
> >> >>          id = g_strdup_printf("queue-blocks-%d", i);
> >> >>          send_cmd_that_blocks(qts, id);
> >> >>          g_free(id);
> >> >> @@ -291,7 +298,7 @@ static void test_qmp_oob(void)
> >> >>      unblock_blocked_cmd();
> >> >>      recv_cmd_id(qts, "queue-blocks-1");
> >> >>      recv_cmd_id(qts, "oob-1");
> >> >> -    for (i = 2; i <= QMP_REQ_QUEUE_LEN_MAX; i++) {
> >> >> +    for (i = 2; i <= QMP_REQ_QUEUE_LEN_MAX + 1; i++) {
> >> >>          unblock_blocked_cmd();
> >> >>          id = g_strdup_printf("queue-blocks-%d", i);
> >> >>          recv_cmd_id(qts, id);
> >> >> 
> >> >> So the problem here is that the queue-blocks-N command might not really
> >> >> suspend the monitor every time: if we have already popped the 1st
> >> >> request, the N-th request becomes the (N-1)-th element on the queue,
> >> >> so the parser continues and eats the oob command, which can then
> >> >> "preempt" the previous commands.
> >> >> 
> >> >> Maybe FreeBSD schedules the threads in a pattern that makes this happen
> >> >> only on FreeBSD, and very consistently, but in any case this should be
> >> >> a general fix to the test program.
> >> 
> >> I feel particularly dense right now, and need a more detailed analysis
> >> to understand.  Let me try.
> >> 
> >> Here's what the test does before your fix
> >> 
> >>     Send QMP_REQ_QUEUE_LEN_MAX "blocking" in-band commands
> >>     queue-blocks-1, queue-blocks-2, ...  "Blocking" means the command
> >>     blocks until we explicitly unblock it.
> >> 
> >>     Send an out-of-band command oob-1.
> >> 
> >>     Unblock queue-blocks-1.
> >> 
> >>     Receive response to queue-blocks-1.
> >> 
> >>     Receive response to oob-1.
> >> 
> >>     Unblock queue-blocks-2, receive response
> >>     ...
> >> 
> >> Here's what the test expects QEMU to do:
> >> 
> >>     In I/O thread:
> >> 
> >>         Receive and queue queue-blocks-1, ...
> >> 
> >>         Queue is full, stop receiving commands.
> >> 
> >>     In main thread:
> >> 
> >>         Dequeue queue-blocks-1.
> >> 
> >>         Execute queue-blocks-1, block for a while, finish and send
> >>         response.
> >> 
> >>     In I/O thread:
> >> 
> >>         Queue is no longer full, resume receiving commands.
> >> 
> >>         Receive and execute oob-1, send response
> >> 
> >>     In main thread:
> >> 
> >>         Dequeue queue-blocks-2, execute, block for a while, finish and
> >>         send response.
> >>         ...
> >> 
> >> No good because the two threads race.  On my Linux box the test passes.
> >> Let's have a closer look at how it does that.  The appended patch adds
> >> another tracepoint.  With tracepoints monitor_* and handle_qmp_command
> >> enabled, I get:
> >> 
> >>     19271@1542641581.372890:handle_qmp_command mon 0x563e19be4e80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-1", "driver": "blkdebug", "config": "/tmp/qmp-test-EZroW3/fifo"}}
> >> 
> >> I/O thread receives and queues queue-blocks-1.
> >> 
> >>     19271@1542641581.372954:monitor_qmp_cmd_in_band queue-blocks-1
> >> 
> >> Main thread dequeues queue-blocks-1 and starts executing it.
> >> 
> >> We already lost: the queue won't actually fill up.  The test passes
> >> anyway, but it doesn't actually test "queue full".  But why does it
> >> pass?  Let's see.
> >> 
> >>     19271@1542641581.373211:monitor_qmp_respond queue-blocks-1
> >> 
> >> Main thread sends response to queue-blocks-1.  This means queue-blocks-1
> >> has been unblocked already.  This is possible, because qmp-test sends
> >> all the commands and then immediately unblocks queue-blocks-1, and the
> >> main thread can win the race with the I/O thread.
> >> 
> >>     19271@1542641581.373826:handle_qmp_command mon 0x563e19be4e80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-2", "driver": "blkdebug", "config": "/tmp/qmp-test-EZroW3/fifo"}}
> >>     19271@1542641581.373864:monitor_qmp_cmd_in_band queue-blocks-2
> >>     19271@1542641581.374575:handle_qmp_command mon 0x563e19be4e80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-3", "driver": "blkdebug", "config": "/tmp/qmp-test-EZroW3/fifo"}}
> >>     19271@1542641581.375344:handle_qmp_command mon 0x563e19be4e80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-4", "driver": "blkdebug", "config": "/tmp/qmp-test-EZroW3/fifo"}}
> >>     19271@1542641581.376162:handle_qmp_command mon 0x563e19be4e80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-5", "driver": "blkdebug", "config": "/tmp/qmp-test-EZroW3/fifo"}}
> >>     19271@1542641581.376996:handle_qmp_command mon 0x563e19be4e80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-6", "driver": "blkdebug", "config": "/tmp/qmp-test-EZroW3/fifo"}}
> >>     19271@1542641581.377790:handle_qmp_command mon 0x563e19be4e80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-7", "driver": "blkdebug", "config": "/tmp/qmp-test-EZroW3/fifo"}}
> >>     19271@1542641581.378562:handle_qmp_command mon 0x563e19be4e80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-8", "driver": "blkdebug", "config": "/tmp/qmp-test-EZroW3/fifo"}}
> >> 
> >> I/O thread receives and queues queue-blocks-2, ... queue-blocks-8.
> >> 
> >>     19271@1542641581.378796:handle_qmp_command mon 0x563e19be4e80 req: {"exec-oob": "migrate-pause"}
> >>     19271@1542641581.378800:monitor_qmp_cmd_out_of_band oob-1
> >>     19271@1542641581.378823:monitor_qmp_respond oob-1
> >> 
> >> I/O thread receives, executes and responds to oob-1.
> >> 
> >>     19271@1542641581.379197:monitor_qmp_respond queue-blocks-2
> >>     19271@1542641581.379228:monitor_qmp_cmd_in_band queue-blocks-3
> >>     19271@1542641581.379435:monitor_qmp_respond queue-blocks-3
> >>     19271@1542641581.379477:monitor_qmp_cmd_in_band queue-blocks-4
> >>     19271@1542641581.379597:monitor_qmp_respond queue-blocks-4
> >>     19271@1542641581.379621:monitor_qmp_cmd_in_band queue-blocks-5
> >>     19271@1542641581.379800:monitor_qmp_respond queue-blocks-5
> >>     19271@1542641581.379824:monitor_qmp_cmd_in_band queue-blocks-6
> >>     19271@1542641581.379953:monitor_qmp_respond queue-blocks-6
> >>     19271@1542641581.379992:monitor_qmp_cmd_in_band queue-blocks-7
> >>     19271@1542641581.380132:monitor_qmp_respond queue-blocks-7
> >>     19271@1542641581.380170:monitor_qmp_cmd_in_band queue-blocks-8
> >>     19271@1542641581.380283:monitor_qmp_respond queue-blocks-8
> >> 
> >> Main thread dequeues, executes and responds to queue-blocks-2,
> >> ... queue-blocks-8.
> >> 
> >> If I make the main thread lose the race by sticking a sleep(1) before
> >> the unblock_blocked_cmd() that unblocks queue-blocks-1, I get:
> >> 
> >>     22762@1542641707.818838:handle_qmp_command mon 0x55f5b6a5de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-1", "driver": "blkdebug", "config": "/tmp/qmp-test-PqO2uz/fifo"}}
> >> 
> >> I/O thread receives and queues queue-blocks-1.
> >> 
> >>     22762@1542641707.818856:monitor_qmp_cmd_in_band queue-blocks-1
> >> 
> >> Main thread dequeues queue-blocks-1 and starts executing it.  We already
> >> lost.
> >> 
> >>     22762@1542641707.819585:handle_qmp_command mon 0x55f5b6a5de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-2", "driver": "blkdebug", "config": "/tmp/qmp-test-PqO2uz/fifo"}}
> >>     22762@1542641707.820331:handle_qmp_command mon 0x55f5b6a5de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-3", "driver": "blkdebug", "config": "/tmp/qmp-test-PqO2uz/fifo"}}
> >>     22762@1542641707.821072:handle_qmp_command mon 0x55f5b6a5de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-4", "driver": "blkdebug", "config": "/tmp/qmp-test-PqO2uz/fifo"}}
> >>     22762@1542641707.821821:handle_qmp_command mon 0x55f5b6a5de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-5", "driver": "blkdebug", "config": "/tmp/qmp-test-PqO2uz/fifo"}}
> >>     22762@1542641707.822569:handle_qmp_command mon 0x55f5b6a5de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-6", "driver": "blkdebug", "config": "/tmp/qmp-test-PqO2uz/fifo"}}
> >>     22762@1542641707.823314:handle_qmp_command mon 0x55f5b6a5de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-7", "driver": "blkdebug", "config": "/tmp/qmp-test-PqO2uz/fifo"}}
> >>     22762@1542641707.824069:handle_qmp_command mon 0x55f5b6a5de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-8", "driver": "blkdebug", "config": "/tmp/qmp-test-PqO2uz/fifo"}}
> >> 
> >> I/O thread receives and queues queue-blocks-2, ... queue-blocks-8.
> >> 
> >>     22762@1542641707.824253:handle_qmp_command mon 0x55f5b6a5de80 req: {"exec-oob": "migrate-pause"}
> >>     22762@1542641707.824257:monitor_qmp_cmd_out_of_band oob-1
> >>     22762@1542641707.824282:monitor_qmp_respond oob-1
> >> 
> >> I/O thread receives, executes and responds to oob-1.
> >> 
> >> Test is going to fail: response to oob-1 sent before response to
> >> queue-blocks-1.
> >> 
> >> Now let me try your fix.  Here's what I see:
> >> 
> >>     16585@1542650042.980953:handle_qmp_command mon 0x5597b8e0de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-1", "driver": "blkdebug", "config": "/tmp/qmp-test-8YRbOJ/fifo"}}
> >>     16585@1542650042.981018:monitor_qmp_cmd_in_band queue-blocks-1
> >>     16585@1542650042.981252:monitor_qmp_respond queue-blocks-1
> >>     16585@1542650042.981838:handle_qmp_command mon 0x5597b8e0de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-2", "driver": "blkdebug", "config": "/tmp/qmp-test-8YRbOJ/fifo"}}
> >>     16585@1542650042.981916:monitor_qmp_cmd_in_band queue-blocks-2
> >>     16585@1542650042.982656:handle_qmp_command mon 0x5597b8e0de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-3", "driver": "blkdebug", "config": "/tmp/qmp-test-8YRbOJ/fifo"}}
> >>     16585@1542650042.983406:handle_qmp_command mon 0x5597b8e0de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-4", "driver": "blkdebug", "config": "/tmp/qmp-test-8YRbOJ/fifo"}}
> >>     16585@1542650042.984170:handle_qmp_command mon 0x5597b8e0de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-5", "driver": "blkdebug", "config": "/tmp/qmp-test-8YRbOJ/fifo"}}
> >>     16585@1542650042.984975:handle_qmp_command mon 0x5597b8e0de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-6", "driver": "blkdebug", "config": "/tmp/qmp-test-8YRbOJ/fifo"}}
> >>     16585@1542650042.985778:handle_qmp_command mon 0x5597b8e0de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-7", "driver": "blkdebug", "config": "/tmp/qmp-test-8YRbOJ/fifo"}}
> >>     16585@1542650042.986565:handle_qmp_command mon 0x5597b8e0de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-8", "driver": "blkdebug", "config": "/tmp/qmp-test-8YRbOJ/fifo"}}
> >>     16585@1542650042.987396:handle_qmp_command mon 0x5597b8e0de80 req: {"execute": "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": "queue-blocks-9", "driver": "blkdebug", "config": "/tmp/qmp-test-8YRbOJ/fifo"}}
> >>     16585@1542650042.987626:handle_qmp_command mon 0x5597b8e0de80 req: {"exec-oob": "migrate-pause"}
> >>     16585@1542650042.987630:monitor_qmp_cmd_out_of_band oob-1
> >>     16585@1542650042.987676:monitor_qmp_respond oob-1
> >>     16585@1542650042.988035:monitor_qmp_respond queue-blocks-2
> >>     16585@1542650042.988066:monitor_qmp_cmd_in_band queue-blocks-3
> >>     16585@1542650042.988329:monitor_qmp_respond queue-blocks-3
> >>     16585@1542650042.988360:monitor_qmp_cmd_in_band queue-blocks-4
> >>     16585@1542650042.988575:monitor_qmp_respond queue-blocks-4
> >>     16585@1542650042.988616:monitor_qmp_cmd_in_band queue-blocks-5
> >>     16585@1542650042.988803:monitor_qmp_respond queue-blocks-5
> >>     16585@1542650042.988826:monitor_qmp_cmd_in_band queue-blocks-6
> >>     16585@1542650042.988977:monitor_qmp_respond queue-blocks-6
> >>     16585@1542650042.989001:monitor_qmp_cmd_in_band queue-blocks-7
> >>     16585@1542650042.989223:monitor_qmp_respond queue-blocks-7
> >>     16585@1542650042.989279:monitor_qmp_cmd_in_band queue-blocks-8
> >>     16585@1542650042.989447:monitor_qmp_respond queue-blocks-8
> >>     16585@1542650042.989469:monitor_qmp_cmd_in_band queue-blocks-9
> >>     16585@1542650042.989582:monitor_qmp_respond queue-blocks-9
> >> 
> >> Tracepoint monitor_suspend still not hit.  This means the queue does not
> >> fill up.
> >
> > You are right.  I think I figured out one thing but missed another
> > point here: "sent the command" does not mean "received by the QMP
> > server".  We can never tell from the client side when the queue is
> > full, yet that is exactly what we need to know to write a unit test
> > for it.
> >
> > For example: in this test what we want to achieve is:
> >
> >   for i in 1 .. N:
> >       send blocking commands Bi
> >   ... wait until queue is full ...  [1]
> >   send oob command O1
> >   unblock blocking command B1
> >   receive response B1
> >   receive response O1
> >   for i in 2 .. N:
> >       unblock blocking command Bi
> >       receive response Bi
> >
> > All looks sane except the assumption at [1].  In the code we didn't do
> > anything for [1], assuming it happens naturally.  It doesn't: when the
> > first for loop finishes, we have only queued the commands into the
> > send buffer, and we can't guarantee that they have been received and
> > queued in the QMP request list on the server side.  When the server's
> > QMP request queue does fill up, it merely suspends the monitor, and
> > that event is completely invisible from the client's POV, so IMHO we
> > don't have a good way to guarantee this.  E.g., we could do a very
> > long sleep(100) at [1] to make it very likely that the queue has
> > filled up, but that might still break on a super slow machine.
> >
> > In other words, I don't see a good way to automate the testing of
> > queue full for now (though I'm pretty sure queue full itself works,
> > since if you do insert a sleep you'll definitely see the
> > monitor_suspend tracepoint).  We might need something like an event
> > to signal that the queue is full, so the client can detect it and
> > insert the next out-of-band command, but then we're going backward,
> > and I don't see much point in introducing a new event only for
> > testing purposes...
> 
> We want the test to drive QEMU into the "queue full" state with an
> out-of-band command unreceived, then watch the queue drain and the
> out-of-band command overtaking part of it.
> 
> Driving QEMU into this state is easy enough: we can use a blocking
> in-band command to plug the queue's drain.
> 
> Once we have QEMU in this state, we can unblock that command and watch
> the show.  But we really have to wait until QEMU reaches this state
> before we unblock.
> 
> Our problem is the test can't easily observe when QEMU reaches this
> state.
> 
> You mentioned sending a "queue full" event, and that adding such an
> event just for testing is hard to justify.  I agree.  Could such an
> event have a non-testing use?  All I can think of is letting the
> management application detect "we mismanaged the queue and compromised
> monitor availability for out-of-band commands".  Doesn't sound
> particularly convincing to me, but we can talk with libvirt guys to see
> whether they want it.
> 
> Can the test observe the monitor is suspended just from the behavior of
> its monitor socket?  Not reliably: sending lots of input eventually
> fails with EAGAIN when the monitor is suspended, but it can also fail
> when the monitor reads the input too slowly.
> 
> Can the test observe the monitor_suspend trace point?  Awkward, because
> the trace backends are configurable.  We could have the test monitor the
> "simple" backend's output file, skipping the test when "simple" isn't
> available.  Hmm.
> 
> Here comes the heavy hammer: extend the qtest protocol.  When the
> monitor suspends / resumes, send a suitable asynchronous message to the
> qtest socket.  Have libqtest use that to track monitor suspend state.
> The test can then busy-wait for the state to flip to suspended.
> 
> Other ideas?

I don't have more to add (already a lot of ideas! :-).  I'd say I
would prefer to start with the queue full event; we can discuss it
with libvirt after 3.1, when appropriate.  And besides libvirt (where
I am not quite sure whether this event would really be useful in the
near future), maybe it can also serve as a hint for any new QMP
client when people complain "hey, why didn't my QMP client get any
response from QEMU" - if the client didn't get a response but does
see this event, then we know the answer even before debugging
further.

> 
> >> > Markus, do you want me to repost a new version with this change?  Is
> >> > it still possible to have the oob-default series for 3.1?
> >> 
> >> We're trying to, but we need to get the test to work.
> >> 
> >> Two problems:
> >> 
> >> 1. The test doesn't seem to succeed at testing "queue full".
> >> 
> >> 2. The test might still be racy.
> >> 
> >> I suspect the code we test is solid, and it's "only" the test we screwed
> >> up.
> >
> > I agree.  Do you have a better idea for testing queue full?  I'd be
> > more than glad to try any suggestions here; otherwise I am thinking
> > whether we can simply drop the queue full test (which is the last
> > patch of the oob series) for now and merge the rest.  We can
> > introduce a new queue full test after 3.1 if we want, though again I
> > really doubt whether we can do that without introducing something new
> > into the procedure.
> >
> > Thoughts?
> 
> I'm afraid dropping the test (for now?) is what we have to do.

Agreed.

Thanks,

-- 
Peter Xu

Re: [Qemu-devel] [PULL 0/8] Monitor patches for 2018-10-30
Posted by Markus Armbruster 6 years, 11 months ago
Peter Xu <peterx@redhat.com> writes:

> On Tue, Nov 20, 2018 at 06:44:27PM +0100, Markus Armbruster wrote:
>> 
>> We want the test to drive QEMU into the "queue full" state with an
>> out-of-band command unreceived, then watch the queue drain and the
>> out-of-band command overtaking part of it.
>> 
>> Driving QEMU into this state is easy enough: we can use a blocking
>> in-band command to plug the queue's drain.
>> 
>> Once we have QEMU in this state, we can unblock that command and watch
>> the show.  But we really have to wait until QEMU reaches this state
>> before we unblock.
>> 
>> Our problem is the test can't easily observe when QEMU reaches this
>> state.
>> 
>> You mentioned sending a "queue full" event, and that adding such an
>> event just for testing is hard to justify.  I agree.  Could such an
>> event have a non-testing use?  All I can think of is letting the
>> management application detect "we mismanaged the queue and compromised
>> monitor availability for out-of-band commands".  Doesn't sound
>> particularly convincing to me, but we can talk with libvirt guys to see
>> whether they want it.
>> 
>> Can the test observe the monitor is suspended just from the behavior of
>> its monitor socket?  Not reliably: sending lots of input eventually
>> fails with EAGAIN when the monitor is suspended, but it can also fail
>> when the monitor reads the input too slowly.
>> 
>> Can the test observe the monitor_suspend trace point?  Awkward, because
>> the trace backends are configurable.  We could have the test monitor the
>> "simple" backend's output file, skipping the test when "simple" isn't
>> available.  Hmm.
>> 
>> Here comes the heavy hammer: extend the qtest protocol.  When the
>> monitor suspends / resumes, send a suitable asynchronous message to the
>> qtest socket.  Have libqtest use that to track monitor suspend state.
>> The test can then busy-wait for the state to flip to suspended.
>> 
>> Other ideas?
>
> I don't have more to add (already a lot of ideas! :-).  I'd say I
> would prefer to start with the queue full event; we can discuss it
> with libvirt after 3.1, when appropriate.  And besides libvirt (where
> I am not quite sure whether this event would really be useful in the
> near future), maybe it can also serve as a hint for any new QMP
> client when people complain "hey, why didn't my QMP client get any
> response from QEMU" - if the client didn't get a response but does
> see this event, then we know the answer even before debugging
> further.

Good point.

Say we implement full flow control, i.e. we also suspend when the output
buffer becomes too large.  Then the event won't really help, because the
client won't see it until it has fixed the situation by draining the output
buffer.  But it won't really hurt, either: even a client that misbehaves
in a really unfortunate way can't trigger many events without also
generating much more other output.  Orders of magnitude more.

>> >> > Markus, do you want me to repost a new version with this change?  Is
>> >> > it still possible to have the oob-default series for 3.1?
>> >> 
>> >> We're trying to, but we need to get the test to work.
>> >> 
>> >> Two problems:
>> >> 
>> >> 1. The test doesn't seem to succeed at testing "queue full".
>> >> 
>> >> 2. The test might still be racy.
>> >> 
>> >> I suspect the code we test is solid, and it's "only" the test we screwed
>> >> up.
>> >
>> > I agree.  Do you have a better idea for testing queue full?  I'd be
>> > more than glad to try any suggestions here; otherwise I am thinking
>> > whether we can simply drop the queue full test (which is the last
>> > patch of the oob series) for now and merge the rest.  We can
>> > introduce a new queue full test after 3.1 if we want, though again I
>> > really doubt whether we can do that without introducing something new
>> > into the procedure.
>> >
>> > Thoughts?
>> 
>> I'm afraid dropping the test (for now?) is what we have to do.
>
> Agreed.

Okay, I'll prepare the pull request later today.

Re: [Qemu-devel] [PULL 0/8] Monitor patches for 2018-10-30
Posted by Markus Armbruster 6 years, 11 months ago
Markus Armbruster <armbru@redhat.com> writes:

> Peter Xu <peterx@redhat.com> writes:
>
>> On Tue, Nov 20, 2018 at 06:44:27PM +0100, Markus Armbruster wrote:
>>> 
>>> We want the test to drive QEMU into the "queue full" state with an
>>> out-of-band command unreceived, then watch the queue drain and the
>>> out-of-band command overtaking part of it.
>>> 
>>> Driving QEMU into this state is easy enough: we can use a blocking
>>> in-band command to plug the queue's drain.
>>> 
>>> Once we have QEMU in this state, we can unblock that command and watch
>>> the show.  But we really have to wait until QEMU reaches this state
>>> before we unblock.
>>> 
>>> Our problem is the test can't easily observe when QEMU reaches this
>>> state.
>>> 
>>> You mentioned sending a "queue full" event, and that adding such an
>>> event just for testing is hard to justify.  I agree.  Could such an
>>> event have a non-testing use?  All I can think of is letting the
>>> management application detect "we mismanaged the queue and compromised
>>> monitor availability for out-of-band commands".  Doesn't sound
>>> particularly convincing to me, but we can talk with libvirt guys to see
>>> whether they want it.
>>> 
>>> Can the test observe the monitor is suspended just from the behavior of
>>> its monitor socket?  Not reliably: sending lots of input eventually
>>> fails with EAGAIN when the monitor is suspended, but it can also fail
>>> when the monitor reads the input too slowly.
>>> 
>>> Can the test observe the monitor_suspend trace point?  Awkward, because
>>> the trace backends are configurable.  We could have the test monitor the
>>> "simple" backend's output file, skipping the test when "simple" isn't
>>> available.  Hmm.
>>> 
>>> Here comes the heavy hammer: extend the qtest protocol.  When the
>>> monitor suspends / resumes, send a suitable asynchronous message to the
>>> qtest socket.  Have libqtest use that to track monitor suspend state.
>>> The test can then busy-wait for the state to flip to suspended.
>>> 
>>> Other ideas?
>>
>> I don't have more to add (already a lot of ideas! :-).  I'd say I
>> would prefer to start with the queue full event; we can discuss it
>> with libvirt after 3.1, when appropriate.  And besides libvirt (where
>> I am not quite sure whether this event would really be useful in the
>> near future), maybe it can also serve as a hint for any new QMP
>> client when people complain "hey, why didn't my QMP client get any
>> response from QEMU" - if the client didn't get a response but does
>> see this event, then we know the answer even before debugging
>> further.
>
> Good point.
>
> Say we implement full flow control, i.e. we also suspend when the output
> buffer becomes too large.  Then the event won't really help, because the
> client won't see it until it has fixed the situation by draining the output
> buffer.  But it won't really hurt, either: even a client that misbehaves
> in a really unfortunate way can't trigger many events without also
> generating much more other output.  Orders of magnitude more.
>
>>> >> > Markus, do you want me to repost a new version with this change?  Is
>>> >> > it still possible to have the oob-default series for 3.1?
>>> >> 
>>> >> We're trying to, but we need to get the test to work.
>>> >> 
>>> >> Two problems:
>>> >> 
>>> >> 1. The test doesn't seem to succeed at testing "queue full".
>>> >> 
>>> >> 2. The test might still be racy.
>>> >> 
>>> >> I suspect the code we test is solid, and it's "only" the test we screwed
>>> >> up.
>>> >
>>> > I agree.  Do you have a better idea for testing queue full?  I'd be
>>> > more than glad to try any suggestions here; otherwise I am thinking
>>> > whether we can simply drop the queue full test (which is the last
>>> > patch of the oob series) for now and merge the rest.  We can
>>> > introduce a new queue full test after 3.1 if we want, though again I
>>> > really doubt whether we can do that without introducing something new
>>> > into the procedure.
>>> >
>>> > Thoughts?
>>> 
>>> I'm afraid dropping the test (for now?) is what we have to do.
>>
>> Agreed.
>
> Okay, I'll prepare the pull request later today.

I didn't.  I decided it's too late for 3.1.  Sigh...

I'm keeping it queued for 4.0.