[PATCH v2] monitor/qmp: resume monitor when clearing its queue

Wolfgang Bumiller posted 1 patch 4 years, 6 months ago
Test FreeBSD passed
Test checkpatch passed
Test docker-clang@ubuntu passed
Test docker-mingw@fedora passed
Test asan passed
Test docker-quick@centos7 passed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20191024081231.19087-1-w.bumiller@proxmox.com
Maintainers: Markus Armbruster <armbru@redhat.com>
There is a newer version of this series
monitor/qmp.c | 24 ++++++++++++++++++++++--
1 file changed, 22 insertions(+), 2 deletions(-)
[PATCH v2] monitor/qmp: resume monitor when clearing its queue
Posted by Wolfgang Bumiller 4 years, 6 months ago
When a monitor's queue is filled up in handle_qmp_command()
it gets suspended. It's the dispatcher bh's job currently to
resume the monitor, which it does after processing an event
from the queue. However, it is possible for a
CHR_EVENT_CLOSED event to be processed before before the bh
is scheduled, which will clear the queue without resuming
the monitor, thereby preventing the dispatcher from reaching
the resume() call.
Any new connections to the qmp socket will be accept()ed and
show the greeting, but will not respond to any messages sent
afterwards (as they will not be read from the
still-suspended socket).
Fix this by resuming the monitor when clearing a queue which
was filled up.

Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
---
Changes to v1:
  * Update commit message to include the resulting symptoms.
  * Moved the resume code from `monitor_qmp_cleanup_req_queue_locked` to
    `monitor_qmp_cleanup_queues` to avoid an unnecessary resume when
    destroying the monitor (as the `_locked` version is also used by
    `monitor_data_destroy()`.
  * Renamed `monitor_qmp_cleanup_queues` to
    `monitor_qmp_cleanup_queues_and_resume` to reflect the change and be
    verbose about it for potential future users of the function.
    Currently the only user is `monitor_qmp_event()` in the
    `CHR_EVENT_CLOSED` case, which is exactly the problematic case currently.

Sorry for the deleay :|

 monitor/qmp.c | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/monitor/qmp.c b/monitor/qmp.c
index 9d9e5d8b27..df689aa95e 100644
--- a/monitor/qmp.c
+++ b/monitor/qmp.c
@@ -75,10 +75,30 @@ static void monitor_qmp_cleanup_req_queue_locked(MonitorQMP *mon)
     }
 }
 
-static void monitor_qmp_cleanup_queues(MonitorQMP *mon)
+static void monitor_qmp_cleanup_queues_and_resume(MonitorQMP *mon)
 {
     qemu_mutex_lock(&mon->qmp_queue_lock);
+
+    /*
+     * Same condition as in monitor_qmp_bh_dispatcher(), but before removing an
+     * element from the queue (hence no `- 1`), also, the queue should not be
+     * empty either, otherwise the monitor hasn't been suspended yet (or was
+     * already resumed).
+     */
+    bool need_resume = (!qmp_oob_enabled(mon) && mon->qmp_requests->length > 0)
+        || mon->qmp_requests->length == QMP_REQ_QUEUE_LEN_MAX;
+
     monitor_qmp_cleanup_req_queue_locked(mon);
+
+    if (need_resume) {
+        /*
+         * Pairs with the monitor_suspend() in handle_qmp_command() in case the
+         * queue gets cleared from a CH_EVENT_CLOSED event before the dispatch
+         * bh got scheduled.
+         */
+        monitor_resume(&mon->common);
+    }
+
     qemu_mutex_unlock(&mon->qmp_queue_lock);
 }
 
@@ -332,7 +352,7 @@ static void monitor_qmp_event(void *opaque, int event)
          * stdio, it's possible that stdout is still open when stdin
          * is closed.
          */
-        monitor_qmp_cleanup_queues(mon);
+        monitor_qmp_cleanup_queues_and_resume(mon);
         json_message_parser_destroy(&mon->parser);
         json_message_parser_init(&mon->parser, handle_qmp_command,
                                  mon, NULL);
-- 
2.20.1



Re: [PATCH v2] monitor/qmp: resume monitor when clearing its queue
Posted by Markus Armbruster 4 years, 5 months ago
Wolfgang Bumiller <w.bumiller@proxmox.com> writes:

> When a monitor's queue is filled up in handle_qmp_command()
> it gets suspended. It's the dispatcher bh's job currently to
> resume the monitor, which it does after processing an event
> from the queue. However, it is possible for a
> CHR_EVENT_CLOSED event to be processed before before the bh
> is scheduled, which will clear the queue without resuming
> the monitor, thereby preventing the dispatcher from reaching
> the resume() call.
> Any new connections to the qmp socket will be accept()ed and
> show the greeting, but will not respond to any messages sent
> afterwards (as they will not be read from the
> still-suspended socket).
> Fix this by resuming the monitor when clearing a queue which
> was filled up.
>
> Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
> ---
> Changes to v1:
>   * Update commit message to include the resulting symptoms.
>   * Moved the resume code from `monitor_qmp_cleanup_req_queue_locked` to
>     `monitor_qmp_cleanup_queues` to avoid an unnecessary resume when
>     destroying the monitor (as the `_locked` version is also used by
>     `monitor_data_destroy()`.
>   * Renamed `monitor_qmp_cleanup_queues` to
>     `monitor_qmp_cleanup_queues_and_resume` to reflect the change and be
>     verbose about it for potential future users of the function.
>     Currently the only user is `monitor_qmp_event()` in the
>     `CHR_EVENT_CLOSED` case, which is exactly the problematic case currently.
>
> Sorry for the deleay :|

Same to you (my sorry excuse is KVM Forum).  Now we need to hurry to get
this fix into 4.2.  Let's try.

>  monitor/qmp.c | 24 ++++++++++++++++++++++--
>  1 file changed, 22 insertions(+), 2 deletions(-)
>
> diff --git a/monitor/qmp.c b/monitor/qmp.c
> index 9d9e5d8b27..df689aa95e 100644
> --- a/monitor/qmp.c
> +++ b/monitor/qmp.c
> @@ -75,10 +75,30 @@ static void monitor_qmp_cleanup_req_queue_locked(MonitorQMP *mon)
>      }
>  }
>  
> -static void monitor_qmp_cleanup_queues(MonitorQMP *mon)
> +static void monitor_qmp_cleanup_queues_and_resume(MonitorQMP *mon)

Let's rename to _cleanup_queue_and resume().  The plural is a remnant
from when we also had a response queue.  Gone since commit 27656018d86.

>  {
>      qemu_mutex_lock(&mon->qmp_queue_lock);
> +
> +    /*
> +     * Same condition as in monitor_qmp_bh_dispatcher(), but before removing an
> +     * element from the queue (hence no `- 1`), also, the queue should not be
> +     * empty either, otherwise the monitor hasn't been suspended yet (or was
> +     * already resumed).
> +     */

Comment lines should be wrapped around colum 70.

> +    bool need_resume = (!qmp_oob_enabled(mon) && mon->qmp_requests->length > 0)
> +        || mon->qmp_requests->length == QMP_REQ_QUEUE_LEN_MAX;

Now let me digest the comment.  Here's condition in
monitor_qmp_bh_dispatcher():

    need_resume = !qmp_oob_enabled(mon) ||
        mon->qmp_requests->length == QMP_REQ_QUEUE_LEN_MAX - 1;

"Same but before removing" is

       bool need_resume = !qmp_oob_enabled(mon)
           || mon->qmp_requests->length == QMP_REQ_QUEUE_LEN_MAX;

That leaves the "also" part.  It's been too long since v1, I don't
remember a thing, and I'm too dense today to understand without more
help.  Can you help me some more?

> +
>      monitor_qmp_cleanup_req_queue_locked(mon);
> +
> +    if (need_resume) {
> +        /*
> +         * Pairs with the monitor_suspend() in handle_qmp_command() in case the
> +         * queue gets cleared from a CH_EVENT_CLOSED event before the dispatch
> +         * bh got scheduled.
> +         */

CHR_EVENT_CLOSED

Suggest:

           /*
            * handle_qmp_command() suspened the monitor because the
            * request queue filled up, to be resumed when the queue has
            * space again.  We just emptied it; resume the monitor.
            */

If we want to record the issue that made us fix the missing resume, we
can add:

            * Without this, the monitor would remain suspended forever
            * when we get here while the monitor is suspended.  An
            * unfortunately times CHR_EVENT_CLOSED can do the trick.

Also update handle_qmp_command()'s comment:

    /*
     * Suspend the monitor when we can't queue more requests after
     * this one.  Dequeuing in monitor_qmp_bh_dispatcher() or
     * monitor_qmp_cleanup_queue_and_resume() will resume it.
     * Note that when OOB is disabled, we queue at most one command,
     * for backward compatibility.
     */

> +        monitor_resume(&mon->common);
> +    }
> +
>      qemu_mutex_unlock(&mon->qmp_queue_lock);
>  }
>  
> @@ -332,7 +352,7 @@ static void monitor_qmp_event(void *opaque, int event)
>           * stdio, it's possible that stdout is still open when stdin
>           * is closed.
>           */
> -        monitor_qmp_cleanup_queues(mon);
> +        monitor_qmp_cleanup_queues_and_resume(mon);
>          json_message_parser_destroy(&mon->parser);
>          json_message_parser_init(&mon->parser, handle_qmp_command,
>                                   mon, NULL);


Re: [PATCH v2] monitor/qmp: resume monitor when clearing its queue
Posted by Wolfgang Bumiller 4 years, 5 months ago
On Wed, Nov 13, 2019 at 05:45:57PM +0100, Markus Armbruster wrote:
> Wolfgang Bumiller <w.bumiller@proxmox.com> writes:
> 
> > When a monitor's queue is filled up in handle_qmp_command()
> > it gets suspended. It's the dispatcher bh's job currently to
> > resume the monitor, which it does after processing an event
> > from the queue. However, it is possible for a
> > CHR_EVENT_CLOSED event to be processed before before the bh
> > is scheduled, which will clear the queue without resuming
> > the monitor, thereby preventing the dispatcher from reaching
> > the resume() call.
> > Any new connections to the qmp socket will be accept()ed and
> > show the greeting, but will not respond to any messages sent
> > afterwards (as they will not be read from the
> > still-suspended socket).
> > Fix this by resuming the monitor when clearing a queue which
> > was filled up.
> >
> > Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
> > ---
> > Changes to v1:
> >   * Update commit message to include the resulting symptoms.
> >   * Moved the resume code from `monitor_qmp_cleanup_req_queue_locked` to
> >     `monitor_qmp_cleanup_queues` to avoid an unnecessary resume when
> >     destroying the monitor (as the `_locked` version is also used by
> >     `monitor_data_destroy()`.
> >   * Renamed `monitor_qmp_cleanup_queues` to
> >     `monitor_qmp_cleanup_queues_and_resume` to reflect the change and be
> >     verbose about it for potential future users of the function.
> >     Currently the only user is `monitor_qmp_event()` in the
> >     `CHR_EVENT_CLOSED` case, which is exactly the problematic case currently.
> >
> > Sorry for the deleay :|
> 
> Same to you (my sorry excuse is KVM Forum).  Now we need to hurry to get
> this fix into 4.2.  Let's try.

Uh, that's already in Rcs. *hurries*

> >  monitor/qmp.c | 24 ++++++++++++++++++++++--
> >  1 file changed, 22 insertions(+), 2 deletions(-)
> >
> > diff --git a/monitor/qmp.c b/monitor/qmp.c
> > index 9d9e5d8b27..df689aa95e 100644
> > --- a/monitor/qmp.c
> > +++ b/monitor/qmp.c
> > @@ -75,10 +75,30 @@ static void monitor_qmp_cleanup_req_queue_locked(MonitorQMP *mon)
> >      }
> >  }
> >  
> > -static void monitor_qmp_cleanup_queues(MonitorQMP *mon)
> > +static void monitor_qmp_cleanup_queues_and_resume(MonitorQMP *mon)
> 
> Let's rename to _cleanup_queue_and resume().  The plural is a remnant
> from when we also had a response queue.  Gone since commit 27656018d86.
> 
> >  {
> >      qemu_mutex_lock(&mon->qmp_queue_lock);
> > +
> > +    /*
> > +     * Same condition as in monitor_qmp_bh_dispatcher(), but before removing an
> > +     * element from the queue (hence no `- 1`), also, the queue should not be
> > +     * empty either, otherwise the monitor hasn't been suspended yet (or was
> > +     * already resumed).
> > +     */
> 
> Comment lines should be wrapped around colum 70.
> 
> > +    bool need_resume = (!qmp_oob_enabled(mon) && mon->qmp_requests->length > 0)
> > +        || mon->qmp_requests->length == QMP_REQ_QUEUE_LEN_MAX;
> 
> Now let me digest the comment.  Here's condition in
> monitor_qmp_bh_dispatcher():
> 
>     need_resume = !qmp_oob_enabled(mon) ||
>         mon->qmp_requests->length == QMP_REQ_QUEUE_LEN_MAX - 1;
> 
> "Same but before removing" is
> 
>        bool need_resume = !qmp_oob_enabled(mon)
>            || mon->qmp_requests->length == QMP_REQ_QUEUE_LEN_MAX;
> 
> That leaves the "also" part.  It's been too long since v1, I don't
> remember a thing, and I'm too dense today to understand without more
> help.  Can you help me some more?

Not empty: `&& length > 0`. The 2nd part of the disjunction already
implies it (as it is `length == max`) so I only added it to the first
part. A pointless optimization on my part given that it's supposed to be
more easily verified against the comment, so I'll change it to be
clearer: ($condition_without_'-1') && (!g_queue_is_empty()), so:

    bool need_resume = (!qmp_oob_enabled(mon) ||
        mon->qmp_requests->length == QMP_REQ_QUEUE_LEN_MAX)
        && !g_queue_is_empty(mon->qmp_requests);

> 
> > +
> >      monitor_qmp_cleanup_req_queue_locked(mon);
> > +
> > +    if (need_resume) {
> > +        /*
> > +         * Pairs with the monitor_suspend() in handle_qmp_command() in case the
> > +         * queue gets cleared from a CH_EVENT_CLOSED event before the dispatch
> > +         * bh got scheduled.
> > +         */
> 
> CHR_EVENT_CLOSED
> 
> Suggest:
> 
>            /*
>             * handle_qmp_command() suspened the monitor because the
>             * request queue filled up, to be resumed when the queue has
>             * space again.  We just emptied it; resume the monitor.
>             */
> 
> If we want to record the issue that made us fix the missing resume, we
> can add:
> 
>             * Without this, the monitor would remain suspended forever
>             * when we get here while the monitor is suspended.  An
>             * unfortunately times CHR_EVENT_CLOSED can do the trick.
> 
> Also update handle_qmp_command()'s comment:
> 
>     /*
>      * Suspend the monitor when we can't queue more requests after
>      * this one.  Dequeuing in monitor_qmp_bh_dispatcher() or
>      * monitor_qmp_cleanup_queue_and_resume() will resume it.
>      * Note that when OOB is disabled, we queue at most one command,
>      * for backward compatibility.
>      */

will do.