Hi,

HMP and QMP commands are handled synchronously in qemu today. But there are benefits to allowing the command handler to re-enter the main loop if the command cannot be handled synchronously, or if it is long-lasting. Some bugs such as rhbz#1230527 are difficult to solve without it.

The common solution is to use a pair of command+event in this case. But this approach has a number of issues:
- you can't "fix" an existing command: you need a new API, ad-hoc documentation for that command+signal association, and deprecation of the old/broken command
- since the reply event is broadcast and 'id' is used for matching the request, it may conflict with other clients' request 'id' space
- it is arguably less efficient and elegant (weird API, useless return in most cases, broadcast reply, no cancelling on disconnect etc)

The following series implements an async command solution instead. By introducing a session context and a command return handler, it can:
- defer the return, allowing the mainloop to reenter
- return only to the caller (instead of broadcast events for reply)
- optionally allow cancellation when the client is gone
- track on-going qapi command(s) per client/session

and without introduction of new QMP APIs or client visible change.

Existing qemu commands can be gradually replaced by async:true variants when needed, while carefully reviewing the concurrency aspects. The marshaller helpers of async:true commands are split in half: the calling and return functions. The command is called with a QmpReturn context, and can return immediately or later using the generated return helper, which allows for a step-by-step conversion.

The screendump command is converted to an async:true version to solve rhbz#1230527. The command shows basic cancellation (this could be extended if needed). It could be further improved to do asynchronous IO writes as well.
v4:
- rebased, mostly adapting to new OOB code (there was not much feedback in v3 for the async command part, but preliminary patches got merged!)
- drop the RFC status

v3:
- complete rework, dropping the asynchronous commands visibility from the protocol side entirely (until there is a real need for it)
- rebased, with a few preliminary cleanup patches
- teach asynchronous commands to HMP

v2:
- documentation fixes and improvements
- fix calling async commands sync without id
- fix bad hmp monitor assert
- add a few extra asserts
- add async with no-id failure and screendump test

Marc-André Lureau (20):
  qmp: constify QmpCommand and list
  json-lexer: make it safe to call destroy multiple times
  qmp: add QmpSession
  QmpSession: add a return callback
  QmpSession: add json parser and use it in qga
  monitor: use qmp session to parse json feed
  qga: simplify dispatch_return_cb
  QmpSession: introduce QmpReturn
  qmp: simplify qmp_return_error()
  QmpSession: keep a queue of pending commands
  QmpSession: return orderly
  qmp: introduce asynchronous command type
  scripts: learn 'async' qapi commands
  qmp: add qmp_return_is_cancelled()
  monitor: add qmp_return_get_monitor()
  console: add graphic_hw_update_done()
  console: make screendump asynchronous
  monitor: start making qmp_human_monitor_command() asynchronous
  monitor: teach HMP about asynchronous commands
  hmp: call the asynchronous QMP screendump to fix outdated/glitches

 qapi/misc.json                          |   3 +-
 qapi/ui.json                            |   3 +-
 scripts/qapi/commands.py                | 151 ++++++++++++++---
 scripts/qapi/common.py                  |  15 +-
 scripts/qapi/doc.py                     |   3 +-
 scripts/qapi/introspect.py              |   3 +-
 hmp.h                                   |   3 +-
 include/monitor/monitor.h               |   3 +
 include/qapi/qmp/dispatch.h             |  89 +++++++++-
 include/qapi/qmp/json-parser.h          |   7 +-
 include/ui/console.h                    |   5 +
 hmp.c                                   |   6 +-
 hw/display/qxl-render.c                 |   9 +-
 hw/display/qxl.c                        |   1 +
 monitor.c                               | 198 ++++++++++++++--------
 qapi/qmp-dispatch.c                     | 214 +++++++++++++++++++-----
 qapi/qmp-registry.c                     |  33 +++-
 qga/commands.c                          |   2 +-
 qga/main.c                              |  51 ++----
 qobject/json-lexer.c                    |   5 +-
 qobject/json-streamer.c                 |   3 +-
 tests/test-qmp-cmds.c                   | 206 +++++++++++++++++++----
 ui/console.c                            | 100 +++++++++--
 hmp-commands.hx                         |   3 +-
 tests/qapi-schema/qapi-schema-test.json |   5 +
 tests/qapi-schema/qapi-schema-test.out  |   8 +
 tests/qapi-schema/test-qapi.py          |   8 +-
 27 files changed, 877 insertions(+), 260 deletions(-)

-- 
2.21.0.196.g041f5ea1cf
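The patch titles "QmpSession: keep a queue of pending commands" and "QmpSession: return orderly" suggest that a reply is held back until earlier commands of the same session have returned. That reading is an assumption, not something the cover letter states. The sketch below is a toy model of the idea in plain C; Session, Pending, session_begin() and session_complete() are hypothetical names and this is not code from the series.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING 8    /* toy limit on in-flight commands per session */
#define MAX_SENT    32   /* toy log of the order in which replies went out */

typedef struct Pending {
    int id;      /* the request 'id' */
    bool done;   /* the handler has produced its reply */
} Pending;

typedef struct Session {
    Pending queue[MAX_PENDING];  /* ring buffer, oldest command at head */
    size_t head, count;
    int sent[MAX_SENT];          /* ids in the order replies were sent */
    size_t nsent;
} Session;

/* Record a newly dispatched command; fails if the toy queue is full. */
static bool session_begin(Session *s, int id)
{
    if (s->count == MAX_PENDING) {
        return false;
    }
    Pending *p = &s->queue[(s->head + s->count) % MAX_PENDING];
    p->id = id;
    p->done = false;
    s->count++;
    return true;
}

/*
 * Mark command `id` as finished, then send every reply that is now
 * unblocked: replies leave strictly in the order the commands arrived.
 */
static void session_complete(Session *s, int id)
{
    for (size_t i = 0; i < s->count; i++) {
        Pending *p = &s->queue[(s->head + i) % MAX_PENDING];
        if (p->id == id) {
            p->done = true;
            break;
        }
    }
    while (s->count > 0 && s->queue[s->head].done && s->nsent < MAX_SENT) {
        s->sent[s->nsent++] = s->queue[s->head].id;
        s->head = (s->head + 1) % MAX_PENDING;
        s->count--;
    }
}
```

With two commands in flight, completing the second one first sends nothing; completing the first then releases both replies in arrival order. Whether the real series behaves exactly like this is for the patches themselves to say.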
Patchew URL: https://patchew.org/QEMU/20190409161009.6322-1-marcandre.lureau@redhat.com/ Hi, This series failed the asan build test. Please find the testing commands and their output below. If you have Docker installed, you can probably reproduce it locally. === TEST SCRIPT BEGIN === #!/bin/bash time make docker-test-debug@fedora TARGET_LIST=x86_64-softmmu J=14 NETWORK=1 === TEST SCRIPT END === The full log is available at http://patchew.org/logs/20190409161009.6322-1-marcandre.lureau@redhat.com/testing.asan/?type=message. --- Email generated automatically by Patchew [https://patchew.org/]. Please send your feedback to patchew-devel@redhat.com
Marc-André, before you invest your time to answer my questions below: my bandwidth for non-trivial QAPI features like this one is painfully limited. To get your QAPI conditionals in, I had to postpone other QAPI projects. I don't regret doing that, I'm rather pleased with how QAPI conditionals turned out. But I can't keep postponing all other QAPI projects. Because of that, this one will be slow going, at best. Sorry about that. Marc-André Lureau <marcandre.lureau@redhat.com> writes: > Hi, > > HMP and QMP commands are handled synchronously in qemu today. But > there are benefits allowing the command handler to re-enter the main > loop if the command cannot be handled synchronously, or if it is > long-lasting. Some bugs such as rhbz#1230527 are difficult to solve > without it. > > The common solution is to use a pair of command+event in this case. In particular, background jobs (qapi/jobs.json). They grew out of block jobs, and are still used only for "blocky" things. Using them more widely would probably make sense. > But this approach has a number of issues: > - you can't "fix" an existing command: you need a new API, and ad-hoc > documentation for that command+signal association, and old/broken > command deprecation Making a synchronous command asynchronous is an incompatible change. We need to let the client opt in. How is that done in this series? > - since the reply event is broadcasted and 'id' is used for matching the > request, it may conflict with other clients request 'id' space Any event that does that now is broken and needs to be fixed. The obvious fix is to include a monitor ID with the command ID. For events that can only ever be useful in the context of one particular monitor, we could unicast to that monitor instead; see below. Corollary: this is just a fixable bug, not a fundamental advantage of the async feature. 
> - it is arguably less efficient and elegant (weird API, useless return > in most cases, broadcast reply, no cancelling on disconnect etc) The return value is useful for synchronously reporting failure to start the background task. I grant you that background tasks may exist that won't ever fail to start. I challenge the idea that it's most of them. Broadcast reply could be avoided by unicasting events. If I remember correctly, Peter Xu even posted patches some time ago. We ended up not using them, because we found a better solution for the problem at hand. My point is: this isn't a fundamental problem, it's just the way we coded things up. What do you mean by "no cancelling on disconnect"? I'm ignoring "etc" unless you expand it into something specific. I'm also not taking the "weird" bait :) > The following series implements an async command solution instead. By > introducing a session context and a command return handler, it can: > - defer the return, allowing the mainloop to reenter > - return only to the caller (instead of broadcast events for reply) > - optionnally allow cancellation when the client is gone > - track on-going qapi command(s) per client/session > > and without introduction of new QMP APIs or client visible change. What do async commands provide that jobs lack? Why do we want both? I started to write a feature-by-feature comparison, but realized I don't have the time to figure out either jobs or async from their (rather sparse) documentation, let alone from code. > Existing qemu commands can be gradually replaced by async:true > variants when needed, while carefully reviewing the concurrency > aspects. The async:true commands marshaller helpers are splitted in > half, the calling and return functions. The command is called with a > QmpReturn context, that can return immediately or later, using the > generated return helper, which allows for a step-by-step conversion. 
> > The screendump command is converted to an async:true version to solve > rhbz#1230527. The command shows basic cancellation (this could be > extended if needed). It could be further improved to do asynchronous > IO writes as well. What is "basic cancellation"? What extension(s) do you have in mind? What's the impact of screendump writing synchronously?
Hi On Tue, May 21, 2019 at 4:18 PM Markus Armbruster <armbru@redhat.com> wrote: > > Marc-André, before you invest your time to answer my questions below: my > bandwidth for non-trivial QAPI features like this one is painfully > limited. To get your QAPI conditionals in, I had to postpone other QAPI > projects. I don't regret doing that, I'm rather pleased with how QAPI > conditionals turned out. But I can't keep postponing all other QAPI > projects. Because of that, this one will be slow going, at best. Sorry > about that. We have different priorities, fair enough. > > Marc-André Lureau <marcandre.lureau@redhat.com> writes: > > > Hi, > > > > HMP and QMP commands are handled synchronously in qemu today. But > > there are benefits allowing the command handler to re-enter the main > > loop if the command cannot be handled synchronously, or if it is > > long-lasting. Some bugs such as rhbz#1230527 are difficult to solve > > without it. > > > > The common solution is to use a pair of command+event in this case. > > In particular, background jobs (qapi/jobs.json). They grew out of block > jobs, and are still used only for "blocky" things. Using them more > widely would probably make sense. > > > But this approach has a number of issues: > > - you can't "fix" an existing command: you need a new API, and ad-hoc > > documentation for that command+signal association, and old/broken > > command deprecation > > Making a synchronous command asynchronous is an incompatible change. We > need to let the client needs opt in. How is that done in this series? No change visible on client side. I dropped the async command support a while ago already, based on your recommendations. I can dig the archive for the discussion if necessary. > > > - since the reply event is broadcasted and 'id' is used for matching the > > request, it may conflict with other clients request 'id' space > > Any event that does that now is broken and needs to be fixed. 
The > obvious fix is to include a monitor ID with the command ID. For events > that can only ever be useful in the context of one particular monitor, > we could unicast to that monitor instead; see below. > > Corollary: this is just a fixable bug, not a fundamental advantage of > the async feature. I am just pointing out today's drawbacks of turning a function async by introducing new commands and signals. > > > - it is arguably less efficient and elegant (weird API, useless return > > in most cases, broadcast reply, no cancelling on disconnect etc) > > The return value is useful for synchronously reporting failure to start > the background task. I grant you that background tasks may exist that > won't ever fail to start. I challenge the idea that it's most of them. > > Broadcast reply could be avoided by unicasting events. If I remember > correctly, Peter Xu even posted patches some time ago. We ended up not > using them, because we found a better solution for the problem at hand. > My point is: this isn't a fundamental problem, it's just the way we > coded things up. > > What do you mean by "no cancelling on disconnect"? When the client disconnects, the background task keeps running, and there is no simple way to know about that event afaik. My proposal has a simple API for that (see "qmp: add qmp_return_is_cancelled()" patch). > > I'm ignoring "etc" unless you expand it into something specific. > > I'm also not taking the "weird" bait :) > > The following series implements an async command solution instead. By > > introducing a session context and a command return handler, it can: > > - defer the return, allowing the mainloop to reenter > > - return only to the caller (instead of broadcast events for reply) > > - optionnally allow cancellation when the client is gone > > - track on-going qapi command(s) per client/session > > > > and without introduction of new QMP APIs or client visible change. > > What do async commands provide that jobs lack? 
> > Why do we want both? They are different things, last we discussed it: jobs are geared toward block device operations, and do not provide the simple qmp-level facilities that I listed above. What I introduce is a way for an *existing* QMP command to be split, so it can re-enter the main loop sanely (and not by introducing new commands or signals or making things unnecessarily more complicated). My proposal is fairly small: 27 files changed, 877 insertions(+), 260 deletions(-) Including test, and the qxl screendump fix, which account for about 1/3 of the series. > I started to write a feature-by-feature comparison, but realized I don't > have the time to figure out either jobs or async from their (rather > sparse) documentation, let alone from code. > > > Existing qemu commands can be gradually replaced by async:true > > variants when needed, while carefully reviewing the concurrency > > aspects. The async:true commands marshaller helpers are splitted in > > half, the calling and return functions. The command is called with a > > QmpReturn context, that can return immediately or later, using the > > generated return helper, which allows for a step-by-step conversion. > > > > The screendump command is converted to an async:true version to solve > > rhbz#1230527. The command shows basic cancellation (this could be > > extended if needed). It could be further improved to do asynchronous > > IO writes as well. > > What is "basic cancellation"? > What extension(s) do you have in mind? It checks for cancellation in a few places, between IO. Full cancellation would allow cancelling at any time. > > What's the impact of screendump writing synchronously? It can be pretty bad, think about 4k screens. It is 33177600 bytes, written in PPM format, blocking the main loop. QMP operations doing large IO (dumps), or blocking on events, could be switched to this async form without introducing user-visible change, and with minimal effort compared to jobs. -- Marc-André Lureau
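For reference, the 33177600 figure quoted above is what a 3840x2160 frame occupies at 4 bytes per pixel. The sketch below just redoes that arithmetic. The resolution and pixel sizes are assumptions chosen to match the number in the mail, and pixel_bytes() is a hypothetical helper, not a QEMU function.

```c
#include <stdint.h>

/* Bytes needed for width x height pixels at bytes_per_pixel each. */
static uint64_t pixel_bytes(uint64_t width, uint64_t height,
                            uint64_t bytes_per_pixel)
{
    return width * height * bytes_per_pixel;
}

/* Assumed "4k" geometry: 3840 x 2160 = 8294400 pixels. */
enum { UHD_WIDTH = 3840, UHD_HEIGHT = 2160 };
```

pixel_bytes(UHD_WIDTH, UHD_HEIGHT, 4) gives 33177600, the size of a 32-bit framebuffer. A binary PPM stores 3 bytes per pixel with a maxval of 255, so its pixel payload would be pixel_bytes(UHD_WIDTH, UHD_HEIGHT, 3), i.e. 24883200 bytes plus a short text header. Either way it is tens of megabytes written in one go.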
Marc-André Lureau <marcandre.lureau@gmail.com> writes: > Hi > > On Tue, May 21, 2019 at 4:18 PM Markus Armbruster <armbru@redhat.com> wrote: >> >> Marc-André, before you invest your time to answer my questions below: my >> bandwidth for non-trivial QAPI features like this one is painfully >> limited. To get your QAPI conditionals in, I had to postpone other QAPI >> projects. I don't regret doing that, I'm rather pleased with how QAPI >> conditionals turned out. But I can't keep postponing all other QAPI >> projects. Because of that, this one will be slow going, at best. Sorry >> about that. > > We have different priorities, fair enough. I wish I could give you better service. But no use pretending. >> Marc-André Lureau <marcandre.lureau@redhat.com> writes: >> >> > Hi, >> > >> > HMP and QMP commands are handled synchronously in qemu today. But >> > there are benefits allowing the command handler to re-enter the main >> > loop if the command cannot be handled synchronously, or if it is >> > long-lasting. Some bugs such as rhbz#1230527 are difficult to solve >> > without it. >> > >> > The common solution is to use a pair of command+event in this case. >> >> In particular, background jobs (qapi/jobs.json). They grew out of block >> jobs, and are still used only for "blocky" things. Using them more >> widely would probably make sense. >> >> > But this approach has a number of issues: >> > - you can't "fix" an existing command: you need a new API, and ad-hoc >> > documentation for that command+signal association, and old/broken >> > command deprecation >> >> Making a synchronous command asynchronous is an incompatible change. We >> need to let the client needs opt in. How is that done in this series? > > No change visible on client side. I dropped the async command support > a while ago already, based on your recommendations. I can dig the > archive for the discussion if necessary. Not right now. 
>> > - since the reply event is broadcasted and 'id' is used for matching the >> > request, it may conflict with other clients request 'id' space >> >> Any event that does that now is broken and needs to be fixed. The >> obvious fix is to include a monitor ID with the command ID. For events >> that can only ever be useful in the context of one particular monitor, >> we could unicast to that monitor instead; see below. >> >> Corollary: this is just a fixable bug, not a fundamental advantage of >> the async feature. > > I am just pointing out today drawbacks of turning a function async by > introducing new commands and signals. And I'm just pointing out that some of today's drawbacks could also be addressed differently :) >> > - it is arguably less efficient and elegant (weird API, useless return >> > in most cases, broadcast reply, no cancelling on disconnect etc) >> >> The return value is useful for synchronously reporting failure to start >> the background task. I grant you that background tasks may exist that >> won't ever fail to start. I challenge the idea that it's most of them. >> >> Broadcast reply could be avoided by unicasting events. If I remember >> correctly, Peter Xu even posted patches some time ago. We ended up not >> using them, because we found a better solution for the problem at hand. >> My point is: this isn't a fundamental problem, it's just the way we >> coded things up. >> >> What do you mean by "no cancelling on disconnect"? > > When the client disconnects, the background task keeps running, and > there is no simple way to know about that event afaik. My proposal has > a simple API for that (see "qmp: add qmp_return_is_cancelled()" > patch). Auto-cancellation on client disconnect may be exactly what's wanted for simple use cases. Jobs are designed with more use cases in mind. Consider a backup job that takes some time. We certainly don't want to cancel it just because the management application hiccups and disconnects. 
Instead, we want to permit the management application to recover, reconnect, find the backup job, examine its state, and resume managing it. To support this, jobs have a unique ID. Job cancellation is explicit. Jobs could acquire an "auto-cancel on disconnect" feature if there's a need. I'm not sure how asynchronous commands could support reconnect and resume. >> I'm ignoring "etc" unless you expand it into something specific. >> >> I'm also not taking the "weird" bait :) >> > The following series implements an async command solution instead. By >> > introducing a session context and a command return handler, it can: >> > - defer the return, allowing the mainloop to reenter >> > - return only to the caller (instead of broadcast events for reply) >> > - optionnally allow cancellation when the client is gone >> > - track on-going qapi command(s) per client/session >> > >> > and without introduction of new QMP APIs or client visible change. >> >> What do async commands provide that jobs lack? >> >> Why do we want both? > > They are different things, last we discussed it: jobs are geared > toward block device operations, Historical accident. We've discussed using them for non-blocky stuff, such as migration. Of course, discussions are cheap, code is what counts. > and do not provide simple qmp-level > facilities that I listed above. What I introduce is a way for an > *existing* QMP command to be splitted, so it can re-enter the main > loop sanely (and not by introducing new commands or signals or making > things unnecessarily more complicated). > > My proposal is fairly small: > 27 files changed, 877 insertions(+), 260 deletions(-) > > Including test, and the qxl screendump fix, which account for about > 1/3 of the series. > >> I started to write a feature-by-feature comparison, but realized I don't >> have the time to figure out either jobs or async from their (rather >> sparse) documentation, let alone from code. 
>> >> > Existing qemu commands can be gradually replaced by async:true >> > variants when needed, while carefully reviewing the concurrency >> > aspects. The async:true commands marshaller helpers are splitted in >> > half, the calling and return functions. The command is called with a >> > QmpReturn context, that can return immediately or later, using the >> > generated return helper, which allows for a step-by-step conversion. >> > >> > The screendump command is converted to an async:true version to solve >> > rhbz#1230527. The command shows basic cancellation (this could be >> > extended if needed). It could be further improved to do asynchronous >> > IO writes as well. >> >> What is "basic cancellation"? >> What extension(s) do you have in mind? > > It checks for cancellation in a few places, between IO. Full > cancellation would allow to cancel at any time. > >> >> What's the impact of screendump writing synchronously? > > It can be pretty bad, think about 4k screens. It is 33177600 bytes, > written in PPM format, blocking the main loop.. My question was specifically about "could be further improved to do asynchronous IO writes as well". What's the impact of not having this improvement? I *guess* it means that even with the asynchronous command, the synchronous writes still block "something", but I'm not sure what "something" may be, and how it could impact behavior. Hence my question. > QMP operation doing large IO (dumps), or blocking on events, could be > switched to this async form without introducing user-visible change, Letting the next QMP command start before the current one is done is a user-visible change. We can discuss whether the change is harmless. > and with minimal effort compared to jobs. To gauge the difference in effort, we'd need actual code to compare.
Hi On Thu, May 23, 2019 at 9:52 AM Markus Armbruster <armbru@redhat.com> wrote: > I'm not sure how asynchronous commands could support reconnect and > resume. The same way as current commands, including job commands. > > >> I'm ignoring "etc" unless you expand it into something specific. > >> > >> I'm also not taking the "weird" bait :) > >> > The following series implements an async command solution instead. By > >> > introducing a session context and a command return handler, it can: > >> > - defer the return, allowing the mainloop to reenter > >> > - return only to the caller (instead of broadcast events for reply) > >> > - optionnally allow cancellation when the client is gone > >> > - track on-going qapi command(s) per client/session > >> > > >> > and without introduction of new QMP APIs or client visible change. > >> > >> What do async commands provide that jobs lack? > >> > >> Why do we want both? > > > > They are different things, last we discussed it: jobs are geared > > toward block device operations, > > Historical accident. We've discussed using them for non-blocky stuff, > such as migration. Of course, discussions are cheap, code is what > counts. Using job API means providing new (& more complex) APIs to client. The screendump fix here doesn't need new API, it needs new internal dispatch of QMP commands: the purpose of this series. Whenever we can solve things on qemu side, I would rather not deprecate current API. > > and do not provide simple qmp-level > > facilities that I listed above. What I introduce is a way for an > > *existing* QMP command to be splitted, so it can re-enter the main > > loop sanely (and not by introducing new commands or signals or making > > things unnecessarily more complicated). > > > > My proposal is fairly small: > > 27 files changed, 877 insertions(+), 260 deletions(-) > > > > Including test, and the qxl screendump fix, which account for about > > 1/3 of the series. 
> > > >> I started to write a feature-by-feature comparison, but realized I don't > >> have the time to figure out either jobs or async from their (rather > >> sparse) documentation, let alone from code. > >> > >> > Existing qemu commands can be gradually replaced by async:true > >> > variants when needed, while carefully reviewing the concurrency > >> > aspects. The async:true commands marshaller helpers are splitted in > >> > half, the calling and return functions. The command is called with a > >> > QmpReturn context, that can return immediately or later, using the > >> > generated return helper, which allows for a step-by-step conversion. > >> > > >> > The screendump command is converted to an async:true version to solve > >> > rhbz#1230527. The command shows basic cancellation (this could be > >> > extended if needed). It could be further improved to do asynchronous > >> > IO writes as well. > >> > >> What is "basic cancellation"? > >> What extension(s) do you have in mind? > > > > It checks for cancellation in a few places, between IO. Full > > cancellation would allow to cancel at any time. > > > >> > >> What's the impact of screendump writing synchronously? > > > > It can be pretty bad, think about 4k screens. It is 33177600 bytes, > > written in PPM format, blocking the main loop.. > > My question was specifically about "could be further improved to do > asynchronous IO writes as well". What's the impact of not having this > improvement? I *guess* it means that even with the asynchronous > command, the synchronous writes still block "something", but I'm not > sure what "something" may be, and how it could impact behavior. Hence > my question. It blocks many things since the BQL is taken. The goal is not to improve responsiveness at this point, but to fix the QXL screendump bug, by introducing a split dispatch in QMP commands: callback for starting, and a separate return function. This is not rocket science. See below. 
> > > QMP operation doing large IO (dumps), or blocking on events, could be > > switched to this async form without introducing user-visible change, > > Letting the next QMP command start before the current one is done is a > user-visible change. We can discuss whether the change is harmless. Agree, from cover letter: Existing qemu commands can be gradually replaced by async:true variants when needed, while carefully reviewing the concurrency aspects. > > > and with minimal effort compared to jobs. > > To gauge the difference in effort, we'd need actual code to compare. It's a no-go to me, you don't want to teach all users out there a new job API for existing commands when you can improve or fix things in QEMU. The QEMU change for the command can't really be simpler than what I propose. You go from:

    qmp_foo() {
        // do foo synchronously and return something
    }

to:

    qmp_foo_async(QmpReturn *r) {
        // do foo asynchronously (or return synchronously)
    }

    foo_done() {
        qmp_foo_async_return(r, something)
    }

See "scripts: learn 'async' qapi commands" for the details.

thanks

-- 
Marc-André Lureau
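A compilable toy version of the qmp_foo() split described above, for readers who want to see the control flow end to end. QmpReturn here is a mock struct, qmp_foo_async_return() is a hand-written stand-in for the generated return helper, and pending_foo is a hypothetical place to park the context between the two halves. None of this is the series' actual code; see the "scripts: learn 'async' qapi commands" patch for that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock per-command return context; NOT QEMU's real QmpReturn. */
typedef struct QmpReturn {
    bool done;   /* set once the reply has been delivered */
    int value;   /* reply payload (a plain int in this toy example) */
} QmpReturn;

/* Stand-in for the generated return helper: delivers the reply once. */
static void qmp_foo_async_return(QmpReturn *r, int something)
{
    assert(!r->done);   /* a command must not return twice */
    r->value = something;
    r->done = true;
}

/* Hypothetical storage for the context while the work is in flight. */
static QmpReturn *pending_foo;

/* Start half: remember the context and go back to the main loop. */
static void qmp_foo_async(QmpReturn *r)
{
    pending_foo = r;
}

/* Completion half: runs later, e.g. from a main-loop callback. */
static void foo_done(int something)
{
    QmpReturn *r = pending_foo;

    pending_foo = NULL;
    qmp_foo_async_return(r, something);
}
```

After qmp_foo_async(&r) the context is still marked as not done, so the caller is free to run other work; only the later foo_done(42) fills in the reply. A handler that can answer immediately would simply call the return helper from inside the start half.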
Marc-André Lureau <marcandre.lureau@gmail.com> writes: > Hi > > On Thu, May 23, 2019 at 9:52 AM Markus Armbruster <armbru@redhat.com> wrote: >> I'm not sure how asynchronous commands could support reconnect and >> resume. > > The same way as current commands, including job commands. Consider the following scenario: a management application such as libvirt starts a long-running task with the intent to monitor it until it finishes. Half-way through, the management application needs to disconnect and reconnect for some reason (systemctl restart, or crash & recover, or whatever). If the long-running task is a job, the management application can resume after reconnect: the job's ID is as valid as it was before, and the commands to query and control the job work as before. What if it's an asynchronous command? >> >> I'm ignoring "etc" unless you expand it into something specific. >> >> >> >> I'm also not taking the "weird" bait :) >> >> > The following series implements an async command solution instead. By >> >> > introducing a session context and a command return handler, it can: >> >> > - defer the return, allowing the mainloop to reenter >> >> > - return only to the caller (instead of broadcast events for reply) >> >> > - optionnally allow cancellation when the client is gone >> >> > - track on-going qapi command(s) per client/session >> >> > >> >> > and without introduction of new QMP APIs or client visible change. >> >> >> >> What do async commands provide that jobs lack? >> >> >> >> Why do we want both? >> > >> > They are different things, last we discussed it: jobs are geared >> > toward block device operations, >> >> Historical accident. We've discussed using them for non-blocky stuff, >> such as migration. Of course, discussions are cheap, code is what >> counts. > > Using job API means providing new (& more complex) APIs to client. > > The screendump fix here doesn't need new API, it needs new internal > dispatch of QMP commands: the purpose of this series. 
> > Whenever we can solve things on qemu side, I would rather not > deprecate current API. Making a synchronous command asynchronous definitely changes API. You could still argue the change is easier to handle for QMP clients than a replacement by a job. [...]
On Mon, May 27, 2019 at 10:18:42AM +0200, Markus Armbruster wrote: > Marc-André Lureau <marcandre.lureau@gmail.com> writes: > > > Hi > > > > On Thu, May 23, 2019 at 9:52 AM Markus Armbruster <armbru@redhat.com> wrote: > >> I'm not sure how asynchronous commands could support reconnect and > >> resume. > > > > The same way as current commands, including job commands. > > Consider the following scenario: a management application such as > libvirt starts a long-running task with the intent to monitor it until > it finishes. Half-way through, the management application needs to > disconnect and reconnect for some reason (systemctl restart, or crash & > recover, or whatever). > > If the long-running task is a job, the management application can resume > after reconnect: the job's ID is as valid as it was before, and the > commands to query and control the job work as before. > > What if it's and asynchronous command? This is not meant for some long-running job which you have to manage. Allowing commands to be asynchronous makes sense for things which (a) typically don't take long, and (b) don't need any management. So, if the connection goes down the job is simply canceled, and after reconnecting the management can simply send the same command again. > > Whenever we can solve things on qemu side, I would rather not > > deprecate current API. > > Making a synchronous command asynchronous definitely changes API. Inside qemu yes, sure. But for the QMP client nothing changes. cheers, Gerd
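To make "simply canceled" concrete: earlier in the thread the screendump conversion is described as checking for cancellation in a few places, between IO. A minimal sketch of that pattern follows. The series does have a patch named "qmp: add qmp_return_is_cancelled()", but the signature used here is a guess, QmpReturn is a mock, and session_client_gone(), dump_in_chunks() and the demo writer are hypothetical names made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Mock return context; the real QmpReturn from the series is not shown. */
typedef struct QmpReturn {
    bool cancelled;   /* set when the client session goes away */
} QmpReturn;

/* Assumed signature, modelled on the patch title only. */
static bool qmp_return_is_cancelled(const QmpReturn *r)
{
    return r->cancelled;
}

/* Hypothetical hook: what a session might do on client disconnect. */
static void session_client_gone(QmpReturn *r)
{
    r->cancelled = true;
}

/* Callback standing in for the real IO of one chunk. */
typedef void (*ChunkWriter)(size_t offset, size_t len, void *opaque);

/*
 * Write `total` bytes in pieces of at most `chunk` bytes (chunk > 0),
 * checking for cancellation between pieces. Returns the bytes written.
 */
static size_t dump_in_chunks(QmpReturn *r, size_t total, size_t chunk,
                             ChunkWriter write_chunk, void *opaque)
{
    size_t done = 0;

    while (done < total) {
        if (qmp_return_is_cancelled(r)) {
            break;   /* client is gone: stop early, nobody wants the reply */
        }
        size_t len = total - done < chunk ? total - done : chunk;
        write_chunk(done, len, opaque);
        done += len;
    }
    return done;
}

/* Demo writer: counts chunks and simulates a disconnect after N chunks. */
typedef struct DemoState {
    QmpReturn *ret;
    size_t chunks;
    size_t cancel_after;   /* 0 means never cancel */
} DemoState;

static void demo_write(size_t offset, size_t len, void *opaque)
{
    DemoState *s = opaque;

    (void)offset;
    (void)len;
    s->chunks++;
    if (s->cancel_after && s->chunks == s->cancel_after) {
        session_client_gone(s->ret);
    }
}
```

This is cooperative cancellation: an in-progress chunk always finishes, and the handler only notices the disconnect at the next check. That matches the "basic" versus "full" cancellation distinction drawn earlier in the thread.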
Gerd Hoffmann <kraxel@redhat.com> writes:

> On Mon, May 27, 2019 at 10:18:42AM +0200, Markus Armbruster wrote:
>> Marc-André Lureau <marcandre.lureau@gmail.com> writes:
>>
>> > Hi
>> >
>> > On Thu, May 23, 2019 at 9:52 AM Markus Armbruster <armbru@redhat.com> wrote:
>> >> I'm not sure how asynchronous commands could support reconnect and
>> >> resume.
>> >
>> > The same way as current commands, including job commands.
>>
>> Consider the following scenario: a management application such as
>> libvirt starts a long-running task with the intent to monitor it until
>> it finishes. Half-way through, the management application needs to
>> disconnect and reconnect for some reason (systemctl restart, or crash &
>> recover, or whatever).
>>
>> If the long-running task is a job, the management application can resume
>> after reconnect: the job's ID is as valid as it was before, and the
>> commands to query and control the job work as before.
>>
>> What if it's an asynchronous command?
>
> This is not meant for some long-running job which you have to manage.
>
> Allowing commands to be asynchronous makes sense for things which (a)
> typically don't take long, and (b) don't need any management.
>
> So, if the connection goes down the job is simply canceled, and after
> reconnecting the management can simply send the same command again.

Is this worth its own infrastructure?

Would you hazard a guess on how many commands can take long enough to
demand a conversion to asynchronous, yet not need any management?

>> > Whenever we can solve things on qemu side, I would rather not
>> > deprecate current API.
>>
>> Making a synchronous command asynchronous definitely changes API.
>
> Inside qemu yes, sure. But for the QMP client nothing changes.

Command replies can arrive out of order, can't they?
Hi

On Mon, May 27, 2019 at 3:23 PM Markus Armbruster <armbru@redhat.com> wrote:
>
> Gerd Hoffmann <kraxel@redhat.com> writes:
>
> > On Mon, May 27, 2019 at 10:18:42AM +0200, Markus Armbruster wrote:
> >> Marc-André Lureau <marcandre.lureau@gmail.com> writes:
> >>
> >> > Hi
> >> >
> >> > On Thu, May 23, 2019 at 9:52 AM Markus Armbruster <armbru@redhat.com> wrote:
> >> >> I'm not sure how asynchronous commands could support reconnect and
> >> >> resume.
> >> >
> >> > The same way as current commands, including job commands.
> >>
> >> Consider the following scenario: a management application such as
> >> libvirt starts a long-running task with the intent to monitor it until
> >> it finishes. Half-way through, the management application needs to
> >> disconnect and reconnect for some reason (systemctl restart, or crash &
> >> recover, or whatever).
> >>
> >> If the long-running task is a job, the management application can resume
> >> after reconnect: the job's ID is as valid as it was before, and the
> >> commands to query and control the job work as before.
> >>
> >> What if it's an asynchronous command?
> >
> > This is not meant for some long-running job which you have to manage.
> >
> > Allowing commands to be asynchronous makes sense for things which (a)
> > typically don't take long, and (b) don't need any management.
> >
> > So, if the connection goes down the job is simply canceled, and after
> > reconnecting the management can simply send the same command again.
>
> Is this worth its own infrastructure?

Yes, not having to change/break the client side API is worth some effort.

> Would you hazard a guess on how many commands can take long enough to
> demand a conversion to asynchronous, yet not need any management?

Some of the currently synchronous commands that are doing some
substantial task (many of them are not simply reading values from
memory) could be gradually converted, as needed.

> >> > Whenever we can solve things on qemu side, I would rather not
> >> > deprecate current API.
> >>
> >> Making a synchronous command asynchronous definitely changes API.
> >
> > Inside qemu yes, sure. But for the QMP client nothing changes.
>
> Command replies can arrive out of order, can't they?

They are returned in order, see "QmpSession: return orderly".

--
Marc-André Lureau
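For readers wondering how replies can stay in order when handlers finish at different times, here is a minimal Python sketch of one plausible reading of the "QmpSession: return orderly" patch title. The actual patch may work differently; `OrderedSession`, `dispatch` and `complete` are hypothetical names. The idea assumed here is that the session keeps pending commands in arrival order and only flushes replies from the head of that queue.

```python
from collections import deque


class OrderedSession:
    """Toy session that delivers replies in command-arrival order."""

    def __init__(self):
        self.pending = deque()  # commands in the order they were received
        self.sent = []          # replies actually delivered to the client

    def dispatch(self, cmd_id):
        # Record the command; the handler may finish now or later.
        entry = {"id": cmd_id, "reply": None, "done": False}
        self.pending.append(entry)
        return entry

    def complete(self, entry, reply):
        # Mark this command finished, then flush every finished command
        # sitting at the head of the queue. A finished command behind an
        # unfinished one has to wait.
        entry["reply"] = reply
        entry["done"] = True
        while self.pending and self.pending[0]["done"]:
            head = self.pending.popleft()
            self.sent.append((head["id"], head["reply"]))
```

With two commands in flight, completing the second one first delivers nothing; both replies go out, first then second, once the first completes. The cost of this choice is that one slow command delays the replies of everything queued behind it.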
Hi,

> > This is not meant for some long-running job which you have to manage.
> >
> > Allowing commands to be asynchronous makes sense for things which (a)
> > typically don't take long, and (b) don't need any management.
> >
> > So, if the connection goes down the job is simply canceled, and after
> > reconnecting the management can simply send the same command again.
>
> Is this worth its own infrastructure?
>
> Would you hazard a guess on how many commands can take long enough to
> demand a conversion to asynchronous, yet not need any management?

Required: screendump with qxl (needs a round-trip to the spice-server
display thread for fully up-to-date screen content, due to lazy
rendering).

Nice to have: Move anything which needs more than a millisecond to a
thread or coroutine, so we avoid monitor commands causing guest-visible
latency spikes due to holding the big qemu lock for too long.

From a quick scan through the monitor help, hot candidates are
screendump and pmemsave because they might write rather large data
files.

Dunno about savevm/loadvm. I think they stop the guest anyway. So
moving them to async probably doesn't buy us much, at least from a
latency point of view.

cheers,
  Gerd
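The "nice to have" point can be illustrated with a small Python sketch. This is only an analogy, not how QEMU does it: `big_lock` stands in for the big qemu lock, and `slow_write` and `handle_dump` are hypothetical names. The handler takes the lock just long enough to snapshot the data, then does the slow part in a worker thread and reports completion through a callback.

```python
import threading

big_lock = threading.Lock()  # stand-in for the big qemu lock (assumption)


def slow_write(data):
    # Placeholder for a slow operation such as writing a large file.
    return len(data)


def handle_dump(data, on_done):
    """Snapshot under the lock, then do the slow work outside it."""
    with big_lock:
        snapshot = bytes(data)  # short critical section: copy only

    def worker():
        # Runs without holding big_lock, so other work is not stalled.
        on_done(slow_write(snapshot))

    thread = threading.Thread(target=worker)
    thread.start()
    return thread
```

A caller would pass a completion callback and carry on; in the series under discussion, the analogous step would be the deferred command return rather than a Python callback.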
On 27.05.2019 at 15:23, Markus Armbruster wrote:
> Gerd Hoffmann <kraxel@redhat.com> writes:
>
> > On Mon, May 27, 2019 at 10:18:42AM +0200, Markus Armbruster wrote:
> >> Marc-André Lureau <marcandre.lureau@gmail.com> writes:
> >>
> >> > Hi
> >> >
> >> > On Thu, May 23, 2019 at 9:52 AM Markus Armbruster <armbru@redhat.com> wrote:
> >> >> I'm not sure how asynchronous commands could support reconnect and
> >> >> resume.
> >> >
> >> > The same way as current commands, including job commands.
> >>
> >> Consider the following scenario: a management application such as
> >> libvirt starts a long-running task with the intent to monitor it until
> >> it finishes. Half-way through, the management application needs to
> >> disconnect and reconnect for some reason (systemctl restart, or crash &
> >> recover, or whatever).
> >>
> >> If the long-running task is a job, the management application can resume
> >> after reconnect: the job's ID is as valid as it was before, and the
> >> commands to query and control the job work as before.
> >>
> >> What if it's an asynchronous command?
> >
> > This is not meant for some long-running job which you have to manage.
> >
> > Allowing commands to be asynchronous makes sense for things which (a)
> > typically don't take long, and (b) don't need any management.
> >
> > So, if the connection goes down the job is simply canceled, and after
> > reconnecting the management can simply send the same command again.
>
> Is this worth its own infrastructure?
>
> Would you hazard a guess on how many commands can take long enough to
> demand a conversion to asynchronous, yet not need any management?

Candidates are any commands that perform I/O. You don't want to hold the
BQL while doing I/O. Probably most block layer commands fall into this
category.

In fact, even the commands to start a block job could probably make use
of this infrastructure because they typically do some I/O before
returning success for starting the job.

> >> > Whenever we can solve things on qemu side, I would rather not
> >> > deprecate current API.
> >>
> >> Making a synchronous command asynchronous definitely changes API.
> >
> > Inside qemu yes, sure. But for the QMP client nothing changes.
>
> Command replies can arrive out of order, can't they?

My understanding is that this is just an internal change and commands
still aren't processed in parallel.

Kevin
On 5/21/19 10:17 AM, Markus Armbruster wrote:
> Marc-André, before you invest your time to answer my questions below: my
> bandwidth for non-trivial QAPI features like this one is painfully
> limited. To get your QAPI conditionals in, I had to postpone other QAPI
> projects. I don't regret doing that, I'm rather pleased with how QAPI
> conditionals turned out. But I can't keep postponing all other QAPI
> projects. Because of that, this one will be slow going, at best. Sorry
> about that.
>
> Marc-André Lureau <marcandre.lureau@redhat.com> writes:
>
>> Hi,
>>
>> HMP and QMP commands are handled synchronously in qemu today. But
>> there are benefits allowing the command handler to re-enter the main
>> loop if the command cannot be handled synchronously, or if it is
>> long-lasting. Some bugs such as rhbz#1230527 are difficult to solve
>> without it.
>>
>> The common solution is to use a pair of command+event in this case.
>
> In particular, background jobs (qapi/jobs.json). They grew out of block
> jobs, and are still used only for "blocky" things. Using them more
> widely would probably make sense.
>
>> But this approach has a number of issues:
>> - you can't "fix" an existing command: you need a new API, and ad-hoc
>> documentation for that command+signal association, and old/broken
>> command deprecation
>
> Making a synchronous command asynchronous is an incompatible change. We
> need to let the client opt in. How is that done in this series?
>
>> - since the reply event is broadcast and 'id' is used for matching the
>> request, it may conflict with other clients request 'id' space
>
> Any event that does that now is broken and needs to be fixed. The
> obvious fix is to include a monitor ID with the command ID. For events
> that can only ever be useful in the context of one particular monitor,
> we could unicast to that monitor instead; see below.
>
> Corollary: this is just a fixable bug, not a fundamental advantage of
> the async feature.
>
>> - it is arguably less efficient and elegant (weird API, useless return
>> in most cases, broadcast reply, no cancelling on disconnect etc)
>
> The return value is useful for synchronously reporting failure to start
> the background task. I grant you that background tasks may exist that
> won't ever fail to start. I challenge the idea that it's most of them.
>
> Broadcast reply could be avoided by unicasting events. If I remember
> correctly, Peter Xu even posted patches some time ago. We ended up not
> using them, because we found a better solution for the problem at hand.
> My point is: this isn't a fundamental problem, it's just the way we
> coded things up.
>
> What do you mean by "no cancelling on disconnect"?
>
> I'm ignoring "etc" unless you expand it into something specific.
>
> I'm also not taking the "weird" bait :)
>
>> The following series implements an async command solution instead. By
>> introducing a session context and a command return handler, it can:
>> - defer the return, allowing the mainloop to reenter
>> - return only to the caller (instead of broadcast events for reply)
>> - optionally allow cancellation when the client is gone
>> - track on-going qapi command(s) per client/session
>>
>> and without introduction of new QMP APIs or client visible change.
>
> What do async commands provide that jobs lack?
>
> Why do we want both?
>
> I started to write a feature-by-feature comparison, but realized I don't
> have the time to figure out either jobs or async from their (rather
> sparse) documentation, let alone from code.
>

Sorry about that. I still have a todo item from you in my inbox to
improve the documentation there. I've been focusing on bitmaps
documentation lately instead, but I hope to branch out as part of my
caring a bit more about docs lately. I'll keep an eye out here.

I don't really want to prohibit things on the sole basis that they
aren't jobs, but I do want to make sure that adding a new lifecycle
paradigm for commands doesn't complicate the jobs code in accidental
ways.

I'll try to look this over too, but I have a bit of a backlog right now.

>> Existing qemu commands can be gradually replaced by async:true
>> variants when needed, while carefully reviewing the concurrency
>> aspects. The async:true commands marshaller helpers are split in
>> half, the calling and return functions. The command is called with a
>> QmpReturn context, that can return immediately or later, using the
>> generated return helper, which allows for a step-by-step conversion.
>>
>> The screendump command is converted to an async:true version to solve
>> rhbz#1230527. The command shows basic cancellation (this could be
>> extended if needed). It could be further improved to do asynchronous
>> IO writes as well.
>
> What is "basic cancellation"?
>
> What extension(s) do you have in mind?
>
> What's the impact of screendump writing synchronously?
>
On Tue, Apr 9, 2019 at 6:12 PM Marc-André Lureau
<marcandre.lureau@redhat.com> wrote:
>
> Hi,
>
> HMP and QMP commands are handled synchronously in qemu today. But
> there are benefits allowing the command handler to re-enter the main
> loop if the command cannot be handled synchronously, or if it is
> long-lasting. Some bugs such as rhbz#1230527 are difficult to solve
> without it.
>
> The common solution is to use a pair of command+event in this case.
> But this approach has a number of issues:
> - you can't "fix" an existing command: you need a new API, and ad-hoc
> documentation for that command+signal association, and old/broken
> command deprecation
> - since the reply event is broadcast and 'id' is used for matching the
> request, it may conflict with other clients request 'id' space
> - it is arguably less efficient and elegant (weird API, useless return
> in most cases, broadcast reply, no cancelling on disconnect etc)
>
> The following series implements an async command solution instead. By
> introducing a session context and a command return handler, it can:
> - defer the return, allowing the mainloop to reenter
> - return only to the caller (instead of broadcast events for reply)
> - optionally allow cancellation when the client is gone
> - track on-going qapi command(s) per client/session
>
> and without introduction of new QMP APIs or client visible change.
>
> Existing qemu commands can be gradually replaced by async:true
> variants when needed, while carefully reviewing the concurrency
> aspects. The async:true commands marshaller helpers are split in
> half, the calling and return functions. The command is called with a
> QmpReturn context, that can return immediately or later, using the
> generated return helper, which allows for a step-by-step conversion.
>
> The screendump command is converted to an async:true version to solve
> rhbz#1230527. The command shows basic cancellation (this could be
> extended if needed). It could be further improved to do asynchronous
> IO writes as well.
>
> v4:
> - rebased, mostly adapting to new OOB code
> (there was not much feedback in v3 for the async command part,
> but preliminary patches got merged!)
> - drop the RFC status

ping

>
> v3:
> - complete rework, dropping the asynchronous commands visibility from
> the protocol side entirely (until there is a real need for it)
> - rebased, with a few preliminary cleanup patches
> - teach asynchronous commands to HMP
>
> v2:
> - documentation fixes and improvements
> - fix calling async commands sync without id
> - fix bad hmp monitor assert
> - add a few extra asserts
> - add async with no-id failure and screendump test
>
> Marc-André Lureau (20):
>   qmp: constify QmpCommand and list
>   json-lexer: make it safe to call destroy multiple times
>   qmp: add QmpSession
>   QmpSession: add a return callback
>   QmpSession: add json parser and use it in qga
>   monitor: use qmp session to parse json feed
>   qga: simplify dispatch_return_cb
>   QmpSession: introduce QmpReturn
>   qmp: simplify qmp_return_error()
>   QmpSession: keep a queue of pending commands
>   QmpSession: return orderly
>   qmp: introduce asynchronous command type
>   scripts: learn 'async' qapi commands
>   qmp: add qmp_return_is_cancelled()
>   monitor: add qmp_return_get_monitor()
>   console: add graphic_hw_update_done()
>   console: make screendump asynchronous
>   monitor: start making qmp_human_monitor_command() asynchronous
>   monitor: teach HMP about asynchronous commands
>   hmp: call the asynchronous QMP screendump to fix outdated/glitches
>
>  qapi/misc.json                          |   3 +-
>  qapi/ui.json                            |   3 +-
>  scripts/qapi/commands.py                | 151 ++++++++++++++---
>  scripts/qapi/common.py                  |  15 +-
>  scripts/qapi/doc.py                     |   3 +-
>  scripts/qapi/introspect.py              |   3 +-
>  hmp.h                                   |   3 +-
>  include/monitor/monitor.h               |   3 +
>  include/qapi/qmp/dispatch.h             |  89 +++++++++-
>  include/qapi/qmp/json-parser.h          |   7 +-
>  include/ui/console.h                    |   5 +
>  hmp.c                                   |   6 +-
>  hw/display/qxl-render.c                 |   9 +-
>  hw/display/qxl.c                        |   1 +
>  monitor.c                               | 198 ++++++++++++++--------
>  qapi/qmp-dispatch.c                     | 214 +++++++++++++++++++-----
>  qapi/qmp-registry.c                     |  33 +++-
>  qga/commands.c                          |   2 +-
>  qga/main.c                              |  51 ++----
>  qobject/json-lexer.c                    |   5 +-
>  qobject/json-streamer.c                 |   3 +-
>  tests/test-qmp-cmds.c                   | 206 +++++++++++++++++++----
>  ui/console.c                            | 100 +++++++++--
>  hmp-commands.hx                         |   3 +-
>  tests/qapi-schema/qapi-schema-test.json |   5 +
>  tests/qapi-schema/qapi-schema-test.out  |   8 +
>  tests/qapi-schema/test-qapi.py          |   8 +-
>  27 files changed, 877 insertions(+), 260 deletions(-)
>
> --
> 2.21.0.196.g041f5ea1cf
>
>

--
Marc-André Lureau
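The quoted cover letter describes handlers that receive a QmpReturn context and may reply immediately or later. As a rough illustration only, here is a Python sketch of that calling convention. It is not the generated C code from the series: `ReturnContext`, `finish`, `sync_style_handler` and `DeferredHandler` are hypothetical names, and the cancellation flag is an assumption loosely modelled on the `qmp_return_is_cancelled()` patch title.

```python
class ReturnContext:
    """Hypothetical stand-in for the QmpReturn context in the cover letter."""

    def __init__(self, send):
        self._send = send       # callable that delivers a reply to the client
        self.cancelled = False  # set when the client has gone away (assumption)
        self.finished = False

    def finish(self, value):
        # A command must reply exactly once.
        if self.finished:
            raise RuntimeError("reply already sent")
        self.finished = True
        # Drop the reply if nobody is left to receive it.
        if not self.cancelled:
            self._send({"return": value})


def sync_style_handler(args, ret):
    # A converted handler may still reply immediately.
    ret.finish(args["x"] * 2)


class DeferredHandler:
    """Handler that parks requests until some later event fires."""

    def __init__(self):
        self.waiting = []

    def __call__(self, args, ret):
        # Return to the main loop without replying yet.
        self.waiting.append((args, ret))

    def event_fired(self):
        # Later, e.g. once a display update has completed, reply to everyone.
        for args, ret in self.waiting:
            ret.finish(args["x"] * 2)
        self.waiting.clear()
```

Because both handlers share one signature, a command could be moved from the immediate style to the deferred style without touching its callers, which is roughly what "step-by-step conversion" suggests.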
Patchew URL: https://patchew.org/QEMU/20190409161009.6322-1-marcandre.lureau@redhat.com/

Hi,

This series failed the docker-mingw@fedora build test. Please find the
testing commands and their output below. If you have Docker installed,
you can probably reproduce it locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
time make docker-test-mingw@fedora SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

The full log is available at
http://patchew.org/logs/20190409161009.6322-1-marcandre.lureau@redhat.com/testing.docker-mingw@fedora/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com