[Qemu-devel] [RFC PATCH] coroutines: generate wrapper code

Vladimir Sementsov-Ogievskiy posted 1 patch 5 years, 2 months ago
Failed in applying to current master (apply log)
There is a newer version of this series
Makefile                     |  5 ++
Makefile.objs                |  2 +-
include/block/block.h        |  3 ++
block.c                      | 77 ++---------------------------
coroutine-wrapper            |  2 +
scripts/coroutine-wrapper.py | 96 ++++++++++++++++++++++++++++++++++++
6 files changed, 110 insertions(+), 75 deletions(-)
create mode 100644 coroutine-wrapper
create mode 100755 scripts/coroutine-wrapper.py
[Qemu-devel] [RFC PATCH] coroutines: generate wrapper code
Posted by Vladimir Sementsov-Ogievskiy 5 years, 2 months ago
Hi all!

We have a very frequent pattern of wrapping a coroutine_fn function
to be called from non-coroutine context:

  - create structure to pack parameters
  - create function to call original function taking parameters from
    struct
  - create wrapper, which in case of non-coroutine context will
    create a coroutine, enter it and start poll-loop.

Here is a draft of the template code, plus an example of how it can be
used to drop a lot of similar code.

Hope someone besides me likes it :)

Hmm, it's possible to write this in pure Python without jinja2, but I
don't think that's the right way.

It's also possible to generate functions that just create the
corresponding coroutine, for the less generic cases. Such a generated
creator would look like:

Coroutine *bdrv_co_check__create_co(
    BlockDriverState *bs, BdrvCheckResult *res, BdrvCheckMode fix,
    int *_ret, bool *_in_progress);

 - same parameters as in the original bdrv_co_check(), plus out-pointers to
   set the return value and mark completion.
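
As a side note, the declaration lines the script consumes (one per wrapper,
as in the coroutine-wrapper file below) can be checked in isolation. This is
a minimal runnable sketch using the exact regexes from
scripts/coroutine-wrapper.py; the variable names are local to the sketch:

```python
# Parsing demo using the regexes from scripts/coroutine-wrapper.py.
import re

func_decl_re = re.compile(r'^(([^:]+):\s*)?([^(]+) ([a-z][a-z0-9_]*)\((.*)\)$')
param_re = re.compile(r'(.*[ *])([a-z][a-z0-9_]*)')

line = ('bdrv_check: int bdrv_co_check(BlockDriverState *bs, '
        'BdrvCheckResult *res, BdrvCheckMode fix)')
m = func_decl_re.match(line)
wrapper_name = m.group(2)    # optional "name:" prefix -> wrapper name
ret_type = m.group(3)        # 'int'
co_name = m.group(4)         # 'bdrv_co_check'
arg_names = [param_re.match(a).group(2) for a in m.group(5).split(', ')]

print(wrapper_name, ret_type, co_name, arg_names)
```

This is only a parsing demo; the real script feeds the extracted fields into
the jinja2 template.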

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 Makefile                     |  5 ++
 Makefile.objs                |  2 +-
 include/block/block.h        |  3 ++
 block.c                      | 77 ++---------------------------
 coroutine-wrapper            |  2 +
 scripts/coroutine-wrapper.py | 96 ++++++++++++++++++++++++++++++++++++
 6 files changed, 110 insertions(+), 75 deletions(-)
 create mode 100644 coroutine-wrapper
 create mode 100755 scripts/coroutine-wrapper.py

diff --git a/Makefile b/Makefile
index 1278a3eb52..8d19b06cf1 100644
--- a/Makefile
+++ b/Makefile
@@ -122,6 +122,8 @@ endif
 
 GENERATED_FILES += module_block.h
 
+GENERATED_FILES += block-gen.c
+
 TRACE_HEADERS = trace-root.h $(trace-events-subdirs:%=%/trace.h)
 TRACE_SOURCES = trace-root.c $(trace-events-subdirs:%=%/trace.c)
 TRACE_DTRACE =
@@ -138,6 +140,9 @@ GENERATED_FILES += $(TRACE_SOURCES)
 GENERATED_FILES += $(BUILD_DIR)/trace-events-all
 GENERATED_FILES += .git-submodule-status
 
+block-gen.c: coroutine-wrapper $(SRC_PATH)/scripts/coroutine-wrapper.py
+	$(call quiet-command, python $(SRC_PATH)/scripts/coroutine-wrapper.py < $< > $@,"GEN","$(TARGET_DIR)$@")
+
 trace-group-name = $(shell dirname $1 | sed -e 's/[^a-zA-Z0-9]/_/g')
 
 tracetool-y = $(SRC_PATH)/scripts/tracetool.py
diff --git a/Makefile.objs b/Makefile.objs
index 67a054b08a..16159d3e0f 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -23,7 +23,7 @@ slirp-obj-$(CONFIG_SLIRP) = slirp/
 # block-obj-y is code used by both qemu system emulation and qemu-img
 
 block-obj-y += nbd/
-block-obj-y += block.o blockjob.o job.o
+block-obj-y += block.o blockjob.o job.o block-gen.o
 block-obj-y += block/ scsi/
 block-obj-y += qemu-io-cmds.o
 block-obj-$(CONFIG_REPLICATION) += replication.o
diff --git a/include/block/block.h b/include/block/block.h
index 57233cf2c0..61b2ca6fe5 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -371,6 +371,8 @@ typedef enum {
 } BdrvCheckMode;
 
 int bdrv_check(BlockDriverState *bs, BdrvCheckResult *res, BdrvCheckMode fix);
+int coroutine_fn bdrv_co_check(BlockDriverState *bs,
+                               BdrvCheckResult *res, BdrvCheckMode fix);
 
 /* The units of offset and total_work_size may be chosen arbitrarily by the
  * block driver; total_work_size may change during the course of the amendment
@@ -399,6 +401,7 @@ int bdrv_co_ioctl(BlockDriverState *bs, int req, void *buf);
 
 /* Invalidate any cached metadata used by image formats */
 void bdrv_invalidate_cache(BlockDriverState *bs, Error **errp);
+void coroutine_fn bdrv_co_invalidate_cache(BlockDriverState *bs, Error **errp);
 void bdrv_invalidate_cache_all(Error **errp);
 int bdrv_inactivate_all(void);
 
diff --git a/block.c b/block.c
index b67d9b7b65..37ab050f0e 100644
--- a/block.c
+++ b/block.c
@@ -3706,8 +3706,8 @@ static void bdrv_delete(BlockDriverState *bs)
  * free of errors) or -errno when an internal error occurred. The results of the
  * check are stored in res.
  */
-static int coroutine_fn bdrv_co_check(BlockDriverState *bs,
-                                      BdrvCheckResult *res, BdrvCheckMode fix)
+int coroutine_fn bdrv_co_check(BlockDriverState *bs,
+                               BdrvCheckResult *res, BdrvCheckMode fix)
 {
     if (bs->drv == NULL) {
         return -ENOMEDIUM;
@@ -3720,43 +3720,6 @@ static int coroutine_fn bdrv_co_check(BlockDriverState *bs,
     return bs->drv->bdrv_co_check(bs, res, fix);
 }
 
-typedef struct CheckCo {
-    BlockDriverState *bs;
-    BdrvCheckResult *res;
-    BdrvCheckMode fix;
-    int ret;
-} CheckCo;
-
-static void bdrv_check_co_entry(void *opaque)
-{
-    CheckCo *cco = opaque;
-    cco->ret = bdrv_co_check(cco->bs, cco->res, cco->fix);
-    aio_wait_kick();
-}
-
-int bdrv_check(BlockDriverState *bs,
-               BdrvCheckResult *res, BdrvCheckMode fix)
-{
-    Coroutine *co;
-    CheckCo cco = {
-        .bs = bs,
-        .res = res,
-        .ret = -EINPROGRESS,
-        .fix = fix,
-    };
-
-    if (qemu_in_coroutine()) {
-        /* Fast-path if already in coroutine context */
-        bdrv_check_co_entry(&cco);
-    } else {
-        co = qemu_coroutine_create(bdrv_check_co_entry, &cco);
-        bdrv_coroutine_enter(bs, co);
-        BDRV_POLL_WHILE(bs, cco.ret == -EINPROGRESS);
-    }
-
-    return cco.ret;
-}
-
 /*
  * Return values:
  * 0        - success
@@ -4623,8 +4586,7 @@ void bdrv_init_with_whitelist(void)
     bdrv_init();
 }
 
-static void coroutine_fn bdrv_co_invalidate_cache(BlockDriverState *bs,
-                                                  Error **errp)
+void coroutine_fn bdrv_co_invalidate_cache(BlockDriverState *bs, Error **errp)
 {
     BdrvChild *child, *parent;
     uint64_t perm, shared_perm;
@@ -4705,39 +4667,6 @@ static void coroutine_fn bdrv_co_invalidate_cache(BlockDriverState *bs,
     }
 }
 
-typedef struct InvalidateCacheCo {
-    BlockDriverState *bs;
-    Error **errp;
-    bool done;
-} InvalidateCacheCo;
-
-static void coroutine_fn bdrv_invalidate_cache_co_entry(void *opaque)
-{
-    InvalidateCacheCo *ico = opaque;
-    bdrv_co_invalidate_cache(ico->bs, ico->errp);
-    ico->done = true;
-    aio_wait_kick();
-}
-
-void bdrv_invalidate_cache(BlockDriverState *bs, Error **errp)
-{
-    Coroutine *co;
-    InvalidateCacheCo ico = {
-        .bs = bs,
-        .done = false,
-        .errp = errp
-    };
-
-    if (qemu_in_coroutine()) {
-        /* Fast-path if already in coroutine context */
-        bdrv_invalidate_cache_co_entry(&ico);
-    } else {
-        co = qemu_coroutine_create(bdrv_invalidate_cache_co_entry, &ico);
-        bdrv_coroutine_enter(bs, co);
-        BDRV_POLL_WHILE(bs, !ico.done);
-    }
-}
-
 void bdrv_invalidate_cache_all(Error **errp)
 {
     BlockDriverState *bs;
diff --git a/coroutine-wrapper b/coroutine-wrapper
new file mode 100644
index 0000000000..9d8bc35afc
--- /dev/null
+++ b/coroutine-wrapper
@@ -0,0 +1,2 @@
+bdrv_check: int bdrv_co_check(BlockDriverState *bs, BdrvCheckResult *res, BdrvCheckMode fix)
+bdrv_invalidate_cache: void bdrv_co_invalidate_cache(BlockDriverState *bs, Error **errp)
diff --git a/scripts/coroutine-wrapper.py b/scripts/coroutine-wrapper.py
new file mode 100755
index 0000000000..16cd2d5e71
--- /dev/null
+++ b/scripts/coroutine-wrapper.py
@@ -0,0 +1,96 @@
+#!/usr/bin/env python
+
+import re
+from jinja2 import Environment
+
+env = Environment(trim_blocks=True, lstrip_blocks=True)
+
+header = """/*
+ * File is generated by scripts/coroutine-wrapper.py
+ */
+
+#include "qemu/osdep.h"
+#include "block/block_int.h"
+"""
+
+template = """
+{% set has_ret = ret_type != 'void' %}
+{% set arg_names = args|map(attribute='name')|list %}
+
+/*
+ * Wrappers for {{name}}
+ */
+
+typedef struct {{name}}__ArgumentsPack {
+{% for arg in args %}
+    {{arg['full']}};
+{% endfor %}
+
+{% if has_ret %}
+    {{ret_type}} _ret;
+{% endif %}
+    bool _in_progress;
+} {{name}}__ArgumentsPack;
+
+static void {{name}}__entry(void *opaque)
+{
+    {{name}}__ArgumentsPack *pack = opaque;
+    {%+ if has_ret %}pack->_ret = {% endif %}{{name}}(pack->{{ arg_names|join(', pack->') }});
+    pack->_in_progress = false;
+    aio_wait_kick();
+}
+
+{{ret_type}} {{bdrv_poll_name}}({{args|join(', ', attribute='full')}})
+{
+    if (qemu_in_coroutine()) {
+        /* Fast-path if already in coroutine context */
+        {%+ if has_ret %}return {% endif %}{{name}}({{ arg_names|join(', ') }});
+    } else {
+        {{name}}__ArgumentsPack pack = {
+        {% for arg in args %}
+            .{{arg['name']}} = {{arg['name']}},
+        {% endfor %}
+
+            ._in_progress = true,
+        };
+        Coroutine *co = qemu_coroutine_create({{name}}__entry, &pack);
+        bdrv_coroutine_enter(bs, co);
+        BDRV_POLL_WHILE(bs, pack._in_progress);
+    {% if has_ret %}
+
+        return pack._ret;
+    {% endif %}
+    }
+}
+"""
+
+func_decl_re = re.compile(r'^(([^:]+):\s*)?([^(]+) ([a-z][a-z0-9_]*)\((.*)\)$')
+param_re = re.compile(r'(.*[ *])([a-z][a-z0-9_]*)')
+
+def make_wrapper(function_declaration):
+    params = {}
+
+    try:
+        m = func_decl_re.match(function_declaration)
+        params['ret_type'] = m.group(3)
+        params['name'] = m.group(4)
+        params['bdrv_poll_name'] = m.group(2) or params['name'] + '__bdrv_poll'
+        raw_args = m.group(5).split(', ')
+        args = []
+        for raw_arg in raw_args:
+            arg = param_re.match(raw_arg).group(0, 1, 2)
+            args.append({'full': arg[0], 'type': arg[1], 'name': arg[2]})
+
+        params['args'] = args
+    except AttributeError:
+        raise SystemExit('Failed to parse function declaration')
+
+    return env.from_string(template).render(**params)
+
+
+if __name__ == '__main__':
+    import sys
+
+    print(header)
+    for line in sys.stdin:
+        print(make_wrapper(line))
-- 
2.18.0
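
To get a feel for the generated output without building QEMU (or installing
jinja2), the core of the template above can be approximated with plain
str.format. This is a simplified sketch of the pack struct and entry
function only, not the script's real output:

```python
# Hand-rolled stand-in for the jinja2 template, for illustration only.
name = 'bdrv_co_check'
ret_type = 'int'
args = [('BlockDriverState *bs', 'bs'),
        ('BdrvCheckResult *res', 'res'),
        ('BdrvCheckMode fix', 'fix')]

pack_fields = '\n'.join('    {};'.format(full) for full, _ in args)
call_args = ', '.join('pack->' + n for _, n in args)

code = """typedef struct {name}__ArgumentsPack {{
{fields}
    {ret} _ret;
    bool _in_progress;
}} {name}__ArgumentsPack;

static void {name}__entry(void *opaque)
{{
    {name}__ArgumentsPack *pack = opaque;
    pack->_ret = {name}({call});
    pack->_in_progress = false;
    aio_wait_kick();
}}
""".format(name=name, fields=pack_fields, ret=ret_type, call=call_args)

print(code)
```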


Re: [Qemu-devel] [RFC PATCH] coroutines: generate wrapper code
Posted by Vladimir Sementsov-Ogievskiy 5 years, 2 months ago
attached generated file

-- 
Best regards,
Vladimir
Re: [Qemu-devel] [Qemu-block] [RFC PATCH] coroutines: generate wrapper code
Posted by Stefan Hajnoczi 5 years, 2 months ago
On Fri, Feb 08, 2019 at 05:11:22PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
> 
> We have a very frequent pattern of wrapping a coroutine_fn function
> to be called from non-coroutine context:
> 
>   - create structure to pack parameters
>   - create function to call original function taking parameters from
>     struct
>   - create wrapper, which in case of non-coroutine context will
>     create a coroutine, enter it and start poll-loop.
>
> Here is a draft of template code + example how it can be used to drop a
> lot of similar code.
> 
> Hope someone like it except me)

My 2 cents.  Cons:

 * Synchronous poll loops are an anti-pattern.  They block all of QEMU
   with the big mutex held.  Making them easier to write is
   questionable because we should aim to have as few of these as
   possible.

 * Code generation makes the code easier to write but harder to read.
   Code is read more than written.  In this case I think open coding
   isn't too bad and I prefer it to reading a code generation script to
   understand how it works.

If we were planning to add lots more of these then I agree code
generation would help.  But in this case I'd rather not.
Re: [Qemu-devel] [Qemu-block] [RFC PATCH] coroutines: generate wrapper code
Posted by Vladimir Sementsov-Ogievskiy 5 years, 2 months ago
11.02.2019 6:42, Stefan Hajnoczi wrote:
> On Fri, Feb 08, 2019 at 05:11:22PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> Hi all!
>>
>> We have a very frequent pattern of wrapping a coroutine_fn function
>> to be called from non-coroutine context:
>>
>>    - create structure to pack parameters
>>    - create function to call original function taking parameters from
>>      struct
>>    - create wrapper, which in case of non-coroutine context will
>>      create a coroutine, enter it and start poll-loop.
>>
>> Here is a draft of template code + example how it can be used to drop a
>> lot of similar code.
>>
>> Hope someone like it except me)
> 
> My 2 cents.  Cons:
> 
>   * Synchronous poll loops are an anti-pattern.  They block all of QEMU
>     with the big mutex held.  Making them easier to write is
>     questionable because we should aim to have as few of these as
>     possible.

Understood. Do we have a concept, or some kind of target for the future, to
get rid of all these poll-loops? What is the right way, at least for the
block layer?

> 
>   * Code generation makes the code easier to write but harder to read.
>     Code is read more than written.  In this case I think open coding
>     isn't too bad and I prefer it to reading a code generation script to
>     understand how it works.

But you can read the generated code in the same way. You only need to read
the generator script if it generates something wrong, which should be rare.

> 
> If we were planning to add lots more of these then I agree code
> generation would help.  But in this case I'd rather not.
> 

What do you think of at least generating the code that creates a coroutine
from a function with multiple arguments?

-- 
Best regards,
Vladimir

Re: [Qemu-devel] [Qemu-block] [RFC PATCH] coroutines: generate wrapper code
Posted by Stefan Hajnoczi 5 years, 2 months ago
On Mon, Feb 11, 2019 at 09:38:37AM +0000, Vladimir Sementsov-Ogievskiy wrote:
> 11.02.2019 6:42, Stefan Hajnoczi wrote:
> > On Fri, Feb 08, 2019 at 05:11:22PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> >> Hi all!
> >>
> >> We have a very frequent pattern of wrapping a coroutine_fn function
> >> to be called from non-coroutine context:
> >>
> >>    - create structure to pack parameters
> >>    - create function to call original function taking parameters from
> >>      struct
> >>    - create wrapper, which in case of non-coroutine context will
> >>      create a coroutine, enter it and start poll-loop.
> >>
> >> Here is a draft of template code + example how it can be used to drop a
> >> lot of similar code.
> >>
> >> Hope someone like it except me)
> > 
> > My 2 cents.  Cons:
> > 
> >   * Synchronous poll loops are an anti-pattern.  They block all of QEMU
> >     with the big mutex held.  Making them easier to write is
> >     questionable because we should aim to have as few of these as
> >     possible.
> 
> Understand. Do we have a concept or a kind of target for a future to get rid of
> these a lot of poll-loops? What is the right way? At least for block-layer?

It's non-trivial.  The nested event loop could be flattened if there was
a mechanism to stop further activity on a specific object only (e.g.
BlockDriverState).  That way the event loop can continue processing
events for other objects and device emulation could continue for other
objects.

Unfortunately there are interactions between objects like in block jobs
that act on multiple BDSes, so it becomes even trickier.

A simple way of imagining this is to make each object an "actor"
coroutine.  The coroutine processes a single message (request) at a time
and yields when it needs to wait.  Callers send messages and expect
asynchronous responses.  This model is bad for efficiency (parallelism
is necessary) but at least it offers a sane way of thinking about
multiple asynchronous components coordinating together.  (It's another
way of saying, let's put everything into coroutines.)

The advantage of a flat event loop is that a hang in one object (e.g.
I/O getting stuck in one file) doesn't freeze the entire event loop.
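
To make the actor idea concrete, here is a toy sketch in plain Python
asyncio (nothing to do with QEMU's real coroutine API): the actor serves
one message at a time, and the caller awaits an asynchronous reply instead
of spinning a nested poll loop that blocks everything else.

```python
# Toy actor model: one coroutine per object, message-at-a-time processing.
import asyncio

async def actor(inbox):
    while True:
        request, reply = await inbox.get()
        if request is None:            # shutdown message
            return
        reply.set_result(request * 2)  # "do the work", answer asynchronously

async def main():
    inbox = asyncio.Queue()
    task = asyncio.ensure_future(actor(inbox))
    reply = asyncio.get_running_loop().create_future()
    await inbox.put((21, reply))
    result = await reply               # caller yields; the loop stays live
    await inbox.put((None, None))
    await task
    return result

result = asyncio.run(main())
print(result)
```

If the actor hangs on one request, only callers awaiting that reply are
stuck; the event loop itself keeps running.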

> > 
> >   * Code generation makes the code easier to write but harder to read.
> >     Code is read more than written.  In this case I think open coding
> >     isn't too bad and I prefer it to reading a code generation script to
> >     understand how it works.
> 
> But you can read generated code in same way. You only need to read generator
> script if it generates something wrong, but should be rare.

Generated code isn't visible unless the code has been built and indexed
(if you're using ctags).  This makes it harder for people to navigate
the code.

> > 
> > If we were planning to add lots more of these then I agree code
> > generation would help.  But in this case I'd rather not.
> > 
> 
> What do you think at least of generating code to create a coroutine from a function
> with multiple arguments?

If it's easy to read without requiring one to figure out how the magic
works, then I like it.

Stefan
Re: [Qemu-devel] [Qemu-block] [RFC PATCH] coroutines: generate wrapper code
Posted by Vladimir Sementsov-Ogievskiy 5 years, 2 months ago
12.02.2019 6:22, Stefan Hajnoczi wrote:
> On Mon, Feb 11, 2019 at 09:38:37AM +0000, Vladimir Sementsov-Ogievskiy wrote:
>> 11.02.2019 6:42, Stefan Hajnoczi wrote:
>>> On Fri, Feb 08, 2019 at 05:11:22PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>>> Hi all!
>>>>
>>>> We have a very frequent pattern of wrapping a coroutine_fn function
>>>> to be called from non-coroutine context:
>>>>
>>>>     - create structure to pack parameters
>>>>     - create function to call original function taking parameters from
>>>>       struct
>>>>     - create wrapper, which in case of non-coroutine context will
>>>>       create a coroutine, enter it and start poll-loop.
>>>>
>>>> Here is a draft of template code + example how it can be used to drop a
>>>> lot of similar code.
>>>>
>>>> Hope someone like it except me)
>>>
>>> My 2 cents.  Cons:
>>>
>>>    * Synchronous poll loops are an anti-pattern.  They block all of QEMU
>>>      with the big mutex held.  Making them easier to write is
>>>      questionable because we should aim to have as few of these as
>>>      possible.
>>
>> Understand. Do we have a concept or a kind of target for a future to get rid of
>> these a lot of poll-loops? What is the right way? At least for block-layer?
> 
> It's non-trivial.  The nested event loop could be flattened if there was
> a mechanism to stop further activity on a specific object only (e.g.
> BlockDriverState).  That way the event loop can continue processing
> events for other objects and device emulation could continue for other
> objects.
> 
> Unfortunately there are interactions between objects like in block jobs
> that act on multiple BDSes, so it becomes even tricky.
> 
> A simple way of imagining this is to make each object an "actor"
> coroutine.  The coroutine processes a single message (request) at a time
> and yields when it needs to wait.  Callers send messages and expect
> asynchronous responses.  This model is bad for efficiency (parallelism

Hmm, and with all these loops, where is the parallelism?

> is necessary) but at least it offers a sane way of thinking about
> multiple asynchronous components coordinating together.  (It's another
> way of saying, let's put everything into coroutines.)
> 
> The advantage of a flat event loop is that a hang in one object (e.g.
> I/O getting stuck in one file) doesn't freeze the entire event loop.
> 
>>>
>>>    * Code generation makes the code easier to write but harder to read.
>>>      Code is read more than written.  In this case I think open coding
>>>      isn't too bad and I prefer it to reading a code generation script to
>>>      understand how it works.
>>
>> But you can read generated code in same way. You only need to read generator
>> script if it generates something wrong, but should be rare.
> 
> Generated code isn't visible unless the code has been built and indexed
> (if you're using ctags).  This makes it harder for people to navigate
> the code.
> 
>>>
>>> If we were planning to add lots more of these then I agree code
>>> generation would help.  But in this case I'd rather not.
>>>
>>
>> What do you think at least of generating code to create a coroutine from a function
>> with multiple arguments?
> 
> If it's easy to read without requiring one to figure out how the magic
> works, then I like it.
> 

Ok, I'll think about it.


-- 
Best regards,
Vladimir

Re: [Qemu-devel] [Qemu-block] [RFC PATCH] coroutines: generate wrapper code
Posted by Stefan Hajnoczi 5 years, 2 months ago
On Tue, Feb 12, 2019 at 10:03:19AM +0000, Vladimir Sementsov-Ogievskiy wrote:
> 12.02.2019 6:22, Stefan Hajnoczi wrote:
> > On Mon, Feb 11, 2019 at 09:38:37AM +0000, Vladimir Sementsov-Ogievskiy wrote:
> >> 11.02.2019 6:42, Stefan Hajnoczi wrote:
> >>> On Fri, Feb 08, 2019 at 05:11:22PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> >>>> Hi all!
> >>>>
> >>>> We have a very frequent pattern of wrapping a coroutine_fn function
> >>>> to be called from non-coroutine context:
> >>>>
> >>>>     - create structure to pack parameters
> >>>>     - create function to call original function taking parameters from
> >>>>       struct
> >>>>     - create wrapper, which in case of non-coroutine context will
> >>>>       create a coroutine, enter it and start poll-loop.
> >>>>
> >>>> Here is a draft of template code + example how it can be used to drop a
> >>>> lot of similar code.
> >>>>
> >>>> Hope someone like it except me)
> >>>
> >>> My 2 cents.  Cons:
> >>>
> >>>    * Synchronous poll loops are an anti-pattern.  They block all of QEMU
> >>>      with the big mutex held.  Making them easier to write is
> >>>      questionable because we should aim to have as few of these as
> >>>      possible.
> >>
> >> Understand. Do we have a concept or a kind of target for a future to get rid of
> >> these a lot of poll-loops? What is the right way? At least for block-layer?
> > 
> > It's non-trivial.  The nested event loop could be flattened if there was
> > a mechanism to stop further activity on a specific object only (e.g.
> > BlockDriverState).  That way the event loop can continue processing
> > events for other objects and device emulation could continue for other
> > objects.
> > 
> > Unfortunately there are interactions between objects like in block jobs
> > that act on multiple BDSes, so it becomes even tricky.
> > 
> > A simple way of imagining this is to make each object an "actor"
> > coroutine.  The coroutine processes a single message (request) at a time
> > and yields when it needs to wait.  Callers send messages and expect
> > asynchronous responses.  This model is bad for efficiency (parallelism
> 
> hmm, and with all these loops, where is parallelism?

In this model all activity, including the async operations that work in
parallel today, goes through an actor coroutine.  The loss of parallelism
isn't in synchronous polling operations, it's in the async operations.

But I'm just using this as a simple model for reasoning about a world
where no synchronous polling loops are necessary.  I don't think we
would try to implement it - at least not without a lot more thought into
how to make it efficient.

Stefan
Re: [Qemu-devel] [Qemu-block] [RFC PATCH] coroutines: generate wrapper code
Posted by Kevin Wolf 5 years, 2 months ago
Am 12.02.2019 um 04:22 hat Stefan Hajnoczi geschrieben:
> On Mon, Feb 11, 2019 at 09:38:37AM +0000, Vladimir Sementsov-Ogievskiy wrote:
> > 11.02.2019 6:42, Stefan Hajnoczi wrote:
> > > On Fri, Feb 08, 2019 at 05:11:22PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > >> Hi all!
> > >>
> > >> We have a very frequent pattern of wrapping a coroutine_fn function
> > >> to be called from non-coroutine context:
> > >>
> > >>    - create structure to pack parameters
> > >>    - create function to call original function taking parameters from
> > >>      struct
> > >>    - create wrapper, which in case of non-coroutine context will
> > >>      create a coroutine, enter it and start poll-loop.
> > >>
> > >> Here is a draft of template code + example how it can be used to drop a
> > >> lot of similar code.
> > >>
> > >> Hope someone like it except me)
> > > 
> > > My 2 cents.  Cons:
> > > 
> > >   * Synchronous poll loops are an anti-pattern.  They block all of QEMU
> > >     with the big mutex held.  Making them easier to write is
> > >     questionable because we should aim to have as few of these as
> > >     possible.
> > 
> > Understand. Do we have a concept or a kind of target for a future to get rid of
> > these a lot of poll-loops? What is the right way? At least for block-layer?
> 
> It's non-trivial.  The nested event loop could be flattened if there was
> a mechanism to stop further activity on a specific object only (e.g.
> BlockDriverState).  That way the event loop can continue processing
> events for other objects and device emulation could continue for other
> objects.

The mechanism to stop activity on BlockDriverStates is bdrv_drain(). But
I don't see how this is related. Nested event loops aren't for stopping
concurrent activity (events related to async operations started earlier
are still processed in nested event loops), but for making progress on
the operation we're waiting for. They happen when synchronous code calls
into asynchronous code.

The way to get rid of them is making their callers async. I think we
would come a long way if we ran QMP command handlers (at least the block
related ones) and qemu-img operations in coroutines instead of blocking
while we wait for the result.

> Unfortunately there are interactions between objects like in block jobs
> that act on multiple BDSes, so it becomes even tricky.
> 
> A simple way of imagining this is to make each object an "actor"
> coroutine.  The coroutine processes a single message (request) at a time
> and yields when it needs to wait.  Callers send messages and expect
> asynchronous responses.  This model is bad for efficiency (parallelism
> is necessary) but at least it offers a sane way of thinking about
> multiple asynchronous components coordinating together.  (It's another
> way of saying, let's put everything into coroutines.)
> 
> The advantage of a flat event loop is that a hang in one object (e.g.
> I/O getting stuck in one file) doesn't freeze the entire event loop.

I think this one is more theoretical because you'll still have
dependencies between the components. blk_drain_all() isn't hanging
because the code is designed suboptimally, but because its semantics is
to wait until all requests have completed. And it's called because this
semantics is required.

Kevin
Re: [Qemu-devel] [Qemu-block] [RFC PATCH] coroutines: generate wrapper code
Posted by Stefan Hajnoczi 5 years, 2 months ago
On Tue, Feb 12, 2019 at 12:58:40PM +0100, Kevin Wolf wrote:
> Am 12.02.2019 um 04:22 hat Stefan Hajnoczi geschrieben:
> > On Mon, Feb 11, 2019 at 09:38:37AM +0000, Vladimir Sementsov-Ogievskiy wrote:
> > > 11.02.2019 6:42, Stefan Hajnoczi wrote:
> > > > On Fri, Feb 08, 2019 at 05:11:22PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > >> Hi all!
> > > >>
> > > >> We have a very frequent pattern of wrapping a coroutine_fn function
> > > >> to be called from non-coroutine context:
> > > >>
> > > >>    - create structure to pack parameters
> > > >>    - create function to call original function taking parameters from
> > > >>      struct
> > > >>    - create wrapper, which in case of non-coroutine context will
> > > >>      create a coroutine, enter it and start poll-loop.
> > > >>
> > > >> Here is a draft of template code + example how it can be used to drop a
> > > >> lot of similar code.
> > > >>
> > > >> Hope someone like it except me)
> > > > 
> > > > My 2 cents.  Cons:
> > > > 
> > > >   * Synchronous poll loops are an anti-pattern.  They block all of QEMU
> > > >     with the big mutex held.  Making them easier to write is
> > > >     questionable because we should aim to have as few of these as
> > > >     possible.
> > > 
> > > Understand. Do we have a concept or a kind of target for a future to get rid of
> > > these a lot of poll-loops? What is the right way? At least for block-layer?
> > 
> > It's non-trivial.  The nested event loop could be flattened if there was
> > a mechanism to stop further activity on a specific object only (e.g.
> > BlockDriverState).  That way the event loop can continue processing
> > events for other objects and device emulation could continue for other
> > objects.
> 
> The mechanism to stop activity on BlockDriverStates is bdrv_drain(). But
> I don't see how this is related. Nested event loops aren't for stopping
> concurrent activity (events related to async operations started earlier
> are still processed in nested event loops), but for making progress on
> the operation we're waiting for. They happen when synchronous code calls
> into asynchronous code.
> 
> The way to get rid of them is making their callers async. I think we
> would come a long way if we ran QMP command handlers (at least the block
> related ones) and qemu-img operations in coroutines instead of blocking
> while we wait for the result.

A difficult caller is device reset, where we need to drain all requests.
But even converting some sync code paths to async is a win because it
removes places where QEMU can get stuck.

Regarding block QMP handlers, do you mean suspending the monitor when
a command yields?  The monitor will be unresponsive to the outside
world, so this doesn't solve the problem from the QMP client's
perspective.  This is why async QMP and jobs are interesting but it's a
lot of work both inside QEMU and for clients like libvirt.

> 
> > Unfortunately there are interactions between objects like in block jobs
> > that act on multiple BDSes, so it becomes even tricky.
> > 
> > A simple way of imagining this is to make each object an "actor"
> > coroutine.  The coroutine processes a single message (request) at a time
> > and yields when it needs to wait.  Callers send messages and expect
> > asynchronous responses.  This model is bad for efficiency (parallelism
> > is necessary) but at least it offers a sane way of thinking about
> > multiple asynchronous components coordinating together.  (It's another
> > way of saying, let's put everything into coroutines.)
> > 
> > The advantage of a flat event loop is that a hang in one object (e.g.
> > I/O getting stuck in one file) doesn't freeze the entire event loop.
> 
> I think this one is more theoretical because you'll still have
> dependencies between the components. blk_drain_all() isn't hanging
> because the code is designed suboptimally, but because its semantics is
> to wait until all requests have completed. And it's called because this
> semantics is required.

If we try to convert everything to async there will be two cases:
1. Accidental sync code which can be made async.  (Rare nowadays?)
2. Fundamental synchronization points that require waiting.

When you reach a point that hangs there is still the possibility of a
timeout or an explicit cancel.  Today QEMU supports neither, so a
command that gets stuck will hang QEMU for as long as it takes.

If QMP clients want timeouts or cancel then making everything async is
necessary.  If not, then we can leave it as is and simply audit the code
for accidental sync code (there used to be a lot of this but it's rarer
now) and convert it.

Stefan
Re: [Qemu-devel] [Qemu-block] [RFC PATCH] coroutines: generate wrapper code
Posted by Kevin Wolf 5 years, 2 months ago
Am 13.02.2019 um 07:58 hat Stefan Hajnoczi geschrieben:
> On Tue, Feb 12, 2019 at 12:58:40PM +0100, Kevin Wolf wrote:
> > Am 12.02.2019 um 04:22 hat Stefan Hajnoczi geschrieben:
> > > On Mon, Feb 11, 2019 at 09:38:37AM +0000, Vladimir Sementsov-Ogievskiy wrote:
> > > > 11.02.2019 6:42, Stefan Hajnoczi wrote:
> > > > > On Fri, Feb 08, 2019 at 05:11:22PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > > >> Hi all!
> > > > >>
> > > > >> We have a very frequent pattern of wrapping a coroutine_fn function
> > > > >> to be called from non-coroutine context:
> > > > >>
> > > > >>    - create structure to pack parameters
> > > > >>    - create function to call original function taking parameters from
> > > > >>      struct
> > > > >>    - create wrapper, which in case of non-coroutine context will
> > > > >>      create a coroutine, enter it and start poll-loop.
> > > > >>
> > > > >> Here is a draft of template code + example how it can be used to drop a
> > > > >> lot of similar code.
> > > > >>
> > > > >> Hope someone like it except me)
> > > > > 
> > > > > My 2 cents.  Cons:
> > > > > 
> > > > >   * Synchronous poll loops are an anti-pattern.  They block all of QEMU
> > > > >     with the big mutex held.  Making them easier to write is
> > > > >     questionable because we should aim to have as few of these as
> > > > >     possible.
> > > > 
> > > > Understand. Do we have a concept or a kind of target for a future to get rid of
> > > > these a lot of poll-loops? What is the right way? At least for block-layer?
> > > 
> > > It's non-trivial.  The nested event loop could be flattened if there was
> > > a mechanism to stop further activity on a specific object only (e.g.
> > > BlockDriverState).  That way the event loop can continue processing
> > > events for other objects and device emulation could continue for other
> > > objects.
> > 
> > The mechanism to stop activity on BlockDriverStates is bdrv_drain(). But
> > I don't see how this is related. Nested event loops aren't for stopping
> > concurrent activity (events related to async operations started earlier
> > are still processed in nested event loops), but for making progress on
> > the operation we're waiting for. They happen when synchronous code calls
> > into asynchronous code.
> > 
> > The way to get rid of them is making their callers async. I think we
> > would come a long way if we ran QMP command handlers (at least the block
> > related ones) and qemu-img operations in coroutines instead of blocking
> > while we wait for the result.
> 
> A difficult caller is device reset, where we need to drain all requests.
> But even converting some sync code paths to async is a win because it
> removes places where QEMU can get stuck.

Yes, as I said, draining a node can hang. And it can hang not because
there are nested event loops or because of any other bad design
decision, but because waiting for all requests to complete is required.

The only thing we could try in order to improve this is cancelling
requests after a timeout (or triggered by an OOB QMP command?) during
a drain operation, but cancelling requests hasn't really been a
success story so far.

> Regarding block QMP handlers, do you mean suspending the monitor when
> a command yields?  The monitor will be unresponsive to the outside
> world, so this doesn't solve the problem from the QMP client's
> perspective.  This is why async QMP and jobs are interesting but it's a
> lot of work both inside QEMU and for clients like libvirt.

Yes, it wouldn't keep the monitor responsive as long as the monitor
protocol is synchronous. But it would at least keep the VM running,
the GUI would stay responsive, etc.

Blocking the monitor is again nothing that restructuring the code could
fix. It requires a change to the QMP protocol, but then it will easily
fit in the current design.

Nested event loops are unrelated.

> > > Unfortunately there are interactions between objects like in block jobs
> > > that act on multiple BDSes, so it becomes even tricky.
> > > 
> > > A simple way of imagining this is to make each object an "actor"
> > > coroutine.  The coroutine processes a single message (request) at a time
> > > and yields when it needs to wait.  Callers send messages and expect
> > > asynchronous responses.  This model is bad for efficiency (parallelism
> > > is necessary) but at least it offers a sane way of thinking about
> > > multiple asynchronous components coordinating together.  (It's another
> > > way of saying, let's put everything into coroutines.)
> > > 
> > > The advantage of a flat event loop is that a hang in one object (e.g.
> > > I/O getting stuck in one file) doesn't freeze the entire event loop.
> > 
> > I think this one is more theoretical because you'll still have
> > dependencies between the components. blk_drain_all() isn't hanging
> > because the code is designed suboptimally, but because its semantics is
> > to wait until all requests have completed. And it's called because this
> > semantics is required.
> 
> If we try to convert everything to async there will be two cases:
> 1. Accidental sync code which can be made async.  (Rare nowadays?)
> 2. Fundamental synchronization points that require waiting.
> 
> When you reach a point that hangs there is still the possibility of a
> timeout or an explicit cancel.  Today QEMU supports neither, so a
> command that gets stuck will hang QEMU for as long as it takes.
> 
> If QMP clients want timeouts or cancel then making everything async is
> necessary.  If not, then we can leave it as is and simply audit the code
> for accidental sync code (there used to be a lot of this but it's rarer
> now) and convert it.

I'm not sure what makes sync code only "accidentally" sync, but at least
everything the monitor does is sync.

With your assessment that we can leave everything as it is if QMP
clients don't want improvements, you're ignoring that currently not
only does the QMP monitor become unresponsive, but it also holds the
BQL while doing so and thus brings down the whole VM. So even without
changing the QMP protocol, there is something to be gained by making
it async (i.e. executing the block layer command handlers in a
coroutine).

Kevin
Re: [Qemu-devel] [Qemu-block] [RFC PATCH] coroutines: generate wrapper code
Posted by Stefan Hajnoczi 5 years, 2 months ago
On Wed, Feb 13, 2019 at 11:09:56AM +0100, Kevin Wolf wrote:
> Am 13.02.2019 um 07:58 hat Stefan Hajnoczi geschrieben:
> > On Tue, Feb 12, 2019 at 12:58:40PM +0100, Kevin Wolf wrote:
> > > Am 12.02.2019 um 04:22 hat Stefan Hajnoczi geschrieben:
> > > > On Mon, Feb 11, 2019 at 09:38:37AM +0000, Vladimir Sementsov-Ogievskiy wrote:
> > > > > 11.02.2019 6:42, Stefan Hajnoczi wrote:
> > > > > > On Fri, Feb 08, 2019 at 05:11:22PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > > > >> Hi all!
> > > > > >>
> > > > > >> We have a very frequent pattern of wrapping a coroutine_fn function
> > > > > >> to be called from non-coroutine context:
> > > > > >>
> > > > > >>    - create structure to pack parameters
> > > > > >>    - create function to call original function taking parameters from
> > > > > >>      struct
> > > > > >>    - create wrapper, which in case of non-coroutine context will
> > > > > >>      create a coroutine, enter it and start poll-loop.
> > > > > >>
> > > > > >> Here is a draft of template code + example how it can be used to drop a
> > > > > >> lot of similar code.
> > > > > >>
> > > > > >> Hope someone like it except me)
> > > > > > 
> > > > > > My 2 cents.  Cons:
> > > > > > 
> > > > > >   * Synchronous poll loops are an anti-pattern.  They block all of QEMU
> > > > > >     with the big mutex held.  Making them easier to write is
> > > > > >     questionable because we should aim to have as few of these as
> > > > > >     possible.
> > > > > 
> > > > > Understand. Do we have a concept or a kind of target for a future to get rid of
> > > > > these a lot of poll-loops? What is the right way? At least for block-layer?
> > > > 
> > > > It's non-trivial.  The nested event loop could be flattened if there was
> > > > a mechanism to stop further activity on a specific object only (e.g.
> > > > BlockDriverState).  That way the event loop can continue processing
> > > > events for other objects and device emulation could continue for other
> > > > objects.
> > > 
> > > The mechanism to stop activity on BlockDriverStates is bdrv_drain(). But
> > > I don't see how this is related. Nested event loops aren't for stopping
> > > concurrent activity (events related to async operations started earlier
> > > are still processed in nested event loops), but for making progress on
> > > the operation we're waiting for. They happen when synchronous code calls
> > > into asynchronous code.
> > > 
> > > The way to get rid of them is making their callers async. I think we
> > > would come a long way if we ran QMP command handlers (at least the block
> > > related ones) and qemu-img operations in coroutines instead of blocking
> > > while we wait for the result.
> > 
> > A difficult caller is device reset, where we need to drain all requests.
> > But even converting some sync code paths to async is a win because it
> > removes places where QEMU can get stuck.
> 
> Yes, as I said, draining a node can hang. And it can hang not because
> there are nested event loops or because of any other bad design
> decision, but because waiting for all requests to complete is required.
> 
> The only thing we could try in order to improve this is cancelling
> requests after a timeout (or triggered by an OOB QMP command?) during
> a drain operation, but cancelling requests hasn't really been a
> success story so far.

I agree, cancelling block I/O requests seems unworkable in the general
case because it's not supported across all protocols (POSIX files, etc).

> > Regarding block QMP handlers, do you mean suspending the monitor when
> > a command yields?  The monitor will be unresponsive to the outside
> > world, so this doesn't solve the problem from the QMP client's
> > perspective.  This is why async QMP and jobs are interesting but it's a
> > lot of work both inside QEMU and for clients like libvirt.
> 
> Yes, it wouldn't keep the monitor responsive as long as the monitor
> protocol is synchronous. But it would at least keep the VM running,
> the GUI would stay responsive, etc.
> 
> Blocking the monitor is again nothing that restructuring the code could
> fix. It requires a change to the QMP protocol, but then it will easily
> fit in the current design.
> 
> Nested event loops are unrelated.

Okay.

> > > > Unfortunately there are interactions between objects like in block jobs
> > > > that act on multiple BDSes, so it becomes even tricky.
> > > > 
> > > > A simple way of imagining this is to make each object an "actor"
> > > > coroutine.  The coroutine processes a single message (request) at a time
> > > > and yields when it needs to wait.  Callers send messages and expect
> > > > asynchronous responses.  This model is bad for efficiency (parallelism
> > > > is necessary) but at least it offers a sane way of thinking about
> > > > multiple asynchronous components coordinating together.  (It's another
> > > > way of saying, let's put everything into coroutines.)
> > > > 
> > > > The advantage of a flat event loop is that a hang in one object (e.g.
> > > > I/O getting stuck in one file) doesn't freeze the entire event loop.
> > > 
> > > I think this one is more theoretical because you'll still have
> > > dependencies between the components. blk_drain_all() isn't hanging
> > > because the code is designed suboptimally, but because its semantics is
> > > to wait until all requests have completed. And it's called because this
> > > semantics is required.
> > 
> > If we try to convert everything to async there will be two cases:
> > 1. Accidental sync code which can be made async.  (Rare nowadays?)
> > 2. Fundamental synchronization points that require waiting.
> > 
> > When you reach a point that hangs there is still the possibility of a
> > timeout or an explicit cancel.  Today QEMU supports neither, so a
> > command that gets stuck will hang QEMU for as long as it takes.
> > 
> > If QMP clients want timeouts or cancel then making everything async is
> > necessary.  If not, then we can leave it as is and simply audit the code
> > for accidental sync code (there used to be a lot of this but it's rarer
> > now) and convert it.
> 
> I'm not sure what makes sync code only "accidentally" sync, but at least
> everything the monitor does is sync.
> 
> With your assessment that we can leave everything as it is if QMP
> clients don't want improvements, you're ignoring that currently not
> only does the QMP monitor become unresponsive, but it also holds the
> BQL while doing so and thus brings down the whole VM. So even without
> changing the QMP protocol, there is something to be gained by making
> it async (i.e. executing the block layer command handlers in a
> coroutine).

True.  Releasing the BQL helps when the hang is brief (a few seconds).
I see that as a performance problem - fixing jitter.

Beyond a few seconds a VM that cannot be observed or controlled via the
management tools is a problem and not very useful.

Stefan