block: Synchronous bdrv_*() from coroutine in different AioContext

[RFC PATCH 2/3] block: Allow bdrv_run_co() from different AioContext

Posted by Kevin Wolf 5 years, 9 months ago

Coroutine functions that are entered through bdrv_run_co() are already
safe to call from synchronous code in a different AioContext because
bdrv_coroutine_enter() will schedule them in the context of the node.

However, the coroutine fastpath still requires that we're already in the
right AioContext when called in coroutine context.

In order to make the behaviour more consistent and to make life a bit
easier for callers, let's check the AioContext and automatically move
the current coroutine around if we're not in the right context yet.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/io.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/block/io.c b/block/io.c
index c1badaadc9..7808e8bdc0 100644
--- a/block/io.c
+++ b/block/io.c
@@ -895,8 +895,21 @@ static int bdrv_run_co(BlockDriverState *bs, CoroutineEntry *entry,
                        void *opaque, int *ret)
 {
     if (qemu_in_coroutine()) {
-        /* Fast-path if already in coroutine context */
+        Coroutine *self = qemu_coroutine_self();
+        AioContext *bs_ctx = bdrv_get_aio_context(bs);
+        AioContext *co_ctx = qemu_coroutine_get_aio_context(self);
+
+        if (bs_ctx != co_ctx) {
+            /* Move to the iothread of the node */
+            aio_co_schedule(bs_ctx, self);
+            qemu_coroutine_yield();
+        }
         entry(opaque);
+        if (bs_ctx != co_ctx) {
+            /* Move back to the original AioContext */
+            aio_co_schedule(bs_ctx, self);
+            qemu_coroutine_yield();
+        }
     } else {
         Coroutine *co = qemu_coroutine_create(entry, opaque);
         *ret = NOT_DONE;
-- 
2.25.3

Re: [RFC PATCH 2/3] block: Allow bdrv_run_co() from different AioContext

Posted by Thomas Lamprecht 5 years, 9 months ago

On 5/12/20 4:43 PM, Kevin Wolf wrote:
> diff --git a/block/io.c b/block/io.c
> index c1badaadc9..7808e8bdc0 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -895,8 +895,21 @@ static int bdrv_run_co(BlockDriverState *bs, CoroutineEntry *entry,
>                         void *opaque, int *ret)
>  {
>      if (qemu_in_coroutine()) {
> -        /* Fast-path if already in coroutine context */
> +        Coroutine *self = qemu_coroutine_self();
> +        AioContext *bs_ctx = bdrv_get_aio_context(bs);
> +        AioContext *co_ctx = qemu_coroutine_get_aio_context(self);
> +
> +        if (bs_ctx != co_ctx) {
> +            /* Move to the iothread of the node */
> +            aio_co_schedule(bs_ctx, self);
> +            qemu_coroutine_yield();
> +        }
>          entry(opaque);
> +        if (bs_ctx != co_ctx) {
> +            /* Move back to the original AioContext */
> +            aio_co_schedule(bs_ctx, self);

shouldn't it use co_ctx here, as else it's just scheduled again on the one from bs?

Looks OK for me besides that.

> +            qemu_coroutine_yield();
> +        }
>      } else {
>          Coroutine *co = qemu_coroutine_create(entry, opaque);
>          *ret = NOT_DONE;
>

Re: [RFC PATCH 2/3] block: Allow bdrv_run_co() from different AioContext

Posted by Kevin Wolf 5 years, 9 months ago

Am 12.05.2020 um 18:02 hat Thomas Lamprecht geschrieben:
> On 5/12/20 4:43 PM, Kevin Wolf wrote:
> > +        if (bs_ctx != co_ctx) {
> > +            /* Move back to the original AioContext */
> > +            aio_co_schedule(bs_ctx, self);
> 
> shouldn't it use co_ctx here, as else it's just scheduled again on the
> one from bs?

Oops, you're right, of course.

> Looks OK for me besides that.

Thanks!

Kevin
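The review point can be shown with a deliberately tiny C model. This is a sketch only: the names are hypothetical and it is not QEMU code. Each AioContext is reduced to an integer, and each `aio_co_schedule()` plus `qemu_coroutine_yield()` pair is reduced to updating the context the coroutine runs in. The target of the second move can be selected. The posted version (`bs_ctx`) leaves the caller running in the node's context, while the suggested fix (`co_ctx`) returns it to where it started.

```c
#include <assert.h>

/*
 * Toy model only (hypothetical names, not QEMU code): a context is an
 * integer id, and "schedule + yield" is reduced to changing the id of
 * the context the coroutine currently runs in.
 */
typedef struct {
    int ctx;       /* context the toy coroutine currently runs in */
    int entry_ctx; /* context in which the entry function ran */
} ToyCoroutine;

static void toy_move_to(ToyCoroutine *co, int target_ctx)
{
    assert(target_ctx >= 0);
    co->ctx = target_ctx;
}

/*
 * Mirrors the shape of the patched fast path: move to the node's context,
 * run the entry function there, then move back. `fixed` selects the
 * return target: 0 = bs_ctx (as posted), 1 = co_ctx (as suggested).
 * Returns the context the caller resumes in.
 */
static int toy_run_co(ToyCoroutine *co, int bs_ctx, int fixed)
{
    int co_ctx = co->ctx;

    if (bs_ctx != co_ctx) {
        toy_move_to(co, bs_ctx);        /* move to the node's context */
    }
    co->entry_ctx = co->ctx;            /* entry(opaque) runs here */
    if (bs_ctx != co_ctx) {
        toy_move_to(co, fixed ? co_ctx : bs_ctx);
    }
    return co->ctx;
}
```

In both variants the entry function runs in the node's context. Only the context the caller resumes in differs.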

Re: [RFC PATCH 2/3] block: Allow bdrv_run_co() from different AioContext

Posted by Stefan Reiter 5 years, 8 months ago

On 5/12/20 4:43 PM, Kevin Wolf wrote:
> diff --git a/block/io.c b/block/io.c
> index c1badaadc9..7808e8bdc0 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -895,8 +895,21 @@ static int bdrv_run_co(BlockDriverState *bs, CoroutineEntry *entry,
>                          void *opaque, int *ret)
>   {
>       if (qemu_in_coroutine()) {
> -        /* Fast-path if already in coroutine context */
> +        Coroutine *self = qemu_coroutine_self();
> +        AioContext *bs_ctx = bdrv_get_aio_context(bs);
> +        AioContext *co_ctx = qemu_coroutine_get_aio_context(self);
> +
> +        if (bs_ctx != co_ctx) {
> +            /* Move to the iothread of the node */
> +            aio_co_schedule(bs_ctx, self);
> +            qemu_coroutine_yield();

I'm pretty sure this can lead to a race: When the thread we're 
re-scheduling to is faster to schedule us than we can reach 
qemu_coroutine_yield, then we'll get an abort ("Co-routine re-entered 
recursively"), since co->caller is still set.

I've seen this happen in our code when I try to do the scheduling 
fandangle there.

Is there a safer way to have a coroutine reschedule itself? Some lock 
missing?

> +        }
>           entry(opaque);
> +        if (bs_ctx != co_ctx) {
> +            /* Move back to the original AioContext */
> +            aio_co_schedule(bs_ctx, self);
> +            qemu_coroutine_yield();
> +        }
>       } else {
>           Coroutine *co = qemu_coroutine_create(entry, opaque);
>           *ret = NOT_DONE;
>

Re: [RFC PATCH 2/3] block: Allow bdrv_run_co() from different AioContext

Posted by Kevin Wolf 5 years, 8 months ago

Am 25.05.2020 um 16:18 hat Stefan Reiter geschrieben:
> On 5/12/20 4:43 PM, Kevin Wolf wrote:
> > +        if (bs_ctx != co_ctx) {
> > +            /* Move to the iothread of the node */
> > +            aio_co_schedule(bs_ctx, self);
> > +            qemu_coroutine_yield();
> 
> I'm pretty sure this can lead to a race: When the thread we're re-scheduling
> to is faster to schedule us than we can reach qemu_coroutine_yield, then
> we'll get an abort ("Co-routine re-entered recursively"), since co->caller
> is still set.
> 
> I've seen this happen in our code when I try to do the scheduling fandangle
> there.

Ah, crap. I guess letting a coroutine re-schedule itself is only safe
within the same thread then.

> Is there a safer way to have a coroutine reschedule itself? Some lock
> missing?

There is no problem that can't be solved by adding another level of
indirection... We would have to schedule a BH in the original thread
that will only schedule the coroutine in its new thread after it has
yielded.

Maybe we should actually introduce a helper function that moves the
current coroutine to a different AioContext this way.

Kevin
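The race and the BH-based indirection can be illustrated with a deterministic toy model. This is a sketch under simplifying assumptions: all names are hypothetical, none of it is QEMU code, and "entered while still running" stands in for the co->caller check.

Each of two contexts has a FIFO of work items that only its own thread processes. That thread cannot run its loop while the coroutine occupies it. The model forces the worst-case interleaving, in which the target thread runs before the coroutine reaches its yield:

- Scheduling directly into the target context enters a coroutine that has not yielded yet.
- Scheduling a BH in the coroutine's current context defers the hand-off until after the yield.

```c
#include <assert.h>

/* Toy model (hypothetical, not QEMU code): contexts 0 and 1, each with a
 * FIFO of work items processed only by that context's own thread. */
enum { MAX_ITEMS = 8 };

typedef enum {
    ITEM_ENTER_CO,      /* enter the coroutine in this context */
    ITEM_BH_RESCHEDULE, /* BH: queue ITEM_ENTER_CO in context `target` */
} ItemKind;

typedef struct { ItemKind kind; int target; } Item;
typedef struct { Item items[MAX_ITEMS]; int head, tail; } Queue;

typedef struct {
    Queue q[2];
    int co_ctx;     /* context the coroutine runs in */
    int co_yielded; /* 1 once the coroutine has reached its yield */
    int error;      /* set if the coroutine is entered while running */
} World;

static void push(Queue *q, Item it)
{
    assert(q->tail < MAX_ITEMS);
    q->items[q->tail++] = it;
}

/* One pass of the event loop of `ctx`, run by that context's thread. */
static void run_loop(World *w, int ctx)
{
    Queue *q = &w->q[ctx];

    while (q->head < q->tail) {
        Item it = q->items[q->head++];

        if (it.kind == ITEM_ENTER_CO) {
            if (!w->co_yielded) {
                w->error = 1;           /* "re-entered recursively" */
                return;
            }
            w->co_yielded = 0;          /* coroutine now runs in ctx */
            w->co_ctx = ctx;
        } else {
            /* The BH runs after the yield and only now hands off. */
            push(&w->q[it.target], (Item){ ITEM_ENTER_CO, it.target });
        }
    }
}

/*
 * Move the coroutine from `from` to `to`, either directly (use_bh == 0)
 * or via a BH in its own context (use_bh == 1). The thread of `to` is
 * allowed to run before the coroutine yields. Returns 0 if the coroutine
 * ends up running in `to`, -1 otherwise.
 */
static int move_coroutine(int from, int to, int use_bh)
{
    World w = { .co_ctx = from };

    if (use_bh) {
        push(&w.q[from], (Item){ ITEM_BH_RESCHEDULE, to });
    } else {
        push(&w.q[to], (Item){ ITEM_ENTER_CO, to });
    }
    if (to != from) {
        run_loop(&w, to);               /* target thread races ahead */
    }
    w.co_yielded = 1;                   /* yield frees the `from` thread */
    run_loop(&w, from);
    run_loop(&w, to);
    return (!w.error && w.co_ctx == to && !w.co_yielded) ? 0 : -1;
}
```

In this model only the direct cross-context schedule fails. Scheduling into the coroutine's own context also succeeds, because that context's loop runs on the very thread the coroutine occupies and so cannot run before the yield.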

Re: [RFC PATCH 2/3] block: Allow bdrv_run_co() from different AioContext

Posted by Kevin Wolf 5 years, 8 months ago

Am 25.05.2020 um 18:41 hat Kevin Wolf geschrieben:
> Am 25.05.2020 um 16:18 hat Stefan Reiter geschrieben:
> > I'm pretty sure this can lead to a race: When the thread we're re-scheduling
> > to is faster to schedule us than we can reach qemu_coroutine_yield, then
> > we'll get an abort ("Co-routine re-entered recursively"), since co->caller
> > is still set.
> > 
> > I've seen this happen in our code when I try to do the scheduling fandangle
> > there.
> 
> Ah, crap. I guess letting a coroutine re-schedule itself is only safe
> within the same thread then.
> 
> > Is there a safer way to have a coroutine reschedule itself? Some lock
> > missing?
> 
> There is no problem that can't be solved by adding another level of
> indirection... We would have to schedule a BH in the original thread
> that will only schedule the coroutine in its new thread after it has
> yielded.
> 
> Maybe we should actually introduce a helper function that moves the
> current coroutine to a different AioContext this way.

Like this:

https://repo.or.cz/qemu/kevin.git/commitdiff/ed0244ba4ac699f7e8eaf7512ff25645cf43bda2

The series for which I need this isn't quite ready yet, so I haven't
sent it as a patch yet, but if it proves useful in other contexts, we
can always commit it without the rest.

Kevin

Re: [RFC PATCH 2/3] block: Allow bdrv_run_co() from different AioContext

Posted by Stefan Reiter 5 years, 8 months ago

On 5/26/20 6:42 PM, Kevin Wolf wrote:
> Am 25.05.2020 um 18:41 hat Kevin Wolf geschrieben:
>> Maybe we should actually introduce a helper function that moves the
>> current coroutine to a different AioContext this way.
> 
> Like this:
> 
> https://repo.or.cz/qemu/kevin.git/commitdiff/ed0244ba4ac699f7e8eaf7512ff25645cf43bda2
> 

Commit looks good to me; using aio_co_reschedule_self() fixes all the
aborts I've been seeing.

> The series for which I need this isn't quite ready yet, so I haven't
> sent it as a patch yet, but if it proves useful in other contexts, we
> can always commit it without the rest.
> 

I did a quick search for places where a similar pattern is used and
found 'hw/9pfs/coth.h', where this behavior is already described (though
the BH there seems to be scheduled using the thread pool API, which I'm
not really familiar with). All other places where qemu_coroutine_yield()
is preceded by an aio_co_schedule() only do so in the same AioContext,
which should be safe.

> Kevin