[Qemu-devel] [PATCH v2 0/5] Fix socket chardev regression

Marc-André Lureau posted 5 patches 7 years, 2 months ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20180823143125.16767-1-marcandre.lureau@redhat.com
Test docker-clang@ubuntu failed
Test checkpatch passed
chardev/char-socket.c | 86 ++++++++++++++++++-------------------------
tests/test-char.c     | 25 +++++++++++--
2 files changed, 57 insertions(+), 54 deletions(-)
[Qemu-devel] [PATCH v2 0/5] Fix socket chardev regression
Posted by Marc-André Lureau 7 years, 2 months ago
Hi,

In commit 25679e5d58e "chardev: tcp: postpone async connection setup"
(and its follow up 99f2f54174a59), Peter moved chardev socket
connection to machine_done event. However, chardev created later will
no longer attempt to connect, and chardev created in tests do not have
machine_done event (breaking some of vhost-user-test).

The goal was to move the "connect" source to the chardev frontend
context (the monitor thread context in his case). chr->gcontext is set
with qemu_chr_fe_set_handlers(). But there is no guarantee that the
function will be called in general, so we can't delay connection until
then: the chardev should still attempt to connect during open(), using
the main context.

An alternative would be to specify the iothread during chardev
creation. Setting up monitor OOB would be quite different too, it
would take the same iothread as argument.

99f2f54174a595e is also a bit problematic, since it will behave
differently before and after machine_done (the first case gives a
chance to use a different context reliably, the second looks racy)

In the end, I am not sure this is all necessary, as chardev callbacks
are called after qemu_chr_fe_set_handlers(), at which point the
context of sources are updated. In "char-socket: update all ioc
handlers when changing context", I moved also the hup handler to the
updated context. So unless the main thread is already stuck, we can
setup a different context for the chardev at that time. Or not?

v2:
- fix a random socket chardev test failure

Marc-André Lureau (5):
  Revert "chardev: tcp: postpone TLS work until machine done"
  Revert "chardev: tcp: postpone async connection setup"
  char-socket: update all ioc handlers when changing context
  test-char: fix random socket test failure
  test-char: add socket reconnect test

 chardev/char-socket.c | 86 ++++++++++++++++++-------------------------
 tests/test-char.c     | 25 +++++++++++--
 2 files changed, 57 insertions(+), 54 deletions(-)

-- 
2.18.0.547.g1d89318c48


Re: [Qemu-devel] [PATCH v2 0/5] Fix socket chardev regression
Posted by Peter Xu 7 years, 2 months ago
On Thu, Aug 23, 2018 at 04:31:20PM +0200, Marc-André Lureau wrote:
> Hi,
> 
> In commit 25679e5d58e "chardev: tcp: postpone async connection setup"
> (and its follow up 99f2f54174a59), Peter moved chardev socket
> connection to machine_done event. However, chardev created later will
> no longer attempt to connect, and chardev created in tests do not have
> machine_done event (breaking some of vhost-user-test).
> 
> The goal was to move the "connect" source to the chardev frontend
> context (the monitor thread context in his case). chr->gcontext is set
> with qemu_chr_fe_set_handlers(). But there is no guarantee that the
> function will be called in general, so we can't delay connection until
> then: the chardev should still attempt to connect during open(), using
> the main context.
> 
> An alternative would be to specify the iothread during chardev
> creation. Setting up monitor OOB would be quite different too, it
> would take the same iothread as argument.
> 
> 99f2f54174a595e is also a bit problematic, since it will behave
> differently before and after machine_done (the first case gives a
> chance to use a different context reliably, the second looks racy)
> 
> In the end, I am not sure this is all necessary, as chardev callbacks
> are called after qemu_chr_fe_set_handlers(), at which point the
> context of sources are updated. In "char-socket: update all ioc
> handlers when changing context", I moved also the hup handler to the
> updated context. So unless the main thread is already stuck, we can
> setup a different context for the chardev at that time. Or not?
> 
> v2:
> - fix a random socket chardev test failure

I have no problem on patch 3-5, but aren't patch 1-2 still breaking
the context switch of chardev?  Or did I misunderstood?

Regards,

-- 
Peter Xu

Re: [Qemu-devel] [PATCH v2 0/5] Fix socket chardev regression
Posted by Marc-André Lureau 7 years, 2 months ago
Hi

On Fri, Aug 24, 2018 at 5:45 AM, Peter Xu <peterx@redhat.com> wrote:
> On Thu, Aug 23, 2018 at 04:31:20PM +0200, Marc-André Lureau wrote:
>> Hi,
>>
>> In commit 25679e5d58e "chardev: tcp: postpone async connection setup"
>> (and its follow up 99f2f54174a59), Peter moved chardev socket
>> connection to machine_done event. However, chardev created later will
>> no longer attempt to connect, and chardev created in tests do not have
>> machine_done event (breaking some of vhost-user-test).
>>
>> The goal was to move the "connect" source to the chardev frontend
>> context (the monitor thread context in his case). chr->gcontext is set
>> with qemu_chr_fe_set_handlers(). But there is no guarantee that the
>> function will be called in general, so we can't delay connection until
>> then: the chardev should still attempt to connect during open(), using
>> the main context.
>>
>> An alternative would be to specify the iothread during chardev
>> creation. Setting up monitor OOB would be quite different too, it
>> would take the same iothread as argument.
>>
>> 99f2f54174a595e is also a bit problematic, since it will behave
>> differently before and after machine_done (the first case gives a
>> chance to use a different context reliably, the second looks racy)
>>
>> In the end, I am not sure this is all necessary, as chardev callbacks
>> are called after qemu_chr_fe_set_handlers(), at which point the
>> context of sources are updated. In "char-socket: update all ioc
>> handlers when changing context", I moved also the hup handler to the
>> updated context. So unless the main thread is already stuck, we can
>> setup a different context for the chardev at that time. Or not?
>>
>> v2:
>> - fix a random socket chardev test failure
>
> I have no problem on patch 3-5, but aren't patch 1-2 still breaking
> the context switch of chardev?  Or did I misunderstood?

Switching context is broken/racy regardless. My current plan is to
suspend the context to switch to, and update all sources before
thawing the target context.

If you prefer we can delay the revert, but I would rather have them
because the regression seems much worse to me.

>
> Regards,
>
> --
> Peter Xu

Re: [Qemu-devel] [PATCH v2 0/5] Fix socket chardev regression
Posted by Peter Xu 7 years, 2 months ago
On Fri, Aug 24, 2018 at 10:32:58AM +0200, Marc-André Lureau wrote:
> Hi
> 
> On Fri, Aug 24, 2018 at 5:45 AM, Peter Xu <peterx@redhat.com> wrote:
> > On Thu, Aug 23, 2018 at 04:31:20PM +0200, Marc-André Lureau wrote:
> >> Hi,
> >>
> >> In commit 25679e5d58e "chardev: tcp: postpone async connection setup"
> >> (and its follow up 99f2f54174a59), Peter moved chardev socket
> >> connection to machine_done event. However, chardev created later will
> >> no longer attempt to connect, and chardev created in tests do not have
> >> machine_done event (breaking some of vhost-user-test).
> >>
> >> The goal was to move the "connect" source to the chardev frontend
> >> context (the monitor thread context in his case). chr->gcontext is set
> >> with qemu_chr_fe_set_handlers(). But there is no guarantee that the
> >> function will be called in general, so we can't delay connection until
> >> then: the chardev should still attempt to connect during open(), using
> >> the main context.
> >>
> >> An alternative would be to specify the iothread during chardev
> >> creation. Setting up monitor OOB would be quite different too, it
> >> would take the same iothread as argument.
> >>
> >> 99f2f54174a595e is also a bit problematic, since it will behave
> >> differently before and after machine_done (the first case gives a
> >> chance to use a different context reliably, the second looks racy)
> >>
> >> In the end, I am not sure this is all necessary, as chardev callbacks
> >> are called after qemu_chr_fe_set_handlers(), at which point the
> >> context of sources are updated. In "char-socket: update all ioc
> >> handlers when changing context", I moved also the hup handler to the
> >> updated context. So unless the main thread is already stuck, we can
> >> setup a different context for the chardev at that time. Or not?
> >>
> >> v2:
> >> - fix a random socket chardev test failure
> >
> > I have no problem on patch 3-5, but aren't patch 1-2 still breaking
> > the context switch of chardev?  Or did I misunderstood?
> 
> Switching context is broken/racy regardless. My current plan is to
> suspend the context to switch to, and update all sources before
> thawing the target context.
> 
> If you prefer we can delay the revert, but I would rather have them
> because the regression seems much worse to me.

Have you started working on above?  I started to code up some RFC
patch to let the chardev maintain the iothreads (now it'll only have a
monitor iothread) then we don't need to switch context for chardev any
more.  I plan to post it today for a quick look.  With that, this
series will not break anything then.

> 
> >
> > Regards,
> >
> > --
> > Peter Xu

Regards,

-- 
Peter Xu

Re: [Qemu-devel] [PATCH v2 0/5] Fix socket chardev regression
Posted by Marc-André Lureau 7 years, 2 months ago
Hi

On Fri, Aug 24, 2018 at 10:45 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Fri, Aug 24, 2018 at 10:32:58AM +0200, Marc-André Lureau wrote:
> > Hi
> >
> > On Fri, Aug 24, 2018 at 5:45 AM, Peter Xu <peterx@redhat.com> wrote:
> > > On Thu, Aug 23, 2018 at 04:31:20PM +0200, Marc-André Lureau wrote:
> > >> Hi,
> > >>
> > >> In commit 25679e5d58e "chardev: tcp: postpone async connection setup"
> > >> (and its follow up 99f2f54174a59), Peter moved chardev socket
> > >> connection to machine_done event. However, chardev created later will
> > >> no longer attempt to connect, and chardev created in tests do not have
> > >> machine_done event (breaking some of vhost-user-test).
> > >>
> > >> The goal was to move the "connect" source to the chardev frontend
> > >> context (the monitor thread context in his case). chr->gcontext is set
> > >> with qemu_chr_fe_set_handlers(). But there is no guarantee that the
> > >> function will be called in general, so we can't delay connection until
> > >> then: the chardev should still attempt to connect during open(), using
> > >> the main context.
> > >>
> > >> An alternative would be to specify the iothread during chardev
> > >> creation. Setting up monitor OOB would be quite different too, it
> > >> would take the same iothread as argument.
> > >>
> > >> 99f2f54174a595e is also a bit problematic, since it will behave
> > >> differently before and after machine_done (the first case gives a
> > >> chance to use a different context reliably, the second looks racy)
> > >>
> > >> In the end, I am not sure this is all necessary, as chardev callbacks
> > >> are called after qemu_chr_fe_set_handlers(), at which point the
> > >> context of sources are updated. In "char-socket: update all ioc
> > >> handlers when changing context", I moved also the hup handler to the
> > >> updated context. So unless the main thread is already stuck, we can
> > >> setup a different context for the chardev at that time. Or not?
> > >>
> > >> v2:
> > >> - fix a random socket chardev test failure
> > >
> > > I have no problem on patch 3-5, but aren't patch 1-2 still breaking
> > > the context switch of chardev?  Or did I misunderstood?
> >
> > Switching context is broken/racy regardless. My current plan is to
> > suspend the context to switch to, and update all sources before
> > thawing the target context.
> >
> > If you prefer we can delay the revert, but I would rather have them
> > because the regression seems much worse to me.
>
> Have you started working on above?  I started to code up some RFC
> patch to let the chardev maintain the iothreads (now it'll only have a
> monitor iothread) then we don't need to switch context for chardev any
> more.  I plan to post it today for a quick look.  With that, this
> series will not break anything then.

No, I haven't really started the code, but spent some time thinking...

Ok, if you manage to put together that proposal, that sounds interesting.

Nevertheless, I can work on improving/fixing the race when switching
contexts (already used by colo)

>
> >
> > >
> > > Regards,
> > >
> > > --
> > > Peter Xu
>
> Regards,
>
> --
> Peter Xu
>


-- 
Marc-André Lureau

Re: [Qemu-devel] [PATCH v2 0/5] Fix socket chardev regression
Posted by Peter Xu 7 years, 2 months ago
On Fri, Aug 24, 2018 at 10:51:18AM +0200, Marc-André Lureau wrote:
> Hi
> 
> On Fri, Aug 24, 2018 at 10:45 AM Peter Xu <peterx@redhat.com> wrote:
> >
> > On Fri, Aug 24, 2018 at 10:32:58AM +0200, Marc-André Lureau wrote:
> > > Hi
> > >
> > > On Fri, Aug 24, 2018 at 5:45 AM, Peter Xu <peterx@redhat.com> wrote:
> > > > On Thu, Aug 23, 2018 at 04:31:20PM +0200, Marc-André Lureau wrote:
> > > >> Hi,
> > > >>
> > > >> In commit 25679e5d58e "chardev: tcp: postpone async connection setup"
> > > >> (and its follow up 99f2f54174a59), Peter moved chardev socket
> > > >> connection to machine_done event. However, chardev created later will
> > > >> no longer attempt to connect, and chardev created in tests do not have
> > > >> machine_done event (breaking some of vhost-user-test).
> > > >>
> > > >> The goal was to move the "connect" source to the chardev frontend
> > > >> context (the monitor thread context in his case). chr->gcontext is set
> > > >> with qemu_chr_fe_set_handlers(). But there is no guarantee that the
> > > >> function will be called in general, so we can't delay connection until
> > > >> then: the chardev should still attempt to connect during open(), using
> > > >> the main context.
> > > >>
> > > >> An alternative would be to specify the iothread during chardev
> > > >> creation. Setting up monitor OOB would be quite different too, it
> > > >> would take the same iothread as argument.
> > > >>
> > > >> 99f2f54174a595e is also a bit problematic, since it will behave
> > > >> differently before and after machine_done (the first case gives a
> > > >> chance to use a different context reliably, the second looks racy)
> > > >>
> > > >> In the end, I am not sure this is all necessary, as chardev callbacks
> > > >> are called after qemu_chr_fe_set_handlers(), at which point the
> > > >> context of sources are updated. In "char-socket: update all ioc
> > > >> handlers when changing context", I moved also the hup handler to the
> > > >> updated context. So unless the main thread is already stuck, we can
> > > >> setup a different context for the chardev at that time. Or not?
> > > >>
> > > >> v2:
> > > >> - fix a random socket chardev test failure
> > > >
> > > > I have no problem on patch 3-5, but aren't patch 1-2 still breaking
> > > > the context switch of chardev?  Or did I misunderstood?
> > >
> > > Switching context is broken/racy regardless. My current plan is to
> > > suspend the context to switch to, and update all sources before
> > > thawing the target context.
> > >
> > > If you prefer we can delay the revert, but I would rather have them
> > > because the regression seems much worse to me.
> >
> > Have you started working on above?  I started to code up some RFC
> > patch to let the chardev maintain the iothreads (now it'll only have a
> > monitor iothread) then we don't need to switch context for chardev any
> > more.  I plan to post it today for a quick look.  With that, this
> > series will not break anything then.
> 
> No, I haven't really started the code, but spent some time thinking...
> 
> Ok, if you manage to put together that proposal, that sounds interesting.

Not really your iothread proposal; I tried to keep the interface
unchanged, but move the iothread ownership from the monitor code to
chardev code.  Please have a look at:

  [RFC 0/3] chardev: introduce chardev contexts

> 
> Nevertheless, I can work on improving/fixing the race when switching
> contexts (already used by colo)

Ouch, is colo changing context too?  I hope that series won't break it.

(Ok if so I believe it'll very possible to break it...)

Regards,

-- 
Peter Xu