[libvirt] [PATCH] libvirtd: fix potential deadlock when starting vm

Bingsong Si posted 1 patch 5 years, 6 months ago
Test syntax-check passed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/libvirt tags/patchew/20181011081308.3294-1-owen.si@ucloud.cn
src/node_device/node_device_udev.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
[libvirt] [PATCH] libvirtd: fix potential deadlock when starting vm
Posted by Bingsong Si 5 years, 6 months ago
On CentOS 6, udev_monitor_receive_device will block until the socket becomes
readable, so udevEventHandleThread holds the lock the whole time and
udevEventHandleCallback can hardly acquire it, which blocks the event poll.
To fix this, set dataReady to false right after receiving a udev event.

Signed-off-by: Bingsong Si <owen.si@ucloud.cn>
---
 src/node_device/node_device_udev.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/src/node_device/node_device_udev.c b/src/node_device/node_device_udev.c
index 22897591de..ce1101d7cc 100644
--- a/src/node_device/node_device_udev.c
+++ b/src/node_device/node_device_udev.c
@@ -1616,6 +1616,7 @@ udevEventHandleThread(void *opaque ATTRIBUTE_UNUSED)
 
         errno = 0;
         device = udev_monitor_receive_device(priv->udev_monitor);
+        priv->dataReady = false;
         virObjectUnlock(priv);
 
         if (!device) {
@@ -1637,10 +1638,6 @@ udevEventHandleThread(void *opaque ATTRIBUTE_UNUSED)
                 return;
             }
 
-            virObjectLock(priv);
-            priv->dataReady = false;
-            virObjectUnlock(priv);
-
             continue;
         }
 
-- 
2.18.0


--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] libvirtd: fix potential deadlock when starting vm
Posted by John Ferlan 5 years, 6 months ago

On 10/11/18 4:13 AM, Bingsong Si wrote:
> On CentOS 6, udev_monitor_receive_device will block until the socket becomes

Is this really CentOS6 only or just where you've seen it?

> readable, udevEventHandleThread will hold the lock all the time and
> udevEventHandleCallback hard to get the lock, will block the event poll.
> To fix this, set dataReady to false after receive an udev event.
> 
> Signed-off-by: Bingsong Si <owen.si@ucloud.cn>
> ---
>  src/node_device/node_device_udev.c | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
> 

I've CC'd Erik since he wrote and perhaps remembers all the "gotchas" he
discovered in the udev callback code.

I wonder if this has to do with the EAGAIN and EWOULDBLOCK @errno checks
done in the !device loop that are different in "older" (much older) code.

Although I have this very vague recollection that there was some problem
with centos6 that was fixed by some OS patch.  Hopefully Erik remembers
(and maybe we should log it in the code at this point ;-)) - I did do
some searching, but came up empty.

John

> diff --git a/src/node_device/node_device_udev.c b/src/node_device/node_device_udev.c
> index 22897591de..ce1101d7cc 100644
> --- a/src/node_device/node_device_udev.c
> +++ b/src/node_device/node_device_udev.c
> @@ -1616,6 +1616,7 @@ udevEventHandleThread(void *opaque ATTRIBUTE_UNUSED)
>  
>          errno = 0;
>          device = udev_monitor_receive_device(priv->udev_monitor);
> +        priv->dataReady = false;
>          virObjectUnlock(priv);
>  
>          if (!device) {
> @@ -1637,10 +1638,6 @@ udevEventHandleThread(void *opaque ATTRIBUTE_UNUSED)
>                  return;
>              }
>  
> -            virObjectLock(priv);
> -            priv->dataReady = false;
> -            virObjectUnlock(priv);
> -
>              continue;
>          }
>  
> 

Re: [libvirt] [PATCH] libvirtd: fix potential deadlock when starting vm
Posted by Erik Skultety 5 years, 6 months ago
On Tue, Oct 16, 2018 at 05:57:17PM -0400, John Ferlan wrote:
>
>
> On 10/11/18 4:13 AM, Bingsong Si wrote:
> > On CentOS 6, udev_monitor_receive_device will block until the socket becomes
>
> Is this really CentOS6 only or just where you've seen it?
>
> > readable, udevEventHandleThread will hold the lock all the time and
> > udevEventHandleCallback hard to get the lock, will block the event poll.
> > To fix this, set dataReady to false after receive an udev event.
> >
> > Signed-off-by: Bingsong Si <owen.si@ucloud.cn>
> > ---
> >  src/node_device/node_device_udev.c | 5 +----
> >  1 file changed, 1 insertion(+), 4 deletions(-)
> >
>
> I've CC'd Erik since he wrote and perhaps remembers all the "gotchas" he
> discovered in the udev callback code.
>
> I wonder if this has to do with the EAGAIN and EWOULDBLOCK @errno checks
> done in the !device loop that are different in "older" (much older) code.
>
> Although I have this very vague recollection that there was some problem
> with centos6 that was fixed by some OS patch.  Hopefully Erik remembers
> (and maybe we should log it in the code at this point ;-)) - I did do
> some searching, but came up empty.

Remembering a year-old issue, let me tell you, my head hurts :) (and we probably
should put a note somewhere, so that we don't have to dig out dinosaurs
again)... The only thing I remember is that there was a reason why I did things
this way and not the way this patch proposes, and indeed I then found this:

https://www.redhat.com/archives/libvir-list/2017-September/msg00683.html

TL;DR:
The scheduler comes into play here. The problem I had was that the event loop
could be scheduled (and it in fact was) earlier than the handler thread here.
What that essentially means is that by the time the thread actually handled the
event and read the data from the monitor, the event loop fired the very same
event, simply because the data hadn't been retrieved from the socket at that
point yet.
This was mainly connected to a design flaw in that specific version of the
patch series. With the current design, setting dataReady immediately after
reading the data or after encountering the first EAGAIN doesn't matter and the
scheduler wouldn't have an impact either way, that's true. However, with
CentOS 6 the scheduler would still come into play even with your patch (it was
much more noticeable the more devices you had in/added to the system); you'd
still remain blocked on the recv call. The correct fix would be more complex
and IIRC it would involve pulling the monitor object out of the lockable
private data object, guarded by a separate lock (I haven't thought about it
much though, so I might be wrong).

That said, we already dropped upstream support for CentOS 6, so I'm
not really keen on "fixing" anything, unless the currently supported platforms
suffer from a related issue which would require code changes in which case we
could merge a patch like this upstream. You should upgrade your platform to a
newer CentOS if you want to rely on features provided by new(ish) libvirt.

Erik

Re: [libvirt] [PATCH] libvirtd: fix potential deadlock when starting vm
Posted by John Ferlan 5 years, 6 months ago

On 10/18/18 8:23 AM, Erik Skultety wrote:
> On Tue, Oct 16, 2018 at 05:57:17PM -0400, John Ferlan wrote:
>>
>>
>> On 10/11/18 4:13 AM, Bingsong Si wrote:
>>> On CentOS 6, udev_monitor_receive_device will block until the socket becomes
>>
>> Is this really CentOS6 only or just where you've seen it?
>>
>>> readable, udevEventHandleThread will hold the lock all the time and
>>> udevEventHandleCallback hard to get the lock, will block the event poll.
>>> To fix this, set dataReady to false after receive an udev event.
>>>
>>> Signed-off-by: Bingsong Si <owen.si@ucloud.cn>
>>> ---
>>>  src/node_device/node_device_udev.c | 5 +----
>>>  1 file changed, 1 insertion(+), 4 deletions(-)
>>>
>>
>> I've CC'd Erik since he wrote and perhaps remembers all the "gotchas" he
>> discovered in the udev callback code.
>>
>> I wonder if this has to do with the EAGAIN and EWOULDBLOCK @errno checks
>> done in the !device loop that are different in "older" (much older) code.
>>
>> Although I have this very vague recollection that there was some problem
>> with centos6 that was fixed by some OS patch.  Hopefully Erik remembers
>> (and maybe we should log it in the code at this point ;-)) - I did do
>> some searching, but came up empty.
> 
> Remembering a year old issue, let me tell you, my head hurts :) (and we probably
> should put a note somewhere, so that we don't have to dig out dinosaurs
> again)...the only thing I remember is that there was a reason why I did things
> this way and not the way this patch is proposing, and indeed I then found this:
> 
> https://www.redhat.com/archives/libvir-list/2017-September/msg00683.html
> 
> TL;DR:
> The scheduler comes into play here. The problem I had was that the event loop
> could be scheduled (and it in fact was) earlier than the handler thread here.
> What that essentially means is that by the time the thread actually handled the
> event and read the data from the monitor, the event loop fired the very same
> event, simply because the data hadn't been retrieved from the socket at that
> point yet.
> This was mainly connected to the design flaw of that specific version of patch
> series. With the current design, setting dataReady immediately after reading the
> data or after encoutering the first EAGAIN doesn't matter and the scheduler
> wouldn't have an impact either way, that's true. However, with CentOS 6 the
> scheduler would still come into play even with your patch (it was much more
> noticeable the more devices you had in/added into the system), you'd still
> remain blocking on the recv call. The correct fix would be more
> complex and IIRC it would involve pulling the monitor object out of the private
> data lockable object and would need to be guarded by a separate lock (I haven't
> thought about it much though, so I might be wrong).
> 
> That said, we already dropped upstream support for CentOS 6, so I'm
> not really keen on "fixing" anything, unless the currently supported platforms
> suffer from a related issue which would require code changes in which case we
> could merge a patch like this upstream. You should upgrade your platform to a
> newer CentOS if you want to rely on features provided by new(ish) libvirt.
> 
> Erik
> 

Thanks for the details - I could support such a patch (or write it using
the above description).

The one thing I had forgotten, or perhaps I should say that struck me:
should this code use virCondWaitUntil, or would it just not matter?

Let's say 10 devices were added and on the 10th one we had this issue:
we set dataReady to false, but then didn't get another udev device-ready
event for "a while", leaving that 10th one in limbo until such time as a
new device was available (e.g. udevEventHandleCallback is called by udev).

Usage of the *Until variant would only occur if we'd fallen into the
!device-and-continue code path, and it would be cleared prior to getting
@device again. Too much overthinking ;->?

John


Re: [libvirt] [PATCH] libvirtd: fix potential deadlock when starting vm
Posted by Erik Skultety 5 years, 6 months ago
On Thu, Oct 18, 2018 at 10:16:19AM -0400, John Ferlan wrote:
>
>
> On 10/18/18 8:23 AM, Erik Skultety wrote:
> > On Tue, Oct 16, 2018 at 05:57:17PM -0400, John Ferlan wrote:
> >>
> >>
> >> On 10/11/18 4:13 AM, Bingsong Si wrote:
> >>> On CentOS 6, udev_monitor_receive_device will block until the socket becomes
> >>
> >> Is this really CentOS6 only or just where you've seen it?
> >>
> >>> readable, udevEventHandleThread will hold the lock all the time and
> >>> udevEventHandleCallback hard to get the lock, will block the event poll.
> >>> To fix this, set dataReady to false after receive an udev event.
> >>>
> >>> Signed-off-by: Bingsong Si <owen.si@ucloud.cn>
> >>> ---
> >>>  src/node_device/node_device_udev.c | 5 +----
> >>>  1 file changed, 1 insertion(+), 4 deletions(-)
> >>>
> >>
> >> I've CC'd Erik since he wrote and perhaps remembers all the "gotchas" he
> >> discovered in the udev callback code.
> >>
> >> I wonder if this has to do with the EAGAIN and EWOULDBLOCK @errno checks
> >> done in the !device loop that are different in "older" (much older) code.
> >>
> >> Although I have this very vague recollection that there was some problem
> >> with centos6 that was fixed by some OS patch.  Hopefully Erik remembers
> >> (and maybe we should log it in the code at this point ;-)) - I did do
> >> some searching, but came up empty.
> >
> > Remembering a year old issue, let me tell you, my head hurts :) (and we probably
> > should put a note somewhere, so that we don't have to dig out dinosaurs
> > again)...the only thing I remember is that there was a reason why I did things
> > this way and not the way this patch is proposing, and indeed I then found this:
> >
> > https://www.redhat.com/archives/libvir-list/2017-September/msg00683.html
> >
> > TL;DR:
> > The scheduler comes into play here. The problem I had was that the event loop
> > could be scheduled (and it in fact was) earlier than the handler thread here.
> > What that essentially means is that by the time the thread actually handled the
> > event and read the data from the monitor, the event loop fired the very same
> > event, simply because the data hadn't been retrieved from the socket at that
> > point yet.
> > This was mainly connected to the design flaw of that specific version of patch
> > series. With the current design, setting dataReady immediately after reading the
> > data or after encoutering the first EAGAIN doesn't matter and the scheduler
> > wouldn't have an impact either way, that's true. However, with CentOS 6 the
> > scheduler would still come into play even with your patch (it was much more
> > noticeable the more devices you had in/added into the system), you'd still
> > remain blocking on the recv call. The correct fix would be more
> > complex and IIRC it would involve pulling the monitor object out of the private
> > data lockable object and would need to be guarded by a separate lock (I haven't
> > thought about it much though, so I might be wrong).
> >
> > That said, we already dropped upstream support for CentOS 6, so I'm
> > not really keen on "fixing" anything, unless the currently supported platforms
> > suffer from a related issue which would require code changes in which case we
> > could merge a patch like this upstream. You should upgrade your platform to a
> > newer CentOS if you want to rely on features provided by new(ish) libvirt.
> >
> > Erik
> >
>
> Thanks for the details - I could support such a patch (or write it using
> the above description).
>
> The one thing I have forgotten or perhaps I should say struck me -
> should this code use virCondWaitUntil or would it just not matter?
>
> Let's say 10 devices were added and on the 10th one we had this issue,
> set dataReady to false, but then didn't get another udev device ready
> event for "a while" leaving that 10th one in limbo until such time a new
> device was available (e.g. udevEventHandleCallback is called by udev).

Can you be more specific about how we could utilize virCondWaitUntil? Because
the way I see it, it wouldn't help us in either case: the problem isn't the
thread waiting to be woken up to process an event, rather it's the recvmsg
syscall in libudev which would cause the issue on old CentOS.
Looking at your example, if you put virCondWaitUntil in the !device conditional
path, how would that prevent us from hanging after processing the last device
if we never got a chance to even hit the !device code path, since we kept
getting legitimate events until the last device, where we got several events
for a single piece of data? It very well might be the jetlag speaking, but I
just don't see it in your example.

Erik

Re: [libvirt] [PATCH] libvirtd: fix potential deadlock when starting vm
Posted by John Ferlan 5 years, 6 months ago
[...]

>>
>> The one thing I have forgotten or perhaps I should say struck me -
>> should this code use virCondWaitUntil or would it just not matter?
>>
>> Let's say 10 devices were added and on the 10th one we had this issue,
>> set dataReady to false, but then didn't get another udev device ready
>> event for "a while" leaving that 10th one in limbo until such time a new
>> device was available (e.g. udevEventHandleCallback is called by udev).
> 
> Can you be more specific on how we could utilize virCondWaitUntil? Because the
> way I see it, it wouldn't help us in either of the cases because the problem
> isn't the thread waiting to be woken up to process an event rather it's the
> recvmsg syscall in libudev which would cause the issue on old CentOS.
> Looking at your example, if you put virCondWaitUntil in the !device conditional
> path, how would that prevent us to hang after processing the last device if we
> never got a chance to even hit the !device code path, since we kept getting
> legitimate events until the last device where we got several events for a
> single piece of data? It very well might be the jetlag speaking for me, but I
> just don't see it in your example.
> 
> Erik
> 

I guess it was the "getting of legitimate events" part whose exact workings I
couldn't recall. My "theory" was that we got a group of 10 devices, failed
on the last one, and didn't get another call for, let's say, hours. Then
we have this device in limbo just because udev told us too soon and we
acted, but the scheduler had other ideas vis-a-vis filling in the
@device.  I'm fine with things as they are, still.

More details if you care to read... First, I think the theory of using the
*Until processing works better than the actual practice of coding it:
*Until would require using virTimeMillisNow, which can fail, so we need
to account for that...

My general thought, without writing a lick of code, was:

    bool timedWait = false;
    int rc;

...

    while (!priv->dataReady && !priv->threadQuit) {

        errno = 0;
        if (!timedWait)
            rc = virCondWait(...);
        else
            rc = virCondWaitUntil(...);

        if (rc < 0 && errno != ETIMEDOUT) {
            virReportSystemError ...
        }

        timedWait = false;

...

    if (!device) {
...
            virObjectLock(priv);
            priv->dataReady = false;
            virObjectUnlock(priv);
            timedWait = true;
...


IOW: Only use the timedWait when/if !device caused the "continue;". I
also note now that the virCondWait error check doesn't check < 0, just
that the return is not 0... Maybe that was copied from virFDStreamThread
since other callers check < 0.

Once I started writing code I realized the virTimeMillisNow issue, so
then things would become:

            errno = 0;
            if (timedWait) {
                /* If we fail this, then just use virCondWait */
                if (virTimeMillisNow(&now) < 0) {
                    timedWait = false;
                } else {
                    when = now + 1000ull;
                    rc = virCondWaitUntil(&priv->threadCond,
                                          &priv->parent.lock,
                                          when);
                }
            }

            if (!timedWait)
                rc = virCondWait(&priv->threadCond, &priv->parent.lock);

            if (rc < 0 && errno != ETIMEDOUT) {

Much uglier and scarier than I first thought.

John

Re: [libvirt] [PATCH] libvirtd: fix potential deadlock when starting vm
Posted by Erik Skultety 5 years, 6 months ago
On Thu, Oct 18, 2018 at 12:04:55PM -0400, John Ferlan wrote:
> [...]
>
> >>
> >> The one thing I have forgotten or perhaps I should say struck me -
> >> should this code use virCondWaitUntil or would it just not matter?
> >>
> >> Let's say 10 devices were added and on the 10th one we had this issue,
> >> set dataReady to false, but then didn't get another udev device ready
> >> event for "a while" leaving that 10th one in limbo until such time a new
> >> device was available (e.g. udevEventHandleCallback is called by udev).
> >
> > Can you be more specific on how we could utilize virCondWaitUntil? Because the
> > way I see it, it wouldn't help us in either of the cases because the problem
> > isn't the thread waiting to be woken up to process an event rather it's the
> > recvmsg syscall in libudev which would cause the issue on old CentOS.
> > Looking at your example, if you put virCondWaitUntil in the !device conditional
> > path, how would that prevent us to hang after processing the last device if we
> > never got a chance to even hit the !device code path, since we kept getting
> > legitimate events until the last device where we got several events for a
> > single piece of data? It very well might be the jetlag speaking for me, but I
> > just don't see it in your example.
> >
> > Erik
> >
>
> I guess it was the "getting of legitimate events" that I couldn't recall
> the exact workings. My "theory" was we got a group of 10 devices, failed
> on the last one, and didn't get another call for let's say hours. Then
> we have this device in limbo just because udev told us too soon an we
> acted, but the scheduler had other ideas vis-a-vis filling in the
> @device.  I'm fine with things as they are still.

Aha, now I see your thoughts. So, what you're mentioning was a different,
kinda related issue with mdevs where the sysfs tree wasn't completely ready
at the time the event was emitted. However, we need to set aside the idea
that udev might tell us about the device too soon, because we technically
got the device; it's just that the device is not fully ready, and a
different piece of code handles that situation.
With your code, however, you wouldn't really remedy the problem, because on
CentOS 6 the !device branch is essentially dead code, and on CentOS 7 the
code is just unnecessarily more complex and less readable, and doesn't
really change (let alone help) the status quo. Thanks for sharing the
details though; it helped me understand your thoughts.

Erik

>
> More details if you care to read... First, I think the theory of
> thinking about using Until processing works better than the actual
> practice of coding it - Until would requiring using virTimeMillisNow
> which can fail so we need to account for that...
>
> My general thought was without writing a lick of code was:
>
>     bool timedWait = false;
>     int rc;
>
> ...
>
>     while (!priv->dataReady && !priv->threadQuit) {
>
>     errno = 0;
>     if (!timedWait)
>         rc = virCondWait(...)
>     else
>         rc = virCondWaitUntil(...);
>
>     if (rc < 0 && errno != ETIMEDOUT) {
>         virReportSystemError ...
>     }
>
>     timedWait = false;
>
> ...
>
>     if (!device) {
> ...
>             virObjectLock(priv);
>             priv->dataReady = false;
>             virObjectUnlock(priv);
>             timedWait = true;
> ...
>
>
> IOW: Only use the timedWait when/if !device caused the "continue;". I
> also note now that the virCondWait error check doesn't check < 0, just
> that the return is not 0... Maybe that was copied from virFDStreamThread
> since other callers check < 0.
>
> Once I started writing code I realized the virTimeMillisNow issue, so
> then things would become:
>
>             errno = 0;
>             if (timedWait) {
>                 /* If we fail this, then just use virCondWait */
>                 if (virTimeMillisNow(&now) < 0) {
>                     timedWait = false;
>                 } else {
>                     when = now + 1000ull;
>                     rc = virCondWait(&priv->threadCond,
>                                      &priv->parent.lock,
>                                      when);
>                 }
>             }
>
>             if (!timedWait)
>                 rc = virCondWait(&priv->threadCond, &priv->parent.lock);
>
>             if (rc < 0 && errno != ETIMEDOUT) {
>
> Much uglier and scarier than I first thought.
>
> John
