[PATCH] qemu: Avoid crash in qemuStateShutdownPrepare() and qemuStateShutdownWait()

Michal Privoznik posted 1 patch 3 years, 2 months ago
Test syntax-check failed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/libvirt tags/patchew/867dc80cdb246f1410415f32ee4e12caab41f2ec.1611308730.git.mprivozn@redhat.com
src/qemu/qemu_driver.c | 6 ++++++
1 file changed, 6 insertions(+)
[PATCH] qemu: Avoid crash in qemuStateShutdownPrepare() and qemuStateShutdownWait()
Posted by Michal Privoznik 3 years, 2 months ago
If libvirtd is sent SIGTERM while it is still initializing, it
may crash. The following scenario was observed (using 'stress' to
slow down CPU so much that the window where the problem exists is
bigger):

1) The main thread is already executing virNetDaemonRun() and is
   in virEventRunDefaultImpl().
2) The thread that's supposed to run daemonRunStateInit() is
   spawned already, but daemonRunStateInit() is in its very early
   stage (in the stack trace I see it's executing
   virIdentityGetSystem()).

If SIGTERM (or any other signal that we don't override handler
for) arrives at this point, the main thread jumps out from
virEventRunDefaultImpl() and enters virStateShutdownPrepare()
(via shutdownPrepareCb which was set earlier). This iterates
through stateShutdownPrepare() callbacks of state drivers and
reaching qemuStateShutdownPrepare() eventually only to
dereference qemu_driver. But since thread 2) has not been
scheduled/not proceeded yet, qemu_driver was not allocated yet.

Solution is simple - just check if qemu_driver is not NULL. But
doing so only in qemuStateShutdownPrepare() would push the
problem down to virStateShutdownWait(), well
qemuStateShutdownWait(). Therefore, duplicate the trick there
too.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1895359#c14
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 src/qemu/qemu_driver.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 027617deef..ca4f366323 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -1072,6 +1072,9 @@ qemuStateStop(void)
 static int
 qemuStateShutdownPrepare(void)
 {
+    if (!qemu_driver)
+        return 0;
+
     virThreadPoolStop(qemu_driver->workerPool);
     return 0;
 }
@@ -1091,6 +1094,9 @@ qemuDomainObjStopWorkerIter(virDomainObjPtr vm,
 static int
 qemuStateShutdownWait(void)
 {
+    if (!qemu_driver)
+        return 0;
+
     virDomainObjListForEach(qemu_driver->domains, false,
                             qemuDomainObjStopWorkerIter, NULL);
     virThreadPoolDrain(qemu_driver->workerPool);
-- 
2.26.2

Re: [PATCH] qemu: Avoid crash in qemuStateShutdownPrepare() and qemuStateShutdownWait()
Posted by Nikolay Shirokovskiy 3 years, 2 months ago
On Fri, Jan 22, 2021 at 12:45 PM Michal Privoznik <mprivozn@redhat.com>
wrote:

> If libvirtd is sent SIGTERM while it is still initializing, it
> may crash. The following scenario was observed (using 'stress' to
> slow down CPU so much that the window where the problem exists is
> bigger):
>
> 1) The main thread is already executing virNetDaemonRun() and is
>    in virEventRunDefaultImpl().
> 2) The thread that's supposed to run daemonRunStateInit() is
>    spawned already, but daemonRunStateInit() is in its very early
>    stage (in the stack trace I see it's executing
>    virIdentityGetSystem()).
>
> If SIGTERM (or any other signal that we don't override handler
> for) arrives at this point, the main thread jumps out from
> virEventRunDefaultImpl() and enters virStateShutdownPrepare()
> (via shutdownPrepareCb which was set earlier). This iterates
> through stateShutdownPrepare() callbacks of state drivers and
> reaching qemuStateShutdownPrepare() eventually only to
> dereference qemu_driver. But since thread 2) has not been
> scheduled/not proceeded yet, qemu_driver was not allocated yet.
>
> Solution is simple - just check if qemu_driver is not NULL. But
> doing so only in qemuStateShutdownPrepare() would push the
> problem down to virStateShutdownWait(), well
> qemuStateShutdownWait(). Therefore, duplicate the trick there
> too.
>

I guess this is a partial solution. Initialization may be in a state when
qemu_driver is initialized but qemu_driver->workerPool is still NULL
for example.

Maybe we'd better delay shutdown until initialization is finished?

Nikolay



>
> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1895359#c14
> Signed-off-by
> <https://bugzilla.redhat.com/show_bug.cgi?id=1895359#c14Signed-off-by>:
> Michal Privoznik <mprivozn@redhat.com>
> ---
>  src/qemu/qemu_driver.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
> index 027617deef..ca4f366323 100644
> --- a/src/qemu/qemu_driver.c
> +++ b/src/qemu/qemu_driver.c
> @@ -1072,6 +1072,9 @@ qemuStateStop(void)
>  static int
>  qemuStateShutdownPrepare(void)
>  {
> +    if (!qemu_driver)
> +        return 0;
> +
>      virThreadPoolStop(qemu_driver->workerPool);
>      return 0;
>  }
> @@ -1091,6 +1094,9 @@ qemuDomainObjStopWorkerIter(virDomainObjPtr vm,
>  static int
>  qemuStateShutdownWait(void)
>  {
> +    if (!qemu_driver)
> +        return 0;
> +
>      virDomainObjListForEach(qemu_driver->domains, false,
>                              qemuDomainObjStopWorkerIter, NULL);
>      virThreadPoolDrain(qemu_driver->workerPool);
> --
> 2.26.2
>
>
Re: [PATCH] qemu: Avoid crash in qemuStateShutdownPrepare() and qemuStateShutdownWait()
Posted by Michal Privoznik 3 years, 2 months ago
On 1/22/21 12:09 PM, Nikolay Shirokovskiy wrote:
> On Fri, Jan 22, 2021 at 12:45 PM Michal Privoznik <mprivozn@redhat.com>
> wrote:
> 
>> If libvirtd is sent SIGTERM while it is still initializing, it
>> may crash. The following scenario was observed (using 'stress' to
>> slow down CPU so much that the window where the problem exists is
>> bigger):
>>
>> 1) The main thread is already executing virNetDaemonRun() and is
>>     in virEventRunDefaultImpl().
>> 2) The thread that's supposed to run daemonRunStateInit() is
>>     spawned already, but daemonRunStateInit() is in its very early
>>     stage (in the stack trace I see it's executing
>>     virIdentityGetSystem()).
>>
>> If SIGTERM (or any other signal that we don't override handler
>> for) arrives at this point, the main thread jumps out from
>> virEventRunDefaultImpl() and enters virStateShutdownPrepare()
>> (via shutdownPrepareCb which was set earlier). This iterates
>> through stateShutdownPrepare() callbacks of state drivers and
>> reaching qemuStateShutdownPrepare() eventually only to
>> dereference qemu_driver. But since thread 2) has not been
>> scheduled/not proceeded yet, qemu_driver was not allocated yet.
>>
>> Solution is simple - just check if qemu_driver is not NULL. But
>> doing so only in qemuStateShutdownPrepare() would push the
>> problem down to virStateShutdownWait(), well
>> qemuStateShutdownWait(). Therefore, duplicate the trick there
>> too.
>>
> 
> I guess this is a partial solution. Initialization may be in a state when
> qemu_driver is initialized but qemu_driver->workerPool is still NULL
> for example.

Yes.

> 
> Maybe we'd better delay shutdown until initialization is finished?

I'm not exactly sure how to achieve that. Do you have a hint? Also, part 
of qemu driver state init is autostarting domains (which may take ages).

Michal

Re: [PATCH] qemu: Avoid crash in qemuStateShutdownPrepare() and qemuStateShutdownWait()
Posted by Nikolay Shirokovskiy 3 years, 2 months ago
On Fri, Jan 22, 2021 at 2:24 PM Michal Privoznik <mprivozn@redhat.com>
wrote:

> On 1/22/21 12:09 PM, Nikolay Shirokovskiy wrote:
> > On Fri, Jan 22, 2021 at 12:45 PM Michal Privoznik <mprivozn@redhat.com>
> > wrote:
> >
> >> If libvirtd is sent SIGTERM while it is still initializing, it
> >> may crash. The following scenario was observed (using 'stress' to
> >> slow down CPU so much that the window where the problem exists is
> >> bigger):
> >>
> >> 1) The main thread is already executing virNetDaemonRun() and is
> >>     in virEventRunDefaultImpl().
> >> 2) The thread that's supposed to run daemonRunStateInit() is
> >>     spawned already, but daemonRunStateInit() is in its very early
> >>     stage (in the stack trace I see it's executing
> >>     virIdentityGetSystem()).
> >>
> >> If SIGTERM (or any other signal that we don't override handler
> >> for) arrives at this point, the main thread jumps out from
> >> virEventRunDefaultImpl() and enters virStateShutdownPrepare()
> >> (via shutdownPrepareCb which was set earlier). This iterates
> >> through stateShutdownPrepare() callbacks of state drivers and
> >> reaching qemuStateShutdownPrepare() eventually only to
> >> dereference qemu_driver. But since thread 2) has not been
> >> scheduled/not proceeded yet, qemu_driver was not allocated yet.
> >>
> >> Solution is simple - just check if qemu_driver is not NULL. But
> >> doing so only in qemuStateShutdownPrepare() would push the
> >> problem down to virStateShutdownWait(), well
> >> qemuStateShutdownWait(). Therefore, duplicate the trick there
> >> too.
> >>
> >
> > I guess this is a partial solution. Initialization may be in a state when
> > qemu_driver is initialized but qemu_driver->workerPool is still NULL
> > for example.
>
> Yes.
>
> >
> > Maybe we'd better delay shutdown until initialization is finished?
>
> I'm not exactly sure how to achieve that. Do you have a hint? Also, part
> of qemu driver state init is autostarting domains (which may take ages).
>

I'm thinking of adding a new variable 'initialized' to virnetdaemon. It can
be
set by call from daemonRunStateInit after initialization is finished.
And we should call shutdownPrepare/shutdownWait only when both 'quit' and
'initialized' are true. It will require another pipe pair probably to wake
up
event loop thread when 'initialized' it set to true.

As initialization can take a lot of time (autostart as you mentioned) we
can arm finishTimer right when quit is set without waiting for
initialization
to finish. This way we exit ungracefully just as in other cases when
shutdown
finishing takes too much time. Libvirtd has a goal to handle crashes at
any time so this exit should be fine.

Nikolay
Re: [PATCH] qemu: Avoid crash in qemuStateShutdownPrepare() and qemuStateShutdownWait()
Posted by Jiri Denemark 3 years, 2 months ago
On Fri, Jan 22, 2021 at 14:09:52 +0300, Nikolay Shirokovskiy wrote:
> On Fri, Jan 22, 2021 at 12:45 PM Michal Privoznik <mprivozn@redhat.com>
> wrote:
> 
> > If libvirtd is sent SIGTERM while it is still initializing, it
> > may crash. The following scenario was observed (using 'stress' to
> > slow down CPU so much that the window where the problem exists is
> > bigger):
> >
> > 1) The main thread is already executing virNetDaemonRun() and is
> >    in virEventRunDefaultImpl().
> > 2) The thread that's supposed to run daemonRunStateInit() is
> >    spawned already, but daemonRunStateInit() is in its very early
> >    stage (in the stack trace I see it's executing
> >    virIdentityGetSystem()).
> >
> > If SIGTERM (or any other signal that we don't override handler
> > for) arrives at this point, the main thread jumps out from
> > virEventRunDefaultImpl() and enters virStateShutdownPrepare()
> > (via shutdownPrepareCb which was set earlier). This iterates
> > through stateShutdownPrepare() callbacks of state drivers and
> > reaching qemuStateShutdownPrepare() eventually only to
> > dereference qemu_driver. But since thread 2) has not been
> > scheduled/not proceeded yet, qemu_driver was not allocated yet.
> >
> > Solution is simple - just check if qemu_driver is not NULL. But
> > doing so only in qemuStateShutdownPrepare() would push the
> > problem down to virStateShutdownWait(), well
> > qemuStateShutdownWait(). Therefore, duplicate the trick there
> > too.
> >
> 
> I guess this is a partial solution. Initialization may be in a state when
> qemu_driver is initialized but qemu_driver->workerPool is still NULL
> for example.
> 
> Maybe we'd better delay shutdown until initialization is finished?

The fix may not be ideal or complete, but it definitely fixes one
non-race issue (see
https://www.redhat.com/archives/libvir-list/2021-January/msg01134.html)

Even if we come up with a flag indicating whether a driver was fully
initialized or not, we still need to check whether it was initialized at
all. Unless we want to introduce yet another static variable in the
driver, but I think qemu_driver is already enough.

So I guess if you reword the commit message to say this is fixing a case
when driver initialization fails (rather than a race), you get my

Reviewed-by: Jiri Denemark <jdenemar@redhat.com>